CS 425 / ECE 428 Distributed Systems, Fall 2019
Indranil Gupta (Indy)
Lecture 2-3: Introduction to Cloud Computing
All slides © IG

The Hype!
  Forrester in 2010: Cloud computing will go from $40.7 billion in 2010 to $241 billion in 2020.
  Goldman Sachs says cloud computing will grow at an annual rate of 30% from 2013-2018.
  Hadoop market to reach $20.8 B by 2018: Transparency Market Research.
  Companies and even federal/state governments are using cloud computing now: fbo.gov

Many Cloud Providers
  AWS: Amazon Web Services
    EC2: Elastic Compute Cloud
    S3: Simple Storage Service
    EBS: Elastic Block Storage
  Microsoft Azure
  Google Cloud/Compute Engine/AppEngine
  Rightscale, Salesforce, EMC, Gigaspaces, 10gen, Datastax, Oracle, VMWare, Yahoo, Cloudera
  And many, many more!

Two Categories of Clouds
  Can be either a (i) public cloud, or (ii) private cloud.
  Private clouds are accessible only to company employees.
  Public clouds provide service to any paying customer:
    Amazon S3 (Simple Storage Service): store arbitrary datasets, pay per GB-month stored.
      As of 2019: 0.4c-3c per GB-month
    Amazon EC2 (Elastic Compute Cloud): upload and run arbitrary OS images, pay per CPU hour used.
      As of 2019: 0.2c per CPU hour to $7.2 per CPU hour (depending on strength)
    Google cloud: similar pricing as above.
    Google AppEngine/Compute Engine: develop applications within their AppEngine framework, upload data that will be imported into their format, and run.

Customers Save Time and $$$
  Dave Powers, Associate Information Consultant at Eli Lilly and Company: "With AWS, Powers said, a new server can be up and running in three minutes (it used to take Eli Lilly seven and a half weeks to deploy a server internally) and a 64-node Linux cluster can be online in five minutes (compared with three months internally). It's just shy of instantaneous."
  Ingo Elfering, Vice President of Information Technology Strategy, GlaxoSmithKline: "With Online Services, we are able to reduce our IT operational costs by roughly 30% of what we're spending."
  Jim Swartz, CIO, Sybase: "At Sybase, a private cloud of virtual servers inside its datacenter has saved nearly $US2 million annually since 2006, Swartz says, because the company can share computing power and storage resources across servers."
  100s of startups in Silicon Valley can harness large computing resources without buying their own machines.

But what exactly IS a cloud?

What is a Cloud?

  It's a cluster!
  It's a supercomputer!
  It's a datastore!
  It's superman!
  None of the above
  All of the above
  Cloud = Lots of storage + compute cycles nearby

What is a Cloud?
  A single-site cloud (aka "Datacenter") consists of:
    Compute nodes (grouped into racks) (2)
    Switches, connecting the racks
    A network topology, e.g., hierarchical
    Storage (backend) nodes connected to the network (3)
    Front-end for submitting jobs and receiving client requests (1)
    (1-3: Often called "three-tier architecture")
    Software services
  A geographically distributed cloud consists of:
    Multiple such sites
    Each site perhaps with a different structure and services

A Sample Cloud Topology

  So then, what is a cluster?

A Cloudy History of Time
  [Timeline figure, 1940 to 2012: the first datacenters; timesharing companies & data processing industry; clusters; grids; PCs (not distributed!); peer to peer systems; clouds and datacenters]

A Cloudy History of Time
  [Timeline figure, 1940 to 2012, annotated as follows]
  First large datacenters: ENIAC, ORDVAC, ILLIAC. Many used vacuum tubes and mechanical relays.
  Data Processing Industry: 1968: $70 M. 1978: $3.15 Billion
  Timesharing Industry (1975):
    Market share: Honeywell 34%, IBM 15%, Xerox 10%, CDC 10%, DEC 10%, UNIVAC 10%
    Honeywell 6000 & 635, IBM 370/168, Xerox 940 & Sigma 9, DEC PDP-10, UNIVAC 1108
  Berkeley NOW Project, supercomputers, server farms (e.g., Oceano)
  P2P Systems (90s-00s): many millions of users, many GB per day
  Grids (1980s-2000s):
    GriPhyN (1970s-80s)
    Open Science Grid and Lambda Rail (2000s)
    Globus & other standards (1990s-2000s)
  Clouds

Trends: Technology
  Doubling periods: storage: 12 mos, bandwidth: 9 mos, and (what law is this?) CPU compute capacity: 18 mos
  Then and now:
    Bandwidth
      1985: mostly 56 Kbps links nationwide
      2015: Tbps links widespread
    Disk capacity
      Today's PCs have TBs, far more than a 1990 supercomputer
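To get a feel for what the doubling periods above imply, here is a minimal Python sketch. It assumes idealized exponential growth, i.e. capacity multiplies by 2^(t / period) after t months; real hardware trends are not this smooth. The function name `growth_factor` and the 36-month horizon are illustrative choices, not part of the lecture.

```python
# Doubling periods (in months) as quoted on the slide.
DOUBLING_MONTHS = {"storage": 12, "bandwidth": 9, "cpu": 18}


def growth_factor(months: float, doubling_period_months: float) -> float:
    """Multiplicative growth after `months`, assuming idealized doubling."""
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    horizon = 36  # three years; an arbitrary horizon chosen for illustration
    for resource, period in DOUBLING_MONTHS.items():
        factor = growth_factor(horizon, period)
        print(f"{resource}: grows {factor:.0f}x in {horizon} months")
```

Under this assumption, bandwidth (the shortest doubling period) pulls ahead of storage and CPU over any fixed horizon.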

Trends: Users
  Then and now:
    Biologists:
      1990: were running small single-molecule simulations
      Today: CERN's Large Hadron Collider producing many PB/year

Prophecies
  In 1965, MIT's Fernando Corbató and the other designers of the Multics operating system envisioned a computer facility operating "like a power company or water company".
  Plug your thin client into the computing Utility and Play your favorite Intensive Compute & Communicate Application
  Have today's clouds brought us closer to this reality? Think about it.

Four Features New in Today's Clouds
  I. Massive scale.
  II. On-demand access: Pay-as-you-go, no upfront commitment.
    And anyone can access it
  III. Data-intensive Nature: What was MBs has now become TBs, PBs and XBs.
    Daily logs, forensics, Web data, etc.
    Humans have data numbness: Wikipedia (large) compressed is only about 10 GB!
  IV. New Cloud Programming Paradigms: MapReduce/Hadoop, NoSQL/Cassandra/MongoDB and many others.
    High in accessibility and ease of programmability
    Lots of open-source
  Combination of one or more of these gives rise to novel and unsolved distributed computing problems in cloud computing.

I. Massive Scale
  Facebook [GigaOm, 2012]: 30K in 2009 -> 60K in 2010 -> 180K in 2012
  Microsoft [NYTimes, 2008]: 150K machines
    Growth rate of 10K per month
    80K total running Bing
    In 2013, Microsoft Cosmos had 110K machines (4 sites)
  Yahoo! [2009]: 100K
    Split into clusters of 4000
  AWS EC2 [Randy Bias, 2009]: 40K machines
    8 cores/machine
  eBay [2012]: 50K machines
  HP [2012]: 380K in 180 DCs
  Google [2011, Data Center Knowledge]: 900K

Quiz: Where is the World's Largest Datacenter?
  (2018) China Telecom. 10.7 million sq. ft.
  (2017) "The Citadel", Nevada. 7.2 million sq. ft.
  (2015) In Chicago! 350 East Cermak, Chicago, 1.1 MILLION sq. ft.
    Shared by many different "carriers"
    Critical to Chicago Mercantile Exchange
  See:
    https://www.gigabitmagazine.com/top10/top-10-biggest-data-centres-world
    https://www.racksolutions.com/news/data-center-news/top-10-largest-data-centers-world/

What does a datacenter look like from inside?
  A virtual walk through a datacenter
  Reference: http://gigaom.com/cleantech/a-rare-lookinside-facebooks-oregon-data-center-photos-video/

Servers
  Front
  In
  Back
  Some highly secure (e.g., financial info)

Power
  Off-site
  On-site
  WUE = Annual Water Usage / IT Equipment Energy (L/kWh): low is good
  PUE = Total Facility Power / IT Equipment Power: low is good (e.g., Google ~1.1)
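The two efficiency ratios above are simple quotients. The Python sketch below just encodes the definitions from the slide; the function names and the sample readings (1100 kW facility power, 1000 kW IT power, and the water and energy totals) are made-up inputs for illustration, not measurements from any real datacenter.

```python
def pue(total_facility_power_kw: float, it_equipment_power_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    return total_facility_power_kw / it_equipment_power_kw


def wue(annual_water_usage_liters: float, it_equipment_energy_kwh: float) -> float:
    """Water Usage Effectiveness in L/kWh: annual water usage / IT equipment energy."""
    return annual_water_usage_liters / it_equipment_energy_kwh


if __name__ == "__main__":
    # Hypothetical readings, chosen only to illustrate the arithmetic.
    print(f"PUE = {pue(1100.0, 1000.0):.2f}")
    print(f"WUE = {wue(2_000_000.0, 8_000_000.0):.2f} L/kWh")
```

A PUE of exactly 1.0 would mean every watt entering the facility reaches IT equipment, so values close to 1 indicate little overhead for cooling and power distribution.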

Cooling
  Air sucked in from top (also, bugzappers)
  Water sprayed into air
  Water purified
  15 motors per server bank

Extra - Fun Videos to Watch
  Microsoft GFS Datacenter Tour (YouTube)
    http://www.youtube.com/watch?v=hOxA1l1pQIw
  Timelapse of a Datacenter Construction on the Inside (Fortune 500 company)
    http://www.youtube.com/watch?v=ujO-xNvXj3g

II. On-demand access: *aaS Classification
  On-demand: renting a cab vs. (previously) renting a car, or buying one. E.g.:
    AWS Elastic Compute Cloud (EC2): a few cents to a few $ per CPU hour
    AWS Simple Storage Service (S3): a few cents per GB-month
  HaaS: Hardware as a Service
    You get access to barebones hardware machines, do whatever you want with them. Ex: Your own cluster
    Not always a good idea because of security risks

  IaaS: Infrastructure as a Service
    You get access to flexible computing and storage infrastructure. Virtualization is one way of achieving this (cgroups, Kubernetes, Docker, VMs, ...). Often said to subsume HaaS.
    Ex: Amazon Web Services (AWS: EC2 and S3), OpenStack, Eucalyptus, Rightscale, Microsoft Azure, Google Cloud.

II. On-demand access: *aaS Classification
  PaaS: Platform as a Service
    You get access to flexible computing and storage infrastructure, coupled with a software platform (often tightly coupled).
    Ex: Google's AppEngine (Python, Java, Go)
  SaaS: Software as a Service
    You get access to software services, when you need them. Often said to subsume SOA (Service Oriented Architectures).
    Ex: Google Docs, MS Office 365 Online

III. Data-intensive Computing
  Computation-Intensive Computing
    Example areas: MPI-based, High-performance computing, Grids
    Typically run on supercomputers (e.g., NCSA Blue Waters)
  Data-Intensive
    Typically store data at datacenters
    Use compute nodes nearby
    Compute nodes run computation services

  In data-intensive computing, the focus shifts from computation to the data: CPU utilization is no longer the most important resource metric; instead I/O is (disk and/or network).

IV. New Cloud Programming Paradigms
  Easy to write and run highly parallel programs in new cloud programming paradigms:
    Google: MapReduce and Sawzall
    Amazon: Elastic MapReduce service (pay-as-you-go)
    Google (MapReduce)
      Indexing: a chain of 24 MapReduce jobs
      ~200K jobs processing 50PB/month (in 2006)
    Yahoo! (Hadoop + Pig)
      WebMap: a chain of several MapReduce jobs
      300 TB of data, 10K cores, many tens of hours (~2008)
    Facebook (Hadoop + Hive)
      ~300TB total, adding 2TB/day (in 2008)
      3K jobs processing 55TB/day
    Similar numbers from other companies, e.g., Yieldex, eharmony.com, etc.
    NoSQL: MySQL is an industry standard, but Cassandra is 2400 times faster!

Two Categories of Clouds
  Can be either a (i) public cloud, or (ii) private cloud.
  Private clouds are accessible only to company employees.
  Public clouds provide service to any paying customer.

  You're starting a new service/company: should you use a public cloud or purchase your own private cloud?

Single site Cloud: to Outsource or Own?
  Medium-sized organization: wishes to run a service for M months
    Service requires 128 servers (1024 cores) and 524 TB
    Same as UIUC CCT (Cloud Computing Testbed) cloud site (bought in 2009, now decommissioned)
  Outsource (e.g., via AWS): monthly cost
    S3 costs: $0.12 per GB-month. EC2 costs: $0.10 per CPU hour (costs from 2009)
    Storage = $0.12 x 524 x 1000 ~ $62 K
    Total = Storage + CPUs = $62 K + $0.10 x 1024 x 24 x 30 ~ $136 K
  Own: monthly cost
    Storage ~ $349 K / M
    Total ~ $1555 K / M + 7.5 K (includes 1 sysadmin / 100 nodes)
    using 0.45:0.4:0.15 split for hardware:power:network and 3 year lifetime of hardware
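The cost arithmetic on this slide can be reproduced with a short script. The Python sketch below recomputes the outsourced monthly cost from the 2009 prices quoted above, then solves the "own" cost model (upfront / M + monthly < outsourced monthly) for the number of months M beyond which owning is cheaper, i.e. M > upfront / (outsourced - monthly). The function names are illustrative, and the script uses unrounded dollar amounts, so its results differ slightly from the rounded figures on the slides.

```python
HOURS_PER_MONTH = 24 * 30  # the slide approximates a month as 30 days


def outsource_monthly_cost(storage_tb, cores,
                           s3_per_gb_month=0.12, ec2_per_cpu_hour=0.10):
    """Return (storage_cost, total_cost) per month in dollars, at 2009 prices."""
    storage = s3_per_gb_month * storage_tb * 1000  # slide uses 1 TB = 1000 GB
    compute = ec2_per_cpu_hour * cores * HOURS_PER_MONTH
    return storage, storage + compute


def breakeven_months(own_upfront, own_monthly, outsource_monthly):
    """Months M above which own_upfront / M + own_monthly < outsource_monthly."""
    return own_upfront / (outsource_monthly - own_monthly)


if __name__ == "__main__":
    storage, total = outsource_monthly_cost(storage_tb=524, cores=1024)
    print(f"Outsourced storage: ${storage:,.0f} per month")
    print(f"Outsourced total:   ${total:,.0f} per month")
    # Own-cost figures from the slide: $349 K (storage), $1555 K + $7.5 K/month (overall).
    print(f"Storage breakeven: M > {breakeven_months(349_000, 0, storage):.2f} months")
    print(f"Overall breakeven: M > {breakeven_months(1_555_000, 7_500, total):.2f} months")
```

Because the upfront cost is amortized over M months, owning only pays off for services that run long enough, which is the tradeoff the outsource-or-own question turns on.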

Single site Cloud: to Outsource or Own?
  Breakeven analysis: more preferable to own if:
    $349 K / M < $62 K (storage)
    $1555 K / M + 7.5 K < $136 K (overall)
  Breakeven points:
    M > 5.55 months (storage)
    M > 12 months (overall)
  As a result:
    Startups use clouds a lot
    Cloud providers benefit monetarily most from storage

Academic Clouds: Emulab
  A community resource open to researchers in academia and industry. Very widely used by researchers everywhere today.
  https://www.emulab.net/
  A cluster, with currently ~500 servers
  Founded and owned by University of Utah (led by the late Prof. Jay Lepreau)
  As a user, you can:
    Grab a set of machines for your experiment
    You get root-level (sudo) access to these machines
    You can specify a network topology for your cluster
    You can emulate any topology
  All images © Emulab

  A community resource open to researchers in academia and industry
  http://www.planet-lab.org/
  Currently, ~1077 nodes at ~500 sites across the world
  Founded at Princeton University (led by Prof. Larry Peterson), but owned in a federated manner by the sites
  All images © PlanetLab
  Node: Dedicated server that runs components of PlanetLab services.
  Site: A location, e.g., UIUC, that hosts a number of nodes.
  Sliver: Virtual division of each node. Currently uses VMs, but it could also use other technology. Needed for timesharing across users.
  Slice: A spatial cut-up of the PL nodes. Per user. A slice is a way of giving each user (Unix-shell like) access to a subset of PL machines, selected by the user. A slice consists of multiple slivers, one at each component node.
  Thus, PlanetLab allows you to run real world-wide experiments.
  Many services have been deployed atop it, used by millions (not just researchers): application-level DNS services, monitoring services, CoralCDN, etc.
  PlanetLab is the basis for NSF GENI: https://www.geni.net/
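The relationship between nodes, sites, slivers, and slices can be sketched as a small data model. The Python below is only an illustration of the terminology defined above: it is not the PlanetLab API, and the class names, method names, and hostnames are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A dedicated server hosted at a site."""
    hostname: str
    site: str


@dataclass(frozen=True)
class Sliver:
    """A virtual division of one node, belonging to one slice."""
    node: Node
    slice_name: str


@dataclass
class Slice:
    """A per-user set of slivers, one on each node the user selected."""
    name: str
    slivers: dict = field(default_factory=dict)  # hostname -> Sliver

    def add_node(self, node: Node) -> Sliver:
        # A slice holds at most one sliver per node, so re-adding is a no-op.
        return self.slivers.setdefault(node.hostname, Sliver(node, self.name))

    def sites(self) -> set:
        return {sliver.node.site for sliver in self.slivers.values()}


if __name__ == "__main__":
    # Hypothetical hostnames, used only to exercise the model.
    experiment = Slice("my_experiment")
    experiment.add_node(Node("node1.example-uiuc.edu", "UIUC"))
    experiment.add_node(Node("node2.example-uiuc.edu", "UIUC"))
    experiment.add_node(Node("node1.example-princeton.edu", "Princeton"))
    print(len(experiment.slivers), "slivers across sites", sorted(experiment.sites()))
```

Keying slivers by hostname is one way to enforce the "one sliver at each component node" rule; many slices can still share a node, which is what timesharing across users requires.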

Public Research Clouds
  Accessible to researchers with a qualifying grant
  Chameleon Cloud: https://www.chameleoncloud.org/
    HaaS
    OpenStack (~AWS)
  CloudLab: https://www.cloudlab.us/
    Build your own cloud on their hardware

Summary
  Clouds build on many previous generations of distributed systems
    Especially the timesharing and data processing industry of the 1960-70s.
  Need to identify unique aspects of a problem to classify it as a new cloud computing problem
    Scale, on-demand access, data-intensive, new programming
  Otherwise, the solutions to your problem may already exist!
  Next: MapReduce!
