Introduction to Cloud Data Center
       and Network Issues



  Presenter: Jason, Tsung-Cheng, HOU
  Advisor: Wanjiun Liao
                                       July 2nd, 2012
                                                        1
Agenda
• Cloud Computing / Data Center
  Basic Background
• Enabling Technology
• Infrastructure as a Service
  A Cloud DC System Example
• Networking Issues in Cloud DC




                                  2
Brand New Technology?
• Not exactly; large-scale computing existed in the past:
  utility/mainframe computing, grid computing, supercomputers
• Past demand: scientific computing, large-scale
  engineering (finance, construction, aerospace)
• New demand: search, e-commerce, content
  streaming, application/web hosting, IT
  outsourcing, mobile/remote apps, big data
  processing…
• Difference: aggregated individual small
  demands, highly volatile and dynamic, not all
  profitable
  – Seek economies of scale to cut costs
  – Rely on resilient, flexible, and scalable infrastructure   3
Cloud Data Center
                   Traditional Data Center      Cloud Data Center
Servers            Co-located                   Integrated
                   Dependent Failure            Fault-Tolerant
Resources          Partitioned                  Unified
                   Performance Interrelated     Performance Isolated
Management         Separated                    Centralized Full Control
                   Manual                       With Automation
Scheduling         Plan Ahead                   Flexible
                   Overprovisioning             Scalable
Renting            Per Physical Machine         Per Logical Usage
Application /      Fixed on Designated          Runs and Moves across
Services           Servers                      All VMs
Cloud Computing stack: End Device / Client → Common App. Platform →
Cloud Data Center

Cloud DC Requirements (network dependent):
• On-Demand Self-Service      • Measured Usage
• Resource Pooling            • Broad Network Access
• Rapid Elasticity                                                     4
Server and Switch Organization




                 What’s on Amazon?
                 Dropbox, Instagram, Netflix, Pinterest
                 Foursquare, Quora, Twitter, Yelp
                 Nasdaq, New York Times….
                 …and a lot more
Data Center Components




                         6
Clusters of Commodities
• Current cloud DC achieves high performance
  using commodity servers and switches
  →no specialized solution for supercomputing
• Supercomputing still exists, an example is
  Symmetric Multi-Processing server
  →128-core on shared RAM-like memory
• Compare to 32 LAN-connected 4-core servers
  – Accessing global data: SMP ~100 ns, LAN ~100 µs
• Computing penalty from delayed LAN-access
• Performance gain when clusters grow large
                                                7
Penalty for Latency in LAN-access

  This is not a comparison of a server cluster with a single high-end server.

  [Chart: performance penalty vs. cluster size]
  f: # of global data I/O operations per 10 ms
  High: f = 100
  Medium: f = 10
  Low: f = 1
                                                                           8
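The chart's message can be reproduced with a toy latency model (my own back-of-the-envelope sketch, not taken from the slide's data): assume each 10 ms slice of work issues f global-data accesses at the interconnect's latency.

```python
# Back-of-the-envelope model (my own sketch, not from the slide's data):
# each 10 ms slice of work issues f global-data accesses, each paying the
# interconnect latency: ~100 ns for SMP shared memory, ~100 us for LAN.
def slowdown(f, lan=100e-6, smp=100e-9, work=10e-3):
    """Ratio of LAN-cluster time to SMP time for the same work slice."""
    return (work + f * lan) / (work + f * smp)

for f in (1, 10, 100):  # low / medium / high communication intensity
    print(f"f={f:3d}: LAN-access is {slowdown(f):.2f}x slower")
```

For f = 100 the LAN cluster spends as long waiting on the network as computing, roughly a 2x slowdown, which is why communication-heavy workloads favored SMP.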
Performance Gain When Clusters Grow Large




                                      9
Agenda
• Cloud Computing / Data Center
  Basic Background
• Enabling Technology
• Infrastructure as a Service
  A Cloud DC System Example
• Networking Issues in Cloud DC




                                  10
A DC-wide System
• Has software systems consisting of:
  – Distributed system, logical clocks, coordination
    and locks, remote procedural call…etc
  – Distributed file system
  – (We do not go deeper into above components)
  – Parallel computation: MapReduce, Hadoop
• Virtualized Infrastructure:
  – Computing: Virtual Machine / Hypervisor
  – Storage: Virtualized / distributed storage
  – Network: Network virtualization…the next step?
                                                       11
MapReduce
• 100 TB datasets
  – Scanning on 1 node – 23 days
  – On 1000 nodes – 33 minutes
• Single machine performance does not matter
  – Just add more… but HOW to use so many clusters ?
  – How to make distributed programming simple and elegant ?
• Sounds great, but what about MTBF?
  – MTBF = Mean Time Between Failures
  – 1 node – once per 3 years
  – 1000 nodes – 1 node per 1 day
• MapReduce refers to both:
  – Programming framework
  – Fault-tolerant runtime system
                                                         12
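The 23-day / 33-minute figures above check out under a plausible per-node scan rate (the ~50 MB/s sequential-read rate is my assumption, not stated on the slide):

```python
# Rough check of the scan-time figures, assuming ~50 MB/s sequential read
# per node (the rate is my assumption; a common 2012-era disk figure).
DATASET = 100e12   # 100 TB in bytes
RATE = 50e6        # bytes per second per node

one_node = DATASET / RATE                 # seconds on a single node
print(one_node / 86400, "days")           # ~23 days
print(one_node / 1000 / 60, "minutes")    # ~33 minutes on 1000 nodes
```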
MapReduce: Word Counting

          Shuffle and Sort
          ↓↓↓




                             13
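The word-count pipeline on this slide can be sketched as a single-process Python toy (illustration only, not Hadoop code): map emits (word, 1) pairs, shuffle-and-sort groups them by key, and reduce sums the counts.

```python
from itertools import groupby

# Single-process sketch of the word-count example (illustration, not
# Hadoop code): map emits (word, 1), shuffle-and-sort groups by key,
# reduce sums the counts.
def map_phase(document):
    for word in document.split():
        yield (word, 1)

def shuffle_sort(pairs):
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, [v for _, v in group]

def reduce_phase(key, values):
    return key, sum(values)

docs = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = [p for d in docs for p in map_phase(d)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle_sort(pairs))
print(counts["the"], counts["fox"])  # 3 2
```

In the real runtime the map and reduce calls run on different workers and the shuffle happens over the network, but the programmer only writes the two phase functions.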
MapReduce: A Diagram




                 ←Shuffle
                 ← Sort




                            14
Distributed Execution Overview
[Diagram: the user program forks a master and workers; the master assigns
map tasks over input splits (Split 0/1/2) and reduce tasks; map workers
write intermediate results locally; reduce workers do remote read + sort
(shuffle & sort) and write the output files (Output File 0/1)]
Master also deals with:
• Worker status updates
• Fault-tolerance
• I/O scheduling
• Automatic distribution
• Automatic parallelization
VM and Hypervisor
• Virtual Machine: A software
  package, sometimes using hardware
  acceleration, that allows an isolated guest
  operating system to run within a host
  operating system.
• Stateless: Once shut down, all HW states
  disappear.
• Hypervisor: A software platform that is
  responsible for creating, running, and
  destroying multiple virtual machines.
• Type 1 and Type 2 hypervisors                 16
Type 1 vs. Type 2 Hypervisor




                               17
Concept of Virtualization
• Decoupling HW/SW by abstraction & layering
• Using, demanding,
  but not owning or configuring
• Resource pool: flexible to
  slice, resize, combine, and distribute
• A degree of automation by software
        HOST 1         HOST 2         HOST 3        HOST 4




 VMs

         Hypervisor:
         Turns 1 server into many “virtual machines” (instances or VMs)
         (VMware ESX, Citrix XenServer, KVM, etc.)                       18
Concept of Virtualization
• Hypervisor: abstraction for HW/SW
• For SW: Abstraction and automation of
  physical resources
  – Pause, erase, create, and monitor
  – Charge services per usage units
• For HW: Generalized interaction with SW
  – Access control
  – Multiplex and demultiplex
• The operator retains ultimate hypervisor control
• Benefit? Monetizes the operator’s capital expenditure
                                            19
I/O Virtualization Model
• Protects I/O access; multiplexes / demultiplexes traffic
• Delivers packets among VMs via shared memory
• Performance bottleneck: overhead when communicating
  between the driver domain and VMs
• VM scheduling and long queues → delay / throughput variance


                                                    Bottleneck:
                                                 CPU/RAM I/O lag
                                                   VM Scheduling
                                                  I/O Buffer Queue




                                                               20
Agenda
• Cloud Computing / Data Center
  Basic Background
• Enabling Technology
• Infrastructure as a Service
  A Cloud DC System Example
• Networking Issues in Cloud DC




                                  21
OpenStack Status
• OpenStack
    – Founded by NASA and Rackspace in 2010
    – Today 183 companies and 3,386 people
    – Was only 125 and 1,500 in fall 2011
    – Growing fast; latest release Essex, Apr. 5th
•   Release cycle aligned with Ubuntu (Apr. / Oct.)
•   Aims to be the “Linux” of cloud computing systems
•   Open-source alternative vs. Amazon and VMware
•   Start-ups are forming around OpenStack
•   Still lacks big use cases and implementations
                                                     22
A Cloud Management Layer Is Missing
Questions arise as the environment grows...
“VM sprawl” can make things unmanageable very quickly

• How do you make your apps cloud-aware?
• How do you empower employees to self-service?
• Where should you provision new VMs?
• How do you keep track of it all?

1. Server Virtualization      2. Cloud Data Center      3. Cloud Federation
Solution: OpenStack, the Cloud Operating System
A new management layer that adds automation and control

[Diagram: APPS, USERS, and ADMINS sitting on top of the
CLOUD OPERATING SYSTEM]

1. Server Virtualization      2. Cloud Data Center      3. Cloud Federation
A Common Platform Is Here
OpenStack is open-source software powering public and private clouds.

[Diagram: a private cloud and a public cloud on the same platform]
OpenStack Enables Cloud Federation
Connecting clouds to create global resource pools

[Diagram: clouds in Texas, California, Washington, and Europe joined by a
common software platform that makes federation possible]

1. Server Virtualization      2. Cloud Data Center      3. Cloud Federation
OpenStack Key Components
[Diagram: Horizon (dashboard) on top of Nova (compute), Glance (image
service), Swift (object storage), and Keystone (identity)]
Keystone Main Functions
• Provides 4 primary services:
  – Identity: User information authentication
  – Token: after login, tokens replace account / password credentials
  – Service catalog: Service units registered
  – Policies: Enforces different user levels
• Can be backed by different databases.




                                                 27
Swift Main Components
Swift Implementation
• Duplicated storage, load balancing
• Logical view vs. physical arrangement
• Object servers: store the real objects
• Container servers: store object metadata
• Account servers: store container / object metadata
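As an illustration of the replicated-placement idea, here is a simplified ring-style sketch. This is a stand-in for Swift's actual ring implementation, and the node names and the 64 virtual points per node are made-up values.

```python
import hashlib
from bisect import bisect

# Illustrative ring-style placement: a simplified stand-in for Swift's
# ring, not its actual implementation. Node names and the 64 virtual
# points per node are made-up values.
NODES = ["storage-1", "storage-2", "storage-3"]
RING = sorted((int(hashlib.md5(f"{n}#{v}".encode()).hexdigest(), 16), n)
              for n in NODES for v in range(64))
KEYS = [k for k, _ in RING]

def place(obj_name, replicas=2):
    """Walk the ring clockwise from the object's hash to pick replicas."""
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    i, out = bisect(KEYS, h), []
    while len(out) < replicas:
        node = RING[i % len(RING)][1]
        if node not in out:
            out.append(node)   # duplicated storage: distinct nodes only
        i += 1
    return out

print(place("photos/cat.jpg"))  # two distinct storage nodes
```

Hashing names onto many virtual points is what gives the load balancing; walking past the first hit is what gives the duplicated storage on distinct nodes.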
Glance
• Image storage and indexing
• Keeps a database of metadata associated with each image;
  supports discovering, registering, and retrieving images
• Built on top of Swift; images are stored in Swift
• Two servers:
  – Glance-api: public interface for uploading and
    managing images.
  – Glance-registry: private interface to metadata
    database
• Support multiple image formats
                                                     30
Glance Process


Upload or Store




                      Download or Get


                                   31
Nova
• Major components:
  – API: public facing interface
  – Message Queue: Broker to handle interactions
    between services, currently based on RabbitMQ
  – Scheduler: coordinates all services, determines
    placement of new resources requested
  – Compute Worker: hosts VMs; controls the hypervisor
    and VMs when it receives commands on the message queue
  – Volume: manages permanent storage


                                                32
Messaging (RabbitMQ)




                       33
General Nova Process




                       34
Launching a VM




                 35
Complete System Logical View




                               36
Agenda
• Cloud Computing / Data Center
  Basic Background
• Enabling Technology
• Infrastructure as a Service
  A Cloud DC System Example
• Networking Issues in Cloud DC




                                  37
Primitive OpenStack Network
• Each VM network is owned by one network host
  – Simply a Linux server running the Nova-network daemon
• The Nova network node is the only gateway
• Flat Network Manager:
  – A Linux networking bridge forms a subnet
  – All instances attach to the same bridge
  – Manually configure servers, controller, and IPs
• Flat DHCP Network Manager:
  – Adds a DHCP server on the same bridge
• Limitations: single gateway, per-cluster scope, fragmentation
                                                    38
OpenStack Network




Linux server running the Nova-network daemon: the only gateway for all
NICs bridged into the network. VMs are bridged onto a raw Ethernet
device.                                                                  39
Conventional DCN Topology
Public Internet

  DC Layer-3

  DC Layer-2




• Oversubscription                        • Scale-up proprietary design – expensive
• Fragmentation of resources:             • Inflexible addressing, static routing
  Network limits cross-DC communication   • Inflexible network configuration
• Hinders applications’ scalability         Protocol baked / embedded on chips
• Only reachability isolation:
  performance bottlenecks remain interdependent                                 40
A New DCN Topology
                                                                              Core Switches

                                                                                Full Bisection

Aggr
                                                                                Full Bisection
Edge




               Pod-0               Pod-1              k = 4
• k pods, each with k²/4 hosts and k switches
• (k/2)² core switches, (k/2)² paths for each S-D pair
• 5k²/4 k-port switches, k³/4 hosts in total
• 48-port switches: 27,648 hosts, 2,880 switches
• Full bisection bandwidth at each level
• Modular, scale-out, cheap design
Challenges:
• Cabling explosion, copper transmission range limits
• Existing addressing / routing / forwarding do not work well on fat-tree / Clos
• Scalability issues with millions of end hosts
• Configuration of millions of parts

                                                                                          41
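The fat-tree bookkeeping above can be checked directly:

```python
# Quick check of the fat-tree arithmetic on the slide: k pods with k^2/4
# hosts and k switches each, (k/2)^2 core switches, 5k^2/4 k-port
# switches and k^3/4 hosts in total.
def fat_tree(k):
    hosts = k**3 // 4
    core = (k // 2) ** 2
    switches = 5 * k**2 // 4  # edge + aggregation + core, all k-port
    return hosts, switches, core

hosts, switches, core = fat_tree(48)
print(hosts, switches)  # 27648 2880, matching the 48-port figures
```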
Cost of DCN




              42
Addressing
[Diagram: a host ARPs for 10.2.4.5; the controller proxies the ARP and
answers with the PMAC 00:02:00:02:00:01 rather than the actual MAC;
switches rewrite between AMAC and PMAC using the tables below.]

Controller mapping:
    IP          PMAC (Location)
    10.5.1.2    00:00:01:02:00:01
    10.2.4.5    00:02:00:02:00:01

Switch PKT rewrite table:
    IP          AMAC (Identity)       PMAC (Location)
    10.2.4.5    00:19:B9:FA:88:E2     00:02:00:02:00:01

• AMAC: identity, maintained at switches
• PMAC: (pod, position, port, vmid); IP → PMAC mapped at the controller
• Routing: static VLAN or ECMP hashing (to be presented later)
• Switches: 32–64 K flow entries, 640 KB
• Assume 10,000k VMs on 500k servers
• Identity-based: 10,000k flat entries, ~100 MB (huge); flexible, per-VM/app;
  supports VM migration with continuous connections
• Location-based: 1k hierarchical entries, ~10 KB (easy storage); fixed,
  per-server; easy forwarding, no extra reconfiguration
• Consistency / efficiency / fault tolerance? Solved by assigning different
  roles to (controller, switch, host)
• Implemented as server-centric and switch-centric designs                   43
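A sketch of the PMAC encoding, assuming the (pod:16, position:8, port:8, vmid:16) bit layout implied by the slide's example addresses (the actual PortLand field widths may differ):

```python
# Sketch of the PMAC layout inferred from the slide's example addresses:
# pod (16 bits) : position (8) : port (8) : vmid (16). These widths are
# an assumption based on the byte grouping shown, as in PortLand.
def encode_pmac(pod, position, port, vmid):
    raw = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{(raw >> s) & 0xFF:02x}" for s in range(40, -8, -8))

print(encode_pmac(pod=0, position=1, port=2, vmid=1))   # 00:00:01:02:00:01
print(encode_pmac(pod=2, position=0, port=2, vmid=1))   # 00:02:00:02:00:01
```

Because the hierarchy is baked into the address, a switch can forward on a PMAC prefix (pod, then position) with a handful of entries instead of one flat entry per VM.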
Load Balancing / Multipathing
[Diagram: per-flow hashing / randomization on the left vs. pre-configured
VLAN tunnels on the right]

  End hosts stay “transparent”: they send traffic to the network as usual, without seeing details
  OpenFlow: the controller talks to HW/SW switches and kernel agents, and manipulates entries

• Clusters grow larger, nodes demand faster links
• Network delay / packet loss → performance ↓
• Still, only commodity hardware
• Aggregated individual small demands
  → traffic extremely volatile / unpredictable
• Traffic matrix: dynamic, evolving, not steady
• Users don’t know the infrastructure or topology
• Operators don’t know the applications or traffic
• Need to utilize multiple paths and capacity!
• VLAN: multiple preconfigured tunnels → topology-dependent
• Multipath TCP: modified transport mechanism
  → distributes and shifts load among paths
• ECMP/VLB: randomization, header hashing
  → only randomizes upward paths, only for symmetric traffic              44
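Per-flow ECMP hashing can be illustrated with a toy path selector (illustration only; real switches hash in hardware with vendor-specific functions):

```python
import hashlib

# Toy per-flow ECMP path choice (illustration only; real switches hash
# in hardware with vendor-specific functions): hashing the 5-tuple keeps
# all packets of one flow on one path while spreading flows across paths.
def ecmp_path(src_ip, dst_ip, src_port, dst_port, proto, n_paths):
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths

path = ecmp_path("10.0.0.1", "10.0.1.1", 5000, 80, "tcp", n_paths=4)
assert path == ecmp_path("10.0.0.1", "10.0.1.1", 5000, 80, "tcp", n_paths=4)
```

Keeping a flow on one path avoids packet reordering, but it is exactly this static choice that lets two elephant flows collide, which motivates the flow scheduling on the next slide.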
Flow Scheduling

• ECMP hashing → static per-flow paths
• Long-lived elephant flows may collide
• Some links full, others under-utilized
• Flow-to-core mappings: re-allocate flows
• What time granularity? Fast enough?
• Controller computation? Scalable?
                                                                                  45
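A simplified, Hedera-like greedy placement (my sketch, not the published algorithm) shows how a controller could re-allocate elephant flows to the least-loaded core path:

```python
# A simplified, Hedera-like greedy placement (my sketch, not the
# published algorithm): put each elephant flow on the currently
# least-loaded of the equal-cost core paths, biggest flows first.
def schedule_flows(flow_demands, n_paths):
    load = [0.0] * n_paths
    placement = {}
    for flow, demand in sorted(flow_demands.items(), key=lambda kv: -kv[1]):
        path = min(range(n_paths), key=lambda p: load[p])
        placement[flow] = path
        load[path] += demand
    return placement, load

placement, load = schedule_flows({"f1": 0.9, "f2": 0.8, "f3": 0.2}, 2)
print(placement)  # the two big flows land on different paths
```

The open questions on the slide apply directly to this loop: how often it can run, and whether the controller can compute placements for a data-center-scale flow population.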
Reactive Reroute
[Diagram: switch queue with occupancy Q, equilibrium Qeq, and offset Qoff;
feedback message FB sent back to the source]

• Congestion Point (CP): switch
  – Samples incoming packets
  – Monitors and maintains the queue level
  – Sends feedback messages to the source, scaled by queue length
  – May choose to re-hash elephant flows
• Reaction Point (RP): source rate limiter
  – Decreases rate according to feedback
  – Increases rate by counter / timer
• QCN in the IEEE 802.1Qau task group
  – For converged networks; assures zero drop
  – Like TCP AIMD but at L2, with a different purpose
  – The CP reacts directly, not end-to-end
  – Can be utilized for reactive reroute
• May differentiate feedback messages
  – Decrease more for lower classes (QoS)
  – Decrease more for larger flows (fairness)
• Large flows are suppressed → high delay                                 46
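A simplified rendering of the QCN-style feedback loop (the weight W and gain GD values are assumed, and the real 802.1Qau state machine has more detail):

```python
# Simplified QCN-style feedback (illustration; the 802.1Qau state
# machine has more detail). W and GD are assumed parameter values.
W, GD = 2.0, 1 / 128

def feedback(q, q_old, q_eq):
    q_off = q - q_eq        # how far the queue sits above equilibrium Qeq
    q_delta = q - q_old     # how fast the queue is growing
    return -(q_off + W * q_delta)

def react(rate, fb):
    # Rate limiter at the source: multiplicative decrease on congestion.
    return rate * (1 - GD * min(abs(fb), 64)) if fb < 0 else rate

fb = feedback(q=40, q_old=30, q_eq=20)  # congested: negative feedback
print(fb, react(10e9, fb))              # rate drops below 10 Gb/s
```

Because the feedback magnitude scales with queue length, large flows receive bigger cuts, which is both the fairness lever and the source of the high-delay suppression noted above.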
Controller
• DCN relies on controller for many functions:
  – Address mapping / mgmt / registration / reuse
  – Traffic load scheduling / balancing
  – Route computation, switch entries configuration
  – Logical network view ↔ physical construction
• An example: Onix
  – Distributed system
  – Maintain, exchange &
    distribute net states
     • Hard static: SQL DB
     • Soft dynamic: DHT
  – Asynchronous but
    eventually consistent
                                                  47
Tenant View vs Provider View
Onix Functions
[Diagram: the control plane / applications issue control commands through
an API; Onix (the network OS) and its network hypervisor map the logical
forwarding plane (logical state abstractions) to real network state via
OpenFlow, distributing and configuring switches through a Network
Information Base]
                                                                            49
OpenStack Quantum Service




XenServer: Domain 0   Kernel-based VM: Linux Kernel
Always Call for Controller?

ASIC switching rate
Latency: ~5 µs




                                      51
Always Call for Controller?
CPU Controller
Latency: 2 ms
A huge waste
of resources!




                                    52
Conclusion
• Concept of cloud computing is not brand new
  – But with new usage, demand, and economy
  – Aggregated individual small demands
  – Thus pressures traditional data centers
  – Clusters of commodities for performance and
    economy of scale
• Data Center Network challenges
  – Carry tons of apps, tenants, and compute tasks
  – Network delay / loss = service bottleneck
  – Still no consistent system / traffic / analysis model
  – Large-scale constructs, no public traces; how practical?
                                                    53
Questions?




             54
Reference
•   YA-YUNN SU, “Topics in Cloud Computing”, NTU CSIE 7324
•   Luiz André Barroso and Urs Hölzle, “The Datacenter as a Computer - An Introduction to the
    Design of Warehouse-Scale Machines”, Google Inc.
•   吳柏均,郭嘉偉, “MapReduce: Simplified Data Processing on Large Clusters”, CSIE 7324 in
    class presentation slides.
•   Stanford, “Data Mining”, CS345A,
    http://www.stanford.edu/class/cs345a/slides/02-mapreduce.pdf
•   Dr. Allen D. Malony, CIS 607: Seminar in Cloud Computing, Spring 2012, U. Oregon
    http://prodigal.nic.uoregon.edu/~hoge/cis607/
•   Manel Bourguiba, Kamel Haddadou, Guy Pujolle, “Packet aggregation based network i/o
    virtualization for cloud computing”, Computer Communication 35, 2012
•   Eric Keller, Jennifer Rexford, “The ‘Platform as a Service’ Model for Networking”, in Proc.
    INM/WREN, 2010
•   Martin Casado, Teemu Koponen, Rajiv Ramanathan, Scott Shenker, “Virtualizing the
    Network Forwarding Plane”, in Proc. PRESTO (November 2010)
•   Guohui Wang, T. S. Eugene Ng, “The Impact of Virtualization on Network Performance of
    Amazon EC2 Data Center”, IEEE INFOCOM 2010
•   OpenStack Documentation
    http://docs.openstack.org/
                                                                                           55
Reference
•   Bret Piatt, OpenStack Overview, OpenStack Tutorial
    http://salsahpc.indiana.edu/CloudCom2010/slides/PDF/tutorials/OpenStackTutorialIEEEClo
    udCom.pdf
    http://www.omg.org/news/meetings/tc/ca-10/special-events/pdf/5-3_Piatt.pdf
•   Vishvananda Ishaya, Networking in Nova
    http://unchainyourbrain.com/openstack/13-networking-in-nova
•   Jaesuk Ahn, OpenStack, XenSummit Asia
    http://www.slideshare.net/ckpeter/openstack-at-xen-summit-asia
    http://www.slideshare.net/xen_com_mgr/2-xs-asia11kahnopenstack
•   Salvatore Orlando, Quantum: Virtual Networks for Openstack
    http://qconlondon.com/dl/qcon-london-
    2012/slides/SalvatoreOrlando_QuantumVirtualNetworksForOpenStackClouds.pdf
•   Dan Wendlandt, Openstack Quantum: Virtual Networks for OpenStack
    http://www.ovirt.org/wp-content/uploads/2011/11/Quantum_Ovirt_discussion.pdf
•   David A. Maltz, “Data Center Challenges: Building Networks for Agility, Senior
    Researcher, Microsoft”, Invited Talk, 3rd Workshop on I/O Virtualization, 2011
    http://static.usenix.org/event/wiov11/tech/slides/maltz.pdf
•   Amin Vahdat, “PortLand: Scaling Data Center Networks to 100,000 Ports and
    Beyond”, Stanford EE Computer Systems Colloquium, 2009
    http://www.stanford.edu/class/ee380/Abstracts/091118-DataCenterSwitch.pdf
                                                                                     56
Reference
•   Mohammad Al-Fares , Alexander Loukissas , Amin Vahdat, “A scalable, commodity data
    center network architecture”, ACM SIGCOMM 2008
•   Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon
    Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, Sudipta Sengupta, “VL2: a scalable
    and flexible data center network”, ACM SIGCOMM 2009
•   Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis
    Miri, Sivasankar Radhakrishnan, Vikram Subramanya, Amin Vahdat, “PortLand: a scalable
    fault-tolerant layer 2 data center network fabric”, ACM SIGCOMM 2009
•   Jayaram Mudigonda, Praveen Yalagandula, Mohammad Al-Fares, Jeffrey C.
    Mogul, “SPAIN: COTS data-center Ethernet for multipathing over arbitrary
    topologies”, USENIX NSDI 2010
•   Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson, Jennifer
    Rexford, Scott Shenker, Jonathan Turner, “OpenFlow: enabling innovation in campus
    networks”, ACM SIGCOMM 2008
•   Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan, Nelson Huang, Amin
    Vahdat, “Hedera: dynamic flow scheduling for data center networks”, USENIX NSDI 2010
•   M. Alizadeh, B. Atikoglu, A. Kabbani, A. Lakshmikantha, R. Pan, B. Prabhakar, and M.
    Seaman, “Data center transport mechanisms: Congestion control theory and IEEE
    standardization,” Communication, Control, and Computing, 2008 46th Annual Allerton
    Conference on
                                                                                           57
Reference
•   A. Kabbani, M. Alizadeh, M. Yasuda, R. Pan, and B. Prabhakar. “AF-QCN: Approximate
    fairness with quantized congestion notification for multitenanted data centers”, In High
    Performance Interconnects (HOTI), 2010, IEEE 18th Annual Symposium on
•   Adrian S.-W. Tam, Kang Xi, H. Jonathan Chao, “Leveraging Performance of Multiroot Data
    Center Networks by Reactive Reroute”, 2010 18th IEEE Symposium on High Performance
    Interconnects
•   Daniel Crisan, Mitch Gusat, Cyriel Minkenberg, “Comparative Evaluation of CEE-based
    Switch Adaptive Routing”, 2nd Workshop on Data Center - Converged and Virtual Ethernet
    Switching (DC CAVES), 2010
•   Teemu Koponen et al., “Onix: A distributed control platform for large-scale production
    networks”, OSDI, Oct, 2010
•   Andrew R. Curtis (University of Waterloo); Jeffrey C. Mogul, Jean Tourrilhes, Praveen
    Yalagandula, Puneet Sharma, Sujata Banerjee (HP Labs), SIGCOMM 2011




                                                                                             58
Backup Slides




                59
Symmetric Multi-Processing (SMP): Several
 CPUs on shared RAM-like memory




↑Data distributed evenly among nodes


                                             60
Computer Room Air Conditioning
CIFDAQ
 
PPTX
COMPARISON OF RASTER ANALYSIS TOOLS OF QGIS AND ARCGIS
Sharanya Sarkar
 
PPTX
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 
PDF
CIFDAQ Market Insights for July 7th 2025
CIFDAQ
 
PDF
Log-Based Anomaly Detection: Enhancing System Reliability with Machine Learning
Mohammed BEKKOUCHE
 
PDF
From Code to Challenge: Crafting Skill-Based Games That Engage and Reward
aiyshauae
 
PDF
Fl Studio 24.2.2 Build 4597 Crack for Windows Free Download 2025
faizk77g
 
PDF
July Patch Tuesday
Ivanti
 
PDF
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
PDF
Chris Elwell Woburn, MA - Passionate About IT Innovation
Chris Elwell Woburn, MA
 
PDF
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
PDF
Reverse Engineering of Security Products: Developing an Advanced Microsoft De...
nwbxhhcyjv
 
PDF
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
PDF
CIFDAQ Token Spotlight for 9th July 2025
CIFDAQ
 
PDF
NewMind AI - Journal 100 Insights After The 100th Issue
NewMind AI
 
PDF
Building Real-Time Digital Twins with IBM Maximo & ArcGIS Indoors
Safe Software
 
PPTX
WooCommerce Workshop: Bring Your Laptop
Laura Hartwig
 
"Autonomy of LLM Agents: Current State and Future Prospects", Oles` Petriv
Fwdays
 
OpenID AuthZEN - Analyst Briefing July 2025
David Brossard
 
LLMs.txt: Easily Control How AI Crawls Your Site
Keploy
 
CIFDAQ Weekly Market Wrap for 11th July 2025
CIFDAQ
 
COMPARISON OF RASTER ANALYSIS TOOLS OF QGIS AND ARCGIS
Sharanya Sarkar
 
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 
CIFDAQ Market Insights for July 7th 2025
CIFDAQ
 
Log-Based Anomaly Detection: Enhancing System Reliability with Machine Learning
Mohammed BEKKOUCHE
 
From Code to Challenge: Crafting Skill-Based Games That Engage and Reward
aiyshauae
 
Fl Studio 24.2.2 Build 4597 Crack for Windows Free Download 2025
faizk77g
 
July Patch Tuesday
Ivanti
 
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
Chris Elwell Woburn, MA - Passionate About IT Innovation
Chris Elwell Woburn, MA
 
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
Reverse Engineering of Security Products: Developing an Advanced Microsoft De...
nwbxhhcyjv
 
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
CIFDAQ Token Spotlight for 9th July 2025
CIFDAQ
 
NewMind AI - Journal 100 Insights After The 100th Issue
NewMind AI
 
Building Real-Time Digital Twins with IBM Maximo & ArcGIS Indoors
Safe Software
 
WooCommerce Workshop: Bring Your Laptop
Laura Hartwig
 

Introduction to Cloud Data Center and Network Issues

  • 1. Introduction to Cloud Data Center and Network Issues Presenter: Jason, Tsung-Cheng, HOU Advisor: Wanjiun Liao July 2nd, 2012 1
  • 2. Agenda • Cloud Computing / Data Center Basic Background • Enabling Technology • Infrastructure as a Service A Cloud DC System Example • Networking Issues in Cloud DC 2
  • 3. Brand New Technology ?? • Not exactly, for large scale computing in the past: utility mainframe, grid computing, super computer • Past demand: scientific computing, large scale engineering (finance, construction, aerospace) • New demand: search, e-commerce, content streaming, application/web hosting, IT outsourcing, mobile/remote apps, big data processing… • Difference: aggregated individual small demand, highly volatile and dynamic, not all profitable – Seek economies of scale to cut costs – Rely on resilient, flexible and scalable infrastructure 3
  • 4. Cloud Data Center Traditional Data Center Cloud Data Center Co-located Integrated Servers Dependent Failure Fault-Tolerant Partitioned Unified Resources Performance Interrelated Performance Isolated Separated Centralized Full Control Management Manual With Automation Plan Ahead Flexible Scheduling Overprovisioning Scalable Renting Per Physical Machines Per Logical Usage Application / Runs and Moves across Fixes on Designated Servers Services All VMs Cloud Computing: Cloud DC Requirements: End Device / Client • On-Demand Self-Service • Measured Usage • Resource Pooling • Broad Network Access Common App. Platform • Rapid Elasticity Network Dependent Cloud Data Center 4
  • 5. Server and Switch Organization What’s on Amazon? Dropbox, Instagram, Netflix, Pinterest Foursquare, Quora, Twitter, Yelp Nasdaq, New York Times…. …and a lot more
  • 7. Clusters of Commodities • Current cloud DC achieves high performance using commodity servers and switches →no specialized solution for supercomputing • Supercomputing still exists, an example is the Symmetric Multi-Processing server →128 cores on shared RAM-like memory • Compared to 32 LAN-connected 4-core servers – Accessing global data, SMP: 100ns, LAN: 100us • Computing penalty from delayed LAN-access • Performance gain when clusters grow large 7
  • 8. Penalty for Latency in LAN-access This is not a comparison of server-cluster and single high-end server. f: # global data I/O in 10ms High: f = 100 Medium: f = 10 Low: f = 1 ?
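The penalty above can be sanity-checked with the slide's own numbers (SMP global access ~100 ns, LAN ~100 µs, f global accesses per 10 ms of work). A hedged back-of-envelope sketch, not a model from the talk:

```python
def slowdown(f, work_ms=10.0, lan_us=100.0, smp_ns=100.0):
    """Ratio of LAN-cluster time to SMP time for 10 ms of work
    that performs f global-data accesses (slide's numbers)."""
    lan_ms = work_ms + f * lan_us / 1000.0   # each LAN access adds 100 us
    smp_ms = work_ms + f * smp_ns / 1e6      # each SMP access adds 100 ns
    return lan_ms / smp_ms

# High case f = 100: ~10 ms of waiting per 10 ms of work -> about 2x slower
print(round(slowdown(100), 2))  # → 2.0
```

This is why the slide flags high-f workloads: LAN latency, not compute, dominates.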
  • 9. Performance gain when clusters grow large 9
  • 10. Agenda • Cloud Computing / Data Center Basic Background • Enabling Technology • Infrastructure as a Service A Cloud DC System Example • Networking Issues in Cloud DC 10
  • 11. A DC-wide System • Has software systems consisting of: – Distributed system, logical clocks, coordination and locks, remote procedural call…etc – Distributed file system – (We do not go deeper into above components) – Parallel computation: MapReduce, Hadoop • Virtualized Infrastructure: – Computing: Virtual Machine / Hypervisor – Storage: Virtualized / distributed storage – Network: Network virtualization…the next step? 11
  • 12. MapReduce • 100 TB datasets – Scanning on 1 node – 23 days – On 1000 nodes – 33 minutes • Single machine performance does not matter – Just add more… but HOW to use so many clusters ? – How to make distributed programming simple and elegant ? • Sounds great, but what about MTBF? – MTBF = Mean Time Between Failures – 1 node – once per 3 years – 1000 nodes – 1 node per 1 day • MapReduce refer to both: – Programming framework – Fault-tolerant runtime system 12
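The MTBF bullets scale as the slide suggests; assuming independent node failures (my assumption, not stated on the slide), the expected time between failures shrinks linearly with cluster size:

```python
DAYS_PER_YEAR = 365

def cluster_mtbf_days(node_mtbf_years, n_nodes):
    # with n independent nodes, failures arrive ~n times as often
    return node_mtbf_years * DAYS_PER_YEAR / n_nodes

print(cluster_mtbf_days(3, 1))     # → 1095.0  (one node: a failure every ~3 years)
print(cluster_mtbf_days(3, 1000))  # → 1.095   (1000 nodes: ~one failure per day)
```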
  • 13. MapReduce: Word Counting Shuffle and Sort ↓↓↓ 13
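The word-counting slide can be sketched in a few lines, a minimal single-process imitation of the map → shuffle/sort → reduce pipeline (not the fault-tolerant runtime, which is MapReduce's real contribution):

```python
from collections import defaultdict

def map_phase(text):
    # map: emit (word, 1) for every word
    return [(word, 1) for word in text.split()]

def shuffle(pairs):
    # shuffle & sort: group values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: sum the counts per word
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase("the quick the lazy the")))
print(counts["the"])  # → 3
```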
  • 14. MapReduce: A Diagram ←Shuffle ← Sort 14
  • 15. Distributed Execution Overview Master also deals with: • Worker status updates User • Fault-tolerance Program • I/O Scheduling fork fork • Automatic distribution fork • Automatic parallelization Master assign assign map reduce Input Data Worker write Output local Worker File 0 Split 0 read write Split 1 Worker Split 2 Output Worker File 1 Worker remote read,sort ↑↑↑↑↑ Shuffle & Sort
  • 16. VM and Hypervisor • Virtual Machine: A software package, sometimes using hardware acceleration, that allows an isolated guest operating system to run within a host operating system. • Stateless: Once shut down, all HW states disappear. • Hypervisor: A software platform that is responsible for creating, running, and destroying multiple virtual machines. • Type I and Type II hypervisor 16
  • 17. Type 1 vs. Type 2 Hypervisor 17
  • 18. Concept of Virtualization • Decoupling HW/SW by abstraction & layering • Using, demanding, but not owning or configuring • Resource pool: flexible to slice, resize, combine, and distribute • A degree of automation by software HOST 1 HOST 2 HOST 3 HOST 4, VMs Hypervisor: Turns 1 server into many “virtual machines” (instances or VMs) (VMWare ESX, Citrix XEN Server, KVM, Etc.) 18
  • 19. Concept of Virtualization • Hypervisor: abstraction for HW/SW • For SW: Abstraction and automation of physical resources – Pause, erase, create, and monitor – Charge services per usage units • For HW: Generalized interaction with SW – Access control – Multiplex and demultiplex • Ultimate hypervisor control from operator • Benefit? Monetize operator capital expense 19
  • 20. I/O Virtualization Model • Protect I/O access, multiplex / demultiplex traffic • Deliver PKTs among VMs in shared memory • Performance bottleneck: Overhead when communicating between driver domain and VMs • VM scheduling and long queue→delay/throughput variance Bottleneck: CPU/RAM I/O lag VM Scheduling I/O Buffer Queue 20
  • 21. Agenda • Cloud Computing / Data Center Basic Background • Enabling Technology • Infrastructure as a Service A Cloud DC System Example • Networking Issues in Cloud DC 21
  • 22. OpenStack Status • OpenStack – Founded by NASA and Rackspace in 2010 – Today 183 companies and 3386 people – Was only 125 and 1500 in fall 2011 – Growing fast now, latest release Essex, Apr. 5th • Aligned release cycle with Ubuntu, Apr. / Oct. • Aims to be the “Linux” of cloud computing systems • Open-source vs. Amazon and VMware • Start-ups are happening around OpenStack • Still lacks big use cases and implementations 22
  • 23. A Cloud Management Layer Is Missing Questions arise as the environment grows... “VM sprawl” can make things unmanageable very quickly APPS USERS ADMINS How do you empower employees to self-service? How do you make your apps cloud aware? Where should you provision new VMs? How do you keep track of it all? 1. Server Virtualization 2. Cloud Data Center 3. Cloud Federation
  • 24. Solution: OpenStack, The Cloud Operating System Cloud Operating System A new management layer that adds automation and control APPS USERS ADMINS CLOUD OPERATING SYSTEM 1. Server Virtualization Server Virtualization 2. Cloud Data Center 3. Cloud Federation
  • 25. A common platform is here. Common Platform OpenStack is open source software powering public and private clouds. Private Cloud: Public Cloud: OpenStack enables cloud federation Connecting clouds to create global resource pools Washington Common software platform making Federation possible Texas California Europe 1. Server Virtualization Virtualization 2. Cloud Data Center 3. Cloud Federation
  • 26. Horizon OpenStack Key Components Glance Swift Nova Keystone
  • 27. Keystone Main Functions • Provides 4 primary services: – Identity: User information authentication – Token: After logged in, replace account-password – Service catalog: Service units registered – Policies: Enforces different user levels • Can be backed by different databases. 27
  • 29. Swift Implementation Duplicated storage, load balancing ↑ Logical view ↓Physical arrangement ← Stores real objects ←Stores object metadata ↑Stores container / object metadata
  • 30. Glance • Image storage and indexing. • Keeps a database of metadata associated with an image, discover, register, and retrieve. • Built on top of Swift, images store in Swift • Two servers: – Glance-api: public interface for uploading and managing images. – Glance-registry: private interface to metadata database • Support multiple image formats 30
  • 31. Glance Process Upload or Store Download or Get 31
  • 32. Nova • Major components: – API: public facing interface – Message Queue: Broker to handle interactions between services, currently based on RabbitMQ – Scheduler: coordinates all services, determines placement of new resources requested – Compute Worker: hosts VMs, controls hypervisor and VMs when receives cmds on Msg Queue – Volume: manages permanent storage 32
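The broker-mediated interaction between the scheduler and a compute worker can be illustrated with a plain in-process queue standing in for RabbitMQ (a hedged sketch of the pattern only; real Nova speaks AMQP through its own RPC layer, and `run_instance` here is just an illustrative method name):

```python
from queue import Queue

bus = Queue()  # stand-in for the RabbitMQ broker

def scheduler_cast(method, **kwargs):
    # scheduler publishes a request onto the message queue
    bus.put({"method": method, "args": kwargs})

def compute_worker_poll():
    # compute worker consumes one message and acts on it
    msg = bus.get()
    if msg["method"] == "run_instance":
        return f"booted {msg['args']['instance_id']}"
    return "ignored"

scheduler_cast("run_instance", instance_id="vm-1")
print(compute_worker_poll())  # → booted vm-1
```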
  • 37. Agenda • Cloud Computing / Data Center Basic Background • Enabling Technology • Infrastructure as a Service A Cloud DC System Example • Networking Issues in Cloud DC 37
  • 38. Primitive OpenStack Network • Each VM network owned by one network host – Simply a Linux running Nova-network daemon • Nova Network node is the only gateway • Flat Network Manager: – Linux networking bridge forms a subnet – All instances attached same bridge – Manually configure server, controller, and IP • Flat DHCP Network Manager: – Add DHCP server along same bridge • Only gateway, per-cluster, fragmentation 38
  • 39. OpenStack Network Linux server running Nova-network daemon. VMs bridged into a raw Ethernet device 39 The only gateway of all NICs bridged into the net.
  • 40. Conventional DCN Topology Public Internet DC Layer-3 DC Layer-2 • Oversubscription • Scale-up proprietary design – expensive • Fragmentation of resources: • Inflexible addressing, static routing Network limits cross-DC communication • Inflexible network configuration • Hinders applications’ scalability Protocol baked / embedded on chips • Only reachability isolation Dependent performance bottleneck 40
  • 41. A New DCN Topology Core Switches Full Bisection Aggr Full Bisection Edge Pod-0 Pod-1 k=4 • k pods, each with k²/4 hosts and k switches • Cabling explosion, copper trans. range • (k/2)² core switches, (k/2)² paths per S-D pair • Existing addressing/routing/forwarding do • 5k²/4 k-port switches, k³/4 hosts in total not work well on fat-tree / clos • 48-port: 27,648 hosts, 2,880 switches • Scalability issue with millions of end hosts • Full bisection BW at each level • Configuration of millions of parts • Modular scale-out cheap design 41
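The slide's counts follow directly from the fat-tree formulas; a small sketch that reproduces the 48-port numbers:

```python
def fat_tree(k):
    """Host/switch counts for a k-ary fat-tree (k even):
    k pods, each with k^2/4 hosts and k switches, plus (k/2)^2 cores."""
    hosts = k ** 3 // 4
    core = (k // 2) ** 2
    switches = k * k + core   # k switches per pod * k pods + cores = 5k^2/4
    return hosts, switches

print(fat_tree(48))  # → (27648, 2880), matching the slide
```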
  • 43. IP PMAC (Location) 10.5.1.2 10.2.4.5 (00:00):01:02:(00:01) (00:02):00:02:(00:01) Addressing Controller 4 Proxy ARP 2 Switch PKT Rewrite IP AMAC (Identity) PMAC (Location) 5 10.2.4.5 00:19:B9:FA:88:E2 00:02:00:02:00:01 dst IP dst MAC MAC 1 ARP 10.2.4.5 00:02:00:02:00:01 ??? 3 • Switches: 32~64 K flow entries, 640 KB • AMAC: Identity, maintained at switches • Assume 10,000k VMs on 500k servers • PMAC: (pod,position,port,vmid) • Identity-based: 10,000k flat entries, IP→ PMAC, mapped at controller 100 MB huge, flexible, per-VM/APP • Routing: Static VLAN or ECMP-hashing VM migration, continuous connection (To be presented later) • Location-based: 1k hierarchical entries • Consistency / efficiency / fault-tolerant? 10 KB easy storage, fixed, per-server Solve by (controller, SW, host) diff. roles Easy forwarding, no extra reconfiguration • Implemented: server- / switch- centric 43
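The slide's PMAC example can be reproduced by packing (pod, position, port, vmid) into a 48-bit MAC. The 16/8/8/16-bit field split below is an assumption chosen to match the slide's rendering (00:00):01:02:(00:01), not necessarily PortLand's exact layout:

```python
def pmac(pod, position, port, vmid):
    # pack pod(16b) : position(8b) : port(8b) : vmid(16b) into 48 bits
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    octets = [(value >> (8 * i)) & 0xFF for i in reversed(range(6))]
    return ":".join(f"{o:02x}" for o in octets)

# the VM at pod 2, position 0, port 2, vmid 1 from the slide's table
print(pmac(pod=2, position=0, port=2, vmid=1))  # → 00:02:00:02:00:01
```

Because location is encoded hierarchically, a switch needs only ~1k prefix-like entries instead of 10,000k flat AMAC entries.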
  • 44. Load Balancing / Multipathing Per-flow hashing Pre-configured Randomization VLAN Tunnels End hosts “transparent”: Sends traffic to networks as usual, without seeing detail OpenFlow: Controller talks to (HW/SW switches, kernel agents), manipulates entries • Clusters grow larger, nodes demand faster • Need to utilize multiple paths and • Network delay / PKT loss → Performance ↓ capacity! • Still, only commodity hardware • VLAN: multiple preconfigured tunnels • Aggregated individual small demand →Topological dependent → Traffic extremely volatile / unpredictable • Multipath-TCP: modified transport mech. • Traffic matrix: dynamic, evolving, not steady →Distributes and shifts load among paths • User: Don’t know infrastructure, topology • ECMP/VLB: Randomization, header hash • Operator: Don’t know application, traffic →Only randomized upward paths 44 →Only for symmetric traffic
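Per-flow hashing, the mechanism behind the ECMP/VLB bullets above, can be sketched as hashing the 5-tuple so every packet of a flow takes the same path while different flows spread across paths (a simplified model; real switches hash in ASIC, not with SHA-256):

```python
import hashlib

def ecmp_path(src_ip, dst_ip, src_port, dst_port, proto, n_paths):
    # hash the flow's 5-tuple, then pick one of n equal-cost paths
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths

a = ecmp_path("10.0.0.1", "10.0.1.2", 5000, 80, "tcp", 4)
b = ecmp_path("10.0.0.1", "10.0.1.2", 5000, 80, "tcp", 4)
print(a == b)  # → True: a flow is pinned to one path (hence elephant collisions)
```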
  • 45. Flow Scheduling • ECMP-hashing → per-flow static path • Flow-to-core mappings, re-allocate flows • Long-lived elephant flows may collide • What time granularity? Fast enough? • Some links full, others under-utilized • Controller computation? Scalable? 45
  • 46. Reactive Reroute Qeq Qoff Q • Congestion Point: Switch • QCN in IEEE 802.1Q task group -Samples incoming PKTs -For converged networks, assure zero drop FB -Monitor and maintain queue level -Like TCP AIMD but on L2, w/ diff purpose -Send feedback msg to src -CP directly reacts, not end-to-end -Feedback msg according to Q-len -Can be utilized for reactive reroute -Choose to re-hash elephant flows • May differentiate FB msg • Reaction Point: Source Rate Limiter -Decrease more for lower classes (QoS) -Decrease rate according to feedback -Decrease more for larger flows (Fairness) -Increase rate by counter / timer • Large flows are suppressed → High delay 46
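The reaction-point behavior on the slide (multiplicative decrease on feedback, timer-driven increase) resembles AIMD at L2; a hedged sketch with made-up constants (the 0.5 gain and halfway recovery are illustrative, not the 802.1Qau parameters):

```python
class RateLimiter:
    """QCN-style reaction point at the traffic source."""
    def __init__(self, rate_gbps):
        self.rate = rate_gbps
        self.target = rate_gbps   # rate remembered before the last cut

    def on_feedback(self, fb):
        # congestion feedback fb in (0, 1]: stronger msg -> deeper cut
        self.target = self.rate
        self.rate *= 1.0 - 0.5 * fb

    def on_timer(self):
        # recover halfway toward the pre-cut rate
        self.rate = (self.rate + self.target) / 2.0

r = RateLimiter(10.0)
r.on_feedback(1.0)  # heavy congestion: 10 -> 5 Gb/s
print(r.rate)       # → 5.0
r.on_timer()        # recovery: 5 -> 7.5 Gb/s
print(r.rate)       # → 7.5
```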
  • 47. Controller • DCN relies on controller for many functions: – Address mapping / mgmt / registration / reuse – Traffic load scheduling / balancing – Route computation, switch entries configuration – Logical network view ↔ physical construction • An example: Onix – Distributed system – Maintain, exchange & distribute net states • Hard static: SQL DB • Soft dynamic: DHT – Asynchronous but eventually consistent 47
  • 48. Tenant View vs Provider View
  • 49. Onix Functions Control Plane / Applications API Provides Abstraction Logical Forwarding Plane Control Logical States Provides Commands Abstractions Network Distributed Mapping Info Base System Network Hypervisor Onix / Network OS Distributes, Configures Real States OpenFlow 49
  • 50. OpenStack Quantum Service XenServer: Domain 0 Kernel-based VM: Linux Kernel
  • 51. Always Call for Controller? ASIC switching rate Latency: 5 s 51
  • 52. Always Call for Controller? CPU Controller Latency: 2 ms A huge waste of resources! 52
  • 53. Conclusion • Concept of cloud computing is not brand new – But with new usage, demand, and economy – Aggregated individual small demands – Thus pressures traditional data centers – Clusters of commodities for performance and economy of scale • Data Center Network challenges – Carry tons of apps, tenants, and compute tasks – Network delay / loss = service bottleneck – Still no consistent sys / traffic / analysis model – Large scale construct, no public traces, practical? 53
  • 55. Reference • YA-YUNN SU, “Topics in Cloud Computing”, NTU CSIE 7324 • Luiz André Barroso and Urs Hölzle, “The Datacenter as a Computer - An Introduction to the Design of Warehouse-Scale Machines”, Google Inc. • 吳柏均,郭嘉偉, “MapReduce: Simplified Data Processing on Large Clusters”, CSIE 7324 in-class presentation slides. • Stanford, “Data Mining”, CS345A, http://www.stanford.edu/class/cs345a/slides/02-mapreduce.pdf • Dr. Allen D. Malony, CIS 607: Seminar in Cloud Computing, Spring 2012, U. Oregon http://prodigal.nic.uoregon.edu/~hoge/cis607/ • Manel Bourguiba, Kamel Haddadou, Guy Pujolle, “Packet aggregation based network I/O virtualization for cloud computing”, Computer Communications 35, 2012 • Eric Keller, Jennifer Rexford, “The ‘Platform as a Service’ Model for Networking”, in Proc. INM/WREN, 2010 • Martin Casado, Teemu Koponen, Rajiv Ramanathan, Scott Shenker, “Virtualizing the Network Forwarding Plane”, in Proc. PRESTO (November 2010) • Guohui Wang, T. S. Eugene Ng, “The Impact of Virtualization on Network Performance of Amazon EC2 Data Center”, INFOCOM 2010 • OpenStack Documentation http://docs.openstack.org/ 55
  • 56. Reference • Bret Piatt, OpenStack Overview, OpenStack Tutorial http://salsahpc.indiana.edu/CloudCom2010/slides/PDF/tutorials/OpenStackTutorialIEEECloudCom.pdf http://www.omg.org/news/meetings/tc/ca-10/special-events/pdf/5-3_Piatt.pdf • Vishvananda Ishaya, Networking in Nova http://unchainyourbrain.com/openstack/13-networking-in-nova • Jaesuk Ahn, OpenStack, XenSummit Asia http://www.slideshare.net/ckpeter/openstack-at-xen-summit-asia http://www.slideshare.net/xen_com_mgr/2-xs-asia11kahnopenstack • Salvatore Orlando, Quantum: Virtual Networks for OpenStack http://qconlondon.com/dl/qcon-london-2012/slides/SalvatoreOrlando_QuantumVirtualNetworksForOpenStackClouds.pdf • Dan Wendlandt, OpenStack Quantum: Virtual Networks for OpenStack http://www.ovirt.org/wp-content/uploads/2011/11/Quantum_Ovirt_discussion.pdf • David A. Maltz, “Data Center Challenges: Building Networks for Agility”, Senior Researcher, Microsoft, Invited Talk, 3rd Workshop on I/O Virtualization, 2011 http://static.usenix.org/event/wiov11/tech/slides/maltz.pdf • Amin Vahdat, “PortLand: Scaling Data Center Networks to 100,000 Ports and Beyond”, Stanford EE Computer Systems Colloquium, 2009 http://www.stanford.edu/class/ee380/Abstracts/091118-DataCenterSwitch.pdf 56
  • 57. Reference • Mohammad Al-Fares , Alexander Loukissas , Amin Vahdat, “A scalable, commodity data center network architecture”, ACM SIGCOMM 2008 • Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, Sudipta Sengupta, “VL2: a scalable and flexible data center network”, ACM SIGCOMM 2009 • Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya, Amin Vahdat, “PortLand: a scalable fault-tolerant layer 2 data center network fabric”, ACM SIGCOMM 2009 • Jayaram Mudigonda, Praveen Yalagandula, Mohammad Al-Fares, Jeffrey C. Mogul, “SPAIN: COTS data-center Ethernet for multipathing over arbitrary topologies”, USENIX NSDI 2010 • Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson, Jennifer Rexford, Scott Shenker, Jonathan Turner, “OpenFlow: enabling innovation in campus networks”, ACM SIGCOMM 2008 • Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan, Nelson Huang, Amin Vahdat, “Hedera: dynamic flow scheduling for data center networks”, USENIX NSDI 2010 • M. Alizadeh, B. Atikoglu, A. Kabbani, A. Lakshmikantha, R. Pan, B. Prabhakar, and M. Seaman, “Data center transport mechanisms: Congestion control theory and IEEE standardization,” Communication, Control, and Computing, 2008 46th Annual Allerton Conference on 57
  • 58. Reference • A. Kabbani, M. Alizadeh, M. Yasuda, R. Pan, and B. Prabhakar. “AF-QCN: Approximate fairness with quantized congestion notification for multitenanted data centers”, In High Performance Interconnects (HOTI), 2010, IEEE 18th Annual Symposium on • Adrian S.-W. Tam, Kang Xi H., Jonathan Chao , “Leveraging Performance of Multiroot Data Center Networks by Reactive Reroute”, 2010 18th IEEE Symposium on High Performance Interconnects • Daniel Crisan, Mitch Gusat, Cyriel Minkenberg, “Comparative Evaluation of CEE-based Switch Adaptive Routing”, 2nd Workshop on Data Center - Converged and Virtual Ethernet Switching (DC CAVES), 2010 • Teemu Koponen et al., “Onix: A distributed control platform for large-scale production networks”, OSDI, Oct, 2010 • Andrew R. Curtis (University of Waterloo); Jeffrey C. Mogul, Jean Tourrilhes, Praveen Yalagandula, Puneet Sharma, Sujata Banerjee (HP Labs), SIGCOMM 2011 58
  • 60. Symmetric Multi-Processing (SMP): Several CPUs on shared RAM-like memory ↑Data distributed evenly among nodes 60
  • 61. Computer Room Air Conditioning

Editor's Notes

  • #16: This is the distributed execution overview. 1. The user runs the program; the input data is split into many pieces, each 64 MB. 2. The program is copied to many machines; one of them is the master, which assigns some workers to be mappers and others to be reducers. 3. Each mapper reads the content of its corresponding input split, passes each key-value pair to the map function, and buffers the intermediate results in memory. 4. The mapper writes intermediate data to local disk periodically. 5. After all mappers finish, each reducer reads its corresponding intermediate data and sorts the key-value pairs by key; this ensures that data with the same key is grouped together. 6. The reducer runs the reduce function and outputs the result. 7. When all map and reduce tasks finish, the MapReduce job is finished.