Cloud Computing Networking: Challenges and Opportunities for Innovations
Abstract
Cloud computing materializes the vision of utility computing. Tenants can benefit from
on-demand provisioning of compute, storage and networking resources according to a pay-per-
use business model. However, tenants have only limited visibility into and control over network
resources. The owners of cloud computing facilities also face challenges in various aspects of
providing and efficiently managing infrastructure as a service (IaaS) facilities. In this work we present the
networking issues in IaaS and federation challenges that are currently addressed with existing
technologies. We also present innovative software-defined networking (SDN) proposals that
address some of these challenges and could serve as efficient solutions in future deployments.
Introduction
Over the past few years, cloud computing has rapidly emerged as a widely accepted
computing paradigm built around core concepts such as on-demand computing resources,
elastic scaling, elimination of up-front investment, reduction of operational expenses, and
establishing a pay-per-usage business model for information technology and computing
services. There are different models of cloud computing that are offered today as services like
Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service
(IaaS) [1]. IaaS, which is the focus of this work, refers to the capability that is provided to the
consumers to provision processing, storage and networks, and other fundamental computing
resources where they are able to deploy and run arbitrary software. The consumer does not
manage or control the underlying cloud infrastructure but has control over operating systems,
storage, deployed applications, and possibly selected networking components (e.g., firewalls,
load balancers, etc.). Amazon is arguably the first major proponent of IaaS through its Elastic
Compute Cloud (EC2) service.
Cloud-computing technology is still evolving. Various companies, standards bodies, and
alliances are addressing several remaining gaps and concerns. Some of these concerns are:
What are the challenges behind the virtual networking in IaaS deployment? What are the
potential solutions using the existing technologies for the implementation of virtual networks
inside IaaS vision? Is there any room to utilize innovative paradigms like Software Defined
Networking (SDN) [2] to address virtual networking challenges? When cloud federation (or even
cloud bursting) is involved, should the servers in the cloud be on the same Layer 2 network as
the servers in the enterprise, or should a Layer 3 topology be involved because the cloud
servers are on a network outside the enterprise? In addition, how would this approach work
across multiple cloud data centers?
Consider a case where an enterprise uses two separate cloud service providers.
Compute and storage resource sharing along with common authentication (or migration of
authentication information) are some of the problems with having the clouds “interoperate.”
For virtualized cloud services, VM migration is another factor to be considered in federation.
In this work we present a tutorial on networking in IaaS, together with the key challenges and
issues that should be addressed using existing technologies or novel and innovative
mechanisms. Virtual networking, the extension of cloud computing facilities, and federation
issues are the focus of this work. SDN, as a novel and innovative mechanism, provides suitable
solutions for these issues and is included in a comparison of virtual networking techniques. A
high-level SDN-based cloud federation framework is presented as an innovation opportunity,
and the last part of this paper concludes our contribution.
Networking in IaaS
Although cloud computing does not necessarily depend on virtualization, several cloud
infrastructures are built with virtualized servers. Within a virtualized environment, some of the
networking functionalities (e.g., switching, firewall, application-delivery controllers, and load
balancers) can reside inside a physical server. Consider the case of the software-based Virtual
Switch shown in Figure 1. The Virtual Switch inside a physical server can be used to switch the
traffic between the VMs and to aggregate the traffic for connection to the external physical
switch. The Virtual Switch is often implemented as a plug-in to the hypervisor. The VMs
have virtual Ethernet adapters that connect to the Virtual Switch, which in turn connects to the
physical Ethernet adapter on the server and to the external Ethernet switch. Unlike physical
switches, the Virtual Switch does not necessarily have to run network protocols for its
operation, nor does it need to treat all its ports the same because it knows that some of them
are connected to virtual Ethernet ports. It can function through appropriate configuration from
an external management entity.
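As a concrete illustration of such configuration from an external management entity, the following minimal sketch (assuming a Linux host running Open vSwitch [8]; the bridge and port names br0, eth0, and vnet0 are hypothetical) creates a Virtual Switch, attaches the physical uplink, and places a VM's virtual adapter on a VLAN:

    import subprocess

    def ovs(*args: str) -> None:
        # Drive the Open vSwitch configuration CLI; each call updates the
        # switch's database rather than running a network protocol.
        subprocess.run(["ovs-vsctl", *args], check=True)

    ovs("add-br", "br0")               # create the software Virtual Switch
    ovs("add-port", "br0", "eth0")     # uplink to the external physical switch
    ovs("add-port", "br0", "vnet0",    # a VM's virtual Ethernet adapter,
        "tag=100")                     # attached as an access port on VLAN 100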
Challenges in IaaS
Among the various challenges that should be addressed in an IaaS deployment, in this work we
focus on virtual networking, cloud extension, and cloud federation; in what follows we present
innovative opportunities that could be utilized to address these issues.
Existing networking protocols and architectures such as Spanning Tree protocol and
Multi-Chassis Link Aggregation (MC-LAG) can limit the scale, latency, throughput and VM
migration of enterprise cloud networks. Therefore open standards and proprietary protocols
are proposed to address cloud computing networking issues. While existing layer 3 “fat tree”
networks provide a proven approach to address the requirements for a highly virtualized cloud
data center, there are several industry standards that enhance features of a flattened layer 2
network, using Transparent Interconnection of Lots of Links (TRILL), Shortest Path Bridging
(SPB) or have the potential to enhance future systems based on SDN concepts and OpenFlow.
The key motivation behind the TRILL, SPB, and SDN-based approaches is the relatively flat
nature of the data-center topology and the requirement to forward packets across the shortest
path between the endpoints (servers) to reduce latency, rather than through the root bridge or
priority mechanism normally used in the Spanning Tree Protocol (STP). IEEE 802.1Qaz, known as
Enhanced Transmission Selection (ETS), in line with other efforts, allows low-priority traffic to
burst and use the unused bandwidth from the higher-priority traffic queues, thus providing
greater flexibility [4]. Vendor-proprietary protocols have also been developed by major
networking equipment manufacturers to address the same issues. For instance, Juniper
Networks produces switches using a proprietary multipath L2/L3 encapsulation protocol called
QFabric, which allows multiple distributed physical devices in the network to share a common
control plane and a separate common management plane. Virtual Cluster Switching (VCS) is a
multipath layer 2 encapsulation protocol by Brocade, based on TRILL and the Fabric Shortest
Path First (FSPF) path selection protocol, with a proprietary method to discover neighboring
switches. Cisco's FabricPath is a multipath layer 2 encapsulation based on TRILL, which does not
include TRILL's next-hop header and has a different MAC learning technique. These protocols all
address the same issues, with different features for scalability, latency, oversubscription, and
management. However, none of these solutions has reached the same level of maturity as STP
and MC-LAG [4].
Layer 2 (switching) and Layer 3 (routing) are two possible options for cloud
infrastructure networking. Layer 2 is the simpler option, where the Ethernet MAC address and
Virtual LAN (VLAN) information are used for forwarding. The drawback of switching (L2) is
scalability: L2 networking flattens the network topology, which is not ideal when there is a large
number of nodes. The routing (L3) option with subnets provides segmentation for the
appropriate functions, at the cost of lower forwarding performance and greater network
complexity.
Existing cloud networking architectures follow a "one size fits all" paradigm in meeting the
diverse requirements of a cloud. The network topology, forwarding protocols, and security
policies are all designed around the sum of all requirements, preventing optimal usage and
proper management of the network. Among the challenges in existing cloud networks are
guaranteeing the performance of applications when they are moved from on-premises into the
cloud facility, flexibly deploying appliances (e.g., deep packet inspection, intrusion detection
systems, or firewalls), and the complexity that topology dependence adds to policy
enforcement.
Providers of cloud computing services are currently operating their own data centers.
Connectivity between the data centers to provide the vision of “one cloud” is completely within
the control of the cloud service provider. There may be situations where an organization or
enterprise needs to be able to work with multiple cloud providers due to locality of access,
migration from one cloud service to another, merger of companies working with different cloud
providers, cloud providers who provide best-of-class services, and similar cases. Cloud
interoperability and the ability to share various types of information between clouds become
important in such scenarios. Although cloud service providers might see little immediate need
for interoperability, enterprise customers will push them in this direction.
This broad area of cloud interoperability is sometimes known as cloud federation. Cloud
federation manages consistency and access controls when two or more independent cloud
computing facilities share either authentication, computing resources, command and control,
or access to storage resources. Some of the considerations in cloud federation are as follows:
• An enterprise user wishing to access multiple cloud services would be better served if
there were just a single authentication and/or authorization mechanism (i.e., single
sign-on scheme). This may be implemented through an authentication server
maintained by an enterprise that provides the appropriate credentials to the cloud
service providers. Alternatively, a central trusted authentication server could be used to
which all cloud services are interfaced. Computing and storage resources may be
orchestrated through the individual enterprise or through an interoperability scheme
established between the cloud providers. Files may need to be exchanged, services
invoked, and computing resources added or removed in a proper and transparent
manner. A related area is VM migration and how it can be done transparently.
• Cloud federation has to provide transparent workload orchestration between the clouds
on behalf of the enterprise user. Connectivity between clouds includes Layer 2 and/or
Layer 3 considerations and tunneling technologies that need to be agreed upon.
Consistency and a common understanding are required independent of the
technologies. An often ignored concern for cloud federation is charging or billing and
reconciliation. Management and billing systems need to work together for cloud
federation to be a viable option. This reality is underlined by the fact that clouds rely on
per-usage billing. Cloud service providers might need to look closely at telecom service
provider business models for peering arrangements as a possible starting point. Cloud
federation is a relatively new area in cloud computing. It is likely that standard
organizations will first need to agree on a set of requirements before the service
interfaces can be defined and subsequently materialized.
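To make the single sign-on consideration concrete, the sketch below illustrates one possible token-based scheme: an enterprise authentication server issues a signed token that every federated cloud provider verifies against the same trusted key. It uses the PyJWT library; the shared key, claim names, and token lifetime are illustrative assumptions, not any provider's actual interface.

    import time
    import jwt  # PyJWT

    # Hypothetical secret shared between the enterprise authentication
    # server and the federated cloud providers.
    SHARED_TRUSTED_KEY = "replace-with-a-real-secret"

    def issue_token(user: str, tenant: str) -> str:
        # The enterprise authentication server signs a single credential.
        claims = {"sub": user, "tenant": tenant, "exp": int(time.time()) + 3600}
        return jwt.encode(claims, SHARED_TRUSTED_KEY, algorithm="HS256")

    def verify_token(token: str) -> dict:
        # Each cloud provider validates the same token locally, so the user
        # authenticates once for all federated clouds.
        return jwt.decode(token, SHARED_TRUSTED_KEY, algorithms=["HS256"])

    token = issue_token("alice", "enterprise-E")
    print(verify_token(token)["sub"])  # -> alice, accepted by either cloud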
Consider an IaaS cloud, to which an enterprise connects to temporarily augment its server
capacity. It would be ideal if the additional servers provided by the IaaS cloud were part of the
same addressing scheme of the enterprise (e.g., 10.x.x.x). As depicted in Figure 2, the IaaS cloud
service provider has partitioned a portion of its public cloud to materialize a private cloud for
enterprise “E”. The private cloud is reachable as a LAN extension to the servers in enterprise E’s
data center. A secure VPN tunnel establishes the site-to-site VPN connection. The VPN gateway
on the cloud service provider side (private cloud “C”) maintains multiple contexts for each
private cloud. Traffic for enterprise "E" is decrypted and forwarded through an Ethernet switch
to the private cloud. A server in enterprise "E"'s internal data center sees a server in private
cloud "C" as being on the same network. Some evolution scenarios can be considered for this
scheme [5]:
• Automation of the VPN connection between the enterprise and cloud service provider:
This automation can be done through a management system responsible for the cloud
bursting and server augmentation. The system sets up the VPN tunnels and configures
the servers on the cloud service provider end. The management system is set up and
operated by the cloud service provider (a sketch of such an automation workflow follows
this list).
• Integration of the VPN functions with the site-to-site VPN network functions from
service providers: For instance, service providers offer MPLS Layer 3 VPNs and Layer 2
VPNs (also known as Virtual Private LAN Service, or VPLS) as part of their offerings.
Enterprise and cloud service providers could be set up to use these network services.
• Cloud service providers using multiple data centers: In such a situation, a VPLS-like
service can be used to bridge the individual data centers, providing complete
transparency from the enterprise side about the location of the cloud servers.
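As a sketch of the first scenario, the automation workflow might look as follows. The management-system endpoints, parameters, and addressing below are entirely hypothetical; the point is only the sequence: set up the VPN tunnel, then configure the servers behind it.

    import requests  # HTTP client for the (hypothetical) management API

    MGMT = "https://mgmt.cloud-provider.example/api"  # hypothetical endpoint

    def burst_capacity(enterprise: str, num_servers: int) -> None:
        # Step 1: have the provider's management system establish the
        # site-to-site VPN tunnel into the enterprise's private cloud context.
        requests.post(f"{MGMT}/vpn-tunnels",
                      json={"enterprise": enterprise,
                            "type": "ipsec-site-to-site"},
                      timeout=30).raise_for_status()
        # Step 2: provision the additional servers behind that tunnel,
        # addressed from the enterprise's own scheme (e.g., 10.x.x.x).
        requests.post(f"{MGMT}/servers",
                      json={"enterprise": enterprise, "count": num_servers,
                            "subnet": "10.1.0.0/24"},
                      timeout=30).raise_for_status()

    burst_capacity("enterprise-E", num_servers=4)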
Cloud networking is not a trivial task. Modern data centers designed to provide cloud service
offerings face challenges similar to those of building the Internet itself, owing to their size. Even
in the simplest case (e.g., providing VMs as Amazon's EC2 does), such data centers may need to
support as many as one million networked devices in a single facility. These requirements call
for technologies that are high-performance, scalable, robust, reliable, flexible, and easy to
monitor, control, and manage.
SDN-based Cloud Computing Networking
SDN [7] is an emerging network architecture where “network control functionality” is
decoupled from “forwarding functionality” and is directly programmable [6], [7]. This migration
of control, formerly tightly integrated in individual networking equipment, into accessible
computing devices (logically centralized) enables the underlying infrastructure to be
“abstracted” for applications and network services. Therefore applications can treat the
network as a logical or virtual entity. As a result, enterprises and carriers gain unprecedented
programmability, automation, and network control, enabling them to build innovative, highly
scalable, flexible networks that readily adapt to changing business needs.
A logical view of the SDN architecture is depicted in Figure 3. OpenFlow is the first
standard interface designed specifically for SDN, providing high-performance, granular traffic
control across multiple vendors' network devices. Network intelligence is logically centralized in
SDN control software (e.g., OpenFlow controllers), which maintains a global view of the
network. As a result, the network, in its ultimate abstracted view, appears as a single logical
switch. Adopting the SDN architecture greatly simplifies the design and operation of networks,
since it removes the need to know and understand the operational details of hundreds of
protocols and standards. Enterprises and carriers gain vendor-independent control over the
entire network from a single logical point.
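As a minimal sketch of this model, the following application for the open-source Ryu controller (one of the OpenFlow-based projects of the kind compiled in Table 1) installs a table-miss rule on every switch that connects, so that unmatched packets are sent to the logically centralized control software; it is a starting point, not a complete network application.

    from ryu.base import app_manager
    from ryu.controller import ofp_event
    from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
    from ryu.ofproto import ofproto_v1_3

    class SingleLogicalSwitch(app_manager.RyuApp):
        OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

        @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
        def on_switch_connect(self, ev):
            # Every datapath gets the same default rule: send unmatched
            # packets to the controller, which holds the global view.
            dp = ev.msg.datapath
            ofp, parser = dp.ofproto, dp.ofproto_parser
            match = parser.OFPMatch()  # wildcard match (table-miss entry)
            actions = [parser.OFPActionOutput(ofp.OFPP_CONTROLLER,
                                              ofp.OFPCML_NO_BUFFER)]
            inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS,
                                                 actions)]
            dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=0,
                                          match=match, instructions=inst))

Run under ryu-manager, such a controller sees every packet that no flow entry covers and can then program forwarding behavior across all connected switches from that single logical point.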
In addition to network abstraction, the SDN architecture will provide and support a set of APIs
that simplifies the implementation of common network services (e.g., slicing, virtualization,
routing, multicast, security, access control, bandwidth management, traffic engineering, QoS,
processor and/or storage optimization, energy consumption, and various forms of policy
management).
A list of SDN- and OpenFlow-based open source projects and initiatives is compiled in Table 1.
OpenFlow-based SDN has created opportunities to help enterprises build more deterministic,
more manageable, and more scalable virtual networks that extend beyond enterprise
on-premises data centers or private clouds to public IT resources, while giving carriers the
higher network efficiency they seek to improve service profitability by provisioning more
services with fewer, better-optimized resources.
Table 1: A categorized list of OpenFlow-based Open Source projects
Innovation opportunities
Several techniques can provide virtual networks in cloud infrastructures, ranging from plain
VLANs and VM-aware networking to vCDNI, VXLAN, and Nicira NVP; we examine them in turn.
The first constraint of VLANs is the 4K limit on the number of VLAN IDs. Secondly, the MAC
addresses of all the VMs are visible in the physical switches of the network. This can fill up the
MAC tables of the physical switches, especially if the deployed switches are legacy ones. Typical
NICs can receive unicast frames for only a few MAC addresses; if the number of VMs exceeds
this limit, the NIC has to be put in promiscuous mode, which engages the CPU to handle flooded
packets and thus wastes hypervisor CPU cycles and bandwidth.
VM-aware networking, the second approach, scales somewhat better. The idea is that the VLAN
list on the link from the physical switch to the hypervisor is dynamically adjusted based on the
server's needs. This can be done with VM-aware ToR switches (Arista, Force10, Brocade), with a
VM-aware network management server (Juniper, Alcatel-Lucent, NEC) that configures the
physical switches dynamically, with VM-FEX from Cisco, or with EVB from IBM. This approach
reduces flooding toward the servers and the associated CPU utilization, and with proprietary
protocols (e.g., QFabric) it is possible to decrease flooding in the physical switches as well.
However, the MAC addresses are still visible in the physical network, the 4K VLAN limit remains,
and the transport in the physical network is L2-based, with the associated flooding problems.
This approach could be used for large virtualized data centers but not for IaaS clouds.
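The commercial VM-aware solutions above are proprietary, but the underlying trunk-pruning idea can be sketched with Open vSwitch's trunks setting: whenever the set of VMs on a host changes, a management process recomputes the VLANs the uplink actually needs and trims the trunk accordingly. The host's VLAN inventory below is hypothetical.

    import subprocess

    # Hypothetical inventory: VLANs needed by the VMs currently on this host.
    vlans_of_resident_vms = {100, 200, 310}

    def prune_uplink_trunk(port: str, vlans: set[int]) -> None:
        # Restrict the hypervisor uplink to the VLANs actually in use, so
        # the physical switch stops flooding the others toward this server.
        trunks = ",".join(str(v) for v in sorted(vlans))
        subprocess.run(["ovs-vsctl", "set", "port", port,
                        f"trunks={trunks}"], check=True)

    prune_uplink_trunk("eth0", vlans_of_resident_vms)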
The main idea behind vCDNI (VMware's vCloud Director Network Isolation) is a virtual
distributed switch that is isolated from the rest of the network and controlled by vCloud
Director; instead of VLANs, it uses a proprietary MAC-in-MAC encapsulation. Therefore the VM
MAC addresses are not visible in the physical network, and because the vCDNI header is longer,
the 4K VLAN limitation no longer applies. Although unicast flooding does not exist in this
solution, multicast flooding does, and the approach still uses L2 transport.
Conceptually, VXLAN is similar to the vCDNI approach; however, instead of running a
proprietary protocol on top of L2, it runs on top of UDP and IP. Inside the hypervisor, port
groups are tied to VXLAN framing, which generates UDP packets that travel down the
hypervisor's IP stack and reach the physical IP network. VXLAN segments are virtual layer 2
segments over an L3 transport infrastructure, with a 24-bit segment ID that alleviates the
traditional VLAN limitation. L2 flooding is emulated using IP multicast. The main issue with
VXLAN is that it has no control plane.
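The encapsulation itself is simple. The sketch below packs the 8-byte VXLAN header defined in RFC 7348 to show where the 24-bit segment ID lives; the inner-frame placeholder is ours, and a real implementation lives in the hypervisor's data path, not in Python.

    import struct

    VXLAN_UDP_PORT = 4789  # IANA-assigned destination port for VXLAN

    def vxlan_header(vni: int) -> bytes:
        # 8-byte header: flags (I bit = VNI present), 24 reserved bits,
        # the 24-bit VXLAN Network Identifier, and 8 more reserved bits.
        assert 0 <= vni < 2 ** 24  # ~16M segments versus 4K VLANs
        return struct.pack("!II", 0x08 << 24, vni << 8)

    # The VM's original Ethernet frame is appended to this header and then
    # carried inside a UDP/IP packet across the L3 transport fabric.
    packet_payload = vxlan_header(vni=5000) + b"<inner Ethernet frame>"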
Nicira NVP is very similar to VXLAN but uses a different encapsulation format: point-to-point
GRE tunnels. The MAC-to-IP mapping is downloaded to Open vSwitch [8] by a centralized
OpenFlow controller, which removes the need for the flooding that VXLAN requires (via IP
multicast). To be precise, this solution utilizes MAC-over-IP with a control plane. The virtual
switches used in this approach are OpenFlow-enabled, which means they can be controlled by
an external OpenFlow controller (e.g., NOX). These Open vSwitches use point-to-point GRE
tunnels that unfortunately cannot be provisioned by OpenFlow; the tunnels have to be
provisioned through other mechanisms, because OpenFlow has no tunnel-provisioning
message. The Open vSwitch Database Management Protocol (OVSDB) [9], a provisioning
protocol, is used to construct a full mesh of GRE tunnels between the hosts that have VMs of
the same tenant: whenever two hosts each have a VM belonging to the same tenant, a GRE
tunnel is established between them. Instead of dynamic MAC learning and multicast, the
MAC-to-IP mappings are downloaded as flow forwarding rules through OpenFlow to the Open
vSwitches. This approach scales much better than VXLAN because there is no state to maintain
in the physical network. Furthermore, an ARP proxy can be used to stop L2 flooding. The
approach requires an OpenFlow controller and an OVSDB controller working in parallel to
automatically provision the GRE tunnels.
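The tunnel-provisioning step can be sketched with the standard Open vSwitch CLI, which speaks OVSDB [9] to the local switch; the host inventory and bridge name below are hypothetical. On each host, one GRE port is created toward every other host carrying a VM of the same tenant:

    import subprocess

    # Hypothetical hypervisors hosting VMs of one tenant (name -> tunnel IP).
    tenant_hosts = {"host-a": "192.0.2.1", "host-b": "192.0.2.2",
                    "host-c": "192.0.2.3"}
    local_host = "host-a"

    def add_gre_port(bridge: str, name: str, remote_ip: str) -> None:
        # OVSDB, not OpenFlow, creates the tunnel port; the OpenFlow
        # controller only installs the MAC-to-IP forwarding rules afterwards.
        subprocess.run(["ovs-vsctl", "add-port", bridge, name, "--",
                        "set", "interface", name, "type=gre",
                        f"options:remote_ip={remote_ip}"], check=True)

    for i, (peer, ip) in enumerate(sorted(tenant_hosts.items())):
        if peer == local_host:
            continue  # no tunnel to ourselves
        add_gre_port("br-int", f"gre{i}", ip)  # one leg of the full mesh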
SDN-based Federation
There are general advantages to be realized by enterprises that adopt OpenFlow-enabled SDN
as the connectivity foundation for private and/or hybrid cloud connectivity. A logically
centralized SDN control plane will provide a comprehensive (abstract) view of data center and
cloud resources and of access network availability. This will ensure that cloud federations
(cloud extensions) are directed to adequately resourced data centers, over links providing
sufficient bandwidth and service levels. In SDN terminology, the key building blocks of an
SDN-based cloud federation are (a sketch of the orchestration logic follows this list):
• OpenFlow-enabled cloud backbone edge nodes, which connect to the enterprise and
cloud provider data centers
• OpenFlow-enabled core nodes, which efficiently switch traffic between these edge nodes
• An OpenFlow- and/or SDN-based controller to configure the flow forwarding tables in
the cloud backbone nodes and to provide a WAN network virtualization application (e.g.,
Optical FlowVisor [10])
• Hybrid cloud operation and orchestration software to manage the enterprise and
provider data center federation, inter-cloud workflow, resource management of
compute/storage, and inter-data center network management
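The orchestration logic referred to above can be sketched as follows; the data structures and values are illustrative assumptions, and the actual flow programming on the edge and core nodes is left out.

    # Hypothetical global view exported by the logically centralized controller.
    data_centers = [
        {"name": "dc-east", "free_vms": 120, "path_bandwidth_gbps": 40},
        {"name": "dc-west", "free_vms": 15, "path_bandwidth_gbps": 100},
    ]

    def place_workload(vms_needed: int, min_bandwidth_gbps: int) -> str:
        # Direct the federation to an adequately resourced data center that
        # is reachable over a link with sufficient bandwidth.
        candidates = [dc for dc in data_centers
                      if dc["free_vms"] >= vms_needed
                      and dc["path_bandwidth_gbps"] >= min_bandwidth_gbps]
        if not candidates:
            raise RuntimeError("no federated data center satisfies the request")
        best = max(candidates, key=lambda dc: dc["free_vms"])
        # Here the SDN controller would install the corresponding flow
        # entries on the OpenFlow edge and core nodes (omitted).
        return best["name"]

    print(place_workload(vms_needed=50, min_bandwidth_gbps=10))  # -> dc-east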
SDN-based federation will facilitate multi-vendor networks between enterprise and service
provider data centers, helping enterprise customers to choose best-in-class vendors while
avoiding vendor lock-in; to pick a suitable access technology from a wider variety (e.g., DWDM,
DSL, HFC, LTE, PON); to access dynamic bandwidth for ad hoc, timely inter-data center
workload migration and processing; and to eliminate the burden of underutilized, costly,
high-capacity fixed private leased lines. SDN-enabled bandwidth-on-demand services provide automated and
intelligent service provisioning, driven by cloud service orchestration logic and customer
requirements.
Conclusions
In this article the infrastructure as a service (IaaS) architecture and key challenges with a focus
on virtual networks and cloud federation were presented. IaaS has provided a flexible model, in
which customers are billed according to their compute usage, storage consumption, and the
duration of usage. Some of the challenges in existing cloud networks are guaranteeing the
performance of applications when they are moved from on-premises into the cloud facility,
flexible deployment of appliances (e.g., deep packet inspection, intrusion detection systems, or
firewalls), and the associated complexities of policy enforcement and topology dependence. A
typical three-layer data center network includes a ToR layer connecting the servers in a rack, an
aggregation layer, and a core layer, which provides connectivity to/from the Internet edge. This
multi-layer architecture imposes significant complexities in defining the boundaries of L2
domains, L3 forwarding networks and policies, and layer-specific multi-vendor networking
equipment. Applications should run "out of the box" as much as possible, in
particular for IP addresses and for network-dependent failover mechanisms. Network
appliances and servers (e.g., hypervisors) are typically tied to a statically configured physical
network, which implicitly creates a location dependency constraint. The SDN architecture, in
addition to decoupling the data forwarding and control planes, provides and supports a set of
APIs that simplify the implementation of common network services. VLANs, VM-aware
networking, vCDNI, VXLAN, and Nicira NVP are technologies for providing virtual networks in
cloud infrastructures. Nicira NVP, which utilizes MAC-in-IP encapsulation and an external
control plane, provides the most efficient solution for virtual network implementation.
OpenFlow core and edge nodes with a proper OpenFlow controller can be considered a novel
cloud federation mechanism. SDN-based federation will facilitate multi-vendor networks
between enterprise and service provider data centers, helping enterprise customers to choose
best-in-class vendors. Network fabric, a proposal for a network-edge version of OpenFlow, is
one of the recent proposals toward extending SDN to increase the simplicity and flexibility of
future network designs. What we should make clear is that SDN does not, by itself, solve all the
issues of cloud computing networking. The performance of SDN deployments, scalability, the
proper specification of SDN's northbound interface, the coexistence and/or integration of SDN
and network function virtualization, and proper extensions to OpenFlow to make it a viable
approach in WAN-based applications (e.g., the EU FP7 SPARC project) are among the topics that
need further research and investigation.
References:
[1]. P. Mell and T. Grance, "The NIST Definition of Cloud Computing," September 2011:
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf, accessed 30
November 2012.
[2]. T. Koponen, M. Casado, N. Gude, J. Stribling, L. Poutievski, M. Zhu, R. Ramanathan, Y.
Iwata, H. Inoue, T. Hama, and S. Shenker, "Onix: A Distributed Control Platform for
Large-scale Production Networks," in Proc. OSDI, 2010.
[3]. R. Niranjan Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V.
Subramanya, and A. Vahdat, "PortLand: A Scalable Fault-Tolerant Layer 2 Data Center
Network Fabric," ACM SIGCOMM Computer Communication Review, vol. 39, no. 4, pp.
39-50, October 2009.
[4]. C. J. Sher Decusatis, A. Carranza, and C. M. Decusatis, "Communication within clouds:
open standards and proprietary protocols for data center networking," IEEE
Communications Magazine, vol. 50, no. 9, pp. 26-33, September 2012.
[5]. T. Wood, P. Shenoy, K. K. Ramakrishnan, and J. Van der Merwe, "CloudNet: A Platform
for Optimized WAN Migration of Virtual Machines," Technical Report 2010-002,
University of Massachusetts, http://people.cs.umass.edu/~twood/pubs/cloudnet-tr.pdf,
accessed 30 November 2012.
[6]. N. McKeown et al., "OpenFlow: Enabling Innovation in Campus Networks," OpenFlow
white paper, 14 March 2008: http://www.openflow.org//documents/openflow-wp-latest.pdf,
accessed 30 November 2012.
[7]. OpenFlow Switch Specification, version 1.3.1 (wire protocol 0x04), Open Networking
Foundation, 6 September 2012:
https://www.opennetworking.org/images/stories/downloads/specification/openflow-
spec-v1.3.1.pdf, accessed 30 November 2012.
[8]. Open vSwitch, An Open Virtual Switch: http://openvswitch.org/
[9]. B. Pfaff and B. Davie, "The Open vSwitch Database Management Protocol,"
Internet-Draft, draft-pfaff-ovsdb-proto-00, Nicira Inc., 20 August 2012.
[10]. S. Azodolmolky, R. Nejabati, S. Peng, A. Hammad, M. P. Channegowda, N. Efstathiou,
A. Autenrieth, P. Kaczmarek, and D. Simeonidou, "Optical FlowVisor: An OpenFlow-based
optical network virtualization approach," in Proc. OFC/NFOEC 2012, paper JTh2A.41, 4-8
March 2012.