UNIT V
CLOUD TECHNOLOGIES AND ADVANCEMENTS
• Hadoop
• Apache Hadoop is an open source software framework used to develop data
processing applications which are executed in a distributed computing
environment.
• Applications built using HADOOP are run on large data sets distributed across
clusters of commodity computers.
• Commodity computers are cheap and widely available; they are mainly useful for
achieving greater computational power at low cost.
• Similar to data residing in the local file system of a personal computer, in
Hadoop, data resides in a distributed file system called the Hadoop
Distributed File System (HDFS).
• Apache Hadoop consists of two sub-projects –
• Hadoop MapReduce:
• MapReduce is a computational model and software
framework for writing applications which are run on Hadoop.
• These MapReduce programs are capable of processing
enormous data in parallel on large clusters of computation
nodes.
• HDFS (Hadoop Distributed File System):
• HDFS takes care of the storage part of Hadoop applications.
• MapReduce applications consume data from HDFS.
• HDFS creates multiple replicas of data blocks and distributes
them on compute nodes in a cluster.
• This distribution enables reliable and extremely rapid
computations.
• NameNode and DataNodes
• HDFS has a master/slave architecture.
• An HDFS cluster consists of a single NameNode, a master server that
manages the file system namespace and regulates access to files by
clients.
• In addition, there are a number of DataNodes, usually one per node in
the cluster, which manage storage attached to the nodes that they run
on.
• HDFS exposes a file system namespace and allows user data to be
stored in files.
• Internally, a file is split into one or more blocks and these blocks are
stored in a set of DataNodes.
• The NameNode executes file system namespace operations like
opening, closing, and renaming files and directories.
• It also determines the mapping of blocks to DataNodes.
• The DataNodes are responsible for serving read and write requests
from the file system’s clients.
• The DataNodes also perform block creation, deletion, and replication
upon instruction from the NameNode.
Functions of NameNode:
• It is the master daemon that maintains and manages the DataNodes (slave nodes)
• It records the metadata of all the files stored in the cluster, e.g. The location of
blocks stored, the size of the files, permissions, hierarchy, etc. There are two files
associated with the metadata:
• FsImage: It contains the complete state of the file system namespace since the start of the
NameNode.
• EditLogs: It contains all the recent modifications made to the file system with respect to the
most recent FsImage.
• It records each change that takes place to the file system metadata. For example, if
a file is deleted in HDFS, the NameNode will immediately record this in the EditLog.
• It regularly receives a Heartbeat and a block report from all the
DataNodes in the cluster to ensure that the DataNodes are live.
• It keeps a record of all the blocks in HDFS and in which nodes these
blocks are located.
• The NameNode is also responsible for maintaining
the replication factor of all the blocks.
• In case of DataNode failure, the NameNode chooses new
DataNodes for new replicas, balances disk usage and manages the
communication traffic to the DataNodes.
• The NameNode and DataNode are pieces of software designed to run on commodity
machines.
• These machines typically run a GNU/Linux operating system (OS). HDFS is built using
the Java language;
• Any machine that supports Java can run the NameNode or the DataNode software.
• A typical deployment has a dedicated machine that runs only the NameNode
software.
• Each of the other machines in the cluster runs one instance of the DataNode
software.
• The architecture does not preclude running multiple DataNodes on the same
machine but in a real deployment that is rarely the case.
DataNodes
• DataNodes are the slave nodes in HDFS. Unlike the NameNode, a DataNode runs on
commodity hardware, that is, an inexpensive system that is not of high
quality or high availability. The DataNode is a block server that stores the data in
a local file system such as ext3 or ext4.
• Functions of DataNode:
• These are slave daemons or processes that run on each slave machine.
• The actual data is stored on DataNodes.
• The DataNodes serve the low-level read and write requests from the file
system’s clients.
• They send heartbeats to the NameNode periodically to report the overall health
of HDFS; by default, this frequency is set to 3 seconds.
• The existence of a single NameNode in a cluster greatly simplifies the
architecture of the system.
• The NameNode is the arbitrator and repository for all HDFS metadata.
• The system is designed in such a way that user data never flows
through the NameNode.
• Secondary NameNode:
• Apart from these two daemons, there is a third daemon or process
called the Secondary NameNode. The Secondary NameNode works
concurrently with the primary NameNode as a helper daemon. Do not
confuse the Secondary NameNode with a backup NameNode, because it
is not one.
• Functions of Secondary NameNode:
• The Secondary NameNode constantly reads the file
system state and metadata from the RAM of the NameNode and writes it
to the hard disk or the file system.
• It is responsible for combining the EditLogs with the FsImage from the
NameNode.
• It downloads the EditLogs from the NameNode at regular intervals
and applies them to the FsImage. The new FsImage is copied back to the
NameNode and is used the next time the NameNode starts.
• Blocks:
• Now, as we know, the data in HDFS is scattered across the DataNodes
as blocks. Let’s have a look at what a block is and how it is formed.
• Blocks are nothing but the smallest contiguous locations on your hard
drive where data is stored. In general, in any file system, you store
the data as a collection of blocks. Similarly, HDFS stores each file as
blocks which are scattered throughout the Apache Hadoop cluster. The
default size of each block is 128 MB in Apache Hadoop 2.x (64 MB in
Apache Hadoop 1.x), which you can configure as per your requirement.
• It is not necessary that in HDFS each file is stored in an exact multiple of
the configured block size (128 MB, 256 MB etc.). Let’s take an
example where we have a file “example.txt” of size 514 MB. Suppose
that we are using the default block size of 128 MB. Then how many
blocks will be created? Five. The first four blocks will be of 128 MB
each, but the last block will be of 2 MB only.
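The arithmetic above can be sketched in a few lines of Python; the 514 MB file size and 128 MB block size are simply the values from the example, not anything HDFS-specific:

```python
# Sketch of how HDFS would split the example file into blocks.
BLOCK_SIZE_MB = 128          # default block size in Apache Hadoop 2.x
FILE_SIZE_MB = 514           # size of the hypothetical "example.txt"

full_blocks, remainder = divmod(FILE_SIZE_MB, BLOCK_SIZE_MB)
block_sizes = [BLOCK_SIZE_MB] * full_blocks + ([remainder] if remainder else [])

print(len(block_sizes), block_sizes)   # -> 5 [128, 128, 128, 128, 2]
```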
• The File System Namespace
• HDFS supports a traditional hierarchical file organization.
• A user or an application can create directories and store files inside
these directories.
• The file system namespace hierarchy is similar to most other existing
file systems;
• one can create and remove files, move a file from one directory to
another, or rename a file.
• HDFS supports user access permissions.
• While HDFS follows the naming conventions of a file system, some paths
and names (e.g. /.reserved and .snapshot) are reserved.
• The NameNode maintains the file system namespace.
• Any change to the file system namespace or its properties is recorded
by the NameNode.
• An application can specify the number of replicas of a file that should
be maintained by HDFS.
• The number of copies of a file is called the replication factor of that
file. This information is stored by the NameNode.
• Data Replication
• HDFS is designed to reliably store very large files across machines in a
large cluster.
• It stores each file as a sequence of blocks. The blocks of a file are
replicated for fault tolerance.
• The block size and replication factor are configurable per file.
• All blocks in a file except the last block are the same size.
• HDFS provides a reliable way to store huge data in a distributed
environment as data blocks. The blocks are also replicated to provide fault
tolerance. The default replication factor is 3, which is again configurable.
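Both settings live in the cluster configuration (and can be overridden per file). A minimal sketch of the relevant hdfs-site.xml entries, using the stock property names dfs.replication and dfs.blocksize, might look like this:

```xml
<!-- hdfs-site.xml: sketch of the two settings discussed above -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>              <!-- default replication factor -->
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>      <!-- 128 MB, expressed in bytes -->
  </property>
</configuration>
```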
• Files in HDFS are write-once (except for appends and truncates) and
have strictly one writer at any time.
• The NameNode makes all decisions regarding replication of blocks.
• It periodically receives a Heartbeat and a Blockreport from each of
the DataNodes in the cluster.
• Receipt of a Heartbeat implies that the DataNode is functioning
properly.
• A Blockreport contains a list of all blocks on a DataNode
• Replication
• The placement of replicas is critical to HDFS reliability and
performance.
• Optimizing replica placement distinguishes HDFS from most other
distributed file systems.
• This is a feature that needs lots of tuning and experience.
• The purpose of a rack-aware replica placement policy is to improve
data reliability, availability, and network bandwidth utilization.
• Now, the following protocol is followed whenever data is
written into HDFS:
• At first, the HDFS client will reach out to the NameNode with a write
request for the two blocks, say, Block A & Block B.
• The NameNode will then grant the client the write permission and
will provide the IP addresses of the DataNodes where the file blocks
will eventually be copied.
• The selection of the DataNodes’ IP addresses is randomized, subject
to availability, the replication factor and the rack awareness that we
have discussed earlier.
• Let’s say the replication factor is set to the default, i.e. 3. Therefore, for
each block the NameNode will provide the client a list of three IP
addresses of DataNodes. The list will be unique for each block.
• Suppose the NameNode provided the following lists of IP addresses to
the client:
• For Block A, list A = {IP of DataNode 1, IP of DataNode 4, IP of DataNode 6}
• For Block B, list B = {IP of DataNode 3, IP of DataNode 7, IP of DataNode 9}
• Each block will be copied to three different DataNodes to keep the
replication factor consistent throughout the cluster, as sketched in the example below.
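A highly simplified sketch of that allocation step is shown below. It ignores rack awareness, disk usage and liveness checks and simply picks a distinct, random set of DataNodes per block; all names are illustrative:

```python
import random

def allocate_replicas(block_ids, datanodes, replication_factor=3):
    """Return a {block_id: [datanode, ...]} map with one distinct list per block.

    Simplified stand-in for the NameNode's placement decision: real HDFS also
    considers rack awareness, disk usage and DataNode liveness.
    """
    return {b: random.sample(datanodes, replication_factor) for b in block_ids}

datanodes = [f"DataNode {i}" for i in range(1, 10)]
print(allocate_replicas(["Block A", "Block B"], datanodes))
```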
• Replica Selection
• To minimize global bandwidth consumption and read latency, HDFS
tries to satisfy a read request from a replica that is closest to the
reader.
• If the HDFS cluster spans multiple data centers, then a replica that is
resident in the local data center is preferred over any remote replica.
MapReduce
• Hadoop MapReduce (Hadoop Map/Reduce) is a software framework
for distributed processing of large data sets on computing clusters.
• It is a sub-project of the Apache Hadoop project.
• Apache Hadoop is an open-source framework that allows one to store and
process big data in a distributed environment across clusters of
computers using simple programming models.
• MapReduce is the core component for data processing in the Hadoop
framework.
• MapReduce helps to split the input data set into a number of parts
and run a program on all the parts in parallel.
• The term MapReduce refers to two separate and distinct tasks.
• The first is the map operation, which takes a set of data and converts it into
another set of data, where individual elements are broken down into
tuples (key/value pairs).
• The reduce operation combines those data tuples based on the key
and accordingly modifies the value of the key.
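As a concrete sketch of those two tasks, the classic word-count example can be written as a pair of small Python scripts in the Hadoop Streaming style (reading lines on stdin, emitting tab-separated key/value pairs on stdout); the file names are just placeholders:

```python
# mapper.py - map step: break each input line into (word, 1) tuples
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py - reduce step: combine tuples by key and sum the values
import sys
from itertools import groupby

pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
for word, group in groupby(pairs, key=lambda kv: kv[0]):
    print(f"{word}\t{sum(int(count) for _, count in group)}")
```

Because the framework sorts the reducer’s input by key before it is delivered, the itertools.groupby call is enough to gather all values belonging to one key.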
• Map Task
The Map task runs in the following phases:
a. RecordReader
• The RecordReader transforms the input split into records.
• It provides the data to the mapper function in key-value pairs.
• Usually, the key is the positional information and the value is the data that
comprises the record.
Types of Hadoop RecordReader in MapReduce
• The RecordReader instance is defined by the InputFormat.
• By default, Hadoop uses TextInputFormat, which converts data into key-
value pairs. Two commonly used RecordReaders are:
i. LineRecordReader (used by TextInputFormat)
ii. SequenceFileRecordReader (used by SequenceFileInputFormat)
• b. Map
• In this phase, the mapper, which is a user-defined function,
processes the key-value pairs from the RecordReader.
• It produces zero or multiple intermediate key-value pairs.
• The key is usually the data on which the reducer function does the
grouping operation.
• And value is the data which gets aggregated to get the final result in
the reducer function.
• c. Combiner
• The combiner is actually a localized reducer which groups the data in the
map phase. It is optional.
• Combiner takes the intermediate data from the mapper and aggregates
them.
• It does so within the small scope of one mapper.
• In many situations, this decreases the amount of data needed to move
over the network. For example, moving (Hello World, 1) three times
consumes more network bandwidth than moving
(Hello World, 3).
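The same local aggregation can be sketched in Python: instead of emitting a pair like (Hello, 1) three times, the mapper tallies counts in memory and emits one (Hello, 3) pair. This “in-mapper combining” variant only illustrates the effect a combiner achieves; it is not Hadoop’s Combiner API itself:

```python
# Variant of mapper.py with local aggregation (the effect a combiner achieves).
import sys
from collections import Counter

counts = Counter()
for line in sys.stdin:
    counts.update(line.split())      # tally words seen by this one mapper

for word, count in counts.items():
    print(f"{word}\t{count}")        # e.g. "Hello\t3" instead of three "Hello\t1" pairs
```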
Reduce Task
• The various phases in reduce task are as follows:
i. Shuffle and Sort
• The reducer starts with shuffle and sort step.
• This step sorts the individual data pieces into a large data list.
• The purpose of this sort is to collect the equivalent keys together.
• ii. Reduce
• The reducer performs the reduce function once per key
grouping.
• The framework passes the function key and an iterator object
containing all the values pertaining to the key.
• We can write reducer to filter, aggregate and combine data in a
number of different ways.
• Once the reduce function finishes, it gives zero or more key-
value pairs to the output format.
• iii. Output Format
• This is the final step.
• It takes the key-value pair from the reducer and writes it to the file
using the RecordWriter.
• By default, it separates the key and value by a tab and each record by
a newline character.
• Final data gets written to HDFS.
Virtual Box
• VirtualBox is open-source software for virtualizing the x86 computing
architecture.
• It acts as a hypervisor, creating a VM (Virtual Machine) in which the
user can run another OS (operating system).
• The operating system in which VirtualBox runs is called the "host" OS.
• The operating system running in the VM is called the "guest" OS.
VirtualBox supports Windows, Linux, or macOS as its host OS.
• Why Is VirtualBox Useful?
• One:
• VirtualBox allows you to run more than one operating system at a
time.
• This way, you can run software written for one operating system on
another (for example, Windows software on Linux or a Mac) without
having to reboot to use it (as would be needed if you used
partitioning and dual-booting).
• Two:
• By using a VirtualBox feature called “snapshots”, you can save a
particular state of a virtual machine and revert back to that state, if
necessary.
• This way, you can freely experiment with a computing environment.
• If something goes wrong (e.g. after installing misbehaving software or
infecting the guest with a virus), you can easily switch back to a
previous snapshot and avoid the need of frequent backups and
restores.
• Three:
• Software vendors can use virtual machines to ship entire software
configurations. For example, installing a complete mail server solution
on a real machine can be a tedious task (think of rocket science!).
• With VirtualBox, such a complex setup (then often called an
“appliance”) can be packed into a virtual machine. Installing and
running a mail server becomes as easy as importing such an appliance
into VirtualBox.
• Along these same lines, I find the “clone” feature of VirtualBox just
awesome!
• Four:
• On an enterprise level, virtualization can significantly reduce
hardware and electricity costs.
• Most of the time, computers today only use a fraction of their
potential power and run with low average system loads.
• A lot of hardware resources as well as electricity are thereby wasted.
• So, instead of running many such physical computers that are only
partially used, one can pack many virtual machines onto a few
powerful hosts and balance the loads between them.
VirtualBox Terminology
• When dealing with virtualization, it helps to acquaint oneself with a bit of
crucial terminology, especially the following terms:
• Host Operating System (Host OS):
• The operating system of the physical computer on which VirtualBox
was installed. There are versions of VirtualBox for Windows, Mac OS ,
Linux and Solaris hosts.
• Guest Operating System (Guest OS):
• The operating system that is running inside the virtual machine.
• Virtual Machine (VM):
• We’ve used this term often already. It is the special environment that
VirtualBox creates for your guest operating system while it is running.
In other words, you run your guest operating system “in” a VM.
Normally, a VM will be shown as a window on your computer’s
desktop, but depending on which of the various frontends of
VirtualBox you use, it can be displayed in full screen mode or
remotely on another computer.
Google App Engine
• Google App Engine is a Platform as a Service and cloud computing
platform for developing and hosting web applications in Google-
managed data centers.
• App Engine is a fully managed, serverless platform for developing and
hosting web applications at scale.
• You can choose from several popular languages, libraries, and
frameworks to develop your apps, then let App Engine take care of
provisioning servers and scaling your app instances based on demand
• Originally, App Engine required that apps be written in Java or Python, store
data in Google Bigtable and use the Google query language.
• Google App Engine provides more infrastructure than other
scalable hosting services such as Amazon Elastic Compute
Cloud (EC2).
• The App Engine also eliminates some system administration
and developmental tasks to make it easier to write scalable
applications.
• Google App Engine is free up to a certain amount of resource
usage.
• Users exceeding the per-day or per-minute usage rates for
CPU resources, storage, number of API calls or requests and
concurrent requests can pay for more of these resources.
• Modern web applications
• Quickly reach customers and end users by deploying web
apps on App Engine.
• With zero-config deployments and zero server management,
App Engine allows you to focus on writing code.
• Plus, App Engine automatically scales to support sudden
traffic spikes without provisioning, patching, or monitoring.
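As a sketch of what that looks like in practice, a minimal Python app for the App Engine standard environment is little more than a web handler plus an app.yaml declaring the runtime. The snippet below uses Flask, and the runtime name in the comment is just one of the supported Python versions:

```python
# main.py - minimal App Engine (standard environment) web app, using Flask.
# A companion app.yaml containing a single line such as "runtime: python39"
# tells App Engine which runtime to provision; deployment is then a
# "gcloud app deploy", with no servers for you to manage.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello from App Engine!"

if __name__ == "__main__":
    # Local test run only; in production App Engine serves the app itself.
    app.run(host="127.0.0.1", port=8080)
```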
• Features
• Popular languages
• Build your application in Node.js, Java, Ruby, C#, Go, Python, or PHP—
or bring your own language runtime.
• Open and flexible
• Custom runtimes allow you to bring any library and framework to App
Engine.
• Fully managed
• A fully managed environment lets you focus on code while App Engine
manages infrastructure concerns.
• Powerful application diagnostics
• Use Cloud Monitoring and Cloud Logging to monitor the health and
performance of your app and Cloud Debugger and Error Reporting to
diagnose and fix bugs quickly.
• Application versioning
• Easily host different versions of your app, easily create development, test,
staging, and production environments.
• Application security
• Help safeguard your application by defining access rules with App
Engine firewall and leverage managed SSL/TLS certificates by default
on your custom domain at no additional cost.
• Advantages of Google App Engine
• There are many advantages to the Google App Engine that help to
take your app ideas to the next level. These include:
• Infrastructure for Security
• Google’s Internet infrastructure is probably the most secure in the
world. There has rarely been any unauthorized access to date, as the
application data and code are stored in highly secure servers.
• Quick to Start
• With no product or hardware to purchase and maintain, you can
prototype and deploy the app to your users without taking much
time.
• Easy to Use
• Google App Engine (GAE) incorporates the tools that you need to
develop, test, launch, and update the applications.
• Scalability
• Regardless of the amount of data your app stores or the number of
users it serves, App Engine can meet your needs by scaling up or down as
required.
• Performance and Reliability
• Google is a worldwide leader among global brands, and you have to keep that
in mind when you discuss performance and reliability. In the past 15 years, the
company has created new benchmarks based on its services’ and products’
performance. App Engine provides the same reliability and performance as any
other Google product.
• Cost Savings
• You don’t have to hire engineers to manage your servers or to do that yourself. You
can invest the money saved into other parts of your business.
• Platform Independence
• You can move all your data to another environment without any difficulty as there are
not many dependencies on the app engine platform.
• Open Stack
• OpenStack is a free open standard cloud computing platform, mostly
deployed as infrastructure-as-a-service in both public and private
clouds where virtual servers and other resources are made available
to users.
• OpenStack is a set of software tools for building and managing cloud
computing platforms for public and private clouds.
• OpenStack is managed by the OpenStack Foundation, a non-profit
that oversees both development and community-building around the
project.
• Introduction to OpenStack
• OpenStack lets users deploy virtual machines and other instances that
handle different tasks for managing a cloud environment on the fly.
• It makes horizontal scaling easy, which means that tasks that benefit
from running concurrently can easily serve more or fewer users on
the fly by just spinning up more instances.
• For example, a mobile application that needs to communicate with a
remote server might be able to divide the work of communicating
with each user across many different instances, all communicating
with one another but scaling quickly and easily as the application
gains more users.
• And most importantly, OpenStack is open source software, which
means that anyone who chooses to can access the source code, make
any changes or modifications they need, and freely share these
changes.
• It also means that OpenStack has the benefit of thousands of
developers all over the world working in tandem to develop the
strongest, most robust, and most secure product that they can.
• How is OpenStack used in a cloud environment?
• The cloud is all about providing computing for end users in a remote
environment, where the actual software runs as a service on reliable and
scalable servers rather than on each end-user's computer.
• Cloud computing can refer to a lot of different things, but typically the
industry talks about running different items "as a service"—software,
platforms, and infrastructure.
• OpenStack is considered Infrastructure as a Service (IaaS).
• Providing infrastructure means that OpenStack makes it easy for users to
quickly add new instances, upon which other cloud components can run.
• Typically, the infrastructure then runs a "platform" upon which a
developer can create software applications that are delivered to the end
users.
• What are the components of OpenStack?
• Because of its open nature, anyone can add additional components to
OpenStack to help it to meet their needs.
• But the OpenStack community has collaboratively identified nine key
components that are a part of the "core" of OpenStack, which are
distributed as a part of any OpenStack system and officially
maintained by the OpenStack community.
• Nova is the primary computing engine behind OpenStack. It is used
for deploying and managing large numbers of virtual machines and
other instances to handle computing tasks.
• Swift is a storage system for objects and files.
• The OpenStack Object Store project, known as Swift, offers cloud
storage software so that you can store and retrieve lots of data with a
simple API.
• It's built for scale and optimized for durability, availability, and
concurrency across the entire data set.
• Swift is ideal for storing unstructured data that can grow without
bound.
• Cinder is a block storage component, which is more analogous to the
traditional notion of a computer being able to access specific
locations on a disk drive. This more traditional way of accessing files
might be important in scenarios in which data access speed is the
most important consideration.
• Neutron provides the networking capability for OpenStack. It helps to
ensure that each of the components of an OpenStack deployment can
communicate with one another quickly and efficiently.
• Horizon is the dashboard behind OpenStack.
• It is the only graphical interface to OpenStack, so for users wanting to
give OpenStack a try, this may be the first component they actually
“see.”
• Developers can access all of the components of OpenStack
individually through an application programming interface (API), but
the dashboard provides system administrators a look at what is going
on in the cloud, and to manage it as needed.
• Keystone provides identity services for OpenStack. It is essentially a
central list of all of the users of the OpenStack cloud, mapped against
all of the services provided by the cloud, which they have permission
to use. It provides multiple means of access, meaning developers can
easily map their existing user access methods against Keystone.
• Glance provides image services to OpenStack. In this case, "images"
refers to images (or virtual copies) of hard disks.
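A short sketch of how a client might touch several of these components through the official Python SDK (openstacksdk) is shown below; the cloud name, image, flavor, and network names are placeholders that would come from your own deployment:

```python
# Sketch using openstacksdk: Keystone handles authentication, Nova serves the
# compute calls, Glance the image lookup, Neutron the network lookup.
import openstack

# Credentials and region are read from clouds.yaml; "mycloud" is a placeholder.
conn = openstack.connect(cloud="mycloud")

image = conn.compute.find_image("ubuntu-22.04")        # Glance image (placeholder)
flavor = conn.compute.find_flavor("m1.small")          # Nova flavor (placeholder)
network = conn.network.find_network("private")         # Neutron network (placeholder)

server = conn.compute.create_server(
    name="demo-instance",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.name, server.status)
```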
• Prerequisite for minimum production deployment
• There are some basic requirements you’ll have to meet to deploy
OpenStack. Here are the prerequisites, drawn from the OpenStack
manual.
• Hardware: For the OpenStack controller node, 12 GB of RAM is needed as
well as 30 GB of disk space to run OpenStack services. Two 2 TB
SATA (Serial Advanced Technology Attachment) disks will be
necessary to store volumes used by instances. Communication with
compute nodes requires a 1 Gbps network interface card (NIC).
• Operating system (OS):
• OpenStack supports the following operating systems: Debian, Fedora,
Red Hat Enterprise Linux (RHEL), openSUSE, SUSE Linux Enterprise
Server (SLES) and Ubuntu.
• Federation in the Cloud
• Cloud federation is the practice of interconnecting
the cloud computing environments of two or more service providers
for the purpose of load balancing traffic and accommodating spikes in
demand. Cloud federation requires one provider to wholesale or rent
computing resources to another cloud provider.
• “Cloud federation manages consistency and access controls when two
or more independent geographically distinct Clouds share either
authentication, files, computing resources, command and control or
access to storage resources.”
• Cloud federation introduces additional issues that have to be
addressed in order to provide a secure environment in which to move
applications and services among a collection of federated providers.
• Baseline security needs to be guaranteed across all cloud vendors that
are part of the federation.
• An interesting aspect is represented by the management of the digital
identity across diverse organizations, security domains, and
application platforms.
• In particular, the term federated identity management refers to
standards-based approaches for handling authentication, single sign-
on (SSO), role-based access control, and session management in a
federated environment .
• No matter the specific protocol and framework, two main approaches can be
considered:
• Centralized federation model
• This is the approach taken by several identity federation standards. It
distinguishes two operational roles in an SSO transaction: the identity
provider and the service provider.
• Claim-based model
• This approach addresses the problem of user authentication from a different
perspective and requires users to provide claims answering who they are and
what they can do in order to access content or complete a transaction.
• The first model is currently used today; the second constitutes a
future vision for identity management in the cloud.
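As a purely illustrative sketch (tied to no particular standard or protocol), a claim-based check boils down to inspecting the set of claims a user presents and deciding whether they satisfy an access policy:

```python
# Illustrative claim-based access check; names and values are hypothetical.
def can_access(claims: dict, required: dict) -> bool:
    """True if every required claim is present with an acceptable value."""
    return all(claims.get(name) in allowed for name, allowed in required.items())

user_claims = {"role": "researcher", "organization": "UniversityA", "age_over_18": True}
policy = {"role": {"researcher", "faculty"}, "age_over_18": {True}}

print(can_access(user_claims, policy))   # True: the presented claims satisfy the policy
```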
• Digital identity management constitutes a fundamental aspect of
security management in a cloud federation.
• To transparently perform operations across different administrative
domains, it is of mandatory importance to have a robust framework
for authentication and authorization, and federated identity
management addresses this issue.
• Federated identity management allows us to tie together the
computing stacks of different vendors and present them as a single
environment to users from a security point of view.
OpenNebula:
• OpenNebula is a cloud computing platform for managing
heterogeneous distributed data center infrastructures.
• The OpenNebula platform manages a data center's virtual
infrastructure, to build private, public and hybrid implementations of
Infrastructure as a Service.
• Much research work has grown up around OpenNebula.
• For example, the University of Chicago has come up with an advance-
reservation system called the Haizea Lease Manager.
• IBM Haifa has developed a policy-driven probabilistic admission
control and dynamic placement optimization for site-level
management policies, called the RESERVOIR Policy Engine.
• Nephele is an SLA-driven automatic service management tool
developed by Telefonica, and the Virtual Cluster Tool from the CRS4
Distributed Computing Group provides atomic cluster management
with versioning and multiple transport protocols.
Development
• OpenNebula follows a rapid release cycle to improve user satisfaction
by rapidly delivering features and innovations based on user
requirements and feedback.
• In other words, giving customers what they want more quickly, in
smaller increments, while additionally increasing technical quality.
• Major upgrades generally occur every 3-5 years and each upgrade
generally has 3-5 updates.
• Cloud Federations and Server Coalitions
• In large-scale systems, coalition formation supports more
effective use of resources, as well as convenient means to access
these resources.
• It is therefore not surprising that coalition formation for
computational grids has been investigated in the past.
• The interest in grid computing is fading away, while cloud
computing is widely accepted today and its adoption by more
and more institutions and individuals seems to be guaranteed at
least for the foreseeable future.
• Two classes of applications of cloud coalitions are reported in the
literature:
• 1.Coalitions among CSPs for the formation of cloud federations. A cloud
federation is an infrastructure allowing a group of CSPs to share
resources; the goal is to balance the load and improve system reliability.
• 2.Coalitions among the servers of a data center. The goal is to assemble
a pool of resources larger than the ones available from a single server.
• In recent years the number of CSPs has increased significantly. The
question if they should cooperate to share their resources led to the
idea of cloud federations, groups of CSPs who have agreed on a set of
common standards and are able to share their resources.
• Cloud coalition formation raises a number of technical, as well as
nontechnical problems.
• Cloud federations require a set of standards.
• The cloud computing landscape is still evolving, and an early
standardization may slow down and negatively affect the adoption of
new ideas and technologies.
• At the same time, CSPs want to maintain their competitive
advantages by closely guarding the details of their internal algorithms
and protocols.
• Four Levels of Federation
• Creating a cloud federation involves research and development at
different levels: conceptual, logical and operational, and
infrastructural.
• The figure provides a comprehensive view of the challenges faced in
designing and implementing an organizational structure that
coordinates cloud services belonging to different
administrative domains and makes them operate within the
context of a single unified service middleware.
• Each cloud federation level presents different challenges and
operates at a different layer of the IT stack.
• It then requires the use of different approaches and
technologies.
• CONCEPTUAL LEVEL
• The conceptual level addresses the challenges in presenting a cloud
federation as a favourable solution.
• In this level it is important to clearly identify the advantages for either
service providers or service consumers in joining a federation.
• To describe the new opportunities that a federated environment
creates.
• Elements of concern at this level are:
• Motivations for cloud providers to join a federation.
• Motivations for service consumers to leverage a federation.
• Advantages for providers in leasing their services to other providers.
• Responsibilities of providers once they have joined the federation.
• Trust agreements between providers.
• Transparency versus consumers.
• Among these aspects, the most relevant are the motivations of both
service providers and consumers in joining a federation.
• LOGICAL & OPERATIONAL LEVEL
• The logical and operational level of a federated cloud identifies and
addresses the challenges in creating a framework that enables the
aggregation of providers that belong to different administrative domains
• At this level, policies and rules for interoperation are defined.
• Moreover, this is the layer at which decisions are made as to how and when
to lease a service to—or to leverage a service from— another provider.
• The logical component defines a context in which agreements among
providers are settled and services are conveyed, whereas the operational
component characterizes and shapes the dynamic behaviour of the
federation as a result of the single providers’ choices.
It is important at this level to address the following challenges:
• How should a federation be represented?
• How should we model and represent a cloud service, a cloud
provider, or an agreement?
• How should we define the rules and policies that allow providers
to join a federation?
• What are the mechanisms in place for settling agreements
among providers?
• What are providers’ responsibilities with respect to each other?
• When should providers and consumers take advantage of the
federation?
• Which kinds of services are more likely to be leased or bought?
• How should we price resources that are leased, and which
fraction of resources should we lease?
• INFRASTRUCTURE LEVEL
• The infrastructural level addresses the technical challenges involved
in enabling heterogeneous cloud computing systems to interoperate
seamlessly.
• It deals with the technology barriers that keep separate cloud
computing systems belonging to different administrative domains.
• By having standardized protocols and interfaces, these barriers can be
overcome.
At this level it is important to address the following issues:
• What kind of standards should be used?
• How should interfaces and protocols be designed for interoperation?
• Which are the technologies to use for interoperation?
• How can we realize a software system, design platform components, and
services enabling interoperability?
Interoperation and composition among different cloud computing vendors is
possible only by means of open standards and interfaces. Moreover,
interfaces and protocols change considerably at each layer of the Cloud
Computing Reference Model.
Future of Federation
• The federated cloud model is a force for real democratization in the
cloud market.
• It’s how businesses will be able to use local cloud providers to
connect with customers, partners and employees anywhere in the
world.
• It’s how end users will finally get to realize the promise of the cloud.
• And, it’s how data center operators and other service providers will
finally be able to compete with, and beat, today’s so-called global
cloud providers.
• Some see the future of cloud computing as one big public cloud.
• Others believe that enterprises will ultimately build a single large
cloud to host all their corporate services.
• This is, of course, because the benefit of cloud computing is
dependent on large – very large – scale infrastructure, which provides
administrators, service providers and consumers with ease of deployment,
self-service, elasticity, resource pooling and economies of scale.
• However, as cloud continues to evolve – so do the services being
offered.
• Cloud Services & Hybrid Clouds
• Services are now able to reach a wider range of consumers, partners,
competitors and public audiences.
• It is also clear that storage, compute power, streaming, analytics and
other advanced services are best served when they are in an
environment tailored for the proficiency of that service.
• One method of addressing the need of these service environments is
through the advent of hybrid clouds.
• Hybrid clouds, by definition, are composed of multiple distinct cloud
infrastructures connected in a manner that enables services and data
access across the combined infrastructure.
• The intent is to leverage the additional benefits that hybrid cloud
offers without disrupting the traditional cloud benefits.
• While hybrid cloud benefits come from the ability to distribute the
work stream, the goal is to continue to manage peaks in demand,
quickly make services available and capitalize on new business
opportunities.
• The Solution: Federation
• Federation creates a hybrid cloud environment with an increased focus
on maintaining the integrity of corporate policies and data integrity.
• Think of federation as a pool of clouds connected through a channel of
gateways;
• gateways which can be used to optimize a cloud for a service or set of
specific services.
• Such gateways can be used to segment service audiences or to limit
access to specific data sets.
• In essence, federation gives enterprises the ability to serve their
audiences with economies of scale without exposing critical applications or
vital data through weak policies or vulnerabilities.
• Many would raise the question: if Federation creates multiples of
clouds, doesn’t that mean cloud benefits are diminished?
• I believe the answer is no, due to the fact that a fundamental change
has transformed enterprises through the original adoption of cloud
computing, namely the creation of a flexible environment able to
adapt rapidly to changing needs based on policy and automation.
• Cloud end users are often tied to a single cloud provider, because the
different APIs, image formats, and access methods exposed by
different providers make it very difficult for an average user to
move applications from one cloud to another, leading to a
vendor lock-in problem.
• Many SMEs have their own on-premise private cloud infrastructures
to support their internal computing needs and workloads. These
infrastructures are often over-sized to satisfy peak demand periods
and avoid performance slow-downs. The hybrid cloud (or cloud bursting)
model is a solution to reduce the on-premise infrastructure size, so
that it can be dimensioned for an average load and
complemented with external resources from a public cloud provider
to satisfy peak demands.
• Many big companies (e.g. banks, hosting companies, etc.) and also
many large institutions maintain several distributed data centers or
server farms, for example to serve multiple geographically
distributed offices, to implement high availability (HA), or to guarantee
server proximity to the end user.
• Resources and networks in these distributed data centers are usually
configured as non-cooperative separate elements.
• Many educational and research centers often deploy their own
computing infrastructures, which usually do not cooperate with other
institutions, except in some specific situations (e.g. in joint projects or
initiatives).
• Many times, even different departments within the same institution
maintain their own non-cooperative infrastructures. This Study Group
will evaluate the main challenges to enabling the provision of federated
cloud infrastructures, with special emphasis on inter-cloud networking
and security issues:
• Security and Privacy
• Interoperability and Portability
• Performance and Networking Cost
• The first key action aims at “Cutting through the Jungle of Standards”
to help the adoption of cloud computing by encouraging compliance
of cloud services with respect to standards and thus providing
evidence of compliance to legal and audit obligations.
• These standards aim to avoid customer lock in by promoting
interoperability, data portability and reversibility.
• The second key action “Safe and Fair Contract Terms and Conditions”
aims to protect the cloud consumer from insufficiently specific and
balanced contracts with cloud providers that do not “provide for
liability for data integrity, confidentiality or service continuity”.
• The cloud consumer is often presented with “take-it-or-leave-it”
standard contracts that might be cost-saving for the provider but are
often undesirable for the user.
• Interface: Various cloud service providers have different APIs, pricing
models and cloud infrastructures.
• An open cloud computing interface needs to be established to provide
a common application programming interface for multiple cloud
environments.
• The simplest solution is to use a software component that allows the
federated system to connect with a given cloud environment.
• Trusted Servers
• In order to make it easier to find people on other servers we introduced the
concept of “trusted servers” as one of our last steps.
• This allows administrators to define other servers they trust.
• If two servers trust each other they will sync their user lists.
• This way the share dialogue can auto-complete not only local users but also
users on other trusted servers.
• The administrator can decide to define the lists of trusted servers manually or
allow the server to auto add every other server to which at least one
federated share was successfully created.
• This way it is possible to let your cloud server learn about more and more
other servers over time, connect with them and increase the network of
trusted servers.
• Open Challenges: where we’re taking Federated Cloud Sharing
• Of course there are still many areas to improve.
• For example, the way you can discover users on different servers to
share with them, for which we’re working on a global, shared address
book solution.
• Another point is that at the moment this is limited to sharing files.
• A logical next step would be to extend this to many other areas like
address books, calendars and to real-time text, voice and video
communication and we are, of course, planning for that.
2541William_McCollough_DigitalDetox.docx
contactwilliamm2546
 
Unit 6_Introduction_Phishing_Password Cracking.pdf
Unit 6_Introduction_Phishing_Password Cracking.pdfUnit 6_Introduction_Phishing_Password Cracking.pdf
Unit 6_Introduction_Phishing_Password Cracking.pdf
KanchanPatil34
 
Political History of Pala dynasty Pala Rulers NEP.pptx
Political History of Pala dynasty Pala Rulers NEP.pptxPolitical History of Pala dynasty Pala Rulers NEP.pptx
Political History of Pala dynasty Pala Rulers NEP.pptx
Arya Mahila P. G. College, Banaras Hindu University, Varanasi, India.
 
To study the nervous system of insect.pptx
To study the nervous system of insect.pptxTo study the nervous system of insect.pptx
To study the nervous system of insect.pptx
Arshad Shaikh
 
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - WorksheetCBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
Sritoma Majumder
 
How to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 WebsiteHow to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 Website
Celine George
 
Multi-currency in odoo accounting and Update exchange rates automatically in ...
Multi-currency in odoo accounting and Update exchange rates automatically in ...Multi-currency in odoo accounting and Update exchange rates automatically in ...
Multi-currency in odoo accounting and Update exchange rates automatically in ...
Celine George
 
Quality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdfQuality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdf
Dr. Bindiya Chauhan
 
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptxSCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
Ronisha Das
 
Social Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy StudentsSocial Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy Students
DrNidhiAgarwal
 
Odoo Inventory Rules and Routes v17 - Odoo Slides
Odoo Inventory Rules and Routes v17 - Odoo SlidesOdoo Inventory Rules and Routes v17 - Odoo Slides
Odoo Inventory Rules and Routes v17 - Odoo Slides
Celine George
 
Introduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe EngineeringIntroduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe Engineering
Damian T. Gordon
 
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar RabbiPresentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Md Shaifullar Rabbi
 
apa-style-referencing-visual-guide-2025.pdf
apa-style-referencing-visual-guide-2025.pdfapa-style-referencing-visual-guide-2025.pdf
apa-style-referencing-visual-guide-2025.pdf
Ishika Ghosh
 
Metamorphosis: Life's Transformative Journey
Metamorphosis: Life's Transformative JourneyMetamorphosis: Life's Transformative Journey
Metamorphosis: Life's Transformative Journey
Arshad Shaikh
 
Handling Multiple Choice Responses: Fortune Effiong.pptx
Handling Multiple Choice Responses: Fortune Effiong.pptxHandling Multiple Choice Responses: Fortune Effiong.pptx
Handling Multiple Choice Responses: Fortune Effiong.pptx
AuthorAIDNationalRes
 
How to Manage Opening & Closing Controls in Odoo 17 POS
How to Manage Opening & Closing Controls in Odoo 17 POSHow to Manage Opening & Closing Controls in Odoo 17 POS
How to Manage Opening & Closing Controls in Odoo 17 POS
Celine George
 
2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx
contactwilliamm2546
 
Unit 6_Introduction_Phishing_Password Cracking.pdf
Unit 6_Introduction_Phishing_Password Cracking.pdfUnit 6_Introduction_Phishing_Password Cracking.pdf
Unit 6_Introduction_Phishing_Password Cracking.pdf
KanchanPatil34
 
To study the nervous system of insect.pptx
To study the nervous system of insect.pptxTo study the nervous system of insect.pptx
To study the nervous system of insect.pptx
Arshad Shaikh
 
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - WorksheetCBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
Sritoma Majumder
 
How to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 WebsiteHow to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 Website
Celine George
 
Multi-currency in odoo accounting and Update exchange rates automatically in ...
Multi-currency in odoo accounting and Update exchange rates automatically in ...Multi-currency in odoo accounting and Update exchange rates automatically in ...
Multi-currency in odoo accounting and Update exchange rates automatically in ...
Celine George
 
Quality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdfQuality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdf
Dr. Bindiya Chauhan
 
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptxSCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
Ronisha Das
 
Social Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy StudentsSocial Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy Students
DrNidhiAgarwal
 
Odoo Inventory Rules and Routes v17 - Odoo Slides
Odoo Inventory Rules and Routes v17 - Odoo SlidesOdoo Inventory Rules and Routes v17 - Odoo Slides
Odoo Inventory Rules and Routes v17 - Odoo Slides
Celine George
 
Introduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe EngineeringIntroduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe Engineering
Damian T. Gordon
 

Cloud Computing - Cloud Technologies and Advancements

  • 10. • The NameNode and DataNode are pieces of software designed to run on commodity machines. • These machines typically run a GNU/Linux operating system (OS). HDFS is built using the Java language; • Any machine that supports Java can run the NameNode or the DataNode software. • A typical deployment has a dedicated machine that runs only the NameNode software. • Each of the other machines in the cluster runs one instance of the DataNode software. • The architecture does not preclude running multiple DataNodes on the same machine but in a real deployment that is rarely the case.
  • 11. DataNodes • DataNodes are the slave nodes in HDFS. Unlike the NameNode, a DataNode runs on commodity hardware, that is, an inexpensive system that is not of high quality or high availability. The DataNode is a block server that stores the data in the local file system (e.g. ext3 or ext4). • Functions of DataNode: • These are slave daemons or processes that run on each slave machine. • The actual data is stored on the DataNodes. • The DataNodes serve the low-level read and write requests from the file system’s clients. • They send heartbeats to the NameNode periodically to report the overall health of HDFS; by default, this frequency is set to 3 seconds.
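A quick way to observe the DataNode heartbeats and per-node storage described above is the standard `hdfs dfsadmin -report` admin command. The sketch below simply drives it from Python; it assumes a Hadoop client is installed and configured to reach the cluster (the setup itself is not shown here).

```python
# Minimal sketch: query DataNode health via the standard "hdfs dfsadmin -report"
# command. A configured Hadoop client on PATH is assumed.
import subprocess

def datanode_report() -> str:
    # -report prints cluster capacity plus one section per DataNode,
    # including its last heartbeat ("Last contact") and disk usage.
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-report"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # Print only the per-DataNode name and "Last contact" lines as a quick liveness check.
    for line in datanode_report().splitlines():
        if line.startswith("Name:") or "Last contact" in line:
            print(line)
```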
  • 12. • The existence of a single NameNode in a cluster greatly simplifies the architecture of the system. • The NameNode is the arbitrator and repository for all HDFS metadata. • The system is designed in such a way that user data never flows through the NameNode.
  • 13. • Secondary NameNode: • Apart from these two daemons, there is a third daemon or process called the Secondary NameNode. The Secondary NameNode works concurrently with the primary NameNode as a helper daemon. Do not confuse the Secondary NameNode with a backup NameNode, because it is not one.
  • 14. • Functions of Secondary NameNode: • The Secondary NameNode constantly reads the file system state and metadata from the RAM of the NameNode and writes it to the hard disk or the file system. • It is responsible for combining the EditLogs with the FsImage from the NameNode. • It downloads the EditLogs from the NameNode at regular intervals and applies them to the FsImage. The new FsImage is copied back to the NameNode and is used the next time the NameNode is started.
  • 15. • Blocks: • Now, as we know, the data in HDFS is scattered across the DataNodes as blocks. Let’s have a look at what a block is and how it is formed. • Blocks are nothing but the smallest contiguous locations on your hard drive where data is stored. In general, in any file system, data is stored as a collection of blocks. Similarly, HDFS stores each file as blocks, which are scattered throughout the Apache Hadoop cluster. The default size of each block is 128 MB in Apache Hadoop 2.x (64 MB in Apache Hadoop 1.x), which you can configure as per your requirement.
  • 16. • It is not necessary that in HDFS each file is stored in an exact multiple of the configured block size (128 MB, 256 MB, etc.). Let’s take an example of a file “example.txt” of size 514 MB. Suppose that we are using the default block size of 128 MB. Then, how many blocks will be created? Five: the first four blocks will be 128 MB each, but the last block will be only 2 MB.
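The 514 MB example can be reproduced with a few lines of arithmetic. The sketch below is purely illustrative (it is not HDFS code); it just shows how the block sizes fall out of the configured block size.

```python
# Worked example of HDFS block splitting for the 514 MB file described above,
# assuming the default 128 MB block size of Hadoop 2.x.
BLOCK_SIZE_MB = 128

def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB):
    """Return the sizes of the blocks HDFS would create for a file."""
    full_blocks = file_size_mb // block_size_mb
    remainder = file_size_mb % block_size_mb
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)      # the last block only holds what is left over
    return sizes

blocks = split_into_blocks(514)
print(len(blocks), blocks)           # 5 [128, 128, 128, 128, 2]
```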
  • 17. • The File System Namespace • HDFS supports a traditional hierarchical file organization. • A user or an application can create directories and store files inside these directories. • The file system namespace hierarchy is similar to most other existing file systems; • one can create and remove files, move a file from one directory to another, or rename a file. • HDFS supports user access permissions.
  • 18. • While HDFS follows the naming conventions of a traditional file system, some paths and names (e.g. /.reserved and .snapshot) are reserved. • The NameNode maintains the file system namespace. • Any change to the file system namespace or its properties is recorded by the NameNode. • An application can specify the number of replicas of a file that should be maintained by HDFS. • The number of copies of a file is called the replication factor of that file. This information is stored by the NameNode.
  • 19. • Data Replication • HDFS is designed to reliably store very large files across machines in a large cluster. • It stores each file as a sequence of blocks. The blocks of a file are replicated for fault tolerance. • The block size and replication factor are configurable per file. • All blocks in a file except the last block are the same size • HDFS provides a reliable way to store huge data in a distributed environment as data blocks. The blocks are also replicated to provide fault tolerance. The default replication factor is 3 which is again configurable.
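The namespace operations and per-file replication factor described above map directly onto the standard `hdfs dfs` shell subcommands. The sketch below drives them from Python; the paths and the local file name are hypothetical, and a configured Hadoop client on PATH is assumed.

```python
# Minimal sketch of HDFS namespace and replication operations via "hdfs dfs".
import subprocess

def hdfs(*args: str) -> None:
    subprocess.run(["hdfs", "dfs", *args], check=True)

# Create a directory, store a file in it, and rename (move) the file.
hdfs("-mkdir", "-p", "/user/demo/input")
hdfs("-put", "example.txt", "/user/demo/input/example.txt")
hdfs("-mv", "/user/demo/input/example.txt", "/user/demo/input/renamed.txt")

# Override the replication factor for this one file (the default is 3);
# -w waits until the blocks actually reach the requested replication.
hdfs("-setrep", "-w", "2", "/user/demo/input/renamed.txt")
```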
  • 21. • Files in HDFS are write-once (except for appends and truncates) and have strictly one writer at any time. • The NameNode makes all decisions regarding replication of blocks. • It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. • Receipt of a Heartbeat implies that the DataNode is functioning properly. • A Blockreport contains a list of all blocks on a DataNode
  • 23. • Replication • The placement of replicas is critical to HDFS reliability and performance. • Optimizing replica placement distinguishes HDFS from most other distributed file systems. • This is a feature that needs lots of tuning and experience. • The purpose of a rack-aware replica placement policy is to improve data reliability, availability, and network bandwidth utilization.
  • 24. • Now, the following protocol will be followed whenever the data is written into HDFS: • At first, the HDFS client will reach out to the NameNode for a Write Request against the two blocks, say, Block A & Block B. • The NameNode will then grant the client the write permission and will provide the IP addresses of the DataNodes where the file blocks will be copied eventually. • The selection of IP addresses of DataNodes is purely randomized based on availability, replication factor and rack awareness that we have discussed earlier.
  • 25. • Let’s say the replication factor is set to the default, i.e. 3. Therefore, for each block the NameNode will provide the client a list of three IP addresses of DataNodes. The list will be unique for each block. • Suppose the NameNode provided the following lists of IP addresses to the client: • For Block A, list A = {IP of DataNode 1, IP of DataNode 4, IP of DataNode 6} • For Block B, list B = {IP of DataNode 3, IP of DataNode 7, IP of DataNode 9} • Each block will be copied to three different DataNodes to keep the replication factor consistent throughout the cluster.
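The sketch below is a purely illustrative model of this step, not HDFS's real placement algorithm (which also weighs rack awareness and node load): it simply hands out a distinct list of three DataNode addresses per block, mirroring the lists shown above. The IP addresses are hypothetical.

```python
# Illustrative only: build a per-block replica list with replication factor 3.
import random

DATANODES = [f"10.0.0.{i}" for i in range(1, 10)]   # hypothetical DataNode IPs
REPLICATION_FACTOR = 3

def replica_lists(block_ids, datanodes=DATANODES, rf=REPLICATION_FACTOR):
    """Return {block_id: [datanode_ip, ...]} with rf distinct nodes per block."""
    return {block: random.sample(datanodes, rf) for block in block_ids}

print(replica_lists(["Block A", "Block B"]))
# e.g. {'Block A': ['10.0.0.1', '10.0.0.4', '10.0.0.6'],
#       'Block B': ['10.0.0.3', '10.0.0.7', '10.0.0.9']}
```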
  • 26. • Replica Selection • To minimize global bandwidth consumption and read latency, HDFS tries to satisfy a read request from a replica that is closest to the reader. • If the HDFS cluster spans multiple data centers, then a replica that is resident in the local data center is preferred over any remote replica.
  • 28. MapReduce • Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large data sets on computing clusters. • It is a sub-project of the Apache Hadoop project. • Apache Hadoop is an open-source framework that allows us to store and process big data in a distributed environment across clusters of computers using simple programming models. • MapReduce is the core component for data processing in the Hadoop framework.
  • 29. • MapReduce splits the input data set into a number of parts and runs a program on all of the parts in parallel at once. • The term MapReduce refers to two separate and distinct tasks. • The first is the map operation, which takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). • The reduce operation combines those data tuples based on the key and accordingly modifies the value of the key.
  • 31. • Map Task • The map task runs in the following phases: a. RecordReader • The RecordReader transforms the input split into records. • It provides the data to the mapper function in key-value pairs. • Usually, the key is the positional information and the value is the data that comprises the record.
  • 33. Types of Hadoop RecordReader in MapReduce • The RecordReader instance is defined by the InputFormat. • By default, it uses TextInputFormat for converting data into key-value pairs. TextInputFormat provides two types of RecordReaders: i. LineRecordReader ii. SequenceFileRecordReader
  • 34. • b. Map • In this phase, the mapper, which is the user-defined function, processes the key-value pairs from the RecordReader. • It produces zero or more intermediate key-value pairs. • The key is usually the data on which the reducer function performs the grouping operation. • The value is the data which gets aggregated to produce the final result in the reducer function.
  • 35. • c. Combiner • The combiner is actually a localized reducer which groups the data in the map phase. It is optional. • Combiner takes the intermediate data from the mapper and aggregates them. • It does so within the small scope of one mapper. • In many situations, this decreases the amount of data needed to move over the network. For example, moving (Hello World, 1) three times consumes more network bandwidth than moving (Hello World, 3).
  • 37. Reduce Task • The various phases in the reduce task are as follows: i. Shuffle and Sort • The reducer starts with the shuffle and sort step. • This step sorts the individual data pieces into one large data list. • The purpose of this sort is to collect the equivalent keys together.
  • 38. • ii. Reduce • The reducer performs the reduce function once per key grouping. • The framework passes the function the key and an iterator object containing all the values pertaining to the key. • We can write the reducer to filter, aggregate, and combine data in a number of different ways. • Once the reduce function finishes, it gives zero or more key-value pairs to the output format.
  • 39. • iii. Output Format • This is the final step. • It takes the key-value pairs from the reducer and writes them to the file via the RecordWriter. • By default, it separates the key and value with a tab and each record with a newline character. • The final data gets written to HDFS.
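Since the slides describe the map, shuffle/sort and reduce phases abstractly, the classic word-count example makes the key-value flow concrete. The sketch below is written in the Hadoop Streaming style (read from stdin, emit tab-separated key/value pairs) but is run here as a single-process simulation for illustration; submitting it to a real cluster would use the hadoop-streaming jar of your installation, which is not shown.

```python
# wordcount_streaming.py -- minimal sketch of the map and reduce phases.
import sys
from itertools import groupby

def mapper(lines):
    """Map phase: emit (word, 1) for every word in the input records."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Reduce phase: pairs arrive grouped by key; sum the counts per word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Local simulation of map -> shuffle & sort -> reduce.
    mapped = sorted(mapper(sys.stdin), key=lambda kv: kv[0])
    for word, total in reducer(mapped):
        print(f"{word}\t{total}")   # e.g. "Hello World" appearing 3 times -> 3
```

Usage, for example: `echo "hello world hello" | python wordcount_streaming.py` prints `hello 2` and `world 1`.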
  • 40. Virtual Box • VirtualBox is open-source software for virtualizing the x86 computing architecture. • It acts as a hypervisor, creating a VM (Virtual Machine) in which the user can run another OS (operating system). • The operating system in which VirtualBox runs is called the "host" OS. • The operating system running in the VM is called the "guest" OS. VirtualBox supports Windows, Linux, or macOS as its host OS.
  • 41. • Why Is VirtualBox Useful? • One: • VirtualBox allows you to run more than one operating system at a time. • This way, you can run software written for one operating system on another (for example, Windows software on Linux or a Mac) without having to reboot to use it (as would be needed if you used partitioning and dual-booting).
  • 42. • Two: • By using a VirtualBox feature called “snapshots”, you can save a particular state of a virtual machine and revert back to that state, if necessary. • This way, you can freely experiment with a computing environment. • If something goes wrong (e.g. after installing misbehaving software or infecting the guest with a virus), you can easily switch back to a previous snapshot and avoid the need of frequent backups and restores.
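The snapshot workflow described above can also be scripted through VirtualBox's VBoxManage command-line tool. The sketch below wraps two of its snapshot subcommands from Python; the VM name and snapshot name are hypothetical, and VBoxManage (installed with VirtualBox) is assumed to be on PATH.

```python
# Minimal sketch of taking and restoring a VirtualBox snapshot via VBoxManage.
import subprocess

VM_NAME = "DevVM"   # hypothetical virtual machine name

def take_snapshot(name: str) -> None:
    subprocess.run(["VBoxManage", "snapshot", VM_NAME, "take", name], check=True)

def restore_snapshot(name: str) -> None:
    # The VM should be powered off (or saved) before restoring a snapshot.
    subprocess.run(["VBoxManage", "snapshot", VM_NAME, "restore", name], check=True)

take_snapshot("before-experiment")
# ... experiment inside the guest, install software, etc. ...
restore_snapshot("before-experiment")   # roll back if something went wrong
```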
  • 43. • Three: • Software vendors can use virtual machines to ship entire software configurations. For example, installing a complete mail server solution on a real machine can be a tedious task. • With VirtualBox, such a complex setup (often called an "appliance") can be packed into a virtual machine. Installing and running a mail server becomes as easy as importing such an appliance into VirtualBox. • Along the same lines, the VirtualBox "clone" feature is equally useful.
  • 44. • Four: • On an enterprise level, virtualization can significantly reduce hardware and electricity costs. • Most of the time, computers today only use a fraction of their potential power and run with low average system loads. • A lot of hardware resources, as well as electricity, are thereby wasted. • So, instead of running many such physical computers that are only partially used, one can pack many virtual machines onto a few powerful hosts and balance the loads between them.
  • 45. VirtualBox Terminology • When dealing with virtualization, it helps to acquaint oneself with a bit of crucial terminology, especially the following terms: • Host Operating System (Host OS): • The operating system of the physical computer on which VirtualBox is installed. There are versions of VirtualBox for Windows, Mac OS, Linux and Solaris hosts. • Guest Operating System (Guest OS): • The operating system that is running inside the virtual machine.
  • 46. • Virtual Machine (VM): • We’ve used this term often already. It is the special environment that VirtualBox creates for your guest operating system while it is running. In other words, you run your guest operating system "in" a VM. Normally, a VM will be shown as a window on your computer’s desktop, but depending on which of the various frontends of VirtualBox you use, it can be displayed in full-screen mode or remotely on another computer.
  • 47. Google App Engine • Google App Engine is a Platform as a Service and cloud computing platform for developing and hosting web applications in Google-managed data centers. • App Engine is a fully managed, serverless platform for developing and hosting web applications at scale. • You can choose from several popular languages, libraries, and frameworks to develop your apps, then let App Engine take care of provisioning servers and scaling your app instances based on demand. • Originally, App Engine required apps to be written in Java or Python, to store data in Google Bigtable, and to use the Google query language.
  • 48. • Google App Engine provides more infrastructure than other scalable hosting services such as Amazon Elastic Compute Cloud (EC2). • The App Engine also eliminates some system administration and developmental tasks to make it easier to write scalable applications. • Google App Engine is free up to a certain amount of resource usage. • Users exceeding the per-day or per-minute usage rates for CPU resources, storage, number of API calls or requests and concurrent requests can pay for more of these resources.
  • 49. • Modern web applications • Quickly reach customers and end users by deploying web apps on App Engine. • With zero-config deployments and zero server management, App Engine allows you to focus on writing code. • Plus, App Engine automatically scales to support sudden traffic spikes without provisioning, patching, or monitoring.
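To make the "zero-config deployment" idea concrete, the sketch below shows what a minimal App Engine standard-environment app in Python typically looks like. The Flask dependency, the runtime version in app.yaml, and the project setup are assumptions for illustration; the deploy command is the standard `gcloud app deploy` from the Google Cloud CLI.

```python
# main.py -- minimal App Engine (standard environment) web app sketch.
# Assumed companion files: an app.yaml containing "runtime: python39"
# (runtime version is an assumption) and a requirements.txt listing Flask.
# Deploy with: gcloud app deploy
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # App Engine routes HTTP requests to this WSGI app and scales the number
    # of instances up or down automatically based on traffic.
    return "Hello from App Engine!"

if __name__ == "__main__":
    # Local development only; in production App Engine runs the app for you.
    app.run(host="127.0.0.1", port=8080, debug=True)
```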
  • 50. • Features • Popular languages • Build your application in Node.js, Java, Ruby, C#, Go, Python, or PHP, or bring your own language runtime. • Open and flexible • Custom runtimes allow you to bring any library and framework to App Engine.
  • 51. • Fully managed • A fully managed environment lets you focus on code while App Engine manages infrastructure concerns. • Powerful application diagnostics • Use Cloud Monitoring and Cloud Logging to monitor the health and performance of your app, and Cloud Debugger and Error Reporting to diagnose and fix bugs quickly. • Application versioning • Easily host different versions of your app, and easily create development, test, staging, and production environments.
  • 52. • Application security • Help safeguard your application by defining access rules with App Engine firewall and leverage managed SSL/TLS certificates by default on your custom domain at no additional cost.
  • 53. • Advantages of Google App Engine • There are many advantages to Google App Engine that help take your app ideas to the next level. These include: • Infrastructure for Security • Around the world, the Internet infrastructure that Google has is probably the most secure. There has rarely been any unauthorized access to date, as the application data and code are stored in highly secure servers.
  • 54. • Quick to Start • With no product or hardware to purchase and maintain, you can prototype and deploy the app to your users without taking much time. • Easy to Use • Google App Engine (GAE) incorporates the tools that you need to develop, test, launch, and update the applications.
  • 55. • Scalability • Regardless of the amount of data your app stores or the number of users it serves, App Engine can meet your needs by scaling up or down as required.
  • 56. • Performance and Reliability • Google is among the leading global brands worldwide, so when you discuss performance and reliability you have to keep that in mind. In the past 15 years, the company has created new benchmarks based on the performance of its services and products. App Engine provides the same reliability and performance as any other Google product. • Cost Savings • You don’t have to hire engineers to manage your servers or do that yourself. You can invest the money saved into other parts of your business. • Platform Independence • You can move all your data to another environment without any difficulty, as there are not many dependencies on the App Engine platform.
  • 57. • Open Stack • OpenStack is a free, open-standard cloud computing platform, mostly deployed as infrastructure-as-a-service in both public and private clouds, where virtual servers and other resources are made available to users. • OpenStack is a set of software tools for building and managing cloud computing platforms for public and private clouds. • OpenStack is managed by the OpenStack Foundation, a non-profit that oversees both development and community-building around the project.
  • 58. • Introduction to OpenStack • OpenStack lets users deploy virtual machines and other instances that handle different tasks for managing a cloud environment on the fly. • It makes horizontal scaling easy, which means that tasks that benefit from running concurrently can easily serve more or fewer users on the fly by just spinning up more instances. • For example, a mobile application that needs to communicate with a remote server might be able to divide the work of communicating with each user across many different instances, all communicating with one another but scaling quickly and easily as the application gains more users.
  • 59. • And most importantly, OpenStack is open source software, which means that anyone who chooses to can access the source code, make any changes or modifications they need, and freely share these changes. • It also means that OpenStack has the benefit of thousands of developers all over the world working in tandem to develop the strongest, most robust, and most secure product that they can.
  • 60. • How is OpenStack used in a cloud environment? • The cloud is all about providing computing for end users in a remote environment, where the actual software runs as a service on reliable and scalable servers rather than on each end user's computer. • Cloud computing can refer to a lot of different things, but typically the industry talks about running different items "as a service": software, platforms, and infrastructure. • OpenStack is considered Infrastructure as a Service (IaaS). • Providing infrastructure means that OpenStack makes it easy for users to quickly add new instances, upon which other cloud components can run. • Typically, the infrastructure then runs a "platform" upon which a developer can create software applications that are delivered to the end users.
  • 61. • What are the components of OpenStack? • Because of its open nature, anyone can add additional components to OpenStack to help it to meet their needs. • But the OpenStack community has collaboratively identified nine key components that are a part of the "core" of OpenStack, which are distributed as a part of any OpenStack system and officially maintained by the OpenStack community. • Nova is the primary computing engine behind OpenStack. It is used for deploying and managing large numbers of virtual machines and other instances to handle computing tasks.
  • 62. • Swift is a storage system for objects and files. • The OpenStack Object Store project, known as Swift, offers cloud storage software so that you can store and retrieve lots of data with a simple API. • It's built for scale and optimized for durability, availability, and concurrency across the entire data set. • Swift is ideal for storing unstructured data that can grow without bound.
  • 63. • Cinder is a block storage component, which is more analogous to the traditional notion of a computer being able to access specific locations on a disk drive. This more traditional way of accessing files might be important in scenarios in which data access speed is the most important consideration. • Neutron provides the networking capability for OpenStack. It helps to ensure that each of the components of an OpenStack deployment can communicate with one another quickly and efficiently.
  • 64. • Horizon is the dashboard behind OpenStack. • It is the only graphical interface to OpenStack, so for users wanting to give OpenStack a try, this may be the first component they actually "see." • Developers can access all of the components of OpenStack individually through an application programming interface (API), but the dashboard gives system administrators a look at what is going on in the cloud and lets them manage it as needed.
  • 65. • Keystone provides identity services for OpenStack. It is essentially a central list of all of the users of the OpenStack cloud, mapped against all of the services provided by the cloud, which they have permission to use. It provides multiple means of access, meaning developers can easily map their existing user access methods against Keystone. • Glance provides image services to OpenStack. In this case, "images" refers to images (or virtual copies) of hard disks.
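Several of these components can also be reached programmatically. The sketch below uses the openstacksdk Python library to list a few resources; it assumes credentials for a cloud named "mycloud" are defined in a local clouds.yaml file, and the names printed are whatever the deployment happens to contain.

```python
# Minimal sketch using openstacksdk against a few of the services above.
import openstack

conn = openstack.connect(cloud="mycloud")   # authentication is handled via Keystone

# Nova: list running virtual machine instances.
for server in conn.compute.servers():
    print("server:", server.name, server.status)

# Glance: list the disk images available for booting new instances.
for image in conn.image.images():
    print("image:", image.name)

# Swift: list object-storage containers (if the object store is deployed).
for container in conn.object_store.containers():
    print("container:", container.name)
```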
  • 66. • Prerequisites for a minimum production deployment • There are some basic requirements you’ll have to meet to deploy OpenStack. Here are the prerequisites, drawn from the OpenStack manual. • Hardware: For the OpenStack controller node, 12 GB of RAM is needed, as well as 30 GB of disk space to run the OpenStack services. Two SATA (Serial Advanced Technology Attachment) disks of 2 TB will be necessary to store the volumes used by instances. Communication with compute nodes requires a network interface card (NIC) of 1 Gbps.
  • 67. • Operating system (OS): • OpenStack supports the following operating systems: Debian, Fedora, Red Hat Enterprise Linux (RHEL), openSUSE, SUSE Linux Enterprise Server (SLES) and Ubuntu.
  • 68. • Federation in the Cloud • Cloud federation is the practice of interconnecting the cloud computing environments of two or more service providers for the purpose of load balancing traffic and accommodating spikes in demand. Cloud federation requires one provider to wholesale or rent computing resources to another cloud provider. • “Cloud federation manages consistency and access controls when two or more independent geographically distinct Clouds share either authentication, files, computing resources, command and control or access to storage resources.”
  • 69. • Cloud federation introduces additional issues that have to be addressed in order to provide a secure environment in which to move applications and services among a collection of federated providers. • Baseline security needs to be guaranteed across all cloud vendors that are part of the federation.
  • 70. • An interesting aspect is represented by the management of digital identity across diverse organizations, security domains, and application platforms. • In particular, the term federated identity management refers to standards-based approaches for handling authentication, single sign-on (SSO), role-based access control, and session management in a federated environment.
  • 71. • No matter the specific protocol and framework, two main approaches can be considered: • Centralized federation model • This is the approach taken by several identity federation standards. It distinguishes two operational roles in an SSO transaction: the identity provider and the service provider. • Claim-based model • This approach addresses the problem of user authentication from a different perspective and requires users to provide claims answering who they are and what they can do in order to access content or complete a transaction.
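In the claims-based model, the user presents a signed token whose claims state who they are and what they are allowed to do, and the federated service validates that token instead of authenticating the user itself. The sketch below illustrates this with the PyJWT library; the issuer, audience, and shared secret are hypothetical values chosen only for the example.

```python
# Illustrative sketch of the claims-based model using PyJWT (pip install PyJWT).
import jwt

SECRET = "shared-federation-secret"   # hypothetical key agreed between the two clouds

# Identity provider side: issue a token carrying the user's claims.
token = jwt.encode(
    {"sub": "alice", "roles": ["storage.read"], "iss": "idp.example", "aud": "cloud-b"},
    SECRET,
    algorithm="HS256",
)

# Service provider side: validate the signature and audience, then read the claims.
claims = jwt.decode(token, SECRET, algorithms=["HS256"], audience="cloud-b")
print(claims["sub"], claims["roles"])
```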
  • 72. • The first model is the one used today; the second constitutes a future vision for identity management in the cloud. • Digital identity management constitutes a fundamental aspect of security management in a cloud federation. • To transparently perform operations across different administrative domains, it is mandatory to have a robust framework for authentication and authorization, and federated identity management addresses this issue. • Federated identity management allows us to tie together the computing stacks of different vendors and present them as a single environment to users from a security point of view.
  • 73. OpenNebula: • OpenNebula is a cloud computing platform for managing heterogeneous distributed data center infrastructures. • The OpenNebula platform manages a data center's virtual infrastructure, to build private, public and hybrid implementations of Infrastructure as a Service.
  • 76. • Much research work has been carried out around OpenNebula. • For example, the University of Chicago has come up with an advance reservation system called Haizea Lease Manager. • IBM Haifa has developed a policy-driven probabilistic admission control and dynamic placement optimization for site-level management policies, called the RESERVOIR Policy Engine. • Nephele is an SLA-driven automatic service management tool developed by Telefonica, and the Virtual Cluster Tool, for atomic cluster management with versioning over multiple transport protocols, comes from the CRS4 Distributed Computing Group.
  • 77. Development • OpenNebula follows a rapid release cycle to improve user satisfaction by rapidly delivering features and innovations based on user requirements and feedback. • In other words, giving customers what they want more quickly, in smaller increments, while additionally increasing technical quality. • Major upgrades generally occur every 3-5 years and each upgrade generally has 3-5 updates.
  • 78. • Cloud Federations and Server Coalitions • In large-scale systems, coalition formation supports more effective use of resources, as well as convenient means to access these resources. • It is therefore not surprising that coalition formation for computational grids has been investigated in the past. • The interest in grid computing is fading away, while cloud computing is widely accepted today and its adoption by more and more institutions and individuals seems to be guaranteed at least for the foreseeable future.
  • 79. • Two classes of applications of cloud coalitions are reported in the literature: • 1. Coalitions among CSPs for the formation of cloud federations. A cloud federation is an infrastructure allowing a group of CSPs to share resources; the goal is to balance the load and improve system reliability. • 2. Coalitions among the servers of a data center. The goal is to assemble a pool of resources larger than the ones available from a single server. • In recent years the number of CSPs has increased significantly. The question of whether they should cooperate to share their resources led to the idea of cloud federations: groups of CSPs who have agreed on a set of common standards and are able to share their resources.
  • 80. • Cloud coalition formation raises a number of technical as well as nontechnical problems. • Cloud federations require a set of standards. • The cloud computing landscape is still evolving, and an early standardization may slow down and negatively affect the adoption of new ideas and technologies. • At the same time, CSPs want to maintain their competitive advantages by closely guarding the details of their internal algorithms and protocols.
  • 81. • Four Levels of Federation • Creating a cloud federation involves research and development at different levels: conceptual, logical and operational, and infrastructural.
  • 83. • The figure provides a comprehensive view of the challenges faced in designing and implementing an organizational structure that coordinates cloud services belonging to different administrative domains and makes them operate within the context of a single unified service middleware. • Each cloud federation level presents different challenges and operates at a different layer of the IT stack. • It therefore requires the use of different approaches and technologies.
  • 84. • CONCEPTUAL LEVEL • The conceptual level addresses the challenges in presenting a cloud federation as a favourable solution. • At this level it is important to clearly identify the advantages for either service providers or service consumers in joining a federation, and to describe the new opportunities that a federated environment creates.
  • 85. • Elements of concern at this level are: • Motivations for cloud providers to join a federation. • Motivations for service consumers to leverage a federation. • Advantages for providers in leasing their services to other providers. • Responsibilities of providers once they have joined the federation. • Trust agreements between providers. • Transparency toward consumers. • Among these aspects, the most relevant are the motivations of both service providers and consumers in joining a federation.
  • 86. • LOGICAL & OPERATIONAL LEVEL • The logical and operational level of a federated cloud identifies and addresses the challenges in creating a framework that enables the aggregation of providers belonging to different administrative domains. • At this level, policies and rules for interoperation are defined. • Moreover, this is the layer at which decisions are made as to how and when to lease a service to, or to leverage a service from, another provider. • The logical component defines a context in which agreements among providers are settled and services are conveyed, whereas the operational component characterizes and shapes the dynamic behaviour of the federation as a result of the individual providers’ choices.
  • 87. It is important at this level to address the following challenges: • How should a federation be represented? • How should we model and represent a cloud service, a cloud provider, or an agreement? • How should we define the rules and policies that allow providers to join a federation? • What are the mechanisms in place for settling agreements among providers? • What are providers’ responsibilities with respect to each other?
  • 88. • When should providers and consumers take advantage of the federation? • Which kinds of services are more likely to be leased or bought? • How should we price resources that are leased, and which fraction of resources should we lease?
  • 89. • INFRASTRUCTURE LEVEL • The infrastructural level addresses the technical challenges involved in enabling heterogeneous cloud computing systems to interoperate seamlessly. • It deals with the technology barriers that keep cloud computing systems belonging to different administrative domains separate. • These barriers can be overcome by means of standardized protocols and interfaces.
  • 90. At this level it is important to address the following issues: • What kind of standards should be used? • How should interfaces and protocols be designed for interoperation? • Which technologies should be used for interoperation? • How can we realize a software system, and design platform components and services, enabling interoperability? Interoperation and composition among different cloud computing vendors is possible only by means of open standards and interfaces. Moreover, interfaces and protocols change considerably at each layer of the Cloud Computing Reference Model.
  • 91. Future of Federation • The federated cloud model is a force for real democratization in the cloud market. • It’s how businesses will be able to use local cloud providers to connect with customers, partners and employees anywhere in the world. • It’s how end users will finally get to realize the promise of the cloud. • And, it’s how data center operators and other service providers will finally be able to compete with, and beat, today’s so-called global cloud providers.
  • 92. • Some see the future of cloud computing as one big public cloud. • Others believe that enterprises will ultimately build a single large cloud to host all their corporate services. • This is, of course, because the benefit of cloud computing is dependent on very large-scale infrastructure, which provides administrators and service consumers with ease of deployment, self-service, elasticity, resource pooling and economies of scale. • However, as the cloud continues to evolve, so do the services being offered.
  • 93. • Cloud Services & Hybrid Clouds • Services are now able to reach a wider range of consumers, partners, competitors and public audiences. • It is also clear that storage, compute power, streaming, analytics and other advanced services are best served when they are in an environment tailored for the proficiency of that service.
  • 94. • One method of addressing the need of these service environments is through the advent of hybrid clouds. • Hybrid clouds, by definition, are composed of multiple distinct cloud infrastructures connected in a manner that enables services and data access across the combined infrastructure. • The intent is to leverage the additional benefits that hybrid cloud offers without disrupting the traditional cloud benefits. • While hybrid cloud benefits come through the ability to distribute the work stream, the goal is to continue to realize the ability for managing peaks in demand, to quickly make services available and capitalize on new business opportunities.
  • 95. • The Solution: Federation • Federation creates a hybrid cloud environment with an increased focus on maintaining the integrity of corporate policies and data. • Think of federation as a pool of clouds connected through a channel of gateways; • gateways which can be used to optimize a cloud for a service or set of specific services. • Such gateways can be used to segment service audiences or to limit access to specific data sets. • In essence, federation gives enterprises the ability to serve their audiences with economies of scale without exposing critical applications or vital data through weak policies or vulnerabilities.
  • 96. • Many would raise the question: if federation creates multiple clouds, doesn’t that mean cloud benefits are diminished? • The answer is no, because a fundamental change has transformed enterprises through the original adoption of cloud computing, namely the creation of a flexible environment able to adapt rapidly to changing needs based on policy and automation. • Cloud end users are often tied to a unique cloud provider because of the different APIs, image formats, and access methods exposed by different providers, which make it very difficult for an average user to move applications from one cloud to another, leading to a vendor lock-in problem.
  • 97. • Many SMEs have their own on-premise private cloud infrastructures to support their internal computing needs and workloads. These infrastructures are often over-sized to satisfy peak demand periods and avoid performance slow-downs. The hybrid cloud (or cloud bursting) model is a solution to reduce the on-premise infrastructure size, so that it can be dimensioned for an average load and complemented with external resources from a public cloud provider to satisfy peak demands.
  • 98. • Many big companies (e.g. banks, hosting companies, etc.) and also many large institutions maintain several distributed data-centers or server-farms, for example to serve to multiple geographically distributed offices, to implement HA, or to guarantee server proximity to the end user. • Resources and networks in these distributed data-centers are usually configured as non-cooperative separate elements.
  • 99. • Many educational and research centers often deploy their own computing infrastructures, which usually do not cooperate with other institutions, except in some specific situations (e.g. joint projects or initiatives). • Many times, even different departments within the same institution maintain their own non-cooperative infrastructures. This Study Group will evaluate the main challenges in enabling the provision of federated cloud infrastructures, with special emphasis on inter-cloud networking and security issues: • Security and Privacy • Interoperability and Portability • Performance and Networking Cost
  • 100. • The first key action aims at “Cutting through the Jungle of Standards” to help the adoption of cloud computing by encouraging compliance of cloud services with respect to standards and thus providing evidence of compliance to legal and audit obligations. • These standards aim to avoid customer lock in by promoting interoperability, data portability and reversibility.
  • 101. • The second key action “Safe and Fair Contract Terms and Conditions” aims to protect the cloud consumer from insufficiently specific and balanced contracts with cloud providers that do not “provide for liability for data integrity, confidentiality or service continuity”. • The cloud consumer is often presented with "take-it-or-leave-it standard contracts that might be cost-saving for the provider but is often undesirable for the user”.
  • 102. • Interface: Various cloud service providers have different APIs, pricing models and cloud infrastructures. • An open cloud computing interface needs to be established to provide a common application programming interface across multiple cloud environments. • The simplest solution is to use a software component that allows the federated system to connect with a given cloud environment.
  • 103. • Trusted Servers • In order to make it easier to find people on other servers, we introduced the concept of "trusted servers" as one of our last steps. • This allows administrators to define other servers they trust. • If two servers trust each other, they will sync their user lists. • This way the share dialogue can auto-complete not only local users but also users on other trusted servers. • The administrator can decide to define the list of trusted servers manually, or allow the server to auto-add every other server to which at least one federated share was successfully created. • This way it is possible to let your cloud server learn about more and more other servers over time, connect with them and increase the network of trusted servers.
  • 104. • Open Challenges: where we’re taking Federated Cloud Sharing • Of course there are still many areas to improve. • For example, the way you can discover users on different servers to share with them, for which we’re working on a global, shared address book solution. • Another point is that at the moment this is limited to sharing files. • A logical next step would be to extend this to many other areas like address books, calendars, and real-time text, voice and video communication, and we are, of course, planning for that.