UNIT V
CLOUD TECHNOLOGIES AND ADVANCEMENTS
• Hadoop
• Apache Hadoop is an open source software framework used to develop data
processing applications which are executed in a distributed computing
environment.
• Applications built using HADOOP are run on large data sets distributed across
clusters of commodity computers.
• Commodity computers are cheap and widely available; they are mainly useful for
achieving greater computational power at low cost.
• Similar to data residing in the local file system of a personal computer, in
Hadoop, data resides in a distributed file system called the Hadoop
Distributed File System (HDFS).
• Apache Hadoop consists of two sub-projects –
• Hadoop MapReduce:
• MapReduce is a computational model and software
framework for writing applications which are run on Hadoop.
• These MapReduce programs are capable of processing
enormous data in parallel on large clusters of computation
nodes.
• HDFS (Hadoop Distributed File System):
• HDFS takes care of the storage part of Hadoop applications.
• MapReduce applications consume data from HDFS.
• HDFS creates multiple replicas of data blocks and distributes
them on compute nodes in a cluster.
• This distribution enables reliable and extremely rapid
computations.
• NameNode and DataNodes
• HDFS has a master/slave architecture.
• An HDFS cluster consists of a single NameNode, a master server that
manages the file system namespace and regulates access to files by
clients.
• In addition, there are a number of DataNodes, usually one per node in
the cluster, which manage storage attached to the nodes that they run
on.
• HDFS exposes a file system namespace and allows user data to be
stored in files.
• Internally, a file is split into one or more blocks and these blocks are
stored in a set of DataNodes.
• The NameNode executes file system namespace operations like
opening, closing, and renaming files and directories.
• It also determines the mapping of blocks to DataNodes.
• The DataNodes are responsible for serving read and write requests
from the file system’s clients.
• The DataNodes also perform block creation, deletion, and replication
upon instruction from the NameNode.
Functions of NameNode:
• It is the master daemon that maintains and manages the DataNodes (slave nodes)
• It records the metadata of all the files stored in the cluster, e.g. The location of
blocks stored, the size of the files, permissions, hierarchy, etc. There are two files
associated with the metadata:
• FsImage: It contains the complete state of the file system namespace since the start of the
NameNode.
• EditLogs: It contains all the recent modifications made to the file system with respect to the
most recent FsImage.
• It records each change that takes place to the file system metadata. For example, if
a file is deleted in HDFS, the NameNode will immediately record this in the EditLog.
• It regularly receives a Heartbeat and a block report from all the
DataNodes in the cluster to ensure that the DataNodes are live.
• It keeps a record of all the blocks in HDFS and in which nodes these
blocks are located.
• The NameNode is also responsible for maintaining
the replication factor of all the blocks.
• In case of DataNode failure, the NameNode chooses new
DataNodes for new replicas, balances disk usage and manages the
communication traffic to the DataNodes.
• The NameNode and DataNode are pieces of software designed to run on commodity
machines.
• These machines typically run a GNU/Linux operating system (OS). HDFS is built using
the Java language;
• Any machine that supports Java can run the NameNode or the DataNode software.
• A typical deployment has a dedicated machine that runs only the NameNode
software.
• Each of the other machines in the cluster runs one instance of the DataNode
software.
• The architecture does not preclude running multiple DataNodes on the same
machine but in a real deployment that is rarely the case.
DataNodes
• DataNodes are the slave nodes in HDFS. Unlike the NameNode, a DataNode runs on
commodity hardware, that is, an inexpensive system that is not of high
quality or high availability. The DataNode is a block server that stores the data in
a local file system such as ext3 or ext4.
• Functions of DataNode:
• These are slave daemons or processes that run on each slave machine.
• The actual data is stored on DataNodes.
• The DataNodes serve the low-level read and write requests from the file
system’s clients.
• They send heartbeats to the NameNode periodically to report the overall health
of HDFS; by default, this frequency is set to 3 seconds.
• The existence of a single NameNode in a cluster greatly simplifies the
architecture of the system.
• The NameNode is the arbitrator and repository for all HDFS metadata.
• The system is designed in such a way that user data never flows
through the NameNode.
• Secondary NameNode:
• Apart from these two daemons, there is a third daemon or process
called the Secondary NameNode. The Secondary NameNode works
concurrently with the primary NameNode as a helper daemon. Do not
confuse the Secondary NameNode with a backup NameNode, because it
is not one.
• Functions of Secondary NameNode:
• The Secondary NameNode constantly reads the file
system state and metadata from the RAM of the NameNode and writes it
to the hard disk or the file system.
• It is responsible for combining the EditLogs with the FsImage from the
NameNode.
• It downloads the EditLogs from the NameNode at regular intervals
and applies them to the FsImage. The new FsImage is copied back to the
NameNode and is used the next time the NameNode starts.
• Blocks:
• Now, as we know, the data in HDFS is scattered across the DataNodes
as blocks. Let’s have a look at what a block is and how it is formed.
• Blocks are nothing but the smallest contiguous locations on your hard
drive where data is stored. In general, in any file system, you store
the data as a collection of blocks. Similarly, HDFS stores each file as
blocks which are scattered throughout the Apache Hadoop cluster. The
default size of each block is 128 MB in Apache Hadoop 2.x (64 MB in
Apache Hadoop 1.x), which you can configure as per your requirement.
• It is not necessary that in HDFS each file is stored in an exact multiple of
the configured block size (128 MB, 256 MB etc.). Let’s take an
example where we have a file “example.txt” of size 514 MB. Suppose
that we are using the default block size of 128 MB. Then how many
blocks will be created? Five. The first four blocks will be of 128 MB
each, but the last block will be of 2 MB only.
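The arithmetic above can be sketched in a few lines of Python; the 514 MB file size and 128 MB block size are simply the values from the example, not anything HDFS-specific:

```python
# Sketch of how HDFS would split the example file into blocks.
BLOCK_SIZE_MB = 128          # default block size in Apache Hadoop 2.x
FILE_SIZE_MB = 514           # size of the hypothetical "example.txt"

full_blocks, remainder = divmod(FILE_SIZE_MB, BLOCK_SIZE_MB)
block_sizes = [BLOCK_SIZE_MB] * full_blocks + ([remainder] if remainder else [])

print(len(block_sizes), block_sizes)   # -> 5 [128, 128, 128, 128, 2]
```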
• The File System Namespace
• HDFS supports a traditional hierarchical file organization.
• A user or an application can create directories and store files inside
these directories.
• The file system namespace hierarchy is similar to most other existing
file systems;
• one can create and remove files, move a file from one directory to
another, or rename a file.
• HDFS supports user access permissions.
• While HDFS follows the naming conventions of a file system, some paths
and names (e.g. /.reserved and .snapshot) are reserved.
• The NameNode maintains the file system namespace.
• Any change to the file system namespace or its properties is recorded
by the NameNode.
• An application can specify the number of replicas of a file that should
be maintained by HDFS.
• The number of copies of a file is called the replication factor of that
file. This information is stored by the NameNode.
• Data Replication
• HDFS is designed to reliably store very large files across machines in a
large cluster.
• It stores each file as a sequence of blocks. The blocks of a file are
replicated for fault tolerance.
• The block size and replication factor are configurable per file.
• All blocks in a file except the last block are the same size.
• HDFS provides a reliable way to store huge data in a distributed
environment as data blocks. The blocks are also replicated to provide fault
tolerance. The default replication factor is 3, which is again configurable.
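Both settings live in the cluster configuration (and can be overridden per file). A minimal sketch of the relevant hdfs-site.xml entries, using the stock property names dfs.replication and dfs.blocksize, might look like this:

```xml
<!-- hdfs-site.xml: sketch of the two settings discussed above -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>              <!-- default replication factor -->
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>      <!-- 128 MB, expressed in bytes -->
  </property>
</configuration>
```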
• Files in HDFS are write-once (except for appends and truncates) and
have strictly one writer at any time.
• The NameNode makes all decisions regarding replication of blocks.
• It periodically receives a Heartbeat and a Blockreport from each of
the DataNodes in the cluster.
• Receipt of a Heartbeat implies that the DataNode is functioning
properly.
• A Blockreport contains a list of all blocks on a DataNode
• Replication
• The placement of replicas is critical to HDFS reliability and
performance.
• Optimizing replica placement distinguishes HDFS from most other
distributed file systems.
• This is a feature that needs lots of tuning and experience.
• The purpose of a rack-aware replica placement policy is to improve
data reliability, availability, and network bandwidth utilization.
• Now, the following protocol is followed whenever data is
written into HDFS:
• At first, the HDFS client will reach out to the NameNode with a write
request for the two blocks, say, Block A & Block B.
• The NameNode will then grant the client the write permission and
will provide the IP addresses of the DataNodes where the file blocks
will eventually be copied.
• The selection of the DataNodes’ IP addresses is randomized, subject
to availability, the replication factor and the rack awareness that we
have discussed earlier.
• Let’s say the replication factor is set to the default, i.e. 3. Therefore, for
each block the NameNode will provide the client a list of three IP
addresses of DataNodes. The list will be unique for each block.
• Suppose the NameNode provided the following lists of IP addresses to
the client:
• For Block A, list A = {IP of DataNode 1, IP of DataNode 4, IP of DataNode 6}
• For Block B, list B = {IP of DataNode 3, IP of DataNode 7, IP of DataNode 9}
• Each block will be copied to three different DataNodes to keep the
replication factor consistent throughout the cluster, as sketched in the example below.
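A highly simplified sketch of that allocation step is shown below. It ignores rack awareness, disk usage and liveness checks and simply picks a distinct, random set of DataNodes per block; all names are illustrative:

```python
import random

def allocate_replicas(block_ids, datanodes, replication_factor=3):
    """Return a {block_id: [datanode, ...]} map with one distinct list per block.

    Simplified stand-in for the NameNode's placement decision: real HDFS also
    considers rack awareness, disk usage and DataNode liveness.
    """
    return {b: random.sample(datanodes, replication_factor) for b in block_ids}

datanodes = [f"DataNode {i}" for i in range(1, 10)]
print(allocate_replicas(["Block A", "Block B"], datanodes))
```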
• Replica Selection
• To minimize global bandwidth consumption and read latency, HDFS
tries to satisfy a read request from a replica that is closest to the
reader.
• If the HDFS cluster spans multiple data centers, then a replica that is
resident in the local data center is preferred over any remote replica.
MapReduce
• Hadoop MapReduce (Hadoop Map/Reduce) is a software framework
for distributed processing of large data sets on computing clusters.
• It is a sub-project of the Apache Hadoop project.
• Apache Hadoop is an open-source framework that allows one to store and
process big data in a distributed environment across clusters of
computers using simple programming models.
• MapReduce is the core component for data processing in the Hadoop
framework.
• MapReduce helps to split the input data set into a number of parts
and run a program on all the parts in parallel.
• The term MapReduce refers to two separate and distinct tasks.
• The first is the map operation, which takes a set of data and converts it into
another set of data, where individual elements are broken down into
tuples (key/value pairs).
• The reduce operation combines those data tuples based on the key
and accordingly modifies the value of the key.
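As a concrete sketch of those two tasks, the classic word-count example can be written as a pair of small Python scripts in the Hadoop Streaming style (reading lines on stdin, emitting tab-separated key/value pairs on stdout); the file names are just placeholders:

```python
# mapper.py - map step: break each input line into (word, 1) tuples
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py - reduce step: combine tuples by key and sum the values
import sys
from itertools import groupby

pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
for word, group in groupby(pairs, key=lambda kv: kv[0]):
    print(f"{word}\t{sum(int(count) for _, count in group)}")
```

Because the framework sorts the reducer’s input by key before it is delivered, the itertools.groupby call is enough to gather all values belonging to one key.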
• Map Task
The Map task runs in the following phases:
a. RecordReader
• The RecordReader transforms the input split into records.
• It provides the data to the mapper function in key-value pairs.
• Usually, the key is the positional information and the value is the data that
comprises the record.
Types of Hadoop RecordReader in MapReduce
• The RecordReader instance is defined by the InputFormat.
• By default, Hadoop uses TextInputFormat, which converts data into key-
value pairs. Two commonly used RecordReaders are:
i. LineRecordReader (used by TextInputFormat)
ii. SequenceFileRecordReader (used by SequenceFileInputFormat)
• b. Map
• In this phase, the mapper, which is a user-defined function,
processes the key-value pairs from the RecordReader.
• It produces zero or multiple intermediate key-value pairs.
• The key is usually the data on which the reducer function does the
grouping operation.
• And value is the data which gets aggregated to get the final result in
the reducer function.
• c. Combiner
• The combiner is actually a localized reducer which groups the data in the
map phase. It is optional.
• Combiner takes the intermediate data from the mapper and aggregates
them.
• It does so within the small scope of one mapper.
• In many situations, this decreases the amount of data needed to move
over the network. For example, moving (Hello World, 1) three times
consumes more network bandwidth than moving
(Hello World, 3).
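The same local aggregation can be sketched in Python: instead of emitting a pair like (Hello, 1) three times, the mapper tallies counts in memory and emits one (Hello, 3) pair. This “in-mapper combining” variant only illustrates the effect a combiner achieves; it is not Hadoop’s Combiner API itself:

```python
# Variant of mapper.py with local aggregation (the effect a combiner achieves).
import sys
from collections import Counter

counts = Counter()
for line in sys.stdin:
    counts.update(line.split())      # tally words seen by this one mapper

for word, count in counts.items():
    print(f"{word}\t{count}")        # e.g. "Hello\t3" instead of three "Hello\t1" pairs
```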
Reduce Task
• The various phases in reduce task are as follows:
i. Shuffle and Sort
• The reducer starts with shuffle and sort step.
• This step sorts the individual data pieces into a large data list.
• The purpose of this sort is to collect the equivalent keys together.
• ii. Reduce
• The reducer performs the reduce function once per key
grouping.
• The framework passes the function key and an iterator object
containing all the values pertaining to the key.
• We can write reducer to filter, aggregate and combine data in a
number of different ways.
• Once the reduce function finishes, it gives zero or more key-
value pairs to the output format.
• iii. Output Format
• This is the final step.
• It takes the key-value pair from the reducer and writes it to the file
using the RecordWriter.
• By default, it separates the key and value by a tab and each record by
a newline character.
• Final data gets written to HDFS.
Virtual Box
• VirtualBox is open-source software for virtualizing the x86 computing
architecture.
• It acts as a hypervisor, creating a VM (Virtual Machine) in which the
user can run another OS (operating system).
• The operating system in which VirtualBox runs is called the "host" OS.
• The operating system running in the VM is called the "guest" OS.
VirtualBox supports Windows, Linux, or macOS as its host OS.
• Why Is VirtualBox Useful?
• One:
• VirtualBox allows you to run more than one operating system at a
time.
• This way, you can run software written for one operating system on
another (for example, Windows software on Linux or a Mac) without
having to reboot to use it (as would be needed if you used
partitioning and dual-booting).
• Two:
• By using a VirtualBox feature called “snapshots”, you can save a
particular state of a virtual machine and revert back to that state, if
necessary.
• This way, you can freely experiment with a computing environment.
• If something goes wrong (e.g. after installing misbehaving software or
infecting the guest with a virus), you can easily switch back to a
previous snapshot and avoid the need of frequent backups and
restores.
• Three:
• Software vendors can use virtual machines to ship entire software
configurations. For example, installing a complete mail server solution
on a real machine can be a tedious task (think of rocket science!).
• With VirtualBox, such a complex setup (then often called an
“appliance”) can be packed into a virtual machine. Installing and
running a mail server becomes as easy as importing such an appliance
into VirtualBox.
• Along these same lines, I find the “clone” feature of VirtualBox just
awesome!
• Four:
• On an enterprise level, virtualization can significantly reduce
hardware and electricity costs.
• Most of the time, computers today only use a fraction of their
potential power and run with low average system loads.
• A lot of hardware resources as well as electricity are thereby wasted.
• So, instead of running many such physical computers that are only
partially used, one can pack many virtual machines onto a few
powerful hosts and balance the loads between them.
VirtualBox Terminology
• When dealing with virtualization, it helps to acquaint oneself with a bit of
crucial terminology, especially the following terms:
• Host Operating System (Host OS):
• The operating system of the physical computer on which VirtualBox
was installed. There are versions of VirtualBox for Windows, Mac OS ,
Linux and Solaris hosts.
• Guest Operating System (Guest OS):
• The operating system that is running inside the virtual machine.
• Virtual Machine (VM):
• We’ve used this term often already. It is the special environment that
VirtualBox creates for your guest operating system while it is running.
In other words, you run your guest operating system “in” a VM.
Normally, a VM will be shown as a window on your computer’s
desktop, but depending on which of the various frontends of
VirtualBox you use, it can be displayed in full screen mode or
remotely on another computer.
Google App Engine
• Google App Engine is a Platform as a Service and cloud computing
platform for developing and hosting web applications in Google-
managed data centers.
• App Engine is a fully managed, serverless platform for developing and
hosting web applications at scale.
• You can choose from several popular languages, libraries, and
frameworks to develop your apps, then let App Engine take care of
provisioning servers and scaling your app instances based on demand
• Originally, App Engine required that apps be written in Java or Python, store
data in Google Bigtable and use the Google query language.
• Google App Engine provides more infrastructure than other
scalable hosting services such as Amazon Elastic Compute
Cloud (EC2).
• The App Engine also eliminates some system administration
and developmental tasks to make it easier to write scalable
applications.
• Google App Engine is free up to a certain amount of resource
usage.
• Users exceeding the per-day or per-minute usage rates for
CPU resources, storage, number of API calls or requests and
concurrent requests can pay for more of these resources.
• Modern web applications
• Quickly reach customers and end users by deploying web
apps on App Engine.
• With zero-config deployments and zero server management,
App Engine allows you to focus on writing code.
• Plus, App Engine automatically scales to support sudden
traffic spikes without provisioning, patching, or monitoring.
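As a sketch of what that looks like in practice, a minimal Python app for the App Engine standard environment is little more than a web handler plus an app.yaml declaring the runtime. The snippet below uses Flask, and the runtime name in the comment is just one of the supported Python versions:

```python
# main.py - minimal App Engine (standard environment) web app, using Flask.
# A companion app.yaml containing a single line such as "runtime: python39"
# tells App Engine which runtime to provision; deployment is then a
# "gcloud app deploy", with no servers for you to manage.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello from App Engine!"

if __name__ == "__main__":
    # Local test run only; in production App Engine serves the app itself.
    app.run(host="127.0.0.1", port=8080)
```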
• Features
• Popular languages
• Build your application in Node.js, Java, Ruby, C#, Go, Python, or PHP—
or bring your own language runtime.
• Open and flexible
• Custom runtimes allow you to bring any library and framework to App
Engine.
• Fully managed
• A fully managed environment lets you focus on code while App Engine
manages infrastructure concerns.
• Powerful application diagnostics
• Use Cloud Monitoring and Cloud Logging to monitor the health and
performance of your app and Cloud Debugger and Error Reporting to
diagnose and fix bugs quickly.
• Application versioning
• Easily host different versions of your app, easily create development, test,
staging, and production environments.
• Application security
• Help safeguard your application by defining access rules with App
Engine firewall and leverage managed SSL/TLS certificates by default
on your custom domain at no additional cost.
• Advantages of Google App Engine
• There are many advantages to the Google App Engine that help to
take your app ideas to the next level. These include:
• Infrastructure for Security
• Google’s Internet infrastructure is probably the most secure in the
world. There has rarely been any unauthorized access to date, as the
application data and code are stored in highly secure servers.
• Quick to Start
• With no product or hardware to purchase and maintain, you can
prototype and deploy the app to your users without taking much
time.
• Easy to Use
• Google App Engine (GAE) incorporates the tools that you need to
develop, test, launch, and update the applications.
• Scalability
• Regardless of the amount of data your app stores or the number of
users it serves, App Engine can meet your needs by scaling up or down as
required.
• Performance and Reliability
• Google is a worldwide leader among global brands, and you have to keep that
in mind when you discuss performance and reliability. In the past 15 years, the
company has created new benchmarks based on its services’ and products’
performance. App Engine provides the same reliability and performance as any
other Google product.
• Cost Savings
• You don’t have to hire engineers to manage your servers or to do that yourself. You
can invest the money saved into other parts of your business.
• Platform Independence
• You can move all your data to another environment without any difficulty as there are
not many dependencies on the app engine platform.
• Open Stack
• OpenStack is a free open standard cloud computing platform, mostly
deployed as infrastructure-as-a-service in both public and private
clouds where virtual servers and other resources are made available
to users.
• OpenStack is a set of software tools for building and managing cloud
computing platforms for public and private clouds.
• OpenStack is managed by the OpenStack Foundation, a non-profit
that oversees both development and community-building around the
project.
• Introduction to OpenStack
• OpenStack lets users deploy virtual machines and other instances that
handle different tasks for managing a cloud environment on the fly.
• It makes horizontal scaling easy, which means that tasks that benefit
from running concurrently can easily serve more or fewer users on
the fly by just spinning up more instances.
• For example, a mobile application that needs to communicate with a
remote server might be able to divide the work of communicating
with each user across many different instances, all communicating
with one another but scaling quickly and easily as the application
gains more users.
• And most importantly, OpenStack is open source software, which
means that anyone who chooses to can access the source code, make
any changes or modifications they need, and freely share these
changes.
• It also means that OpenStack has the benefit of thousands of
developers all over the world working in tandem to develop the
strongest, most robust, and most secure product that they can.
• How is OpenStack used in a cloud environment?
• The cloud is all about providing computing for end users in a remote
environment, where the actual software runs as a service on reliable and
scalable servers rather than on each end-user's computer.
• Cloud computing can refer to a lot of different things, but typically the
industry talks about running different items "as a service"—software,
platforms, and infrastructure.
• OpenStack is considered Infrastructure as a Service (IaaS).
• Providing infrastructure means that OpenStack makes it easy for users to
quickly add new instances, upon which other cloud components can run.
• Typically, the infrastructure then runs a "platform" upon which a
developer can create software applications that are delivered to the end
users.
• What are the components of OpenStack?
• Because of its open nature, anyone can add additional components to
OpenStack to help it to meet their needs.
• But the OpenStack community has collaboratively identified nine key
components that are a part of the "core" of OpenStack, which are
distributed as a part of any OpenStack system and officially
maintained by the OpenStack community.
• Nova is the primary computing engine behind OpenStack. It is used
for deploying and managing large numbers of virtual machines and
other instances to handle computing tasks.
• Swift is a storage system for objects and files.
• The OpenStack Object Store project, known as Swift, offers cloud
storage software so that you can store and retrieve lots of data with a
simple API.
• It's built for scale and optimized for durability, availability, and
concurrency across the entire data set.
• Swift is ideal for storing unstructured data that can grow without
bound.
• Cinder is a block storage component, which is more analogous to the
traditional notion of a computer being able to access specific
locations on a disk drive. This more traditional way of accessing files
might be important in scenarios in which data access speed is the
most important consideration.
• Neutron provides the networking capability for OpenStack. It helps to
ensure that each of the components of an OpenStack deployment can
communicate with one another quickly and efficiently.
• Horizon is the dashboard behind OpenStack.
• It is the only graphical interface to OpenStack, so for users wanting to
give OpenStack a try, this may be the first component they actually
“see.”
• Developers can access all of the components of OpenStack
individually through an application programming interface (API), but
the dashboard provides system administrators a look at what is going
on in the cloud, and to manage it as needed.
• Keystone provides identity services for OpenStack. It is essentially a
central list of all of the users of the OpenStack cloud, mapped against
all of the services provided by the cloud, which they have permission
to use. It provides multiple means of access, meaning developers can
easily map their existing user access methods against Keystone.
• Glance provides image services to OpenStack. In this case, "images"
refers to images (or virtual copies) of hard disks.
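A short sketch of how a client might touch several of these components through the official Python SDK (openstacksdk) is shown below; the cloud name, image, flavor, and network names are placeholders that would come from your own deployment:

```python
# Sketch using openstacksdk: Keystone handles authentication, Nova serves the
# compute calls, Glance the image lookup, Neutron the network lookup.
import openstack

# Credentials and region are read from clouds.yaml; "mycloud" is a placeholder.
conn = openstack.connect(cloud="mycloud")

image = conn.compute.find_image("ubuntu-22.04")        # Glance image (placeholder)
flavor = conn.compute.find_flavor("m1.small")          # Nova flavor (placeholder)
network = conn.network.find_network("private")         # Neutron network (placeholder)

server = conn.compute.create_server(
    name="demo-instance",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.name, server.status)
```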
• Prerequisite for minimum production deployment
• There are some basic requirements you’ll have to meet to deploy
OpenStack. Here are the prerequisites, drawn from the OpenStack
manual.
• Hardware: For the OpenStack controller node, 12 GB of RAM is needed as
well as 30 GB of disk space to run OpenStack services. Two 2 TB
SATA (Serial Advanced Technology Attachment) disks will be
necessary to store volumes used by instances. Communication with
compute nodes requires a 1 Gbps network interface card (NIC).
• Operating system (OS):
• OpenStack supports the following operating systems: Debian, Fedora,
Red Hat Enterprise Linux (RHEL), openSUSE, SUSE Linux Enterprise
Server (SLES) and Ubuntu.
• Federation in the Cloud
• Cloud federation is the practice of interconnecting
the cloud computing environments of two or more service providers
for the purpose of load balancing traffic and accommodating spikes in
demand. Cloud federation requires one provider to wholesale or rent
computing resources to another cloud provider.
• “Cloud federation manages consistency and access controls when two
or more independent geographically distinct Clouds share either
authentication, files, computing resources, command and control or
access to storage resources.”
• Cloud federation introduces additional issues that have to be
addressed in order to provide a secure environment in which to move
applications and services among a collection of federated providers.
• Baseline security needs to be guaranteed across all cloud vendors that
are part of the federation.
• An interesting aspect is represented by the management of the digital
identity across diverse organizations, security domains, and
application platforms.
• In particular, the term federated identity management refers to
standards-based approaches for handling authentication, single sign-
on (SSO), role-based access control, and session management in a
federated environment .
• No matter the specific protocol and framework, two main approaches can be
considered:
• Centralized federation model
• This is the approach taken by several identity federation standards. It
distinguishes two operational roles in an SSO transaction: the identity
provider and the service provider.
• Claim-based model
• This approach addresses the problem of user authentication from a different
perspective and requires users to provide claims answering who they are and
what they can do in order to access content or complete a transaction.
• The first model is currently used today; the second constitutes a
future vision for identity management in the cloud.
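As a purely illustrative sketch (tied to no particular standard or protocol), a claim-based check boils down to inspecting the set of claims a user presents and deciding whether they satisfy an access policy:

```python
# Illustrative claim-based access check; names and values are hypothetical.
def can_access(claims: dict, required: dict) -> bool:
    """True if every required claim is present with an acceptable value."""
    return all(claims.get(name) in allowed for name, allowed in required.items())

user_claims = {"role": "researcher", "organization": "UniversityA", "age_over_18": True}
policy = {"role": {"researcher", "faculty"}, "age_over_18": {True}}

print(can_access(user_claims, policy))   # True: the presented claims satisfy the policy
```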
• Digital identity management constitutes a fundamental aspect of
security management in a cloud federation.
• To transparently perform operations across different administrative
domains, it is of mandatory importance to have a robust framework
for authentication and authorization, and federated identity
management addresses this issue.
• Federated identity management allows us to tie together the
computing stacks of different vendors and present them as a single
environment to users from a security point of view.
OpenNebula:
• OpenNebula is a cloud computing platform for managing
heterogeneous distributed data center infrastructures.
• The OpenNebula platform manages a data center's virtual
infrastructure, to build private, public and hybrid implementations of
Infrastructure as a Service.
• Much research work has grown up around OpenNebula.
• For example, the University of Chicago has come up with an advance-
reservation system called the Haizea Lease Manager.
• IBM Haifa has developed a policy-driven probabilistic admission
control and dynamic placement optimization for site-level
management policies, called the RESERVOIR Policy Engine.
• Nephele is an SLA-driven automatic service management tool
developed by Telefonica, and the Virtual Cluster Tool from the CRS4
Distributed Computing Group provides atomic cluster management
with versioning and multiple transport protocols.
Development
• OpenNebula follows a rapid release cycle to improve user satisfaction
by rapidly delivering features and innovations based on user
requirements and feedback.
• In other words, giving customers what they want more quickly, in
smaller increments, while additionally increasing technical quality.
• Major upgrades generally occur every 3-5 years and each upgrade
generally has 3-5 updates.
• Cloud Federations and Server Coalitions
• In large-scale systems, coalition formation supports more
effective use of resources, as well as convenient means to access
these resources.
• It is therefore not surprising that coalition formation for
computational grids has been investigated in the past.
• The interest in grid computing is fading away, while cloud
computing is widely accepted today and its adoption by more
and more institutions and individuals seems to be guaranteed at
least for the foreseeable future.
• Two classes of applications of cloud coalitions are reported in the
literature:
• 1.Coalitions among CSPs for the formation of cloud federations. A cloud
federation is an infrastructure allowing a group of CSPs to share
resources; the goal is to balance the load and improve system reliability.
• 2.Coalitions among the servers of a data center. The goal is to assemble
a pool of resources larger than the ones available from a single server.
• In recent years the number of CSPs has increased significantly. The
question if they should cooperate to share their resources led to the
idea of cloud federations, groups of CSPs who have agreed on a set of
common standards and are able to share their resources.
• Cloud coalition formation raises a number of technical, as well as
nontechnical problems.
• Cloud federations require a set of standards.
• The cloud computing landscape is still evolving, and an early
standardization may slow down and negatively affect the adoption of
new ideas and technologies.
• At the same time, CSPs want to maintain their competitive
advantages by closely guarding the details of their internal algorithms
and protocols.
• Four Levels of Federation
• Creating a cloud federation involves research and development at
different levels: conceptual, logical and operational, and
infrastructural.
• The figure provides a comprehensive view of the challenges faced in
designing and implementing an organizational structure that
coordinates cloud services belonging to different
administrative domains and makes them operate within the
context of a single unified service middleware.
• Each cloud federation level presents different challenges and
operates at a different layer of the IT stack.
• It then requires the use of different approaches and
technologies.
• CONCEPTUAL LEVEL
• The conceptual level addresses the challenges in presenting a cloud
federation as a favourable solution.
• In this level it is important to clearly identify the advantages for either
service providers or service consumers in joining a federation.
• To describe the new opportunities that a federated environment
creates.
• Elements of concern at this level are:
• Motivations for cloud providers to join a federation.
• Motivations for service consumers to leverage a federation.
• Advantages for providers in leasing their services to other providers.
• Responsibilities of providers once they have joined the federation.
• Trust agreements between providers.
• Transparency versus consumers.
• Among these aspects, the most relevant are the motivations of both
service providers and consumers in joining a federation.
• LOGICAL & OPERATIONAL LEVEL
• The logical and operational level of a federated cloud identifies and
addresses the challenges in creating a framework that enables the
aggregation of providers that belong to different administrative domains
• At this level, policies and rules for interoperation are defined.
• Moreover, this is the layer at which decisions are made as to how and when
to lease a service to—or to leverage a service from— another provider.
• The logical component defines a context in which agreements among
providers are settled and services are conveyed, whereas the operational
component characterizes and shapes the dynamic behaviour of the
federation as a result of the single providers’ choices.
It is important at this level to address the following challenges:
• How should a federation be represented?
• How should we model and represent a cloud service, a cloud
provider, or an agreement?
• How should we define the rules and policies that allow providers
to join a federation?
• What are the mechanisms in place for settling agreements
among providers?
• What are providers’ responsibilities with respect to each other?
• When should providers and consumers take advantage of the
federation?
• Which kinds of services are more likely to be leased or bought?
• How should we price resources that are leased, and which
fraction of resources should we lease?
• INFRASTRUCTURE LEVEL
• The infrastructural level addresses the technical challenges involved
in enabling heterogeneous cloud computing systems to interoperate
seamlessly.
• It deals with the technology barriers that keep separate cloud
computing systems belonging to different administrative domains.
• By having standardized protocols and interfaces, these barriers can be
overcome.
At this level it is important to address the following issues:
• What kind of standards should be used?
• How should interfaces and protocols be designed for interoperation?
• Which are the technologies to use for interoperation?
• How can we realize a software system, design platform components, and
services enabling interoperability?
Interoperation and composition among different cloud computing vendors is
possible only by means of open standards and interfaces. Moreover,
interfaces and protocols change considerably at each layer of the Cloud
Computing Reference Model.
Future of Federation
• The federated cloud model is a force for real democratization in the
cloud market.
• It’s how businesses will be able to use local cloud providers to
connect with customers, partners and employees anywhere in the
world.
• It’s how end users will finally get to realize the promise of the cloud.
• And, it’s how data center operators and other service providers will
finally be able to compete with, and beat, today’s so-called global
cloud providers.
• Some see the future of cloud computing as one big public cloud.
• Others believe that enterprises will ultimately build a single large
cloud to host all their corporate services.
• This is, of course, because the benefit of cloud computing is
dependent on large – very large – scale infrastructure, which provides
administrators, service providers and consumers with ease of deployment,
self-service, elasticity, resource pooling and economies of scale.
• However, as cloud continues to evolve – so do the services being
offered.
• Cloud Services & Hybrid Clouds
• Services are now able to reach a wider range of consumers, partners,
competitors and public audiences.
• It is also clear that storage, compute power, streaming, analytics and
other advanced services are best served when they are in an
environment tailored for the proficiency of that service.
• One method of addressing the need of these service environments is
through the advent of hybrid clouds.
• Hybrid clouds, by definition, are composed of multiple distinct cloud
infrastructures connected in a manner that enables services and data
access across the combined infrastructure.
• The intent is to leverage the additional benefits that hybrid cloud
offers without disrupting the traditional cloud benefits.
• While hybrid cloud benefits come from the ability to distribute the
work stream, the goal is to continue to manage peaks in demand,
quickly make services available and capitalize on new business
opportunities.
• The Solution: Federation
• Federation creates a hybrid cloud environment with an increased focus
on maintaining the integrity of corporate policies and data integrity.
• Think of federation as a pool of clouds connected through a channel of
gateways;
• gateways which can be used to optimize a cloud for a service or set of
specific services.
• Such gateways can be used to segment service audiences or to limit
access to specific data sets.
• In essence, federation gives enterprises the ability to serve their
audiences with economies of scale without exposing critical applications or
vital data through weak policies or vulnerabilities.
• Many would raise the question: if Federation creates multiples of
clouds, doesn’t that mean cloud benefits are diminished?
• I believe the answer is no, due to the fact that a fundamental change
has transformed enterprises through the original adoption of cloud
computing, namely the creation of a flexible environment able to
adapt rapidly to changing needs based on policy and automation.
• Cloud end users are often tied to a single cloud provider, because the
different APIs, image formats, and access methods exposed by
different providers make it very difficult for an average user to
move applications from one cloud to another, leading to a
vendor lock-in problem.
• Many SMEs have their own on-premise private cloud infrastructures
to support their internal computing needs and workloads. These
infrastructures are often over-sized to satisfy peak demand periods
and avoid performance slow-downs. The hybrid cloud (or cloud bursting)
model is a solution to reduce the on-premise infrastructure size, so
that it can be dimensioned for an average load and
complemented with external resources from a public cloud provider
to satisfy peak demands.
• Many big companies (e.g. banks, hosting companies, etc.) and also
many large institutions maintain several distributed data centers or
server farms, for example to serve multiple geographically
distributed offices, to implement high availability (HA), or to guarantee
server proximity to the end user.
• Resources and networks in these distributed data centers are usually
configured as non-cooperative separate elements.
• Many educational and research centers often deploy their own
computing infrastructures, which usually do not cooperate with other
institutions, except in some specific situations (e.g. in joint projects or
initiatives).
• Many times, even different departments within the same institution
maintain their own non-cooperative infrastructures. This Study Group
will evaluate the main challenges to enabling the provision of federated
cloud infrastructures, with special emphasis on inter-cloud networking
and security issues:
• Security and Privacy
• Interoperability and Portability
• Performance and Networking Cost
• The first key action aims at “Cutting through the Jungle of Standards”
to help the adoption of cloud computing by encouraging compliance
of cloud services with respect to standards and thus providing
evidence of compliance to legal and audit obligations.
• These standards aim to avoid customer lock in by promoting
interoperability, data portability and reversibility.
• The second key action “Safe and Fair Contract Terms and Conditions”
aims to protect the cloud consumer from insufficiently specific and
balanced contracts with cloud providers that do not “provide for
liability for data integrity, confidentiality or service continuity”.
• The cloud consumer is often presented with “take-it-or-leave-it”
standard contracts that might be cost-saving for the provider but are
often undesirable for the user.
• Interface: Various cloud service providers have different APIs, pricing
models and cloud infrastructures.
• An open cloud computing interface needs to be established to provide
a common application programming interface for multiple cloud
environments.
• The simplest solution is to use a software component that allows the
federated system to connect with a given cloud environment.
• Trusted Servers
• In order to make it easier to find people on other servers we introduced the
concept of “trusted servers” as one of our last steps.
• This allows administrators to define other servers they trust.
• If two servers trust each other they will sync their user lists.
• This way the share dialogue can auto-complete not only local users but also
users on other trusted servers.
• The administrator can decide to define the lists of trusted servers manually or
allow the server to auto add every other server to which at least one
federated share was successfully created.
• This way it is possible to let your cloud server learn about more and more
other servers over time, connect with them and increase the network of
trusted servers.
• Open Challenges: where we’re taking Federated Cloud Sharing
• Of course there are still many areas to improve.
• For example, the way you can discover users on different servers to
share with them, for which we’re working on a global, shared address
book solution.
• Another point is that at the moment this is limited to sharing files.
• A logical next step would be to extend this to many other areas like
address books, calendars and to real-time text, voice and video
communication and we are, of course, planning for that.
2541William_McCollough_DigitalDetox.docx
contactwilliamm2546
 
Unit 6_Introduction_Phishing_Password Cracking.pdf
Unit 6_Introduction_Phishing_Password Cracking.pdfUnit 6_Introduction_Phishing_Password Cracking.pdf
Unit 6_Introduction_Phishing_Password Cracking.pdf
KanchanPatil34
 
Political History of Pala dynasty Pala Rulers NEP.pptx
Political History of Pala dynasty Pala Rulers NEP.pptxPolitical History of Pala dynasty Pala Rulers NEP.pptx
Political History of Pala dynasty Pala Rulers NEP.pptx
Arya Mahila P. G. College, Banaras Hindu University, Varanasi, India.
 
To study the nervous system of insect.pptx
To study the nervous system of insect.pptxTo study the nervous system of insect.pptx
To study the nervous system of insect.pptx
Arshad Shaikh
 
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - WorksheetCBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
Sritoma Majumder
 
How to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 WebsiteHow to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 Website
Celine George
 
Multi-currency in odoo accounting and Update exchange rates automatically in ...
Multi-currency in odoo accounting and Update exchange rates automatically in ...Multi-currency in odoo accounting and Update exchange rates automatically in ...
Multi-currency in odoo accounting and Update exchange rates automatically in ...
Celine George
 
Quality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdfQuality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdf
Dr. Bindiya Chauhan
 
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptxSCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
Ronisha Das
 
Social Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy StudentsSocial Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy Students
DrNidhiAgarwal
 
Odoo Inventory Rules and Routes v17 - Odoo Slides
Odoo Inventory Rules and Routes v17 - Odoo SlidesOdoo Inventory Rules and Routes v17 - Odoo Slides
Odoo Inventory Rules and Routes v17 - Odoo Slides
Celine George
 
Introduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe EngineeringIntroduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe Engineering
Damian T. Gordon
 
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar RabbiPresentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Md Shaifullar Rabbi
 
apa-style-referencing-visual-guide-2025.pdf
apa-style-referencing-visual-guide-2025.pdfapa-style-referencing-visual-guide-2025.pdf
apa-style-referencing-visual-guide-2025.pdf
Ishika Ghosh
 
Metamorphosis: Life's Transformative Journey
Metamorphosis: Life's Transformative JourneyMetamorphosis: Life's Transformative Journey
Metamorphosis: Life's Transformative Journey
Arshad Shaikh
 
Handling Multiple Choice Responses: Fortune Effiong.pptx
Handling Multiple Choice Responses: Fortune Effiong.pptxHandling Multiple Choice Responses: Fortune Effiong.pptx
Handling Multiple Choice Responses: Fortune Effiong.pptx
AuthorAIDNationalRes
 
How to Manage Opening & Closing Controls in Odoo 17 POS
How to Manage Opening & Closing Controls in Odoo 17 POSHow to Manage Opening & Closing Controls in Odoo 17 POS
How to Manage Opening & Closing Controls in Odoo 17 POS
Celine George
 
2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx
contactwilliamm2546
 
Unit 6_Introduction_Phishing_Password Cracking.pdf
Unit 6_Introduction_Phishing_Password Cracking.pdfUnit 6_Introduction_Phishing_Password Cracking.pdf
Unit 6_Introduction_Phishing_Password Cracking.pdf
KanchanPatil34
 
To study the nervous system of insect.pptx
To study the nervous system of insect.pptxTo study the nervous system of insect.pptx
To study the nervous system of insect.pptx
Arshad Shaikh
 
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - WorksheetCBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
Sritoma Majumder
 
How to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 WebsiteHow to Subscribe Newsletter From Odoo 18 Website
How to Subscribe Newsletter From Odoo 18 Website
Celine George
 
Multi-currency in odoo accounting and Update exchange rates automatically in ...
Multi-currency in odoo accounting and Update exchange rates automatically in ...Multi-currency in odoo accounting and Update exchange rates automatically in ...
Multi-currency in odoo accounting and Update exchange rates automatically in ...
Celine George
 
Quality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdfQuality Contril Analysis of Containers.pdf
Quality Contril Analysis of Containers.pdf
Dr. Bindiya Chauhan
 
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptxSCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
Ronisha Das
 
Social Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy StudentsSocial Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy Students
DrNidhiAgarwal
 
Odoo Inventory Rules and Routes v17 - Odoo Slides
Odoo Inventory Rules and Routes v17 - Odoo SlidesOdoo Inventory Rules and Routes v17 - Odoo Slides
Odoo Inventory Rules and Routes v17 - Odoo Slides
Celine George
 
Introduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe EngineeringIntroduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe Engineering
Damian T. Gordon
 

Cloud Computing - Cloud Technologies and Advancements

  • 10. • The NameNode and DataNode are pieces of software designed to run on commodity machines. • These machines typically run a GNU/Linux operating system (OS). HDFS is built using the Java language; • Any machine that supports Java can run the NameNode or the DataNode software. • A typical deployment has a dedicated machine that runs only the NameNode software. • Each of the other machines in the cluster runs one instance of the DataNode software. • The architecture does not preclude running multiple DataNodes on the same machine but in a real deployment that is rarely the case.
  • 11. DataNodes • DataNodes are the slave nodes in HDFS. Unlike the NameNode, a DataNode runs on commodity hardware, that is, an inexpensive system that is not of high quality or high availability. The DataNode is a block server that stores the data in the local file system (e.g. ext3 or ext4). • Functions of DataNode: • These are slave daemons or processes that run on each slave machine. • The actual data is stored on the DataNodes. • The DataNodes serve the low-level read and write requests from the file system’s clients. • They send heartbeats to the NameNode periodically to report the overall health of HDFS; by default, this frequency is set to 3 seconds.
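A quick way to observe the DataNode heartbeats and per-node storage described above is the standard `hdfs dfsadmin -report` admin command. The sketch below simply drives it from Python; it assumes a Hadoop client is installed and configured to reach the cluster (the setup itself is not shown here).

```python
# Minimal sketch: query DataNode health via the standard "hdfs dfsadmin -report"
# command. A configured Hadoop client on PATH is assumed.
import subprocess

def datanode_report() -> str:
    # -report prints cluster capacity plus one section per DataNode,
    # including its last heartbeat ("Last contact") and disk usage.
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-report"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # Print only the per-DataNode name and "Last contact" lines as a quick liveness check.
    for line in datanode_report().splitlines():
        if line.startswith("Name:") or "Last contact" in line:
            print(line)
```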
  • 12. • The existence of a single NameNode in a cluster greatly simplifies the architecture of the system. • The NameNode is the arbitrator and repository for all HDFS metadata. • The system is designed in such a way that user data never flows through the NameNode.
  • 13. • Secondary NameNode: • Apart from these two daemons, there is a third daemon or process called the Secondary NameNode. The Secondary NameNode works concurrently with the primary NameNode as a helper daemon. Do not confuse the Secondary NameNode with a backup NameNode, because it is not one.
  • 14. • Functions of Secondary NameNode: • The Secondary NameNode constantly reads the file system state and metadata from the RAM of the NameNode and writes it to the hard disk or the file system. • It is responsible for combining the EditLogs with the FsImage from the NameNode. • It downloads the EditLogs from the NameNode at regular intervals and applies them to the FsImage. The new FsImage is copied back to the NameNode and is used the next time the NameNode is started.
  • 15. • Blocks: • Now, as we know, the data in HDFS is scattered across the DataNodes as blocks. Let’s have a look at what a block is and how it is formed. • Blocks are nothing but the smallest contiguous locations on your hard drive where data is stored. In general, in any file system, data is stored as a collection of blocks. Similarly, HDFS stores each file as blocks, which are scattered throughout the Apache Hadoop cluster. The default size of each block is 128 MB in Apache Hadoop 2.x (64 MB in Apache Hadoop 1.x), which you can configure as per your requirement.
  • 16. • It is not necessary that in HDFS each file is stored in an exact multiple of the configured block size (128 MB, 256 MB, etc.). Let’s take an example of a file “example.txt” of size 514 MB. Suppose that we are using the default block size of 128 MB. Then, how many blocks will be created? Five: the first four blocks will be 128 MB each, but the last block will be only 2 MB.
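The 514 MB example can be reproduced with a few lines of arithmetic. The sketch below is purely illustrative (it is not HDFS code); it just shows how the block sizes fall out of the configured block size.

```python
# Worked example of HDFS block splitting for the 514 MB file described above,
# assuming the default 128 MB block size of Hadoop 2.x.
BLOCK_SIZE_MB = 128

def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB):
    """Return the sizes of the blocks HDFS would create for a file."""
    full_blocks = file_size_mb // block_size_mb
    remainder = file_size_mb % block_size_mb
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)      # the last block only holds what is left over
    return sizes

blocks = split_into_blocks(514)
print(len(blocks), blocks)           # 5 [128, 128, 128, 128, 2]
```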
  • 17. • The File System Namespace • HDFS supports a traditional hierarchical file organization. • A user or an application can create directories and store files inside these directories. • The file system namespace hierarchy is similar to most other existing file systems; • one can create and remove files, move a file from one directory to another, or rename a file. • HDFS supports user access permissions.
  • 18. • While HDFS follows the naming conventions of a traditional file system, some paths and names (e.g. /.reserved and .snapshot) are reserved. • The NameNode maintains the file system namespace. • Any change to the file system namespace or its properties is recorded by the NameNode. • An application can specify the number of replicas of a file that should be maintained by HDFS. • The number of copies of a file is called the replication factor of that file. This information is stored by the NameNode.
  • 19. • Data Replication • HDFS is designed to reliably store very large files across machines in a large cluster. • It stores each file as a sequence of blocks. The blocks of a file are replicated for fault tolerance. • The block size and replication factor are configurable per file. • All blocks in a file except the last block are the same size • HDFS provides a reliable way to store huge data in a distributed environment as data blocks. The blocks are also replicated to provide fault tolerance. The default replication factor is 3 which is again configurable.
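The namespace operations and per-file replication factor described above map directly onto the standard `hdfs dfs` shell subcommands. The sketch below drives them from Python; the paths and the local file name are hypothetical, and a configured Hadoop client on PATH is assumed.

```python
# Minimal sketch of HDFS namespace and replication operations via "hdfs dfs".
import subprocess

def hdfs(*args: str) -> None:
    subprocess.run(["hdfs", "dfs", *args], check=True)

# Create a directory, store a file in it, and rename (move) the file.
hdfs("-mkdir", "-p", "/user/demo/input")
hdfs("-put", "example.txt", "/user/demo/input/example.txt")
hdfs("-mv", "/user/demo/input/example.txt", "/user/demo/input/renamed.txt")

# Override the replication factor for this one file (the default is 3);
# -w waits until the blocks actually reach the requested replication.
hdfs("-setrep", "-w", "2", "/user/demo/input/renamed.txt")
```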
  • 21. • Files in HDFS are write-once (except for appends and truncates) and have strictly one writer at any time. • The NameNode makes all decisions regarding replication of blocks. • It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. • Receipt of a Heartbeat implies that the DataNode is functioning properly. • A Blockreport contains a list of all blocks on a DataNode
  • 23. • Replication • The placement of replicas is critical to HDFS reliability and performance. • Optimizing replica placement distinguishes HDFS from most other distributed file systems. • This is a feature that needs lots of tuning and experience. • The purpose of a rack-aware replica placement policy is to improve data reliability, availability, and network bandwidth utilization.
  • 24. • Now, the following protocol will be followed whenever the data is written into HDFS: • At first, the HDFS client will reach out to the NameNode for a Write Request against the two blocks, say, Block A & Block B. • The NameNode will then grant the client the write permission and will provide the IP addresses of the DataNodes where the file blocks will be copied eventually. • The selection of IP addresses of DataNodes is purely randomized based on availability, replication factor and rack awareness that we have discussed earlier.
  • 25. • Let’s say the replication factor is set to the default, i.e. 3. Therefore, for each block the NameNode will provide the client a list of three IP addresses of DataNodes. The list will be unique for each block. • Suppose the NameNode provided the following lists of IP addresses to the client: • For Block A, list A = {IP of DataNode 1, IP of DataNode 4, IP of DataNode 6} • For Block B, list B = {IP of DataNode 3, IP of DataNode 7, IP of DataNode 9} • Each block will be copied to three different DataNodes to keep the replication factor consistent throughout the cluster.
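The sketch below is a purely illustrative model of this step, not HDFS's real placement algorithm (which also weighs rack awareness and node load): it simply hands out a distinct list of three DataNode addresses per block, mirroring the lists shown above. The IP addresses are hypothetical.

```python
# Illustrative only: build a per-block replica list with replication factor 3.
import random

DATANODES = [f"10.0.0.{i}" for i in range(1, 10)]   # hypothetical DataNode IPs
REPLICATION_FACTOR = 3

def replica_lists(block_ids, datanodes=DATANODES, rf=REPLICATION_FACTOR):
    """Return {block_id: [datanode_ip, ...]} with rf distinct nodes per block."""
    return {block: random.sample(datanodes, rf) for block in block_ids}

print(replica_lists(["Block A", "Block B"]))
# e.g. {'Block A': ['10.0.0.1', '10.0.0.4', '10.0.0.6'],
#       'Block B': ['10.0.0.3', '10.0.0.7', '10.0.0.9']}
```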
  • 26. • Replica Selection • To minimize global bandwidth consumption and read latency, HDFS tries to satisfy a read request from a replica that is closest to the reader. • If the HDFS cluster spans multiple data centers, then a replica that is resident in the local data center is preferred over any remote replica.
  • 28. MapReduce • Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large data sets on computing clusters. • It is a sub-project of the Apache Hadoop project. • Apache Hadoop is an open-source framework that allows us to store and process big data in a distributed environment across clusters of computers using simple programming models. • MapReduce is the core component for data processing in the Hadoop framework.
  • 29. • MapReduce splits the input data set into a number of parts and runs a program on all of the parts in parallel at once. • The term MapReduce refers to two separate and distinct tasks. • The first is the map operation, which takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). • The reduce operation combines those data tuples based on the key and accordingly modifies the value of the key.
  • 31. • Map Task • The map task runs in the following phases: a. RecordReader • The RecordReader transforms the input split into records. • It provides the data to the mapper function in key-value pairs. • Usually, the key is the positional information and the value is the data that comprises the record.
  • 33. Types of Hadoop RecordReader in MapReduce • The RecordReader instance is defined by the InputFormat. • By default, it uses TextInputFormat for converting data into key-value pairs. TextInputFormat provides two types of RecordReaders: i. LineRecordReader ii. SequenceFileRecordReader
  • 34. • b. Map • In this phase, the mapper, which is the user-defined function, processes the key-value pairs from the RecordReader. • It produces zero or more intermediate key-value pairs. • The key is usually the data on which the reducer function performs the grouping operation. • The value is the data which gets aggregated to produce the final result in the reducer function.
  • 35. • c. Combiner • The combiner is actually a localized reducer which groups the data in the map phase. It is optional. • Combiner takes the intermediate data from the mapper and aggregates them. • It does so within the small scope of one mapper. • In many situations, this decreases the amount of data needed to move over the network. For example, moving (Hello World, 1) three times consumes more network bandwidth than moving (Hello World, 3).
  • 37. Reduce Task • The various phases in the reduce task are as follows: i. Shuffle and Sort • The reducer starts with the shuffle and sort step. • This step sorts the individual data pieces into one large data list. • The purpose of this sort is to collect the equivalent keys together.
  • 38. • ii. Reduce • The reducer performs the reduce function once per key grouping. • The framework passes the function the key and an iterator object containing all the values pertaining to the key. • We can write the reducer to filter, aggregate, and combine data in a number of different ways. • Once the reduce function finishes, it gives zero or more key-value pairs to the output format.
  • 39. • iii. Output Format • This is the final step. • It takes the key-value pairs from the reducer and writes them to the file via the RecordWriter. • By default, it separates the key and value with a tab and each record with a newline character. • The final data gets written to HDFS.
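Since the slides describe the map, shuffle/sort and reduce phases abstractly, the classic word-count example makes the key-value flow concrete. The sketch below is written in the Hadoop Streaming style (read from stdin, emit tab-separated key/value pairs) but is run here as a single-process simulation for illustration; submitting it to a real cluster would use the hadoop-streaming jar of your installation, which is not shown.

```python
# wordcount_streaming.py -- minimal sketch of the map and reduce phases.
import sys
from itertools import groupby

def mapper(lines):
    """Map phase: emit (word, 1) for every word in the input records."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Reduce phase: pairs arrive grouped by key; sum the counts per word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Local simulation of map -> shuffle & sort -> reduce.
    mapped = sorted(mapper(sys.stdin), key=lambda kv: kv[0])
    for word, total in reducer(mapped):
        print(f"{word}\t{total}")   # e.g. "Hello World" appearing 3 times -> 3
```

Usage, for example: `echo "hello world hello" | python wordcount_streaming.py` prints `hello 2` and `world 1`.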
  • 40. Virtual Box • VirtualBox is open-source software for virtualizing the x86 computing architecture. • It acts as a hypervisor, creating a VM (Virtual Machine) in which the user can run another OS (operating system). • The operating system in which VirtualBox runs is called the "host" OS. • The operating system running in the VM is called the "guest" OS. VirtualBox supports Windows, Linux, or macOS as its host OS.
  • 41. • Why Is VirtualBox Useful? • One: • VirtualBox allows you to run more than one operating system at a time. • This way, you can run software written for one operating system on another (for example, Windows software on Linux or a Mac) without having to reboot to use it (as would be needed if you used partitioning and dual-booting).
  • 42. • Two: • By using a VirtualBox feature called “snapshots”, you can save a particular state of a virtual machine and revert back to that state, if necessary. • This way, you can freely experiment with a computing environment. • If something goes wrong (e.g. after installing misbehaving software or infecting the guest with a virus), you can easily switch back to a previous snapshot and avoid the need of frequent backups and restores.
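The snapshot workflow described above can also be scripted through VirtualBox's VBoxManage command-line tool. The sketch below wraps two of its snapshot subcommands from Python; the VM name and snapshot name are hypothetical, and VBoxManage (installed with VirtualBox) is assumed to be on PATH.

```python
# Minimal sketch of taking and restoring a VirtualBox snapshot via VBoxManage.
import subprocess

VM_NAME = "DevVM"   # hypothetical virtual machine name

def take_snapshot(name: str) -> None:
    subprocess.run(["VBoxManage", "snapshot", VM_NAME, "take", name], check=True)

def restore_snapshot(name: str) -> None:
    # The VM should be powered off (or saved) before restoring a snapshot.
    subprocess.run(["VBoxManage", "snapshot", VM_NAME, "restore", name], check=True)

take_snapshot("before-experiment")
# ... experiment inside the guest, install software, etc. ...
restore_snapshot("before-experiment")   # roll back if something went wrong
```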
  • 43. • Three: • Software vendors can use virtual machines to ship entire software configurations. For example, installing a complete mail server solution on a real machine can be a tedious task. • With VirtualBox, such a complex setup (often called an "appliance") can be packed into a virtual machine. Installing and running a mail server becomes as easy as importing such an appliance into VirtualBox. • Along the same lines, the VirtualBox "clone" feature is equally useful.
  • 44. • Four: • On an enterprise level, virtualization can significantly reduce hardware and electricity costs. • Most of the time, computers today only use a fraction of their potential power and run with low average system loads. • A lot of hardware resources, as well as electricity, are thereby wasted. • So, instead of running many such physical computers that are only partially used, one can pack many virtual machines onto a few powerful hosts and balance the loads between them.
  • 45. VirtualBox Terminology • When dealing with virtualization, it helps to acquaint oneself with a bit of crucial terminology, especially the following terms: • Host Operating System (Host OS): • The operating system of the physical computer on which VirtualBox is installed. There are versions of VirtualBox for Windows, Mac OS, Linux and Solaris hosts. • Guest Operating System (Guest OS): • The operating system that is running inside the virtual machine.
  • 46. • Virtual Machine (VM): • We’ve used this term often already. It is the special environment that VirtualBox creates for your guest operating system while it is running. In other words, you run your guest operating system "in" a VM. Normally, a VM will be shown as a window on your computer’s desktop, but depending on which of the various frontends of VirtualBox you use, it can be displayed in full-screen mode or remotely on another computer.
  • 47. Google App Engine • Google App Engine is a Platform as a Service and cloud computing platform for developing and hosting web applications in Google-managed data centers. • App Engine is a fully managed, serverless platform for developing and hosting web applications at scale. • You can choose from several popular languages, libraries, and frameworks to develop your apps, then let App Engine take care of provisioning servers and scaling your app instances based on demand. • Originally, App Engine required apps to be written in Java or Python, to store data in Google Bigtable, and to use the Google query language.
  • 48. • Google App Engine provides more infrastructure than other scalable hosting services such as Amazon Elastic Compute Cloud (EC2). • The App Engine also eliminates some system administration and developmental tasks to make it easier to write scalable applications. • Google App Engine is free up to a certain amount of resource usage. • Users exceeding the per-day or per-minute usage rates for CPU resources, storage, number of API calls or requests and concurrent requests can pay for more of these resources.
  • 49. • Modern web applications • Quickly reach customers and end users by deploying web apps on App Engine. • With zero-config deployments and zero server management, App Engine allows you to focus on writing code. • Plus, App Engine automatically scales to support sudden traffic spikes without provisioning, patching, or monitoring.
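To make the "zero-config deployment" idea concrete, the sketch below shows what a minimal App Engine standard-environment app in Python typically looks like. The Flask dependency, the runtime version in app.yaml, and the project setup are assumptions for illustration; the deploy command is the standard `gcloud app deploy` from the Google Cloud CLI.

```python
# main.py -- minimal App Engine (standard environment) web app sketch.
# Assumed companion files: an app.yaml containing "runtime: python39"
# (runtime version is an assumption) and a requirements.txt listing Flask.
# Deploy with: gcloud app deploy
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # App Engine routes HTTP requests to this WSGI app and scales the number
    # of instances up or down automatically based on traffic.
    return "Hello from App Engine!"

if __name__ == "__main__":
    # Local development only; in production App Engine runs the app for you.
    app.run(host="127.0.0.1", port=8080, debug=True)
```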
  • 50. • Features • Popular languages • Build your application in Node.js, Java, Ruby, C#, Go, Python, or PHP, or bring your own language runtime. • Open and flexible • Custom runtimes allow you to bring any library and framework to App Engine.
  • 51. • Fully managed • A fully managed environment lets you focus on code while App Engine manages infrastructure concerns. • Powerful application diagnostics • Use Cloud Monitoring and Cloud Logging to monitor the health and performance of your app, and Cloud Debugger and Error Reporting to diagnose and fix bugs quickly. • Application versioning • Easily host different versions of your app, and easily create development, test, staging, and production environments.
  • 52. • Application security • Help safeguard your application by defining access rules with App Engine firewall and leverage managed SSL/TLS certificates by default on your custom domain at no additional cost.
  • 53. • Advantages of Google App Engine • There are many advantages to Google App Engine that help take your app ideas to the next level. These include: • Infrastructure for Security • Around the world, the Internet infrastructure that Google has is probably the most secure. There has rarely been any unauthorized access to date, as the application data and code are stored in highly secure servers.
  • 54. • Quick to Start • With no product or hardware to purchase and maintain, you can prototype and deploy the app to your users without taking much time. • Easy to Use • Google App Engine (GAE) incorporates the tools that you need to develop, test, launch, and update the applications.
  • 55. • Scalability • Regardless of the amount of data your app stores or the number of users it serves, App Engine can meet your needs by scaling up or down as required.
  • 56. • Performance and Reliability • Google is among the leading global brands worldwide, so when you discuss performance and reliability you have to keep that in mind. In the past 15 years, the company has created new benchmarks based on the performance of its services and products. App Engine provides the same reliability and performance as any other Google product. • Cost Savings • You don’t have to hire engineers to manage your servers or do that yourself. You can invest the money saved into other parts of your business. • Platform Independence • You can move all your data to another environment without any difficulty, as there are not many dependencies on the App Engine platform.
  • 57. • Open Stack • OpenStack is a free, open-standard cloud computing platform, mostly deployed as infrastructure-as-a-service in both public and private clouds, where virtual servers and other resources are made available to users. • OpenStack is a set of software tools for building and managing cloud computing platforms for public and private clouds. • OpenStack is managed by the OpenStack Foundation, a non-profit that oversees both development and community-building around the project.
  • 58. • Introduction to OpenStack • OpenStack lets users deploy virtual machines and other instances that handle different tasks for managing a cloud environment on the fly. • It makes horizontal scaling easy, which means that tasks that benefit from running concurrently can easily serve more or fewer users on the fly by just spinning up more instances. • For example, a mobile application that needs to communicate with a remote server might be able to divide the work of communicating with each user across many different instances, all communicating with one another but scaling quickly and easily as the application gains more users.
  • 59. • And most importantly, OpenStack is open source software, which means that anyone who chooses to can access the source code, make any changes or modifications they need, and freely share these changes. • It also means that OpenStack has the benefit of thousands of developers all over the world working in tandem to develop the strongest, most robust, and most secure product that they can.
  • 60. • How is OpenStack used in a cloud environment? • The cloud is all about providing computing for end users in a remote environment, where the actual software runs as a service on reliable and scalable servers rather than on each end user's computer. • Cloud computing can refer to a lot of different things, but typically the industry talks about running different items "as a service": software, platforms, and infrastructure. • OpenStack is considered Infrastructure as a Service (IaaS). • Providing infrastructure means that OpenStack makes it easy for users to quickly add new instances, upon which other cloud components can run. • Typically, the infrastructure then runs a "platform" upon which a developer can create software applications that are delivered to the end users.
  • 61. • What are the components of OpenStack? • Because of its open nature, anyone can add additional components to OpenStack to help it to meet their needs. • But the OpenStack community has collaboratively identified nine key components that are a part of the "core" of OpenStack, which are distributed as a part of any OpenStack system and officially maintained by the OpenStack community. • Nova is the primary computing engine behind OpenStack. It is used for deploying and managing large numbers of virtual machines and other instances to handle computing tasks.
  • 62. • Swift is a storage system for objects and files. • The OpenStack Object Store project, known as Swift, offers cloud storage software so that you can store and retrieve lots of data with a simple API. • It's built for scale and optimized for durability, availability, and concurrency across the entire data set. • Swift is ideal for storing unstructured data that can grow without bound.
  • 63. • Cinder is a block storage component, which is more analogous to the traditional notion of a computer being able to access specific locations on a disk drive. This more traditional way of accessing files might be important in scenarios in which data access speed is the most important consideration. • Neutron provides the networking capability for OpenStack. It helps to ensure that each of the components of an OpenStack deployment can communicate with one another quickly and efficiently.
  • 64. • Horizon is the dashboard behind OpenStack. • It is the only graphical interface to OpenStack, so for users wanting to give OpenStack a try, this may be the first component they actually "see." • Developers can access all of the components of OpenStack individually through an application programming interface (API), but the dashboard gives system administrators a look at what is going on in the cloud and lets them manage it as needed.
  • 65. • Keystone provides identity services for OpenStack. It is essentially a central list of all of the users of the OpenStack cloud, mapped against all of the services provided by the cloud, which they have permission to use. It provides multiple means of access, meaning developers can easily map their existing user access methods against Keystone. • Glance provides image services to OpenStack. In this case, "images" refers to images (or virtual copies) of hard disks.
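Several of these components can also be reached programmatically. The sketch below uses the openstacksdk Python library to list a few resources; it assumes credentials for a cloud named "mycloud" are defined in a local clouds.yaml file, and the names printed are whatever the deployment happens to contain.

```python
# Minimal sketch using openstacksdk against a few of the services above.
import openstack

conn = openstack.connect(cloud="mycloud")   # authentication is handled via Keystone

# Nova: list running virtual machine instances.
for server in conn.compute.servers():
    print("server:", server.name, server.status)

# Glance: list the disk images available for booting new instances.
for image in conn.image.images():
    print("image:", image.name)

# Swift: list object-storage containers (if the object store is deployed).
for container in conn.object_store.containers():
    print("container:", container.name)
```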
  • 66. • Prerequisites for a minimum production deployment • There are some basic requirements you’ll have to meet to deploy OpenStack. Here are the prerequisites, drawn from the OpenStack manual. • Hardware: For the OpenStack controller node, 12 GB of RAM is needed, as well as 30 GB of disk space to run the OpenStack services. Two SATA (Serial Advanced Technology Attachment) disks of 2 TB will be necessary to store the volumes used by instances. Communication with compute nodes requires a network interface card (NIC) of 1 Gbps.
  • 67. • Operating system (OS): • OpenStack supports the following operating systems: Debian, Fedora, Red Hat Enterprise Linux (RHEL), openSUSE, SUSE Linux Enterprise Server (SLES) and Ubuntu.
  • 68. • Federation in the Cloud • Cloud federation is the practice of interconnecting the cloud computing environments of two or more service providers for the purpose of load balancing traffic and accommodating spikes in demand. Cloud federation requires one provider to wholesale or rent computing resources to another cloud provider. • “Cloud federation manages consistency and access controls when two or more independent geographically distinct Clouds share either authentication, files, computing resources, command and control or access to storage resources.”
  • 69. • Cloud federation introduces additional issues that have to be addressed in order to provide a secure environment in which to move applications and services among a collection of federated providers. • Baseline security needs to be guaranteed across all cloud vendors that are part of the federation.
  • 70. • An interesting aspect is represented by the management of digital identity across diverse organizations, security domains, and application platforms. • In particular, the term federated identity management refers to standards-based approaches for handling authentication, single sign-on (SSO), role-based access control, and session management in a federated environment.
  • 71. • No matter the specific protocol and framework, two main approaches can be considered: • Centralized federation model • This is the approach taken by several identity federation standards. It distinguishes two operational roles in an SSO transaction: the identity provider and the service provider. • Claim-based model • This approach addresses the problem of user authentication from a different perspective and requires users to provide claims answering who they are and what they can do in order to access content or complete a transaction.
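In the claims-based model, the user presents a signed token whose claims state who they are and what they are allowed to do, and the federated service validates that token instead of authenticating the user itself. The sketch below illustrates this with the PyJWT library; the issuer, audience, and shared secret are hypothetical values chosen only for the example.

```python
# Illustrative sketch of the claims-based model using PyJWT (pip install PyJWT).
import jwt

SECRET = "shared-federation-secret"   # hypothetical key agreed between the two clouds

# Identity provider side: issue a token carrying the user's claims.
token = jwt.encode(
    {"sub": "alice", "roles": ["storage.read"], "iss": "idp.example", "aud": "cloud-b"},
    SECRET,
    algorithm="HS256",
)

# Service provider side: validate the signature and audience, then read the claims.
claims = jwt.decode(token, SECRET, algorithms=["HS256"], audience="cloud-b")
print(claims["sub"], claims["roles"])
```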
  • 72. • The first model is the one used today; the second constitutes a future vision for identity management in the cloud. • Digital identity management constitutes a fundamental aspect of security management in a cloud federation. • To transparently perform operations across different administrative domains, it is mandatory to have a robust framework for authentication and authorization, and federated identity management addresses this issue. • Federated identity management allows us to tie together the computing stacks of different vendors and present them as a single environment to users from a security point of view.
  • 73. OpenNebula: • OpenNebula is a cloud computing platform for managing heterogeneous distributed data center infrastructures. • The OpenNebula platform manages a data center's virtual infrastructure, to build private, public and hybrid implementations of Infrastructure as a Service.
  • 76. • Much research work has been carried out around OpenNebula. • For example, the University of Chicago has come up with an advance reservation system called Haizea Lease Manager. • IBM Haifa has developed a policy-driven probabilistic admission control and dynamic placement optimization for site-level management policies, called the RESERVOIR Policy Engine. • Nephele is an SLA-driven automatic service management tool developed by Telefonica, and the Virtual Cluster Tool, for atomic cluster management with versioning over multiple transport protocols, comes from the CRS4 Distributed Computing Group.
  • 77. Development • OpenNebula follows a rapid release cycle to improve user satisfaction by rapidly delivering features and innovations based on user requirements and feedback. • In other words, giving customers what they want more quickly, in smaller increments, while additionally increasing technical quality. • Major upgrades generally occur every 3-5 years and each upgrade generally has 3-5 updates.
  • 78. • Cloud Federations and Server Coalitions • In large-scale systems, coalition formation supports more effective use of resources, as well as convenient means to access these resources. • It is therefore not surprising that coalition formation for computational grids has been investigated in the past. • The interest in grid computing is fading away, while cloud computing is widely accepted today and its adoption by more and more institutions and individuals seems to be guaranteed at least for the foreseeable future.
  • 79. • Two classes of applications of cloud coalitions are reported in the literature: • 1. Coalitions among CSPs for the formation of cloud federations. A cloud federation is an infrastructure allowing a group of CSPs to share resources; the goal is to balance the load and improve system reliability. • 2. Coalitions among the servers of a data center. The goal is to assemble a pool of resources larger than the ones available from a single server. • In recent years the number of CSPs has increased significantly. The question of whether they should cooperate to share their resources led to the idea of cloud federations: groups of CSPs who have agreed on a set of common standards and are able to share their resources.
  • 80. • Cloud coalition formation raises a number of technical as well as nontechnical problems. • Cloud federations require a set of standards. • The cloud computing landscape is still evolving, and an early standardization may slow down and negatively affect the adoption of new ideas and technologies. • At the same time, CSPs want to maintain their competitive advantages by closely guarding the details of their internal algorithms and protocols.
  • 81. • Four Levels of Federation • Creating a cloud federation involves research and development at different levels: conceptual, logical and operational, and infrastructural.
  • 83. • The figure provides a comprehensive view of the challenges faced in designing and implementing an organizational structure that coordinates cloud services belonging to different administrative domains and makes them operate within the context of a single unified service middleware. • Each cloud federation level presents different challenges and operates at a different layer of the IT stack. • It therefore requires the use of different approaches and technologies.
  • 84. • CONCEPTUAL LEVEL • The conceptual level addresses the challenges in presenting a cloud federation as a favourable solution. • At this level it is important to clearly identify the advantages for either service providers or service consumers in joining a federation, and to describe the new opportunities that a federated environment creates.
  • 85. • Elements of concern at this level are: • Motivations for cloud providers to join a federation. • Motivations for service consumers to leverage a federation. • Advantages for providers in leasing their services to other providers. • Responsibilities of providers once they have joined the federation. • Trust agreements between providers. • Transparency toward consumers. • Among these aspects, the most relevant are the motivations of both service providers and consumers in joining a federation.
  • 86. • LOGICAL & OPERATIONAL LEVEL • The logical and operational level of a federated cloud identifies and addresses the challenges in creating a framework that enables the aggregation of providers belonging to different administrative domains. • At this level, policies and rules for interoperation are defined. • Moreover, this is the layer at which decisions are made as to how and when to lease a service to, or to leverage a service from, another provider. • The logical component defines a context in which agreements among providers are settled and services are conveyed, whereas the operational component characterizes and shapes the dynamic behaviour of the federation as a result of the individual providers’ choices.
  • 87. It is important at this level to address the following challenges: • How should a federation be represented? • How should we model and represent a cloud service, a cloud provider, or an agreement? • How should we define the rules and policies that allow providers to join a federation? • What are the mechanisms in place for settling agreements among providers? • What are providers’ responsibilities with respect to each other?
  • 88. • When should providers and consumers take advantage of the federation? • Which kinds of services are more likely to be leased or bought? • How should we price resources that are leased, and which fraction of resources should we lease?
  • 89. • INFRASTRUCTURE LEVEL • The infrastructural level addresses the technical challenges involved in enabling heterogeneous cloud computing systems to interoperate seamlessly. • It deals with the technology barriers that keep cloud computing systems belonging to different administrative domains separate. • These barriers can be overcome by means of standardized protocols and interfaces.
  • 90. At this level it is important to address the following issues: • What kind of standards should be used? • How should interfaces and protocols be designed for interoperation? • Which technologies should be used for interoperation? • How can we realize a software system, and design platform components and services, enabling interoperability? Interoperation and composition among different cloud computing vendors is possible only by means of open standards and interfaces. Moreover, interfaces and protocols change considerably at each layer of the Cloud Computing Reference Model.
  • 91. Future of Federation • The federated cloud model is a force for real democratization in the cloud market. • It’s how businesses will be able to use local cloud providers to connect with customers, partners and employees anywhere in the world. • It’s how end users will finally get to realize the promise of the cloud. • And, it’s how data center operators and other service providers will finally be able to compete with, and beat, today’s so-called global cloud providers.
  • 92. • Some see the future of cloud computing as one big public cloud. • Others believe that enterprises will ultimately build a single large cloud to host all their corporate services. • This is, of course, because the benefit of cloud computing is dependent on very large-scale infrastructure, which provides administrators and service consumers with ease of deployment, self-service, elasticity, resource pooling and economies of scale. • However, as the cloud continues to evolve, so do the services being offered.
  • 93. • Cloud Services & Hybrid Clouds • Services are now able to reach a wider range of consumers, partners, competitors and public audiences. • It is also clear that storage, compute power, streaming, analytics and other advanced services are best served when they are in an environment tailored for the proficiency of that service.
  • 94. • One method of addressing the need of these service environments is through the advent of hybrid clouds. • Hybrid clouds, by definition, are composed of multiple distinct cloud infrastructures connected in a manner that enables services and data access across the combined infrastructure. • The intent is to leverage the additional benefits that hybrid cloud offers without disrupting the traditional cloud benefits. • While hybrid cloud benefits come through the ability to distribute the work stream, the goal is to continue to realize the ability for managing peaks in demand, to quickly make services available and capitalize on new business opportunities.
  • 95. • The Solution: Federation • Federation creates a hybrid cloud environment with an increased focus on maintaining the integrity of corporate policies and data. • Think of federation as a pool of clouds connected through a channel of gateways; • gateways which can be used to optimize a cloud for a service or set of specific services. • Such gateways can be used to segment service audiences or to limit access to specific data sets. • In essence, federation gives enterprises the ability to serve their audiences with economies of scale without exposing critical applications or vital data through weak policies or vulnerabilities.
  • 96. • Many would raise the question: if federation creates multiple clouds, doesn’t that mean cloud benefits are diminished? • The answer is no, because a fundamental change has transformed enterprises through the original adoption of cloud computing, namely the creation of a flexible environment able to adapt rapidly to changing needs based on policy and automation. • Cloud end users are often tied to a unique cloud provider because of the different APIs, image formats, and access methods exposed by different providers, which make it very difficult for an average user to move applications from one cloud to another, leading to a vendor lock-in problem.
  • 97. • Many SMEs have their own on-premise private cloud infrastructures to support their internal computing needs and workloads. These infrastructures are often over-sized to satisfy peak demand periods and avoid performance slow-downs. The hybrid cloud (or cloud bursting) model is a solution to reduce the on-premise infrastructure size, so that it can be dimensioned for an average load and complemented with external resources from a public cloud provider to satisfy peak demands.
  • 98. • Many big companies (e.g. banks, hosting companies, etc.) and also many large institutions maintain several distributed data-centers or server-farms, for example to serve to multiple geographically distributed offices, to implement HA, or to guarantee server proximity to the end user. • Resources and networks in these distributed data-centers are usually configured as non-cooperative separate elements.
  • 99. • Many educational and research centers often deploy their own computing infrastructures, which usually do not cooperate with other institutions, except in some specific situations (e.g. joint projects or initiatives). • Many times, even different departments within the same institution maintain their own non-cooperative infrastructures. This Study Group will evaluate the main challenges in enabling the provision of federated cloud infrastructures, with special emphasis on inter-cloud networking and security issues: • Security and Privacy • Interoperability and Portability • Performance and Networking Cost
  • 100. • The first key action aims at “Cutting through the Jungle of Standards” to help the adoption of cloud computing by encouraging compliance of cloud services with respect to standards and thus providing evidence of compliance to legal and audit obligations. • These standards aim to avoid customer lock in by promoting interoperability, data portability and reversibility.
  • 101. • The second key action “Safe and Fair Contract Terms and Conditions” aims to protect the cloud consumer from insufficiently specific and balanced contracts with cloud providers that do not “provide for liability for data integrity, confidentiality or service continuity”. • The cloud consumer is often presented with "take-it-or-leave-it standard contracts that might be cost-saving for the provider but is often undesirable for the user”.
  • 102. • Interface: Various cloud service providers have different APIs, pricing models and cloud infrastructures. • An open cloud computing interface needs to be established to provide a common application programming interface across multiple cloud environments. • The simplest solution is to use a software component that allows the federated system to connect with a given cloud environment.
  • 103. • Trusted Servers • In order to make it easier to find people on other servers, we introduced the concept of "trusted servers" as one of our last steps. • This allows administrators to define other servers they trust. • If two servers trust each other, they will sync their user lists. • This way the share dialogue can auto-complete not only local users but also users on other trusted servers. • The administrator can decide to define the list of trusted servers manually, or allow the server to auto-add every other server to which at least one federated share was successfully created. • This way it is possible to let your cloud server learn about more and more other servers over time, connect with them and increase the network of trusted servers.
  • 104. • Open Challenges: where we’re taking Federated Cloud Sharing • Of course there are still many areas to improve. • For example, the way you can discover users on different servers to share with them, for which we’re working on a global, shared address book solution. • Another point is that at the moment this is limited to sharing files. • A logical next step would be to extend this to many other areas like address books, calendars, and real-time text, voice and video communication, and we are, of course, planning for that.