20CS701 CC Unit2
20CS701
CLOUD COMPUTING
Department: CSE
Batch/Year: 2020-2024/IV
Created by:
Ms. A. Jasmine Gilda, Assistant Professor/CSE
Ms. S. D. Lalitha, Assistant Professor/CSE
Date: 07.08.2023
1. CONTENTS
1 Contents
2 Course Objectives
3 Pre Requisites
4 Syllabus
5 Course outcomes
7 Lecture Plan
9 Lecture Notes
10 Assignments
12 Part B Questions
15 Assessment Schedule
cloud computing.
computing services.
virtualization.
infrastructure.
Pre-requisite Chart
CO2 K4 3 3 2 2 - - - - - - - - 3 - -
CO3 K5 3 3 1 1 - - - - - - - - 3 - -
CO4 K2 3 3 1 1 - - - - - - - - 3 - -
CO5 K4 3 3 1 1 - - - - - - - - 3 - -
Correlation Level:
1. Slight (Low)
2. Moderate (Medium)
3. Substantial (High)
If there is no correlation, put “-”.
Knowledge Level Description
K6 Evaluation
K5 Synthesis
K4 Analysis
K3 Application
K2 Comprehension
K1 Knowledge
7. LECTURE PLAN
Sl. No | Topic | Number of Periods | Proposed Date | Actual Date | CO | Taxonomy Level | Mode of Delivery
1 | Compute Services, Storage Service | 1 | 22.8.2023 | | CO2 | K2 | PPT
2 | Database Services, Application Services | 1 | 23.8.2023 | | CO2 | K2 | PPT
3 | Content Delivery Services | 1 | 23.8.2023 | | CO2 | K2 | PPT
The bottommost layer in the cloud reference model is the infrastructure and facilities layer,
which includes the physical infrastructure such as datacenter facilities and electrical and
mechanical equipment. On top of the infrastructure layer is the hardware layer, which
includes physical compute, network and storage hardware. On top of the hardware layer, the
virtualization layer partitions the physical hardware resources into multiple virtual resources,
which enables pooling of resources. The various virtualization approaches are full
virtualization, para-virtualization and hardware virtualization. The computing services are
delivered in the form of Virtual Machines (VMs) along with the storage
and network resources.
The platform and middleware layer builds upon the IaaS layers below
and provides standardized stacks of services such as database services,
queuing services, application frameworks and run-time environments,
messaging services, monitoring services, analytics services, etc. The
service management layer provides APIs for requesting, managing and
monitoring cloud resources. The topmost layer is the applications layer,
which includes SaaS applications such as email, cloud storage applications,
productivity applications, management portals, customer self-service
portals, etc.
Figure 3.1 (b) shows various types of cloud services and the associated
layers in the cloud reference model.
Features
• Scalable: Compute services allow rapid provisioning of as
many virtual machine instances as required. The provisioned
capacity can be scaled up or down based on the workload
levels. Auto-scaling policies can be defined for compute
services that are triggered when the monitored metrics (such
as CPU usage, memory usage, etc.) go above pre-defined
thresholds.
• Flexible: Compute services give a wide range of options for
virtual machines with multiple instance types, operating
systems, zones/regions, etc.
• Secure: Compute services provide various security features
that control access to the virtual machine instances, such
as security groups, access control lists, network firewalls, etc.
Users can securely connect to the instances with SSH using
authentication mechanisms such as OAuth or security
certificates and keypairs.
• Cost effective: Cloud service providers offer various billing
options such as on-demand instances, which are billed per
hour, reserved instances, which are reserved after a one-time
initial payment, spot instances, for which users can place bids,
etc.
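The threshold-triggered auto-scaling described under "Scalable" can be sketched as below. This is a minimal illustration and not any provider's API: the ScalingPolicy class, its field names, the threshold values and the lower (scale-down) threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical auto-scaling policy; names and defaults are illustrative."""
    scale_up_cpu: float = 80.0    # add an instance above this CPU %
    scale_down_cpu: float = 20.0  # remove an instance below this CPU % (assumed)
    min_instances: int = 1
    max_instances: int = 10


def desired_instances(policy: ScalingPolicy, current: int, cpu_percent: float) -> int:
    """Return the instance count the policy asks for, given a monitored metric."""
    if cpu_percent > policy.scale_up_cpu:
        return min(current + 1, policy.max_instances)
    if cpu_percent < policy.scale_down_cpu:
        return max(current - 1, policy.min_instances)
    return current
```

A monitoring loop would call desired_instances periodically with the latest metric and provision or terminate instances to match the returned count.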
wizard. Security groups are used to open or block a specific network
port for the launched instances.
When the instance is launched its status can be viewed in the EC2
console. Upon launching a new instance, its state is pending. It takes a
couple of minutes for the instance to come into the running state.
When the instance comes into the running state, it is assigned a public
DNS, private DNS, public IP and private IP. The public DNS can be used
to securely connect to the instance using SSH.
connections from anywhere are enabled. To enable other connections,
additional firewall rules can be added.
Features
• Scalability: Cloud storage services provide high capacity and
scalability. Objects up to several terabytes in size can be uploaded and
multiple buckets/containers can be created in cloud storage.
• Replication: When an object is uploaded, it is replicated at
multiple facilities and/or on multiple devices within each facility.
• Access Policies: Cloud storage services provide several security features
such as Access Control Lists (ACLs), bucket/container level policies,
etc. ACLs can be used to selectively grant access permissions on
individual objects. Bucket/container level policies can also be defined
to allow or deny permissions across some or all of the objects within a
single bucket/container.
• Encryption: Cloud storage services provide Server Side
Encryption (SSE) options to encrypt all data stored in the cloud
storage.
• Consistency: Strong data consistency is provided for all upload
and delete operations. Therefore, any object that is uploaded can be
downloaded immediately after the upload is complete.
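The interplay between object-level ACLs and bucket/container level policies can be sketched as below. This is a toy model, not the API of any storage service: the Bucket class, its fields and the rule that an explicit bucket-level deny overrides an object ACL are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Toy bucket: object ACLs plus one bucket-level policy (all names hypothetical)."""
    # object key -> {user: set of permissions such as "read" / "write"}
    object_acls: dict = field(default_factory=dict)
    # bucket-wide rules: (user, permission) -> "allow" or "deny"
    policy: dict = field(default_factory=dict)

    def is_allowed(self, user: str, permission: str, key: str) -> bool:
        # Assumption: an explicit bucket-level rule takes precedence.
        rule = self.policy.get((user, permission))
        if rule == "deny":
            return False
        if rule == "allow":
            return True
        # Otherwise fall back to the ACL on the individual object.
        return permission in self.object_acls.get(key, {}).get(user, set())
```

Here an ACL grants access to one object at a time, while a single policy entry covers every object in the bucket.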
2.2.2 Google Cloud Storage
Figure 3.6 shows a screenshot of the Google Cloud Storage (GCS)
console. Objects in GCS are organized into buckets. ACLs are used to
control access to objects and buckets. ACLs can be configured to share
objects and buckets with the entire world, a Google group, a Google-
hosted domain, or specific Google account holders.
service allows storing unstructured binary data or binary large objects
(blobs). Blobs are organized into containers. Two kinds of blobs can be
stored - block blobs and page blobs. A block blob can be subdivided into
some number of blocks. If a failure occurs while transferring a block
blob, retransmission can resume with the most recent block rather than
sending the entire blob again. Page blobs are divided into a number of
pages and are designed for random access. Applications can read and
write individual pages at random in a page blob.
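The resumable transfer of a block blob can be sketched as below. This is a minimal illustration and does not use the Azure SDK: split_into_blocks, upload_blocks and the caller-supplied send function are hypothetical names, and the block size is arbitrary.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut a blob into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def upload_blocks(blocks, send, start_at=0):
    """Send blocks in order, starting from index start_at.

    `send` is a caller-supplied function (hypothetical here) that raises
    ConnectionError on failure. Returns the index of the first block that
    was NOT sent, so the caller can resume from there.
    """
    for index in range(start_at, len(blocks)):
        try:
            send(index, blocks[index])
        except ConnectionError:
            return index
    return len(blocks)
```

After a failure, calling upload_blocks again with start_at set to the returned index retransmits only the remaining blocks rather than the whole blob.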
per second (IOPS) which can be provisioned upfront.
Security: Cloud database services provide several security features to
restrict the access to the database instances and stored data, such as
network firewalls and authentication mechanisms.
the users. All stored data is automatically replicated across multiple
availability zones to provide data durability.
the entity. Figure 3.11 shows a screenshot of the Google Cloud
Datastore console.
2.3.6 Windows Azure Table Service
Windows Azure Table Service is a non-relational (NoSQL) database
service from Microsoft. The Azure Table Service data model consists of
tables having multiple entities. Tables are divided into some number of
partitions, each of which can be stored on a separate machine. Each
partition in a table holds a specified number of entities, each containing
as many as 255 properties. Each property can be one of the several
supported data types such as integers and strings. Tables do not have a
fixed schema and different entities in a table can have different
properties.
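The data model described above (partitioned tables, schema-less entities, at most 255 properties per entity) can be sketched as below. This is a toy in-memory model, not the Azure Table Service API: the Table class and the partition_key and row_key parameter names are assumptions used to address entities in the example.

```python
MAX_PROPERTIES = 255  # per-entity limit stated in the text


class Table:
    """Toy schema-less table whose entities are grouped by partition key."""

    def __init__(self):
        self.partitions = {}  # partition key -> {row key -> properties dict}

    def insert(self, partition_key: str, row_key: str, properties: dict) -> None:
        if len(properties) > MAX_PROPERTIES:
            raise ValueError("an entity can hold at most 255 properties")
        self.partitions.setdefault(partition_key, {})[row_key] = dict(properties)

    def get(self, partition_key: str, row_key: str) -> dict:
        return self.partitions[partition_key][row_key]
```

Because there is no fixed schema, two entities in the same table can be inserted with entirely different property names.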
2.4 Application Services
In this section you will learn about various cloud application
services such as application runtimes and frameworks, queuing
services, email services, notification services and media services.
App Engine features include:
• Runtimes: App Engine provides runtime environments for
applications developed in the Java, Python, PHP and Go
programming languages.
• Sandbox: Applications run in a secure sandbox environment
isolated from other applications. The sandbox environment
provides limited access to the underlying operating system.
App Engine can only execute application code called from
HTTP requests. The sandbox environment allows App Engine
to distribute web requests for the application across multiple
servers.
• Web Frameworks: App Engine provides a simple Python web
application framework called webapp2. App Engine also supports
any framework written in pure Python that speaks WSGI,
including Django, CherryPy, Pylons, web.py, and web2py.
• Datastore: App Engine provides a NoSQL data storage service.
• Authentication: App Engine applications can be integrated with
Google Accounts for user authentication.
• URL Fetch service: URL Fetch service allows applications to
access resources on the Internet, such as web services or other data.
• Email service: Email service allows applications to send email
messages.
• Image Manipulation service: Image Manipulation service
allows applications to resize, crop, rotate, flip and enhance
images.
• Memcache: Memcache service is a high-performance in-memory
key-value cache service that applications can use for
caching data items that do not need persistent storage.
• Task Queues: Task queues allow applications to do work in
the background by breaking up work into small, discrete units
called tasks, which are enqueued in task queues.
• Scheduled Tasks service: App Engine provides a Cron service
that allows applications to perform tasks at defined times or
at regular intervals.
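The idea behind task queues (work broken into small, discrete tasks that are enqueued and run later) can be sketched as below. This is a toy in-process model and not the App Engine Task Queue API: the TaskQueue class and its method names are hypothetical, and a real service would persist tasks and run them on separate workers.

```python
from collections import deque


class TaskQueue:
    """Toy in-process task queue; real services persist and distribute tasks."""

    def __init__(self):
        self._tasks = deque()

    def enqueue(self, func, *args):
        # A task is just a function plus the arguments it will be called with.
        self._tasks.append((func, args))

    def run_all(self):
        """Run queued tasks in FIFO order and return their results."""
        results = []
        while self._tasks:
            func, args = self._tasks.popleft()
            results.append(func(*args))
        return results
```

A request handler would call enqueue and return immediately, leaving the slow work to whatever process later drains the queue.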
created in ASP.NET, PHP, Node.js and Python programming languages.
Multiple copies of an application can be run in different VMs, with Web
Sites automatically load balancing requests across them.
64KB.
questionable by ISPs. SES service can be accessed and used from
the SES console, the Simple Mail Transfer Protocol (SMTP) interface,
or the SES API.
the server.
Google Cloud Messaging for Chrome is another notification service
from Google that allows messages to be delivered from the cloud to
apps and extensions running in Chrome.
Windows Azure Media Services
Windows Azure Media Services provides various media
services such as encoding and format conversion, content protection,
and on-demand and live streaming capabilities. Azure Media
Services gives applications the capability to build media
workflows for uploading, storing, encoding, format conversion,
content protection, and media delivery. To use Azure Media
Services, you create jobs that process media content in
several ways such as encoding, encrypting, doing format
conversions, etc. Each Media Services job has one or more tasks.
Each task has a preset string, an input asset and an output asset.
Media assets in Azure Media Services can be delivered either
by download or by streaming.
1. Amazon CloudFront
Amazon CloudFront is a content delivery service from Amazon.
CloudFront can be used to deliver dynamic, static and streaming
content using a global network of edge locations. The content in
CloudFront is organized into distributions. Each distribution specifies
the original location of the content to be delivered which can be an
Amazon S3 bucket, an Amazon EC2 instance, or an Elastic Load
Balancer, or your own origin server. Distributions can be accessed by
their domain names. Figure 3.16 shows a screenshot of the Amazon
CloudFront console. CloudFront helps in improving the performance of
websites in several ways: (1) by caching the static content (such as
JavaScript, CSS, images, etc.) at the edge location, (2) by proxying
requests for dynamic or interactive content back to the origin (such as
an Amazon EC2 instance) running in the AWS cloud.
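The two behaviours described above (caching static content at the edge, proxying dynamic requests back to the origin) can be sketched as below. This is a toy model, not the CloudFront API: the EdgeLocation class, the fetch_from_origin callable and the list of static file suffixes are assumptions made for illustration.

```python
STATIC_SUFFIXES = (".js", ".css", ".png", ".jpg")  # illustrative list


class EdgeLocation:
    """Toy edge node: caches static paths, proxies everything else."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: path -> response body
        self._cache = {}

    def get(self, path: str):
        if not path.endswith(STATIC_SUFFIXES):
            return self._fetch(path)  # dynamic: always go to the origin
        if path not in self._cache:
            self._cache[path] = self._fetch(path)  # first request fills the cache
        return self._cache[path]
```

Repeated requests for a static file reach the origin only once, while every dynamic request is forwarded.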
performance of web sites. Azure CDN can be enabled on a Windows
Azure storage account.
built on top of Hadoop. EMR allows you to launch an HBase
cluster. HBase can be used for various purposes such as
referencing data for Hadoop analytics, real-time log
ingestion and batch log analytics, etc.
Figure 3.17 shows a screenshot of the Amazon EMR console. The EMR
console provides a simple wizard for creating new MapReduce job
flows. To create a MapReduce job you enter the job name, select the
streaming option for the job flow, specify the locations of input,
output and the mapper and reducer programs and specify the
number of nodes to use in the Hadoop cluster and the instance sizes.
The job flow takes several minutes to launch and configure. A
Hadoop cluster is created as specified in the job flow and the
MapReduce program specified in the input is executed. On completion
of the MapReduce job, the results are copied to the output location
specified and the Hadoop cluster is terminated.
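The mapper and reducer programs supplied to a streaming job flow can be sketched with a word count, as below. This is an illustrative pair, not code from the EMR documentation: the function names are hypothetical, and a real streaming job would read lines from standard input and print tab-separated records to standard output rather than use generators.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum counts per word; input must already be sorted by key."""
    pairs = (record.split("\t") for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"
```

Hadoop sorts the mapper output by key before it reaches the reducer; calling sorted() on the mapper output plays that role when trying the functions locally.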
3. Google BigQuery
Google BigQuery is a service for querying massive datasets.
BigQuery allows querying datasets using SQL-like queries. The
BigQuery queries are run against append-only tables and use the
processing power of Google's infrastructure for speeding up queries.
To query data, it is first loaded into BigQuery using the BigQuery
console or BigQuery command line tool or BigQuery API. Data can be
either in CSV or JSON format. The uploaded data can be queried
using BigQuery's SQL dialect.
Figure 3.18: Screenshot of Amazon Elastic Beanstalk console
2.8 Identity & Access Management Services
Identity & Access Management (IDAM) services allow managing
the authentication and authorization of users to provide secure
access to cloud resources. IDAM services are useful for
organizations which have multiple users who access the cloud
resources. Using IDAM services you can manage user identifiers,
user permissions, security credentials and access keys.
2.9 Open Source Private Cloud Software
In the previous sections you learned about popular public cloud
platforms. This section covers open source cloud software that
can be used to build private clouds.
1. CloudStack
Apache CloudStack is open source cloud software that can be
used for creating private cloud offerings. CloudStack manages the
network, storage, and compute nodes that make up a cloud
infrastructure. A CloudStack installation consists of a Management
Server and the cloud infrastructure that it manages. The cloud
infrastructure can be as simple as one host running the hypervisor or
as large as a cluster of hundreds of hosts. The Management Server allows
you to configure and manage the cloud resources. Figure 3.21 shows
the architecture of CloudStack, which centers on the Management
Server. The Management Server manages one or more zones, where
each zone is typically a single datacenter. Each zone has one or more
pods. A pod is a rack of hardware comprising a switch and one or
more clusters. A cluster consists of one or more hosts and a primary
storage. A host is a compute node that runs guest virtual machines.
The primary storage of a cluster stores the disk volumes for all the
virtual machines running on the hosts in that cluster. Each zone has a
secondary storage that stores templates, ISO images, and disk
volume snapshots.
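The zone, pod, cluster and host hierarchy described above can be sketched as nested data structures, as below. This is only a model of the relationships named in the text, not CloudStack code: the class and field names are chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    guest_vms: list = field(default_factory=list)


@dataclass
class Cluster:
    primary_storage: str  # holds the disk volumes of the cluster's VMs
    hosts: list = field(default_factory=list)


@dataclass
class Pod:
    switch: str
    clusters: list = field(default_factory=list)


@dataclass
class Zone:
    name: str
    secondary_storage: str  # templates, ISO images, volume snapshots
    pods: list = field(default_factory=list)

    def all_hosts(self):
        """Flatten zone -> pod -> cluster -> host."""
        return [h for p in self.pods for c in p.clusters for h in c.hosts]
```

A Management Server would hold one or more Zone objects and walk this hierarchy to decide where a guest virtual machine can be placed.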
2. Eucalyptus
Eucalyptus is open source software for building
private and hybrid clouds that are compatible with Amazon Web
Services (AWS) APIs. Figure 3.22 shows the architecture of
Eucalyptus. The Node Controller (NC) hosts the virtual machine
instances and manages the virtual network endpoints. The cluster
level (availability zone) consists of three components: the Cluster
Controller (CC), the Storage Controller (SC) and the VMware Broker. The CC
manages the virtual machines and is the front-end for a cluster. The
SC manages the Eucalyptus block volumes and snapshots for the
instances within its cluster. The SC is equivalent to AWS Elastic
Block Store (EBS). The VMware Broker is an optional component
that provides an AWS-compatible interface for VMware
environments. At the cloud level there are two components: the Cloud
Controller (CLC) and Walrus. The CLC provides an administrative
interface for cloud management and performs high-level resource
scheduling, system accounting, authentication and quota
management.
Walrus is equivalent to Amazon S3 and serves as persistent storage
for all of the virtual machines in the Eucalyptus cloud. Walrus can be
used as a simple Storage-as-a-Service solution.
Figure 3.21: CloudStack architecture
Figure 3.22: Eucalyptus architecture
api which accepts and responds to end user compute API calls.
The OpenStack dashboard (called horizon) provides a web-based
interface for managing OpenStack services.
MATCHING PAIR PUZZLES
PICTURE JIGSAW PUZZLE
https://im-a-puzzle.com/share/8860991227f6c4c
Video Links
9 Open Source Private Cloud Software
https://www.youtube.com/watch?v=IseEhw-Dxrc
https://www.youtube.com/watch?v=bfndu5duogQ
10. Assignment
Knowledge Check:
https://awsacademy.instructure.com/courses/3781/assignments
11. Part A- Questions & Answers
Windows Azure Storage provides various storage services such as
blob storage service, table service and queue service. The blob
storage service allows storing unstructured binary data or binary
large objects (blobs). Blobs are organized into containers. Two
kinds of blobs can be stored - block blobs and page blobs. A block
blob can be subdivided into some number of blocks. If a failure
occurs while transferring a block blob, retransmission can resume
with the most recent block rather than sending the entire blob
again. Page blobs are divided into a number of pages and are
designed for random access. Applications can read and write
individual pages at random in a page blob.
the instance is available, you can note the instance endpoint
from the instance properties tab. This endpoint can then be used
for securely connecting to the instance.
12. Which cloud service is most important for
developing loosely coupled applications? [K2, CO2]
Cloud-based queuing services allow application components
to be de-coupled. The de-coupled components
communicate via messaging queues. Queues are useful for
asynchronous processing. Another use of queues is to act as
overflow buffers that absorb temporary volume spikes or
mismatches between the rates at which application components
generate and consume messages. Queuing services from various cloud
service providers allow short messages of a few kilobytes in
size. Messages can be enqueued and read from the queues
simultaneously. The enqueued messages are typically retained
for a couple of days to a couple of weeks.
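The de-coupling described in this answer can be sketched with a producer and consumers that share only a queue, as below. This is a local illustration using Python's standard library, not a cloud queuing service: run_pipeline is a hypothetical function, the in-memory queue stands in for the service, and upper-casing a message stands in for real processing.

```python
import queue
import threading


def run_pipeline(messages, worker_count=2):
    """Producer enqueues messages; workers consume them asynchronously."""
    buffer = queue.Queue()  # stands in for a cloud queuing service
    processed = []
    lock = threading.Lock()

    def worker():
        while True:
            msg = buffer.get()
            if msg is None:  # sentinel: no more work
                break
            with lock:
                processed.append(msg.upper())

    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for w in workers:
        w.start()
    for msg in messages:  # the producer never calls a consumer directly
        buffer.put(msg)
    for _ in workers:
        buffer.put(None)
    for w in workers:
        w.join()
    return processed
```

If the producer briefly outpaces the workers, messages simply wait in the queue, which is the overflow-buffer role mentioned above. The order of the results is not guaranteed because the workers run concurrently.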
15. What are the various types of MapReduce jobs
supported by Amazon EMR? [K2, CO2]
EMR supports various job types:
• Custom JAR: Custom JAR job flow runs a Java program
that you have uploaded to Amazon S3.
• Hive program: Hive is a data warehouse system for
Hadoop. You can use Hive to process data using the SQL-
like language, called Hive-QL. You can create a Hive job
flow with EMR which can either be an interactive Hive job
or a Hive script.
• Streaming job: Streaming job flow runs a single Hadoop
job consisting of map and reduce functions implemented in
a script or binary that you have uploaded to Amazon S3.
You can write map and reduce scripts in Ruby, Perl,
Python, PHP, R, Bash, or C++.
• Pig programs: Apache Pig is a platform for analyzing
large data sets that consists of a high-level language (Pig
Latin) for expressing data analysis programs, coupled with
infrastructure for evaluating these programs. You can
create a Pig job flow with EMR which can either be an
interactive Pig job or a Pig script.
• HBase: HBase is a distributed, scalable, No-SQL database
built on top of Hadoop. EMR allows you to launch an HBase
cluster. HBase can be used for various purposes such as
referencing data for Hadoop analytics, real-time log
ingestion and batch log analytics, etc.
choose a number and type of clusters to create and to manage
over time.
High Availability – BigQuery automatically replicates data
between zones to enable high availability. It also automatically
load balances to provide optimal performance and to minimize the
impact of any hardware failures. This is different from competing
solutions which typically focus on one zone only.
Google BigQuery is unique in that it takes an “Analytics-as-a-
Service” approach to the cloud.
12. Part B Questions
Q. No. | Questions | K Level | CO Mapping
1 | Explain the various layers in the cloud reference model. | K2 | CO2
2 | Describe three applications of compute services. | K2 | CO2
3 | Describe the various security mechanisms of cloud-storage services. | K3 | CO2
4 | What is a push messaging service? What are its uses? | K3 | CO2
5 | Explain a Content Delivery Network. | K3 | CO2
13. Supportive Online Certification Courses
• NPTEL:
• Cloud Computing and Distributed Systems
• https://onlinecourses.nptel.ac.in/noc23_cs27/preview
• Infosys SpringBoard:
• Introduction to Cloud Computing
• https://infyspringboard.onwingspan.com/web/en/app/toc/lex_29245015089922640000_shared/overview
14. Real time Applications
15. ASSESSMENT SCHEDULE
S.NO | Name of the Assessment | Start Date | End Date | Portion
16. Prescribed Text Books & Reference Books
TEXT BOOKS:
REFERENCES:
17. MINI PROJECT SUGGESTIONS