20CS701 CC Unit2

The document discusses various cloud computing services including compute, storage, database, application, content delivery, analytics, deployment and management services. It describes examples of services provided by Amazon, Google and Microsoft under each category of cloud services. It also shows the cloud computing reference model along with cloud service models of IaaS, PaaS and SaaS.

Uploaded by

2087 NIRMAL S

Please read this disclaimer before proceeding:

This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only to the respective group /
learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient you are
notified that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.
20CS701

CLOUD COMPUTING

Department: CSE
Batch/Year: 2020-2024/IV
Created by:
Ms. A. Jasmine Gilda, Assistant Professor/CSE
Ms. S. D. Lalitha, Assistant Professor/CSE

Date: 07.08.2023
1. CONTENTS

Sl. No. Contents

1 Contents

2 Course Objectives

3 Pre Requisites

4 Syllabus

5 Course outcomes

6 CO- PO/PSO Mapping

7 Lecture Plan

8 Activity based learning

9 Lecture Notes

10 Assignments

11 Part A Questions & Answers

12 Part B Questions

13 Supportive online Certification courses

14 Real time Applications

15 Assessment Schedule

16 Prescribed Text Books & Reference Books

17 Mini Project Suggestions


2. COURSE OBJECTIVES

✓ To understand the concepts and technologies of cloud computing.

✓ To have knowledge on the various types of cloud computing services.

✓ To describe the cloud infrastructure and virtualization.

✓ To describe high-level automation and orchestration systems that manage the
  virtualized infrastructure.

✓ To describe the programming paradigms used in cloud and how cloud software
  deployments scale to large numbers of users.


3. PRE REQUISITES

Pre-requisite Chart

20CS701 - CLOUD COMPUTING

20CS404 – OPERATING SYSTEMS

20IT403 – DATABASE MANAGEMENT SYSTEMS
4. SYLLABUS
20CS701 CLOUD COMPUTING                                   L T P C
                                                          3 0 0 3
UNIT I INTRODUCTION 10
Introduction to Cloud Computing - Definition of Cloud Computing -
Characteristics of Cloud Computing - Cloud Models - Cloud Services Examples
- Cloud-based Services & Applications.
Cloud Concepts & Technologies: Virtualization - Load Balancing - Scalability &
Elasticity – Deployment – Replication – Monitoring - Software Defined
Networking - Network Function Virtualization – MapReduce - Identity and
Access Management - Service Level Agreements – Billing.
UNIT II CLOUD SERVICES AND PLATFORMS 8
Compute Services – Storage Services – Database Services – Application
Services – Content Delivery Services – Analytics Services – Deployment and
Management Services – Identity and Access Management Services – Open
Source Private Cloud Software.
UNIT III CLOUD INFRASTRUCTURE AND VIRTUALIZATION 10
Data Center Infrastructure And Equipment – Virtual Machines – Containers –
Virtual Networks - Virtual Storage: Persistent storage – NAS Technology- SAN
Technology – Mapping virtual disks to physical disks - Object Storage.
UNIT IV AUTOMATION AND ORCHESTRATION 8
Automation - Orchestration: Automated Replication And Parallelism - The
MapReduce Paradigm: The MapReduce Programming Paradigm – Splitting
Input – Parallelism and Data size – Data access and Data Transmission –
Apache Hadoop – Parts of Hadoop – HDFS Components – Block Replication
and Fault Tolerance – HDFS and MapReduce.
UNIT V CLOUD PROGRAMMING PARADIGMS 9
Microservices - Serverless Computing And Event Processing – DevOps:
Software Creation and Development – Software Development Cycle – The
DevOps Approach – Continuous Integration – Continuous Delivery -
Deployment.
TOTAL: 45 PERIODS
5. COURSE OUTCOMES

At the end of this course, the students will be able to:

CO1: Articulate the main concepts and key technologies of cloud computing.

CO2: Learn various cloud services and platforms to cater to the requirements in
the growth of the businesses.

CO3: Develop the ability to understand the cloud infrastructure and
virtualization that help in the development of cloud.

CO4: Explain the high-level automation and orchestration systems that manage
the virtualized infrastructure.

CO5: Summarize the programming paradigms used in cloud and how cloud software
deployments scale to large numbers of users.
6. CO - PO / PSO MAPPING

          PROGRAM OUTCOMES (PO)                                                         PSO
CO   KL   PO-1  PO-2  PO-3  PO-4  PO-5      PO-6  PO-7  PO-8  PO-9  PO-10 PO-11 PO-12 PSO1  PSO2  PSO3
          K3    K4    K5    K5    K3,K4,K5  A3    A2    A3    A3    A3    A3    A2
CO1  K2   3     2     1     1     -         -     -     -     -     -     -     -     3     -     -
CO2  K4   3     3     2     2     -         -     -     -     -     -     -     -     3     -     -
CO3  K5   3     3     1     1     -         -     -     -     -     -     -     -     3     -     -
CO4  K2   3     3     1     1     -         -     -     -     -     -     -     -     3     -     -
CO5  K4   3     3     1     1     -         -     -     -     -     -     -     -     3     -     -
Correlation Level:
1. Slight (Low)
2. Moderate (Medium)
3. Substantial (High)
If there is no correlation, put “-”.
Knowledge Level Description
K6 Evaluation
K5 Synthesis
K4 Analysis
K3 Application
K2 Comprehension
K1 Knowledge
7. LECTURE PLAN

Sl.  Topic                                     Number of  Proposed               Actual Lecture  CO   Taxonomy  Mode of
No                                             Periods    Date                   Date                 Level     Delivery
1    Compute Services, Storage Service         1          22.8.2023                              CO2  K2        PPT
2    Database Services, Application Services   1          23.8.2023                              CO2  K2        PPT
3    Content Delivery Services                 1          23.8.2023                              CO2  K2        PPT
4    Analytics Services                        1          25.8.2023                              CO2  K2        PPT
5    Deployment and Management Services        2          29.8.2023 & 30.8.2023                  CO2  K2        PPT
6    Identity and Access Management Services   1          30.8.2023                              CO2  K2        PPT
7    Open Source Private Cloud Software        1          1.9.2023                               CO2  K2        PPT
8. ACTIVITY BASED LEARNING

• Matching Pairs Puzzle – Cloud Services

• Picture Jigsaw Puzzle – Open Source Cloud Architectures


9. LECTURE NOTES
UNIT II – CLOUD SERVICES AND PLATFORMS
Compute Services – Storage Services – Database Services – Application Services –
Content Delivery Services – Analytics Services – Deployment and Management
Services – Identity and Access Management Services – Open Source Private Cloud
Software.
___________________________________________________________________
This unit covers the various types of cloud computing services, including compute, storage,
database, application, content delivery, analytics, deployment & management, and identity &
access management services. For each category, examples of services offered by cloud service
providers such as Amazon, Google and Microsoft are described.
Figure 3.1 (a) shows the cloud computing reference model along with the various cloud
service models (IaaS, PaaS and SaaS). Infrastructure-as-a-Service (IaaS) provides
dynamically scalable resources using a virtualized infrastructure. Platform-as-a-Service
(PaaS) simplifies application development by providing development tools, application
programming interfaces (APIs) and software libraries that can be used for a wide range of
applications. Software-as-a-Service (SaaS) provides multi-tenant applications hosted in the
cloud.

Figure 3.1 Cloud Computing reference model & services

The bottommost layer in the cloud reference model is the infrastructure and facilities layer,
which includes the physical infrastructure such as datacenter facilities, electrical and
mechanical equipment, etc. On top of the infrastructure layer is the hardware layer, which
includes physical compute, network and storage hardware. On top of the hardware layer, the
virtualization layer partitions the physical hardware resources into multiple virtual
resources, which enables pooling of resources. The various types of virtualization
approaches are full virtualization, para-virtualization and hardware virtualization. The
computing services are delivered in the form of Virtual Machines (VMs) along with the
storage and network resources.

The platform and middleware layer builds upon the IaaS layers below
and provides standardized stacks of services such as database service,
queuing service, application frameworks and run-time environments,
messaging services, monitoring services, analytics services, etc. The
service management layer provides APIs for requesting, managing and
monitoring cloud resources. The topmost layer is the applications layer
that includes SaaS applications such as Email, cloud storage application,
productivity applications, management portals, customer self-service
portals, etc.
Figure 3.1 (b) shows various types of cloud services and the associated
layers in the cloud reference model.

2.1 Compute Services


Compute services provide dynamically scalable compute capacity in the
cloud. Compute resources can be provisioned on-demand in the form of
virtual machines. Virtual machines can be created from standard images
provided by the cloud service provider (e.g. Ubuntu image, Windows
server image, etc.) or custom images created by the users. A machine
image is a template that contains a software configuration (operating
system, application server, and applications). Compute services can be
accessed from the web consoles of these services that provide graphical
user interfaces for provisioning, managing and monitoring these
services. Cloud service providers also provide APIs for various
programming languages (such as Java, Python, etc.) that allow
developers to access and manage these services programmatically.

Features
• Scalable: Compute services allow rapidly provisioning as
many virtual machine instances as required. The provisioned
capacity can be scaled up or down based on the workload
levels. Auto-scaling policies can be defined for compute
services that are triggered when the monitored metrics (such
as CPU usage, memory usage, etc.) go above pre-defined
thresholds.
• Flexible: Compute services give a wide range of options for
virtual machines with multiple instance types, operating
systems, zones/regions, etc.
• Secure: Compute services provide various security features
that control the access to the virtual machine instances such
as security groups, access control lists, network firewalls, etc.
Users can securely connect to the instances with SSH using
authentication mechanisms such as OAuth or security
certificates and keypairs.
• Cost effective: Cloud service providers offer various billing
options such as on-demand instances which are billed per
hour, reserved instances which are reserved after a one-time
initial payment, spot instances for which users can place bids,
etc.
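
The threshold-triggered auto-scaling mentioned under "Scalable" can be sketched as follows. This is only an illustrative model, not any provider's API: the `ScalingPolicy` fields, the one-instance step size and the scale-in threshold are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical threshold-based policy; field names are illustrative."""
    metric: str               # name of the monitored metric, e.g. "cpu_percent"
    scale_out_above: float    # add an instance when the metric exceeds this
    scale_in_below: float     # remove an instance when the metric drops below this
    min_instances: int = 1
    max_instances: int = 10


def desired_instances(policy, current, metrics):
    """Return the instance count the policy asks for, given current metrics."""
    value = metrics[policy.metric]
    if value > policy.scale_out_above:
        current += 1
    elif value < policy.scale_in_below:
        current -= 1
    # Clamp the result to the configured bounds.
    return max(policy.min_instances, min(policy.max_instances, current))


policy = ScalingPolicy("cpu_percent", scale_out_above=70.0, scale_in_below=20.0)
print(desired_instances(policy, 3, {"cpu_percent": 85.0}))  # 4
print(desired_instances(policy, 3, {"cpu_percent": 10.0}))  # 2
print(desired_instances(policy, 1, {"cpu_percent": 10.0}))  # 1 (never below the minimum)
```

Real auto-scaling services typically also apply cool-down periods and evaluate metrics over a time window; those details are left out here.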

2.1.1 Amazon Elastic Compute Cloud


Amazon Elastic Compute Cloud (EC2) is a compute service provided by
Amazon. Figure 3.2 shows a screenshot of the Amazon EC2 console. To
launch a new instance, click on the launch instance button. This opens
a wizard where you can select the Amazon Machine Image (AMI) with
which you want to launch the instance. You can also create your own
AMIs with custom applications, libraries and data. Instances can be
launched with a variety of operating systems. When you launch an
instance you specify the instance type (micro, small, medium, large,
extra-large, etc.), the number of instances to launch based on the
selected AMI and the availability zones for the instances. The instance
launch wizard also allows you to specify meta-data tags for the
instance that simplify the administration of EC2 instances. When
launching a new instance, you select a keypair from the existing
keypairs or create a new keypair for the instance. Keypairs are used to
securely connect to an instance after it launches. The security groups to
be associated with the instance can be selected from the instance launch
wizard. Security groups are used to open or block specific network
ports for the launched instances.
When the instance is launched its status can be viewed in the EC2
console. Upon launching a new instance, its state is pending. It takes a
couple of minutes for the instance to come into the running state.
When the instance comes into the running state, it is assigned a public
DNS, private DNS, public IP and private IP. The public DNS can be used
to securely connect to the instance using SSH.
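
The same launch choices (AMI, instance type, count, availability zone, keypair, security groups and tags) can also be expressed programmatically. The sketch below only assembles the request; the dictionary keys follow the parameter names of the `run_instances` call in the AWS SDK for Python (boto3) as best understood here, and every identifier value (AMI ID, keypair name, security group ID) is a placeholder, not a real resource.

```python
def build_launch_request(ami_id, instance_type, count, zone, key_name,
                         security_group_ids, tags):
    """Collect the launch-wizard choices into one request dictionary."""
    return {
        "ImageId": ami_id,                        # machine image to boot from
        "InstanceType": instance_type,            # e.g. a micro or small type
        "MinCount": count,                        # launch exactly `count` instances
        "MaxCount": count,
        "Placement": {"AvailabilityZone": zone},
        "KeyName": key_name,                      # keypair used for SSH access
        "SecurityGroupIds": list(security_group_ids),
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": k, "Value": v} for k, v in sorted(tags.items())],
        }],
    }


request = build_launch_request(
    ami_id="ami-00000000",                 # placeholder AMI ID
    instance_type="t2.micro",
    count=2,
    zone="us-east-1a",
    key_name="my-keypair",                 # placeholder keypair name
    security_group_ids=["sg-00000000"],    # placeholder security group
    tags={"course": "20CS701", "unit": "2"},
)
print(request["InstanceType"], request["MinCount"])

# With AWS credentials configured, the request could then be submitted as:
#   import boto3
#   boto3.client("ec2").run_instances(**request)
```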

Figure 3.2: Screenshot of Amazon EC2 console

2.1.2 Google Compute Engine


Google Compute Engine is a compute service provided by Google.
Figure 3.3 shows a screenshot of the Google Compute Engine (GCE)
console. The GCE console allows users to create and manage compute
instances. To create a new instance, the user selects an instance
machine type, a zone in which the instance will be launched, a machine
image for the instance and provides an instance name, instance tags
and meta-data. Every instance is launched with a disk resource.
Depending on the instance type, the disk resource can be scratch
disk space or persistent disk space. The scratch disk space is deleted
when the instance terminates, whereas persistent disks live beyond
the life of an instance. The network option allows you to control the
traffic to and from the instances. By default, traffic between instances
in the same network, over any port and any protocol, and incoming SSH
connections from anywhere are enabled. To enable other connections,
additional firewall rules can be added.
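
The default network behaviour described above (internal traffic allowed, SSH allowed from anywhere, everything else blocked until a rule is added) can be modelled with a small rule matcher. This is a simplified sketch and not the GCE firewall API; the `Rule` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical allow rule; None means 'any protocol' or 'any port'."""
    protocol: Optional[str] = None
    port: Optional[int] = None
    internal_only: bool = False   # True: source must be in the same network


DEFAULT_RULES = [
    Rule(internal_only=True),        # any traffic between instances in the network
    Rule(protocol="tcp", port=22),   # incoming SSH from anywhere
]


def is_allowed(rules, protocol, port, from_same_network):
    """Return True if at least one allow rule matches the connection."""
    for rule in rules:
        if rule.internal_only and not from_same_network:
            continue
        if rule.protocol is not None and rule.protocol != protocol:
            continue
        if rule.port is not None and rule.port != port:
            continue
        return True
    return False   # nothing matched: the connection is blocked


print(is_allowed(DEFAULT_RULES, "tcp", 22, from_same_network=False))  # True
print(is_allowed(DEFAULT_RULES, "tcp", 80, from_same_network=False))  # False

# Adding a rule enables external HTTP traffic.
rules_with_http = DEFAULT_RULES + [Rule(protocol="tcp", port=80)]
print(is_allowed(rules_with_http, "tcp", 80, from_same_network=False))  # True
```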

Figure 3.3: Screenshot of Google Compute Engine console

2.1.3 Windows Azure Virtual Machines


Windows Azure Virtual Machines is the compute service from Microsoft.
Figure 3.4 shows a screenshot of Windows Azure Virtual Machines
console. To create a new instance, you select the instance type and
the machine image. You can either provide a user name and password
or upload a certificate file for securely connecting to the instance. Any
changes made to the VM are persistently stored and new VMs can be
created from the previously stored machine images.

Figure 3.4: Screenshot of Windows Azure Virtual Machines console

2.2 Storage Services


Cloud storage services allow storage and retrieval of any amount of data, at
any time from anywhere on the web. Most cloud storage services
organize data into buckets or containers. Buckets or containers store
objects which are individual pieces of data.

Features
• Scalability: Cloud storage services provide high capacity and
scalability. Objects up to several terabytes in size can be uploaded and
multiple buckets/containers can be created on cloud storages.
• Replication: When an object is uploaded it is replicated at
multiple facilities and/or on multiple devices within each facility.
• Access Policies: Cloud storage services provide several security features
such as Access Control Lists (ACLs), bucket/container level policies,
etc. ACLs can be used to selectively grant access permissions on
individual objects. Bucket/container level policies can also be defined
to allow or deny permissions across some or all of the objects within a
single bucket/container.
• Encryption: Cloud storage services provide Server Side
Encryption (SSE) options to encrypt all data stored in the cloud
storage.
• Consistency: Strong data consistency is provided for all upload
and delete operations. Therefore, any object that is uploaded can be
immediately downloaded after the upload is complete.
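
To make the bucket/object organization and the two levels of access control concrete, here is a toy in-memory model. It is not any provider's API: the `Bucket` class is hypothetical, and the rule that a bucket-level deny overrides an object-level ACL grant is an assumption chosen for the illustration.

```python
class Bucket:
    """Toy bucket: stores objects by key with per-object read ACLs."""

    def __init__(self, name):
        self.name = name
        self.objects = {}           # object key -> bytes
        self.object_acl = {}        # object key -> set of users granted read access
        self.denied_users = set()   # bucket-level policy: users denied on all objects

    def put(self, key, data, readers=()):
        """Upload an object and grant read access to the listed users."""
        self.objects[key] = data
        self.object_acl[key] = set(readers)

    def can_read(self, user, key):
        if user in self.denied_users:   # assumption: a bucket-level deny wins
            return False
        return user in self.object_acl.get(key, set())

    def get(self, user, key):
        if not self.can_read(user, key):
            raise PermissionError(f"{user} may not read {key}")
        return self.objects[key]


bucket = Bucket("course-notes")
bucket.put("unit2.pdf", b"lecture notes", readers={"alice", "bob"})
print(bucket.get("alice", "unit2.pdf"))       # b'lecture notes'

bucket.denied_users.add("bob")                # bucket-level policy overrides the ACL
print(bucket.can_read("bob", "unit2.pdf"))    # False
```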

2.2.1 Amazon Simple Storage Service


Amazon Simple Storage Service (S3) is an online cloud-based data
storage infrastructure for storing and retrieving any amount of data. S3
provides highly reliable, scalable, fast, fully redundant and affordable
storage infrastructure. Figure 3.5 shows a screenshot of the Amazon S3
console. Data stored on S3 is organized in the form of buckets. You
must create a bucket before you can store data on S3. S3 console
provides simple wizards for creating a new bucket and uploading files.
You can upload any kind of file to S3. While uploading a file you can
specify the redundancy and encryption options and access permissions.

Figure 3.5: Screenshot of Amazon S3 console

2.2.2 Google Cloud Storage
Figure 3.6 shows a screenshot of the Google Cloud Storage (GCS)
console. Objects in GCS are organized into buckets. ACLs are used to
control access to objects and buckets. ACLs can be configured to share
objects and buckets with the entire world, a Google group, a Google-
hosted domain, or specific Google account holders.

Figure 3.6: Screenshot of Google Cloud Storage console

2.2.3 Windows Azure Storage


Windows Azure Storage is the cloud storage service from Microsoft.
Figure 3.7 shows a screenshot of the Windows Azure Storage console.

Figure 3.7: Screenshot of Windows Azure Storage console


Windows Azure Storage provides various storage services such as blob
storage service, table service and queue service. The blob storage
service allows storing unstructured binary data or binary large objects
(blobs). Blobs are organized into containers. Two kinds of blobs can be
stored: block blobs and page blobs. A block blob can be subdivided into
some number of blocks. If a failure occurs while transferring a block
blob, retransmission can resume with the most recent block rather than
sending the entire blob again. Page blobs are divided into a number of
pages and are designed for random access. Applications can read and
write individual pages at random in a page blob.
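
The resumable transfer of a block blob can be illustrated with a short simulation. This is a sketch of the idea only, not the Azure Storage API; the function names, the block size and the `fail_at` parameter used to simulate a failure are all made up for the example.

```python
def split_into_blocks(data, block_size):
    """Divide a blob into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def upload_blocks(blocks, received, fail_at=None):
    """Send the blocks that are not yet in `received`.

    A failure is simulated just before the block with index `fail_at`.
    Returns True once every block has been received.
    """
    for index in range(len(received), len(blocks)):
        if index == fail_at:
            return False               # transfer interrupted
        received.append(blocks[index])
    return True


blob = b"0123456789" * 5               # 50 bytes of sample data
blocks = split_into_blocks(blob, 16)   # 4 blocks: 16 + 16 + 16 + 2 bytes
received = []

print(upload_blocks(blocks, received, fail_at=2))  # False: stopped after 2 blocks
print(len(received))                               # 2
print(upload_blocks(blocks, received))             # True: resumed at block 2
print(b"".join(received) == blob)                  # True
```

Because the receiver already holds the first two blocks, the second call sends only the remaining two instead of the whole blob.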

2.3 Database Services


Cloud database services allow you to set-up and operate relational or
non-relational databases in the cloud. The benefit of using cloud
database services is that it relieves the application developers from the
time consuming database administration tasks. Popular relational
databases provided by various cloud service providers include MySQL,
Oracle, SQL Server, etc. The non-relational (No-SQL) databases provided
by cloud service providers are mostly proprietary solutions. No-SQL
databases are usually fully-managed and deliver seamless throughput
and scalability.

Figure 3.8: Screenshot of Amazon RDS console


Features
• Scalability: Cloud database services allow provisioning as much compute
and storage resources as required to meet the application workload
levels. Provisioned capacity can be scaled up or down. For read-heavy
workloads, read-replicas can be created.
• Reliability: Cloud database services are reliable and provide
automated backup and snapshot options.
• Performance: Cloud database services provide guaranteed
performance with options such as guaranteed input/output operations
per second (IOPS) which can be provisioned upfront.
• Security: Cloud database services provide several security features to
restrict the access to the database instances and stored data, such as
network firewalls and authentication mechanisms.

2.3.1 Amazon Relational Database Service


Amazon Relational Database Service (RDS) is a web service that makes
it easy to setup, operate and scale a relational database in the cloud.
Figure 3.8 shows a screenshot of the Amazon RDS console. The console
provides an instance launch wizard that allows you to select the type of
database to create (MySQL, Oracle or SQL Server), the database instance
size, allocated storage, DB instance identifier, DB username and
password. The status of the launched DB instances can be viewed from
the console. It takes several minutes for the instance to become
available. Once the instance is available, you can note the instance end
point from the instance properties tab. This end point can then be used
for securely connecting to the instance.

Figure 3.9: Screenshot of Amazon DynamoDB console


2.3.2 Amazon DynamoDB
Amazon DynamoDB is the non-relational (No-SQL) database service
from Amazon. Figure 3.9 shows a screenshot of the Amazon DynamoDB
console. The DynamoDB data model includes tables, items and
attributes. A table is a collection of items and each item is a collection of
attributes. To store data in DynamoDB you have to create one or more
tables and specify how much throughput capacity you want to provision
and reserve for reads and writes. DynamoDB is a fully managed service
that automatically spreads the data and traffic for the stored tables over
a number of servers to meet the throughput requirements specified by
the users. All stored data is automatically replicated across multiple
availability zones to provide data durability.
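
The table/item/attribute model and the idea of spreading a table over several servers can be sketched as below. This is a toy model rather than the DynamoDB API: the `Table` class is hypothetical, and hashing the key attribute to choose a partition is an assumed strategy used only to illustrate how data might be distributed.

```python
import hashlib


class Table:
    """Toy table: items (dicts of attributes) spread over several partitions."""

    def __init__(self, name, key_attribute, num_partitions):
        self.name = name
        self.key_attribute = key_attribute
        # Each partition stands in for one server holding part of the table.
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, key):
        # Assumed strategy: hash the key so that items spread across partitions.
        digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
        return self.partitions[int(digest, 16) % len(self.partitions)]

    def put_item(self, item):
        key = item[self.key_attribute]
        self._partition_for(key)[key] = dict(item)

    def get_item(self, key):
        return self._partition_for(key).get(key)


students = Table("Students", key_attribute="roll_no", num_partitions=3)
students.put_item({"roll_no": 101, "name": "Asha", "dept": "CSE"})
students.put_item({"roll_no": 102, "name": "Ravi"})   # items need not share attributes

print(students.get_item(101)["name"])                  # Asha
print(sum(len(p) for p in students.partitions))        # 2
```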

Figure 3.10: Screenshot of Google Cloud SQL console

2.3.3 Google Cloud SQL


Google Cloud SQL is the relational database service from Google. Google
Cloud SQL service allows you to host MySQL databases in Google's cloud.
Cloud SQL provides both synchronous and asynchronous geographic
replication and the ability to import/export databases. Figure 3.10
shows a screenshot of the Google Cloud SQL console. You can create
new database instances from the console and manage existing
instances. To create a new instance you select a region, database tier,
billing plan and replication mode. You can schedule daily backups for
your Google Cloud SQL instances, and also restore backed-up
databases.

2.3.4 Google Cloud Datastore


Google Cloud Datastore is a fully managed non-relational database from
Google. Cloud Datastore offers ACID transactions and high availability of
reads and writes. The Cloud Datastore data model consists of entities.
Each entity has one or more properties (key-value pairs) which can be of
one of several supported data types, such as strings and integers. Each
entity has a kind and a key. The entity kind is used for categorizing the
entity for the purpose of queries and the entity key uniquely identifies
the entity. Figure 3.11 shows a screenshot of the Google Cloud
Datastore console.

Figure 3.11: Screenshot of Google Cloud Datastore console

Figure 3.12: Screenshot of Windows Azure SQL Database console

2.3.5 Windows Azure SQL Database


Windows Azure SQL Database is the relational database service from
Microsoft. Azure SQL Database is based on the SQL server, but it does
not give each customer a separate instance of SQL server. Instead the
SQL Database is a multi-tenant service, with a logical SQL Database
server for each customer. Figure 3.12 shows a screenshot of the
Windows Azure SQL Database console.

2.3.6 Windows Azure Table Service
Windows Azure Table Service is a non-relational (No-SQL) database
service from Microsoft. The Azure Table Service data model consists of
tables having multiple entities. Tables are divided into some number of
partitions, each of which can be stored on a separate machine. Each
partition in a table holds a specified number of entities, each containing
as many as 255 properties. Each property can be one of the several
supported data types such as integers and strings. Tables do not have a
fixed schema and different entities in a table can have different
properties.
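
A small model helps show what "no fixed schema" and the 255-property limit mean in practice. The sketch below is not the Azure Table Service API; the class and the argument names `partition_key` and `row_key` are illustrative, and counting the limit against the properties passed in is a simplifying assumption.

```python
MAX_PROPERTIES = 255   # per-entity limit quoted in the text above


class SchemalessTable:
    """Toy table: entities grouped into partitions, with no fixed schema."""

    def __init__(self):
        self.partitions = {}   # partition key -> {row key -> properties}

    def insert(self, partition_key, row_key, properties):
        if len(properties) > MAX_PROPERTIES:
            raise ValueError("an entity can hold at most 255 properties")
        rows = self.partitions.setdefault(partition_key, {})
        rows[row_key] = dict(properties)

    def get(self, partition_key, row_key):
        return self.partitions[partition_key][row_key]


table = SchemalessTable()
# Two entities in the same table with different sets of properties.
table.insert("CSE", "101", {"name": "Asha", "year": 4})
table.insert("CSE", "102", {"name": "Ravi", "email": "ravi@example.com", "cgpa": 8.5})

print(sorted(table.get("CSE", "101")))   # ['name', 'year']
print(sorted(table.get("CSE", "102")))   # ['cgpa', 'email', 'name']
```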

Figure 3.13: Screenshot of Google App Engine console

2.4 Application Services
In this section you will learn about various cloud application
services such as application runtimes and frameworks, queuing
services, email services, notification services and media services.

2.4.1 Application Runtimes & Frameworks


Cloud-based application runtimes and frameworks allow developers
to develop and host applications in the cloud. Application runtimes
provide support for programming languages (e.g., Java, Python, or
Ruby). Application runtimes automatically allocate resources for
applications and handle the application scaling, without the need to run
and maintain servers.

Google App Engine


Google App Engine is the platform-as-a-service (PaaS) from Google,
which includes both an application runtime and web frameworks.
Figure 3.13 shows a screenshot of the Google App Engine console.

App Engine features include:
• Runtimes: App Engine provides runtime environments for
applications developed in the Java, Python, PHP and Go
programming languages.
• Sandbox: Applications run in a secure sandbox environment
isolated from other applications. The sandbox environment
provides limited access to the underlying operating system.
App Engine can only execute application code called from
HTTP requests. The sandbox environment allows App Engine
to distribute web requests for the application across multiple
servers.
• Web Frameworks: App Engine provides a simple Python web
application framework called webapp2. App Engine also supports
any framework written in pure Python that speaks WSGI,
including Django, CherryPy, Pylons, web.py, and web2py.
• Datastore: App Engine provides a No-SQL data storage service.
• Authentication: App Engine applications can be integrated with
Google Accounts for user authentication.
• URL Fetch service: URL Fetch service allows applications to
access resources on the Internet, such as web services or other
data.
• Email service: Email service allows applications to send email
messages.
• Image Manipulation service: Image Manipulation service
allows applications to resize, crop, rotate, flip and enhance
images.
• Memcache: Memcache service is a high-performance in-
memory key-value cache service that applications can use for
caching data items that do not need persistent storage.
• Task Queues: Task queues allow applications to do work in
the background by breaking up work into small, discrete units
called tasks, which are enqueued in task queues.
• Scheduled Tasks service: App Engine provides a Cron service
for scheduled tasks, which allows applications to perform
tasks at defined times or regular intervals.
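
"Speaks WSGI" in the Web Frameworks item means that the application is exposed as a callable following the Python WSGI convention. The sketch below shows a minimal WSGI application and a small helper that invokes it in-process, using only the standard library; it is not App Engine specific, and the names `application` and `call_app` are chosen for this example.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI callable: returns a plain-text greeting for any request."""
    body = ("Hello from " + environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path):
    """Invoke a WSGI app in-process, the way a server or test harness would."""
    environ = {}
    setup_testing_defaults(environ)   # fill in the required WSGI variables
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(call_app(application, "/hello"))   # ('200 OK', b'Hello from /hello')
```

Any WSGI-compliant server can host such a callable, which is why frameworks built on WSGI can run on a platform that supports the interface.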

Windows Azure Web Sites


Windows Azure Web Sites is a Platform-as-a-Service (PaaS) from
Microsoft. Azure Web Sites allows you to host web applications in the
Azure cloud. Azure Web Sites provides shared and standard options. In
the shared option, Azure Web Sites run on a set of virtual machines
that may contain multiple web sites created by multiple users. In the
standard option, Azure Web Sites run on virtual machines (VMs) that
belong to an individual user. Azure Web Sites supports applications
created with ASP.NET, PHP, Node.js and Python.
Multiple copies of an application can be run in different VMs, with Web
Sites automatically load balancing requests across them.

2.4.2 Queuing Services


Cloud-based queuing services allow de-coupling application
components. The de-coupled components communicate via
messaging queues. Queues are useful for asynchronous processing.
Another use of queues is to act as overflow buffers to handle
temporary volume spikes or mismatches in message generation and
consumption rates from application components. Queuing services
from various cloud service providers allow short messages of a few
kilo-bytes in size. Messages can be enqueued and read from the
queues simultaneously. The enqueued messages are typically
retained for a couple of days to a couple of weeks.

Amazon Simple Queue Service


Amazon Simple Queue Service (SQS) is a queuing service from Amazon.
SQS is a distributed queue that supports messages of up to 256 KB in
size. SQS supports multiple writers and readers and locks messages
while they are being processed. To ensure high availability for
delivering messages, SQS trades off the first-in, first-out (FIFO)
capability and does not guarantee that messages will be delivered in
FIFO order. Applications that require FIFO ordering of messages can
place additional sequencing information in each message so that they
can be re-ordered after retrieving from a queue. Figure 3.14 shows a
screenshot of the Amazon Simple Queue Service console.
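
The re-ordering technique mentioned above, placing sequencing information in each message, can be sketched without any queue service at all. The message format (a JSON body with `seq` and `payload` fields) is an assumption for the example, and the shuffle merely stands in for out-of-order delivery.

```python
import json
import random


def make_message(sequence, payload):
    """Producer side: embed a sequence number in the message body."""
    return json.dumps({"seq": sequence, "payload": payload})


def reorder(received_bodies):
    """Consumer side: restore the original order from the sequence numbers."""
    messages = [json.loads(body) for body in received_bodies]
    messages.sort(key=lambda m: m["seq"])
    return [m["payload"] for m in messages]


sent = [make_message(i, f"event-{i}") for i in range(5)]

delivered = sent[:]
random.Random(7).shuffle(delivered)   # stand-in for delivery that is not FIFO

print(reorder(delivered))
# ['event-0', 'event-1', 'event-2', 'event-3', 'event-4']
```

A real consumer would also need to decide how long to wait for a missing sequence number before processing later messages; that policy is outside the scope of this sketch.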

Google Task Queue Service


Google Task Queues service is a queuing service from Google and is a
part of the Google App Engine platform. Task queues allow applications
to execute tasks in the background. A task is a unit of work to be
performed by an application. A task object consists of an
application-specific URL with a request handler for the task, and an
optional data payload that parameterizes the task. There are two
different configurations for Task Queues: Push Queue and Pull Queue.
Push Queue is the default queue that processes tasks based on the
processing rate configured in the queue definition. Pull Queues allow
task consumers to lease a specific number of tasks for a specific
duration. The tasks are processed and deleted before the lease ends.
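
The lease-based behaviour of a pull queue can be illustrated with a toy implementation. This is not the App Engine Task Queue API: the `PullQueue` class is hypothetical, time is passed in explicitly as `now` to keep the example deterministic, and making a task available again once its lease expires is an assumption about how undeleted tasks are handled.

```python
class PullQueue:
    """Toy pull queue: consumers lease tasks for a limited duration."""

    def __init__(self):
        self._tasks = {}         # task id -> payload
        self._lease_until = {}   # task id -> time at which the lease expires
        self._next_id = 0

    def add(self, payload):
        task_id = self._next_id
        self._next_id += 1
        self._tasks[task_id] = payload
        return task_id

    def lease(self, max_tasks, lease_seconds, now):
        """Lease up to max_tasks tasks that are not currently leased."""
        leased = []
        for task_id, payload in self._tasks.items():
            if len(leased) == max_tasks:
                break
            if self._lease_until.get(task_id, 0) <= now:
                self._lease_until[task_id] = now + lease_seconds
                leased.append((task_id, payload))
        return leased

    def delete(self, task_id):
        """Called by the consumer after it has processed the task."""
        self._tasks.pop(task_id, None)
        self._lease_until.pop(task_id, None)


queue = PullQueue()
for name in ("resize-image", "send-report", "rebuild-index"):
    queue.add(name)

first = queue.lease(max_tasks=2, lease_seconds=60, now=0)
print(first)                                                # [(0, 'resize-image'), (1, 'send-report')]
print(queue.lease(max_tasks=2, lease_seconds=60, now=10))   # [(2, 'rebuild-index')]

queue.delete(0)                                             # task 0 finished within its lease
print(queue.lease(max_tasks=5, lease_seconds=60, now=100))  # tasks 1 and 2 are available again
```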

Windows Azure Queue Service


Windows Azure Queue service is a queuing service from Microsoft.
Azure Queue service allows storing large numbers of messages that
can be accessed from anywhere in the world via authenticated calls
using HTTP or HTTPS. The size of a single message can be up to 64 KB.

Figure 3.14: Screenshot of Amazon SQS console

Figure 3.15: Screenshot of Amazon SNS console

2.4.3 Email Services


Cloud-based email services allow applications hosted in the cloud to
send emails.

Amazon Simple Email Service


Amazon Simple Email Service (SES) is a bulk and transactional
email-sending service from Amazon. SES is an outbound-only
email-sending service that allows applications hosted in the Amazon
cloud to send emails such as marketing emails, transactional emails
and other types of correspondence. To ensure high email
deliverability, SES uses content filtering technologies to scan the
outgoing email messages to help ensure that they do not contain
material that is typically flagged as questionable by ISPs. SES can be
accessed and used from the SES console, the Simple Mail Transfer
Protocol (SMTP) interface, or the SES API.

Google Email Service


Google Email service is part of the Google App Engine platform that
allows App Engine applications to send email messages on behalf of
the app's administrators, and on behalf of users with Google
Accounts. App Engine apps can also receive emails. Apps send
messages using the Mail service and receive messages in the form of
HTTP requests initiated by App Engine and posted to the app.

2.4.4 Notification Services


Cloud-based notification services or push messaging services allow
applications to push messages to internet connected smart devices
such as smartphones, tablets, etc. Push messaging services are
based on a publish-subscribe model in which consumers subscribe to
various topics/channels provided by a publisher/producer. Whenever
new content is available on one of those topics/channels, the
notification service pushes that information out to the consumer.
Push notifications are used for such smart devices as they help in
displaying the latest information while remaining energy efficient.
Consumer applications on such devices can increase their consumer
engagement with the help of push notifications.
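
The publish-subscribe model behind push messaging can be sketched in a few lines. This is an in-process illustration only, not a real notification service; the `NotificationService` class, the topic name and the callback-based delivery are assumptions for the example.

```python
from collections import defaultdict


class NotificationService:
    """Toy publish-subscribe hub: topics map to subscriber callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a consumer callback for a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Push the message to every subscriber; return the delivery count."""
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])


service = NotificationService()
phone_inbox, tablet_inbox = [], []

service.subscribe("cricket-scores", phone_inbox.append)
service.subscribe("cricket-scores", tablet_inbox.append)

print(service.publish("cricket-scores", "IND 250/4"))   # 2
print(service.publish("weather", "Rain expected"))       # 0 (no subscribers)
print(phone_inbox, tablet_inbox)                         # ['IND 250/4'] ['IND 250/4']
```

The publisher never addresses devices directly; it only names a topic, and the service pushes the message to whoever has subscribed.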

Amazon Simple Notification Service


Amazon Simple Notification Service is a push messaging service from
Amazon. SNS has two types of clients - publishers and subscribers.
Publishers communicate asynchronously with subscribers by
producing and sending messages to topics. A topic is a logical access
point and a communication channel. Subscribers are the consumers
who subscribe to topics to receive notifications. SNS can deliver
notifications as SMS, email, or to SQS queues, or any HTTP endpoint.
Figure 3.15 shows a screenshot of the Amazon Simple Notification
Service console. The SNS console has wizards for creating a new
topic, publishing to a topic and subscribing to a topic.

Google Cloud Messaging


Google Cloud Messaging for Android provides push messaging for
Android devices. GCM allows applications to send data from the
application servers to their users' Android devices, and also to receive
messages from devices on the same connection. GCM is useful for
notifying applications on Android devices that there is new data to be
fetched from the application servers. GCM supports messages with
payload data up to 4 KB. GCM provides a 'send-to-sync' message
capability that can be used to inform an application to sync data from
the server.
Google Cloud Messaging for Chrome is another notification service
from Google that allows messages to be delivered from the cloud to
apps and extensions running in Chrome.

Windows Azure Notification Hubs


Windows Azure Notification Hubs is a push notification service from
Microsoft that provides a common interface to send notifications to all
major mobile platforms including Windows Store/Windows Phone 8,
iOS, and Android. Platform specific infrastructures called Platform
Notification Systems (PNS) are used to deliver notification messages.
Devices register their PNS handles with the Notification Hub. Each
notification hub contains credentials for each supported PNS. These
credentials are used to connect to the PNSs and send push notifications
to the applications.

2.4.5 Media Services


Cloud service providers provide various types of media services
that can be used by applications for manipulating, transforming
or transcoding media such as images, videos, etc.

Amazon Elastic Transcoder


Amazon Elastic Transcoder is a cloud-based video transcoding
service from Amazon. Elastic Transcoder can be used to convert
video files from their source format into various other formats
that can be played on devices such as desktops, mobiles, tablets,
etc. Elastic Transcoder provides a number of pre-defined
transcoding presets. Transcoding pipelines are used to perform
multiple transcodes in parallel. Elastic Transcoder works with the
Amazon S3 storage where the input and output video files are stored.
Users can create transcoding jobs by specifying the input and output
locations (on S3), preset to use, and optional thumbnails and job
specific parameters such as frame-rate.
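
To make the shape of a transcoding job concrete, here is a hedged sketch that models a job as a small record and a pipeline as a thread pool that processes several jobs in parallel. The field names, preset names and object keys are hypothetical, and `run_job` only reports what would be done instead of transcoding anything.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TranscodeJob:
    input_key: str                      # source video location (e.g. an object key)
    output_key: str                     # where the transcoded file should be written
    preset: str                         # name of a pre-defined transcoding preset
    frame_rate: Optional[str] = None    # optional job-specific parameter
    thumbnails: bool = False            # whether thumbnails are requested


def run_job(job: TranscodeJob) -> str:
    """Stand-in for real transcoding: only describes the work."""
    return f"{job.input_key} -> {job.output_key} [{job.preset}]"


def run_pipeline(jobs: List[TranscodeJob], workers: int = 4) -> List[str]:
    """Process jobs in parallel; results come back in submission order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_job, jobs))


jobs = [
    TranscodeJob("in/talk.mov", "out/talk-720p.mp4", "generic-720p"),
    TranscodeJob("in/talk.mov", "out/talk-mobile.mp4", "mobile-small", frame_rate="30"),
]
results = run_pipeline(jobs)
print(results)
```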

Google Images Manipulation Service


Google Images Manipulation service is a part of the Google App
Engine platform. Image Manipulation service provides the
capability to resize, crop, rotate, flip and enhance images. The
Images service can accept image data directly from the App
Engine apps, or from Google Blobstore or Google Cloud Storage.
The Images service accepts images in various formats including JPEG,
PNG, WEBP, GIF, BMP, TIFF and ICO, and can return
transformed images in JPEG, WEBP and PNG formats.

Windows Azure Media Services


Windows Azure Media Services provides various media
services such as encoding & format conversion, content protection
and on-demand and live streaming capabilities. Azure Media
Services provides applications the capability to build media
workflows for uploading, storing, encoding, format conversion,
content protection, and media delivery. To use Azure Media
Services, you can create jobs that process media content in
several ways such as encoding, encrypting, doing format
conversions, etc. Each Media Services job has one or more tasks.
Each task has a preset string, an input asset and an output asset.
Media assets in the Azure Media Service can be delivered either
by download or by streaming.

2.5 Content Delivery Services


Cloud-based content delivery services include Content Delivery
Networks (CDNs). A CDN is a distributed system of servers located
across multiple geographic locations to serve content to end-users
with high availability and high performance. CDNs are useful for
serving static content such as text, images, scripts, etc., and
streaming media. CDNs have a number of edge locations deployed
in multiple locations, often over multiple backbones. Requests for
static or streaming media content that is served by a CDN are
directed to the nearest edge location. CDNs cache the popular
content on the edge servers which helps in reducing bandwidth
costs and improving response times.
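
The edge-caching behaviour described above can be sketched as follows. This is a simplified model under stated assumptions: "nearest" is approximated by lowest measured latency, each edge keeps a small least-recently-used cache, and the edge names, paths and origin function are invented for the example.

```python
from collections import OrderedDict
from typing import Callable, Dict, List


class EdgeServer:
    """Toy edge location with a small least-recently-used cache."""

    def __init__(self, name: str, fetch_from_origin: Callable[[str], bytes], capacity: int = 2) -> None:
        self.name = name
        self._fetch = fetch_from_origin
        self._capacity = capacity
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        if path in self._cache:
            self.hits += 1
            self._cache.move_to_end(path)      # mark as most recently used
            return self._cache[path]
        self.misses += 1
        content = self._fetch(path)            # cache miss: go back to the origin
        self._cache[path] = content
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)    # evict the least recently used entry
        return content


def nearest_edge(latency_ms: Dict[str, float]) -> str:
    """Pick the edge with the lowest latency (an assumed notion of 'nearest')."""
    return min(latency_ms, key=latency_ms.get)


origin_requests: List[str] = []


def origin(path: str) -> bytes:
    origin_requests.append(path)
    return f"content of {path}".encode()


edges = {name: EdgeServer(name, origin) for name in ("mumbai", "frankfurt")}
chosen = nearest_edge({"mumbai": 18.0, "frankfurt": 120.0})
edge = edges[chosen]
for path in ["/logo.png", "/app.js", "/logo.png"]:
    edge.get(path)
print(chosen, edge.hits, edge.misses, origin_requests)
```

The second request for `/logo.png` is served from the edge cache, so the origin sees only two requests for three client requests, which is where the bandwidth and response-time savings come from.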

2.5.1 Amazon CloudFront
Amazon CloudFront is a content delivery service from Amazon.
CloudFront can be used to deliver dynamic, static and streaming
content using a global network of edge locations. The content in
CloudFront is organized into distributions. Each distribution specifies
the original location of the content to be delivered which can be an
Amazon S3 bucket, an Amazon EC2 instance, or an Elastic Load
Balancer, or your own origin server. Distributions can be accessed by
their domain names. Figure 3.16 shows a screenshot of the Amazon
CloudFront console. CloudFront helps in improving the performance of
websites in several ways: (1) by caching the static content (such as
JavaScript, CSS, images, etc.) at the edge location, (2) by proxying
requests for dynamic or interactive content back to the origin (such as
an Amazon EC2 instance) running in the AWS cloud.

2.5.2 Windows Azure Content Delivery Network


Windows Azure Content Delivery Network (CDN) is the content
delivery service from Microsoft. Azure CDN caches Windows Azure
blobs and static content at the edge locations to improve the
performance of web sites. Azure CDN can be enabled on a Windows
Azure storage account.

Figure 3.16: Screenshot of Amazon CloudFront console


2.6 Analytics Services


Cloud-based analytics services allow analyzing massive data sets stored
in the cloud, either in cloud storage or in cloud databases, using
programming models such as MapReduce. Using cloud analytics
services, applications can perform data-intensive tasks such as
data mining, log file analysis, machine learning, web indexing, etc.

2.6.1 Amazon Elastic MapReduce


Amazon Elastic MapReduce is the MapReduce service from Amazon,
based on the Hadoop framework running on Amazon EC2 and Amazon
S3. EMR supports various job types:
• Custom JAR: Custom JAR job flow runs a Java program
that you have uploaded to Amazon S3.
• Hive program: Hive is a data warehouse system for
Hadoop. You can use Hive to process data using the SQL-
like language, called Hive-QL. You can create a Hive job
flow with EMR which can either be an interactive Hive job
or a Hive script.
• Streaming job: Streaming job flow runs a single Hadoop
job consisting of map and reduce functions implemented in
a script or binary that you have uploaded to Amazon S3.
You can write map and reduce scripts in Ruby, Perl,
Python, PHP, R, Bash, or C++.
• Pig programs: Apache Pig is a platform for analyzing
large data sets that consists of a high-level language (Pig
Latin) for expressing data analysis programs, coupled with
infrastructure for evaluating these programs. You can
create a Pig job flow with EMR which can either be an
interactive Pig job or a Pig script.
• HBase: HBase is a distributed, scalable, No-SQL database
built on top of Hadoop. EMR allows you to launch an HBase
cluster. HBase can be used for various purposes such as
referencing data for Hadoop analytics, real-time log
ingestion and batch log analytics, etc.

Figure 3.17: Screenshot of Amazon EMR console

Figure 3.17 shows a screenshot of the Amazon EMR console. The EMR
console provides a simple wizard for creating new MapReduce job
flows. To create a MapReduce job you enter the job name, select the
streaming option for the job flow, specify the locations of input,
output and the mapper and reducer programs and specify the
number of nodes to use in the Hadoop cluster and the instance sizes.
The job flow takes several minutes to launch and configure. A
Hadoop cluster is created as specified in the job flow and the
MapReduce program specified in the input is executed. On completion
of the MapReduce job, the results are copied to the output location
specified and the Hadoop cluster is terminated.
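
For a streaming job flow, the mapper and reducer are ordinary scripts that exchange tab-separated key/value lines. The sketch below shows a word-count mapper and reducer in that style and runs them locally; the `run_locally` helper is an assumption added for illustration, with `sorted` standing in for the shuffle-and-sort step that Hadoop performs between the two phases.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts of consecutive lines that share the same key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Local stand-in for a job flow: map, sort by key, then reduce."""
    shuffled = sorted(mapper(lines))
    return list(reducer(shuffled))


counts = run_locally(["the cloud scales", "the cloud"])
print(counts)
```

In an actual streaming job the two functions would live in separate scripts that read standard input and write standard output, since that is the interface the streaming option expects.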

2.6.2 Google MapReduce Service


Google MapReduce Service is a part of the App Engine platform. App
Engine MapReduce is optimized for the App Engine environment and
provides capabilities such as automatic sharding for faster execution,
standard data input readers for iterating over blob and datastore
data, standard output writers, etc. The MapReduce service can be
accessed using the Google MapReduce API. To execute a MapReduce
job, a MapReduce pipeline object is instantiated within the App Engine
application. The MapReduce pipeline specifies the mapper, reducer, data
input reader, and output writer.

2.6.3 Google BigQuery
Google BigQuery is a service for querying massive datasets.
BigQuery allows querying datasets using SQL-like queries. The
BigQuery queries are run against append-only tables and use the
processing power of Google's infrastructure for speeding up queries.
To query data, you first load it into BigQuery using the BigQuery
console, the BigQuery command line tool, or the BigQuery API. Data can be
in either CSV or JSON format. The uploaded data can be queried
using BigQuery's SQL dialect.
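
The load-then-query workflow can be illustrated locally. The sketch below is not BigQuery: it uses Python's built-in `sqlite3` module as a stand-in so that it runs anywhere, and the table name, column names and CSV rows are invented. It only shows the pattern of appending CSV rows to a table and then running an aggregate SQL query over them.

```python
import csv
import io
import sqlite3

# Made-up CSV data standing in for an uploaded file.
CSV_DATA = """page,country,visits
/home,IN,120
/home,US,80
/pricing,IN,30
/pricing,US,45
"""


def load_csv(conn: sqlite3.Connection, text: str) -> int:
    """Create the table and append every CSV row; return the row count."""
    rows = list(csv.DictReader(io.StringIO(text)))
    conn.execute("CREATE TABLE page_visits (page TEXT, country TEXT, visits INTEGER)")
    conn.executemany(
        "INSERT INTO page_visits VALUES (:page, :country, :visits)", rows
    )
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_csv(conn, CSV_DATA)

# Aggregate query over the loaded rows.
totals = conn.execute(
    "SELECT page, SUM(visits) AS total FROM page_visits "
    "GROUP BY page ORDER BY total DESC"
).fetchall()
print(loaded, totals)
```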

2.6.4 Windows Azure HDInsight


Windows Azure HDInsight is an analytics service from Microsoft.
HDInsight deploys and provisions Hadoop clusters in the Azure cloud
and makes Hadoop available as a service. HDInsight Service uses
Windows Azure Blob Storage as the default file system. HDInsight
provides interactive consoles for both JavaScript and Hive.

2.7 Deployment & Management Services


Cloud-based deployment & management services allow you to easily
deploy and manage applications in the cloud. These services
automatically handle deployment tasks such as capacity provisioning,
load balancing, auto-scaling, and application health monitoring.

2.7.1 Amazon Elastic Beanstalk


Amazon provides a deployment service called Elastic Beanstalk that
allows you to quickly deploy and manage applications in the AWS
cloud. Elastic Beanstalk supports Java, PHP, .NET, Node.js, Python,
and Ruby applications. With Elastic Beanstalk you just need to upload
the application and specify configuration settings in a simple wizard
and the service automatically handles instance provisioning, server
configuration, load balancing and monitoring. Figure 3.18 shows a
screenshot of the Amazon Elastic Beanstalk console. The launch
wizard allows you to specify the environment details such as name,
URL, application file, container type, instance type, etc. When the
environment is launched Elastic Beanstalk automatically creates a
new load balancer, launches and configures application and database
servers as specified in the launch wizard, and deploys the application
package on the application servers. The load balancer sits in front of
the application servers which are a part of an Amazon Auto Scaling
group. If the load on the application increases, Auto Scaling
automatically launches new application servers to handle the
increased load. If the load decreases, Auto Scaling stops additional
instances and leaves at least one instance running.
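
The scale-out and scale-in behaviour can be sketched as a single sizing function. This is a simplified assumption for illustration: real Auto Scaling groups act on configured policies and metrics, whereas the function below derives the instance count directly from request rate and an assumed per-instance capacity, clamped between a minimum and a maximum.

```python
import math


def desired_instances(
    requests_per_second: float,
    capacity_per_instance: float,
    min_instances: int = 1,
    max_instances: int = 10,
) -> int:
    """Return how many application servers the group should run."""
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Never drop below the minimum (at least one server stays up)
    # and never exceed the configured maximum.
    return max(min_instances, min(max_instances, needed))


busy = desired_instances(450, 100)      # load increases: scale out
idle = desired_instances(0, 100)        # load drops: keep the minimum
capped = desired_instances(5000, 100)   # demand beyond the maximum
print(busy, idle, capped)
```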

Figure 3.18: Screenshot of Amazon Elastic Beanstalk console

Figure 3.19: Screenshot of Amazon CloudFormation console

2.7.2 Amazon CloudFormation


Amazon CloudFormation is a deployment management service from
Amazon. With CloudFormation you can create deployments from a
collection of AWS resources such as Amazon Elastic Compute Cloud,
Amazon Elastic Block Store, Amazon Simple Notification Service,
Elastic Load Balancing and Auto Scaling. A collection of AWS
resources that you want to manage together is organized into a
stack. CloudFormation stacks are created from CloudFormation
templates. You can create your own templates or use the predefined
templates. The AWS infrastructure requirements for the stack are
specified in the template. Figure 3.19 shows a screenshot of the
Amazon CloudFormation console.
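
As a rough illustration of what a template contains, the sketch below builds a small template as a Python dictionary and serializes it to JSON. The logical resource names, the description and the image ID are placeholders invented for this example; the instance type simply reuses the t1.micro type mentioned elsewhere in these notes.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Illustrative stack: one EC2 instance and one SNS topic",
    "Resources": {
        # Logical names such as "WebServer" are chosen by the template author.
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": "t1.micro",
                "ImageId": "ami-00000000",  # placeholder, not a real image ID
            },
        },
        "AlertsTopic": {"Type": "AWS::SNS::Topic"},
    },
}

# The JSON text is what would be supplied when creating a stack.
template_body = json.dumps(template, indent=2)
resource_types = sorted(r["Type"] for r in template["Resources"].values())
print(resource_types)
```

Because the whole stack is described in one document, the same template can be reused to create identical copies of the deployment.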

2.8 Identity & Access Management Services
Identity & Access Management (IDAM) services allow managing
the authentication and authorization of users to provide secure
access to cloud resources. IDAM services are useful for
organizations which have multiple users who access the cloud
resources. Using IDAM services you can manage user identifiers,
user permissions, security credentials and access keys.

Figure 3.20: Screenshot of Amazon IAM console

2.8.1 Amazon Identity & Access Management


AWS Identity and Access Management (IAM) allows you to
manage users and user permissions for an AWS account. With
IAM you can manage users, security credentials such as access
keys, and permissions that control which AWS resources users
can access. Using IAM you can control what data users can
access and what resources users can create. IAM also allows you
to control the creation, rotation, and revocation of users' security
credentials. Figure 3.20 shows a screenshot of the Amazon Identity &
Access Management console.
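
To show how permissions can control which resources a user may access, here is a deliberately simplified evaluator for a policy document with Allow and Deny statements. It is a sketch under assumptions: it handles only Effect, Action and Resource, uses case-sensitive wildcard matching, and the bucket name and paths are hypothetical. It does follow the basic rule that access is denied by default and that an explicit Deny overrides any Allow.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List


def _as_list(value: Any) -> List[str]:
    """Policy fields may hold one string or a list of strings."""
    return value if isinstance(value, list) else [value]


def is_allowed(policy: Dict[str, Any], action: str, resource: str) -> bool:
    """Default deny; an explicit Deny wins over any matching Allow."""
    allowed = False
    for statement in policy["Statement"]:
        action_match = any(fnmatchcase(action, p) for p in _as_list(statement["Action"]))
        resource_match = any(fnmatchcase(resource, p) for p in _as_list(statement["Resource"]))
        if action_match and resource_match:
            if statement["Effect"] == "Deny":
                return False
            allowed = True
    return allowed


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:Get*", "Resource": "arn:aws:s3:::reports/*"},
        {"Effect": "Deny", "Action": "s3:*", "Resource": "arn:aws:s3:::reports/private/*"},
    ],
}

can_read = is_allowed(policy, "s3:GetObject", "arn:aws:s3:::reports/2023.csv")
can_read_private = is_allowed(policy, "s3:GetObject", "arn:aws:s3:::reports/private/salary.csv")
can_write = is_allowed(policy, "s3:PutObject", "arn:aws:s3:::reports/2023.csv")
print(can_read, can_read_private, can_write)
```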

2.8.2 Windows Azure Active Directory


Windows Azure Active Directory is an Identity & Access Management
Service from Microsoft. Azure Active Directory provides a cloud-based
identity provider that integrates with your on-premises
Active Directory deployments and also provides support for third-party
identity providers. By integrating your on-premises Active
Directory, you can authenticate users to Windows Azure with their
existing corporate credentials. With Azure Active Directory you can
control access to your applications in Windows Azure.

2.9 Open Source Private Cloud Software
In the previous sections you learned about popular public cloud
platforms. This section covers open source cloud software that
can be used to build private clouds.

2.9.1 CloudStack
Apache CloudStack is open source cloud software that can be
used for creating private cloud offerings. CloudStack manages the
network, storage, and compute nodes that make up a cloud
infrastructure. A CloudStack installation consists of a Management
Server and the cloud infrastructure that it manages. The cloud
infrastructure can be as simple as one host running the hypervisor or
a large cluster of hundreds of hosts. The Management Server allows
you to configure and manage the cloud resources. Figure 3.21 shows
the architecture of CloudStack, which is organized around the Management
Server. The Management Server manages one or more zones where
each zone is typically a single datacenter. Each zone has one or more
pods. A pod is a rack of hardware comprising a switch and one or
more clusters. A cluster consists of one or more hosts and a primary
storage. A host is a compute node that runs guest virtual machines.
The primary storage of a cluster stores the disk volumes for all the
virtual machines running on the hosts in that cluster. Each zone has a
secondary storage that stores templates, ISO images, and disk
volume snapshots.
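
The zone, pod, cluster and host hierarchy maps naturally onto nested records. The sketch below models it with dataclasses; the class layout follows the description above, while the instance names and storage labels are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    name: str                       # compute node that runs guest virtual machines


@dataclass
class Cluster:
    name: str
    primary_storage: str            # holds disk volumes for VMs in this cluster
    hosts: List[Host] = field(default_factory=list)


@dataclass
class Pod:
    name: str                       # typically one rack with a switch
    clusters: List[Cluster] = field(default_factory=list)


@dataclass
class Zone:
    name: str                       # typically one datacenter
    secondary_storage: str          # templates, ISO images, volume snapshots
    pods: List[Pod] = field(default_factory=list)

    def host_names(self) -> List[str]:
        """Flatten the hierarchy into a list of host names."""
        return [h.name for p in self.pods for c in p.clusters for h in c.hosts]


zone = Zone(
    name="zone-1",
    secondary_storage="nfs-secondary",
    pods=[
        Pod("pod-1", [Cluster("cluster-1", "primary-1", [Host("host-a"), Host("host-b")])]),
        Pod("pod-2", [Cluster("cluster-2", "primary-2", [Host("host-c")])]),
    ],
)
print(zone.host_names())
```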

2.9.2 Eucalyptus
Eucalyptus is open source software for building
private and hybrid clouds that are compatible with Amazon Web
Services (AWS) APIs. Figure 3.22 shows the architecture of
Eucalyptus. The Node Controller (NC) hosts the virtual machine
instances and manages the virtual network endpoints. The cluster
level (availability zone) consists of three components: Cluster
Controller (CC), Storage Controller (SC) and VMware Broker. The CC
manages the virtual machines and is the front-end for a cluster. The
SC manages the Eucalyptus block volumes and snapshots for the
instances within its specific cluster. SC is equivalent to AWS Elastic
Block Store (EBS). The VMware Broker is an optional component
that provides an AWS-compatible interface for VMware
environments. At the cloud-level there are two components - Cloud
Controller (CLC) and Walrus. CLC provides an administrative
interface for cloud management and performs high-level resource
scheduling, system accounting, authentication and quota
management.
Walrus is equivalent to Amazon S3 and serves as a persistent storage
to all of the virtual machines in the Eucalyptus cloud. Walrus can be
used as a simple Storage-as-a-Service solution.

Figure 3.21: CloudStack architecture

Figure 3.22: Eucalyptus architecture

Figure 3.23: OpenStack architecture


2.9.3 OpenStack
OpenStack is a cloud operating system comprising a collection
of interacting services that control computing, storage, and
networking resources. Figure 3.23 shows the architecture of
OpenStack. The OpenStack compute service (called nova-
compute) manages networks of virtual machines running on
nodes, providing virtual servers on demand. The network service
(called nova-networking) provides connectivity between the
interfaces of other OpenStack services. The volume service
(Cinder) manages storage volumes for virtual machines. The
object storage service (swift) allows users to store and retrieve
files. The identity service (keystone) provides authentication and
authorization for other services. The image registry (glance) acts
as a catalog and repository for virtual machine images. The
OpenStack scheduler (nova-scheduler) maps the nova-API calls to the
appropriate OpenStack components. The scheduler takes the virtual
machine requests from the queue and determines where they
should run. The messaging service (rabbit-mq) acts as a central
node for message passing between daemons. Orchestration
activities such as running an instance are performed by nova-api,
which accepts and responds to end user compute API calls.
The OpenStack dashboard (called horizon) provides a web-based
interface for managing OpenStack services.
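
The scheduler's job of deciding where a virtual machine request should run can be sketched as a filter step followed by a ranking step. The code below is an illustrative model only, not nova-scheduler code: the host attributes, the "most free RAM wins" ranking rule and the host names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ComputeHost:
    name: str
    free_ram_mb: int
    free_vcpus: int


def schedule(hosts: List[ComputeHost], ram_mb: int, vcpus: int) -> Optional[ComputeHost]:
    """Place one VM request, or return None if no host can take it."""
    # Filter: keep only hosts with enough free resources.
    candidates = [h for h in hosts if h.free_ram_mb >= ram_mb and h.free_vcpus >= vcpus]
    if not candidates:
        return None
    # Rank: prefer the host with the most free RAM (an assumed policy).
    chosen = max(candidates, key=lambda h: h.free_ram_mb)
    chosen.free_ram_mb -= ram_mb
    chosen.free_vcpus -= vcpus
    return chosen


hosts = [
    ComputeHost("node1", free_ram_mb=4096, free_vcpus=2),
    ComputeHost("node2", free_ram_mb=8192, free_vcpus=4),
    ComputeHost("node3", free_ram_mb=16384, free_vcpus=1),
]

first = schedule(hosts, ram_mb=2048, vcpus=2)     # node3 is filtered out: too few vCPUs
too_big = schedule(hosts, ram_mb=32768, vcpus=1)  # no host has this much RAM
print(first.name, too_big)
```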

MATCHING PAIR PUZZLES

PICTURE JIGSAW PUZZLE

https://im-a-puzzle.com/share/8860991227f6c4c

Video Links

Sl. No.  Topic                                    Video Link
1        Compute Services                         https://www.youtube.com/watch?v=r6Q_QfmYpLQ
2        Storage Service                          https://www.youtube.com/watch?v=8ti6yyBBJhk
3        Database Services                        https://www.youtube.com/watch?v=1pkdKCB-NBA
4        Application Services                     https://www.youtube.com/watch?v=70PyEXKqUIs
5        Content Delivery Services                https://www.youtube.com/watch?v=4A7IS5ns8S4
6        Analytics Services                       https://www.youtube.com/watch?v=JUQXx0R0RfE
7        Deployment and Management Services       https://www.youtube.com/watch?v=6yZKs6HfRDQ
8        Identity and Access Management Services  https://www.youtube.com/watch?v=o0p04B7-NFY
9        Open Source Private Cloud Software       https://www.youtube.com/watch?v=IseEhw-Dxrc
                                                  https://www.youtube.com/watch?v=bfndu5duogQ

10. Assignment

Complete the assignments in AWS Academy Cloud Foundations Modules 5 to 7

Knowledge Check:

https://awsacademy.instructure.com/courses/3781/assignments

11. Part A- Questions & Answers

1. What are the various layers in the cloud reference model? [K2, CO2]
• Infrastructure / Facilities
• Hardware Layer
• Virtualization Layer
• Virtual Machines
• Platform and Middleware
• Service Management Layer (APIs)
• Applications

2. List the features of compute services. [K2, CO2]


• Scalable, Flexible, Secure, Cost Effective

3. Briefly discuss the features of Amazon's compute service EC2. [K2, CO2]
• The simple web interface of Amazon EC2 allows you to obtain
and configure capacity with minimal friction.
• It provides you with complete control of your computing
resources and lets you run on Amazon’s proven computing
environment.
• Amazon EC2 reduces the time required to obtain and boot new
server instances (called Amazon EC2 instances) to minutes,
allowing you to quickly scale capacity, both up and down, as your
computing requirements change.
• Amazon EC2 changes the economics of computing by allowing
you to pay only for the compute capacity that you actually consume.
• Amazon EC2 provides developers and system administrators the
tools to build failure resilient applications and isolate themselves
from common failure scenarios.
• Amazon EC2 Auto Scaling helps you maintain application
availability and allows you to automatically add or remove EC2
instances according to conditions you define.

4. List the features of storage services. [K2, CO2]


Scalability, Replication, Access Policies, Encryption, Consistency

5. What are the various storage services provided by Microsoft Azure? [K2, CO2]
Windows Azure Storage provides various storage services such as
blob storage service, table service and queue service. The blob
storage service allows storing unstructured binary data or binary
large objects (blobs). Blobs are organized into containers. Two
kinds of blobs can be stored - block blobs and page blobs. A block
blob can be subdivided into some number of blocks. If a failure
occurs while transferring a block blob, retransmission can resume
with the most recent block rather than sending the entire blob
again. Page blobs are divided into a number of pages and are
designed for random access. Applications can read and write
individual pages at random in a page blob.
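
The resume-on-failure behaviour of block blobs can be illustrated with an in-memory stand-in. Everything here is hypothetical: `FlakyBlobStore` is not an Azure client, it simply fails once on a chosen block so that the upload can be restarted from that block instead of from the beginning.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Divide a blob into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class FlakyBlobStore:
    """In-memory stand-in for a blob service that fails once on one block."""

    def __init__(self, fail_on_block: int) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.put_calls = 0
        self._fail_on_block = fail_on_block
        self._failed = False

    def put_block(self, index: int, block: bytes) -> None:
        self.put_calls += 1
        if index == self._fail_on_block and not self._failed:
            self._failed = True
            raise ConnectionError(f"transfer of block {index} interrupted")
        self.blocks[index] = block


def upload(store: FlakyBlobStore, blocks: List[bytes], start_at: int = 0) -> int:
    """Upload blocks from start_at; return the index of the first block not stored."""
    for index in range(start_at, len(blocks)):
        try:
            store.put_block(index, blocks[index])
        except ConnectionError:
            return index
    return len(blocks)


data = b"0123456789"
blocks = split_into_blocks(data, block_size=4)
store = FlakyBlobStore(fail_on_block=1)

stopped_at = upload(store, blocks)                      # fails while sending block 1
finished_at = upload(store, blocks, start_at=stopped_at)  # resume from block 1
reassembled = b"".join(store.blocks[i] for i in sorted(store.blocks))
print(stopped_at, finished_at, store.put_calls)
```

Block 0 is sent only once: the retry starts at the block that failed, which is the saving the block structure is meant to provide.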

6. What are the differences between SQL and NoSQL databases? [K2, CO2]
SQL                                                       NoSQL
Relational database management system (RDBMS)             Non-relational or distributed database system
Fixed, static or predefined schema                        Dynamic schema
Not suited for hierarchical data storage                  Best suited for hierarchical data storage
Best suited for complex queries                           Not so good for complex queries
Vertically scalable                                       Horizontally scalable
Follows ACID properties                                   Follows CAP (consistency, availability, partition tolerance)
Examples: MySQL, PostgreSQL, Oracle, MS-SQL Server, etc.  Examples: MongoDB, HBase, Neo4j, Cassandra, etc.

7. What are the features of Database Services? [K2, CO2]


Scalability, Reliability, Performance, Security

8. Enumerate the features of Amazon Relational Database Service. [K2, CO2]
Amazon Relational Database Service (RDS) is a web service that
makes it easy to set up, operate and scale a relational database in
the cloud. Figure 3.8 shows a screenshot of the Amazon RDS
console. The console provides an instance launch wizard that
allows you to select the type of database to create (MySQL,
Oracle or SQL Server), database instance size, allocated storage,
DB instance identifier, DB username and password. The status of
the launched DB instances can be viewed from the console. It
takes several minutes for the instance to become available. Once
the instance is available, you can note the instance end point
from the instance properties tab. This end point can then be used
for securely connecting to the instance.

9. Enumerate the features of Amazon DynamoDB. [K2, CO2]
Amazon DynamoDB is the non-relational (No-SQL) database
service from Amazon. Figure 3.9 shows a screenshot of the
Amazon DynamoDB console. The DynamoDB data model
includes tables, items and attributes. A table is a collection of
items and each item is a collection of attributes. To store data in
DynamoDB you have to create one or more tables and specify
how much throughput capacity you want to provision and reserve
for reads and writes. DynamoDB is a fully managed service that
automatically spreads the data and traffic for the stored tables
over a number of servers to meet the throughput requirements
specified by the users. All stored data is automatically replicated
across multiple availability zones to provide data durability.

10. Enumerate the features of Google Cloud SQL. [K2, CO2]
Google Cloud SQL is the relational database service from Google.
The Google Cloud SQL service allows you to host MySQL databases in
Google's cloud. Cloud SQL provides both synchronous and
asynchronous geographic replication and the ability to import/
export databases. Figure 3.10 shows a screenshot of the Google
Cloud SQL console. You can create new database instances from
the console and manage existing instances. To create a new
instance you select a region, database tier, billing plan and
replication mode. You can schedule daily backups for your Google
Cloud SQL instances, and also restore backed-up databases.

11. What is the benefit of using a sandbox environment for a PaaS? [K2, CO2]
Applications run in a secure sandbox environment isolated
from other applications. The sandbox environment provides
limited access to the underlying operating system. App Engine
can only execute application code called from HTTP requests.
The sandbox environment allows App Engine to distribute web
requests for the application across multiple servers.

12. Which cloud service is most important for developing loosely coupled applications? [K2, CO2]
Cloud-based queuing services allow de-coupling
application components. The de-coupled components
communicate via messaging queues. Queues are useful for
asynchronous processing. Another use of queues is to act as
overflow buffers to handle temporary volume spikes or
mismatches in message generation and consumption rates from
application components. Queuing services from various cloud
service providers allow short messages of a few kilobytes in
size. Messages can be enqueued and read from the queues
simultaneously. The enqueued messages are typically retained
for a couple of days to a couple of weeks.
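
The decoupling described above can be demonstrated on a single machine with Python's thread-safe queue. This is a local analogy under stated assumptions, not a cloud queuing API: the producer and consumer run as threads in one process, the message names are invented, and `None` is used as a sentinel to signal that no more messages will arrive.

```python
import queue
import threading
from typing import List


def producer(q: "queue.Queue", count: int) -> None:
    """Enqueue messages without knowing who will process them."""
    for i in range(count):
        q.put(f"order-{i}")
    q.put(None)  # sentinel: tells the consumer to stop


def consumer(q: "queue.Queue", processed: List[str]) -> None:
    """Process messages at its own pace, independently of the producer."""
    while True:
        message = q.get()
        if message is None:
            break
        processed.append(message.upper())


orders: "queue.Queue" = queue.Queue(maxsize=100)  # bounded buffer absorbs short spikes
processed: List[str] = []

worker = threading.Thread(target=consumer, args=(orders, processed))
worker.start()
producer(orders, 5)
worker.join()
print(processed)
```

Neither side calls the other directly; they share only the queue, so either component can be replaced or scaled without changing the other.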

13. What is a push messaging service? What are its uses? [K2, CO2]
Cloud-based notification services or push messaging services
allow applications to push messages to internet connected
smart devices such as smartphones, tablets, etc. Push
messaging services are based on the publish-subscribe model in
which consumers subscribe to various topics/channels provided
by a publisher/producer. Whenever new content is available on
one of those topics/channels, the notification service pushes
that information out to the consumer. Push notifications are
used for such smart devices as they help in displaying the
latest information while remaining energy efficient. Consumer
applications on such devices can increase their consumer
engagement with the help of push notifications.

14.What is a Content Delivery Network? [K2, CO2]


Cloud-based content delivery services include Content
Delivery Networks (CDNs). A CDN is a distributed system of
servers located across multiple geographic locations to serve
content to end-users with high availability and high
performance. CDNs are useful for serving static content
such as text, images, scripts, etc., and streaming media.
CDNs have a number of edge locations deployed in multiple
locations, often over multiple backbones. Requests for static
or streaming media content that is served by a CDN are
directed to the nearest edge location. CDNs cache the
popular content on the edge servers which helps in
reducing bandwidth costs and improving response times.

15. What are the various types of MapReduce jobs
supported by Amazon EMR? [K2, CO2]
EMR supports various job types:
• Custom JAR: Custom JAR job flow runs a Java program
that you have uploaded to Amazon S3.
• Hive program: Hive is a data warehouse system for
Hadoop. You can use Hive to process data using the SQL-
like language, called Hive-QL. You can create a Hive job
flow with EMR which can either be an interactive Hive job
or a Hive script.
• Streaming job: Streaming job flow runs a single Hadoop
job consisting of map and reduce functions implemented in
a script or binary that you have uploaded to Amazon S3.
You can write map and reduce scripts in Ruby, Perl,
Python, PHP, R, Bash, or C++.
• Pig programs: Apache Pig is a platform for analyzing
large data sets that consists of a high-level language (Pig
Latin) for expressing data analysis programs, coupled with
infrastructure for evaluating these programs. You can
create a Pig job flow with EMR which can either be an
interactive Pig job or a Pig script.
• HBase: HBase is a distributed, scalable, No-SQL database
built on top of Hadoop. EMR allows you to launch an HBase
cluster. HBase can be used for various purposes such as
referencing data for Hadoop analytics, real-time log
ingestion and batch log analytics, etc.

16. List a few benefits of using Google BigQuery. [K2, CO2]
Distributed architecture – Google distributes the computing
used by BigQuery across compute resources dynamically which
means that you do not have to manage compute clusters.
Competing offerings typically require custom sizing (and pricing)
of specific compute clusters, and this can change over time which
can be challenging.
Flexible pricing options – Since Google dynamically allocates
resources, prices are dynamic too. Google offers a pay-as-you-go
option where you pay for the data imported into BigQuery
and then a per-query cost. As part of this approach, Google provides
a reporting tool that gives added visibility into usage and cost
trends. Fixed pricing is also an option for larger users.
Fully managed – Because BigQuery is a fully managed service,
the backend configuration and tuning is handled by Google. This
is much simpler than competing solutions that require you to
choose a number and type of clusters to create and to manage
over time.
High Availability – BigQuery automatically replicates data
between zones to enable high availability. It also automatically
load balances to provide optimal performance and to minimize the
impact of any hardware failures. This is different from competing
solutions which typically focus on one zone only.
Google BigQuery is unique in that it takes an “Analytics-as-a-
Service” approach to the cloud.

12. Part B Questions

Q. No.  Questions                                                              K Level  CO Mapping
1       Explain the various layers in the cloud reference model.               K2       CO2
2       Describe three applications of compute services.                       K2       CO2
3       Describe the various security mechanisms of cloud-storage services.    K3       CO2
4       What is a push messaging service? What are its uses?                   K3       CO2
5       Explain a Content Delivery Network.                                    K3       CO2
6       What are the various types of MapReduce jobs supported by Amazon EMR?  K3       CO2
7       Explain the various Cloud services provided by Amazon.                 K3       CO2
8       Explain the various Cloud services provided by Google.                 K2       CO2
9       Explain the various Cloud services provided by Microsoft.              K2       CO2

13. Supportive Online Certification Courses

• Coursera - Cloud Computing Specialization
  • Cloud Computing Concepts, Part 1
  • Cloud Computing Concepts: Part 2
  • Cloud Computing Applications, Part 1: Cloud Systems and Infrastructure
  • Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud
  • Cloud Networking
  • Cloud Computing Project
  • https://www.coursera.org/specializations/cloud-computing#courses

• NPTEL:
  • Cloud Computing and Distributed Systems
  • https://onlinecourses.nptel.ac.in/noc23_cs27/preview

• Infosys SpringBoard:
  • Introduction to Cloud Computing
  • https://infyspringboard.onwingspan.com/web/en/app/toc/lex_29245015089922640000_shared/overview

14. Real time Applications

1. Online Data Storage


2. Backup and Recovery
3. Big Data Analysis
4. Testing and Development
5. Antivirus Applications
6. E-commerce Application
7. Cloud Computing in Education

15. ASSESSMENT SCHEDULE

• Tentative schedule for the assessments during the 2023-2024 Odd Semester

S.NO  Name of the Assessment  Start Date  End Date    Portion
1     IAT 1                   09.08.2023  15.08.2023  UNIT 1 & 2
2     IAT 2                   26.10.2023  01.11.2023  UNIT 3 & 4
3     REVISION                09.11.2023  14.10.2023  UNIT 5, 1 & 2
4     MODEL                   15.11.2023  25.11.2023  ALL 5 UNITS

16. Prescribed Text Books & Reference Books

TEXT BOOKS:

1. Arshdeep Bahga, Vijay Madisetti, “Cloud Computing: A Hands-on
Approach”, Universities Press Private Limited, 2014. (Unit 1, 2)
2. Douglas E. Comer, “The Cloud Computing Book: The future of
computing explained”, CRC Press, 2021. (Unit 3, 4, 5)

REFERENCES:

1. Rajkumar Buyya, Christian Vecchiola, S. Thamarai Selvi,
“Mastering Cloud Computing”, Tata McGraw Hill, 2017. (Unit 1)
2. Kai Hwang, Geoffrey C. Fox, Jack G. Dongarra, “Distributed and Cloud
Computing, From Parallel Processing to the Internet of Things”, Morgan
Kaufmann Publishers, 2012.
3. Toby Velte, Anthony Velte, Robert Elsenpeter, “Cloud Computing
- A Practical Approach”, Tata McGraw Hill, 2009.
4. George Reese, “Cloud Application Architectures: Building Applications
and Infrastructure in the Cloud: Transactional Systems for EC2 and
Beyond (Theory in Practice)”, O'Reilly, 2009.
5. https://cloud.google.com/docs
6. https://www.cloudskillsboost.google/course_templates/153
7. https://nptel.ac.in/courses/106105223

17. MINI PROJECT SUGGESTIONS

In this project you will create an Amazon EC2 instance, set up
a web server on the instance, and associate an Amazon
Elastic IP address with the instance. Follow the steps below:
• Create an Amazon Web Services account.
• Log into the AWS account and open the Amazon
EC2 console. Click on the Launch Instances button.
• Select an 'Ubuntu Server' AMI. In the create
instance wizard, select the t1.micro instance type.
Proceed with the instance creation wizard with
default settings. On the create key-pair page, create
a new key-pair. On the security groups page, create
a new security group. In the security group, create
a custom TCP rule for port 80. Proceed with the
wizard and launch the instance.
• View the instance status in the console and wait until
the status becomes running. When the status
becomes running, note the public DNS of the
instance from the console.
• Connect to the instance using SSH. You will need
the key-pair you specified while creating the
instance. Use the following command for SSH:
ssh -i /path/to/myKeyPair.pem ubuntu@publicDNS

• After connecting to the EC2 instance, install Apache
server using the following command:

sudo apt-get install apache2

• Create a web page as follows:

cd /var/www/html
sudo vim index.html

• Restart the Apache server as follows:

sudo /etc/init.d/apache2 restart
• In a browser open the public DNS of the EC2
instance and see the web page you just created.
• Now to associate an Elastic IP address with the
instance, go back to the EC2 console and click on
the Elastic IPs link. Click on the Allocate New Address
button and allocate a new address. You can see the
allocated Elastic IP address in the console. Right-click
on it and choose Associate from the menu.
Choose the EC2 instance you created previously and
associate it with the Elastic IP address. Now enter
the Elastic IP address in a browser and view the
web page you created.
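
The web page step above opens vim interactively. As a minimal sketch, the same page could instead be written non-interactively with a shell here-document. The DOCROOT variable, the page text and the script name make_page.sh are assumptions made for this illustration and are not part of the original steps. DOCROOT defaults to a temporary directory so the script can be tried safely on any machine.

```shell
#!/bin/sh
# Sketch: write a minimal index.html without opening an editor.
# DOCROOT is a hypothetical variable introduced for this example.
# On the EC2 instance it would be set to /var/www/html; when unset,
# a fresh temporary directory is used instead.
DOCROOT="${DOCROOT:-$(mktemp -d)}"

# The quoted 'EOF' delimiter keeps the shell from expanding anything
# inside the page body.
cat > "$DOCROOT/index.html" <<'EOF'
<!DOCTYPE html>
<html>
  <head><title>EC2 Mini Project</title></head>
  <body><h1>Hello from my EC2 instance</h1></body>
</html>
EOF

echo "Wrote $DOCROOT/index.html"
```

On the instance, assuming the script is saved as make_page.sh, it could be run as `sudo env DOCROOT=/var/www/html sh make_page.sh`, since writing to /var/www/html normally requires root privileges.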

Thank you