Cloud Computing Notes
Cloud Computing:
Instead of storing files on a storage device or hard drive, a user can save them in the
cloud, making it possible to access the files from anywhere, as long as they have
access to the web. The services hosted in the cloud can be broadly divided into
infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-
a-service (SaaS). Based on the deployment model, the cloud can also be classified as
public, private, and hybrid cloud.
Further, cloud can be divided into two different layers, namely, front-end and
back-end. The layer with which users interact is called the front-end layer. This
layer enables a user to access the data that has been stored in cloud through cloud
computing software.
The layer made up of software and hardware, i.e., the computers, servers, central
servers, and databases, is the back-end layer. This layer is the primary component
of cloud and is entirely responsible for storing information securely. To ensure
seamless connectivity between devices linked via cloud computing, the central
servers use a software called middleware that acts as a bridge between the database
and applications.
Cloud Architecture:
The back end refers to the cloud architecture components that
make up the cloud itself, including computing resources, storage, security
mechanisms, management, and more.
Application:
Service:
Runtime cloud:
Storage:
Infrastructure:
Security:
Cloud architecture, on the other hand, is the plan that dictates how cloud
resources and infrastructure are organized.
The back end contains all the cloud computing resources, services, data storage,
and applications offered by a cloud service provider. A network is used to connect
the frontend and backend cloud architecture components, enabling data to be sent
back and forth between them. When users interact with the front end (or client-side
interface), it sends queries to the back end using middleware where the service
model carries out the specific task or request.
# Cloud computing provides a natural platform for SOA, as it allows for the
deployment and management of services in a flexible and scalable manner.
# Cloud services can be used as building blocks for applications, and SOA
principles can be used to design and implement these applications.
# Examples include using cloud services for data storage, processing, and
analytics, and building applications that leverage these services.
Easier Updates: Updates to one service don’t break others, making it reliable and
consistent.
Quick Fixes: Issues can be resolved quickly by fixing the affected service rather
than the entire system.
Regular Updates: Services can be updated regularly without downtime, allowing users
to continue their work without any stoppage
Web Services :
Web services are software functions or applications that are available over the
internet and use standard protocols like HTTP to communicate. They allow different
systems to talk to each other, even if they are built using different languages or
platforms. A client invokes a web service by submitting an XML request, to which
the service responds with an XML response.
A web service is a set of open protocols and standards that allow data exchange
between different applications or systems.
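A minimal Python sketch of this request/response pattern: the client POSTs an XML request to a hypothetical service endpoint and parses the XML that comes back (the URL and message format here are illustrative assumptions, not a real service).

```python
# Calling an XML-based web service over HTTP with only the standard library.
import urllib.request
import xml.etree.ElementTree as ET

request_xml = """<?xml version="1.0"?>
<getWeather>
    <city>London</city>
</getWeather>"""

req = urllib.request.Request(
    url="https://example.com/weather-service",   # hypothetical endpoint
    data=request_xml.encode("utf-8"),
    headers={"Content-Type": "text/xml"},
    method="POST",
)

with urllib.request.urlopen(req) as resp:
    response_xml = resp.read().decode("utf-8")

# Parse the XML response returned by the service
root = ET.fromstring(response_xml)
print(root.tag)
```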
XML (Extensible Markup Language) : XML (Extensible Markup Language) plays a crucial
role in cloud computing by providing a standard way to exchange and manage data
across different platforms and applications. It's used for storing, transporting,
and sharing data in a flexible and extensible manner, making it well-suited for the
dynamic environment of the cloud.
WSDL stands for Web Services Description Language. A web service cannot be used if
it can't be found: the implementing client has to know where the web service is
located and, to invoke the correct web service, the client application has to
understand what the web service does. This is done with the help of the Web
Services Description Language (WSDL).
UDDI (Universal Description, Discovery, and Integration) is an XML-based standard
that enables businesses to publish and find information about web services.
Essentially, it acts as a registry or directory for web services, allowing
businesses to list their services and find potential partners.
Publishing
Finding
Describing
Think of it like a Yellow Pages for web services. Just like you look up a business
in a directory, applications can look up web services in UDDI.
REST (Representational State Transfer):
REST is an architectural style used to design lightweight, fast, scalable, and
stateless web services that allow different applications or systems to communicate
over the Internet. It uses standard HTTP methods like GET, POST, PUT, and DELETE to
perform operations on resources (such as data objects like users, files, or
products).
Unlike SOAP, REST is not a protocol, but a set of guidelines for building APIs.
So:
💡 "SOAP sends the message, WSDL describes the message, and UDDI helps you find the
service."
Cloud computing can either be classified based on the deployment model or the type
of service. Based on the specific deployment model, we can classify cloud as
public, private, and hybrid cloud. At the same time, it can be classified as
infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-
a-service (SaaS) based on the service the cloud model offers.
The types of services available to use vary depending on the cloud-based delivery
model or service model you have chosen. There are three main cloud computing
service models:
Infrastructure as a service (IaaS): Instead of the user, a third-party vendor hosts
the hardware, software, servers, storage, and other infrastructure components. The
vendor also hosts the user’s applications and maintains a backup.
Examples:
Microsoft Azure
Key Features:
Platform as a service (PaaS): This model offers a computing platform with all the
underlying infrastructure and software tools needed to develop, run, and manage
applications.
PaaS doesn’t require users to manage the underlying infrastructure, i.e., the
network, servers, operating systems, or storage, but gives them control over the
deployed applications. This allows organizations to focus on the deployment and
management of their applications by freeing them of the responsibility of software
maintenance, planning, and resource procurement.
Examples:
Heroku
Key Features:
Ideal for developers who need to focus on writing code without managing servers or
networking
Types of PaaS
Several types of PaaS are currently available to developers. They are:
1. Public PaaS
Designed for public cloud use, offering control over software while the provider
manages IT infrastructure. Suitable for small-medium businesses but less favored by
large organizations due to compliance issues.
2. Private PaaS
Aims to provide the agility of public PaaS while maintaining the security,
compliance, and cost advantages of a private data center. A private PaaS is usually
delivered as an appliance or software inside the customer’s firewall, typically
maintained in a data center on the organization’s premises. A private PaaS can be
built on any infrastructure and runs within the organization’s own private cloud.
3. Hybrid PaaS
Combines public PaaS and private PaaS, offering the convenience of the virtually
unlimited capacity of public PaaS and the cost-effectiveness of owning internal
infrastructure with private PaaS. Hybrid PaaS uses a hybrid cloud.
Software as a service (SaaS): This model offers cloud-based applications that are
delivered and maintained by the service provider, eliminating the need for end
users to deploy software locally.
SaaS or software as a service allows users to access a vendor’s software on cloud
on a subscription basis. In this type of cloud computing, users don’t need to
install or download applications on their local devices. Instead, the applications
are located on a remote cloud network that can be directly accessed through the web
or an API.
In the SaaS model, the service provider manages all the hardware, middleware,
application software, and security. Also referred to as ‘hosted software’ or ‘on-
demand software’, SaaS makes it easy for enterprises to streamline their
maintenance and support.
Examples:
Salesforce
Key Features:
Types of SaaS:
Single-Tenant Architecture :
In this type of SaaS platform architecture, each customer or tenant has their own
“instance” of the software, which runs on a separate server and is supported by its
own dedicated infrastructure and database. Thus, there is no sharing of resources
between tenants, and each customer’s information is kept separate from that of
other customers.
With this setup comes greater control and customization capabilities, but it might
be more expensive for the provider to maintain since they need to manage multiple
“instances” of the software.
Multi-Tenant Architecture :
Here, the data of each tenant is kept separate and secure but they share the same
app, database, and infrastructure. Unlike the single-tenant version, this setup is
more efficient and cost-effective but typically offers less control and
customization to individual clients.
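A conceptual sketch (no particular vendor's design) of multi-tenant data isolation: all tenants share one database and schema, and every query is scoped by a tenant_id column so one customer never sees another's rows. Table and tenant names are illustrative.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoices (tenant_id TEXT, invoice_no TEXT, amount REAL)")
db.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [("acme", "INV-1", 100.0), ("acme", "INV-2", 250.0), ("globex", "INV-1", 75.0)],
)

def invoices_for(tenant_id):
    # Every read is filtered by tenant_id, which is the isolation boundary
    return db.execute(
        "SELECT invoice_no, amount FROM invoices WHERE tenant_id = ?", (tenant_id,)
    ).fetchall()

print(invoices_for("acme"))    # only acme's rows
print(invoices_for("globex"))  # only globex's rows
```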
There are three main types of cloud architecture you can choose from: public,
private, and hybrid.
Types of cloud computing based on deployment:
1) Public cloud architecture: Public clouds can help businesses save on purchasing,
managing, and maintaining on-premises infrastructure since the cloud service
provider is responsible for managing the system. They also offer scalable RAM and
flexible bandwidth, making it easier for businesses to scale their storage needs.
2) Private cloud architecture refers to a dedicated cloud that is owned and managed
by your organization.
It is privately hosted on-premises in your own data center, providing more control
over resources and more security over data and infrastructure.
However, this architecture is considerably more expensive and requires more IT
expertise to maintain.
In a private cloud, the computing services are offered over a private IT network
for the dedicated use of a single organization.
Also termed internal, enterprise, or corporate cloud, a private cloud is usually
managed via internal resources and is not accessible to anyone outside the
organization. Private cloud computing provides all the benefits of a public cloud,
such as self-service, scalability, and elasticity, along with additional control,
security, and customization.
Private clouds provide a higher level of security through company firewalls and
internal hosting to ensure that an organization’s sensitive data is not accessible
to third-party providers. The drawback of private cloud, however, is that the
organization becomes responsible for all the management and maintenance of the data
centers, which can prove to be quite resource-intensive.
3) Hybrid cloud architecture uses both public and private cloud architecture to
deliver a flexible mix of cloud services. A hybrid cloud allows you to migrate
workloads between environments, allowing you to use the services that best suit
your business demands and the workload. Hybrid cloud architectures are often the
solution of choice for businesses that need control over their data but also want
to take advantage of public cloud offerings.
Hybrid cloud uses a combination of public and private cloud features. The “best of
both worlds” cloud model allows a shift of workloads between private and public
clouds as the computing and cost requirements change. When the demand for computing
and processing fluctuates, hybrid cloud allows businesses to scale their on-
premises infrastructure up to the public cloud to handle the overflow while
ensuring that no third-party data centers have access to their data.
In a hybrid cloud model, companies only pay for the resources they use temporarily
instead of purchasing and maintaining resources that may not be used for an
extended period. In short, a hybrid cloud offers the benefits of a public cloud
without its security risks.
On-Demand Self-Service
➤ Users can access computing resources (like storage, servers) whenever they want
without needing human help.
Resource Pooling
➤ Cloud providers serve many users by sharing physical and virtual resources
dynamically.
Security
➤ Data is protected with encryption, firewalls, and access controls (though
security is a shared responsibility).
Data Storage
➤ Store large amounts of data in the cloud (like Google Drive, AWS S3).
Application Hosting
➤ Host websites, mobile apps, or full software systems (like apps on AWS or
Heroku).
Content Delivery
➤ Deliver websites, videos, and other content faster through CDNs (Content Delivery
Networks).
Utility Computing :
Utility computing is a cloud computing model where you pay only for what you use,
just like a utility service.
The concept of utility computing is simple—it provides processing power when you
need it, where you need it, and at the cost of how much you use it.
You don’t buy hardware or software — instead, you rent computing resources like:
CPU power
Storage
Memory
Network bandwidth
Perfect for websites, apps, and businesses that experience variable traffic.
Amazon EC2 (Elastic Compute Cloud):
Instead of buying and managing your own servers, EC2 gives you a virtual machine
where you can run websites, apps, or even big data tasks. You can choose how much
memory, storage, and processing power you need, and stop it when you’re done. EC2
offers security, reliability, high performance, and cost-effective infrastructure
to meet demanding business needs.
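A hedged sketch of this pay-for-what-you-use model using boto3 (the AWS SDK for Python). It assumes AWS credentials are already configured; the AMI ID is a placeholder, so pick a real image and instance type for your region.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Start a virtual machine with the CPU/memory profile you choose (t3.micro here)
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = resp["Instances"][0]["InstanceId"]
print("Launched", instance_id)

# Stop paying for the resource when you are done with it
ec2.terminate_instances(InstanceIds=[instance_id])
```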
Mashup :
It’s like mixing two apps or APIs together to make something useful and unique!
They allow information to be viewed from different perspectives and combine data
from multiple sources into a single integrated tool. It is done using a web
application that takes information from one or more sources and presents it in a
new way or with a different graphical user interface.
Data Mashups
A data mashup focuses on combining data from multiple sources into a unified view
or interface. In this type, data is pulled from different databases or APIs and
presented in a single application, often for analysis or visualization purposes.
For example, combining public health data with geographic information to create a
real-time map of disease outbreaks. The goal is to aggregate diverse data points to
provide a comprehensive view of the subject.
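A minimal sketch of a data mashup in Python: records are pulled from two hypothetical JSON APIs (health cases and clinic locations) and merged into one view keyed by region. The URLs and field names are illustrative assumptions, not real services.

```python
import json
import urllib.request

def fetch_json(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

cases = fetch_json("https://health.example.com/api/cases")     # [{"region": ..., "cases": ...}]
clinics = fetch_json("https://geo.example.com/api/clinics")    # [{"region": ..., "lat": ..., "lon": ...}]

clinics_by_region = {c["region"]: c for c in clinics}

# Combine the two sources into a single integrated view per region
mashup = [{**row, **clinics_by_region.get(row["region"], {})} for row in cases]
print(mashup)
```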
Application Mashups
Application mashups integrate the functionality of different software applications
into a single user interface. These mashups often pull in different services,
enabling users to interact with features from multiple applications without
switching between platforms. A common example is a customer relationship management
(CRM) tool that integrates email, social media, and calendar functionalities into
one dashboard, enhancing productivity and user experience.
Business Mashups
Business mashups are tailored specifically to meet organizational or enterprise
needs, often involving both data and application mashups to create more complex and
functional systems. These mashups are used to streamline business processes by
combining various internal and external services, such as inventory management,
customer data, financial reporting, and supplier information. The goal is to
improve decision-making, operational efficiency, and data transparency within the
business ecosystem.
4. Vendor Lock-in
Once you start using one cloud provider (like AWS), it becomes hard to switch to
another.
5. Lack of Control
Users don’t have full control over infrastructure (because it’s managed by the
provider).
6. Performance Issues
Shared cloud resources can cause slow performance, especially during peak usage.
7. Misconfiguration
Incorrect settings (like open ports, wrong permissions) can lead to security risks.
8. Compliance Issues
Different countries have different data laws (like GDPR in Europe).
Benefits:
Proactive problem detection: Identifies and resolves issues before they impact end-
users.
Improved security: Helps identify and address security threats and vulnerabilities.
Optimized performance: Ensures applications and infrastructure are performing
optimally.
Cost optimization: Helps identify areas where cloud resources can be optimized to
reduce costs.
Compliance: Facilitates compliance with industry regulations and standards.
Cost Monitoring: Tracks and analyzes cloud spending to identify areas for
optimization.
1. Encryption:
Data at Rest:
Encrypting data while it's stored on cloud servers is crucial to prevent
unauthorized access, even if the storage is compromised.
Data in Transit:
Encrypting data while it's being transmitted between users and cloud servers (or
between different cloud locations) protects it from eavesdropping and interception.
Protocols:
Secure protocols like HTTPS, SSL/TLS, and VPNs are used to encrypt communication
channels.
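A short sketch of encrypting data at rest before it is written to cloud storage, assuming the third-party cryptography package is installed; data in transit would instead rely on HTTPS/TLS in the client library.

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # in practice, keep this in a key management service
f = Fernet(key)

plaintext = b"customer record: account 4421, balance 1050.00"
ciphertext = f.encrypt(plaintext)  # this is what actually gets stored at rest

# Only a holder of the key can recover the original data
assert f.decrypt(ciphertext) == plaintext
print(ciphertext[:32], b"...")
```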
3. Network Security:
Firewalls:
Firewalls act as a barrier, filtering network traffic and blocking unauthorized
access to cloud resources.
VPNs:
Virtual Private Networks (VPNs) create secure, encrypted tunnels for communication,
protecting data transmitted over public networks.
Intrusion Detection/Prevention Systems (IDS/IPS):
These systems monitor network traffic for malicious activity and can take action to
block threats.
Network Segmentation:
Dividing the cloud network into smaller, isolated segments can limit the scope of a
potential breach.
Round Robin:
Distributes traffic evenly across all available servers in a sequence.
Least Connection:
Routes traffic to the server with the fewest active connections, which helps
balance the workload among servers.
Resource-Based:
Distributes traffic based on server capacity, such as CPU or memory utilization.
Request-Based:
Distributes traffic based on the type of request, allowing for specialized handling
of different types of traffic.
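A toy Python sketch of two of the algorithms above, round robin and least connection; the server names and connection counts are placeholders.

```python
import itertools

servers = ["srv-a", "srv-b", "srv-c"]

# Round robin: each new request goes to the next server in the sequence
rr = itertools.cycle(servers)
round_robin_choices = [next(rr) for _ in range(5)]   # srv-a, srv-b, srv-c, srv-a, srv-b

# Least connection: route to the server with the fewest active connections
active_connections = {"srv-a": 12, "srv-b": 3, "srv-c": 7}
least_loaded = min(active_connections, key=active_connections.get)

print(round_robin_choices, least_loaded)
```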
Network Load Balancing: This technique is used to balance the network traffic
across multiple servers or instances. It is implemented at the network layer and
ensures that the incoming traffic is distributed evenly across the available
servers.
Application Load Balancing: This technique is used to balance the workload across
multiple instances of an application. It is implemented at the application layer
and ensures that each instance receives an equal share of the incoming requests.
Database Load Balancing: This technique is used to balance the workload across
multiple database servers. It is implemented at the database layer and ensures that
the incoming queries are distributed evenly across the available database servers.
1. Load Balancers
A load balancer is like a traffic police — it decides where to send each incoming
request.
It distributes traffic across multiple servers so no server gets too much load.
Example: AWS Elastic Load Balancer (ELB), Azure Load Balancer, Google Cloud Load
Balancing
2. Auto-Scaling
Cloud systems can automatically add or remove servers based on traffic.
These can be moved, cloned, or restarted easily if one becomes slow or crashes.
If one server is too busy, the system shifts some tasks to other less busy servers.
Resource Optimization :
Right-sizing:
Selecting the appropriate size and type of cloud resources (e.g., virtual machines,
storage, databases) for each workload to avoid overprovisioning and
underutilization.
Resource Allocation:
Matching cloud resources to application and workload requirements in real-time,
considering factors like performance, cost, and scalability.
Cost Optimization:
Identifying and eliminating unused or underutilized resources, negotiating
discounts with cloud providers, and optimizing storage and computing tiers to
reduce expenses.
Performance Enhancement:
Optimizing network configurations, database settings, and application architectures
to improve response times, throughput, and overall application performance.
Scalability:
Dynamically adjusting cloud resources to accommodate fluctuating workloads,
ensuring applications can handle peak loads without performance degradation.
Automation:
Leveraging automation tools and scripts to manage resource provisioning, scaling,
and monitoring, reducing manual effort and improving efficiency.
1) Virtual machine :
A virtual machine is a software-defined computer that runs on a physical computer
with a separate operating system and computing resources.
The physical computer is called the host machine and virtual machines are guest
machines.
Multiple virtual machines can run on a single physical machine.
Virtual machines are abstracted from the computer hardware by a hypervisor.
2) Hypervisor :
The hypervisor is a software component that manages multiple virtual machines in a
computer.
It ensures that each virtual machine gets the allocated resources and does not
interfere with the operation of other virtual machines.
It acts as a middle layer between the hardware and the virtual machines.
Allows multiple operating systems to run at the same time (like Windows + Linux on
one system)
Virtualization:
Resource Management:
Isolation:
Scalability:
Types of hypervisor:
Type 1 (bare-metal) hypervisors run directly on the physical hardware, while Type 2
(hosted) hypervisors run as an application on top of a host operating system.
Cost Efficiency:
Virtualization reduces the need for physical hardware, leading to cost savings in
terms of purchasing and maintaining servers. It also reduces the energy consumption
and space required for physical servers.
Isolation:
Virtual machines are isolated from each other, meaning that one VM’s issues, such
as crashes or security breaches, do not affect other VMs running on the same
physical machine. This enhances security and system stability.
Disaster Recovery:
Faster Deployment:
New virtual machines can be created and deployed quickly, reducing the time it
takes to deploy applications and services in the cloud.
Resource Contention:
Multiple virtual machines share the same underlying physical resources. If not
managed properly, this can lead to resource contention, where VMs compete for CPU,
memory, or I/O, resulting in degraded performance.
Complex Management:
Virtualized environments can become complex to manage as the number of VMs
increases. Administrators need tools and strategies to monitor, manage, and
maintain these virtualized environments effectively.
Security Concerns:
Compatibility Issues:
Data Management:
Virtualized environments often involve multiple VMs storing data across different
locations. Managing and securing data in such distributed environments can be more
challenging compared to a traditional physical infrastructure.
Multitenant Software :
Multitenancy means that multiple customers of a cloud vendor use the same computing
resources. Although they share the same computing resources, the data of each cloud
customer is kept completely separate and secure. It is a very important concept in
cloud computing.
Relational Database :
A Relational Database stores data in tables (rows and columns) and uses
relationships to connect that data.
It follows SQL (Structured Query Language) for managing and querying the data.
These databases use Structured Query Language (SQL) to manage and query data,
making them a common choice for structured data storage and retrieval in cloud
environments.
Data is typically structured across multiple tables, which can be joined together
via a primary key or a foreign key.
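A minimal sketch of the relational model using Python's built-in sqlite3: two tables linked by a primary key / foreign key pair and joined with SQL. The table and column names are illustrative.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),  -- foreign key
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO orders VALUES (10, 1, 99.0), (11, 1, 25.5), (12, 2, 40.0);
""")

# Join the two tables on the primary key / foreign key relationship
rows = db.execute("""
    SELECT customers.name, orders.total
    FROM orders JOIN customers ON orders.customer_id = customers.id
""").fetchall()
print(rows)   # [('Asha', 99.0), ('Asha', 25.5), ('Ravi', 40.0)]
```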
Cloud storage and distributed systems like Google File System (GFS) and Hadoop
Distributed File System (HDFS) play a crucial role in cloud computing, as they
enable the efficient storage, management, and processing of massive amounts of data
across multiple machines. These distributed systems ensure high availability,
scalability, and fault tolerance, making them ideal for cloud-based environments
where large-scale data processing is required.
Cloud storage refers to storing data on remote servers, which can be accessed over
the internet. The infrastructure is managed by a cloud service provider, and users
can access their data from anywhere, typically via APIs or web interfaces.
Scalability: Cloud storage can scale to handle vast amounts of data. Users only pay
for the storage they use, and the system can automatically expand to accommodate
more data as needed.
Accessibility: Cloud storage allows users to access their data from anywhere with
an internet connection, making it convenient for collaboration and remote work.
Redundancy & Fault Tolerance: Cloud storage providers often replicate data across
multiple locations to ensure that it remains available even in the event of
hardware failures or network disruptions.
Google File System (GFS) is a distributed file system designed by Google to handle
large-scale data storage across multiple machines while providing high reliability
and performance.
GFS (Google File System) is a distributed file system developed by Google to store
huge amounts of data across multiple machines (servers).
In essence, GFS is a foundational technology for cloud computing, allowing for
efficient and scalable data storage and processing on a large scale.
Distributed Architecture:
GFS breaks files into 64MB chunks and replicates these chunks across multiple
servers (chunkservers) to ensure data durability and availability.
Fault Tolerance:
GFS is designed to handle failures gracefully, with mechanisms to automatically
recover data if chunkservers or other components fail.
Scalability:
GFS is designed to scale to handle large datasets and numerous users, making it
suitable for large-scale applications like web search and indexing.
High Throughput:
GFS is optimized for high-speed data processing and access, enabling efficient data
retrieval and processing.
Master Server:
Coordinates the entire system, keeping metadata (file names, chunk locations) and
handling client requests, but not the data itself.
File Splitting:
Files are broken into fixed-size 64 MB chunks before being stored.
Chunk Servers:
Store the actual file chunks, with each chunk replicated across multiple chunk
servers.
Client Request:
Clients ask the Master for chunk info, then talk to Chunk Servers directly to
read/write data.
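A conceptual sketch (not Google's actual API) of the read path just described: the client asks the master only for chunk metadata, then reads the bytes directly from a chunk server holding a replica. All names and data here are made up for illustration.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # GFS chunks are 64 MB

# The master holds metadata only: which chunks make up a file and where replicas live
master_metadata = {
    "/logs/web.log": [
        {"chunk_id": "c001", "replicas": ["chunkserver-1", "chunkserver-3"]},
        {"chunk_id": "c002", "replicas": ["chunkserver-2", "chunkserver-3"]},
    ]
}

# Chunk servers hold the actual bytes (replicated across servers)
chunkservers = {
    "chunkserver-1": {"c001": b"...first 64MB of the log..."},
    "chunkserver-2": {"c002": b"...next 64MB of the log..."},
    "chunkserver-3": {"c001": b"...replica...", "c002": b"...replica..."},
}

def read_file(path):
    data = b""
    for chunk in master_metadata[path]:                  # 1. ask the master for chunk info
        server = chunk["replicas"][0]                    # 2. pick a replica
        data += chunkservers[server][chunk["chunk_id"]]  # 3. read directly from the chunk server
    return data

print(read_file("/logs/web.log")[:30])
```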
GFS is a distributed file system developed by Google to meet the needs of large-
scale data processing. It is specifically optimized for applications like search
indexing, data mining, and processing large data sets across many machines.
High Fault Tolerance: GFS is designed to continue working seamlessly even if parts
of the system fail. Data is replicated across multiple nodes to prevent loss,
ensuring data availability and reliability.
Large File Storage: GFS is optimized to store very large files (in the range of
gigabytes or terabytes) and works efficiently by breaking these large files into
smaller chunks.
Data Consistency and Redundancy: GFS provides data consistency by using a master
server to coordinate file access. The system replicates data to ensure reliability,
even in the event of node failures.
Write-Once, Read-Many Model: Files are typically written once and read many times,
which simplifies the consistency and coordination mechanisms. This is ideal for
applications like logging or batch processing.
Designed for Large-Scale Data: GFS is built to handle petabytes of data and
thousands of machines, making it suitable for Google’s vast data processing needs.
HDFS is another widely used distributed file system designed for storing large data
sets in a distributed environment. It is an open-source project under the Apache
Hadoop ecosystem and is optimized for big data analytics and processing.
It is part of the Apache Hadoop ecosystem and is used to store very large files
across multiple machines.
It is a core component of the Apache Hadoop ecosystem and is designed to handle
large-scale data processing jobs such as those found in big data environments.
NameNode(MasterNode):
DataNode(SlaveNode):
These are the worker nodes that do the actual work, such as reading, writing, and
processing data.
They also perform block creation, deletion, and replication upon instruction from
the NameNode.
They can be deployed on commodity hardware.
Component roles:
NameNode : The master server that manages file system metadata like file names,
block locations, and permissions.
DataNode : The worker nodes that store the actual data blocks. They regularly
report to the NameNode.
Fault Tolerance: Like GFS, HDFS ensures that data is replicated across multiple
nodes to prevent data loss. By default, it replicates data three times, but this
can be configured.
Data Access: HDFS is designed to handle large sequential data access patterns,
making it ideal for tasks like batch processing or large-scale data analytics.
Integration with Hadoop Ecosystem: HDFS is tightly integrated with other tools in
the Hadoop ecosystem (such as MapReduce, Hive, and Spark), allowing it to serve as
a foundational layer for big data processing.
GFS is proprietary to Google, while HDFS is open-source and part of the Apache
Hadoop project.
Both file systems are optimized for large-scale, fault-tolerant data storage across
many machines, with replication strategies to ensure data integrity.
GFS was designed with Google’s internal needs in mind, focusing on applications
like search indexing, while HDFS was designed for big data processing and analytics
in the Hadoop ecosystem.
Simplified Data Management: Both GFS and HDFS manage data at the block level,
abstracting away the complexities of low-level data storage and providing a unified
view of the data across a distributed environment.
Key Differences (Database vs. Data Store):
Structure : Database – structured data with a predefined schema. Data store –
various data types, including structured, semi-structured, and unstructured.
Management : Database – managed by a DBMS. Data store – managed by different
systems depending on the storage solution.
Purpose : Database – efficient storage and retrieval of structured data. Data store
– persistent storage and management of various data types.
Examples : Database – relational databases, NoSQL databases, cloud-based database
services. Data store – cloud storage services, file systems, object storage, data
lakes.
Cloud Middleware:
Without middleware, developers would need to write custom code for communication
between services — which is time-consuming and error-prone. Middleware simplifies
this process and improves scalability, flexibility, and integration.
Integration – Connects apps and services across cloud and on-premises environments.
Sky Computing :
In traditional cloud computing, companies usually use one cloud provider. But that
creates problems like vendor lock-in, less flexibility, and higher costs.
Sky Computing solves this by enabling inter-cloud compatibility and giving freedom
to choose the best cloud services for different needs.
Key benefits:
# Flexibility
# Cost Efficiency
# Portability
# High Availability
Simple Example:
Imagine you run a website. With Sky Computing, you could host the site with one
provider, store backups with another, and use a third provider’s analytics service,
picking whichever is best or cheapest for each task.
Bigtable :
Google Cloud Bigtable is a highly scalable NoSQL database designed for handling
large volumes of data efficiently. It is built to store and manage terabytes to
petabytes of structured data while ensuring low-latency performance. This makes it
an excellent choice for applications requiring high throughput and real-time
analytics.
One of the key features of Bigtable is its row key-based indexing. Every row in a
table is uniquely identified by a row key, which allows quick lookups. Due to its
distributed architecture, Bigtable can process billions of rows and thousands of
columns seamlessly. It is particularly useful for use cases like time-series data,
financial transactions, and IoT analytics.
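A conceptual sketch (not the Cloud Bigtable client library) of row-key-based indexing: rows are kept sorted by key, so a key shaped like "<sensor_id>#<timestamp>" supports fast point lookups and time-range scans for time-series data. Keys and values below are illustrative.

```python
from bisect import bisect_left, bisect_right

rows = {
    "sensor42#2024-01-01T00:00": {"temp": 21.5},
    "sensor42#2024-01-01T00:05": {"temp": 21.7},
    "sensor42#2024-01-01T00:10": {"temp": 21.9},
    "sensor77#2024-01-01T00:00": {"temp": 18.2},
}
keys = sorted(rows)   # rows are ordered by row key

def scan(prefix):
    # Range scan over all row keys starting with the prefix (one sensor's series)
    lo = bisect_left(keys, prefix)
    hi = bisect_right(keys, prefix + "\xff")
    return [(k, rows[k]) for k in keys[lo:hi]]

print(rows["sensor42#2024-01-01T00:05"])   # point lookup by exact row key
print(scan("sensor42#"))                   # time-range scan for one sensor
```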
Other Applications:
Graph Processing:
Calculating PageRank, finding shortest paths, or detecting communities in large
graphs.
Image Processing:
Performing tasks like feature extraction, image classification, or image stitching.
Web Indexing:
Building and updating search engine indexes.
Text Mining:
Counting word frequencies or identifying common hashtags in social media data.
Entertainment:
Netflix uses MapReduce to provide personalized movie recommendations based on user
history.
E-commerce:
Amazon and other e-commerce companies use it to analyze customer buying behavior
and personalize shopping experiences.
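A toy word count in the MapReduce style these applications rely on: a map step emits (word, 1) pairs, a shuffle groups the pairs by key, and a reduce step sums the counts. Real frameworks such as Hadoop run these phases in parallel across many machines; this single-process version only illustrates the idea.

```python
from collections import defaultdict

documents = ["the cloud stores data", "the cloud scales", "data in the cloud"]

# Map: emit (word, 1) for every word in every document
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group values by key
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce: sum the counts for each word
word_counts = {word: sum(counts) for word, counts in grouped.items()}
print(word_counts)   # {'the': 3, 'cloud': 3, ...}
```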
Intercloud :
Intercloud is a network of clouds that are linked with each other. This includes
private, public, and hybrid clouds that come together to provide a seamless
exchange of data.