Cloud Computing Notes-2
Key Characteristics of Cloud Computing
1. On-Demand Self-Service
o Users can provision computing resources as needed without human intervention.
o ⮞ Example: Allocate storage or processing power instantly.
2. Broad Network Access
o Services are accessible over the network (internet/intranet) using standard devices
(phones, laptops, tablets).
o ⮞ Promotes remote access from anywhere.
3. Resource Pooling
o Providers serve multiple customers with dynamically assigned resources.
o ⮞ Uses multi-tenancy and virtualization.
o Resources are pooled across various locations and reassigned as needed.
4. Rapid Elasticity
o Resources can be scaled up or down quickly to match demand.
o ⮞ Appears unlimited to users.
5. Measured Service
o Usage is monitored, controlled, and reported—offering transparency.
o ⮞ Pay-per-use or metered billing model.
6. Scalability
o Ability to increase or decrease IT resources as business needs change.
o ⮞ Horizontal (adding more machines) and vertical (upgrading resources) scaling.
7. Economical
o Reduces upfront hardware and software costs.
o ⮞ Operates on a subscription or pay-as-you-go basis.
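The pay-per-use idea in points 5 and 7 can be illustrated with a toy bill calculation. This is only a sketch; the rates below are made-up values, not any real provider's pricing.

```python
# Toy pay-as-you-go bill: all rates are hypothetical, not real provider pricing.
COMPUTE_RATE_PER_HOUR = 0.05   # $ per VM-hour (assumed)
STORAGE_RATE_PER_GB   = 0.02   # $ per GB-month (assumed)

def monthly_bill(vm_hours: float, storage_gb: float) -> float:
    """Measured service: usage is metered and billed, with no upfront hardware cost."""
    return vm_hours * COMPUTE_RATE_PER_HOUR + storage_gb * STORAGE_RATE_PER_GB

print(monthly_bill(vm_hours=720, storage_gb=100))  # 720*0.05 + 100*0.02 = 38.0
```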
Cloud Computing Service-Oriented Architecture (SOA) is a design framework where cloud services are
structured as modular, loosely-coupled, and reusable components that interact over a network. It
combines the flexibility of cloud computing with the modularity of SOA, allowing organizations to build
applications from distributed services hosted in the cloud.
📌 Explanation
🔑 Characteristics/Concept of SOA:
1. Modular Services:
Each function is designed as a standalone "service" (e.g., login service, payment service).
2. Loose Coupling:
Services are loosely connected, meaning they can be modified or replaced independently.
3. Interoperability:
Different services, even if written in different languages or running on different platforms, can
work together.
4. Discoverability:
Services can be registered and discovered dynamically when needed.
5. Reusability:
The same service can be used in multiple applications.
6. Standardized Communication:
Services interact using common protocols like HTTP, SOAP, REST, etc.
7. Scalability: SOA supports auto-scaling of services based on demand.
📘 Example
For instance, an online shopping application may be composed of separate login, payment, and order-tracking services. Each service works independently, but together they form a complete system; this is SOA in action.
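A minimal sketch of this idea in Python, assuming two hypothetical, independently deployed REST services (a login service and a payment service) at made-up URLs. The orchestrating application depends only on their HTTP interfaces, not on their implementations, which is the loose coupling described above.

```python
# Sketch of an application composed from loosely coupled REST services.
# URLs and JSON fields are hypothetical; each service can be replaced
# independently as long as it keeps the same interface.
import requests  # third-party HTTP client (pip install requests)

LOGIN_SERVICE = "https://login.example.com/api/authenticate"   # hypothetical
PAYMENT_SERVICE = "https://payments.example.com/api/charge"    # hypothetical

def place_order(user, password, amount):
    # 1. Call the standalone login service.
    auth = requests.post(LOGIN_SERVICE, json={"user": user, "password": password})
    auth.raise_for_status()
    token = auth.json()["token"]

    # 2. Call the standalone payment service, passing the token it returned.
    payment = requests.post(
        PAYMENT_SERVICE,
        json={"amount": amount},
        headers={"Authorization": f"Bearer {token}"},
    )
    payment.raise_for_status()
    return payment.json()  # e.g. {"status": "charged"}
```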
OR
1. Service-Based Approach
o Every component is a service (e.g., storage, compute, authentication).
o ⮞ Services communicate over standard protocols like HTTP, SOAP, or REST.
2. Loose Coupling
o Services are independent and minimally dependent on each other.
o ⮞ Easier to update, scale, or replace without impacting others.
3. Reusability
o Services are designed for repeated use across different applications.
o ⮞ Improves development speed and reduces redundancy.
4. Interoperability
o Services are platform and language agnostic, enabling communication between
heterogeneous systems.
o ⮞ Facilitates integration across different cloud platforms.
5. Discoverability
o Services are registered and can be discovered dynamically.
o ⮞ Promotes automated orchestration and scaling.
6. Scalability and Elasticity
o SOA in cloud supports auto-scaling of services based on demand.
o ⮞ Ensures optimal resource utilization.
7. Governance and Monitoring
o Includes tools to track usage, monitor performance, and enforce policies.
o ⮞ Ensures service reliability and security.
SOA enables cloud providers to expose functionality (e.g., databases, storage) as API-
driven services.
Helps build composable applications using services from different providers.
Cloud platforms like AWS, Azure, and Google Cloud heavily use SOA principles.
SOA is a foundation for microservices architecture in cloud-native development.
Q.3 What do you understand by SOA? How does it support cloud computing? Explain.
🌐 How SOA Supports Cloud Computing
SOA plays a foundational role in enabling cloud computing by structuring cloud resources and
applications as services. Here’s how:
1. Service Reusability: Allows cloud apps to reuse standard services like authentication, storage, etc.
2. On-Demand Access: Users can access services anytime, from anywhere, over the internet.
3. Platform Independence: Services work across different OS, devices, and cloud providers.
📘 Example
For example, an authentication service, a storage service, and an analytics service can all be orchestrated into one cloud application using SOA principles.
Sky Computing is an emerging computing paradigm that aims to create a global, unified cloud
platform by seamlessly integrating multiple cloud providers (like AWS, Azure, GCP) into a single
virtual cloud.
It enables cross-cloud interoperability, application portability, and resource sharing across
clouds.
Key Features of Sky Computing
1. 💵 Cost Optimization: Allows using the cheapest or most efficient cloud at any time.
2. 🔒 Security & Policy Control: Centralized security and compliance control across cloud providers.
3. 🧠 Intelligent Workload Placement: Uses AI/analytics to place workloads where performance and cost are optimized.
1. Multi-Cloud Interoperability
o Enables applications to run across different cloud platforms without being locked into
one.
o ⮞ Example: Run compute on AWS, store data on Google Cloud, and use AI tools from
Azure.
2. Cloud Abstraction Layer
o Introduces a middleware or interface to hide the complexity of different cloud APIs.
o ⮞ Provides a unified way to deploy and manage services.
3. Portability and Flexibility
o Applications can move freely between clouds based on cost, performance, or
compliance.
o ⮞ Avoids vendor lock-in.
4. Optimized Resource Utilization
o Chooses the best cloud provider for each task, improving cost-effectiveness and
performance.
5. Enhanced Reliability and Redundancy
o By using multiple clouds, systems can stay up even if one provider fails.
6. Security and Compliance Control
o Supports distributed policies and encryption across clouds for data protection and
regulation adherence.
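A toy sketch of the cloud abstraction layer idea from point 2: a common interface hides provider-specific details, and a simple scheduler places each workload on the cheapest available provider. The provider names are real, but the adapter classes and prices are invented purely for illustration.

```python
# Toy cloud abstraction layer: one interface, several provider adapters,
# and a scheduler that picks the cheapest provider per task (prices invented).
from abc import ABC, abstractmethod

class CloudProvider(ABC):
    name: str
    price_per_hour: float

    @abstractmethod
    def run(self, task: str) -> str: ...

class AWSAdapter(CloudProvider):
    name, price_per_hour = "AWS", 0.10          # hypothetical price
    def run(self, task): return f"running '{task}' on {self.name}"

class GCPAdapter(CloudProvider):
    name, price_per_hour = "GCP", 0.08          # hypothetical price
    def run(self, task): return f"running '{task}' on {self.name}"

def place_workload(task: str, providers: list[CloudProvider]) -> str:
    cheapest = min(providers, key=lambda p: p.price_per_hour)  # cost optimization
    return cheapest.run(task)

print(place_workload("nightly-batch-job", [AWSAdapter(), GCPAdapter()]))
```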
✅ In Simple Terms:
The cloud abstraction layer acts as Cloud Middleware: a software layer that bridges cloud applications and cloud infrastructure. It simplifies communication, integration, and management of cloud services across different platforms.
Sky Computing is crucial in the evolution of cloud usage, especially in large-scale, global, and mission-
critical applications.
Data Migration in Cloud refers to the process of moving data from on-premises systems, legacy
storage, or one cloud platform to another cloud environment. It is a key step in digital transformation
and cloud adoption.
🔑 Key Facts about Cloud Data Migration
1. Definition
o The process of transferring data, applications, or workloads to a cloud-based system or
between cloud environments.
2. Types of Migration
o Storage Migration: Moving data to cloud storage.
o Database Migration: Transferring databases with minimal downtime.
o Application Migration: Moving entire applications and their dependencies.
o Cloud-to-Cloud Migration: Switching providers (e.g., AWS to Azure).
3. Migration Phases
o Assessment: Evaluate data volume, dependencies, risks.
o Planning: Define migration strategy (big bang vs. phased).
o Migration: Use tools/services to transfer data securely.
o Validation: Ensure integrity, performance, and functionality.
4. Common Tools
o AWS DMS, Azure Migrate, Google Cloud Transfer Service, CloudEndure.
5. Challenges
o Data loss risks
o Downtime
o Compatibility issues
o Security and compliance concerns
6. Benefits
o Improved scalability and accessibility
o Reduced infrastructure costs
o Enhanced data analytics and processing
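The Validation phase listed above is often implemented by comparing checksums of the source data and the migrated copy. Below is a minimal local sketch using hypothetical file paths; a real migration would use provider tools such as AWS DMS or Azure Migrate for the transfer itself.

```python
# Sketch of post-migration validation: compare checksums of source and target copies.
# Paths are hypothetical; in practice the "target" is read back from cloud storage.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MB chunks
            h.update(chunk)
    return h.hexdigest()

def validate_migration(source_path: str, target_path: str) -> bool:
    """Return True if the migrated copy is byte-identical to the source."""
    return sha256_of(source_path) == sha256_of(target_path)

# Example (hypothetical paths):
# validate_migration("/data/orders.csv", "/mnt/cloud_bucket/orders.csv")
```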
Q.8 Describe relational databases. Compare GFS and HDFS. List and explain the features of HDFS.
📘 Relational Database
A Relational Database is a type of database that stores and provides access to data points that are
related to one another. Data is organized in tables (also called relations) consisting of rows and
columns.
1. Structured Data
o Data is stored in predefined schemas using tables.
2. Use of SQL
o Structured Query Language (SQL) is used to manage and query data.
3. ACID Properties
o Ensures Atomicity, Consistency, Isolation, and Durability of transactions.
4. Primary & Foreign Keys
o Used to define relationships between tables.
5. Examples
o MySQL, PostgreSQL, Oracle DB, Microsoft SQL Server.
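A small, self-contained sketch of these ideas using Python's built-in sqlite3 module: two related tables with primary and foreign keys, a couple of inserted rows, and an SQL join query. The table and column names are invented for illustration.

```python
# Minimal relational-database sketch with sqlite3 (Python standard library).
import sqlite3

conn = sqlite3.connect(":memory:")          # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")

# Tables (relations) with a primary key and a foreign-key relationship.
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE enrollments (
                  id INTEGER PRIMARY KEY,
                  student_id INTEGER REFERENCES students(id),
                  course TEXT NOT NULL)""")

conn.execute("INSERT INTO students VALUES (1, 'Asha')")
conn.execute("INSERT INTO enrollments VALUES (1, 1, 'Cloud Computing')")

# SQL query joining the two tables through the key relationship.
rows = conn.execute("""SELECT s.name, e.course
                       FROM students s JOIN enrollments e ON e.student_id = s.id""").fetchall()
print(rows)   # [('Asha', 'Cloud Computing')]
```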
Comparison: GFS (Google File System) vs HDFS (Hadoop Distributed File System)
1. Programming Language: GFS is written in C++; HDFS is written in Java.
2. Community Support: GFS is proprietary (limited public access); HDFS is open-source (wide community support).
1. Fault Tolerance
o Files are split into blocks and stored redundantly (default 3 copies).
o ⮞ Ensures availability even if nodes fail.
2. High Throughput
o Designed for large-scale data processing, providing fast access.
3. Scalability
o Can scale out by simply adding more nodes to the cluster.
4. Large Data Set Handling
o Efficiently stores and processes terabytes to petabytes of data.
5. Data Locality
o Computation is moved closer to where the data resides to reduce latency.
6. Write Once, Read Many
o Data is typically written once and read many times, making it ideal for big data analytics.
7. Master-Slave Architecture
o NameNode: Manages metadata and namespace.
o DataNodes: Store actual data blocks.
8. Streaming Access
o Optimized for batch processing and large file streaming, not random access.
9. Open-Source
o Developed under Apache, widely supported and extensible.
Q.9 What are GFS and HDFS? Explain their architecture and key features.
📌 What is GFS?
GFS is a scalable distributed file system developed by Google to handle large data-intensive
applications.
🔑 Key Points:
Developer: Google
Language: C++
Purpose: Internal use for large-scale data processing
Architecture: Master-Slave (1 Master, many ChunkServers)
Chunk Size: 64 MB
Fault Tolerance: Uses replication for reliability
Optimized For: Large streaming reads and large files
Proprietary: Not open-source
GFS is a proprietary, scalable distributed file system developed by Google to support large-scale data
processing workloads, such as those used in search indexing and data mining.
🧩 Architecture
Master Node:
o Manages metadata (namespace, file locations, chunk mapping).
o Coordinates all system operations.
Chunk Servers:
o Store actual file data in chunks (default size: 64 MB).
o Each chunk is replicated (typically 3 times) for fault tolerance.
Clients:
o Communicate with the Master to get metadata, and directly with ChunkServers to
read/write data.
🚀 Key Features
1. Chunk-Based Storage:
o Files are divided into large chunks to optimize for streaming access.
2. Replication for Fault Tolerance:
o Default replication is 3 copies per chunk.
3. Optimized for Append Operations:
o Ideal for logs and sequential file writes.
4. High Throughput:
o Designed for batch processing with large datasets.
5. Minimal Metadata Size:
o Master stores all metadata in memory for fast access.
📌 What is HDFS?
HDFS (Hadoop Distributed File System) is an open-source distributed file system and a core part of the Apache Hadoop ecosystem. Inspired by GFS, it is designed for processing big data on a cluster of commodity hardware.
🧩 Architecture
NameNode (Master):
o Manages the file system namespace and metadata.
o Stores information about block locations.
DataNodes (Slaves):
o Store actual data blocks on local disks.
o Periodically send heartbeat and block reports to the NameNode.
Client:
o Requests metadata from NameNode and reads/writes data directly with DataNodes.
📏 Block Size
The default block size is 128 MB in Hadoop 2.x and later (64 MB in older versions). Large blocks reduce metadata overhead and suit sequential, streaming reads.
🚀 Key Features
1. Fault Tolerance:
o Blocks are replicated (default: 3 times) across nodes.
2. High Throughput:
o Suitable for applications with large data sets and batch processing.
3. Scalability:
o Can scale horizontally by adding more nodes.
4. Data Locality:
o Computation is moved to where the data resides, reducing network bottlenecks.
5. Write-Once, Read-Many Model:
o Data is typically written once and read multiple times.
6. Open Source:
o Available under the Apache license, with large community support.
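A conceptual toy sketch (not the real HDFS or GFS code) of how a file might be split into fixed-size blocks and each block placed on three different DataNodes, which is what gives both systems their fault tolerance. The node names are hypothetical.

```python
# Toy model of block splitting and 3-way replication (illustrative only,
# not the actual HDFS/GFS implementation).
import random

BLOCK_SIZE = 128 * 1024 * 1024     # 128 MB, the default HDFS block size
REPLICATION = 3
DATANODES = ["dn1", "dn2", "dn3", "dn4", "dn5"]   # hypothetical cluster nodes

def split_into_blocks(file_size_bytes: int) -> int:
    """Number of blocks needed for a file of the given size."""
    return -(-file_size_bytes // BLOCK_SIZE)      # ceiling division

def place_blocks(num_blocks: int) -> dict[int, list[str]]:
    """Assign each block to REPLICATION distinct DataNodes (the NameNode's job in HDFS)."""
    return {b: random.sample(DATANODES, REPLICATION) for b in range(num_blocks)}

blocks = place_blocks(split_into_blocks(400 * 1024 * 1024))   # a 400 MB file -> 4 blocks
print(blocks)   # e.g. {0: ['dn2', 'dn5', 'dn1'], 1: [...], 2: [...], 3: [...]}
```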
1. What is MapReduce? How does the MapReduce model differ from other models?
1. What is MapReduce?
MapReduce is a programming model and processing technique used to handle and process large-scale
data in a distributed environment. It is widely associated with Hadoop, which implements this model
for processing big data. MapReduce simplifies the complexity of parallelizing tasks across a cluster of
computers.
1. Parallel Processing:
o MapReduce distributes the workload across multiple machines in a cluster, enabling
parallel data processing.
o Unlike traditional, monolithic processing, where data is processed sequentially on a
single machine, MapReduce scales horizontally.
2. Fault Tolerance:
o MapReduce has built-in fault tolerance, automatically reassigning tasks to other nodes if
one fails. In traditional models, task failure might result in the loss of work.
3. Simplicity:
o The MapReduce model abstracts away complex issues like parallelism, fault tolerance,
and resource management. Other models, like shared-memory parallel processing,
require manual management of these issues.
4. Data Handling:
o Traditional models might work better with in-memory data (e.g., shared-memory
processing), whereas MapReduce works with data stored on disk, which is more suited
for handling vast amounts of data (terabytes/petabytes).
Step-by-Step Process:
1. Input Splitting:
o The input data is split into smaller, manageable chunks (e.g., text files or logs).
o These chunks are distributed across multiple worker nodes.
2. Map Phase:
o Each Mapper processes a chunk of input data in parallel.
o The Map function processes the input to produce key-value pairs.
o For example, if you're counting words in a large dataset, the mapper might emit pairs
like (word, 1).
3. Shuffling Phase (Between Map and Reduce):
o The MapReduce framework automatically handles shuffling and sorting of the key-
value pairs output by the Map function.
o The data is grouped by key so that all values for a given key are sent to the same
Reducer.
4. Reduce Phase:
o The Reducer receives the sorted key-value pairs and processes them (e.g., aggregating
values).
o For example, summing the values for each word, so you get the total count of each
word in the dataset.
5. Output:
o The final result is written to output files stored in distributed storage like HDFS.
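The word-count example from the steps above, written as a minimal in-process Python sketch: a map function emits (word, 1) pairs, a shuffle step groups them by key, and a reduce function sums each group. A real MapReduce framework such as Hadoop runs the same logic in parallel across a cluster and persists the output to HDFS.

```python
# Minimal single-machine sketch of the MapReduce word-count example.
from collections import defaultdict

def map_phase(chunk: str):
    """Mapper: emit a (word, 1) pair for every word in its input chunk."""
    for word in chunk.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all values by key, so each key goes to one reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: aggregate the values for one key (here, sum the counts)."""
    return key, sum(values)

chunks = ["the cloud is elastic", "the cloud is measured"]          # input splits
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]    # map phase
grouped = shuffle(mapped)                                           # shuffle phase
result = dict(reduce_phase(k, v) for k, v in grouped.items())       # reduce phase
print(result)   # {'the': 2, 'cloud': 2, 'is': 2, 'elastic': 1, 'measured': 1}
```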
Shuffling in MapReduce
Shuffling refers to the phase between the Map and Reduce phases in the MapReduce process. It is the
process of transferring and sorting data based on the keys generated by the Mappers, so that all values
with the same key are sent to the same Reducer.
1. Sorting:
o After the Mapper finishes processing, the output key-value pairs are sorted by the key.
o Sorting is necessary for the Reducer to handle all pairs with the same key together.
2. Grouping:
o The shuffled data is grouped by key. This means that all values with the same key are
sent to one Reducer.
3. Data Transfer:
o The process of transferring data from Mappers to Reducers is called the shuffle step.
o This step is critical for ensuring that all data related to the same key gets processed
together.
4. Disk I/O:
o Shuffling often involves disk I/O operations because the system might store
intermediate data on disk before passing it to the Reducer, especially in large-scale
processing.
1. Map Phase: Input data is divided into chunks, each processed by a Mapper that outputs key-
value pairs.
2. Shuffle Phase: Key-value pairs are grouped and sorted by key.
3. Reduce Phase: A Reducer processes each group of key-value pairs, typically performing an
aggregation or other computation.
4. Output: Final results are stored in distributed storage.
Q.10 Why is QoS important? List the issues of QoS and explain them.
QoS (Quality of Service) is crucial in cloud computing and network services to ensure that resources are
efficiently allocated and prioritized, especially when multiple applications and users are accessing the
same infrastructure.
✅ Key Facts – Importance of QoS:
1. Performance Guarantee
o Ensures consistent and reliable performance for mission-critical applications.
2. Resource Allocation
o Helps manage bandwidth, CPU, and memory usage to meet service-level requirements.
3. Prioritization of Traffic
o Differentiates between real-time (e.g., VoIP) and non-real-time (e.g., email) data for
smoother communication.
4. Service Level Agreement (SLA) Compliance
o Ensures services meet the agreed standards of uptime, latency, and throughput.
5. User Experience
o Enhances user satisfaction by minimizing delays, jitter, and packet loss.
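A toy illustration of the traffic prioritization in point 3: a priority queue that always serves real-time packets (e.g., VoIP) before bulk packets (e.g., email). The traffic classes and priority numbers are arbitrary choices for the sketch, not a real QoS implementation.

```python
# Toy QoS scheduler: lower priority number = served first (values are arbitrary).
import heapq
from itertools import count

PRIORITY = {"voip": 0, "video": 1, "email": 2}   # hypothetical traffic classes
queue, order = [], count()                       # count() breaks ties fairly (FIFO)

def enqueue(traffic_class: str, packet: str):
    heapq.heappush(queue, (PRIORITY[traffic_class], next(order), packet))

def dequeue() -> str:
    return heapq.heappop(queue)[2]

enqueue("email", "newsletter")
enqueue("voip", "call frame")
enqueue("video", "stream chunk")
print([dequeue() for _ in range(3)])   # ['call frame', 'stream chunk', 'newsletter']
```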
❌ Issues/Challenges in QoS
1. Resource Contention: Limited resources like bandwidth or CPU must be shared among users.
2. Jitter: Variation in packet arrival times impacts streaming and VoIP quality.
3. Traffic Classification: Misclassification of traffic may cause wrong prioritization of data flows.
4. SLA Violations: Failing to meet QoS targets may lead to penalties or loss of trust.
Comparison: Grid Computing vs Cloud (Utility) Computing
1. Resource Ownership: In grid computing, resources are often owned by multiple organizations and shared; in cloud computing, they are usually provided by a single provider (e.g., AWS, Azure).
2. Scalability: Grids are less flexible, since scaling requires adding physical machines; clouds are highly scalable, with resources increased dynamically.
3. Cost Model: Grid computing is not typically pay-per-use and is often used for internal or research purposes; cloud computing follows a pay-per-use model (like water or electricity).
Grid Computing is about sharing computing power across distributed nodes for complex
computations.
Utility Computing is about charging based on consumption of computing services.
Elastic Computing is about scaling resources up/down dynamically based on real-time needs.
Multitenant software is a type of software architecture where a single instance of an application serves
multiple customers (tenants). Each tenant’s data is isolated and secure, but the underlying
infrastructure and codebase are shared.
Multitenant techniques refer to methods used to implement multitenancy, ensuring data isolation,
security, and performance while maximizing resource sharing.
1. Shared Database, Separate Schemas: All tenants share a database but have separate schemas for data isolation.
2. Resource Throttling: Prevents any single tenant from consuming all resources.
3. Access Control: Enforces tenant-specific access via user roles and permissions.
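A minimal sketch of the "shared database, separate schemas" technique using Python's sqlite3. sqlite3 has no true schemas, so attached in-memory databases stand in for per-tenant schemas here; the tenant and table names are invented for illustration.

```python
# Sketch of multitenant data isolation: one shared database engine, one
# namespace ("schema") per tenant. Attached databases emulate schemas in sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")                 # the shared database engine
for tenant in ("tenant_a", "tenant_b"):            # hypothetical tenants
    conn.execute(f"ATTACH DATABASE ':memory:' AS {tenant}")
    conn.execute(f"CREATE TABLE {tenant}.orders (id INTEGER PRIMARY KEY, item TEXT)")

# Each tenant's queries are routed to its own namespace, isolating its data.
conn.execute("INSERT INTO tenant_a.orders VALUES (1, 'laptop')")
conn.execute("INSERT INTO tenant_b.orders VALUES (1, 'phone')")

print(conn.execute("SELECT * FROM tenant_a.orders").fetchall())   # [(1, 'laptop')]
print(conn.execute("SELECT * FROM tenant_b.orders").fetchall())   # [(1, 'phone')]
```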
🧩 Conclusion
Multitenant software maximizes efficiency by serving multiple users from a single system.
Multitenant techniques ensure that each tenant gets secure, isolated access without sacrificing
performance or cost.
WSDL is an XML-based language used to describe web services and how to access them.
1. Definition:
o WSDL stands for Web Services Description Language.
o It defines the interface of a web service in a platform-independent way.
2. Purpose:
o Describes the operations offered by the web service, the messages used, and how to
access them.
3. Based On:
o Written in XML, making it both machine and human readable.
4. Components of WSDL:
o Types: Data types used (often described using XML Schema).
o Message: The input/output data for each operation.
o PortType: Defines the operations (like methods in an interface).
o Binding: Specifies the communication protocol (e.g., SOAP, HTTP).
o Service: Specifies the service location (URL endpoint).
5. Used With:
o Commonly used with SOAP-based web services, but also supports other protocols.
6. Interoperability:
o Enables communication between heterogeneous systems (e.g., Java and .NET services).
A developer can use WSDL to automatically generate client stubs to interact with a web service.
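A minimal sketch of that stub-generation idea in Python using the third-party zeep SOAP library; the WSDL URL, operation name, and parameter are hypothetical. zeep reads the WSDL's types, port types, and bindings and exposes the operations as ordinary Python methods.

```python
# Sketch of consuming a SOAP web service described by a WSDL.
# The URL and operation name are hypothetical. Requires the 'zeep' package.
from zeep import Client

client = Client("https://example.com/weather?wsdl")   # parses types, messages, bindings

# Operations defined in the WSDL's portType become callable methods on client.service.
result = client.service.GetTemperature(city="Pune")   # hypothetical operation/parameter
print(result)
```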
Q.14 What is the purpose of mobile cloud computing? Explain its features.
What is Mobile Cloud Computing (MCC)?
Mobile Cloud Computing (MCC) is a combination of cloud computing, mobile computing, and wireless
networks to bring rich computational resources to mobile users, network operators, and cloud
providers.
It enables mobile applications to offload computation and storage to the cloud, overcoming the
limitations of mobile devices.
Objectives of MCC:
1. ✅ Extend Battery Life: Reduces local computation, thus conserving battery on mobile devices.
2. ✅ Increase Storage: Uses cloud storage, removing the limitations of local mobile memory.
3. ✅ Ubiquitous Access: Users can access apps and data on the go, as long as they have internet connectivity.
4. ✅ Real-time Data Synchronization: Ensures up-to-date data across multiple devices using cloud sync.
Key Features of MCC:
1. 🔹 Scalability: Cloud-based apps can scale dynamically to support more users or data.
2. 🔹 Mobility Support: MCC supports seamless access across different mobile devices.
3. 🔹 Resource Efficiency: Uses cloud resources, minimizing device CPU, memory, and storage load.
4. 🔹 Real-Time Applications: Supports apps like GPS, online games, and video conferencing with minimal delay.
5. 🔹 Cross-platform Support: Enables running applications across Android, iOS, Windows, etc.
6. 🔹 Integration with IoT: MCC can integrate with IoT devices for real-time processing and control.
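A toy sketch of the computation-offloading decision at the heart of MCC: run a task locally when there is no connectivity, otherwise offload heavy tasks or save battery by sending work to the cloud. The thresholds and function names are invented for illustration.

```python
# Toy offloading policy for mobile cloud computing (thresholds are invented).
def should_offload(battery_percent: float, network_ok: bool, task_cost: float) -> bool:
    """Offload heavy tasks to the cloud when connected, or when battery is low."""
    if not network_ok:
        return False                      # no connectivity: must run locally
    return task_cost > 5.0 or battery_percent < 20.0

def run_task(task_cost: float, battery_percent: float, network_ok: bool) -> str:
    if should_offload(battery_percent, network_ok, task_cost):
        return "offloaded to cloud (saves battery, uses cloud CPU/storage)"
    return "executed locally on the device"

print(run_task(task_cost=8.0, battery_percent=60.0, network_ok=True))   # offloaded
print(run_task(task_cost=1.0, battery_percent=90.0, network_ok=True))   # local
```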
Q.15 What are hardware virtualization techniques? What are the benefits of virtualization?
💻 What are Hardware Virtualization Techniques?
Hardware virtualization is the process of creating virtual versions of physical hardware components
using a hypervisor or virtual machine monitor (VMM). It allows multiple operating systems to run on a
single physical machine by abstracting the hardware layer.
1. ✅ Hardware-Assisted Virtualization: Uses CPU extensions (e.g., Intel VT-x, AMD-V) for faster and secure virtualization.
2. ✅ OS-Level Virtualization (Containers): Virtualizes at the operating system level (e.g., Docker); shares the same OS kernel.
3. ✅ Storage & Network Virtualization: Virtualizes storage and network components to enhance scalability and flexibility.
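A small sketch that checks whether the CPU advertises the hardware-assisted virtualization extensions mentioned in point 1 (Intel VT-x appears as the "vmx" flag, AMD-V as "svm"). It reads /proc/cpuinfo, so it assumes a Linux host.

```python
# Check for hardware-assisted virtualization support on Linux by looking for the
# 'vmx' (Intel VT-x) or 'svm' (AMD-V) CPU flags in /proc/cpuinfo.
def hw_virtualization_flag():
    """Return 'vmx', 'svm', or None if no hardware virtualization flag is found."""
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags = line.split(":", 1)[1].split()
                for flag in ("vmx", "svm"):
                    if flag in flags:
                        return flag
    return None

flag = hw_virtualization_flag()
print("Hardware-assisted virtualization:", flag if flag else "not detected")
```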
🌟 Benefits of Virtualization
1. 🚀 Improved Resource Utilization: Maximizes use of physical hardware by running multiple VMs.
2. 🔄 Isolation & Security: Each VM is isolated, minimizing the risk of system-wide failures.
3. 🌍 Platform Independence: VMs can run different operating systems on the same physical host.
🧠 Conclusion
Hardware virtualization is a key enabler of cloud computing, data centers, and efficient IT
infrastructure. It allows organizations to maximize hardware efficiency, reduce costs, and improve
scalability and flexibility.