Cloud Computing Unit 5
Introduction to Hadoop
Hadoop is an open-source framework designed to process and store large sets of data in a
distributed computing environment. It provides a way to manage big data using a cluster of
computers and offers scalable storage and processing power.
Why Hadoop?
Traditional single-machine systems struggle to store and process data at terabyte and petabyte scale. Hadoop addresses this by spreading storage and computation across clusters of inexpensive commodity hardware, tolerating node failures through replication, and processing data in parallel close to where it is stored.
2. Hadoop Architecture
Hadoop's core architecture has three layers: HDFS for storage, MapReduce for processing, and YARN for resource management.
HDFS (Hadoop Distributed File System)
HDFS is the storage layer of Hadoop, designed to store very large files across a distributed
environment. It divides large files into smaller blocks (typically 128MB or 256MB) and
distributes them across multiple nodes in the cluster.
NameNode: The master server that manages the metadata (structure) of the file system,
including the location of data blocks.
DataNode: Worker nodes that store the actual data blocks. They periodically send heartbeats
and block reports to the NameNode.
Secondary NameNode: Periodically merges the NameNode's edit log into the file-system image
(a checkpoint), keeping the metadata compact. Despite its name, it is not a hot standby and
cannot take over if the NameNode fails.
HDFS Features:
Block Size: Large block size (128MB or 256MB) reduces the overhead of managing small files.
Replication: Each block is replicated multiple times (default is 3 replicas) across the cluster to
ensure fault tolerance.
High Throughput: Optimized for streaming data access rather than low-latency access.
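To make the block and replication numbers concrete, here is a small Python sketch (a toy illustration, not HDFS's actual placement logic, which is rack-aware and capacity-aware) that splits a file into 128 MB blocks and assigns each block three DataNodes:

# Toy illustration of HDFS block splitting and replica placement.
import itertools

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB default block size
REPLICATION = 3                  # default replication factor

def split_and_place(file_size_bytes, datanodes):
    """Yield (block_index, replica_nodes) for a file of the given size."""
    num_blocks = -(-file_size_bytes // BLOCK_SIZE)  # ceiling division
    nodes = itertools.cycle(datanodes)
    for block in range(num_blocks):
        # Pick REPLICATION nodes round-robin for each block.
        replicas = [next(nodes) for _ in range(REPLICATION)]
        yield block, replicas

if __name__ == "__main__":
    one_gb = 1024 ** 3
    for block, replicas in split_and_place(one_gb, ["dn1", "dn2", "dn3", "dn4"]):
        print(f"block {block} -> {replicas}")   # 8 blocks for a 1 GB file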
MapReduce
MapReduce is the computational layer of Hadoop, responsible for processing large volumes of
data in parallel across the cluster. A job runs in three phases:
1. Map Phase: The input data is divided into smaller chunks, and each chunk is processed by a
mapper. The mapper generates intermediate key-value pairs.
2. Shuffle and Sort Phase: The intermediate key-value pairs are shuffled and sorted by key before
being sent to the reducers.
3. Reduce Phase: The reducers process the sorted data, typically aggregating or summarizing the
results, and then write the output to HDFS.
Key Concepts in MapReduce:
Mapper: Processes the input data and generates intermediate key-value pairs.
Reducer: Aggregates the intermediate data from mappers based on the key and generates the
final output.
JobTracker (Hadoop 1.x): The master daemon that scheduled MapReduce jobs and monitored their progress.
TaskTracker (Hadoop 1.x): Worker daemons that executed the map and reduce tasks. In Hadoop 2.0 and later, the JobTracker/TaskTracker roles were replaced by YARN, described next.
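The classic word-count example below simulates all three phases in plain Python. It is a single-process teaching sketch, not Hadoop's actual Java API: the mapper emits (word, 1) pairs, the shuffle groups pairs by key, and the reducer sums each group.

# Word count: a single-process simulation of the MapReduce phases.
from collections import defaultdict

def mapper(line):
    # Map phase: emit an intermediate (key, value) pair per word.
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Shuffle-and-sort phase: group intermediate values by key, sorted by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    # Reduce phase: aggregate the values for one key.
    return key, sum(values)

if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "the fox"]
    intermediate = [pair for line in lines for pair in mapper(line)]
    for key, values in shuffle(intermediate):
        print(reducer(key, values))   # ('brown', 1), ('dog', 1), ..., ('the', 3)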
YARN (Yet Another Resource Negotiator)
YARN is the resource management layer of Hadoop, responsible for managing and scheduling
resources across the cluster. It was introduced in Hadoop 2.0 to separate resource management
and job scheduling from MapReduce.
YARN Components:
ResourceManager: The master daemon responsible for managing resources and scheduling
tasks.
NodeManager: The worker daemon running on each node in the cluster, which monitors
resource usage and reports to the ResourceManager.
ApplicationMaster: Manages the execution of a specific application, including task scheduling
and resource allocation.
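The toy sketch below loosely models the interaction among these components: an ApplicationMaster requests containers and the ResourceManager grants them from NodeManagers with spare capacity. Real YARN schedulers (Capacity, Fair) are far more sophisticated; this is illustration only.

# Toy model of YARN container allocation (illustrative only).
nodes = {"nm1": 8, "nm2": 8}   # NodeManager -> free memory in GB

def allocate(request_gb, count):
    """ResourceManager logic: grant containers on nodes with capacity."""
    granted = []
    for _ in range(count):
        # Place the container on any node that can still fit it.
        for node, free in nodes.items():
            if free >= request_gb:
                nodes[node] = free - request_gb
                granted.append((node, request_gb))
                break
    return granted

# ApplicationMaster requests 4 containers of 3 GB each.
print(allocate(3, 4))   # [('nm1', 3), ('nm1', 3), ('nm2', 3), ('nm2', 3)]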
YARN Benefits:
YARN decouples resource management from data processing, so multiple frameworks (e.g., MapReduce, Spark, Tez) can share one cluster; it improves cluster utilization and scalability; and it removes the single JobTracker bottleneck of Hadoop 1.x.
Advantages of Hadoop
Hadoop offers scalability (grow storage and compute simply by adding nodes), fault tolerance through block replication, cost effectiveness on commodity hardware, and the flexibility to handle structured, semi-structured, and unstructured data.
Introduction to VirtualBox
VirtualBox is a free and open-source virtualization platform developed by Oracle. It allows users
to run multiple operating systems on a single physical machine by creating and managing virtual
machines (VMs). In the context of cloud computing, VirtualBox is often used for creating test
environments, learning virtualization concepts, and building scalable infrastructure setups.
What is Virtualization?
Virtualization is the creation of virtual (rather than physical) versions of resources, such as
servers, storage devices, or networks. In the case of VirtualBox, it allows a single computer to
run multiple guest operating systems (OS) simultaneously.
3. VirtualBox Architecture
VirtualBox consists of several components that work together to create and manage virtual
machines: the core hypervisor engine (the virtual machine monitor), the graphical VirtualBox
Manager, the VBoxManage command-line tool for scripting, and the Guest Additions installed
inside guest operating systems to improve integration and performance.
What is a hypervisor?
A hypervisor is software that runs multiple virtual machines on a single physical machine. Every
virtual machine has its own operating system and applications; the hypervisor allocates the
underlying physical computing resources, such as CPU and memory, to individual virtual
machines as required.
There are two types of hypervisors, each differing in architecture and performance.
Type 1 hypervisor
A type 1 hypervisor runs directly on the physical server and has direct access to the hardware
resources, which is why it is also known as a bare-metal hypervisor (e.g., VMware ESXi, Xen,
Microsoft Hyper-V). In a bare-metal setup, the host machine has no separate operating system
installed; the hypervisor software itself acts as a lightweight operating system.
Type 2 hypervisor
A type 2 hypervisor is installed on top of a host operating system, which is why it is also known
as a hosted hypervisor (VirtualBox and VMware Workstation are examples). Like other software
applications, hosted hypervisors do not have complete control of the computer's resources; the
host operating system allocates resources to the hypervisor, which in turn distributes them to its
virtual machines.
4. VirtualBox for Cloud Computing
VirtualBox plays an important role in the development and management of cloud computing
infrastructure by enabling virtualization. Here's how it connects to cloud computing concepts:
VirtualBox enables the creation of isolated virtual machines, which is a key feature of cloud
environments. Virtualization is at the heart of cloud computing, as it allows for the efficient use
of hardware resources, isolation, and scalability.
On-Demand Provisioning: VirtualBox allows for quick creation and deletion of virtual
machines, a process similar to provisioning resources in a cloud environment (a VBoxManage
sketch after this list illustrates it).
Testing & Development: VirtualBox is often used for setting up cloud environments in
test and development scenarios, where developers need to simulate a cloud infrastructure
or experiment with different configurations.
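As a sketch of the on-demand provisioning point above, VirtualBox's bundled VBoxManage command-line tool can create, start, and destroy a VM from a script; the VM name and sizes below are arbitrary examples:

# Provision a VM, run it headless, then tear it down.
VBoxManage createvm --name node1 --ostype Ubuntu_64 --register
VBoxManage modifyvm node1 --memory 2048 --cpus 2
VBoxManage startvm node1 --type headless
# ... use the VM ...
VBoxManage controlvm node1 poweroff
VBoxManage unregistervm node1 --delete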
While VirtualBox itself is not a full cloud platform, it can be used to simulate basic cloud-like
environments. For example, it allows users to:
Create clusters of VMs for testing distributed applications (e.g., big data processing).
Deploy software like OpenStack, Kubernetes, or Docker within VMs to simulate cloud
environments.
This helps users understand how cloud platforms like AWS, Microsoft Azure, or Google Cloud
work without needing access to a public cloud provider.
While VirtualBox is not typically used for running production cloud environments, it integrates
well with cloud management platforms for educational or developmental purposes.
Google App Engine (GAE)
Google App Engine (GAE) is a Platform as a Service (PaaS) offering from Google Cloud that
allows developers to build and deploy web applications without worrying about the underlying
infrastructure. GAE abstracts much of the system administration and hardware management,
enabling developers to focus on writing code rather than managing servers.
GAE provides auto-scaling, load balancing, and built-in application services, making it an
attractive solution for developers seeking to deploy apps in the cloud.
Key Features:
Fully Managed Service: Google manages the infrastructure, so developers don’t need to
handle server management, patching, or scaling.
Auto-Scaling: GAE automatically adjusts the number of running instances based on the
app’s traffic. This eliminates the need for manual scaling and ensures that resources are
allocated dynamically.
Integrated with Google Cloud: GAE integrates seamlessly with other Google Cloud
services, such as Cloud Datastore, Google Cloud Storage, and Google Cloud Pub/Sub.
Multi-language Support: Supports several programming languages like Python, Java,
Go, Node.js, Ruby, PHP, and more, allowing developers to use their preferred language.
Serverless Architecture: Developers don't have to worry about provisioning or
managing servers, as App Engine automatically handles it for them.
Managed Security: Built-in security features, including SSL, identity and access
management (IAM), and firewalls, to protect applications from threats.
Development and Deployment Tools: Provides tools like the Google Cloud SDK, local
emulators, and continuous integration to make development and deployment easier.
How App Engine Works:
Deployment: Developers upload their code to App Engine, which automatically manages
the resources required to run the application, including scaling the app and handling
traffic distribution.
Scaling:
o Automatic Scaling: App Engine automatically adjusts the number of running
instances based on incoming traffic. For example, during peak traffic, App Engine
may create new instances to handle the load and then scale down during low
traffic.
o Manual Scaling: Developers can instead run a fixed number of instances regardless
of traffic; in the flexible environment they can also set minimum and maximum
instance counts when automatic scaling is used.
Routing: GAE uses load balancing to distribute incoming requests to the right instances.
It ensures that users’ requests are routed to the most appropriate version of the
application.
Storage: GAE integrates with various Google Cloud storage services such as:
o Cloud Datastore: A NoSQL database for storing structured data.
o Cloud Storage: Used for storing large files, like images and videos.
o Cloud SQL: Managed relational databases supporting MySQL and PostgreSQL.
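As a small illustration of the Datastore integration, the sketch below uses the google-cloud-datastore Python client to write and read one entity; it assumes the library is installed and default Google Cloud credentials are configured:

# Minimal Cloud Datastore usage from a Python app (illustrative).
from google.cloud import datastore

client = datastore.Client()

# Write: create an entity of kind "GuestbookEntry".
entity = datastore.Entity(key=client.key("GuestbookEntry"))
entity.update({"author": "alice", "message": "hello"})
client.put(entity)

# Read: query all entities of that kind.
for e in client.query(kind="GuestbookEntry").fetch():
    print(e["author"], e["message"])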
Billing Components:
1. Compute Resources: This includes the number of instance hours used by your
application.
2. Storage: Charges are applied for the data stored in services like Cloud Datastore, Cloud
SQL, and Cloud Storage.
3. Outbound Traffic: You are charged for outgoing data traffic from your app.
4. Additional Services: Google offers various services (e.g., email, messaging, monitoring)
that may incur additional costs.
Pricing Models:
Free Tier: Google App Engine offers a free tier with limited resources, which is suitable
for small applications and learning purposes.
Pay-as-You-Go: For larger applications, pricing is based on actual usage.
Use Cases:
1. Web Applications: GAE is ideal for building scalable web applications that need to
handle varying levels of traffic, such as social networking sites, news websites, and
blogs.
2. Mobile Backend: Many mobile applications use App Engine for managing user
authentication, storing data, and handling traffic to scale backend services dynamically.
3. Microservices: App Engine can be used to deploy microservices in a distributed system,
allowing each service to scale independently.
4. Real-Time Applications: GAE is useful for building real-time applications such as chat
apps, gaming apps, and collaboration tools.
5. Machine Learning APIs: Developers can deploy machine learning models as APIs to
serve predictions at scale using Google App Engine.
Deploying an Application:
1. Create a web application (for example, a Python Flask app or a Node.js Express app).
2. Include an app.yaml configuration file that defines the app’s environment and scaling
behavior.
3. Deploy with the Google Cloud SDK: running gcloud app deploy uploads the code and
configuration, and App Engine provisions instances automatically.
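A minimal sketch of steps 1 and 2, assuming the Python standard environment (by default App Engine serves the module-level app object in main.py, and Flask must be listed in requirements.txt):

# main.py - a tiny Flask app served by App Engine's default entrypoint.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello from App Engine!"

and the matching configuration file:

# app.yaml - selects the runtime; automatic scaling is the default.
runtime: python39

With automatic scaling, App Engine spins instances of this app up and down with traffic after gcloud app deploy is run from the project directory.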
OpenStack
OpenStack is a free, open-source cloud computing platform used to build and manage private and
public clouds, pooling compute, storage, and networking resources across a data center. Its key
characteristics include:
Open-Source: OpenStack is open-source and free to use, making it a cost-effective option for
private and public cloud deployments.
Modular Architecture: OpenStack has a modular architecture with various components that
work together to provide cloud services.
Scalability: It is highly scalable, meaning it can handle everything from small-scale deployments
to large-scale enterprise environments.
Vendor-Neutral: It is compatible with multiple hardware and software platforms, making it
adaptable to various cloud needs.
Multi-Tenancy: OpenStack supports multi-tenancy, allowing multiple organizations or
departments to share the same cloud infrastructure securely.
2. OpenStack Components
OpenStack is divided into several key components, each serving a specific function in the cloud
infrastructure. Here are the primary components:
a. Nova (Compute)
Purpose: Nova is responsible for provisioning and managing virtual machines (VMs) and
handling the computing resources in the cloud.
Functionality: It manages the lifecycle of virtual machines (from creation to termination) and
manages various hypervisors (like KVM, VMware, Hyper-V).
Key Features:
o Support for multiple hypervisors.
o VM orchestration and management.
o Integration with other OpenStack services like Neutron and Glance.
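For example, with credentials loaded, a single OpenStack CLI call asks Nova to boot a VM; the image, flavor, and network names below are placeholders for whatever exists in your cloud:

openstack server create --image cirros --flavor m1.small \
  --network private demo-vm
openstack server list          # shows demo-vm once it is ACTIVE
openstack server delete demo-vm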
d. Neutron (Networking)
Purpose: Neutron provides networking as a service, creating and managing the networks,
subnets, routers, floating IPs, and security groups that connect virtual machines.
e. Horizon (Dashboard)
Purpose: Horizon is OpenStack's web-based dashboard, giving users and administrators a
graphical interface for managing compute, storage, and networking resources.
h. Heat (Orchestration)
Purpose: Heat is used for orchestration, automating the deployment of resources and services.
Functionality: It allows users to define the infrastructure requirements in a template (often
written in YAML) and deploy them in an automated manner.
Key Features:
o Infrastructure-as-Code (IaC) for cloud services.
o Supports auto-scaling, load balancing, and resource provisioning.
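A minimal Heat Orchestration Template (HOT) for booting one server looks like the sketch below; the image, flavor, and network values are placeholders:

heat_template_version: 2018-08-31

description: Boot a single server (illustrative template)

resources:
  my_server:
    type: OS::Nova::Server
    properties:
      image: cirros
      flavor: m1.small
      networks:
        - network: private

# Deploy with: openstack stack create -t server.yaml demo-stack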
i. Ceilometer (Telemetry)
Purpose: Ceilometer collects usage and performance data (telemetry) from OpenStack services,
supporting monitoring, metering, and billing.
3. OpenStack Architecture
A typical OpenStack deployment distributes its services across several types of nodes:
Controller Node: Houses the services responsible for managing the cloud (e.g., Nova, Keystone,
Horizon).
Compute Nodes: Run the virtual machines and are managed by Nova.
Storage Nodes: Provide storage through Swift (object storage) and Cinder (block storage).
Network Nodes: Handle networking tasks and are typically configured with Neutron to manage
network resources.
4. Deployment of OpenStack
Several tools simplify installing and deploying OpenStack:
DevStack: A tool used to set up OpenStack on a single machine for testing and development
purposes.
Packstack: A deployment tool for OpenStack that simplifies multi-node deployments.
RDO: A community-supported distribution of OpenStack for Red Hat-based systems.
Mirantis OpenStack: A commercial version of OpenStack with enterprise support.
Kolla: A deployment tool that uses Docker containers to deploy OpenStack services.
OpenStack can be deployed on physical servers or virtual machines, and its services can run on
different nodes for redundancy and scalability.
5. Use Cases of OpenStack
Private Cloud: Organizations can use OpenStack to build their own private cloud to manage
internal resources securely.
Public Cloud: OpenStack is used by some public cloud providers to offer cloud services at scale.
Hybrid Cloud: OpenStack can be integrated with other cloud platforms (e.g., AWS, Google
Cloud) to provide a hybrid cloud environment, enabling workload migration.
Edge Computing: OpenStack is also used in edge computing scenarios where computing power
is needed at the network edge for faster processing.
6. Benefits of OpenStack
OpenStack's main benefits follow directly from its design: no licensing costs, freedom from
vendor lock-in, a large open-source community, full control over the infrastructure, and the
flexibility to customize every layer of the cloud stack.
7. Challenges of OpenStack
Complexity: OpenStack can be complex to deploy and manage, particularly for organizations
without prior experience in cloud infrastructure.
Compatibility Issues: Compatibility between different components and versions can cause
issues, especially in large environments.
Resource Intensive: OpenStack requires significant hardware resources for large-scale
deployments, making it potentially expensive in terms of infrastructure.
1. What is Federation?
Federation in the context of services and applications refers to the concept of linking multiple
independent systems, often from different organizations, so they can interact and share resources
while maintaining separate control. Federation typically involves:
Data Sharing: Enabling secure and seamless data exchange across systems.
Single Sign-On (SSO): Allowing users to authenticate once and access multiple
applications/services across different domains without needing to log in again.
2. Types of Federation
1. Identity Federation
Definition: This is the most common type of federation and refers to the ability to share
and manage user identity across multiple domains or organizations.
Key Concept: It enables Single Sign-On (SSO), allowing users to authenticate once and
gain access to services across different cloud environments or systems without needing to
log in multiple times.
How it Works:
o Identity providers (IdPs) like Microsoft Active Directory, Google Identity, or
other authentication services issue tokens that can be trusted by service providers
(SPs) across different platforms or organizations.
o Common protocols include SAML (Security Assertion Markup Language),
OAuth, and OpenID Connect.
Example: An organization federates its internal directory service (e.g., Active Directory)
with a cloud service provider (e.g., Google Cloud), so employees can access both internal
applications and cloud resources using a single set of credentials.
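The toy Python sketch below illustrates only the trust relationship; real deployments use SAML, OAuth, or OpenID Connect rather than hand-rolled tokens. The IdP signs an assertion with a key it shares with the SP, and the SP verifies the signature instead of re-authenticating the user:

# Toy federated-identity flow: IdP issues a signed token, SP verifies it.
# Illustration only - real systems use SAML, OAuth 2.0, or OpenID Connect.
import hmac, hashlib

SHARED_KEY = b"idp-and-sp-trust-key"   # established when federation is set up

def idp_issue_token(user):
    """Identity provider: sign an assertion about the user."""
    sig = hmac.new(SHARED_KEY, user.encode(), hashlib.sha256).hexdigest()
    return f"{user}:{sig}"

def sp_verify_token(token):
    """Service provider: trust the assertion if the signature checks out."""
    user, sig = token.rsplit(":", 1)
    expected = hmac.new(SHARED_KEY, user.encode(), hashlib.sha256).hexdigest()
    return user if hmac.compare_digest(sig, expected) else None

token = idp_issue_token("alice@example.com")
print(sp_verify_token(token))           # alice@example.com -> access granted
print(sp_verify_token(token + "x"))     # None -> tampered token rejected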
2. Resource Federation
Definition: Resource federation refers to the ability to share and manage computing
resources like storage, compute power, and networking across multiple clouds or data
centers.
Key Concept: It enables the pooling and sharing of resources across different cloud
environments, creating a unified infrastructure that can scale based on demand.
How it Works:
o Organizations can link their private cloud with public clouds, or multiple public
cloud services, to create a hybrid cloud or multi-cloud environment.
o Federation technologies allow workloads to move seamlessly between clouds to
optimize resource usage, reduce costs, and improve redundancy.
Example: A company running workloads in a private cloud may burst compute resources
to a public cloud during periods of high demand, utilizing cloud bursting for scalability.
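A toy placement rule captures the bursting decision; the capacity numbers are invented for illustration:

# Toy cloud-bursting decision: run on the private cloud until it is full,
# then overflow ("burst") new workloads to a public cloud.
PRIVATE_CAPACITY = 100   # arbitrary capacity units
private_load = 0

def place_workload(units):
    global private_load
    if private_load + units <= PRIVATE_CAPACITY:
        private_load += units
        return "private cloud"
    return "public cloud (burst)"

for units in [40, 40, 40]:
    print(units, "->", place_workload(units))
# 40 -> private cloud, 40 -> private cloud, 40 -> public cloud (burst)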
3. Service Federation
Definition: Service federation links independently operated services or applications across
organizations or clouds so they can be discovered, composed, and consumed as if they belonged
to a single system (for example, federated APIs or messaging services spanning multiple
providers).
4. Data Federation
Definition: Data federation refers to the process of integrating and accessing data
distributed across multiple cloud environments or data sources as though it is stored in a
single, unified system.
Key Concept: It allows organizations to access data from multiple disparate data stores
(e.g., databases, cloud storage) without needing to physically move the data.
How it Works:
o Data federation technologies create a virtual layer that unifies the view of data
from multiple locations, enabling queries across various cloud systems without
duplicating or transferring the actual data.
o Often used in data integration platforms or data lakes that aggregate data from
multiple cloud services or data silos.
Example: A business using multiple cloud storage solutions (e.g., AWS S3, Azure Blob
Storage, and Google Cloud Storage) may federate the data into a single virtual data
warehouse to enable unified querying and analytics.
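The Python sketch below mimics that virtual layer with in-memory dictionaries standing in for separate cloud stores; one federated query fans out to every store and merges results without copying data between them:

# Toy data federation: one query interface over several "stores".
aws_s3     = {"order-1": {"region": "us", "total": 120}}
azure_blob = {"order-2": {"region": "eu", "total": 80}}
gcs        = {"order-3": {"region": "us", "total": 200}}

FEDERATED_STORES = [aws_s3, azure_blob, gcs]

def federated_query(predicate):
    """Run one query across all stores; data never moves between them."""
    for store in FEDERATED_STORES:
        for key, record in store.items():
            if predicate(record):
                yield key, record

# Unified view: all US orders, regardless of which cloud holds them.
print(list(federated_query(lambda r: r["region"] == "us")))
# [('order-1', {...}), ('order-3', {...})]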
Benefits of Federation:
1. Decentralized Control: Different organizations or systems retain control over their own
data and resources while still collaborating.
2. Scalability: Allows for scaling across different domains and services without centralizing
everything.
3. Security: Federation can enable secure data sharing and authentication protocols,
reducing the need to replicate data across systems.
4. Cost Efficiency: By connecting existing systems, federation allows organizations to
collaborate without major infrastructure investments.
5. User Convenience: With federated identity management, users can access a wide range
of services and applications with a single login (SSO).
6. Interoperability: Federated services can bridge the gap between different systems,
enabling them to work together even if they were not originally designed to do so.
Challenges of Federation:
1. Data Privacy and Compliance: Federating services can make it difficult to ensure that
data privacy regulations (e.g., GDPR) are maintained, as data crosses organizational
boundaries.
2. Security Risks: Although federation offers centralized authentication, it still poses
security risks, as a compromised identity provider could jeopardize access to multiple
systems.
3. Complexity: Setting up and managing federated systems can be complex, requiring
protocols and standards to be adhered to across various platforms.
4. Latency: Data transfer and synchronization across federated systems may lead to
increased latency, affecting performance.
5. Compatibility Issues: Different federated systems may use different standards,
protocols, or technologies, requiring additional middleware or adapters for seamless
integration.