
Unit 5:

Cloud Technologies and Advancement in Hadoop


 Hadoop in Cloud Computing:
• Hadoop is an open-source framework designed to efficiently store and process large
datasets (big data), ranging from gigabytes to petabytes.

• Instead of relying on a single powerful machine, Hadoop clusters multiple computers in a
distributed computing environment to analyze massive datasets in parallel, enabling faster and
more efficient data processing. This distributed computing environment allows Hadoop to scale
easily and handle the vast amounts of data associated with big data applications.

• It is a Java-based programming framework.

• Cluster: A group of interconnected computers working together as a single system to store and
process data. In Hadoop, clusters are used for distributed data storage and parallel
computation.

• Big Data: Extremely large datasets that are difficult to store, process, or analyze using traditional
tools due to their volume, velocity, and variety.

 Main Modules of Hadoop


Hadoop comprises four main modules that work together to manage data storage and processing:

1. Hadoop Distributed File System (HDFS):

– A distributed file system that operates on low-cost hardware.

– Provides high fault tolerance, better data throughput, and native support for large
datasets.

– Data is divided into blocks (typically 128 MB or 256 MB in size), and each block is
replicated across multiple DataNodes to ensure fault tolerance and data availability.

2. Yet Another Resource Negotiator (YARN):

– Manages resources and monitors cluster nodes.

– Handles job scheduling and task execution within the Hadoop environment.
3. MapReduce:

– A programming framework for parallel data computation.

– Map Task: Converts input data into key-value pairs for processing.

– Reduce Task: Aggregates and organizes the results of map tasks to generate the desired
output.

4. Hadoop Common:

– Provides shared Java libraries and utilities that are essential for the functioning of all
other Hadoop modules.

 Features of Hadoop:
• Distributed Storage:

– Data is stored in smaller blocks across multiple nodes in the Hadoop Distributed File
System (HDFS).

• Parallel Processing:

– Utilizes the MapReduce framework to process data in parallel across the cluster.

• Scalability:

– Supports horizontal scaling by adding more nodes to the cluster to handle increasing
data loads.

• Fault Tolerance:

– Automatically replicates data blocks across nodes, ensuring data availability in case of
node failure.

• YARN for Resource Management:

– Dynamically allocates and manages resources for optimal task execution.

• Flexibility:

– Handles structured, semi-structured, and unstructured data from various sources.

• Open-Source Framework:

– Free to use and customizable, encouraging experimentation and innovation.


 Hadoop Architecture:
• Hadoop is designed to efficiently process and store large datasets in a distributed computing
environment. Its architecture comprises key components like the Hadoop Distributed File
System (HDFS), MapReduce engine, and a cluster with a master-slave architecture.

Key Components of Hadoop Architecture:

Hadoop Cluster Architecture: A Hadoop cluster follows a master/slave architecture:

• Master Node:

– Hosts the NameNode and JobTracker, which manage the file system metadata and
coordinate job scheduling across the cluster.

• Slave Nodes:

– Each slave node runs a DataNode and a TaskTracker to store data blocks and execute
the assigned tasks.

Fig: Components of Hadoop


1. Hadoop Distributed File System (HDFS):

• The Hadoop Distributed File System (HDFS) is a distributed file system for Hadoop.

• It follows a master/slave architecture, consisting of a single NameNode that performs the role
of master and multiple DataNodes that perform the role of slaves.

• Both the NameNode and DataNodes can run on commodity machines.

• HDFS is developed in Java, so any machine that supports Java can easily run the NameNode
and DataNode software.

 NameNode (Master):

• Manages the file system namespace and metadata.

• Executes file operations like opening, renaming, and closing files.

• Handles metadata requests from the JobTracker; its single-master design simplifies the system architecture.

• Because there is only a single master, the NameNode can become a single point of failure.

 DataNode (Slave):

• The HDFS cluster contains multiple DataNodes.

• Each DataNode contains multiple data blocks.

• These data blocks are used to store data.

• Executes tasks like block creation, deletion, and replication on instructions from NameNode.

• Handles read and write requests from clients.
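
The replication behaviour described above can be observed directly. Below is a minimal sketch, assuming a working Hadoop installation with the standard hdfs command-line tool on the PATH; the file paths are hypothetical. It uploads a file to HDFS and then asks the NameNode how the file was split into blocks and where the replicas are stored.

    import subprocess

    def upload_and_inspect(local_file, hdfs_path):
        # Copy the local file into HDFS; the NameNode records the metadata,
        # while DataNodes store the replicated blocks.
        subprocess.run(["hdfs", "dfs", "-put", "-f", local_file, hdfs_path], check=True)
        # fsck reports how the file was split into blocks and on which
        # DataNodes each replica is located.
        report = subprocess.run(
            ["hdfs", "fsck", hdfs_path, "-files", "-blocks", "-locations"],
            capture_output=True, text=True, check=True)
        print(report.stdout)

    # Hypothetical paths, for illustration only.
    upload_and_inspect("access.log", "/user/demo/access.log")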

2. MapReduce Layer: The MapReduce layer comes into play when a client application submits a
MapReduce job to the JobTracker. In response, the JobTracker sends the request to the appropriate
TaskTrackers. If a TaskTracker fails or times out, that part of the job is rescheduled.

• The MapReduce engine processes large datasets in parallel by dividing work into map and reduce
operations.

• The Map Task processes input data and converts it into key-value pairs.

• The Reduce Task aggregates and organizes the processed data.

 Job Tracker (Master):

– Accepts MapReduce jobs from clients and processes data using NameNode metadata.
– Coordinates task execution by assigning tasks to the appropriate Task Trackers.

 Task Tracker (Slave):

– Works under the Job Tracker to execute assigned tasks.

– Applies code to the data, often referred to as the Mapper process.

– Handles task rescheduling if failures or timeouts occur.

Fig: Architecture of Hadoop.

Data Processing Workflow in Hadoop:


 Client submits a MapReduce job to Job Tracker.
 Job Tracker assigns tasks to Task Trackers based on data locality and availability.
 Task Tracker executes the tasks. If a Task Tracker fails, the job is rescheduled on another node.
 Data is split into blocks and processed in parallel across the cluster for faster and more efficient
analytics.
 Benefits of Hadoop:
• Scalability: Hadoop can easily scale by adding more nodes to the cluster to handle growing
datasets without affecting performance.

• Cost-Effectiveness: It uses low-cost commodity hardware, reducing the overall cost of
infrastructure compared to traditional systems.

• Fault Tolerance: Hadoop automatically replicates data blocks across multiple nodes in the
cluster, ensuring data availability even in the event of a node failure.

• Data Flexibility: It can process structured, semi-structured, and unstructured data, making it
suitable for diverse datasets like text, images, and videos.

• High-Speed Processing: Through parallel processing across multiple nodes, Hadoop ensures
faster analysis of large datasets.

• Efficient Storage Management: Data is stored in smaller blocks within the Hadoop Distributed
File System (HDFS), ensuring optimal storage utilization.

• Resource Optimization: The YARN module dynamically manages resources across the cluster,
ensuring efficient execution of tasks.

• Integration with Cloud: Hadoop works seamlessly with cloud platforms, leveraging elastic
scalability and on-demand provisioning to handle big data challenges in real-time.
• Open-Source Framework: Hadoop is open-source, making it cost-effective and customizable
without licensing restrictions, ideal for academic projects.

• Support for Big Data Applications: It enables advanced data analytics and processing, beneficial
for fields like healthcare, e-commerce, and research.

 MapReduce:

MapReduce is a programming model and framework within the Hadoop ecosystem that enables
efficient processing of big data by automatically distributing and parallelizing the computation. It
consists of two fundamental tasks: Map and Reduce.

In the Map phase, the input data is divided into smaller chunks that are processed independently and
in parallel across multiple nodes in a distributed computing environment. Each chunk is transformed,
or “mapped”, into key-value pairs by applying a user-defined function. The output of the Map phase is
a set of intermediate key-value pairs.

The Reduce phase follows the Map phase. It gathers the intermediate key-value pairs generated by the
Map tasks, performs data shuffling to group together pairs with the same key, and then applies a user-
defined reduction function to aggregate and process the data. The output of the Reduce phase is the
final result of the computation.
MapReduce allows for efficient processing of large-scale datasets by leveraging parallelism and
distributing the workload across a cluster of machines. It simplifies the development of distributed data
processing applications by abstracting away the complexities of parallelization, data distribution, and
fault tolerance, making it an essential tool for big data processing in the Hadoop ecosystem.

MapReduce Architecture

The MapReduce process has the following phases:

1. Input Splits
2. Mapping
3. Shuffling
4. Sorting
5. Reducing

Input Splits: MapReduce splits the input into smaller chunks called input splits, each representing a
block of work handled by a single mapper task.

Mapping: The input data is processed and divided into smaller segments in the mapper phase,
where the number of mappers is equal to the number of input splits. A RecordReader converts each
input split into key-value pairs (for example, using the text input format). The mapper then processes
these key-value pairs using user-defined logic to produce intermediate key-value pairs of the same form.

Shuffling: In the shuffling phase, the output of the mapper phase is transferred to the reducers,
where the values are grouped by key. The data remains in key-value form. Since shuffling can begin
even before all map tasks are complete, it saves time.

Sorting: Sorting is performed simultaneously with shuffling. The Sorting phase involves merging
and sorting the output generated by the mapper. The intermediate key-value pairs are sorted by
key before starting the reducer phase, and the values can take any order. Sorting by value is done
by secondary sorting.

Reducing: In the reducer phase, the intermediate values from the shuffling phase are aggregated
for each key to produce the final output, which summarizes the dataset. The final output is then
stored in HDFS.

Here is a MapReduce example that counts the frequency of each word in an input text. The text is,
“This is an apple. Apple is red in color.”

 The input data is divided into multiple segments that are processed in parallel to reduce
processing time. In this case, the input is divided into two input splits to distribute the
work over all the map nodes.

 The mapper counts the number of times each word occurs in its input split and emits
key-value pairs, where the key is the word and the value is its frequency.

 For the first input split, it generates 4 key-value pairs: This, 1; is, 1; an, 1; apple, 1; and for
the second, it generates 5 key-value pairs: Apple, 1; is, 1; red, 1; in, 1; color, 1.

 This is followed by the shuffle phase, in which the values are grouped by key. Treating
“This”/“this” and “Apple”/“apple” as the same word, we get a total of 7 groups of key-value pairs.

 All key-value pairs with the same key are handled by the same reducer.

 During the reducer phase, the counts for each word are combined into a single output.
The output shows the frequency of each word.

 Here in the example, we get the final output of key-value pairs as This, 1; is, 2; an, 1;
apple, 2; red, 1; in, 1; color, 1.

 The record writer writes the output key-value pairs from the reducer into the output files,
and by default, HDFS stores the final output data.
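
The same word count can be expressed directly in code. The following is a minimal, self-contained Python sketch that simulates the map, shuffle, and reduce steps for the sentence above; on a real cluster the same map and reduce functions would be distributed across nodes by Hadoop (for example via Hadoop Streaming) and the final output written to HDFS.

    import re
    from collections import defaultdict

    def map_phase(split):
        # Emit (word, 1) for every word in one input split.
        return [(word.lower(), 1) for word in re.findall(r"[A-Za-z]+", split)]

    def shuffle(intermediate_pairs):
        # Group all values belonging to the same key.
        groups = defaultdict(list)
        for key, value in intermediate_pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(key, values):
        # Aggregate the grouped values into one result per key.
        return key, sum(values)

    splits = ["This is an apple.", "Apple is red in color."]

    intermediate = []
    for s in splits:                      # each split would run on its own mapper
        intermediate.extend(map_phase(s))

    grouped = shuffle(intermediate)       # shuffle and sort by key
    result = dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))
    print(result)  # {'an': 1, 'apple': 2, 'color': 1, 'in': 1, 'is': 2, 'red': 1, 'this': 1}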

 Four levels of Federation


1. Permissive federation
2. Verified federation
3. Encrypted federation
4. Trusted federation

1. Permissive federation: Permissive federation occurs when a server accepts a connection
from a peer network server without verifying its identity using DNS lookups or certificate
checking.
• The lack of verification or authentication may lead to domain spoofing (the unauthorized use
of a third-party domain name in an email message in order to pretend to be someone else),
which opens the door to widespread spam and other abuses.
• With the release of the open-source jabberd 1.2 server in October 2000, which included
support for the Server Dialback protocol (fully supported in Jabber XCP), permissive federation
met its demise on the XMPP network.

2. Verified federation: This type of federation occurs when a server accepts a connection
from a peer after the identity of the peer has been verified.
• It uses information obtained via DNS and by means of domain-specific keys exchanged
beforehand.
• The connection is not encrypted, and the use of identity verification effectively prevents
domain spoofing
• To make this work, federation requires proper DNS setup, and that is still subject to DNS
poisoning attacks.
• Verified federation has been the default service policy on the open XMPP network since the
release of the open-source jabberd 1.2 server.

3. Encrypted federation: In this mode, a server accepts a connection from a peer if and only
if the peer supports Transport Layer Security (TLS) as defined for XMPP in RFC 3920.
• The peer must present a digital certificate.
• The certificate may be self-signed, but this prevents using mutual authentication. In that case,
both parties proceed to weakly verify identity using Server Dialback.
• Server Dialback is a protocol used between XMPP servers to provide identity verification.
Dialback uses the DNS as the basis for verifying identity; the basic approach is that when a
receiving server receives a server-to-server connection request from an originating server, it
does not accept the request until it has verified a key with an authoritative server for the
domain asserted by the originating server.
• Although Server Dialback does not provide strong authentication or trusted federation, and
although it is subject to DNS poisoning attacks, it has effectively prevented most instances of
address spoofing on the XMPP network since its release in 2000.
• This results in an encrypted connection with weak identity verification.

4. Trusted federation: Here, a server accepts a connection from a peer only under the
stipulation that the peer supports TLS and can present a digital certificate issued by a
root certification authority (CA) that is trusted by the authenticating server.
• The list of trusted root CAs may be determined by one or more factors, such as the operating
system, XMPP server software, or local service policy.
• In trusted federation, the use of digital certificates results not only in channel encryption
but also in strong authentication.
• The use of trusted domain certificates effectively prevents DNS poisoning attacks but makes
federation more difficult, since such certificates have traditionally not been easy to obtain.
Key Terms

 DNS (Domain Name System): Converts domain names into IP addresses to direct internet
traffic.
 TLS (Transport Layer Security): Protocol that provides secure communication by encrypting
data.
 CA (Certification Authority): A trusted organization that issues digital certificates to verify
identities.
 XMPP (Extensible Messaging and Presence Protocol): A communication protocol for messaging
systems.
 RFC 3920: A standard document defining TLS implementation in XMPP.
 XEP-0220: A protocol for verifying server identities in XMPP using Server Dialback.

Cloud Federation Stack:


Creating a cloud federation involves research and development at different levels: conceptual,
logical and operational, and infrastructural.
The figure provides a comprehensive view of the challenges faced in designing and implementing
an organizational structure that coordinates cloud services belonging to different
administrative domains and makes them operate within the context of a single unified service
middleware.

Each cloud federation level presents different challenges and operates at a different layer of
the IT stack, and therefore requires different approaches and technologies. Taken together,
the solutions to the challenges faced at each of these levels constitute a reference model for a
cloud federation.

CONCEPTUAL LEVEL: The conceptual level addresses the challenges in presenting a cloud
federation as a favourable solution with respect to the use of services leased by single cloud
providers. At this level it is important to clearly identify the advantages for both service
providers and service consumers in joining a federation and to delineate the new opportunities
that a federated environment creates with respect to the single-provider solution.
Elements of concern at this level are:

 Motivations for cloud providers to join a federation.


 Motivations for service consumers to leverage a federation.
 Advantages for providers in leasing their services to other providers.
 Obligations of providers once they have joined the federation.
 Trust agreements between providers.
 Transparency versus consumers.

Among these aspects, the most relevant are the motivations of both service providers and
consumers in joining a federation.

LOGICAL & OPERATIONAL LEVEL: The logical and operational level of a federated cloud
identifies and addresses the challenges in devising a framework that enables the aggregation of
providers that belong to different administrative domains within a context of a single overlay
infrastructure, which is the cloud federation.

At this level, policies and rules for interoperation are defined. Moreover, this is the layer at
which decisions are made as to how and when to lease a service to—or to leverage a service
from— another provider.

The logical component defines a context in which agreements among providers are settled and
services are negotiated, whereas the operational component characterizes and shapes the
dynamic behaviour of the federation as a result of the single providers’ choices.

This is the level where MOCC (market-oriented cloud computing) is implemented and realized.
It is important at this level to address the following challenges:

 How should a federation be represented?


 How should we model and represent a cloud service, a cloud provider, or an agreement?
 How should we define the rules and policies that allow providers to join a federation?
 What are the mechanisms in place for settling agreements among providers?
 What are provider’s responsibilities with respect to each other?
 When should providers and consumers take advantage of the federation?
 Which kinds of services are more likely to be leased or bought?
 How should we price resources that are leased, and which fraction of resources should we
lease?

The logical and operational level provides opportunities for both academia and industry.
INFRASTRUCTURE LEVEL: The infrastructural level addresses the technical challenges involved
in enabling heterogeneous cloud computing systems to interoperate seamlessly.

It deals with the technology barriers that keep cloud computing systems belonging to
different administrative domains separate. These barriers can be overcome by means of
standardized protocols and interfaces.

At this level it is important to address the following issues:

 What kind of standards should be used?


 How should interfaces and protocols be designed for interoperation?
 Which technologies should be used for interoperation?
 How can we realize a software system, design platform components, and services enabling
interoperability?

Interoperation and composition among different cloud computing vendors is possible only by
means of open standards and interfaces. Moreover, interfaces and protocols change
considerably at each layer of the Cloud Computing Reference Model.

 Federated Services and Applications:


S2S federation is a good start toward building a real-time communications cloud. Clouds
typically consist of all the users, devices, services, and applications connected to the network. In
order to fully leverage the capabilities of this cloud structure, a participant needs the ability to
find other entities of interest. Such entities might be end users, multiuser chat rooms, real-time
content feeds, user directories, data relays, messaging gateways, etc. Finding these entities is a
process called discovery.

XMPP uses service discovery to find the aforementioned entities. The discovery protocol
enables any network participant to query another entity regarding its identity, capabilities, and
associated entities. When a participant connects to the network, it queries the authoritative
server for its particular domain about the entities associated with that authoritative server.

In response to a service discovery query, the authoritative server informs the inquirer about
services hosted there and may also detail services that are available but hosted elsewhere.
XMPP includes a method for maintaining personal lists of other entities, known as roster
technology, which enables end users to keep track of various types of entities.
Usually, these lists are comprised of other entities the users are interested in or interact with
regularly. Most XMPP deployments include custom directories so that internal users of those
services can easily find what they are looking for.

 The Future of Federation


 The implementation of federated communications is a precursor to building a seamless
cloud that can interact with people, devices, information feeds, documents, application
interfaces, and other entities.
 The power of a federated, presence-enabled communications infrastructure is that it
enables software developers and service providers to build and deploy such applications
without asking permission from a large, centralized communications operator.
 The process of server-to-server federation for the purpose of inter-domain
communication has played a large role in the success of XMPP, which relies on a small
set of simple but powerful mechanisms for domain checking and security to generate
verified, encrypted, and trusted connections between any two deployed servers.
 These mechanisms have provided a stable, secure foundation for growth of the XMPP
network and similar real-time technologies.

Virtual box:
VirtualBox is a widely used open-source virtualization software developed by Oracle. It
enables users to create and manage virtual machines (VMs) on their local systems. In the
context of cloud computing, VirtualBox plays a critical role in developing, testing, and
deploying virtualized environments.
Oracle VirtualBox, the world’s most popular open source, cross-platform, virtualization
software, enables developers to deliver code faster by running multiple operating systems
on a single device. IT teams and solution providers use VirtualBox to reduce operational
costs and shorten the time needed to securely deploy applications on-premises and to the
cloud.
Oracle VirtualBox is a type-2 hypervisor that allows users to create and manage virtual
machines (VMs) on their physical hardware.

Role of VirtualBox in Cloud Computing:

• Local Development and Testing: Developers often use VirtualBox to build and test
applications in isolated environments before deploying them to the cloud. It supports
creating multi-VM setups for simulating cloud environments locally, such as clusters or
distributed systems.
• Prototyping Cloud Solutions: Before deploying solutions to a cloud provider like AWS,
Azure, or GCP, VirtualBox helps create prototypes to test configurations, services, and
scalability.
• Hybrid Cloud Integrations: VirtualBox VMs can bridge local systems and cloud
environments, enabling hybrid cloud solutions. VMs created in VirtualBox can often be
exported as images and imported into cloud platforms, as in the sketch below.
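
As a small illustration of the workflow above, the sketch below drives VirtualBox from Python. It assumes VirtualBox is installed and its VBoxManage command-line tool is on the PATH; the VM name, OS type, and resource sizes are hypothetical.

    import subprocess

    VM = "demo-node"  # hypothetical VM name

    def run(*args):
        # Thin wrapper around the VBoxManage CLI that ships with VirtualBox.
        subprocess.run(["VBoxManage", *args], check=True)

    # Create and register an empty VM, then size its resources.
    run("createvm", "--name", VM, "--ostype", "Ubuntu_64", "--register")
    run("modifyvm", VM, "--memory", "2048", "--cpus", "2")

    # Start it without a GUI, as you might for a local test-cluster node.
    run("startvm", VM, "--type", "headless")

    # Later (with the VM powered off), export it as an appliance image that
    # can be imported into a cloud platform for a hybrid-cloud workflow:
    # run("export", VM, "-o", "demo-node.ova")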

Advantages of VirtualBox:

1. Free and Open Source: VirtualBox is free for personal and educational use, and its open-source
nature allows for modifications.
2. Cross-Platform Support: Compatible with Windows, Linux, macOS, and Solaris, making it
versatile.
3. Supports Multiple Guest Operating Systems: Enables installation of various operating systems,
including older and less common ones.
4. Snapshot Feature: Saves the current state of a virtual machine, allowing users to revert to it
later for testing and debugging.
5. User-Friendly Interface: Intuitive and easy-to-use interface, ideal for beginners and advanced
users.
6. Portable VMs: Virtual machines created in VirtualBox can be exported and run on other systems
with VirtualBox installed.
7. Low System Requirements: Runs efficiently on personal computers with moderate hardware
specifications.
8. Customizability: Allows adjustments in CPU cores, RAM, and disk space to suit user needs.
9. Integration Features: Provides USB device support, shared folders, and clipboard sharing
between host and guest systems.
10. Great for Learning: Offers an excellent environment for students to experiment with
virtualization and practice cloud computing concepts.

Disadvantages of VirtualBox:

1. Performance Constraints: As a type-2 hypervisor, it operates on top of the host OS, which can
result in slower performance compared to type-1 hypervisors.
2. Limited Advanced Features: Lacks the high-end features available in enterprise-level solutions
like VMware vSphere or Hyper-V.
3. Hardware Compatibility Issues: May encounter challenges with specific hardware
configurations, especially older systems.
4. Resource Sharing: Shares host system resources, which can limit performance for resource-
intensive applications.
5. Community Support Only: Relies on forums and user communities for troubleshooting, which
might not be timely or comprehensive.
6. Stability Issues: Can experience crashes or become unstable when running many VMs
concurrently.
7. Limited Graphics Performance: Not optimized for tasks requiring high GPU performance, such
as 3D rendering or gaming.
8. Not Suitable for Large-Scale Applications: Designed for personal, small-scale, or educational
purposes, not for enterprise-scale cloud infrastructure.

 Google App Engine (Gae)


Google App Engine (GAE), or simply App Engine, is a cloud computing platform as a service
that allows you to create and run web applications in Google-managed data centers.
Applications are sandboxed and run across multiple servers. App Engine provides
automatic scaling for web applications, which means that when the number of requests
for an application rises, App Engine automatically assigns more resources to the web
application to accommodate the increased demand.

 Google App Engine is a Google Cloud Platform service that helps build highly scalable
applications on a fully managed serverless platform. It natively supports apps written
in Go, PHP, Java, Python, Node.js, .NET, and Ruby, and can also support additional
languages via "custom runtimes." The service is free up to a certain level of consumed
resources, and this free tier is available only in the standard environment, not the
flexible environment. Fees are levied for additional storage, bandwidth, or instance hours
required by the application.

 Google App Engine follows the Platform as a Service (PaaS) cloud computing paradigm and
provides a platform for developers to construct scalable apps on the Google Cloud
Platform. The most impressive feature of GAE is its ability to host and run applications in
Google's data centers. As a result, enterprises have only one task to master: creating
cloud-based apps. App Engine provides the platform and administers the apps.

Google App Engine (GAE) Architecture:

Google App Engine (GAE) is a platform under Google Cloud Platform (GCP) designed to help
developers build and host scalable web applications. It structures applications into different
components, making it easier to manage services, versions, and instances.

Components of GAE Architecture:

1. Application

 The application acts as the main container that holds all the resources, configurations,
and metadata required for the app.
 When an App Engine application is created, it is tied to a Google Cloud Platform project
and deployed in a specific region defined by the user.
 It includes app code, settings, and credentials, ensuring all elements are organized for
deployment.

2. Services

 Services allow the application to be divided into separate logical units that can securely
share resources and communicate with each other.
 These services function like microservices and are useful for handling different tasks
within an app.
 An app can include services for:
o Administrative tasks for internal use.
o Backend processes like billing and data analysis.
o API handling for mobile or external device requests.
 Each service is made up of the app's source code and configuration files.
 Deploying changes to a service creates a new version within that service.

3. Versions

 Versions are different deployments of a service that allow flexibility in managing updates.
 They are useful for:
o Testing new features before full rollout.
o Rolling back to a previous version in case of issues.
o Splitting traffic between versions for A/B testing or gradual migration.
 Versions simplify deployment management without disrupting the running application.

4. Instances

 Instances are the environments where versions of services run.


 GAE automatically scales instances based on traffic.
o Scale Up – Increases instances during high traffic for better performance.
o Scale Down – Reduces instances during low traffic to minimize costs.
 This dynamic scaling ensures performance and cost efficiency.

Services Provided by App Engine Include:


 Platform as a Service (PaaS) to build and deploy scalable applications
 The hosting facility is a fully managed data center
 A fully managed, flexible environment platform for managing application servers and
infrastructure
 Support in the form of popular development languages and developer tools

Google App Engine Environments:

• Google Cloud provides two environments:

1) Standard Environment: a constrained (sandboxed) environment with support for languages such as
Python, Go, and Node.js.

• Features of Standard Environment:


• Persistent storage with queries, sorting, and transactions.
• Auto-scaling and load balancing.
• Asynchronous task queues for performing work.
• Scheduled tasks for triggering events at regular or specific time intervals.
• Integration with other GCP services and APIs.

2) Flexible Environment: developers have more flexibility, such as running custom
runtimes using Docker, longer request and response timeouts, the ability to install custom
dependencies/software, and SSH access into the virtual machine.

• Features of Flexible Environment:


• Infrastructure Customization: GAE flexible environment instances are Compute Engine
VMs, which means users can take advantage of custom libraries, use SSH for
debugging, and deploy their own Docker containers.
• It is backed by an open-source community.
• Native feature support: Features such as microservices, authorization, databases, traffic
splitting, logging, etc. are supported.
• Performance: Users can choose from a wider range of CPU and memory settings.
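
To make the standard environment concrete, here is a minimal sketch of an App Engine application. It assumes the Python 3 standard runtime and the Flask framework (declared in requirements.txt); the runtime version shown in the app.yaml comment is only an example.

    # main.py - a minimal web app for the App Engine standard environment,
    # assuming the Flask framework is listed in requirements.txt.
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def hello():
        # App Engine scales instances of this app up and down automatically
        # as request traffic changes.
        return "Hello from Google App Engine!"

    if __name__ == "__main__":
        # Local testing only; in production App Engine runs the app
        # behind its own web server.
        app.run(host="127.0.0.1", port=8080, debug=True)

    # app.yaml (a separate file) can be as small as:
    #   runtime: python312
    # Deploy with: gcloud app deploy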

Features of Google App Engine

1. Development Languages and Tools Collection: For developers, the App Engine supports a
wide range of programming languages and allows them to import libraries and frameworks via
Docker containers. You can develop and test an app locally by using the SDK, which includes
app deployment tools. Every language has its SDK and runtime environment. Python, PHP, .NET,
Java, Ruby, C#, Go, and Node.js are among the languages available.
2. Completely Managed: Google lets you add your web application code to the platform while
they manage the infrastructure. By enabling the firewall, the engine ensures that your web
apps are secure and operational, as well as protecting them from malware and threats.

3. Pay-Per-Use: The app engine operates on a pay-as-you-go basis, which means you only pay
for what you use. When application traffic increases, the app engine automatically scales up
resources, and vice versa.

4. Reliable Diagnostic Services: Cloud Monitoring and Cloud Logging help run app scans to
identify bugs. The app reporting document assists developers in resolving bugs
as soon as possible.

5. Traffic Segmentation: As part of A/B testing, the app engine automatically routes incoming
traffic to different versions of the apps. The consecutive increments can be planned based on
which version of the app works best.

Benefits of Google App Engine for Websites


Adopting the App Engine is a wise decision for your company because it will enable you to
innovate and remain valuable. Here's why Google App Engine is a better choice for developing
applications:

1. High Availability:

When you develop and deploy web applications to the cloud, you enable remote access to your
applications. Considering the huge impact of COVID-19 on businesses, Google App Engine is the
best option because it allows developers to develop applications remotely while the cloud
service manages the infrastructure requirements.

2. Ensure a quicker time to market:

For your web applications to succeed, a shorter time to market is critical, as requirements are
likely to change if the launch time is extended. For developers, using Google App Engine is as
simple as it gets. The diverse tool repository and other functionalities ensure that development
and testing time is reduced, resulting in a faster MVP and consecutive launch time.

3. User-Friendly Platform:

The only thing the developers need to do is write code. You eliminate all of the burdens of
managing and deploying the code with zero configuration and server management. Google App
Engine makes it simple to use the platform, allowing you to focus on other concurrent web
applications and processes. The best part is that GAE handles the increased traffic automatically
via patching, provisioning, and monitoring.

4. A wide range of APIs:

Google App Engine includes many APIs and services that enable developers to create robust
and feature-rich apps. These characteristics are as follows:

 Access to the application log
 Blobstore, for serving large data objects
 Google App Engine Cloud Storage
 SSL support
 PageSpeed Services
 Google Cloud Endpoints, for mobile applications
 URL Fetch API, Users API, Memcache API, Channel API, XMPP API, Files API

5. Enhanced Scalability:

Scalability is synonymous with growth, and it is a critical factor in ensuring success and
competitive advantage. The Google App Engine cloud development platform, on the other
hand, is automatically scalable. GAE automatically scales up the resources when the traffic to
the web application increases, and vice versa.

6. Increased Savings:

You do not have to spend extra money on server management when you use Google App
Engine. The Google Cloud service is adept at handling backend operations. Furthermore, Google
App Engine pricing is adaptable because resources can be scaled up or down based on app
usage. The resources automatically scale up and down based on how well the app performs in
the market, ensuring that the pricing is fair in the end.

7. Pricing Intelligence:

The main concern of enterprises is how much Google App Engine costs. Google App Engine has a
daily and monthly billing cycle for your convenience.

• Daily: You will be charged for the resources you utilize daily.

• Monthly: All daily charges are calculated, added to any applicable taxes, and deducted
from your payment method.

The App Engine offers a separate billing dashboard, "App Engine Dashboard," where you can
check and control your account and subsequent billings.
 OpenStack:
 OpenStack is a free, open-standard cloud computing platform, first launched on July 21,
2010.
 It was created as a joint project between Rackspace Hosting and NASA, aiming to make
cloud computing more accessible and widespread.
 OpenStack is deployed as Infrastructure-as-a-Service (IaaS) in both public and private
clouds, where virtual resources are made available to users.
 The platform consists of interrelated components that manage multi-vendor hardware
pools, including resources for processing, storage, and networking, all through a data
center.
 These components are referred to as “projects” in OpenStack, handling a wide array of
services, such as computing, networking, and storage.
 Unlike traditional virtualization, which abstracts resources like RAM and CPU through
hypervisors, OpenStack uses APIs to abstract and manage these resources.
 APIs in OpenStack enable users and administrators to directly interact with cloud
services, ensuring greater flexibility and control.
 OpenStack’s modular architecture allows users to customize and extend the platform to
suit specific needs and environments.
 It supports scalability, flexibility, and cost-effectiveness, making it ideal for both small-
scale and enterprise cloud deployments.
 OpenStack’s ability to integrate with third-party tools and solutions enhances its
adaptability to different cloud environments.
OpenStack Components

OpenStack is a powerful open-source cloud platform consisting of several interconnected
services, each responsible for managing different aspects of cloud infrastructure. Below is a
breakdown of the key components and features of OpenStack:

Key OpenStack Services/Components:

 Nova (Compute Service): Manages computing resources, including the creation,
deletion, and scheduling of virtual machines (VMs). It automates resource allocation and is
responsible for virtualization and high-performance computing.
 Neutron (Networking Service): Controls network connectivity within OpenStack,
handling network creation, management, and IP address assignment through an API-
driven approach.
 Swift (Object Storage): A distributed storage service designed for high fault tolerance,
managing unstructured data objects via RESTful APIs. It supports petabytes of data and
offers redundant storage across clustered servers.
 Cinder (Block Storage): Provides persistent block storage that users can manage and
access via APIs. It allows customization of the amount of storage required by users.
 Keystone (Identity Service): Handles user authentication and authorization for
OpenStack services, functioning as a central directory to map services to the correct
users.
 Glance (Image Service): Responsible for storing, registering, and retrieving virtual disk
images across the network. These images are stored in various back-end systems.
 Horizon (Dashboard): A web-based interface that allows users to manage, provision,
and monitor OpenStack resources through an easy-to-use graphical interface.
 Ceilometer (Telemetry): Handles the metering of services and generates billing data. It
also triggers alarms when specific thresholds are exceeded.
 Heat (Orchestration): Automates service provisioning and scaling of cloud resources,
often in conjunction with Ceilometer for on-demand services.
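
As an illustration of how these services cooperate, the sketch below uses the openstacksdk Python library to boot a VM. It assumes a cloud named "demo" is configured in clouds.yaml and that the image, flavor, and network names exist on that cloud; all of these names are hypothetical.

    import openstack

    # Credentials are read from a clouds.yaml entry; "demo" is a hypothetical name.
    conn = openstack.connect(cloud="demo")

    # Keystone authenticates this connection; Glance serves the image,
    # Nova schedules the VM, and Neutron attaches the network.
    image = conn.compute.find_image("ubuntu-22.04")      # hypothetical image name
    flavor = conn.compute.find_flavor("m1.small")        # hypothetical flavor name
    network = conn.network.find_network("private")       # hypothetical network name

    server = conn.compute.create_server(
        name="demo-instance",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
    )
    server = conn.compute.wait_for_server(server)
    print(server.status)  # expect ACTIVE once Nova finishes provisioning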

Features of OpenStack:

 Modular Architecture: OpenStack is built with a modular architecture, enabling users to
deploy only the components they need. This allows for customization and scalability
according to specific requirements.
 Multi-Tenancy Support: OpenStack provides support for multi-tenancy, ensuring that
multiple users can securely access the same cloud infrastructure while maintaining
isolation between them.
 Open-Source: OpenStack is free to use, modify, and distribute, offering users the
flexibility to customize the platform without the cost of proprietary software licenses.
 Distributed Design: OpenStack’s distributed architecture allows horizontal scaling across
multiple physical servers, improving system performance and handling large workloads
effectively.
 API-Driven: OpenStack services are accessible via APIs, enabling seamless automation
and integration with other tools and services, enhancing operational flexibility.
 Comprehensive Dashboard: The platform includes an easy-to-use dashboard for
managing cloud resources, allowing users to monitor and control their infrastructure
without requiring specialized technical expertise.
 Resource Pooling: OpenStack pools computing, storage, and networking resources,
enabling dynamic allocation based on demand, optimizing resource utilization, and
reducing waste.

Advantages of OpenStack:

 Rapid provisioning and orchestration of resources, enabling fast scaling up and down as
needed.
 Streamlined deployment of applications, minimizing setup time.
 Scalable resources allow more efficient and flexible usage.
 Helps maintain regulatory compliance in cloud environments.

Disadvantages of OpenStack:

 Orchestration capabilities are still developing and may not be as robust as some other
platforms.
 OpenStack’s APIs may not be fully compatible with certain hybrid cloud providers, which
can make integration more challenging.
 As with all cloud platforms, OpenStack services come with the inherent risk of security
vulnerabilities and breaches.

Disaster recovery (DR)


Disaster recovery in cloud computing refers to the strategies and actions taken to recover data,
applications, and systems after a disaster or failure. It is essential for businesses to have
disaster recovery plans in place to ensure minimal downtime and avoid losing important data.
Disasters can be caused by various factors such as natural events, technical failures, or human
errors.

Types of Disasters to Prepare For:

1. Natural Disasters: Events like floods, earthquakes, or fires can damage physical servers
and data centers. For example, the OVHCloud data center fire in 2021 affected many
cloud services.
2. Technical Disasters: Problems with cloud infrastructure, such as power outages or
network failures, can disrupt services. AWS faced power issues in Sydney in 2016,
causing downtime for several businesses.
3. Human Disasters: These include errors like misconfigurations, accidental data deletion,
or cyber-attacks. In 2017, an error at Amazon led to widespread server failures,
impacting thousands of users.

Importance of Disaster Recovery:

 Minimized Downtime: A disaster recovery plan ensures that services are quickly
restored, which reduces business disruption and minimizes the loss of revenue.
 Recovery Time Objective (RTO) and Recovery Point Objective (RPO):
o RTO refers to the maximum acceptable time the service can be down after a
disaster. It defines how quickly the system should be up and running again.
o RPO is the maximum amount of data loss that is acceptable in a disaster. It
defines how frequently backups should be taken to ensure minimal data loss.
 Business Continuity: A disaster recovery plan ensures that businesses can continue
functioning even during catastrophic failures, thus maintaining reputation and customer
trust.

Methods/Approaches for Cloud Disaster Recovery:

1. Backup and Restore: This is the simplest and most cost-effective method, where data is
periodically backed up and can be restored in case of a disaster. It is suitable for minor
issues like server crashes or power outages.
2. Pilot Light Strategy: This method involves maintaining a minimal version of essential
services in a backup location. If a disaster happens, the core services can quickly be
expanded to restore full functionality. This ensures that a minimal version of the service
is always available for quick recovery.
3. Warm Standby Method: In this approach, a scaled-down version of the entire system is
always running in a separate location. During a disaster, it can be rapidly scaled up to
handle the load, thus ensuring continuity of business operations with minimal
disruption.
4. Multi-Site Deployment: This is the most comprehensive disaster recovery strategy. It
involves replicating the entire infrastructure across multiple locations or regions. If one
region fails, the system can instantly shift to another region, ensuring no downtime and
a high level of business continuity.
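
As a simple illustration of the backup-and-restore approach, the sketch below simulates timestamped backups taken at an interval derived from the RPO and restores the most recent snapshot. It is a local simulation with hypothetical directory names; a real deployment would typically copy the data to object storage or to a second region.

    import shutil
    from datetime import datetime
    from pathlib import Path

    DATA_DIR = Path("app-data")        # hypothetical directory to protect
    BACKUP_DIR = Path("backups")
    RPO_MINUTES = 60                   # at most one hour of data loss is acceptable

    def take_backup():
        # Copy the data directory into a timestamped snapshot. The backup
        # interval must not exceed the RPO for the plan to meet its objective.
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        target = BACKUP_DIR / f"snapshot-{stamp}"
        shutil.copytree(DATA_DIR, target)
        return target

    def restore_latest():
        # Recovery: replace the lost or corrupted data directory with the
        # newest snapshot. The time this takes contributes to the RTO.
        latest = sorted(BACKUP_DIR.iterdir())[-1]
        if DATA_DIR.exists():
            shutil.rmtree(DATA_DIR)
        shutil.copytree(latest, DATA_DIR)

    if __name__ == "__main__":
        BACKUP_DIR.mkdir(exist_ok=True)
        take_backup()
        # In production, take_backup() would run as a scheduled job at least
        # every RPO_MINUTES, and restore_latest() would be invoked after a disaster.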

Benefits of Cloud Disaster Recovery:


 Faster Recovery: Cloud disaster recovery solutions are typically faster than traditional
methods due to the distributed nature of cloud infrastructure. Data can be restored
quickly from backup locations, reducing downtime.
 Cost-Efficiency: Cloud disaster recovery allows businesses to pay only for the services
they use, avoiding the need for expensive hardware and infrastructure.
 Scalability: Cloud disaster recovery solutions are highly scalable, allowing businesses to
adjust their resource requirements based on their needs. This flexibility makes it easier
to manage recovery costs.
 Reduced Complexity: Many cloud providers manage disaster recovery services, such as
hardware and infrastructure, which reduces the burden on businesses and allows them
to focus on their core functions.
 Customization and Flexibility: Cloud disaster recovery provides various options such as
backup and restore, warm standby, and multi-site deployment. Businesses can choose
the method that best suits their needs and budget.
 Automated Testing and Updates: Cloud disaster recovery services can be easily updated
and tested. This ensures that disaster recovery plans are always up to date and
functional when required.
