Csm-Part C
1. Describe the scalability of the Disney+ Hotstar app's cloud architecture and how its concurrency control meets the increasing demands of content delivery, security, cost-effectiveness and workflow, with a neat diagram.
Answer: Disney+Hotstar: An introduction
The journey started with the launch of the Hotstar app in 2015, developed by Star India. The 2015 Cricket World Cup was about to start, along with the 2015 IPL tournament, and the Star network wanted to fully capitalize on the enormous expected viewership.
Hotstar generated a massive 345 million views for the World Cup, while the IPL tournament generated 200 million views.
This was before the Jio launch in 2016, when watching TV series and matches on mobile was still at a nascent stage. The foundation was set.
The introduction of Reliance Jio’s telecom network changed Internet usage in India, and this
changed everything for Hotstar.
By 2017, Hotstar had 300 million downloads, making it the world's second-biggest OTT app, behind only Netflix.
In 2019, Hotstar was acquired by Disney, as part of their 21st Century Fox acquisition, and the
app was rebranded to Disney+Hotstar.
As of now, Disney+Hotstar has 400 million+ downloads, with a whopping user base of 300 million monthly active users and 100 million daily active users. Almost 1 billion minutes of video are watched on the app daily.
The 2019 IPL tournament was watched by 267 million Disney+Hotstar users, and in 2020, a
record 400 billion minutes of content was viewed during the IPL matches.
In India, Disney+Hotstar has a very intense focus on regional content, as more than 60% of
the content is viewed in local languages. This is the reason they support 8 Indian languages,
with plans to expand this number. The same strategy is visible in other countries as well, with
deep focus on regional content, along with regular English content.
They have 100,000+ hours of content for viewers, and India accounts for approximately 40%
of their overall user base.
As of now, Disney+Hotstar is available in India, the US, the UK, Indonesia, Malaysia, and Thailand, with a launch in Vietnam planned for 2023.
Backend of Disney+Hotstar
The team behind Disney+Hotstar has ensured a powerful backend by choosing Amazon Web Services (AWS) for hosting, with Akamai as their CDN partner.
Almost 100% of their traffic is served by EC2 instances, and the S3 object store is used as the data store.
At the same time, they use a mixture of on-demand and spot instances to keep costs under control. For spot instances, they use machine learning and data analytics algorithms, which drastically reduce the overall expense of managing the backend.
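To make the on-demand/spot mix concrete, here is a minimal sketch of requesting spot-priced EC2 capacity with boto3. This is not Hotstar's actual code; the AMI ID, instance type and max price are placeholders chosen purely for illustration.

```python
# A minimal sketch (not Hotstar's actual code) of requesting a spot-priced EC2
# instance with boto3; the AMI ID, instance type, and max price are hypothetical.
import boto3

ec2 = boto3.client("ec2", region_name="ap-south-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",       # hypothetical AMI
    InstanceType="c4.4xlarge",
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={                 # ask for spot capacity instead of on-demand
        "MarketType": "spot",
        "SpotOptions": {
            "MaxPrice": "0.40",             # illustrative bid ceiling in USD per hour
            "SpotInstanceType": "one-time",
        },
    },
)
print(response["Instances"][0]["InstanceId"])
```

Spot capacity is cheaper but can be reclaimed by AWS, which is why it is combined with on-demand instances and demand-prediction models rather than used alone.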
AWS EMR clusters are used to process double-digit terabytes of data on a daily basis. Note here that AWS EMR is a managed Hadoop framework for processing massive amounts of data across EC2 instances.
In some cases, they also use the Apache Spark, Presto, and HBase frameworks alongside AWS EMR.
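As a hedged illustration of how such a pipeline is launched, the sketch below starts a transient EMR cluster with a Spark step via boto3. Cluster sizing, the S3 bucket and the script path are all placeholders, not Hotstar's real configuration.

```python
# A hedged sketch of launching a transient EMR cluster with a Spark step via
# boto3; cluster sizing, bucket names, and the script path are placeholders.
import boto3

emr = boto3.client("emr", region_name="ap-south-1")

cluster = emr.run_job_flow(
    Name="daily-analytics",
    ReleaseLabel="emr-6.10.0",
    Applications=[{"Name": "Spark"}, {"Name": "Presto"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 4},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,   # terminate after the step finishes
    },
    Steps=[{
        "Name": "process-events",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://example-bucket/jobs/process_events.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(cluster["JobFlowId"])
```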
Here are some interesting details about their infrastructure setup for load testing, just before an important event such as IPL matches.
They have 500+ AWS CPU instances, typically c4.4xlarge or c4.8xlarge, running at 75% utilization.
A c4.4xlarge instance has 30 GB of RAM, while a c4.8xlarge has 60 GB!
The entire Disney+Hotstar infrastructure setup has 16 TB of RAM and 8,000 CPU cores, with a peak data-transfer speed of 32 Gbps. This is the scale of their operations, which ensures that millions of users are able to concurrently access live streaming on the app.
Note here that C4 instances are compute-optimized and built for CPU-intensive workloads, ensuring a low price-per-compute ratio. With C4 instances, the app gets high networking performance and optimal storage performance at no additional cost.
Disney+Hotstar uses these Android components to build a powerful infrastructure (and to keep the design loosely coupled for more flexibility):
• ViewModel: For communicating with the network layer and filling the final result into LiveData.
• Room
• LifecycleObserver
• RxJava 2
• Dagger 2 and Dagger Android
• AutoValue
• Glide 4
• Gson
• Retrofit 2 + OkHttp 3
• Chuck Interceptor: For swift and easy inspection of all network requests from within the app, even when the device is not connected to a debugger.
There are basically two models to ensure seamless scalability: traffic-based and ladder-based.
In traffic-based scaling, the tech team simply adds new servers and infrastructure to the pool as the number of requests processed by the system keeps growing.
Ladder-based scaling is chosen when the details and nature of the new workload are not clear. In such cases, the Disney+Hotstar tech team has pre-defined ladders per million concurrent users.
As the number of requests processed by the system grows, new infrastructure is added ladder by ladder.
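The idea of pre-defined ladders can be sketched in a few lines of Python. The ladder table below is entirely hypothetical; it only illustrates that capacity is provisioned in fixed steps per million concurrent users rather than reactively per request.

```python
# Illustrative sketch of ladder-based scaling: capacity is added in pre-defined
# "ladders" per million concurrent users. The ladder table below is hypothetical.
LADDER = [
    # (concurrent users up to, web servers, api servers)
    (1_000_000,  200,  80),
    (2_000_000,  400, 160),
    (5_000_000, 1000, 400),
]

def capacity_for(concurrent_users: int) -> tuple:
    """Return the (web, api) server counts for the smallest ladder that fits."""
    for limit, web, api in LADDER:
        if concurrent_users <= limit:
            return web, api
    return LADDER[-1][1], LADDER[-1][2]   # beyond the table: stay at the top rung

print(capacity_for(1_700_000))   # -> (400, 160): the second ladder is provisioned
```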
As of now, the Disney+Hotstar app has a concurrency buffer of 2 million concurrent users, which is optimally utilized during peak events such as World Cup matches or IPL tournaments.
If the number of users goes beyond this concurrency level, it takes 90 seconds to add new infrastructure to the pool, and the container and the application take a further 74 seconds to start.
To handle this time lag, the team keeps a pre-provisioned buffer, which, unlike reactive auto-scaling, has proven to be the better option.
The team also has an in-built dashboard called Infradashboard, which helps the team to make
smart decisions, based on the concurrency levels, and prediction models of new users, during
an important event.
By using Fragments, the team behind Disney+Hotstar has taken modularity to the next level.
When response latency increases for the application client and the backend is overwhelmed with new requests, established protocols absorb the sudden surge.
For instance, in such cases, the intelligent client deliberately increases the time interval between subsequent requests, giving the backend some respite.
For end users, caching and intelligent protocols ensure that this intentional time lag is not noticeable and the user experience is not hampered.
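A minimal sketch of this "intelligent client" back-off idea is shown below. The endpoint URL, status-code checks and retry limits are assumptions for illustration only, not Hotstar's actual client logic.

```python
# A minimal sketch of client-side back-off: when the backend signals overload,
# the client stretches the interval between retries so the surge is absorbed.
import random
import time

import requests

def fetch_with_backoff(url: str, max_attempts: int = 5) -> requests.Response:
    delay = 1.0                                      # seconds between attempts
    for attempt in range(max_attempts):
        response = requests.get(url, timeout=5)
        if response.status_code != 429 and response.status_code < 500:
            return response                          # backend is healthy enough
        time.sleep(delay + random.uniform(0, 0.5))   # jitter avoids synchronized retries
        delay *= 2                                   # widen the interval each time
    return response

# fetch_with_backoff("https://api.example.com/playback/manifest")  # hypothetical endpoint
```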
Besides, the Infradashboard continuously observes and reports every severe error and fatal exception happening across millions of devices, and these are either rectified in real time or handled by a retry mechanism to ensure seamless performance.
2. Explain in detail the Amazon Web Services cloud model in the following aspects, with relevant applications:
i) Pay-as-you-go pricing model
ii) Security and compliance
iii) Freemium
iv) Pay per user/registration
v) Diverse services
Ans: What is AWS?
AWS (Amazon Web Services) is a comprehensive, evolving cloud computing platform
provided by Amazon. It includes a mixture of infrastructure-as-a-service (IaaS), platform-as-
a-service (PaaS) and packaged software-as-a-service (SaaS) offerings. AWS offers tools such
as compute power, database storage and content delivery services.
Amazon Web Services launched its first web services in 2002 from the internal infrastructure that Amazon.com had built to handle its online retail operations. In 2006, it began offering its defining IaaS services. AWS was one of the first companies to introduce a pay-as-you-go cloud computing model that scales to provide users with compute, storage and throughput as needed.
1. Pay-as-you-go
With AWS you only pay for what you use, helping your organization remain agile, responsive and always able to meet scale demands. Pay-as-you-go pricing allows you to easily adapt to changing business needs without overcommitting budgets, improving your responsiveness to change. With a pay-as-you-go model, you can adapt your business depending on need rather than on forecasts, reducing the risk of overprovisioning or missing capacity. By paying for services on an as-needed basis, you can redirect your focus to innovation and invention, reducing procurement complexity and enabling your business to be fully elastic.
2. Security and compliance
An advantage of the AWS Cloud is that it allows you to scale and innovate, while
maintaining a secure environment and paying only for the services you use. This means that
you can have the security you need at a lower cost than in an on-premises environment.
As an AWS customer you inherit all the best practices of AWS policies, architecture, and
operational processes built to satisfy the requirements of our most security-sensitive
customers. Get the flexibility and agility you need in security controls.
The AWS Cloud enables a shared responsibility model. While AWS manages security of the
cloud, you are responsible for security in the cloud. This means that you retain control of the
security you choose to implement to protect your own content, platform, applications,
systems, and networks no differently than you would in an on-site data center.
AWS provides you with guidance and expertise through online resources, personnel, and
partners. AWS provides you with advisories for current issues, plus you have the opportunity
to work with AWS when you encounter security issues.
You get access to hundreds of tools and features to help you to meet your security objectives.
AWS provides security-specific tools and features across network security, configuration
management, access control, and data encryption.
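As a hedged example of one such control, the snippet below turns on default server-side encryption for an S3 bucket with boto3. The bucket name is a placeholder; the same pattern applies to other configuration-management and encryption controls.

```python
# A hedged example of a security control: enforcing default server-side
# encryption on an S3 bucket with boto3 (bucket name is a placeholder).
import boto3

s3 = boto3.client("s3")

s3.put_bucket_encryption(
    Bucket="example-secure-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}
        }]
    },
)
```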
Finally, AWS environments are continuously audited, with certifications from accreditation
bodies across geographies and verticals. In the AWS environment, you can take advantage of
automated tools for asset inventory and privileged access reporting.
Keep Your data safe — The AWS infrastructure puts strong safeguards in place to help
protect your privacy. All data is stored in highly secure AWS data centers.
Meet compliance requirements — AWS manages dozens of compliance programs in its
infrastructure. This means that segments of your compliance have already been completed.
Save money — Cut costs by using AWS data centers. Maintain the highest standard of
security without having to manage your own facility
Scale quickly — Security scales with your AWS Cloud usage. No matter the size of your
business, the AWS infrastructure is designed to keep your data safe.
Compliance
AWS Cloud Compliance helps you understand the robust controls in place at AWS for
security and data protection in the cloud. Compliance is a shared responsibility between AWS
and the customer, and you can visit the Shared Responsibility Model to learn more.
Customers can feel confident in operating and building on top of the security controls AWS
uses on its infrastructure.
The IT infrastructure that AWS provides to its customers is designed and managed in
alignment with best security practices and a variety of IT security standards. The following is
a partial list of assurance programs with which AWS complies:
1. Create an AWS account: If you don't already have an AWS account, you'll need
to create one. This will be your management account and the root of your
organization.
2. Enable AWS Organizations: From the AWS Management Console, navigate
to the AWS Organizations service and enable it. This will create the
organization with your management account as the master account.
3. Create OUs (Organizational Units): You can create one or more OUs to
organize your accounts. For example, you might create separate OUs for
different departments or environments (e.g., production, staging, development).
4. Create member accounts: You can create new AWS accounts and invite
existing accounts to join your organization as member accounts. You can add
these accounts to the appropriate OUs.
5. Create service control policies (SCPs): SCPs are policies that you can attach to OUs or individual accounts to define the maximum set of actions that can be performed on resources in those OUs or accounts. This allows you to enforce role-based access and other security policies across your organization (a boto3 sketch of steps 2 to 5 follows these configuration steps).
6. Assign IAM roles: You can create IAM roles in your management account and delegate specific permissions to them. You can then assume these roles from your member accounts to perform actions on resources in the management account or other member accounts.
7. Configure permissions: You can use IAM policies to control access to AWS services and resources. You can attach these policies to IAM users, groups, or roles in your management account or member accounts.
To create a role with specific permissions, you can follow these steps:
• Create a new role and choose the appropriate trusted entity (e.g., another AWS
account, an AWS service, or your AWS Organizations).
• Define the permissions for the role by attaching an IAM policy or a service control
policy (SCP).
• Save the role and note down the ARN (Amazon Resource Name) of the role.
• In the AWS Organizations console, attach the role to the appropriate OU or account.
• In the member account, assume the role to perform actions on resources in the
management account or other member accounts.
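A minimal sketch of that last step, assuming the role from a member account with temporary STS credentials, might look like the following; the role ARN is a placeholder.

```python
# A minimal sketch of assuming a cross-account role with boto3 and using the
# temporary credentials; the role ARN below is a placeholder.
import boto3

sts = boto3.client("sts")

creds = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/OrgAdminRole",   # hypothetical role
    RoleSessionName="cross-account-admin",
)["Credentials"]

# Use the temporary credentials to act in the other account.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```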
5. Diverse services
Amazon Web Services (AWS) is a comprehensive cloud computing platform that offers a wide range of services and products to meet the evolving needs of businesses and individuals. From computing power and storage to databases, analytics, and machine learning, AWS provides a versatile suite of tools that empower organizations to innovate, scale, and optimize their operations. Below are some of the key services and products offered by AWS.
1. Amazon EC2 (Elastic Compute Cloud): Amazon EC2 provides scalable virtual servers in the cloud, allowing users to quickly provision computing resources as needed. It enables businesses to deploy applications, run batch-processing workloads, and handle web traffic efficiently. With EC2, users can choose from a variety of instance types, such as general-purpose, compute-optimized, and memory-optimized instances.
2. Amazon S3 (Simple Storage Service): Amazon S3 offers highly scalable and durable object
storage for various data types, including documents, images, videos, and backups. It provides
secure storage options and allows users to retrieve data quickly and reliably from anywhere.
S3 is widely used for website hosting, data archiving, content distribution, and backup and
restore operations.
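A small sketch of typical S3 usage with boto3 is shown below; the bucket name, object key and file names are placeholders illustrating the backup-and-restore use case mentioned above.

```python
# A small sketch of the backup-and-restore use case with boto3
# (bucket name and keys are placeholders).
import boto3

s3 = boto3.client("s3")

s3.upload_file("site-backup.tar.gz", "example-media-bucket", "backups/site-backup.tar.gz")
s3.download_file("example-media-bucket", "backups/site-backup.tar.gz", "restored-backup.tar.gz")
```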
3. AWS Lambda: AWS Lambda is a serverless computing service that enables users to run code without provisioning or managing servers. It allows developers to focus on writing code while AWS handles the infrastructure management. Lambda is ideal for building event-driven applications.
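The sketch below shows the standard shape of a Python Lambda handler, illustrating the event-driven style described above; the event fields are hypothetical.

```python
# A minimal AWS Lambda handler in Python; the event shape here is hypothetical.
import json

def lambda_handler(event, context):
    # Lambda invokes this function for each event; there is no server to manage.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```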
4. Amazon RDS (Relational Database Service): Amazon RDS simplifies the management of relational databases in the cloud, automating administrative tasks such as backups, software patching, and scaling. It supports popular database engines like MySQL, PostgreSQL, Oracle, and SQL Server. RDS provides high availability, fault tolerance, and automatic scaling.
5. Amazon Redshift: Amazon Redshift is a fully managed data warehousing service designed for high-performance analysis of large datasets. It allows organizations to store and query vast amounts of structured and semi-structured data using SQL-based queries. Redshift integrates with popular business intelligence and analytics tools, enabling users to gain insights from their data.
6. AWS Glue: AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and transform data for analysis. It automates the process of discovering, cataloging, and transforming data, helping users derive insights from diverse data sources. Glue seamlessly integrates with other AWS services, such as S3, Redshift, and Athena, and supports popular data formats, including JSON, CSV, and Parquet.
7. Amazon SageMaker: Amazon SageMaker is a fully managed machine learning service that empowers data scientists and developers to build, train, and deploy machine learning models at scale. It provides a comprehensive set of tools and resources for the entire machine learning workflow, from data pre-processing and model training to model deployment and monitoring. SageMaker supports popular frameworks like TensorFlow and PyTorch and offers built-in algorithms for common use cases.
3. Decode the scalability of the Netflix app: how its cloud architecture and concurrency control meet the increasing demands for content delivery, security, cost-effectiveness and workflow, with neat diagram notes.
Whether it is conceptualizing a high-level system architecture, designing an on-demand video streaming system, or outlining the layers and cloud operations for video processing, the challenges presented in such a system design can be both intriguing and complex. This answer delves into the labyrinth of Netflix's system design, breaking down the components and technical nuances that make it an industry leader.
At its core, Netflix operates as a subscription-based streaming service, offering a vast library
of films and TV series, both in-house productions and licensed content.
System Design Netflix: Components and Architecture
The seamless streaming experience we enjoy on Netflix is not just the result of a vast content
library; it’s a testament to a meticulously crafted system design architecture. Let’s dissect the
architectural marvel that powers Netflix, exploring the key components orchestrating the
magic.
1. Client App
The Client App is at the forefront of the Netflix experience: a versatile interface accessible on various devices, from mobile phones and tablets to TVs and laptops. The user-friendly design is a hallmark, enhancing the viewing experience.
Features like cross-device continuity and intelligent video recommendations are a testament to
Netflix’s commitment to an exceptional User Experience (UX).
Technical Underpinning:
Front-End Technology: Netflix relies on React.js for its front-end, ensuring a
seamless and responsive interface. The choice is driven by React.js’s speed, durability,
and high performance attributes.
2. Backend
Netflix embraces a Microservices architecture for its cloud-based system, balancing heavy and
lightweight workloads seamlessly. The backend, powered by Java, MySQL, Gluster, Apache
Tomcat, Hive, Chukwa, Cassandra, and Hadoop, comprises small, manageable software
components operating at the API level.
Key Backend Services:
User and Authentication Service: Ensures secure access and personalized
experiences.
Subscription Management: Manages user subscriptions and billing processes.
Videos Service: Handles video metadata, indexing, and retrieval.
TransCoder Service: Responsible for video transcoding and format adaptation.
Global Search: Enables efficient content discovery.
The backend’s responsibilities span beyond being a mere video streaming app, encompassing
video processing, content onboarding, network traffic management, and resource distribution
across global servers – a symphony orchestrated primarily by Amazon Web Services (AWS).
3. Cloud
As the demand for content surged, Netflix adopted a cloud migration strategy and moved its IT infrastructure to the public cloud. Operating on both Amazon Web Services and Open Connect (Netflix's custom CDN), these cloud services work together to process and deliver content efficiently to end users.
4. CDN (Content Delivery Network): Minimizing Latency and Maximizing Performance
A crucial player in Netflix’s architecture, the CDN is a globally distributed network of servers.
When you hit the play button, the video is streamed from the nearest server, significantly
reducing response time.
Key CDN Characteristics:
Content Replication: Videos are replicated in multiple locations, ensuring proximity
to users and minimal data hops.
Caching Efficiency: CDN machines leverage caching to serve videos primarily from
memory.
Server Diversity: Less popular videos reach users through servers in various data
centres.
5. Open Connect: Netflix's Custom Content Delivery Network
Open Connect, Netflix's in-house content delivery network, takes centre stage in storing and delivering movies and TV shows globally, while Netflix's data-driven personalized recommendations guarantee a distinctive and captivating viewing experience for each user.
Netflix Backend Architecture
Behind the seamless streaming experience that defines Netflix lies a robust backend
architecture, orchestrating everything from content processing to global distribution.
Netflix Backend Design Decoded
1. ELB and Load Balancing:
Tier 1: The journey begins with the AWS Elastic Load Balancer (ELB), employing a
two-tier architecture for load balancing across different zones. DNS-based round-robin
scheduling ensures an even distribution of requests.
Tier 2: An array of load-balancing instances in the second tier further balances the
load, employing round-robin load balancing within the same zone.
2. API Gateway with ZUUL:
The ELB seamlessly passes the baton to the API gateway, where Netflix utilizes ZUUL. On
AWS EC2 instances, ZUUL is the gatekeeper for dynamic routing, monitoring, and security.
Its routing capabilities are based on query parameters, URL, and path, ensuring efficient
request handling.
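To illustrate the routing idea only (Zuul itself is a Java library, and this is not its API), the sketch below shows how a gateway can pick a backend microservice from the request path; the service map is hypothetical.

```python
# A conceptual sketch (in Python, not Zuul itself) of how an API gateway can
# route by URL path; the service map is hypothetical.
ROUTES = {
    "/api/videos":        "http://videos-service.internal",
    "/api/subscriptions": "http://subscriptions-service.internal",
    "/api/search":        "http://search-service.internal",
}

def route(path: str) -> str:
    """Pick the backend microservice whose prefix matches the request path."""
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend
    return "http://default-service.internal"

print(route("/api/videos/123/metadata"))   # -> http://videos-service.internal
```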
3. Microservices Architecture:
The Microservices architecture is the cornerstone of Netflix’s backend, empowering
individual services to operate independently. This approach boosts scalability, flexibility, and
fault isolation.
4. Hystrix for Resilience:
Addressing dependencies and potential failures, Netflix employs Hystrix, a powerful library
isolating microservices. It minimizes failures by isolating access points between services,
ensuring fail-fast mechanisms, real-time monitoring, and rapid recovery.
5. Stream Processing Pipeline:
User activities and historical data embark on a journey through the stream processing pipeline.
Transforming into a tailored viewing experience, this data is the backbone for Netflix’s
personalized movie recommendations, ensuring a unique and engaging cinematic journey for
every user.
6. Big Data Processing Tools:
Netflix leverages the prowess of big data processing tools such as AWS, Hadoop, and
Cassandra. These tools dive deep into the vast pool of user data, extracting valuable insights
that contribute to enhancing the overall streaming experience.
Navigating Complexity with Hystrix
While Netflix’s backend architecture is a marvel, it does not escape the challenges of
distributed systems, where server interdependencies can introduce latency and potential single
points of failure. Enter Hystrix – a guardian against cascading failures.
This library ensures fail-fast mechanisms, rapid recovery, real-time monitoring, and
operational control, mitigating the impact of dependencies in a complex distributed system.
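The fail-fast idea behind Hystrix can be illustrated with a toy circuit breaker. Hystrix itself is a Java library, so the Python sketch below is only a conceptual analogue; the thresholds and the commented usage names are arbitrary.

```python
# A toy circuit breaker illustrating the fail-fast idea behind Hystrix
# (Hystrix itself is a Java library; thresholds here are arbitrary).
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None          # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                return fallback()      # fail fast: do not touch the sick dependency
            self.opened_at = None      # timeout elapsed: probe the dependency again
            self.failures = 0
        try:
            result = func()
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()   # open the circuit
            return fallback()

# breaker = CircuitBreaker()
# breaker.call(lambda: fetch_recommendations(user_id), lambda: DEFAULT_ROW)  # hypothetical names
```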
Netflix Microservices Architecture
In the intricate dance of Netflix’s backend architecture, microservices emerge as the unsung
heroes, orchestrating a symphony of seamless streaming experiences. Let’s delve into how
Netflix harnesses the power of microservices and the critical role stateless services play in this
technological marvel.
Netflix’s Microservices Odyssey
Netflix’s adoption of microservices in its backend architecture marks a pivotal shift, enabling
nimble deployments and granular control over the performance of each service. This
architectural choice aligns perfectly with the dynamic nature of content streaming, allowing
for swift adaptations and enhancements.
Faster Deployments and Isolation:
One of the core benefits of embracing microservices is the agility it brings to the deployment
process. Any modification or update to a specific service can be executed swiftly without
disrupting the entire system. This accelerates development cycles and facilitates seamless
integration of new features and improvements.
In the realm of distributed systems, the ability to isolate issues quickly is paramount. With
microservices, the impact of a glitch or a performance hiccup in one service can be confined,
preventing it from cascading across the entire system. This isolation ensures that users
experience minimal disruptions even in the face of potential challenges.
Types of Services: Critical and Stateless
Netflix’s microservices ecosystem is categorized into two main types based on functionality –
Critical Services and Stateless Services.
1. Critical Services: Ensuring Continuity
Definition: Critical services are those frequently interacted with by users. These services are
deliberately kept independent of others, ensuring that even in the event of a fail-over, users can
seamlessly perform essential operations.
Role in Netflix’s Architecture: Critical services act as the backbone of user interactions,
providing a reliable foundation for users to engage with the platform. Their independence
guarantees that basic operations remain unaffected, offering users a consistent experience.
2. Stateless Services: Sustaining High Availability
Definition: Stateless services serve API requests to clients and are designed to continue
working seamlessly with other instances, even if a server experiences a failure. This design
prioritizes high availability and uninterrupted service.
Role in Netflix’s Architecture: Stateless services are the workhorses handling API requests,
ensuring that user interactions proceed smoothly. Their deployment strategy, unaffected by
individual server failures, guarantees a consistently high level of service availability.
REST APIs: Bridging the Gap with Clients
In the microservices landscape, REST APIs are pivotal as the primary means of interaction
between services and clients. Netflix leverages the simplicity and efficiency of REST APIs to
facilitate seamless communication, ensuring a responsive and dynamic user experience.
How Does Data Processing Unfold in the Netflix App?
When you click that enticing play button on Netflix, a complex ballet of data processing
begins, ensuring your streaming experience is nothing short of seamless. In this segment, we
unravel the intricacies of Netflix’s evolution pipeline, focusing on the role of Kafka and
Apache Chukwa in handling massive data volumes with astonishing efficiency.
The Netflix Data Ingestion Odyssey
Netflix boasts an impressive data processing pipeline, efficiently managing an astronomical
amount of data with every video click. This involves the use of two key players – Kafka and
Apache Chukwa – working in tandem to ingest, process, and route vast data events.
1. Kafka: The Data Mover and Shaker
Definition: Kafka serves as the backbone for moving data from one point to another within
Netflix system design. It efficiently handles the colossal volume of data events generated
during user interactions.
Role in Netflix’s Architecture:
Ingestion Magnitude: Netflix processes a staggering 500 billion data events daily,
consuming a mind-boggling 1.3 petabytes of data and hitting a peak of 8 million
events per second during prime hours.
Data Types: These events range from error logs and User Interface activities to
performance metrics, video viewing activities, and diagnostic events.
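As a hedged sketch of publishing one such event, the snippet below uses the kafka-python client; the broker address, topic name and event fields are placeholders, not Netflix's actual schema.

```python
# A hedged sketch of publishing a viewing event to Kafka with kafka-python;
# broker address, topic, and event fields are placeholders.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker.internal:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

producer.send("viewing-events", {
    "user_id": "u-123",
    "title_id": "t-456",
    "event": "play_started",
    "position_seconds": 0,
})
producer.flush()   # make sure the event actually leaves the client
```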
2. Apache Chukwa: The Data Collector and Analyzer
Definition: Apache Chukwa is an open-source data collection system that seamlessly
integrates with Netflix’s architecture. It collects and analyzes logs and events from different
parts of the system.
Key Features:
Built on Robust Frameworks: Chukwa leverages the scalability and robustness of
HDFS (Hadoop Distributed File System) and the MapReduce framework.
Monitoring and Analysis: Chukwa provides a toolkit for powerful and flexible
monitoring and analysis of collected data.
Event Storage: Events collected by Chukwa are written in the Hadoop file sequence
format, stored in S3.
Evolution Pipeline: From Kafka to Hadoop and Beyond
The evolution pipeline at Netflix involves the orchestrated flow of data from Kafka to Apache
Chukwa and eventually to Hadoop for further processing.
1. Kafka to Chukwa: Data flows seamlessly from Kafka to Chukwa, where it’s
collected, monitored, and analyzed.
2. Chukwa to Hadoop: Events are then written to Hadoop file sequence format, residing
in the scalable and distributed data storage of S3.
3. Batch Processing: The Big Data team takes charge of processing these stored Hadoop
files through batch processing at hourly or daily intervals.
Real-Time Processing: The Kafka Advantage
To handle online events in real-time, Chukwa feeds traffic to Kafka, serving as the main gate
in Netflix’s data processing. Kafka efficiently moves data to various sinks like S3,
Elasticsearch, and secondary Kafka, ensuring real-time responsiveness.
Routing Mechanism: The Apache Samza framework orchestrates the routing of messages
within Kafka, ensuring smooth transitions between various destinations.
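The routing idea can be sketched with a plain Kafka consumer. Netflix uses Apache Samza for this job, so the Python loop below is only a conceptual stand-in; the topic, broker address and sink functions are placeholders.

```python
# A conceptual sketch (Netflix uses Apache Samza for this) of consuming events
# from Kafka and routing them to different sinks; names are placeholders.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "viewing-events",
    bootstrap_servers="broker.internal:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

def send_to_s3(event): ...              # stand-ins for the real sink writers
def send_to_elasticsearch(event): ...

for message in consumer:
    event = message.value
    if event.get("event") == "error":
        send_to_elasticsearch(event)    # searchable for real-time debugging
    else:
        send_to_s3(event)               # archived for batch processing
```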
4. Explain in detail the Google Cloud model and its aspects, with relevant applications.
What is Google Cloud Platform (GCP)?
GCP is Google’s suite of public cloud computing tools and services, including well-known
data analytics services like Google BigQuery and Looker Studio.
Powered by Google’s global network of data centers, GCP runs on the same infrastructure as
Gmail, Google Drive, and Google Docs.
Google originally opened its infrastructure to business users in 2008 via a public cloud. Since
then, its tool suite of cloud services has been expanded rapidly, with Cloud AI being among
the most recent additions. Other services include computing resources, networking, data
storage, IoT, security, app deployment, and management tools.
GCP has an especially strong focus on data analytics, machine learning, and artificial
intelligence, making it a crucial tool to learn for data analysts and consultants.
Google Cloud Platform (GCP) vs. Google Cloud
People sometimes mix up GCP and Google Cloud by using the terms interchangeably, but
really, GCP is a part of Google Cloud.
Google Cloud refers to all of Google’s cloud services. These also include Google Workspace
(formerly known as G-Suite or Google Apps) and enterprise versions of Android and Chrome
OS. Google Cloud also encompasses Google apps like Gmail and Google Docs.
GCP, on the other hand, only refers to cloud services covered by the GCP pricing models,
such as App Engine, Google BigQuery, and Cloud Console, which we will explore below.
Pros and Cons of GCP
Like all services, GCP has some advantages and disadvantages for its users. Let’s first look at
the advantages.
GCP Advantages
Wide range of cloud computing services
GCP offers an especially wide range of cloud computing services for businesses and end
users. These include computing power, networking, data storage, data analytics, machine
learning, artificial intelligence, and even app deployment and API integration. All this makes
GCP a good solution for businesses with diverse or rapidly changing cloud requirements.
Global reach of network infrastructure
Businesses that use GCP have access to Google’s robust and globally distributed network
infrastructure. This allows the implementation of multi-region redundancy or using especially
cheap locations for hosting the main workload.
Robust security
Google Cloud Platform provides robust security options, including IAM (Identity and Access
Management), KMS (Key Management Service), and the SCC (Security Command Center).
This grants businesses great cybersecurity measures while hosting their data remotely.
Strong focus on innovation
Google is known for its strong pioneering spirit and focus on implementing technological
advancements fast within its product suite. GCP customers are therefore likely to enjoy new
technologies like artificial intelligence integration for analyzing their data.
GCP Disadvantages
Complex pricing model
Contrary to other cloud providers, GCP has relatively complex pricing models. This can
make it difficult for businesses to forecast and manage their expenses for cloud computing.
Limited support
GCP also provides a relatively limited and hard-to-access support team that might not
respond immediately to requests. This can negatively impact productivity and data
availability if there is an ongoing issue with the cloud infrastructure.
Proprietary platform
It’s also important to note that GCP is a proprietary platform. This can make it harder to
migrate data and deploy applications if a business wants to leave Google’s services.
Furthermore, GCP can be more expensive than open-source solutions or cloud services from
smaller providers.
Use Cases of GCP
Due to its diverse suite of cloud services and tools, GCP has many use cases for businesses of
all sizes. Here are some common ways a business might utilize GCP:
Data storage: With Google BigQuery, businesses can use an enterprise-level data warehouse
in GCP. In addition, Cloud SQL provides a database-as-a-service model for MySQL,
PostgreSQL and Microsoft SQL Server databases. Cloud BigTable can be used for NoSQL
databases and Cloud Storage offers options for unstructured data and large files like images.
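A small sketch of this data-storage use case with Google's Python client libraries is shown below; the bucket, file, project and dataset names are placeholders.

```python
# A small sketch of the data-storage use case with Google's Python clients;
# bucket, file, and dataset names are placeholders.
from google.cloud import bigquery, storage

# Unstructured data: drop a large file into Cloud Storage.
bucket = storage.Client().bucket("example-media-bucket")
bucket.blob("raw/clickstream-2024-01-01.json").upload_from_filename("clickstream.json")

# Structured data: query the warehouse with BigQuery.
bq = bigquery.Client()
rows = bq.query(
    "SELECT country, COUNT(*) AS visits "
    "FROM `example-project.analytics.pageviews` "
    "GROUP BY country ORDER BY visits DESC LIMIT 5"
).result()
for row in rows:
    print(row.country, row.visits)
```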
Business intelligence: GCP’s integrated BI tool Looker Studio offers swift data visualization
and reporting directly on the platform. This lets data analysts and consultants gain quick
insights and create shareable visuals for presentations and consulting calls.
Machine learning: GCP also offers services to deploy machine learning models like Cloud
AutoML and Cloud Machine Learning Engine. Businesses can use these tools to train,
validate and deploy their models directly in the cloud, automating and improving their
business intelligence processes.
IoT management: GCP is ideal for managing a company-wide IoT (Internet of Things)
network. Services for IoT device connection and management like Cloud IoT Core make it
easy to set up and supervise various IoT devices.
App deployment: GCP can also be used to deploy applications developed with the Java,
Python, Go, Ruby, PHP or C# programming languages by utilizing the service App Engine.
This makes it easy for businesses to host their applications without upfront infrastructure
setup and allows them to iterate swiftly during the development process.
API development: GCP’s integrated tools Apigee API Platform and Developer Portal make
it possible to use GCP as a base for developing and hosting APIs.
Services Offered by GCP
Google Cloud Platform offers over 100 cloud-based tools, services, and infrastructure
elements, which can be classified as SaaS (Software-as-a-Service), PaaS (Platform-as-a-
Service), or IaaS (Infrastructure-as-a-Service) products.
Computing services
Let’s start with GCP's core computational resources, which are designed to cater to different
development and deployment needs. These services are foundational to any application or
system you might want to build on the cloud, and are worth knowing:
App Engine: To deploy applications developed in Java, Go, PHP, C#, Ruby, Python, or
Node.js.
Compute Engine: To run Windows or Linux virtual machines.
Google Kubernetes Engine (GKE): To run containers based on Kubernetes.
Cloud Functions: To run event-driven code in Java, Go, Python, or Node.js.
Data storage services
GCP also provides robust solutions that support both SQL and NoSQL options. This ensures
scalable and flexible database management for modern applications:
Cloud Storage: Object storage for unstructured data and files of all kinds.
Cloud SQL: Cloud-based database service for MySQL, PostgreSQL, or Microsoft SQL
Server databases.
Cloud Bigtable: Cloud-based database service for NoSQL databases.
Cloud Spanner: Database service for relational databases.
Cloud Datastore: NoSQL databases for web and mobile applications.
Firestore: Document database for building mobile, web, and IoT apps.
Data analytics services
GCP's powerful tools process, analyze, and visualize large datasets, enabling better business
insights. Let’s take a look at the suite of tools that are essential for companies looking to
leverage data in their decision-making:
BigQuery: Cloud-based enterprise data warehouse for business intelligence.
Cloud Dataflow: Service for stream and batch processing.
Cloud Data Fusion: ETL service for setting up data pipelines.
Dataproc: To run Apache Hadoop and Apache Spark jobs.
Cloud Composer: Workflow orchestration service based on Apache Airflow.
Cloud Datalab: Jupyter Notebook service for data exploration, data analysis, data
visualization, and machine learning.
Cloud Dataprep: To visually explore, clean, and prepare data.
Looker Studio/Looker: Business intelligence tool to create reports and data visualizations
like charts and tables.
Artificial intelligence services
Artificial intelligence is, of course, a big topic of conversation, and GCP has state-of-the-art
offerings that enable businesses to implement cutting-edge AI and machine learning
capabilities. These tools are transforming how businesses interact with data, gain insights,
and automate processes:
Vertex AI: Cloud-based platform for ML models and generative AI.
AutoML: Custom machine learning model training and development.
Dialogflow: Conversational AI with virtual agents.
Vision AI: Models to extract insights from images, videos, and documents.
Translation AI: Models for language detection and automated translation.
Document AI: AI for document processing and data capture.
Recommendations AI: AI for automated product recommendations.
Natural Language AI: Models for sentiment analysis and classification of unstructured text.
Text-to-Speech/Speech-to-Text: AI for speech recognition and transcription (available in
125 languages) as well as for speech synthesis (available in 40 languages).
Networking services
Let’s now talk about networking; GCP offers services that ensure secure, reliable, and
scalable networking infrastructure.
VPC: Virtual private cloud for building networks of cloud resources.
Cloud CDN: Cloud-based content delivery network.
Cloud DNS: DNS hosting service running on Google’s infrastructure.
Cloud Armor: Cloud-hosted web application firewall.
Cloud Load Balancing: Managed service for load balancing network traffic.
API services
GCP provides comprehensive tools to design, deploy, and scale APIs. These tools are a must-know for developers who need to integrate with third-party systems:
Apigee API Platform: To design, deploy, and scale APIs in a cloud environment.
API Monetization: To create revenue models, reports, payment gateways, and developer
portal integrations for APIs.
Developer Portal: Service for developers to publish and manage APIs.
API Analytics: To monitor and measure API performance.
Management, security & identity tools
Last but not least on our list are the tools related to security:
Google Cloud Console: Web interface to manage GCP resources.
Cloud Shell: Browser-based shell command line interface to manage GCP.
Operations Suite (formerly Stackdriver): Service for monitoring, logging, tracing, and diagnostics of applications running on GCP.
Cloud IAM: Identity & Access Management (IAM) service for policies based on role-based
access control.
Cloud Resource Manager: To manage resources by project, folder, and organization based
on a hierarchy.
Cloud Security Command Center: Platform for security and data risk.
Cloud Key Management: Cloud-hosted key management service.