Amazon EMR Serverless Architecture and Use Cases

Uploaded by

nikki91476

Lesson objectives
In this lesson, you will learn the following:
• The architecture of Amazon EMR Serverless
• Typical use cases for Amazon EMR Serverless
• Key points about Amazon EMR Serverless

How is Amazon EMR Serverless used to architect a cloud solution?

Amazon EMR Serverless automatically provisions, configures, and scales compute
and memory resources required at each stage of your data-processing application.
With Amazon EMR Serverless, your jobs run faster because it includes the
performance-optimized Amazon EMR runtime for Apache Spark, Hive, Presto, and
other technologies. Additionally, Amazon EMR Serverless integrates with EMR Studio
to provide an interactive development experience using notebooks and familiar
open-source tools. Such tools include Spark UI and Tez UI to help you develop,
visualize, and debug your applications.
How does Amazon EMR Serverless work?
As soon as the Amazon EMR application is created, users can start submitting their
Apache Spark jobs. There are multiple ways to submit Apache Spark jobs. For
example, you can use Apache Airflow, Step Functions, EMR Studio notebook, the
AWS CLI, AWS SDK, or custom-built pipelines. Amazon EMR Serverless automatically
provisions workers required for data processing jobs in the Amazon EMR service
account. These interact with resources in your AWS account to run the jobs.
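As a minimal sketch of the AWS SDK route, the following Python builds the arguments for the EMR Serverless StartJobRun operation and, optionally, sends them with boto3. The application ID, role ARN, and S3 path in the usage lines are hypothetical placeholders, and the helper names are illustrative rather than part of any AWS library.

```python
from typing import Any, Dict, List, Optional


def build_start_job_run_request(
    application_id: str,
    execution_role_arn: str,
    entry_point: str,
    arguments: Optional[List[str]] = None,
    spark_submit_parameters: str = "",
) -> Dict[str, Any]:
    """Return keyword arguments for the EMR Serverless StartJobRun call."""
    spark_submit: Dict[str, Any] = {"entryPoint": entry_point}
    if arguments:
        spark_submit["entryPointArguments"] = list(arguments)
    if spark_submit_parameters:
        spark_submit["sparkSubmitParameters"] = spark_submit_parameters
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {"sparkSubmit": spark_submit},
    }


def submit_job(request: Dict[str, Any]) -> str:
    """Send the request with boto3. Needs AWS credentials and a Region."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("emr-serverless")
    return client.start_job_run(**request)["jobRunId"]


if __name__ == "__main__":
    # All identifiers below are placeholders, not real resources.
    request = build_start_job_run_request(
        application_id="00example",
        execution_role_arn="arn:aws:iam::123456789012:role/ExampleJobRole",
        entry_point="s3://example-bucket/scripts/etl.py",
        arguments=["s3://example-bucket/input/", "s3://example-bucket/output/"],
    )
    print(request)
```

Keeping the request builder separate from the network call makes the payload easy to inspect or reuse from an orchestrator such as Apache Airflow or Step Functions.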
What are the core concepts of Amazon EMR Serverless?
With Amazon EMR Serverless, the behind-the-scenes architecture remains similar to
Amazon EMR running on Amazon EC2 and Amazon EMR on EKS. However, the core
concepts you work with shift from nodes to applications, jobs, workers, and
pre-initialized workers.
Application
You can create one or more applications that use open-source analytics frameworks
by specifying the framework that you want to use (for example, Apache Spark or
Apache Hive), the Amazon EMR release version, and the name of your application.
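The three inputs named above (framework, release version, and application name) map onto the EMR Serverless CreateApplication operation. The sketch below assumes boto3 as the SDK; the helper names and the default release label are illustrative choices, not values prescribed by this lesson.

```python
from typing import Any, Dict


def build_create_application_request(
    name: str,
    framework: str = "SPARK",
    release_label: str = "emr-6.6.0",
) -> Dict[str, Any]:
    """Return keyword arguments for the EMR Serverless CreateApplication call."""
    framework = framework.upper()
    if framework not in {"SPARK", "HIVE"}:
        raise ValueError(f"unsupported framework: {framework}")
    return {"name": name, "type": framework, "releaseLabel": release_label}


def create_application(request: Dict[str, Any]) -> str:
    """Send the request with boto3. Needs AWS credentials and a Region."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("emr-serverless")
    return client.create_application(**request)["applicationId"]


if __name__ == "__main__":
    # "my-spark-app" is a placeholder application name.
    print(build_create_application_request("my-spark-app"))
```

The returned application ID is what later job submissions refer to.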
What are typical use cases for Amazon EMR Serverless?
The following six sections describe typical use cases.
Apache Spark ETL jobs
With Amazon EMR Serverless, you can run Apache Spark ETL jobs on an application
with the type parameter set to “SPARK.”
For example:
• Extract: Read CSV data from Amazon S3.
• Transform: Add or remove columns in the dataset.
• Load: Write the updated data back to Amazon S3.
Jobs must be compatible with the Apache Spark version referenced in the Amazon
EMR release version. For example, when you run jobs on an application with
Amazon EMR release 6.6.0, your job must be compatible with Apache Spark 3.2.0.
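The extract, transform, and load steps listed above can be illustrated without a cluster. The sketch below is a plain-Python stand-in that uses the standard csv module instead of Apache Spark, and in-memory strings instead of Amazon S3 objects. The column names (quantity, unit_price, internal_note, total) are hypothetical sample data.

```python
import csv
import io
from typing import Dict, List


def extract(csv_text: str) -> List[Dict[str, str]]:
    """Extract: parse CSV text into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Transform: drop the internal_note column and add a total column."""
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != "internal_note"}
        total = int(row["quantity"]) * float(row["unit_price"])
        new_row["total"] = f"{total:.2f}"
        result.append(new_row)
    return result


def load(rows: List[Dict[str, str]]) -> str:
    """Load: serialize the rows back to CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    source = "order_id,quantity,unit_price,internal_note\n1,2,1.50,check\n2,3,2.00,ok\n"
    print(load(transform(extract(source))))
```

In a real Spark job, each function would typically become a DataFrame read, a column operation, and a DataFrame write against S3 paths.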

Alternatively, you can submit the same Apache Spark ETL job without any code
changes to other deployment options, such as Amazon EMR on Amazon EC2 or
Amazon EMR on EKS. You can submit the job using the AWS Management Console,
AWS CLI, or Amazon EMR APIs. With Amazon EMR on Amazon EC2, you can submit
jobs using the Amazon EMR steps API during or after cluster launch.
Large-scale SQL queries using Hive
Apache Hive on Amazon EMR provides data warehouse-like query capabilities. You
can read, write, and manage petabytes of data using a SQL-like language with
Amazon EMR Serverless or Amazon EMR on Amazon EC2 clusters. Starting with EMR
6.0.0, Amazon EMR Hive supports the Live Long and Process (LLAP) functionality.
LLAP uses persistent daemons with intelligent in-memory caching to improve Hive
query performance.
Amazon EMR 6.1.0 and later support Hive ACID (atomicity, consistency, isolation,
durability) transactions, so Hive conforms to the ACID properties of a database.
With this feature, you can run INSERT, UPDATE, DELETE, and MERGE SQL operations
on Hive-managed tables with data stored in Amazon S3.
Interactive analysis using Jupyter notebooks with EMR Studio
EMR Studio provides a managed interactive analysis environment for Jupyter
notebooks. It can help data scientists and data engineers to develop, visualize, and
debug data engineering and data science applications written in R, Python, Scala, or
PySpark.
With EMR Studio, you can start notebooks in seconds, get onboarded with sample
notebooks, and perform your data exploration. You can collaborate with peers using
built-in real-time collaboration and track changes across notebook versions using
Git repositories. You can also customize your environment by loading custom
kernels and Python libraries from notebooks, or start parameterized notebooks as
part of scheduled workflows using orchestration services like Apache Airflow or
Amazon MWAA.
Ad-hoc analysis using Presto
When using Presto on Amazon EMR on Amazon EC2, you can run interactive queries
on large datasets with minimal setup time. Amazon EMR handles the provisioning,
configuration, and tuning of Hadoop clusters. Presto is included in Amazon EMR
versions 5.0.0 and later.

Presto running on Amazon EMR gives you more flexibility in how you configure and
run queries, including the ability to federate to other data sources if needed. For
example:
• A use case that requires Lightweight Directory Access Protocol (LDAP)
authentication for clients such as the Presto CLI or Java Database
Connectivity (JDBC) and Open Database Connectivity (ODBC) drivers.
• A workflow in which you need to join data between different systems such as
MySQL, Amazon Redshift, Apache Cassandra, and Hive.
Building real-time streaming data pipelines
With Amazon EMR, you can perform fault-tolerant stream processing of live data
streams using the Apache Spark or Apache Flink frameworks. With Apache Spark,
you can run Spark Streaming or Spark Structured Streaming applications.
Structured Streaming is a scalable and fault-tolerant stream processing engine built
on the Spark SQL engine, while Spark Streaming uses the DStream API, powered by
Spark RDDs (Resilient Distributed Datasets), to process streams of data.

Amazon EMR makes it possible for the streaming data pipelines to distribute and
process data across dynamically scalable Amazon EC2 instances and Amazon S3.
You can use Amazon EMR to analyze event-based data for use cases such as
personalization, product discovery, and fraud detection. Amazon EMR also supports
Apache Flink, which lets you run real-time stream processing on high-throughput
data sources.
Running AI/ML workloads on Amazon EMR
Using Amazon EMR, you can pre-process data, train models, and perform prediction
and validation to build accurate ML models. You can analyze data using
open-source ML frameworks such as Apache Spark MLlib, TensorFlow, and Apache MXNet.

Amazon EMR is used for ML use cases in which Spark is already used with a
persistent cluster, or where an end-to-end pipeline already exists and the team has
the skill set and inclination to run a persistent cluster. With a wide range of instance
types, including AWS Graviton processors and Amazon EC2 Spot Instances, Amazon
EMR offers flexibility and cost optimization for running ML workloads.
Amazon EMR also features integrations with Amazon SageMaker, in which a
SageMaker model training job can start from a Spark pipeline in Amazon EMR.

Amazon EMR Studio offers fully managed Jupyter notebooks for visualization, with the
ability to log in through AWS IAM Identity Center (successor to AWS Single Sign-On).

What else should I keep in mind when using Amazon EMR Serverless?
There are multiple aspects to consider when designing workloads to run
on Amazon EMR. The following sections cover considerations for
Amazon EMR Serverless.
Fine-grained autoscaling with no need to guess cluster sizes
Amazon EMR Serverless eliminates the need to right-size clusters for
varying jobs and data sizes. It automatically adds and removes workers at
different stages of your job. With Amazon EMR Serverless, you provide the
minimum and maximum number of concurrent workers for your
application, as well as compute resources and storage for each worker.
Amazon EMR automatically adds and removes workers based on what the
job requires, within your specified limits. It also provisions, configures,
and dynamically scales the compute and memory resources needed at
each stage of your data processing application. You’re charged for
aggregated vCPU, memory, and storage resources used from the time a
worker starts running until it stops, rounded up to the nearest second
with a 1-minute minimum. It’s a cost-effective offering, where you need to
pay only for the compute time and resources that were used.
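The billing rule described above (per-second rounding with a 1-minute minimum, applied to vCPU, memory, and storage) can be worked through as simple arithmetic. The sketch below is illustrative only: the hourly rates are made-up numbers, not AWS prices, and the function and class names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical hourly rates; real prices vary by Region."""
    vcpu_per_hour: float
    memory_gb_per_hour: float
    storage_gb_per_hour: float


def billed_seconds(runtime_seconds: float) -> int:
    """Round up to the nearest second, with a 1-minute minimum."""
    return max(60, math.ceil(runtime_seconds))


def worker_cost(
    runtime_seconds: float,
    vcpus: float,
    memory_gb: float,
    storage_gb: float,
    rates: Rates,
) -> float:
    """Cost of one worker from the time it starts running until it stops."""
    hours = billed_seconds(runtime_seconds) / 3600
    hourly = (
        vcpus * rates.vcpu_per_hour
        + memory_gb * rates.memory_gb_per_hour
        + storage_gb * rates.storage_gb_per_hour
    )
    return hours * hourly


if __name__ == "__main__":
    example_rates = Rates(vcpu_per_hour=1.0, memory_gb_per_hour=0.1, storage_gb_per_hour=0.01)
    # A worker with 4 vCPU, 16 GB memory, and 20 GB storage that runs for 30 minutes.
    print(worker_cost(1800, 4, 16, 20, example_rates))
```

The total for a job would be the sum of this value over all workers that ran.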
Resilience to Availability Zone failures
Different Amazon EMR deployment options
Getting Started with Amazon EMR Serverless
To learn more about getting started with Amazon EMR Serverless by
deploying a sample Spark or Hive workload, see the user guide.
Run Big Data Applications without Managing Servers
To learn more about using Amazon EMR Serverless, see the AWS Blog.
What's next?
In this lesson, you learned the basics of Amazon EMR Serverless
architecture and the use cases the service can be applied to. In the next
lesson, you will learn the basics of Amazon EMR cluster
