SlideShare a Scribd company logo
Spark as Service in cloud on Yarn
Hadoop Meetup
bharatb@qubole.com, rgupta@qubole.com
May 15, 2015
Agenda
• Spark on Yarn
• Autoscaling Spark Apps and Cluster
management
• Hive Integration with Spark
• Persistent History Server
Spark on Yarn
Hadoop1
Disadvantages of hadoop1
• Limited to only MR
• Separate Map and Reduce slots =>
underutilization
• JT is heavily loaded for job scheduling,
monitoring and resource allocation.
Yarn Overview
Advantages of Spark on Yarn
• General cluster for running multiple
workflows. AM can have custom logic for
scheduling
• AM can ask for more containers when
required and give up containers when free
• This become even better when yarn clusters
can autoscale
• Get features like spot nodes etc which brings
additional challenges
Advantages of Spark on Yarn
• Qubole Yarn clusters can upscale and downscale
based on load and support spot instances.
Autoscaling Spark Applications
Spark Provisioning: Problems
• Spark Application starts with fixed number of
resources and hold on to them till its alive
• Sometimes its difficult to estimate resources
required by a job since AM is long running
• It becomes limiting spl when Yarn clusters can
autoscale.
Dynamic Provisioning
• Speed up spark commands by using free
resources in yarn cluster and also by releasing
resources when free to RM.
Spark on Yarn basics
Driver
AM
Executor-1 Executor-n
• Cluster Mode: Driver and AM run in same JVM in
a yarn Executor
• Client Mode: Driver and AM run in separate JVM
• Driver and AM talk using Actors to handle both
cases
Driver AM Executor-1 Executor-n
Dynamic Provisioning: Problem
Statement
• Two parts:
– Spark AM has no way to ask for additional
containers and give up free containers
– Automating the process of requesting containers
and releasing containers. Cached data in
containers make this difficult
Dynamic Provisioning: Part1
Dynamic Provisioning: Part1
• Implementation of 2 new apis:
// Request 5 extra executors
sc.requestExecutors(5)
// Kill executors with IDs 1, 15, and 16
sc.killExecutors(Seq("1", "15", "16"))
requestExecutors
AM
Reporter Thread
E1 E2 En
• AM has reporter thread that has count of
number of executors
• Reporter thread was used to restart died
executors
• Driver increments count of number of
executors when sc.requestExecutors is called.
Driver
removeExecutors
• To kill executors, one must precisely tell which
executors need to be killed
• Driver maintains list of all executors and can
be obtained by:
sc.executorStorageStatuses.foreach(x => println(x.blockManagerId.executorId))
• Whats cached in each executor is also
available using:
sc.executorStorageStatuses.foreach(x => println(s”memUsed = ${x.memUsed}
diskUsed=${x.diskUsed)”))
Removing Executors Tradeoffs
• BlockManager in each executor can have
cached RDDs, shuffle and broadcast data
• Killing an executor with shuffle data will
require the stage to rerun.
• To avoid this use external shuffle service
introduced in spark-1.2
Dynamic Provisioning: Part2
Upscaling Heuristics
• Request Executors as many pending tasks
• Request Executors in rounds if there are
pending tasks, doubling number of executors
added in each round bounded by some upper
limit
• Request executors by estimating workload
• Introduced –max-executors as extra param
Downscaling Heuristics
• Remove Executors when they are idle
• Remove Executors if then are idle for X secs
• Cant downscale executors with shuffle data or
broadcast data.
• --num-executors act as minimum executors
Scope
• Kill executors on spot nodes first
• Flag for not killing up executors if they have
shuffle data
Where is the code?
• https://github.com/apache/spark/pull/2840
• https://github.com/apache/spark/pull/2746
Spark Hive Integration
What is involved?
• Spark programs should be able to access hive
metastore
• Other Qubole services can be producers or
consumers of data and metadata(hive, presto,
pig etc)
Using SparkSQL - Command UI
Using SparkSQL - Results
Using SparkSQL - Notebook
• SQL, Python, Scala code can be input
Using SparkSQL - REST api - scala
curl --silent -X POST 
-H "X-AUTH-TOKEN: $AUTH_TOKEN" 
-H "Content-Type: application/json" 
-H "Accept: application/json" 
-d '{
"program" : "val s = new org.apache.spark.sql.hive.HiveContext(sc);
s.sql("show tables").collect.foreach(println)",
"language" : "scala",
"command_type" : "SparkCommand"
}' 
https://api.qubole.net/api/latest/commands
Using SparkSQL - REST api - sql
curl --silent -X POST 
-H "X-AUTH-TOKEN: $AUTH_TOKEN" 
-H "Content-Type: application/json" 
-H "Accept: application/json" 
-d '{
"program" : "show tables",
"language" : "sql",
"command_type" : "SparkCommand"
}' 
https://api.qubole.net/api/latest/commands
NOT RELEASE YET
Using SparkSQL - qds-sdk-py / java
from qds_sdk.commands import SparkCommand
with open(“test_spark.py”) as f:
code = f.read()
cmd = SparkCommand.run(language="python",
label="spark", program=code)
results = cmd.get_results()
Using SparkSQL - Cluster config
Spark UI container info
Basic cluster organization
• DB instance in Qubole account
• ssh tunnel from master to metastore DB
• Metastore server running on master on port
10000
• On master and slave nodes, hive-site.xml:-
hive.metastore.uris=thrift://master_ip:10000
Hosted metastore
Problems
• yarn overhead should be 20% (TPC-H)
• Parquet needs higher PermGen
• cached tables use actual table
• alter table recover partitions not supported
• VPC cluster has slow access to metastore
• SchemaRDD gone - old jars dont run
• hive jars needed on system classpath
Future/Near future
• Run with Qubole’s hive codebase
• Metastore caching
• Benchmarking
Future/Near future
• Persistent History Server
• Fast access to spark AM running in customer
cluster
Thank You
Ad

More Related Content

What's hot (20)

Streaming replication in PostgreSQL
Streaming replication in PostgreSQLStreaming replication in PostgreSQL
Streaming replication in PostgreSQL
Ashnikbiz
 
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
Equnix Business Solutions
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
Mydbops
 
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
PostgreSQL-Consulting
 
On The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterOn The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL Cluster
Srihari Sriraman
 
Patroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easyPatroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easy
Alexander Kukushkin
 
Query Parallelism in PostgreSQL: What's coming next?
Query Parallelism in PostgreSQL: What's coming next?Query Parallelism in PostgreSQL: What's coming next?
Query Parallelism in PostgreSQL: What's coming next?
PGConf APAC
 
What is new in MariaDB 10.6?
What is new in MariaDB 10.6?What is new in MariaDB 10.6?
What is new in MariaDB 10.6?
Mydbops
 
PostgreSQL Write-Ahead Log (Heikki Linnakangas)
PostgreSQL Write-Ahead Log (Heikki Linnakangas) PostgreSQL Write-Ahead Log (Heikki Linnakangas)
PostgreSQL Write-Ahead Log (Heikki Linnakangas)
Ontico
 
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree
Ashnikbiz
 
MySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMySQL Live Migration - Common Scenarios
MySQL Live Migration - Common Scenarios
Mydbops
 
MySQL shell and It's utilities - Praveen GR (Mydbops Team)
MySQL shell and It's utilities - Praveen GR (Mydbops Team)MySQL shell and It's utilities - Praveen GR (Mydbops Team)
MySQL shell and It's utilities - Praveen GR (Mydbops Team)
Mydbops
 
Evolution of MongoDB Replicaset and Its Best Practices
Evolution of MongoDB Replicaset and Its Best PracticesEvolution of MongoDB Replicaset and Its Best Practices
Evolution of MongoDB Replicaset and Its Best Practices
Mydbops
 
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
Equnix Business Solutions
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
PostgreSQL-Consulting
 
Spark performance tuning eng
Spark performance tuning engSpark performance tuning eng
Spark performance tuning eng
haiteam
 
RMOUG2016 - Resource Management (the critical piece of the consolidation puzzle)
RMOUG2016 - Resource Management (the critical piece of the consolidation puzzle)RMOUG2016 - Resource Management (the critical piece of the consolidation puzzle)
RMOUG2016 - Resource Management (the critical piece of the consolidation puzzle)
Kristofferson A
 
Meet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike Steenbergen
Meet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike SteenbergenMeet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike Steenbergen
Meet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike Steenbergen
distributed matters
 
Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4
Denish Patel
 
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 ViennaAutovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
PostgreSQL-Consulting
 
Streaming replication in PostgreSQL
Streaming replication in PostgreSQLStreaming replication in PostgreSQL
Streaming replication in PostgreSQL
Ashnikbiz
 
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
Equnix Business Solutions
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
Mydbops
 
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
PostgreSQL-Consulting
 
On The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterOn The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL Cluster
Srihari Sriraman
 
Patroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easyPatroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easy
Alexander Kukushkin
 
Query Parallelism in PostgreSQL: What's coming next?
Query Parallelism in PostgreSQL: What's coming next?Query Parallelism in PostgreSQL: What's coming next?
Query Parallelism in PostgreSQL: What's coming next?
PGConf APAC
 
What is new in MariaDB 10.6?
What is new in MariaDB 10.6?What is new in MariaDB 10.6?
What is new in MariaDB 10.6?
Mydbops
 
PostgreSQL Write-Ahead Log (Heikki Linnakangas)
PostgreSQL Write-Ahead Log (Heikki Linnakangas) PostgreSQL Write-Ahead Log (Heikki Linnakangas)
PostgreSQL Write-Ahead Log (Heikki Linnakangas)
Ontico
 
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree
Ashnikbiz
 
MySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMySQL Live Migration - Common Scenarios
MySQL Live Migration - Common Scenarios
Mydbops
 
MySQL shell and It's utilities - Praveen GR (Mydbops Team)
MySQL shell and It's utilities - Praveen GR (Mydbops Team)MySQL shell and It's utilities - Praveen GR (Mydbops Team)
MySQL shell and It's utilities - Praveen GR (Mydbops Team)
Mydbops
 
Evolution of MongoDB Replicaset and Its Best Practices
Evolution of MongoDB Replicaset and Its Best PracticesEvolution of MongoDB Replicaset and Its Best Practices
Evolution of MongoDB Replicaset and Its Best Practices
Mydbops
 
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
Equnix Business Solutions
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
PostgreSQL-Consulting
 
Spark performance tuning eng
Spark performance tuning engSpark performance tuning eng
Spark performance tuning eng
haiteam
 
RMOUG2016 - Resource Management (the critical piece of the consolidation puzzle)
RMOUG2016 - Resource Management (the critical piece of the consolidation puzzle)RMOUG2016 - Resource Management (the critical piece of the consolidation puzzle)
RMOUG2016 - Resource Management (the critical piece of the consolidation puzzle)
Kristofferson A
 
Meet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike Steenbergen
Meet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike SteenbergenMeet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike Steenbergen
Meet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike Steenbergen
distributed matters
 
Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4
Denish Patel
 
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 ViennaAutovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
PostgreSQL-Consulting
 

Viewers also liked (20)

Building Machine Learning Pipelines
Building Machine Learning PipelinesBuilding Machine Learning Pipelines
Building Machine Learning Pipelines
InMobi Technology
 
Optimizer Hints
Optimizer HintsOptimizer Hints
Optimizer Hints
InMobi Technology
 
Attacking Web Proxies
Attacking Web ProxiesAttacking Web Proxies
Attacking Web Proxies
InMobi Technology
 
Introduction to cocoa sql mapper
Introduction to cocoa sql mapperIntroduction to cocoa sql mapper
Introduction to cocoa sql mapper
mavelph
 
Cloud Computing (CCSME 2015 talk) - mypapit
Cloud Computing (CCSME 2015 talk) - mypapitCloud Computing (CCSME 2015 talk) - mypapit
Cloud Computing (CCSME 2015 talk) - mypapit
Mohammad Hafiz Cs Mypapit
 
8 Ways a Digital Media Platform is More Powerful than “Marketing”
8 Ways a Digital Media Platform is More Powerful than “Marketing”8 Ways a Digital Media Platform is More Powerful than “Marketing”
8 Ways a Digital Media Platform is More Powerful than “Marketing”
New Rainmaker
 
How Often Should You Post to Facebook and Twitter
How Often Should You Post to Facebook and TwitterHow Often Should You Post to Facebook and Twitter
How Often Should You Post to Facebook and Twitter
Buffer
 
Slides That Rock
Slides That RockSlides That Rock
Slides That Rock
Slides That Rock
 
Why Content Marketing Fails
Why Content Marketing FailsWhy Content Marketing Fails
Why Content Marketing Fails
Rand Fishkin
 
What Makes Great Infographics
What Makes Great InfographicsWhat Makes Great Infographics
What Makes Great Infographics
SlideShare
 
Sea Of Greed
Sea Of GreedSea Of Greed
Sea Of Greed
Sarah Perez
 
Masters of SlideShare
Masters of SlideShareMasters of SlideShare
Masters of SlideShare
Kapost
 
STOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
STOP! VIEW THIS! 10-Step Checklist When Uploading to SlideshareSTOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
STOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
Empowered Presentations
 
You Suck At PowerPoint!
You Suck At PowerPoint!You Suck At PowerPoint!
You Suck At PowerPoint!
Jesse Desjardins - @jessedee
 
10 Ways to Win at SlideShare SEO & Presentation Optimization
10 Ways to Win at SlideShare SEO & Presentation Optimization10 Ways to Win at SlideShare SEO & Presentation Optimization
10 Ways to Win at SlideShare SEO & Presentation Optimization
Oneupweb
 
How To Get More From SlideShare - Super-Simple Tips For Content Marketing
How To Get More From SlideShare - Super-Simple Tips For Content MarketingHow To Get More From SlideShare - Super-Simple Tips For Content Marketing
How To Get More From SlideShare - Super-Simple Tips For Content Marketing
Content Marketing Institute
 
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
SlideShare
 
Tez Data Processing over Yarn
Tez Data Processing over YarnTez Data Processing over Yarn
Tez Data Processing over Yarn
InMobi Technology
 
2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare
SlideShare
 
What to Upload to SlideShare
What to Upload to SlideShareWhat to Upload to SlideShare
What to Upload to SlideShare
SlideShare
 
Building Machine Learning Pipelines
Building Machine Learning PipelinesBuilding Machine Learning Pipelines
Building Machine Learning Pipelines
InMobi Technology
 
Introduction to cocoa sql mapper
Introduction to cocoa sql mapperIntroduction to cocoa sql mapper
Introduction to cocoa sql mapper
mavelph
 
8 Ways a Digital Media Platform is More Powerful than “Marketing”
8 Ways a Digital Media Platform is More Powerful than “Marketing”8 Ways a Digital Media Platform is More Powerful than “Marketing”
8 Ways a Digital Media Platform is More Powerful than “Marketing”
New Rainmaker
 
How Often Should You Post to Facebook and Twitter
How Often Should You Post to Facebook and TwitterHow Often Should You Post to Facebook and Twitter
How Often Should You Post to Facebook and Twitter
Buffer
 
Why Content Marketing Fails
Why Content Marketing FailsWhy Content Marketing Fails
Why Content Marketing Fails
Rand Fishkin
 
What Makes Great Infographics
What Makes Great InfographicsWhat Makes Great Infographics
What Makes Great Infographics
SlideShare
 
Masters of SlideShare
Masters of SlideShareMasters of SlideShare
Masters of SlideShare
Kapost
 
STOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
STOP! VIEW THIS! 10-Step Checklist When Uploading to SlideshareSTOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
STOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
Empowered Presentations
 
10 Ways to Win at SlideShare SEO & Presentation Optimization
10 Ways to Win at SlideShare SEO & Presentation Optimization10 Ways to Win at SlideShare SEO & Presentation Optimization
10 Ways to Win at SlideShare SEO & Presentation Optimization
Oneupweb
 
How To Get More From SlideShare - Super-Simple Tips For Content Marketing
How To Get More From SlideShare - Super-Simple Tips For Content MarketingHow To Get More From SlideShare - Super-Simple Tips For Content Marketing
How To Get More From SlideShare - Super-Simple Tips For Content Marketing
Content Marketing Institute
 
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
SlideShare
 
Tez Data Processing over Yarn
Tez Data Processing over YarnTez Data Processing over Yarn
Tez Data Processing over Yarn
InMobi Technology
 
2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare
SlideShare
 
What to Upload to SlideShare
What to Upload to SlideShareWhat to Upload to SlideShare
What to Upload to SlideShare
SlideShare
 
Ad

Similar to Building Spark as Service in Cloud (20)

Spark on Yarn
Spark on YarnSpark on Yarn
Spark on Yarn
Qubole
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overview
Karan Alang
 
Running Spark on Cloud
Running Spark on CloudRunning Spark on Cloud
Running Spark on Cloud
Qubole
 
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
Databricks
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Databricks
 
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and PluginsMonitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Databricks
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
Stavros Kontopoulos
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
Ahmet Bulut
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
Spark Summit
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
Rafal Kwasny
 
Lessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and MicroservicesLessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and Microservices
Alexis Seigneurin
 
Building production spark streaming applications
Building production spark streaming applicationsBuilding production spark streaming applications
Building production spark streaming applications
Joey Echeverria
 
Introduction to Apache Spark and MLlib
Introduction to Apache Spark and MLlibIntroduction to Apache Spark and MLlib
Introduction to Apache Spark and MLlib
pumaranikar
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
Rahul Jain
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
Evan Chan
 
Getting started with SparkSQL - Desert Code Camp 2016
Getting started with SparkSQL  - Desert Code Camp 2016Getting started with SparkSQL  - Desert Code Camp 2016
Getting started with SparkSQL - Desert Code Camp 2016
clairvoyantllc
 
Introduction to Laravel Framework (5.2)
Introduction to Laravel Framework (5.2)Introduction to Laravel Framework (5.2)
Introduction to Laravel Framework (5.2)
Viral Solani
 
Productionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanProductionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan Chan
Spark Summit
 
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Spark Summit
 
What is New with Apache Spark Performance Monitoring in Spark 3.0
What is New with Apache Spark Performance Monitoring in Spark 3.0What is New with Apache Spark Performance Monitoring in Spark 3.0
What is New with Apache Spark Performance Monitoring in Spark 3.0
Databricks
 
Spark on Yarn
Spark on YarnSpark on Yarn
Spark on Yarn
Qubole
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overview
Karan Alang
 
Running Spark on Cloud
Running Spark on CloudRunning Spark on Cloud
Running Spark on Cloud
Qubole
 
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
Databricks
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Databricks
 
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and PluginsMonitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Databricks
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
Ahmet Bulut
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
Spark Summit
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
Rafal Kwasny
 
Lessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and MicroservicesLessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and Microservices
Alexis Seigneurin
 
Building production spark streaming applications
Building production spark streaming applicationsBuilding production spark streaming applications
Building production spark streaming applications
Joey Echeverria
 
Introduction to Apache Spark and MLlib
Introduction to Apache Spark and MLlibIntroduction to Apache Spark and MLlib
Introduction to Apache Spark and MLlib
pumaranikar
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
Rahul Jain
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
Evan Chan
 
Getting started with SparkSQL - Desert Code Camp 2016
Getting started with SparkSQL  - Desert Code Camp 2016Getting started with SparkSQL  - Desert Code Camp 2016
Getting started with SparkSQL - Desert Code Camp 2016
clairvoyantllc
 
Introduction to Laravel Framework (5.2)
Introduction to Laravel Framework (5.2)Introduction to Laravel Framework (5.2)
Introduction to Laravel Framework (5.2)
Viral Solani
 
Productionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanProductionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan Chan
Spark Summit
 
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Spark Summit
 
What is New with Apache Spark Performance Monitoring in Spark 3.0
What is New with Apache Spark Performance Monitoring in Spark 3.0What is New with Apache Spark Performance Monitoring in Spark 3.0
What is New with Apache Spark Performance Monitoring in Spark 3.0
Databricks
 
Ad

More from InMobi Technology (19)

Ensemble Methods for Algorithmic Trading
Ensemble Methods for Algorithmic TradingEnsemble Methods for Algorithmic Trading
Ensemble Methods for Algorithmic Trading
InMobi Technology
 
Backbone & Graphs
Backbone & GraphsBackbone & Graphs
Backbone & Graphs
InMobi Technology
 
24/7 Monitoring and Alerting of PostgreSQL
24/7 Monitoring and Alerting of PostgreSQL24/7 Monitoring and Alerting of PostgreSQL
24/7 Monitoring and Alerting of PostgreSQL
InMobi Technology
 
Reflective and Stored XSS- Cross Site Scripting
Reflective and Stored XSS- Cross Site ScriptingReflective and Stored XSS- Cross Site Scripting
Reflective and Stored XSS- Cross Site Scripting
InMobi Technology
 
Introduction to Threat Modeling
Introduction to Threat ModelingIntroduction to Threat Modeling
Introduction to Threat Modeling
InMobi Technology
 
HTTP Basics Demo
HTTP Basics DemoHTTP Basics Demo
HTTP Basics Demo
InMobi Technology
 
The Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big DataThe Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big Data
InMobi Technology
 
What's new in Hadoop Yarn- Dec 2014
What's new in Hadoop Yarn- Dec 2014What's new in Hadoop Yarn- Dec 2014
What's new in Hadoop Yarn- Dec 2014
InMobi Technology
 
Security News Bytes Null Dec Meet Bangalore
Security News Bytes Null Dec Meet BangaloreSecurity News Bytes Null Dec Meet Bangalore
Security News Bytes Null Dec Meet Bangalore
InMobi Technology
 
Matriux blue
Matriux blueMatriux blue
Matriux blue
InMobi Technology
 
PCI DSS v3 - Protecting Cardholder data
PCI DSS v3 - Protecting Cardholder dataPCI DSS v3 - Protecting Cardholder data
PCI DSS v3 - Protecting Cardholder data
InMobi Technology
 
Running Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformRunning Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale Platform
InMobi Technology
 
Shodan- That Device Search Engine
Shodan- That Device Search EngineShodan- That Device Search Engine
Shodan- That Device Search Engine
InMobi Technology
 
Big Data BI Simplified
Big Data BI SimplifiedBig Data BI Simplified
Big Data BI Simplified
InMobi Technology
 
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQMassively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
InMobi Technology
 
Building Audience Analytics Platform
Building Audience Analytics PlatformBuilding Audience Analytics Platform
Building Audience Analytics Platform
InMobi Technology
 
Big Data and User Segmentation in Mobile Context
Big Data and User Segmentation in Mobile ContextBig Data and User Segmentation in Mobile Context
Big Data and User Segmentation in Mobile Context
InMobi Technology
 
Freedom Hack Report 2014
Freedom Hack Report 2014Freedom Hack Report 2014
Freedom Hack Report 2014
InMobi Technology
 
Hadoop fundamentals
Hadoop fundamentalsHadoop fundamentals
Hadoop fundamentals
InMobi Technology
 
Ensemble Methods for Algorithmic Trading
Ensemble Methods for Algorithmic TradingEnsemble Methods for Algorithmic Trading
Ensemble Methods for Algorithmic Trading
InMobi Technology
 
24/7 Monitoring and Alerting of PostgreSQL
24/7 Monitoring and Alerting of PostgreSQL24/7 Monitoring and Alerting of PostgreSQL
24/7 Monitoring and Alerting of PostgreSQL
InMobi Technology
 
Reflective and Stored XSS- Cross Site Scripting
Reflective and Stored XSS- Cross Site ScriptingReflective and Stored XSS- Cross Site Scripting
Reflective and Stored XSS- Cross Site Scripting
InMobi Technology
 
Introduction to Threat Modeling
Introduction to Threat ModelingIntroduction to Threat Modeling
Introduction to Threat Modeling
InMobi Technology
 
The Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big DataThe Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big Data
InMobi Technology
 
What's new in Hadoop Yarn- Dec 2014
What's new in Hadoop Yarn- Dec 2014What's new in Hadoop Yarn- Dec 2014
What's new in Hadoop Yarn- Dec 2014
InMobi Technology
 
Security News Bytes Null Dec Meet Bangalore
Security News Bytes Null Dec Meet BangaloreSecurity News Bytes Null Dec Meet Bangalore
Security News Bytes Null Dec Meet Bangalore
InMobi Technology
 
PCI DSS v3 - Protecting Cardholder data
PCI DSS v3 - Protecting Cardholder dataPCI DSS v3 - Protecting Cardholder data
PCI DSS v3 - Protecting Cardholder data
InMobi Technology
 
Running Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformRunning Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale Platform
InMobi Technology
 
Shodan- That Device Search Engine
Shodan- That Device Search EngineShodan- That Device Search Engine
Shodan- That Device Search Engine
InMobi Technology
 
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQMassively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
InMobi Technology
 
Building Audience Analytics Platform
Building Audience Analytics PlatformBuilding Audience Analytics Platform
Building Audience Analytics Platform
InMobi Technology
 
Big Data and User Segmentation in Mobile Context
Big Data and User Segmentation in Mobile ContextBig Data and User Segmentation in Mobile Context
Big Data and User Segmentation in Mobile Context
InMobi Technology
 

Recently uploaded (20)

Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 

Building Spark as Service in Cloud

  • 1. Spark as Service in cloud on Yarn Hadoop Meetup [email protected], [email protected] May 15, 2015
  • 2. Agenda • Spark on Yarn • Autoscaling Spark Apps and Cluster management • Hive Integration with Spark • Persistent History Server
  • 5. Disadvantages of hadoop1 • Limited to only MR • Separate Map and Reduce slots => underutilization • JT is heavily loaded for job scheduling, monitoring and resource allocation.
  • 7. Advantages of Spark on Yarn • General cluster for running multiple workflows. AM can have custom logic for scheduling • AM can ask for more containers when required and give up containers when free • This become even better when yarn clusters can autoscale • Get features like spot nodes etc which brings additional challenges
  • 8. Advantages of Spark on Yarn • Qubole Yarn clusters can upscale and downscale based on load and support spot instances.
  • 10. Spark Provisioning: Problems • Spark Application starts with fixed number of resources and hold on to them till its alive • Sometimes its difficult to estimate resources required by a job since AM is long running • It becomes limiting spl when Yarn clusters can autoscale.
  • 11. Dynamic Provisioning • Speed up spark commands by using free resources in yarn cluster and also by releasing resources when free to RM.
  • 12. Spark on Yarn basics Driver AM Executor-1 Executor-n • Cluster Mode: Driver and AM run in same JVM in a yarn Executor • Client Mode: Driver and AM run in separate JVM • Driver and AM talk using Actors to handle both cases Driver AM Executor-1 Executor-n
  • 13. Dynamic Provisioning: Problem Statement • Two parts: – Spark AM has no way to ask for additional containers and give up free containers – Automating the process of requesting containers and releasing containers. Cached data in containers make this difficult
  • 15. Dynamic Provisioning: Part1 • Implementation of 2 new apis: // Request 5 extra executors sc.requestExecutors(5) // Kill executors with IDs 1, 15, and 16 sc.killExecutors(Seq("1", "15", "16"))
  • 16. requestExecutors AM Reporter Thread E1 E2 En • AM has reporter thread that has count of number of executors • Reporter thread was used to restart died executors • Driver increments count of number of executors when sc.requestExecutors is called. Driver
  • 17. removeExecutors • To kill executors, one must precisely tell which executors need to be killed • Driver maintains list of all executors and can be obtained by: sc.executorStorageStatuses.foreach(x => println(x.blockManagerId.executorId)) • Whats cached in each executor is also available using: sc.executorStorageStatuses.foreach(x => println(s”memUsed = ${x.memUsed} diskUsed=${x.diskUsed)”))
  • 18. Removing Executors Tradeoffs • BlockManager in each executor can have cached RDDs, shuffle and broadcast data • Killing an executor with shuffle data will require the stage to rerun. • To avoid this use external shuffle service introduced in spark-1.2
  • 20. Upscaling Heuristics • Request Executors as many pending tasks • Request Executors in rounds if there are pending tasks, doubling number of executors added in each round bounded by some upper limit • Request executors by estimating workload • Introduced –max-executors as extra param
  • 21. Downscaling Heuristics • Remove Executors when they are idle • Remove Executors if then are idle for X secs • Cant downscale executors with shuffle data or broadcast data. • --num-executors act as minimum executors
  • 22. Scope • Kill executors on spot nodes first • Flag for not killing up executors if they have shuffle data
  • 23. Where is the code? • https://github.com/apache/spark/pull/2840 • https://github.com/apache/spark/pull/2746
  • 25. What is involved? • Spark programs should be able to access hive metastore • Other Qubole services can be producers or consumers of data and metadata(hive, presto, pig etc)
  • 26. Using SparkSQL - Command UI
  • 27. Using SparkSQL - Results
  • 28. Using SparkSQL - Notebook • SQL, Python, Scala code can be input
  • 29. Using SparkSQL - REST api - scala curl --silent -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" -d '{ "program" : "val s = new org.apache.spark.sql.hive.HiveContext(sc); s.sql("show tables").collect.foreach(println)", "language" : "scala", "command_type" : "SparkCommand" }' https://api.qubole.net/api/latest/commands
  • 30. Using SparkSQL - REST api - sql curl --silent -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" -d '{ "program" : "show tables", "language" : "sql", "command_type" : "SparkCommand" }' https://api.qubole.net/api/latest/commands NOT RELEASE YET
  • 31. Using SparkSQL - qds-sdk-py / java from qds_sdk.commands import SparkCommand with open(“test_spark.py”) as f: code = f.read() cmd = SparkCommand.run(language="python", label="spark", program=code) results = cmd.get_results()
  • 32. Using SparkSQL - Cluster config
  • 34. Basic cluster organization • DB instance in Qubole account • ssh tunnel from master to metastore DB • Metastore server running on master on port 10000 • On master and slave nodes, hive-site.xml:- hive.metastore.uris=thrift://master_ip:10000
  • 36. Problems • yarn overhead should be 20% (TPC-H) • Parquet needs higher PermGen • cached tables use actual table • alter table recover partitions not supported • VPC cluster has slow access to metastore • SchemaRDD gone - old jars dont run • hive jars needed on system classpath
  • 37. Future/Near future • Run with Qubole’s hive codebase • Metastore caching • Benchmarking
  • 38. Future/Near future • Persistent History Server • Fast access to spark AM running in customer cluster

Editor's Notes

  • #2: Intro- self, Qubole. In this video, we will see how users setup a Qubole Cluster in 3 simple steps.. Those 3 steps are…
  • #4: Intro- self, Qubole. In this video, we will see how users setup a Qubole Cluster in 3 simple steps.. Those 3 steps are…
  • #10: Intro- self, Qubole. In this video, we will see how users setup a Qubole Cluster in 3 simple steps.. Those 3 steps are…
  • #15: Intro- self, Qubole. In this video, we will see how users setup a Qubole Cluster in 3 simple steps.. Those 3 steps are…
  • #20: Intro- self, Qubole. In this video, we will see how users setup a Qubole Cluster in 3 simple steps.. Those 3 steps are…
  • #25: Intro- self, Qubole. In this video, we will see how users setup a Qubole Cluster in 3 simple steps.. Those 3 steps are…
  • #40: Intro- self, Qubole. In this video, we will see how users setup a Qubole Cluster in 3 simple steps.. Those 3 steps are…