Spark Optimizations & Deployment
Big Data
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
3. Page Rank example
4. Deployment on clusters & clouds
Wide and Narrow transformations
Narrow transformations
• Local computations applied to each partition block
  no communication between processes (or nodes)
  only local dependencies (between parent & child RDDs)
• Examples: map(), filter(), union()
• In case of a sequence of Narrow transformations:
  possible pipelining inside one step
• In case of failure:
  recompute only the damaged partition blocks
  recompute/reload only its parent blocks (Lineage)
Wide and Narrow transformations
Wide transformations
• Computations requiring data from all parent RDD blocks
  many communications between processes (and nodes) (shuffle & sort)
  non‐local dependencies (between parent & child RDDs)
• Examples: groupByKey(), reduceByKey()
• In case of a sequence of transformations:
  no pipelining of transformations
  a wide transformation must be totally achieved before entering the next transformation
• In case of failure:
  recompute the damaged partition blocks
  recompute/reload all blocks of the parent RDDs
Wide and Narrow transformations
Avoiding wide transformations with co‐partitioning
• With identical partitioning of inputs:
wide transformation → narrow transformation
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
• RDD Persistence
• RDD Co‐partitioning
• RDD controlled distribution
• Traffic minimization
• Maintaining parallelism
3. Page Rank example
4. Deployment on clusters & clouds
Optimizations: persistence
Persistence of the RDD
RDDs are stored:
• in the memory space of the Spark Executors
• or on disk (of the node) when memory space of the Executor is full
By default: an old RDD is removed when memory space is required
(Least Recently Used policy)
Spark allows making an RDD « persistent » to avoid recomputing it
(Figure source: Stack Overflow)
Optimizations: persistence
Persistence of the RDD to improve Spark application performance
The Spark application developer has to add instructions to force RDD
storage, and to force RDD forgetting:
myRDD.persist(StorageLevel) // or myRDD.cache()
… // Transformations and Actions
myRDD.unpersist()
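A minimal sketch of this pattern (the RDD name and file path are hypothetical), caching an RDD that is reused by two actions so that it is computed only once:

import org.apache.spark.storage.StorageLevel

// expensive transformation chain, reused by two actions below
val words = sc.textFile("hdfs:///data/corpus.txt")   // hypothetical input path
              .flatMap(_.split("\\s+"))
words.persist(StorageLevel.MEMORY_ONLY)   // or words.cache()

val nbWords    = words.count()             // 1st action: computes and caches 'words'
val nbDistinct = words.distinct().count()  // 2nd action: reuses the cached blocks

words.unpersist()                          // free the Executors' memory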
Optimizations: persistence
Persistence of the RDD to improve fault tolerance
To face short‐term failures, the Spark application developer can force
RDD storage with replication in the local memory/disk of several
Spark Executors:
myRDD.persist(StorageLevel.MEMORY_AND_DISK_SER_2)
… // Transformations and Actions
myRDD.unpersist()
Longer, but secure!
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
• RDD Persistence
• RDD Co‐partitioning
• RDD controlled distribution
• Traffic minimization
• Maintaining parallelism
3. Page Rank example
4. Deployment on clusters & clouds
Optimizations: RDD co‐partitioning
5 main internal properties of an RDD:
• A list of partition blocks
  getPartitions()
• A function for computing each partition block
  compute(…)
  → to compute and re‐compute the RDD (from its parent RDDs) when a failure happens
• A list of dependencies on other RDDs: parent RDDs and transformations to apply
  getDependencies()
Optionally:
• A Partitioner for key‐value RDDs: metadata specifying the RDD partitioning
  partitioner()
  → to control the RDD partitioning, to achieve co‐partitioning…
• A list of nodes where each partition block can be accessed faster due to data locality
  getPreferredLocations(…)
  → to improve data locality with HDFS & YARN…
Optimizations: RDD co‐partitioning
Specify a « partitioner »
Create a new RDD (rdd2):
• partitioned according to a hash‐partitioner strategy
• into 100 partition blocks (spread over the Spark Executors)
Redistributing the RDD (rdd1 → rdd2) is a WIDE (expensive) transformation
• Do not keep the original partitioning (rdd1) in memory / on disk
• Keep the new partitioning (rdd2) in memory / on disk,
  to avoid repeating the WIDE transformation when rdd2 is re‐used
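A plausible reconstruction of the code this slide describes (rdd1 and rdd2 are hypothetical names for an existing key‐value RDD and its re‐partitioned copy):

import org.apache.spark.HashPartitioner

val rdd1 = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))   // hypothetical key-value RDD
val rdd2 = rdd1.partitionBy(new HashPartitioner(100))          // WIDE: full shuffle into 100 partition blocks
               .persist()                                      // keep the co-partitioned result in memory/disk
// later key-based ops on rdd2 (join, reduceByKey…) can reuse this partitioning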
Optimizations: RDD co‐partitioning
Avoid repetitive WIDE transformations on large data sets
(Figure: without an explicit partitioner, each repeated A.join(B) is a Wide op; re‐partitioning A one time with the same partitioner as B, used on the same set of keys, makes the repeated joins Narrow)
• Make ONE Wide op (one time) to avoid many Wide ops
• An explicit partitioning « propagates » to the transformation result
• Replace Wide ops by Narrow ops
• Do not re‐partition an RDD that is used only once!
  (the re‐partition is itself a Wide op, so nothing is gained)
Optimizations: RDD co‐partitioning
Co‐partitioning
• Use the same partitioner
• Avoid repeating the Wide op
(Figure: when only A is re‐partitioned, the repeated A’.join(B) is still Wide on the B side; when B is created with the right partitioning, A → A’ is Wide one time and the repeated A’.join(B) becomes Narrow on both sides)
Optimizations: RDD co‐partitioning
PageRank with partitioner (see further)
val links = …… // previous code
val links1 = links.partitionBy(new HashPartitioner(100)).persist()
• Pb: flatMap{… urlLinks.map(…)} can change the partitioning?!
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
• RDD Persistence
• RDD Co‐partitioning
• RDD controlled distribution
• Traffic minimization
• Maintaining parallelism
3. Page Rank example
4. Deployment on clusters & clouds
Optimization: RDD distribution
Create and distribute an RDD
• By default: level of parallelism set by the nb of partition blocks
of the input RDD
• When the input is an in‐memory collection (list, array…), it needs
to be parallelized:
val theData = List(("a",1), ("b",2), ("c",3),……)
sc.parallelize(theData).theTransformation(…)
Or :
val theData = List(1,2,3,……).par
theData.theTransformation(…)
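Note that sc.parallelize() also accepts an explicit number of partition blocks; a minimal sketch (the data and the value 8 are just example assumptions):

val theData = List(("a",1), ("b",2), ("c",3))
val rdd = sc.parallelize(theData, 8)  // force 8 partition blocks
println(rdd.getNumPartitions)         // prints 8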
Optimization: RDD distribution
Control of the RDD distribution
• Most transformations support an extra parameter to control
the distribution (and the parallelism)
• Example:
Default parallelism:
val theData = List(("a",1), ("b",2), ("c",3),……)
sc.parallelize(theData).reduceByKey((x,y) => x+y)
Tuned parallelism:
val theData = List(("a",1), ("b",2), ("c",3),……)
sc.parallelize(theData).reduceByKey((x,y) => x+y,8)
8 partition blocks imposed for
the result of the reduceByKey
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
• RDD Persistence
• RDD Co‐partitioning
• RDD controlled distribution
• Traffic minimization
• Maintaining parallelism
3. Page Rank example
4. Deployment on clusters & clouds
Optimization: traffic minimization
RDD redistribution: rdd: {(1, 2), (3, 3), (3, 4)}
Scala: rdd.groupByKey() → rdd: {(1, [2]), (3, [3, 4])}
Groups the values associated to the same key
Almost all the input data moves across the network during the shuffle step
Huge traffic in the shuffle step!!
Optimization: traffic minimization
RDD reduction: rdd: {(1, 2), (3, 3), (3, 4)}
Scala: rdd.reduceByKey((x,y) => x+y) → rdd: {(1, 2), (3, 7)}
Reduces the values associated to the same key: partial reductions are
computed locally inside each partition block before the shuffle,
so far less data moves than with groupByKey()
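As an illustration (a minimal sketch on the toy data above), both lines compute the same per‐key sums, but the reduceByKey version shuffles far less data:

val rdd = sc.parallelize(Seq((1, 2), (3, 3), (3, 4)))
val sums1 = rdd.groupByKey().mapValues(_.sum)  // moves every (key, value) pair across the network
val sums2 = rdd.reduceByKey(_ + _)             // pre-reduces locally, then shuffles one value per key per block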
Optimization: traffic minimization
RDD reduction with different input and reduced datatypes:
Scala : rdd.aggregateByKey(init_acc)(
…, // mergeValueAccumulator fct
…, // mergeAccumulators fct
)
Scala : rdd.combineByKey(
…, // createAccumulator fct
…, // mergeValueAccumulator fct
…, // mergeAccumulators fct
)
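A minimal sketch of aggregateByKey with an accumulator type different from the value type (the data is a hypothetical example): the values are Ints, the accumulator is a (sum, count) pair:

val marks = sc.parallelize(Seq(("julie", 12), ("marc", 10), ("julie", 15)))
val sumCount = marks.aggregateByKey((0, 0))(
  (acc, v) => (acc._1 + v, acc._2 + 1),    // mergeValueAccumulator: add one value into an accumulator
  (a, b)   => (a._1 + b._1, a._2 + b._2)   // mergeAccumulators: merge two partial accumulators
)
// sumCount: {("julie", (27, 2)), ("marc", (10, 1))}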
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
• RDD Persistence
• RDD Co‐partitioning
• RDD controlled distribution
• Traffic minimization
• Maintaining parallelism
3. Page Rank example
4. Deployment on clusters & clouds
Optimization: maintaining parallelism
Computing an average value per key in parallel
theMarks: {("julie", 12), ("marc", 10), ("albert", 19), ("julie", 15), ("albert", 15), …}
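A minimal sketch of one way to compute the average per key while keeping the computation fully distributed (the exact code on the original slides may differ): reduce (sum, count) pairs per key, then divide, with no collect() and no groupByKey():

val theMarks = sc.parallelize(Seq(("julie", 12.0), ("marc", 10.0), ("albert", 19.0),
                                  ("julie", 15.0), ("albert", 15.0)))
val avgPerKey = theMarks.map { case (k, v) => (k, (v, 1)) }
                        .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
                        .mapValues { case (sum, count) => sum / count }
// avgPerKey: {("julie", 13.5), ("marc", 10.0), ("albert", 17.0)}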
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
3. Page Rank example
4. Deployment on clusters & clouds
• Task DAG execution
• Spark execution on clusters
• Ex of Spark execution on cloud
PageRank with Spark
PageRank objectives
Compute the probability to arrive at a web page when randomly clicking on web links…
(Figure: a small web graph with url 1 … url 4; url 1 is an important URL, referenced by many pages; url 4’s rank increases because it is referenced by an important URL)
• If a URL is referenced by many other URLs then its rank increases
(because being referenced means that it is important – ex: URL 1)
• If an important URL (like URL 1) references other URLs (like URL 4)
this will increase the destination’s ranking
PageRank with Spark
PageRank principles
• Simplified algorithm:

  $PR(u) = \sum_{v \in B(u)} \frac{PR(v)}{L(v)}$

  $B(u)$: the set containing all pages linking to page u
  $PR(x)$: PageRank of page x
  $L(v)$: the number of outbound links of page v
  $PR(v)/L(v)$: contribution of page v to the rank of page u
• Iterate k times:
  compute the PR of each page
PageRank with Spark
PageRank principles
• The damping factor:
the probability that a user continues to click is a damping factor d (usually d = 0.85)

  $PR(u) = \frac{1-d}{N} + d \cdot \sum_{v \in B(u)} \frac{PR(v)}{L(v)}$

  $N$: number of documents in the collection
  The sum of all PR values is 1
• Variant:

  $PR(u) = (1-d) + d \cdot \sum_{v \in B(u)} \frac{PR(v)}{L(v)}$

  Usually: d = 0.85
PageRank with Spark
PageRank first step in Spark (Scala)
// read text file into Dataset[String] -> RDD1
val lines = spark.read.textFile(args(0)).rdd
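The later steps use a pair RDD links <url, Iterable of outbound urls>. A plausible way to build it from lines, following the standard Spark PageRank example (this assumes each input line holds a « source destination » URL pair separated by whitespace):

val links = lines.map { s =>
    val parts = s.split("\\s+")
    (parts(0), parts(1))       // (source url, destination url)
  }
  .distinct()                  // drop duplicated edges
  .groupByKey()                // url -> Iterable of outbound links
  .cache()                     // reused at every iteration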
PageRank with Spark
PageRank second step in Spark (Scala)
Initialization with the 1/N equi‐probability (N = 4 for the example graph url 1 … url 4):
// links <key, Iter> RDD → ranks <key, 1.0/Npages> RDD
var ranks = links.mapValues(v => 1.0/4.0)
links.mapValues(…) is an immutable RDD
var ranks is a mutable variable:
  var ranks = RDD1
  ranks = RDD2
« ranks » is re‐associated to a new RDD; RDD1 is forgotten…
…and will be removed from memory
Other strategy (initialize every rank to 1.0):
// links <key, Iter> RDD → ranks <key, one> RDD
var ranks = links.mapValues(v => 1.0)

links RDD:                    ranks RDD:
url 4 → [url 3, url 1]        url 4 → 1.0
url 3 → [url 2, url 1]        url 3 → 1.0
url 2 → [url 1]               url 2 → 1.0
url 1 → [url 4]               url 1 → 1.0
PageRank with Spark
PageRank third step in Spark (Scala)
for (i <- 1 to iters) {
  val contribs =
    links.join(ranks)
         .flatMap { case (url, (urlLinks, rank)) =>
           urlLinks.map(dest => (dest, rank / urlLinks.size)) }
  ranks = contribs.reduceByKey(_ + _)
                  .mapValues(0.15 + 0.85 * _)
}

Dataflow of one iteration on the example graph:
links RDD: url 4 → [url 3, url 1]; url 3 → [url 2, url 1]; url 2 → [url 1]; url 1 → [url 4]
ranks RDD: url 4 → 1.0; url 3 → 1.0; url 2 → 1.0; url 1 → 1.0
.join (output links & ranks): url 4 → ([url 3, url 1], 1.0); url 3 → ([url 2, url 1], 1.0); url 2 → ([url 1], 1.0); url 1 → ([url 4], 1.0)
.flatMap (individual input contributions), contribs RDD: url 3 → 0.5; url 1 → 0.5; url 2 → 0.5; url 1 → 0.5; url 1 → 1.0; url 4 → 1.0
.reduceByKey (cumulated input contributions): url 4 → 1.0; url 1 → 2.0; url 3 → 0.5; url 2 → 0.5
.mapValues (with the damping factor), new ranks RDD: url 4 → 1.0; url 1 → 1.849; url 3 → 0.57; url 2 → 0.57
PageRank with Spark
PageRank third step in Spark (Scala)
• Spark & Scala allow a short/compact implementation of the
PageRank algorithm
• Each RDD remains in‐memory from one iteration to the next
PageRank with Spark
PageRank third step in Spark (Scala): optimized with partitioner
val links = …… // previous code
val links1 = links.partitionBy(new HashPartitioner(100)).persist()
• Pb: flatMap{… urlLinks.map(…)} can change the partitioning?!
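A sketch of how the iteration can exploit this partitioner (an assumption based on the standard optimized PageRank example, reusing the names above): links1 is already hash‐partitioned and persisted, so join(ranks) does not re‐shuffle the large links1 RDD at every iteration; the flatMap output, however, loses the partitioner, so contribs still has to be shuffled by reduceByKey:

var ranks = links1.mapValues(v => 1.0)                    // mapValues preserves links1's partitioner
for (i <- 1 to iters) {
  val contribs = links1.join(ranks)                       // links1 side: no re-shuffle (already partitioned & persisted)
    .flatMap { case (url, (urlLinks, rank)) =>            // flatMap may change the keys,
      urlLinks.map(dest => (dest, rank / urlLinks.size)) }// so Spark drops the partitioner here
  ranks = contribs.reduceByKey(_ + _)                     // wide: contribs has to be shuffled again
                  .mapValues(0.15 + 0.85 * _)
}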
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
3. Page Rank example
4. Deployment on clusters & clouds
• Task DAG execution
• Spark execution on clusters
• Ex of Spark execution on cloud
Task DAG execution
• A RDD is a dataset distributed among the Spark compute nodes
• Transformations are lazy operations: saved and executed later
• Actions trigger the execution of the sequence of transformations
(Figure: a DAG of RDDs and transformations; the final Action triggers the execution and produces the Result)
Task DAG execution
The Spark application driver controls the application run
• It creates the Spark context
• It analyses the Spark program
Task DAG execution
Spark job trace: on 10 Spark executors, with a 3GB input file

DAGScheduler: Submitting 24 missing tasks from ShuffleMapStage 0 ...
TaskSchedulerImpl: Adding task set 0.0 with 24 tasks
...
TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 172.20.10.14, executor 0, partition 1, ...)
TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, 172.20.10.11, executor 7, partition 2, ...)
...
TaskSetManager: Starting task 10.0 in stage 0.0 (TID 10, 172.20.10.11, executor 7, partition 10, ...)
TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 18274 ms … (executor 7) (1/24)
TaskSetManager: Starting task 11.0 in stage 0.0 (TID 11, 172.20.10.7, executor 8, partition 11, ...)
TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 18459 ms … (executor 8) (2/24)
...
TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
...

• The first 10 tasks are submitted on the 10 Spark executor processes
• A new task is submitted as soon as a previous one has finished
• Removing the TaskSet marks the end of the task graph execution
Task DAG execution
Execution time as a function of the number of Spark executors
Ex. of Spark application run:
• from 1 up to 15 executors
• with 1 executor per node
(Figure: « Spark pgm run on 1-15 nodes » — Exec Time(s), log scale 32–512, vs Nb of nodes, 1–16; example of a graph of 4 parallel tasks)
Good overall decrease, but plateaus appear!
Probable load balancing problem…
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
3. Page Rank example
4. Deployment on clusters & clouds
• Task DAG execution
• Spark execution on clusters
• Using the Spark cluster manager (standalone mode)
• Using YARN as cluster manager
• Using Mesos as cluster manager
• Ex of Spark execution on cloud
Using the Spark Master as cluster
manager (standalone mode)
spark-submit --master spark://node:port … myApp
Spark cluster configuration:
• Add the list of cluster worker nodes in the Spark Master config.
• Specify the maximum amount of memory per Spark Executor
spark-submit --executor-memory XX …
• Specify the total amount of CPU cores used to process one
Spark application (through all its Spark executors)
spark-submit --total-executor-cores YY …
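Putting these flags together (the host name, class, jar name and sizing values below are hypothetical):

spark-submit --master spark://master-node:7077 \
             --executor-memory 4G \
             --total-executor-cores 32 \
             --class bigdata.MyApp myApp.jar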
Using the Spark Master as cluster
manager (standalone mode)
spark-submit --master spark://node:port … myApp
Spark cluster configuration:
• Default config:
− (only) 1GB per Spark Executor
− Unlimited nb of CPU cores per application execution
− The Spark Master creates one multi‐core Executor on each Worker
node to process each job (invading all the cores!)
• You can limit the total nb of cores per job
• You can concentrate the cores into a few multi‐core Executors
Using the Spark Master as cluster
manager (standalone mode)
spark-submit --master spark://node:port … myApp
Using the Spark Master as cluster
manager (standalone mode)
spark-submit --master spark://node:port … myApp
(Figure: Spark executors running on the cluster worker nodes)
The laptop connection can be turned off: production mode
Using the Spark Master as cluster
manager (standalone mode)
spark-submit --master spark://node:port … myApp
(Figure: HDFS Name Node; cluster worker nodes also acting as Hadoop Data Nodes)
Using the Spark Master as cluster
manager (standalone mode)
spark-submit --master spark://node:port … myApp
Cluster deployment mode:
the Spark app. Driver runs inside the cluster (launched through the Spark Master / Cluster Manager) instead of on the client:
• DAG builder
• DAG scheduler‐optimizer
• Task scheduler
Using the Spark Master as cluster
manager (standalone mode)
spark-submit --master spark://node:port … myApp
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
3. Page Rank example
4. Deployment on clusters & clouds
• Task DAG execution
• Spark execution on clusters
• Using the Spark cluster manager (standalone mode)
• Using YARN as cluster manager
• Using Mesos as cluster manager
• Ex of Spark execution on cloud
Using YARN as cluster manager
(Figure: YARN cluster — HDFS Name Node; cluster worker nodes also acting as Hadoop Data Nodes)
Spark cluster configuration:
• Add an env. variable defining the path to Hadoop conf directory
• Specify the maximum amount of memory per Spark Executor
• Specify the amount of CPU cores used per Spark executor
spark-submit --executor-cores YY …
• Specify the nb of Spark Executors per job: --num-executors
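Putting these options together (the HADOOP_CONF_DIR path, class, jar name and sizing values below are hypothetical; --master yarn selects YARN as the cluster manager):

export HADOOP_CONF_DIR=/etc/hadoop/conf   # env. variable giving the path to the Hadoop conf directory
spark-submit --master yarn \
             --num-executors 10 \
             --executor-cores 4 \
             --executor-memory 4G \
             --class bigdata.MyApp myApp.jar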
Using YARN as cluster manager
Spark cluster configuration:
• By default:
− (only) 1GB/Spark Executor
− (only) 1 CPU core per Spark Executor
− (only) 2 Spark Executors per job
• Usually better with few large Executors (RAM & nb of cores)…
Using YARN as cluster manager
Spark cluster configuration:
• Link the Spark RDD « preferred locations » meta‐data to the HDFS
meta‐data about the « localization of the input file blocks »,
at Spark Context construction:
val sc = new SparkContext(sparkConf,
  InputFormatInfo.computePreferredLocations(
    Seq(new InputFormatInfo(conf,
      classOf[org.apache.hadoop.mapred.TextInputFormat], hdfspath))…
Using YARN as cluster manager
Client deployment mode:
• YARN Resource Manager = Cluster Manager
• App. Master = « Executor » launcher: it launches the Spark executors on the worker nodes
• The Spark Driver runs on the client side:
  • DAG builder
  • DAG scheduler‐optimizer
  • Task scheduler
• HDFS Name Node
Using YARN as cluster manager
Cluster deployment mode:
• YARN Resource Manager = Cluster Manager
• The App. Master hosts the Spark Driver:
  • DAG builder
  • DAG scheduler‐optimizer
  • Task scheduler
• Spark executors run on the worker nodes
• HDFS Name Node
YARN vs standalone Spark Master:
• Usually available on HADOOP/HDFS clusters
• Allows running Spark and other kinds of applications on HDFS data
(better for sharing a Hadoop cluster)
• Advanced application scheduling mechanisms
(multiple queues, managing priorities…)
YARN vs standalone Spark Master:
• Improvement of the data‐computation locality…but is it critical ?
− Spark reads/writes only input/output RDD from Disk/HDFS
− Spark keeps intermediate RDD in‐memory
− With cheap disks: disk‐IO time > network time
Better to deploy many Executors on unloaded nodes ?
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
3. Page Rank example
4. Deployment on clusters & clouds
• Task DAG execution
• Spark execution on clusters
• Using the Spark cluster manager (standalone mode)
• Using YARN as cluster manager
• Using Mesos as cluster manager
• Ex of Spark execution on cloud
Using Mesos as cluster manager
Client deployment mode:
• The Mesos Master is the Cluster Manager
With just Mesos:
• No Application Master
• No Input Data – Executor locality
Using Mesos as cluster manager
Cluster deployment mode:
• The Mesos Master is the Cluster Manager
• The Spark Driver runs inside the cluster:
  • DAG builder
  • DAG scheduler‐optimizer
  • Task scheduler
• HDFS Name Node
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
3. Page Rank example
4. Deployment on clusters & clouds
• Task DAG execution
• Spark execution on clusters
• Ex of Spark execution on cloud
Ex. of Spark execution on cloud
(Figure: a standalone Spark Master deployed on the cloud cluster « MyCluster‐1 »; the Spark app. Driver — DAG builder, DAG scheduler‐optimizer, Task scheduler; an HDFS Name Node)