Spark Parallel Processes and Aggregation PDF
Deploying Spark Applications
Dr. Gasan Elkhodari
BUAN6346
Big Data Analytics
1. Spark driver runs on the client
sc.wholeTextFiles("mydir"):
This creates a pair RDD with one element per file:
the key is the file's path, and the value is the
file's entire contents.
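The shape of the result can be sketched without a cluster. The following is a plain-Python stand-in, not Spark itself: whole_text_files and demo are hypothetical helper names, and the file names and contents are made up for illustration. It only shows the (path, contents) pairing that sc.wholeTextFiles produces, one pair per file.

```python
import os
import tempfile


def whole_text_files(directory):
    """Plain-Python stand-in for sc.wholeTextFiles (not the Spark API).

    Returns one (path, contents) pair per regular file in `directory`.
    """
    pairs = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as f:
                # The whole file becomes a single value, not one value per line.
                pairs.append((path, f.read()))
    return pairs


def demo():
    """Write two small sample files and return (file name, contents) pairs."""
    samples = [
        ("a.txt", "first file\nsecond line\n"),
        ("b.txt", "another file\n"),
    ]
    with tempfile.TemporaryDirectory() as mydir:
        for name, text in samples:
            with open(os.path.join(mydir, name), "w", encoding="utf-8") as f:
                f.write(text)
        # Keep only the base name so the result does not depend on the temp path.
        return [(os.path.basename(p), c) for p, c in whole_text_files(mydir)]
```

Each file yields exactly one pair, which is why this approach suits directories of many small files better than a few very large ones.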
Onto that cluster we place an HDFS file
that consists of three blocks.
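As a rough illustration of how a file ends up as three blocks, the sketch below splits a file size into block sizes. It assumes a 128 MB block size (the usual default in Hadoop 2 and later) and a hypothetical 300 MB file; split_into_blocks is an illustrative helper, not an HDFS API.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return the size in bytes of each block for a file of `file_size` bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = []
    remaining = file_size
    while remaining > 0:
        # Every block is full-sized except possibly the last one.
        size = min(block_size, remaining)
        blocks.append(size)
        remaining -= size
    return blocks


# Hypothetical 300 MB file: two full blocks plus one partial block.
block_sizes_mb = [size // MB for size in split_into_blocks(300 * MB)]
```

Here block_sizes_mb is [128, 128, 44]: only the last block is smaller than the configured block size.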