
Spark Core


Spark Core is the foundation of the overall project. It provides distributed task dispatching,
scheduling, and basic I/O functionalities, exposed through an application programming
interface (for Java, Python, Scala, .NET[16] and R) centered on the RDD abstraction (the Java
API is available for other JVM languages, but is also usable for some other non-JVM languages
that can connect to the JVM, such as Julia[17]). This interface mirrors a functional/higher-
order model of programming: a "driver" program invokes parallel operations such as map,
filter or reduce on an RDD by passing a function to Spark, which then schedules the
function's execution in parallel on the cluster.[2] These operations, and additional ones such
as joins, take RDDs as input and produce new RDDs. RDDs are immutable and their
operations are lazy; fault-tolerance is achieved by keeping track of the "lineage" of each RDD
(the sequence of operations that produced it) so that it can be reconstructed in the case of
data loss. RDDs can contain any type of Python, .NET, Java, or Scala objects.
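As a minimal illustrative sketch of this laziness and lineage tracking (assuming an existing SparkContext named sc; the data are made up), the transformations below only record how each RDD is derived, and nothing is computed until an action such as count is called:

val nums = sc.parallelize(1 to 1000000)     // Distribute a local range as an RDD.
val evens = nums.filter(_ % 2 == 0)         // Lazy: only the lineage step is recorded.
val squares = evens.map(n => n.toLong * n)  // Still lazy: another step is appended to the lineage.
val total = squares.count()                 // Action: triggers execution of the whole lineage on the cluster.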

Besides the RDD-oriented functional style of programming, Spark provides two restricted
forms of shared variables: broadcast variables reference read-only data that needs to be
available on all nodes, while accumulators can be used to program reductions in an
imperative style.[2]
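A minimal sketch of both forms, assuming an existing SparkContext named sc; the word list and stop-word set are invented for illustration:

val words = sc.parallelize(Seq("the", "spark", "driver", "the", "cluster"))
val stopWords = sc.broadcast(Set("the", "a", "of"))  // Broadcast variable: read-only data shipped once to each node.
val skipped = sc.longAccumulator("skipped words")    // Accumulator: tasks only add to it; the driver reads the total.

val kept = words.filter { w =>
  val keep = !stopWords.value.contains(w)
  if (!keep) skipped.add(1)                          // Imperative-style reduction via the accumulator.
  keep
}
println(s"kept: ${kept.count()}, skipped: ${skipped.value}") // The action triggers execution and fills the accumulator.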

A typical example of RDD-centric functional programming is the following Scala program that
computes the frequencies of all words occurring in a set of text files and prints the most
common ones. Each map, flatMap (a variant of map) and reduceByKey takes an anonymous
function that performs a simple operation on a single data item (or a pair of items), and
applies its argument to transform an RDD into a new RDD.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("wiki_test")    // Create a Spark configuration object.
val sc = new SparkContext(conf)                       // Create a Spark context.
val data = sc.textFile("/path/to/somedir")            // Read the files under "somedir" into an RDD of lines.
val tokens = data.flatMap(_.split(" "))               // Split each line into tokens (words).
val wordFreq = tokens.map((_, 1)).reduceByKey(_ + _)  // Pair each token with a count of one, then sum the counts per word type.
wordFreq.sortBy(s => -s._2).map(x => (x._2, x._1)).top(10) // Swap word and count to sort by count, then take the top 10 words.
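Since top returns an ordinary local array on the driver, the most common words could then be printed as follows; this is a small illustrative addition rather than part of the original example:

val topWords = wordFreq.sortBy(s => -s._2).map(x => (x._2, x._1)).top(10) // Same expression as above, kept as a local array.
topWords.foreach { case (count, word) => println(s"$word: $count") }      // Print each word with its count.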
