Robert Hryniewicz
Data Evangelist
@RobHryniewicz
Hands-on Intro to Spark & Zeppelin
Crash Course
© Hortonworks Inc. 2011–2016. All Rights Reserved
The “Big Data” Problem
Problem
→ A single machine cannot process or even store all the data!
Solution
→ Distribute data over large clusters
Difficulty
→ How to split work across machines?
→ Moving data over network is expensive
→ Must consider data & network locality
→ How to deal with failures?
→ How to deal with slow nodes?
Spark Background
Access Rates
At least an order of magnitude difference between memory and hard drive / network speed
[Figure: memory access is FAST; hard drive and network access are slow]
What is Spark?
→ Apache Open Source Project - originally developed at AMPLab (University of California, Berkeley)
→ Data Processing Engine - focused on in-memory distributed computing use-cases
→ API - Scala, Python, Java and R
Spark Ecosystem
[Figure: Spark SQL, Spark Streaming, MLlib and GraphX built on top of Spark Core]
Why Spark?
→ Elegant Developer APIs
– Single environment for data munging and Machine Learning (ML)
→ In-memory computation model – Fast!
– Effective for iterative computations and ML
→ Machine Learning
– Implementation of distributed ML algorithms
– Pipeline API (Spark ML)
History of Hadoop & Spark
Apache Spark Basics
Spark Context
What is it?
→ Main entry point for Spark functionality
→ Represents a connection to a Spark cluster
→ Represented as sc in your code
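In Zeppelin and the Spark shell, sc is normally created for you. The following is a minimal sketch of how a standalone program might create one itself, assuming Spark is on the classpath; the master URL "local[2]", the app name and the helper sumTo are placeholder choices for illustration, not part of the slides.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkContextSketch {
  // Runs a trivial distributed job through the given context
  def sumTo(sc: SparkContext, n: Int): Int =
    sc.parallelize(1 to n).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    // "local[2]" (two local threads) and the app name are placeholder values
    val conf = new SparkConf().setAppName("spark-context-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(s"Sum of 1..100 = ${sumTo(sc, 100)}")
    } finally {
      sc.stop() // release the connection to the cluster
    }
  }
}
```

On a real cluster the master would usually be supplied at submit time rather than hard-coded.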
RDD - Resilient Distributed Dataset
→ Primary abstraction in Spark
– An immutable collection of objects (or records, or elements) that can be operated on in parallel
→ Distributed
– Collection of elements partitioned across nodes in a cluster
– Each RDD is composed of one or more partitions
– User can control the number of partitions
– More partitions => more parallelism
→ Resilient
– Recover from node failures
– An RDD keeps its lineage information -> it can be recreated from parent RDDs
→ Created by starting with a file in Hadoop Distributed File System (HDFS) or an existing collection in the driver program
→ May be persisted in memory for efficient reuse across parallel operations (caching)
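The RDD properties listed above can be tried out with a small sketch like the one below. It assumes a local Spark installation; the data, the partition count of 4 and the helper name evensTimesTen are illustrative choices only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RddSketch {
  // Builds an RDD from a driver-side collection with an explicit partition count.
  // filter and map return new RDDs; the parent RDD is never modified.
  def evensTimesTen(sc: SparkContext, numPartitions: Int): RDD[Int] =
    sc.parallelize(1 to 1000, numPartitions).filter(_ % 2 == 0).map(_ * 10)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val evens = evensTimesTen(sc, 4)
      println(s"partitions: ${evens.partitions.length}")

      // Keep the computed partitions in memory so both actions below reuse them
      evens.cache()
      println(s"count: ${evens.count()}")
      println(s"max: ${evens.max()}")

      // Lineage: the chain of parent RDDs Spark would replay to rebuild a lost partition
      println(evens.toDebugString)
    } finally {
      sc.stop()
    }
  }
}
```

Nothing is computed until the first action (count) runs; the transformations only record lineage.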
RDD – Resilient Distributed Dataset
[Figure: RDD 1 and RDD 2, each split into numbered partitions that are spread across the nodes of a cluster]
Spark SQL
Spark SQL Overview
→ Spark module for structured data processing (e.g. DB tables, JSON files)
→ Three ways to manipulate data:
– DataFrames API
– SQL queries
– Datasets API
→ Same execution engine for all three
→ Spark SQL interfaces provide more information about both the structure of the data and the computation being performed than the basic Spark RDD API
DataFrames
→ Conceptually equivalent to a table in a relational DB or a data frame in R/Python
→ API available in Scala, Java, Python, and R
→ Richer optimizations (often significantly faster than equivalent RDD code)
→ Distributed collection of data organized into named columns
→ Underneath is an RDD
DataFrames
Created from Various Sources
→ DataFrames from Hive:
– Reading and writing Hive tables, including ORC
→ DataFrames from files:
– Built-in: JSON, JDBC, ORC, Parquet, HDFS
– External plug-in: CSV, HBase, Avro
→ DataFrames from existing RDDs
– with the toDF() function
Data is described as a DataFrame with rows, columns and a schema
[Figure: sources such as CSV, Avro, Hive, Spark SQL and text feed a DataFrame (with an RDD underneath) made of rows and named columns Col1, Col2, …, ColN]
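Of the sources above, converting an existing RDD with toDF() is the easiest to try without any external data. The sketch below assumes the SQLContext-style API used in these slides (on Spark 2.x and later, SparkSession is the more usual entry point). The Flight case class and the helper flightsDf are hypothetical names; the three rows are borrowed from the sample output shown later in the deck.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical record type; its field names become the DataFrame column names
case class Flight(origin: String, dest: String, depDelay: Int)

object ToDfSketch {
  def flightsDf(sc: SparkContext, sqlContext: SQLContext): DataFrame = {
    import sqlContext.implicits._ // brings toDF() into scope for RDDs of case classes
    val rdd = sc.parallelize(Seq(
      Flight("IAD", "TPA", 8),
      Flight("IAD", "TPA", 19),
      Flight("IND", "BWI", -4)))
    rdd.toDF() // the schema is inferred from the case class
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("todf-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val df = flightsDf(sc, new SQLContext(sc))
      df.printSchema() // columns and their types
      df.show()        // rows
    } finally {
      sc.stop()
    }
  }
}
```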
SQL Context and Hive Context
SQLContext
→ Entry point into all functionality in Spark SQL
→ All you need is SparkContext
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
HiveContext
→ Superset of functionality provided by basic SQLContext
– Read data from Hive tables
– Access to Hive Functions → UDFs
→ Use when your data resides in Hive
import org.apache.spark.sql.hive.HiveContext
val hc = new HiveContext(sc)
Spark SQL Examples
DataFrame Example
Reading Data From Table
val df = sqlContext.table("flightsTbl")
df.select("Origin", "Dest", "DepDelay").show(5)
+------+----+--------+
|Origin|Dest|DepDelay|
+------+----+--------+
|   IAD| TPA|       8|
|   IAD| TPA|      19|
|   IND| BWI|       8|
|   IND| BWI|      -4|
|   IND| BWI|      34|
+------+----+--------+
DataFrame Example
Using DataFrame API to Filter Data (show delays more than 15 min)
import sqlContext.implicits._ // enables the $"column" syntax
df.select("Origin", "Dest", "DepDelay").filter($"DepDelay" > 15).show(5)
+------+----+--------+
|Origin|Dest|DepDelay|
+------+----+--------+
|   IAD| TPA|      19|
|   IND| BWI|      34|
|   IND| JAX|      25|
|   IND| LAS|      67|
|   IND| MCO|      94|
+------+----+--------+
SQL Example
Using SQL to Query and Filter Data (again, show delays more than 15 min)
// Register Temporary Table
df.registerTempTable("flights")
// Use SQL to Query Dataset (triple quotes let the query span several lines)
sqlContext.sql("""SELECT Origin, Dest, DepDelay
FROM flights
WHERE DepDelay > 15 LIMIT 5""").show()
+------+----+--------+
|Origin|Dest|DepDelay|
+------+----+--------+
|   IAD| TPA|      19|
|   IND| BWI|      34|
|   IND| JAX|      25|
|   IND| LAS|      67|
|   IND| MCO|      94|
+------+----+--------+
RDDs vs. DataFrames
RDD
→ Lower-level API (more control)
→ Lots of existing code & users
→ Compile-time type-safety
DataFrame
→ Higher-level API (faster development)
→ Faster sorting, hashing, and serialization
→ More opportunities for automatic optimization
→ Lower memory pressure
Data Frames are Intuitive
Find average age by department?
dept  name      age
Bio   H Smith   48
CS    A Turing  54
Bio   B Jones   43
Phys  E Witten  61
[Figure: RDD Example alongside Equivalent Data Frame Example]
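The code from the two example panels did not survive in this transcript, so the following is a reconstruction sketch rather than the original slide code. It answers the question for the four rows in the table, once with the RDD API and once with the DataFrame API. The Person case class and the method names are hypothetical, and the SQLContext-style API of these slides is assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.avg

// Hypothetical record type mirroring the dept / name / age table
case class Person(dept: String, name: String, age: Int)

object AvgAgeSketch {
  val people = Seq(
    Person("Bio", "H Smith", 48),
    Person("CS", "A Turing", 54),
    Person("Bio", "B Jones", 43),
    Person("Phys", "E Witten", 61))

  // RDD version: carry a (sum, count) pair per department, then divide
  def avgAgeRdd(sc: SparkContext): Map[String, Double] =
    sc.parallelize(people)
      .map(p => (p.dept, (p.age, 1)))
      .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
      .mapValues { case (sum, count) => sum.toDouble / count }
      .collect()
      .toMap

  // DataFrame version: state what is wanted and let Spark SQL plan the work
  def avgAgeDf(sc: SparkContext, sqlContext: SQLContext): Map[String, Double] = {
    import sqlContext.implicits._
    sc.parallelize(people).toDF()
      .groupBy("dept")
      .agg(avg("age"))
      .collect()
      .map(row => (row.getString(0), row.getDouble(1)))
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("avg-age-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(avgAgeRdd(sc))
      println(avgAgeDf(sc, new SQLContext(sc)))
    } finally {
      sc.stop()
    }
  }
}
```

Both versions yield 45.5 for Bio, 54.0 for CS and 61.0 for Phys. The RDD version has to spell out how the average is computed, while the DataFrame version only names the grouping column and the aggregate.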
Spark SQL Optimizations
→ Spark SQL uses an underlying optimization engine (Catalyst)
– Catalyst can perform intelligent optimization since it understands the schema
→ Spark SQL materializes only the columns that are needed, not all of them (as with an RDD)
Catalyst: Spark SQL optimizer
→ Query or data frame operations modeled as a tree
→ Logical plan created and optimized
→ Various physical plans created; best plan chosen
→ Code generation and execution
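One way to see the stages listed above is to ask a DataFrame for its plans. The sketch below is an illustration under the same SQLContext-style assumptions as the slides; the three rows are taken from the sample flight output, and the helper name delayed is hypothetical. The exact text printed by explain varies between Spark versions, so no output is shown.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object ExplainSketch {
  // Builds a small query: project two columns, then keep delays over 15 minutes
  def delayed(sc: SparkContext, sqlContext: SQLContext): DataFrame = {
    import sqlContext.implicits._
    val flights = sc.parallelize(Seq(
      ("IAD", "TPA", 8),
      ("IND", "BWI", 34),
      ("IND", "LAS", 67))).toDF("Origin", "Dest", "DepDelay")
    flights.select("Origin", "DepDelay").filter($"DepDelay" > 15)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("explain-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // extended = true prints the logical plans (parsed, analyzed, optimized)
      // followed by the physical plan chosen for execution
      delayed(sc, new SQLContext(sc)).explain(true)
    } finally {
      sc.stop()
    }
  }
}
```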
Spark Streaming
Overview
→ Extension of Spark Core API
→ Stream processing of live data streams
– Scalable
– High-throughput
– Fault-tolerant
Window Operations
→ Apply transformations over a sliding window of data, e.g. rolling average
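A rolling average over a sliding window might be sketched as follows with the DStream API. Everything specific here is an assumption for illustration: the 1-second batch interval, the 3-second window sliding every second, the made-up readings, and the use of queueStream as a stand-in for a live source such as a socket or Kafka.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object RollingAverageSketch {
  // Merges two (sum, count) pairs so the average can be derived at the end
  def combine(a: (Double, Long), b: (Double, Long)): (Double, Long) =
    (a._1 + b._1, a._2 + b._2)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rolling-average-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1)) // 1-second batches

    // Stand-in source: each RDD pushed onto the queue is served as one batch
    val queue = new mutable.Queue[RDD[Double]]()
    val readings = ssc.queueStream(queue)

    // Sum and count over the last 3 seconds, re-evaluated every second
    val sumAndCount = readings
      .map(v => (v, 1L))
      .reduceByWindow(combine _, Seconds(3), Seconds(1))
    sumAndCount.map { case (sum, n) => sum / n }.print()

    ssc.start()
    for (i <- 1 to 5) {
      queue.synchronized {
        queue += ssc.sparkContext.parallelize(Seq(i.toDouble, i + 0.5))
      }
      Thread.sleep(1000)
    }
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

Carrying a (sum, count) pair rather than averaging per batch keeps the result correct when batches hold different numbers of readings.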
Apache Zeppelin & HDP Sandbox
Apache Zeppelin – A Modern Web-based Data Science Studio
→ Data exploration and discovery
→ Visualization
→ Deeply integrated with Spark and Hadoop
→ Pluggable interpreters
→ Multiple languages in one notebook: R, Python, Scala
What’s not included with Spark?
→ Resource Management
→ Storage
→ Applications
[Figure: Spark Core Engine with Scala, Java and Python libraries, plus MLlib (Machine learning), Spark SQL* and Spark Streaming*]
HDP Sandbox
What’s included in the Sandbox?
→ Zeppelin
→ Latest Hortonworks Data Platform (HDP)
– Spark
– YARN → Resource Management
– HDFS → Distributed Storage Layer
– And many more components...
[Figure: Scala, Java, Python and R APIs over the Spark Core Engine with Spark SQL, Spark Streaming, MLlib and GraphX, running on YARN and HDFS across nodes 1 to N]
Access patterns enabled by YARN
[Figure: Batch, Interactive and Real-Time applications on YARN: Data Operating System, over HDFS (Hadoop Distributed File System) on nodes 1 to N]
Batch: Needs to happen, but no timeframe limitations
Interactive: Needs to happen at Human time
Real-Time: Needs to happen at Machine Execution time.
Why Spark on YARN?
→ Utilize existing HDP cluster infrastructure
→ Resource management
– share Spark workloads with other workloads like Pig, Hive, etc.
→ Scheduling and queues
[Figure: a client with the Spark Driver, a Spark Application Master in a YARN container, and three Spark Executors, each in its own YARN container running two tasks]
Why HDFS?
Fault Tolerant Distributed Storage
• Divide files into big blocks and distribute 3 copies randomly across the cluster
• Processing Data Locality
• Not Just storage but computation
[Figure: a logical file divided into blocks 1 to 4, with three copies of each block spread across the cluster]
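The block-and-replica arithmetic behind this slide can be worked through in a few lines. This is plain Scala with no Hadoop dependency. The 128 MB block size is an assumption (a common HDFS default; the slide only says "big blocks"), while the replication factor of 3 comes from the slide.

```scala
object HdfsBlockSketch {
  // Assumed block size: 128 MB, a common HDFS default (not stated on the slide)
  val BlockSizeBytes: Long = 128L * 1024 * 1024
  // Replication factor from the slide: 3 copies of every block
  val Replication: Int = 3

  // Number of blocks a file is divided into; the last block may be partial
  def blockCount(fileSizeBytes: Long): Long =
    (fileSizeBytes + BlockSizeBytes - 1) / BlockSizeBytes

  // Total block copies stored across the cluster
  def replicaCount(fileSizeBytes: Long): Long =
    blockCount(fileSizeBytes) * Replication

  // Raw cluster storage consumed once every block is replicated
  def rawStorageBytes(fileSizeBytes: Long): Long =
    fileSizeBytes * Replication

  def main(args: Array[String]): Unit = {
    val oneGb = 1024L * 1024 * 1024
    println(s"blocks: ${blockCount(oneGb)}")           // 8
    println(s"block copies: ${replicaCount(oneGb)}")   // 24
    println(s"raw bytes: ${rawStorageBytes(oneGb)}")   // 3221225472
  }
}
```

With several copies of each block spread across the cluster, a task can usually be scheduled on a node that already holds the data it needs, which is the data-locality point made above.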
There’s more to HDP
Hortonworks Data Platform 2.4.x
GOVERNANCE & INTEGRATION
– Data Lifecycle & Governance: Falcon, Atlas
– Data Workflow: Sqoop, Flume, Kafka, NFS, WebHDFS
DATA ACCESS
– Batch: MapReduce
– Script: Pig
– Search: Solr
– SQL: Hive
– NoSQL: HBase, Accumulo, Phoenix
– Stream: Storm
– In-memory
– Others: ISV Engines
– Tez, Slider
DATA MANAGEMENT
– YARN: Data Operating System
– HDFS: Hadoop Distributed File System (nodes 1 to N)
SECURITY
– Administration, Authentication, Authorization, Auditing, Data Protection: Ranger, Knox, Atlas, HDFS Encryption
OPERATIONS
– Provisioning, Managing, & Monitoring: Ambari, Cloudbreak, Zookeeper
– Scheduling: Oozie
Deployment Choice: Linux, Windows, On-Premise, Cloud
HDP 2.5 TP
View User Sessions
Hortonworks Community Connection
Read access for everyone, join to participate and be recognized
• Full Q&A Platform (like StackOverflow)
• Knowledge Base Articles
• Code Samples and Repositories
Community Engagement
• 7,500+ Registered Users
• 15,000+ Answers
• 20,000+ Technical Assets
One Website!
Participate now at: community.hortonworks.com
Lab Preview
Link to Tutorial with Lab Instructions
http://tinyurl.com/hwx-intro-to-spark
Robert Hryniewicz
@RobHryniewicz
Thanks!


Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ