Spark’s Role in the Big Data Ecosystem 
Matei Zaharia
An Exciting Year for Spark 
Very fast community growth 
1.0 release in May 
7+ distributors, 20+ apps
Project Activity
                        June 2013   June 2014
total contributors      68          255
companies contributing  17          50
total lines of code     63,000      175,000
Compared to Other Projects
[Figure: two bar charts comparing MapReduce, YARN, HDFS, Storm, and Spark by commits and by lines of code changed; activity in past 6 months]
Spark is now the most active project in the Hadoop ecosystem
Compared to Other Projects 
Spark is one of the top 3 most active projects at Apache
More active than “general” data processing projects like NumPy, matplotlib, scikit-learn
Continuing Growth
[Figure: contributors per month to Spark; source: ohloh.net]
Major new additions
Last Summit 
Last Summit we said we’d focus on two things: 
• Standard libraries 
• Enterprise features 
New libraries: Spark SQL, MLlib (machine learning), 
GraphX (graph processing) 
Enterprise features: security, monitoring, HA
Spark SQL 
Enables loading & querying structured data in Spark 
From Hive:
    c = HiveContext(sc)
    rows = c.sql("select text, year from hivetable")
    rows.filter(lambda r: r.year > 2013).collect()
From JSON:
    c.jsonFile("tweets.json").registerAsTable("tweets")
    c.sql("select text, user.name from tweets")
tweets.json:
    {"text": "hi",
     "user": {
       "name": "matei",
       "id": 123
     }}
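To make the JSON query concrete, here is a plain-Python sketch (no Spark involved) of what `select text, user.name from tweets` would return for the sample tweets.json record. The helpers `get_path` and `select` are hypothetical names introduced only for illustration; Spark SQL itself infers a schema and runs the query across a cluster, which this sketch does not attempt.

```python
import json

# One line of tweets.json, matching the sample record on the slide.
TWEETS_JSON = '{"text": "hi", "user": {"name": "matei", "id": 123}}'


def get_path(record, dotted_path):
    """Follow a dotted path such as 'user.name' through nested dicts (hypothetical helper)."""
    value = record
    for key in dotted_path.split("."):
        value = value[key]
    return value


def select(records, columns):
    """Project each record onto the requested columns, like a SQL SELECT list."""
    return [tuple(get_path(r, c) for c in columns) for r in records]


tweets = [json.loads(line) for line in TWEETS_JSON.splitlines()]
result = select(tweets, ["text", "user.name"])
print(result)  # [('hi', 'matei')]
```

The dotted column name is the interesting part: nested JSON fields are addressed directly in the query, with no flattening step beforehand.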
Spark SQL 
Integrates closely with Spark’s language APIs 
    c.registerFunction("hasSpark", lambda text: "Spark" in text)
    c.sql("select * from tweets where hasSpark(text)")
Uniform interface for data access
[Diagram: data sources (Hive, Parquet, JSON, Cassandra, …) queried from SQL, Python, Scala, and Java]
44 contributors in past year
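As a rough illustration of what registering a Python function for use in SQL buys you, the sketch below applies the same predicate to a few in-memory rows using plain Python. The sample rows are made up, and the list comprehension merely stands in for the `where hasSpark(text)` clause; it is not how Spark evaluates the query.

```python
def has_spark(text):
    """Same predicate as the lambda registered as hasSpark on the slide."""
    return "Spark" in text


# Made-up sample rows, for illustration only.
tweets = [
    {"text": "Spark 1.0 is out"},
    {"text": "hello world"},
    {"text": "Learning Spark SQL"},
]

# Plain-Python equivalent of: select * from tweets where hasSpark(text)
matching = [row for row in tweets if has_spark(row["text"])]
print(len(matching))  # 2
```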
Machine Learning Library (MLlib) 
Standard library of machine learning algorithms 
Now includes 15+ algorithms 
• New in 1.0: decision trees, SVD, PCA, L-BFGS 
• In development: non-negative matrix factorization, LDA, 
Lanczos, multiclass trees, ADMM 
    points = context.sql("select latitude, longitude from tweets")
    model = KMeans.train(points, 10)
40 contributors in past year
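For readers unfamiliar with k-means, the following is a minimal single-machine sketch of Lloyd's algorithm, the basic iteration behind k-means clustering. It is not MLlib's implementation, which is distributed and chooses its initial centers differently. The function names, the sample points, and the choice to start from the first k points are all assumptions made to keep the example small and deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closest(point, centers):
    """Index of the center nearest to point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; starts from the first k points (an assumption, for determinism)."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: group every point with its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[closest(p, centers)].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                dims = len(members[0])
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
                )
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers


# Made-up (latitude, longitude) pairs forming two obvious groups.
points = [(37.0, -122.0), (40.0, -74.0), (37.5, -122.5), (40.5, -74.5)]
centers = kmeans(points, k=2)
print(centers)  # [(37.25, -122.25), (40.25, -74.25)]
```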
Java 8 API 
Enables concise programming in Java similar to 
Scala and Python 
    // sc is a JavaSparkContext
    JavaRDD<String> lines = sc.textFile("data.txt");
    JavaRDD<Integer> lineLengths = lines.map(s -> s.length());
    int totalLength = lineLengths.reduce((a, b) -> a + b);
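Since the slide compares the Java 8 API to Python, here is the same map-then-reduce pipeline as a plain-Python sketch over an in-memory list. No Spark is involved, and the list contents are made up to stand in for the lines of data.txt.

```python
from functools import reduce

# Stand-in for the lines of data.txt (made-up content).
lines = ["spark", "sql", "mllib"]

# Like lines.map(s -> s.length()) in the Java snippet.
line_lengths = map(len, lines)

# Like lineLengths.reduce((a, b) -> a + b).
total_length = reduce(lambda a, b: a + b, line_lengths)
print(total_length)  # 13
```

The Java 8 lambdas line up one-to-one with the Python callables here, which is the conciseness the slide is pointing at.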
What is our vision for Spark?
1. Unified Platform for Big Data Apps
Uniform API for diverse workloads over diverse storage systems and runtimes
[Diagram labels: workloads (Batch, Interactive, Streaming, …); storage systems and runtimes (Hadoop, Cassandra, Mesos, Cloud Providers, …)]
Why a Platform Matters 
Good for developers: one system to learn 
Good for users: take apps anywhere 
Good for distributors: more applications
2. Standard Library for Big Data
Big data apps lack libraries of common algorithms
Spark’s generality + support for multiple languages make it suitable to offer this
[Diagram labels: Python, Scala, Java, R; SQL, ML, graph, …; Core]
Much of future activity will be in these libraries
Databricks & Spark 
At Databricks, we are working to keep Spark 100% 
open source and compatible across vendors 
All our work on Spark is at Apache 
Check out project-specific talks to see what’s next!
Thank You and Enjoy Spark Summit!

Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
