APACHE SPARK
PREPARING FOR THE NEXT WAVE OF REACTIVE BIG DATA
74% Developers
8% Data Scientists
7% C-level execs
TOP 3 LANGUAGES USED WITH SPARK
88% Scala
44% Java
22% Python
31% are evaluating Spark now
13% are running Spark in production
82% of users chose Spark to replace MapReduce
78% of users need faster processing of larger data sets
62% of users load data into Spark with Hadoop DFS
54% of users run Spark standalone
67% of users need Spark for event stream processing
20% are planning to use Spark in 2015
TOP 3 INDUSTRIES
RESPONDENTS
Telecoms, Banks, Retail
APACHE SPARK SURVEY 2015 - QUICK SNAPSHOT
JOB TYPE/ROLE
7.5% Data Scientist
6.5% C-Level Executive
3.5% Software Architect
3.5% DevOps
1% Business Analyst
74% Developer
6.5% Other
INDUSTRY FOCUS
33% Other
5% Consulting
4% Healthcare / Insurance
9% Advertising
10% Software / Technology
11% Retail
12% Banking / Finance
16% Telecommunications / Networks
Including Biotechnology/Chemistry, Machinery, Education, Government and Utilities and other sectors
INFRASTRUCTURE TECHNOLOGIES IN USE
53% Amazon EC2
34% Docker
22% Cloudera CDH
16% Ansible
14% Mesos
13% OpenStack
12% Apache.org Builds of Hadoop
10% HortonWorks HDP
10% Heroku
8% Google Compute Engine
7% Core OS
7% MapR Hadoop Distribution
6% Microsoft Azure
5% Marathon
4% Kubernetes
2% Aurora
11% Other XaaS
CURRENT RELATIONSHIP WITH SPARK
Evaluating Spark now
Currently using in production
Evaluated, not planning to use
Evaluated, will use in 2016 or later
Um, what’s Spark?
Planning to use in 2015
31% 28% 20% 13% 6% 2%
BUSINESS GOALS IN MIND
78% Fast Batch Processing of Large Data Sets
60% Support for Event Stream Processing
56% Fast Data Queries in Real Time
55% Improved Programmer Productivity
SPARK FEATURES/MODULES IN DEMAND
25% 59% 65% 82% 51%
Core API as a Replacement for MapReduce
Streaming Library (Spark Streaming)
Machine Learning Library (MLlib)
Integrated SQL (SparkSQL)
Graph Algorithms Library (GraphX)
DATA PROCESSING WITH SPARK
39%
41%
46%
46%
59%
61%
Read or Write Data to One or More Databases
Static Reports
SQL Queries and Business Intelligence
Write Data to Hadoop Distributed File System (HDFS)
Ad-hoc Queries and Reporting
ETL Data from External Sources
67% Event Stream Processing
71%
65%
40%
Use Spark as Part of a Larger Data Pipeline
Extract Information from Data Sooner Rather than Later
Automate Decision Making at Runtime
WHICH LANGUAGES ARE IMPORTANT TO YOUR SPARK INSTALLATION?
1st Scala 88%
2nd Java 44%
3rd Python 22%
Honorable mentions: R, Clojure, Groovy, Ruby & Go
HOW DO YOU LOAD DATA INTO SPARK?
62% Hadoop Distributed File System (HDFS)
18% Other Services (e.g. over socket connection)
41% Apache Kafka
46% Databases
29% Amazon S3
12% Other*
*Including: Apache Cassandra, Amazon Kinesis and Apache HBase
Typesafe (Twitter: @Typesafe) is dedicated to helping developers build Reactive applications on the JVM. Backed by Greylock
Partners, Shasta Ventures, Bain Capital Ventures and Juniper Networks, Typesafe is headquartered in San Francisco with
offices in Switzerland and Sweden. To start building Reactive applications today, download Typesafe Activator.
© 2015 Typesafe
Hello, Apache Spark! Typesafe Activator template for devs
Get the FULL report (PDF)

[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data
