Sorry for the Delay
• There were some technical difficulties, so we are giving folks a
few more minutes to join
• Again – sorry for the delay
© 2014 DataStax, All Rights Reserved. Company Confidential 1
Big Data Analytics with Spark
Webinar Housekeeping
• All attendees placed on mute
• Input questions at any time using the online interface
Big Data Analytics with
Cassandra and Spark
Brian Hess
Sr. Product Manager for Analytics
DataStax
Willie Sutton
Bank Robber in the 1930s-1950s
FBI Most Wanted List 1950
Captured in 1952
When asked “Why do you rob banks?”
“Because that’s where the money is.”
Motivating Use Case
Internet of Things
Your System
FAULT
Cassandra
Spark
Spark + Cassandra
Apache Cassandra
• Distributed NoSQL database
– BigTable meets Dynamo
• All nodes are equal
– Always on
– Linear scale out - a lot
• More data
• More transactions
• Multi-Datacenter
– Geographic or Workload
• Cassandra Query Language
– SQL-like
[Figure: linear scale-out, with clusters labeled 100,000, 200,000, and 400,000 txns/sec]
How Cassandra Works – Writes
[Figure sequence: a write of “It’s 72°” is sent to the cluster and acknowledged with “Done”]
Tunable Consistency
• Relax the Consistency in ACID
– Isn’t always needed – and isn’t guaranteed anyway (in distributed DBs)
– Reads may not get the most up-to-date data – but almost always will
• All data is replicated
– Set in the schema
– Distributed to nodes by Token Range
• Options:
– QUORUM, ONE, ALL
• Can ensure reads get most up-to-date value
– E.g. – read/write at QUORUM
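The point about reading and writing at QUORUM rests on a counting argument: with replication factor RF, a read that touches R replicas and a write acknowledged by W replicas must share at least one replica whenever R + W > RF. A minimal Scala sketch of that arithmetic follows; the object and method names are illustrative and are not part of Cassandra or any driver.

```scala
// Illustrative only: these names are not a Cassandra or driver API.
object ConsistencySketch {
  // Replicas that must respond at each consistency level, for a given replication factor.
  def replicasRequired(level: String, rf: Int): Int = level match {
    case "ONE"    => 1
    case "QUORUM" => rf / 2 + 1 // a strict majority of the replicas
    case "ALL"    => rf
    case other    => throw new IllegalArgumentException(s"unknown level: $other")
  }

  // A read overlaps the latest acknowledged write when R + W > RF.
  def readSeesLatestWrite(readLevel: String, writeLevel: String, rf: Int): Boolean =
    replicasRequired(readLevel, rf) + replicasRequired(writeLevel, rf) > rf

  def main(args: Array[String]): Unit = {
    val rf = 3
    for (r <- Seq("ONE", "QUORUM", "ALL"); w <- Seq("ONE", "QUORUM", "ALL"))
      println(s"read=$r write=$w overlap=${readSeesLatestWrite(r, w, rf)}")
  }
}
```

With RF = 3, QUORUM is 2 replicas, so QUORUM reads plus QUORUM writes give 2 + 2 > 3, while ONE plus ONE gives 1 + 1, which is not greater than 3.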
How Cassandra Works – Tunable Consistency
• “You got it. I’ll make sure everyone gets it.”
• QUORUM: “You got it. A majority got it. The rest will.”
• ONE: “You got it. One guy got it. The rest will.”
• ALL: “You got it. Everyone has it.”
How Cassandra Works – Query
SELECT user_id
FROM users
WHERE name = 'PBCupFan';
• Node receiving the query: “Sure thing, let me get that for you.”
• To the other nodes: “What do you guys have for PBCup?”
• Replicas: “Here’s what I have:”
• Receiving node: “Let me resolve any conflicts.”
• Back to the client: “Here ya go!”
user_id
---------
1234
(1 rows)
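The "resolve any conflicts" step can be pictured as last-write-wins: when replicas return different versions of a value, the one with the most recent write timestamp is kept. The sketch below models that idea in plain Scala; the `Versioned` type and `resolve` function are hypothetical stand-ins, not Cassandra internals, and ties between equal timestamps are not handled.

```scala
// Illustrative model only, not Cassandra's internal types.
object ReadResolveSketch {
  // One replica's answer for a single column: the value plus its write timestamp.
  final case class Versioned(value: String, writeTimestampMicros: Long)

  // Last-write-wins: keep the response with the highest write timestamp.
  def resolve(responses: Seq[Versioned]): Option[Versioned] =
    if (responses.isEmpty) None else Some(responses.maxBy(_.writeTimestampMicros))

  def main(args: Array[String]): Unit = {
    val fromReplicas = Seq(
      Versioned("1234", writeTimestampMicros = 200L), // up-to-date replica
      Versioned("1200", writeTimestampMicros = 100L)  // stale replica
    )
    println(resolve(fromReplicas).map(_.value)) // prints Some(1234)
  }
}
```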
Cassandra for Internet of Things
It’s all about scaling
Cassandra
• Always On
– No down time
• Linear Scalability
– For writes or reads
– For data size
• Terrific choice for Internet of Things, Web, Mobile, etc.
– British Gas, Nike, etc – Thermostats, Manufacturing, Oil/Gas, etc
It’s where the data is!
Cassandra Limitations
• No aggregations
– Optimized for lookups & writes
– No GROUP BYs
– No Windowed Aggregates
• No Joins
– Model your data to avoid them
• Must select by partition key
– There are secondary indexes
• But they are an antipattern
• Not optimized for full-table scans
It actually can’t do everything
Apache Spark
• Distributed computing framework
• Generalized DAG execution
• Easy Abstraction for Datasets
• Integrated SQL Queries
• Streaming
• Machine Learning Library
Spark Components
Spark Core Engine
• Spark SQL
• Spark Streaming
• MLlib
• GraphX
• SparkR
Spark Provides a Simple and Efficient
framework for Distributed Computations
• Node roles: 2
• In-memory caching: Yes!
• Generic DAG execution: Yes!
• Great abstraction for datasets? DataFrame! (previously Resilient Distributed Dataset (RDD))
[Figure: a Spark Master and three Spark Workers, with labels for Spark Executor, Spark Partition, and DataFrame (or RDD)]
Spark Master: Assigns cluster resources to applications
Spark Worker: Manages executors running on a machine
Spark Executor: Started by Worker - Workhorse of the Spark application
RDDs Can be Generated from a
Variety of Sources
Textfiles
Parallelized Collections
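As a rough illustration of these two sources, the sketch below builds one RDD from an in-memory collection with `parallelize` and another from a text file with `textFile`. It assumes Spark is on the classpath; the local master, the app name, and the temporary CSV file are assumptions made so the example can run on its own.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object RddSourcesSketch {
  // Parallelized collection: distribute an in-memory Scala collection as an RDD.
  def doubledSum(sc: SparkContext): Double =
    sc.parallelize(1 to 10).map(_ * 2).sum()

  // Text file: each line becomes one RDD element (a temp file stands in for real data).
  def lineCount(sc: SparkContext, contents: String): Long = {
    val path = Files.createTempFile("rdd-sketch", ".csv")
    Files.write(path, contents.getBytes(StandardCharsets.UTF_8))
    sc.textFile(path.toUri.toString).count()
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a stand-alone run.
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("rdd-sources-sketch"))
    try {
      println(doubledSum(sc))
      println(lineCount(sc, "1,72.0\n2,68.5\n"))
    } finally sc.stop()
  }
}
```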
Spark on Cassandra
[Figure: the Spark stack (Spark SQL, Spark Streaming, MLlib, GraphX, SparkR on the Spark Core Engine) connected to Cassandra through the DataStax Spark-Cassandra Connector]
The Spark Cassandra Connector uses the DataStax Java Driver to read from and write to Cassandra
• Each Spark Executor maintains a connection to the C* cluster through the DataStax Java Driver
• RDDs are read into different splits based on sets of tokens
– Tokens 1-1000
– Tokens 1001-2000
– Tokens …
• Together the splits cover the full C* token range
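To make the idea of token-based splits concrete, here is a small plain-Scala sketch that cuts a token range into consecutive chunks like the "1-1000, 1001-2000" labels on the slide. It is a simplified illustration with small numbers; `TokenRange` and `split` are hypothetical names, and the connector's actual split logic is more involved.

```scala
// Illustrative only: not the connector's implementation.
object TokenSplitSketch {
  // An inclusive range of tokens, e.g. 1 to 1000.
  final case class TokenRange(start: Long, end: Long)

  // Cut [first, last] into consecutive ranges holding at most `size` tokens each.
  def split(first: Long, last: Long, size: Long): Seq[TokenRange] = {
    require(size > 0 && first <= last)
    (first to last by size).map(s => TokenRange(s, math.min(s + size - 1, last)))
  }

  def main(args: Array[String]): Unit =
    split(1L, 3000L, 1000L).foreach(println)
}
```

Each resulting range could then be read by a different Spark task, which is what lets a full-table read proceed in parallel.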
Co-locate Spark and C* for Best Performance
• Run Cassandra and
Spark on same nodes
• Local reads/writes
• Increased performance
Things you can’t do in Cassandra
– Using SparkSQL
• JOINs
// csc is assumed to be a CassandraSQLContext, as used on the next slide
csc.sql("""SELECT t.sensor_id, t.temp, m.location
  FROM ks.temperatures t JOIN ks.metadata m
  ON t.sensor_id = m.sensor_id
  WHERE t.sensor_id = 12345""")
• Aggregates
csc.sql("""SELECT sensor_id, year, month, MAX(temp) mtemp
  FROM ks.temperatures
  GROUP BY sensor_id, year, month""")
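For readers less used to SQL, the aggregate query computes the largest temp within each (sensor_id, year, month) group. The same computation on an in-memory Scala collection might look like the sketch below; the `Reading` case class and the sample values are made up for illustration, and no Spark or Cassandra is involved.

```scala
object MaxTempSketch {
  // Hypothetical row shape mirroring the columns named in the query.
  final case class Reading(sensorId: Int, year: Int, month: Int, temp: Double)

  // GROUP BY sensor_id, year, month with MAX(temp), on an in-memory collection.
  def maxTempByMonth(rows: Seq[Reading]): Map[(Int, Int, Int), Double] =
    rows.groupBy(r => (r.sensorId, r.year, r.month))
      .map { case (key, group) => key -> group.map(_.temp).max }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Reading(12345, 2015, 1, 70.5),
      Reading(12345, 2015, 1, 72.0),
      Reading(12345, 2015, 2, 65.0)
    )
    maxTempByMonth(rows).foreach(println)
  }
}
```

Spark SQL performs the equivalent grouping across the cluster, which is the part Cassandra alone does not provide.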
Things you can’t do in Cassandra
– External Data
• JOIN with HDFS data
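import com.datastax.spark.connector._  // adds sc.cassandraTable
// CSV on HDFS, keyed by its first three fields
val temp2014 = sc.textFile("webhdfs://myhadoop/data/temp2014.csv").
  map(x => x.split(",")).
  map(x => ((x(0).toInt, x(1).toInt, x(2).toInt), x(3).toDouble))
// Cassandra table, keyed by (sensor_id, year, month)
val temp2015 = sc.cassandraTable("ks", "temperatures").
  map(x => ((x.getInt("sensor_id"), x.getInt("year"), x.getInt("month")),
    x.getDouble("avgTemp")))
// join yields (key, (temp2015, temp2014)); the keys must match exactly on both sides
val hotter = temp2015.join(temp2014).filter(x => x._2._1 > x._2._2)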
• Non-Partition Key Predicates
csc.sql("SELECT * FROM ks.temperatures WHERE temp > 100")
Tools
• ODBC and JDBC tools via SparkSQL
– Tableau, Pentaho, R, etc
• Apache Zeppelin (incubating)
– A web-based notebook that enables interactive data analytics
Quick word on Spark Streaming and Cassandra
• Very good combination
– Simple, powerful, useful, scalable, etc, etc, etc.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.datastax.spark.connector.streaming._
// Spark connection options
val conf = new SparkConf(true)...
// streaming with 1 second batch window
val ssc = new StreamingContext(conf, Seconds(1))
// stream input (serverIP and serverPort are defined elsewhere)
val lines = ssc.socketTextStream(serverIP, serverPort)
// count words within each batch
val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
// stream output: write each batch to keyspace "test", table "words"
wordCounts.saveToCassandra("test", "words")
// start processing
ssc.start()
ssc.awaitTermination()
DataStax Enterprise
Combines Cassandra,
Spark, and Solr (and more!)
- Fault Tolerance
- Management
- Visual Monitoring
- Security
- ETC!
Motivating Use Case
Internet of Things
Cassandra + Spark
• Unleash the power of analytics
• On your operational data
– IoT, Web, Mobile, etc
“Because that’s where
the Data is.”
Contacts and Links
• Links
– Cassandra Summit: http://cassandrasummit-datastax.com/
– DataStax Academy: https://academy.datastax.com/
• Contacts
– Kevin Pardue, Regional Channel Manager: kevin.pardue@datastax.com
– Brian Hess, Sr Product Manager for Analytics: brian.hess@datastax.com
– Devin Saxon, Marketing Specialist: dsaxon@datastax.com
PySpark Cassandra - Amsterdam Spark Meetup
Frens Jan Rumph
 
Announcing Spark Driver for Cassandra
Announcing Spark Driver for CassandraAnnouncing Spark Driver for Cassandra
Announcing Spark Driver for Cassandra
DataStax
 


Big Data Analytics with Spark

  • 1. Sorry for the Delay • There were some technical difficulties, so we are giving folks a few more minutes to join • Again – sorry for the delay © 2014 DataStax, All Rights Reserved. Company Confidential
  • 3. Webinar Housekeeping: all attendees placed on mute; input questions at any time using the online interface
  • 4. Big Data Analytics with Cassandra and Spark. Brian Hess, Sr. Product Manager for Analytics, DataStax
  • 6. Willie Sutton: bank robber in the 1930s-1950s; FBI Most Wanted List, 1950; captured in 1952
  • 7. Willie Sutton, when asked “Why do you rob banks?”: “Because that’s where the money is.”
  • 8–9. Motivating Use Case: Internet of Things. Diagram label: Your System
  • 10. Motivating Use Case: Internet of Things. Diagram labels: Your System, FAULT
  • 11. Cassandra; Spark; Spark + Cassandra
  • 12. Apache Cassandra
      • Distributed NoSQL database
          – BigTable meets Dynamo
      • All nodes are equal
          – Always on
          – Linear scale out - a lot
              • More data
              • More transactions
      • Multi-Datacenter
          – Geographic or Workload
      • Cassandra Query Language
          – SQL-like
      • Figure labels: 100,000 txns/sec, 200,000 txns/sec, 400,000 txns/sec
  • 13–14. How Cassandra Works – Writes. Diagram caption: “It’s 72°”
  • 15–16. How Cassandra Works – Writes. Diagram caption: “Done”
  • 17. Tunable Consistency
      • Relax the Consistency in ACID
          – Isn’t always needed, and isn’t guaranteed anyway (in distributed DBs)
          – Reads may not get the most up-to-date data, but almost always will
      • All data is replicated
          – Set in the schema
          – Distributed to nodes by Token Range
      • Options:
          – QUORUM, ONE, ALL
      • Can ensure reads get most up-to-date value
          – E.g. read/write at QUORUM
  • 18. How Cassandra Works – Tunable Consistency. Diagram captions:
      • “You got it. I’ll make sure everyone gets it.”
      • “You got it. A majority got it. The rest will.”
      • “You got it. One guy got it. The rest will.”
      • “You got it. Everyone has it.”
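The "read/write at QUORUM" point on slide 17 comes down to simple arithmetic, sketched here in plain Scala with no Cassandra dependency. The object and function names are illustrative assumptions, not part of any driver API. The sketch relies on two standard facts: a quorum is a majority of the replicas, and a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor.

```scala
object ConsistencySketch {
  // Replicas that must respond at QUORUM: a strict majority of the replication factor.
  def quorum(replicationFactor: Int): Int = replicationFactor / 2 + 1

  // True when every read set must intersect every write set,
  // i.e. readReplicas + writeReplicas > replicationFactor.
  def readSeesLatestWrite(readReplicas: Int, writeReplicas: Int, replicationFactor: Int): Boolean =
    readReplicas + writeReplicas > replicationFactor
}
```

With a replication factor of 3, QUORUM is 2 replicas, so QUORUM reads combined with QUORUM writes overlap (2 + 2 > 3), while reading and writing at ONE does not (1 + 1 is not greater than 3), which is the case where a read may briefly return stale data.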
  • 19. How Cassandra Works – Query. Query (repeated on slides 20–23):
        SELECT user_id FROM users WHERE name = 'PBCupFan';
  • 20. How Cassandra Works – Query. Caption: “Sure Thing, Let me get that for you.”
  • 21. How Cassandra Works – Query. Caption: “What do you guys have for PBCup?”
  • 22. How Cassandra Works – Query. Caption (shown twice): “Here’s what I have:”
  • 23. How Cassandra Works – Query. Caption: “Let me resolve any conflicts”
  • 24. How Cassandra Works – Query. Caption: “Here ya go!”
         user_id
        ---------
            1234

        (1 rows)
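The "Let me resolve any conflicts" step on slide 23 can be pictured with a small sketch in plain Scala. It assumes the simplest form of Cassandra's reconciliation rule, last write wins by write timestamp, and ignores details such as tombstones and tie-breaking. The `ReplicaResponse` type and `resolve` function are hypothetical names used only for illustration.

```scala
object ReadResolutionSketch {
  // One replica's answer for a single cell: the value plus the timestamp of the write that produced it.
  final case class ReplicaResponse(replica: String, value: String, writeTimestampMicros: Long)

  // Keep the response with the newest write timestamp (last write wins).
  // Returns None when no replica answered.
  def resolve(responses: Seq[ReplicaResponse]): Option[ReplicaResponse] =
    responses.reduceOption((a, b) => if (b.writeTimestampMicros > a.writeTimestampMicros) b else a)
}
```

In this picture the coordinator collects one response per replica it contacted and hands back only the winning value to the client.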
  • 25–27. Cassandra for Internet of Things: It’s all about scaling
  • 28. Cassandra
      • Always On
          – No down time
      • Linear Scalability
          – For writes or reads
          – For data size
      • Terrific choice for Internet of Things, Web, Mobile, etc.
          – British Gas, Nike, etc.
          – Thermostats, Manufacturing, Oil/Gas, etc.
      • It’s where the data is!
  • 29. Cassandra Limitations: it actually can’t do everything
      • No aggregations
          – Optimized for lookups & writes
          – No GROUP BYs
          – No Windowed Aggregates
      • No Joins
          – Design the data model to avoid them
      • Must select by partition key
          – There are secondary indexes
              • But they are an antipattern
      • Not optimized for full-table scans
  • 30. Apache Spark
      • Distributed computing framework
      • Generalized DAG execution
      • Easy Abstraction for Datasets
      • Integrated SQL Queries
      • Streaming
      • Machine Learning Library
  • 31–32. Spark Components: Spark Core Engine, Spark SQL, Spark Streaming, MLlib, GraphX, SparkR
  • 33. Spark Provides a Simple and Efficient Framework for Distributed Computations
      • Node Roles: 2
      • In-Memory Caching: Yes!
      • Generic DAG Execution: Yes!
      • Great Abstraction for Datasets? DataFrame! (previously Resilient Distributed Dataset (RDD))
      • Diagram labels: Spark Master, Spark Worker (×3), Spark Executor, Spark Partition, DataFrame (or RDD)
  • 34–36. Spark Provides a Simple and Efficient Framework for Distributed Computations
      • Spark Master: assigns cluster resources to applications
      • Spark Worker: manages executors running on a machine
      • Spark Executor: started by the Worker; the workhorse of the Spark application
      • Diagram labels: Spark Master, Spark Worker (×3), Spark Executor, Spark Partition, DataFrame (or RDD)
  • 37–38. RDDs Can be Generated from a Variety of Sources: text files, parallelized collections
  • 40. Spark on Cassandra. Diagram labels: Spark Core Engine, Spark SQL, Spark Streaming, MLlib, GraphX, SparkR, Cassandra, DataStax Spark-Cassandra Connector
  • 41. Spark Cassandra Connector uses the DataStax Java Driver to Read from and Write to Cassandra
      • Each Executor maintains a connection to the C* cluster
      • RDDs are read into different splits based on sets of tokens
      • Diagram labels: Spark Executor, DataStax Java Driver, C* Full Token Range, Tokens 1-1000, Tokens 1001-2000, Tokens …
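The idea on slide 41 of carving the full token range into per-split token sets can be sketched in plain Scala. This is only an illustration under simplifying assumptions: fixed-size contiguous splits over a small range like the slide's "Tokens 1-1000, Tokens 1001-2000" labels. `TokenRange` and `split` are hypothetical names, and the sketch does not claim to reproduce how the connector actually sizes or groups its splits.

```scala
object TokenSplitSketch {
  // A contiguous range of tokens with inclusive bounds.
  final case class TokenRange(start: Long, end: Long)

  // Cut the full range into consecutive splits of at most splitSize tokens each.
  // Intended for small illustrative ranges; it does not guard against Long overflow.
  def split(full: TokenRange, splitSize: Long): Seq[TokenRange] = {
    require(splitSize > 0, "splitSize must be positive")
    (full.start to full.end by splitSize).map { s =>
      TokenRange(s, math.min(s + splitSize - 1, full.end))
    }
  }
}
```

Calling `TokenSplitSketch.split(TokenRange(1, 3000), 1000)` yields three ranges matching the slide's labels, and each one would correspond to one Spark partition read independently from Cassandra.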
  • 43. Co-locate Spark and C* for Best Performance
      • Run Cassandra and Spark on same nodes
      • Local reads/writes
      • Increased performance
  • 44. Things you can’t do in Cassandra – Using SparkSQL
      • JOINs
            // csc is the SQL context also used on slide 45 (SparkContext itself has no sql method)
            csc.sql("SELECT t.sensor_id, t.temp, m.location FROM ks.temperatures t JOIN ks.metadata m ON t.sensor_id = m.sensor_id WHERE t.sensor_id = 12345")
      • Aggregates
            csc.sql("SELECT sensor_id, year, month, MAX(temp) mtemp FROM ks.temperatures GROUP BY sensor_id, year, month")
  • 45. Things you can’t do in Cassandra – External Data
      • JOIN with HDFS data
            import com.datastax.spark.connector._  // provides sc.cassandraTable

            // CSV rows keyed by (column 0, column 1, column 2), with column 3 as the value
            val temp2014 = sc.textFile("webhdfs://myhadoop/data/temp2014.csv").
              map(x => x.split(",")).
              map(x => ((x(0).toInt, x(1).toInt, x(2).toInt), x(3).toDouble))
            // Cassandra rows keyed by (sensor_id, year, month), with avgTemp as the value
            val temp2015 = sc.cassandraTable("ks", "temperatures").
              map(x => ((x.getInt("sensor_id"), x.getInt("year"), x.getInt("month")), x.getDouble("avgTemp")))
            // After the join each element is (key, (temp2015 value, temp2014 value))
            val hotter = temp2015.join(temp2014).filter(x => x._2._1 > x._2._2)
      • Non-Partition Key Predicates
            csc.sql("SELECT * FROM ks.temperatures WHERE temp > 100")
  • 46. Tools
      • ODBC and JDBC tools via SparkSQL
          – Tableau, Pentaho, R, etc.
      • Apache Zeppelin (incubating)
          – A web-based notebook that enables interactive data analytics.
  • 47. Quick word on Spark Streaming and Cassandra
      • Very good combination
          – Simple, powerful, useful, scalable, etc.
      • Diagram label: Receiver
  • 48. Quick word on Spark Streaming and Cassandra
        import org.apache.spark.SparkConf
        import org.apache.spark.streaming.{Seconds, StreamingContext}
        import com.datastax.spark.connector.streaming._

        // Spark connection options
        val conf = new SparkConf(true)...
        // streaming with 1 second batch window
        val ssc = new StreamingContext(conf, Seconds(1))
        // stream input
        val lines = ssc.socketTextStream(serverIP, serverPort)
        // count words
        val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
        // stream output
        wordCounts.saveToCassandra("test", "words")
        // start processing
        ssc.start()
        ssc.awaitTermination()
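To make the per-batch computation on slide 48 concrete, here is a rough plain-Scala equivalent of the `flatMap` / `map` / `reduceByKey` chain applied to the lines received in a single one-second batch. It uses ordinary collections rather than Spark's DStream API, so it runs without a cluster. The `WordCountSketch` object and `countWords` function are illustrative names, not part of Spark or the connector.

```scala
object WordCountSketch {
  // Split every line on single spaces, then count how often each word occurs in the batch.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }
}
```

For the batch `Seq("a b a", "b c")` this gives `a` twice, `b` twice, and `c` once. In the streaming version each such (word, count) pair is what `saveToCassandra("test", "words")` writes out for that batch.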
  • 49. DataStax Enterprise: combines Cassandra, Spark, and Solr (and more!)
      • Fault Tolerance
      • Management
      • Visual Monitoring
      • Security
      • Etc.!
  • 50. Motivating Use Case: Internet of Things
  • 51. Cassandra + Spark
      • Unleash the power of analytics
      • On your operational data
          – IoT, Web, Mobile, etc.
      • “Because that’s where the Data is.”
  • 52. Contacts and Links
      • Links
          – Cassandra Summit: http://cassandrasummit-datastax.com/
          – DataStax Academy: https://academy.datastax.com/
      • Contacts
          – Kevin Pardue, Regional Channel Manager: [email protected]
          – Brian Hess, Sr Product Manager for Analytics: [email protected]
          – Devin Saxon, Marketing Specialist: [email protected]

Editor's Notes

  • #34: Spark has a very simple architecture (see chart). The basic RDD model is really nice and easy to grok, and there are many sources you can get an RDD from. Lots of fun languages are supported.
  • #35: Spark Master: analogous to the Job Tracker. Initial contact point for applications. Keeps track of the state of the system.
  • #36: Spark Worker: Task Tracker … Manages starting "executors" on machines. Reports on and sets up the environment for executors.
  • #37: Spark Executor: actually does the work. Process started by the worker; communicates directly with the driving application and the master. 1 Spark partition per executor … KEY: Spark partition != Cassandra partition.
  • #38: RDDs: where do they come from? All sorts of great places.
  • #39: So how do we act on these RDDs?
  • #42: Basics of how the OSS connector works and how an RDD is split up.