© Copyright 2015 Glassbeam Inc.
Ad Hoc Analytics on Internet of Complex Things with Spark and Cassandra
Mohammed Guller
September 2015
Let’s Take a Quick Poll
• Familiar with IoT
• Data modeling experience in C*
• Familiar with Spark
• Hands-on experience with Spark
About Me
• Principal Architect at Glassbeam
• Author of an upcoming book
– “Big Data Analytics with Spark”
• Founded two startups
• Passionate about building new products, big data analytics, and Machine Learning
• Berkeley Graduate
LinkedIn: www.linkedin.com/in/mohammedguller
Twitter: @MohammedGuller
Internet of Things (IoT)
Network of objects embedded with software for collecting and exchanging data over the Internet
Internet of Complex Things (IoCT)
• Data Center Devices
– Server, storage, controller
• Medical Devices
– X-Ray, MRI scan, CT scan
• Manufacturing Systems
• Cars
• Electric Vehicle Chargers
• Other Complex Devices
Glassbeam's target market is focused on driving operational and business analytics value for connected product companies in the Industrial IoT market:
• IT & Networks
• Medical & Healthcare
• EV Chargers & Smart Grid
• Industrial & Mfg
• Transportation
Glassbeam: Advanced and Predictive Analytics for Connected Product Companies
Analytics on Operational Data
Operational Data to Powerful Insights
High-level Architecture
[Figure: high-level architecture diagram]
Stages: Data Ingestion, Data Transformation, Data Stores, Middleware, Applications
• Logs (Streams/docs)
• SPL Library, SCALAR, INFOSERVER
• S3 Amazon (Raw logs), Cassandra (Processed Data), Solr Cloud (Index)
• Analytics and Machine Learning: Spark SQL, Spark Streaming, MLlib
• Event Processing & Rules Engine
• LogVault, Explorer, Workbench, Standard Apps, Custom Apps, Rules & Alerts, DirectAccess
• Glassbeam Studio
• Cloud Enablement & Automation
End-to-end cloud-based architecture built on modern technologies to handle any machine, any data, any cloud
* SPL (Semiotic Parsing Language) and SCALAR are patent pending technology inventions of Glassbeam
Key Properties of IoCT Data
• Volume: Terabytes of Data
• Variety: Multi-structured Data
• Velocity: Fast-paced Batch Data and Streaming Data
Why We Chose C*
• Volume: Economically Scale from Gigabytes to Terabytes of Data (Linear Scalability)
• Variety: Store Multi-structured Data (Dynamic Schema)
• Velocity: Fast Ingest of New Data and Quick Reload of Old Data (Fast Writes)
Modeling Data in C*
• Different from Modeling Data in RDBMS
• Queries Drive Table and Primary Key Definitions
– Primary Key Definition Limits the Kind of Queries You Can Run
– C* Does Not Support Joins
A Simple Table for Storing Event Data in C*
CREATE TABLE event (
  sys_id text,
  dt timestamp,
  ts timestamp,
  severity text,
  module text,
  message text,
  PRIMARY KEY ((sys_id, dt), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
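To make the key layout concrete, here is a minimal in-memory sketch in plain Scala; it does not talk to Cassandra. It assumes `dt` holds the UTC calendar day of `ts`, and the names `EventTableModel`, `layout`, and `latest` are hypothetical. Rows are grouped by the partition key `(sys_id, dt)` and ordered by `ts` descending inside each partition, so the query this table serves well is "the newest events for one system on one day".

```scala
import java.time.{Instant, LocalDate, ZoneOffset}

// Hypothetical in-memory model of PRIMARY KEY ((sys_id, dt), ts)
// WITH CLUSTERING ORDER BY (ts DESC). Illustration only, not a C* client.
object EventTableModel {
  final case class Event(sysId: String, ts: Instant, severity: String,
                         module: String, message: String)

  // Partition key (sys_id, dt); dt is assumed to be the UTC day of ts.
  type PartitionKey = (String, LocalDate)

  def partitionKey(e: Event): PartitionKey =
    (e.sysId, e.ts.atZone(ZoneOffset.UTC).toLocalDate)

  // Group rows into partitions, then sort each partition newest first,
  // mirroring the clustering order on ts.
  def layout(events: Seq[Event]): Map[PartitionKey, Vector[Event]] =
    events.groupBy(partitionKey).map { case (key, rows) =>
      key -> rows.sortBy(_.ts.toEpochMilli)(Ordering[Long].reverse).toVector
    }

  // The query shape this table serves: the full partition key is supplied,
  // rows come back in clustering order, and reading stops after `limit` rows.
  def latest(table: Map[PartitionKey, Vector[Event]], sysId: String,
             day: LocalDate, limit: Int): Vector[Event] =
    table.getOrElse((sysId, day), Vector.empty).take(limit)

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event("s1", Instant.parse("2015-09-01T10:00:00Z"), "ERROR", "m1", "disk failure"),
      Event("s1", Instant.parse("2015-09-01T12:00:00Z"), "INFO", "m2", "heartbeat"),
      Event("s1", Instant.parse("2015-09-02T08:00:00Z"), "WARN", "m1", "fan speed high"))
    val table = layout(events)
    latest(table, "s1", LocalDate.of(2015, 9, 1), 10)
      .foreach(e => println(s"${e.ts} ${e.severity} ${e.message}"))
  }
}
```

A query that cannot name a `(sys_id, dt)` pair, such as "all ERROR events", has no partition to start from in this layout.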
Another Table to Filter Events by Severity
CREATE TABLE event_by_severity (
  sys_id text,
  dt timestamp,
  ts timestamp,
  severity text,
  module text,
  message text,
  PRIMARY KEY ((sys_id, dt), severity, ts)
) WITH CLUSTERING ORDER BY (severity ASC, ts DESC);
Yet Another Table to Filter Events by Module
CREATE TABLE event_by_module (
  sys_id text,
  dt timestamp,
  ts timestamp,
  severity text,
  module text,
  message text,
  PRIMARY KEY ((sys_id, dt), module, ts)
) WITH CLUSTERING ORDER BY (module ASC, ts DESC);
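The three tables hold the same rows under different keys, so the application has to write every event once per table. A minimal in-memory sketch, assuming plain Scala maps as stand-ins for `event`, `event_by_severity`, and `event_by_module` (the names `DenormalizedWrites`, `Tables`, and `insert` are hypothetical), shows the write amplification and how each read is served by exactly one table:

```scala
import scala.collection.mutable

// Hypothetical sketch: three in-memory maps stand in for the event,
// event_by_severity and event_by_module tables. Not a Cassandra client.
object DenormalizedWrites {
  final case class Event(sysId: String, dt: String, ts: Long,
                         severity: String, module: String, message: String)

  final class Tables {
    private val event      = mutable.Map.empty[(String, String), List[Event]]
    private val bySeverity = mutable.Map.empty[(String, String, String), List[Event]]
    private val byModule   = mutable.Map.empty[(String, String, String), List[Event]]
    private var physicalWrites = 0

    def writes: Int = physicalWrites

    private def put[K](table: mutable.Map[K, List[Event]], key: K, e: Event): Unit = {
      table.update(key, e :: table.getOrElse(key, Nil))
      physicalWrites += 1
    }

    // One logical event becomes one write per table.
    def insert(e: Event): Unit = {
      put(event, (e.sysId, e.dt), e)
      put(bySeverity, (e.sysId, e.dt, e.severity), e)
      put(byModule, (e.sysId, e.dt, e.module), e)
    }

    // Each read hits exactly one table; results are newest first (ts DESC).
    def eventsFor(sysId: String, dt: String): List[Event] =
      event.getOrElse((sysId, dt), Nil).sortBy(-_.ts)

    def eventsBySeverity(sysId: String, dt: String, severity: String): List[Event] =
      bySeverity.getOrElse((sysId, dt, severity), Nil).sortBy(-_.ts)

    def eventsByModule(sysId: String, dt: String, module: String): List[Event] =
      byModule.getOrElse((sysId, dt, module), Nil).sortBy(-_.ts)
  }

  def main(args: Array[String]): Unit = {
    val t = new Tables
    t.insert(Event("s1", "2015-09-01", 1L, "ERROR", "m1", "disk failure"))
    t.insert(Event("s1", "2015-09-01", 2L, "INFO", "m1", "heartbeat"))
    println(s"physical writes: ${t.writes}")
    println(t.eventsBySeverity("s1", "2015-09-01", "ERROR").map(_.message))
  }
}
```

Every new filter that users want adds another table to `insert`, which is the cost the next slides run into.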
Ad Hoc Analytics with C*
• Oxymoron
• All Queries Must Be Known Upfront
Workaround Possible but Intractable
Columns: Sys_id, Model, Age, OS, City, State, Country
• sys_by_model
• sys_by_os
• sys_by_age
• sys_by_state
• sys_by_state_age
• sys_by_age_state
• sys_by_model_age
• sys_by_age_model
• sys_by_age_model_state
• sys_by_model_state_age
• sys_by_model_state_os
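The growth is easy to quantify. This worked count in plain Scala assumes all six non-key columns (Model, Age, OS, City, State, Country) are filterable; `QueryTableCount` and its methods are hypothetical names. Counting each column order as a separate table, as the list does with sys_by_state_age and sys_by_age_state, a k-column filter needs n!/(n-k)! tables, summed over k = 1..n. Ignoring order, it is one table per non-empty column subset, 2^n - 1.

```scala
// Worked count of query-specific tables for ad hoc filters (illustrative arithmetic).
object QueryTableCount {
  // Non-key columns from the table above, assumed to be filterable.
  val filterColumns: List[String] = List("model", "age", "os", "city", "state", "country")

  // Ordered selections of k columns out of n: n! / (n - k)!
  def permutations(n: Int, k: Int): BigInt =
    (0 until k).foldLeft(BigInt(1))((acc, i) => acc * (n - i))

  // One table per ordered combination (sys_by_state_age and sys_by_age_state
  // counted separately, as in the list above).
  def orderedTables(n: Int): BigInt =
    (1 to n).map(k => permutations(n, k)).sum

  // One table per combination if column order is ignored: 2^n - 1.
  def unorderedTables(n: Int): BigInt = BigInt(2).pow(n) - 1

  def main(args: Array[String]): Unit = {
    val n = filterColumns.size
    println(s"ordered: ${orderedTables(n)}, unordered: ${unorderedTables(n)}")
  }
}
```

For six columns this prints `ordered: 1956, unordered: 63` (6 + 30 + 120 + 360 + 720 + 720 = 1956), and every one of those tables is another write per event.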
Other Barriers to Ad Hoc Queries
• No Aggregation
• No Group By
• No Joins
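Without these operators, an application that needs them has to fetch the rows and compute the result in its own memory. A minimal sketch in plain Scala, with hypothetical case classes standing in for rows already read from C*: a GROUP BY severity count, and a hash join on sys_id followed by a count of ERROR events per model.

```scala
// Hypothetical sketch of client-side GROUP BY and JOIN over rows that
// the application has already fetched from C*.
object ClientSideAnalytics {
  final case class SystemInfo(sysId: String, model: String, state: String)
  final case class Event(sysId: String, severity: String, module: String)

  // Same result as SELECT severity, count(*) ... GROUP BY severity.
  def countBySeverity(events: Seq[Event]): Map[String, Int] =
    events.groupBy(_.severity).map { case (sev, rows) => sev -> rows.size }

  // Hash join on sys_id, then count ERROR events per model.
  // Events whose sys_id has no matching system are dropped (inner join).
  def errorsByModel(systems: Seq[SystemInfo], events: Seq[Event]): Map[String, Int] = {
    val modelOf: Map[String, String] = systems.map(s => s.sysId -> s.model).toMap
    events
      .filter(_.severity == "ERROR")
      .flatMap(e => modelOf.get(e.sysId))
      .groupBy(identity)
      .map { case (model, hits) => model -> hits.size }
  }

  def main(args: Array[String]): Unit = {
    val systems = Seq(SystemInfo("s1", "X100", "CA"), SystemInfo("s2", "X200", "NY"))
    val events = Seq(Event("s1", "ERROR", "m1"), Event("s2", "INFO", "m1"))
    println(countBySeverity(events))
    println(errorsByModel(systems, events))
  }
}
```

All of this runs in a single JVM over data that first has to be pulled out of C*, which stops scaling as the data grows.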
What Do I Do Now?
Spark
• Fast and General-purpose Cluster Computing Framework for Processing Large Datasets
• API in Scala, Java, Python, SQL, and R
Integrated Libraries for a Variety of Tasks
• Spark SQL
• Spark Streaming
• MLlib & Spark ML
• GraphX
All built on top of Spark Core
One Minor Problem!
• Spark Does Not Have Built-in Support for C*
• Built-in Support for HDFS, S3, and JDBC-compliant Databases
Spark Cassandra Connector
• Open Source Library for Integrating Spark with C*
• Enables a Spark Application to Process Data in C* Just Like Data from the Built-in Data Sources
Spark with C*
• Enables Ad Hoc Analytics
• CQL Limitations No Longer Apply
• Query Data Using SQL/HiveQL
– Filter on Any Column
– Aggregations
– Group By
Ad Hoc Analytics in Spark Shell
Launch the Spark Shell
/path/to/spark/bin/spark-shell \
  --master spark://host:7077 \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.4.0
Create a DataFrame
// Lines start with ".", so enter :paste first, paste the block, then press Ctrl-D
val events = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "test",
    "table" -> "event"))
  .load()
Fire Queries
events.cache()
events.select("ts", "module", "message").where($"severity" === "ERROR").show
events.select("ts", "severity", "message").where($"module" === "m1").show
events.select("ts", "message").where($"severity" === "ERROR" &&
  $"module" === "m1").show
events.groupBy("severity").count().show
Spark SQL JDBC/ODBC Server
• Analyze Data in C* with Just SQL/HiveQL
• Command Line Shell
– Beeline
• Graphical SQL Client
– SQuirreL SQL
• Data Visualization Applications
– Tableau
– ZoomData
– Qlik
Ad Hoc Analytics with Spark SQL JDBC/ODBC Server
Start the Spark SQL JDBC Server
/path/to/spark/sbin/start-thriftserver.sh \
  --master spark://hostname:7077 \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.4.0
Launch Beeline From a Terminal
/path/to/spark/bin/beeline
Connect to the Spark SQL JDBC Server
beeline> !connect jdbc:hive2://localhost:10000
Create a Temporary Table
0: jdbc:hive2://localhost:10000> CREATE TEMPORARY TABLE event
. . . . . . . . . . . . . . . .> USING org.apache.spark.sql.cassandra
. . . . . . . . . . . . . . . .> OPTIONS (
. . . . . . . . . . . . . . . .> keyspace "test",
. . . . . . . . . . . . . . . .> table "event"
. . . . . . . . . . . . . . . .> );
Query Data with SQL/HiveQL
...> CACHE TABLE event;
...> SELECT severity, count(1) as total FROM event GROUP BY severity;
...> SELECT module, severity, count(1) FROM event GROUP BY module, severity;
Caveats
• Latency
• Spark Query May Require an Expensive Table Scan
– Reads Every Row
– Disk I/O Is Slow
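The table-scan caveat can be seen with a toy cost model in plain Scala that counts rows touched rather than measuring real I/O; `ScanCost` and its methods are hypothetical. A lookup by partition key reads one partition, while a filter on a non-key column such as severity has to read every row before filtering, and that full read is where slow disk I/O hurts.

```scala
// Hypothetical toy cost model: counts rows touched, not real C* or Spark I/O.
object ScanCost {
  final case class Row(sysId: String, dt: String, severity: String)
  type Table = Map[(String, String), Seq[Row]]

  // Store rows by partition key (sys_id, dt), like the event table.
  def store(rows: Seq[Row]): Table = rows.groupBy(r => (r.sysId, r.dt))

  // Partition-key lookup: only one partition's rows are read.
  def readPartition(t: Table, sysId: String, dt: String): (Seq[Row], Int) = {
    val rows = t.getOrElse((sysId, dt), Seq.empty)
    (rows, rows.size)
  }

  // Filter on a non-key column: every row in every partition is read,
  // and the filter is applied afterwards.
  def scanBySeverity(t: Table, severity: String): (Seq[Row], Int) = {
    val all = t.values.flatten.toSeq
    (all.filter(_.severity == severity), all.size)
  }

  def main(args: Array[String]): Unit = {
    val rows = for {
      sys <- Seq("s1", "s2", "s3")
      day <- Seq("2015-09-01", "2015-09-02")
      i   <- 0 until 10
    } yield Row(sys, day, if (i == 0) "ERROR" else "INFO")
    val t = store(rows)
    val (_, lookupCost) = readPartition(t, "s1", "2015-09-01")
    val (errors, scanCost) = scanBySeverity(t, "ERROR")
    println(s"partition lookup read $lookupCost rows")
    println(s"severity filter read $scanCost rows to find ${errors.size} errors")
  }
}
```

The gap grows with the number of partitions: the lookup stays at one partition while the scan reads the whole table.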
Reduce the Impact of Slow Disk I/O
• Cache Tables
• Replace HDD with SSD
• Add More Nodes
Recommendations
• Known Queries Requiring Sub-second Response Time
– Query C* Directly
– Create Query-specific Tables
– Pre-aggregate Data
• Ad Hoc Queries
– Spark
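Pre-aggregating moves the work for a known query from read time to write time. In C* itself such a rollup could be a table with a counter column incremented on each event; the sketch below keeps it in memory instead. It assumes a hypothetical known query, "how many events of a given severity did a system log on a given day", and the names `PreAggregation`, `SeverityRollup`, `onEvent`, and `count` are illustrative only.

```scala
import scala.collection.mutable

// Hypothetical in-memory rollup illustrating pre-aggregation at ingest time.
// Keys mirror a query-specific table: partition (sys_id, dt), clustering severity.
object PreAggregation {
  final class SeverityRollup {
    private val counts =
      mutable.Map.empty[(String, String, String), Long].withDefaultValue(0L)

    // Called once per ingested event: constant work at write time.
    def onEvent(sysId: String, dt: String, severity: String): Unit =
      counts((sysId, dt, severity)) += 1L

    // The known query is answered by a single key lookup, with no scan.
    def count(sysId: String, dt: String, severity: String): Long =
      counts((sysId, dt, severity))
  }

  def main(args: Array[String]): Unit = {
    val rollup = new SeverityRollup
    Seq("ERROR", "INFO", "ERROR").foreach(sev => rollup.onEvent("s1", "2015-09-01", sev))
    println(s"ERROR count: ${rollup.count("s1", "2015-09-01", "ERROR")}")
  }
}
```

The trade-off mirrors the query-specific tables: reads become cheap only for the questions anticipated at write time, which is why ad hoc questions still go to Spark.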
 
Realise True Business Value .pdf
Realise True Business Value .pdfRealise True Business Value .pdf
Realise True Business Value .pdf
ThousandEyes
 
Power apps - Cloud business applications platform
Power apps - Cloud business applications platformPower apps - Cloud business applications platform
Power apps - Cloud business applications platform
Vladimir Ljubibratic
 
Oracle engineered systems executive presentation
Oracle engineered systems executive presentationOracle engineered systems executive presentation
Oracle engineered systems executive presentation
OTN Systems Hub
 
Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018
Romit Mehta
 
Splunk and Multicloud
Splunk and MulticloudSplunk and Multicloud
Splunk and Multicloud
Splunk
 
Splunk and Multicloud
Splunk and Multicloud Splunk and Multicloud
Splunk and Multicloud
Splunk
 
Hey IT, Meet OT with Hima Mukkamala
Hey IT, Meet OT with Hima MukkamalaHey IT, Meet OT with Hima Mukkamala
Hey IT, Meet OT with Hima Mukkamala
gogo6
 
Realize True Business Value With ThousandEyes
Realize True Business Value With ThousandEyesRealize True Business Value With ThousandEyes
Realize True Business Value With ThousandEyes
ThousandEyes
 
A New Day for Oracle Analytics
A New Day for Oracle AnalyticsA New Day for Oracle Analytics
A New Day for Oracle Analytics
Rich Clayton
 
はじめてのOracle Cloud Infrastructure(Oracle Cloudウェビナーシリーズ: 2020年6月24日)
はじめてのOracle Cloud Infrastructure(Oracle Cloudウェビナーシリーズ: 2020年6月24日)はじめてのOracle Cloud Infrastructure(Oracle Cloudウェビナーシリーズ: 2020年6月24日)
はじめてのOracle Cloud Infrastructure(Oracle Cloudウェビナーシリーズ: 2020年6月24日)
オラクルエンジニア通信
 
Introduction to Event-Driven Architecture
Introduction to Event-Driven Architecture Introduction to Event-Driven Architecture
Introduction to Event-Driven Architecture
Solace
 
Should healthcare abandon the cloud final
Should healthcare abandon the cloud finalShould healthcare abandon the cloud final
Should healthcare abandon the cloud final
sapenov
 
Laboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nubeLaboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nube
Software Guru
 
Delivering New Visibility and Analytics for IT Operations
Delivering New Visibility and Analytics for IT OperationsDelivering New Visibility and Analytics for IT Operations
Delivering New Visibility and Analytics for IT Operations
Gabrielle Knowles
 
Ad

More from DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
DataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
DataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
DataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
DataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
DataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
DataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
DataStax Academy
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
DataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
DataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
DataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
DataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
DataStax Academy
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
DataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax Academy
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
DataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
DataStax Academy
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
DataStax Academy
 
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
DataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
DataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
DataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
DataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
DataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
DataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
DataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
DataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
DataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
DataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
DataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
DataStax Academy
 

Recently uploaded (20)

Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 

Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassandra and Apache Spark

  • 1. © Copyright 2015 Glassbeam Inc. Ad Hoc Analytics on Internet of Complex Things with Spark and Cassandra Mohammed Guller September 2015
  • 2. Let’s Take a Quick Poll
      – Familiar with IoT
      – Data modelling experience in C*
      – Familiar with Spark
      – Hands-on experience with Spark
  • 3. About Me
      – Principal Architect at Glassbeam
      – Author of an upcoming book, “Big Data Analytics with Spark”
      – Founded two startups
      – Passionate about building new products, big data analytics, and machine learning
      – Berkeley graduate
      LinkedIn: www.linkedin.com/in/mohammedguller
      Twitter: @MohammedGuller
  • 4. Internet of Things (IoT): a network of objects embedded with software for collecting and exchanging data over the Internet
  • 5. Internet of Complex Things (IoCT)
      – Data center devices: servers, storage, controllers
      – Medical devices: X-ray, MRI scan, CT scan
      – Manufacturing systems
      – Cars
      – Electric vehicle chargers
      – Other complex devices
      Glassbeam's target market is focused on driving operational and business analytics value for connected product companies in the Industrial IoT market: IT & Networks, Medical & Health Care, EV Chargers & Smart Grid.
  • 6. Glassbeam: Advanced and Predictive Analytics for Connected Product Companies
      Target markets: IT & Networks; Medical & Healthcare; EV Chargers & Smart Grid; Industrial & Manufacturing; Transportation
  • 7. Analytics on Operational Data: Operational Data to Powerful Insights
  • 8. High-level Architecture: an end-to-end cloud-based architecture built on modern technologies to handle any machine, any data, any cloud
      – Pipeline: Data Ingestion → Data Transformation → Data Stores → Middleware → Applications
      – Input: logs (streams/docs)
      – Processing components: SPL Library, SCALAR, Info Server
      – Data stores: Amazon S3 (raw logs), Cassandra (processed data), Solr Cloud (index)
      – Analytics and machine learning: Spark SQL, Spark Streaming, MLlib; Event Processing & Rules Engine
      – Applications: LogVault, Explorer, Workbench, Standard Apps, Custom Apps, Rules & Alerts, DirectAccess, Glassbeam Studio
      – Cloud Enablement & Automation
      * SPL (Semiotic Parsing Language) and SCALAR are patent-pending technology inventions of Glassbeam.
  • 9. Key Properties of IoCT Data
      – Volume: terabytes of data
      – Variety: multi-structured data
      – Velocity: fast-paced batch data and streaming data
  • 10. Why We Chose C*
      – Volume: economically scale from gigabytes to terabytes of data (linear scalability)
      – Variety: store multi-structured data (dynamic schema)
      – Velocity: fast ingest of new data and quick reload of old data (fast writes)
  • 11. Modeling Data in C*
      – Different from modeling data in an RDBMS
      – Queries drive table and primary key definitions
          – The primary key definition limits the kinds of queries you can run
          – C* does not support joins
  • 12. A Simple Table for Storing Event Data in C*
      CREATE TABLE event (
        sys_id text,
        dt timestamp,
        ts timestamp,
        severity text,
        module text,
        message text,
        PRIMARY KEY ((sys_id, dt), ts)
      ) WITH CLUSTERING ORDER BY (ts DESC);
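The composite partition key `(sys_id, dt)` groups one system's events for one time bucket together, and `ts DESC` keeps the newest rows first. The plain-Scala sketch below models that layout in memory, assuming `dt` holds the UTC calendar day of `ts`; the `EventPartitions` object and its methods are hypothetical illustrations, not Cassandra or driver APIs.

```scala
import java.time.{Instant, LocalDate, ZoneOffset}

// Hypothetical in-memory model of the `event` table from slide 12.
// Rows are grouped by the composite partition key (sys_id, dt) and kept in
// descending ts order inside each partition, like CLUSTERING ORDER BY (ts DESC).
object EventPartitions {
  final case class Event(sysId: String, ts: Instant, severity: String, module: String, message: String)

  type PartitionKey = (String, LocalDate)

  // Assumption: dt is the UTC calendar day of ts, acting as a daily bucket.
  def dayOf(ts: Instant): LocalDate = ts.atZone(ZoneOffset.UTC).toLocalDate

  def partition(events: Seq[Event]): Map[PartitionKey, Vector[Event]] =
    events
      .groupBy(e => (e.sysId, dayOf(e.ts)))
      .map { case (key, rows) => key -> rows.sortWith((a, b) => a.ts.isAfter(b.ts)).toVector }

  // A cheap read names the whole partition key; filtering on anything else
  // (severity, module) would mean scanning every partition.
  def latest(table: Map[PartitionKey, Vector[Event]], sysId: String, day: LocalDate, n: Int): Vector[Event] =
    table.getOrElse((sysId, day), Vector.empty).take(n)

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event("s1", Instant.parse("2015-09-01T10:00:00Z"), "INFO", "m1", "boot"),
      Event("s1", Instant.parse("2015-09-01T12:00:00Z"), "ERROR", "m2", "disk failure"),
      Event("s1", Instant.parse("2015-09-02T08:00:00Z"), "WARN", "m1", "high temperature"),
      Event("s2", Instant.parse("2015-09-01T09:00:00Z"), "INFO", "m1", "boot")
    )
    val table = partition(events)
    println(table.size) // 3
    latest(table, "s1", LocalDate.parse("2015-09-01"), 1).foreach(e => println(e.message)) // disk failure
  }
}
```

Bucketing by day bounds partition size for chatty systems; the trade-off is that a query spanning several days must visit several partitions.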
  • 13. Another Table to Filter Events by Severity
      CREATE TABLE event_by_severity (
        sys_id text,
        dt timestamp,
        ts timestamp,
        severity text,
        module text,
        message text,
        PRIMARY KEY ((sys_id, dt), severity, ts)
      ) WITH CLUSTERING ORDER BY (severity ASC, ts DESC);
  • 14. Yet Another Table to Filter Events by Module
      CREATE TABLE event_by_module (
        sys_id text,
        dt timestamp,
        ts timestamp,
        severity text,
        module text,
        message text,
        PRIMARY KEY ((sys_id, dt), module, ts)
      ) WITH CLUSTERING ORDER BY (module ASC, ts DESC);
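Slides 12 to 14 store the same event three times, once per query pattern, so every ingest becomes several writes. A hedged plain-Scala sketch of that fan-out follows; mutable maps stand in for the three tables, each map key is the partition key plus the clustering column a query filters on, and the `QueryTables.Store` class and its methods are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical sketch of the write-time fan-out implied by slides 12 to 14.
// Each map stands in for one table: event, event_by_severity, event_by_module.
object QueryTables {
  final case class Event(sysId: String, dt: String, ts: Long, severity: String, module: String, message: String)

  final class Store {
    val event = mutable.Map.empty[(String, String), List[Event]]
    val bySeverity = mutable.Map.empty[(String, String, String), List[Event]]
    val byModule = mutable.Map.empty[(String, String, String), List[Event]]

    // One logical write becomes three physical writes.
    def write(e: Event): Unit = {
      add(event, (e.sysId, e.dt), e)
      add(bySeverity, (e.sysId, e.dt, e.severity), e)
      add(byModule, (e.sysId, e.dt, e.module), e)
    }

    // Keep each row list newest-first, mirroring ts DESC clustering.
    private def add[K](table: mutable.Map[K, List[Event]], key: K, e: Event): Unit =
      table(key) = (e :: table.getOrElse(key, Nil)).sortBy(r => -r.ts)

    // Each read hits exactly one table with a fully specified key.
    def errors(sysId: String, dt: String): List[Event] =
      bySeverity.getOrElse((sysId, dt, "ERROR"), Nil)

    def forModule(sysId: String, dt: String, module: String): List[Event] =
      byModule.getOrElse((sysId, dt, module), Nil)
  }

  def main(args: Array[String]): Unit = {
    val store = new Store
    store.write(Event("s1", "2015-09-01", 1L, "INFO", "m1", "boot"))
    store.write(Event("s1", "2015-09-01", 2L, "ERROR", "m2", "disk failure"))
    store.write(Event("s1", "2015-09-01", 3L, "ERROR", "m1", "fan failure"))
    println(store.errors("s1", "2015-09-01").map(_.message)) // List(fan failure, disk failure)
  }
}
```

Reads stay cheap because each one names a complete key, but storage and write volume grow with every new query pattern, which is the cost slide 16 highlights.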
  • 15. Ad Hoc Analytics with C*
      – An oxymoron: all queries must be known upfront
  • 16. Workaround Possible but Intractable
      Columns: Sys_id, Model, Age, OS, City, State, Country
      Query tables: sys_by_model, sys_by_os, sys_by_age, sys_by_state, sys_by_state_age, sys_by_age_state, sys_by_model_age, sys_by_age_model, sys_by_age_model_state, sys_by_model_state_age, sys_by_model_state_os
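A back-of-the-envelope count shows why this workaround does not scale. Assume each of the six non-key columns (Model, Age, OS, City, State, Country) may be filtered on, and that column order within a table matters, as the pairs sys_by_state_age and sys_by_age_state suggest. The number of tables is then the sum over k = 1..n of n!/(n-k)!; if order did not matter it would be 2^n - 1. The `TableExplosion` object is a hypothetical helper.

```scala
// Hypothetical helper: counts the query-specific tables needed to cover
// every combination of filter columns under two assumptions.
object TableExplosion {
  // Ordered selections of k columns out of n: n! / (n - k)!
  def permutations(n: Int, k: Int): BigInt =
    (n - k + 1 to n).map(BigInt(_)).product

  // Column order matters (sys_by_state_age differs from sys_by_age_state).
  def orderedTables(n: Int): BigInt = (1 to n).map(k => permutations(n, k)).sum

  // Only the set of filtered columns matters: every non-empty subset.
  def unorderedTables(n: Int): BigInt = BigInt(2).pow(n) - 1

  def main(args: Array[String]): Unit = {
    val filterColumns = Seq("model", "age", "os", "city", "state", "country")
    val n = filterColumns.size
    println(s"ordered: ${orderedTables(n)}, unordered: ${unorderedTables(n)}") // ordered: 1956, unordered: 63
  }
}
```

Even the optimistic unordered count means 63 copies of the data, each written on every ingest.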
  • 17. Other Barriers to Ad Hoc Queries
      – No aggregation
      – No GROUP BY
      – No joins
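Without joins, GROUP BY, or aggregation in the query language, an application that needs, say, error counts per device model must fetch both tables and do the work itself. A hedged plain-Scala sketch of that client-side hash join and aggregation follows. The `SystemInfo` and `Event` shapes and the `ClientSideJoin` object are hypothetical, and `sys_id` is assumed to be unique among systems.

```scala
// Hypothetical sketch: the application joins events to system metadata on
// sys_id with an in-memory hash join, then counts errors per model.
object ClientSideJoin {
  final case class SystemInfo(sysId: String, model: String, state: String)
  final case class Event(sysId: String, severity: String, message: String)

  // Assumes sys_id is unique in systems, so it can be indexed as a Map.
  def join(events: Seq[Event], systems: Seq[SystemInfo]): Seq[(Event, SystemInfo)] = {
    val bySysId: Map[String, SystemInfo] = systems.map(s => s.sysId -> s).toMap
    // Events whose sys_id has no matching system are dropped (inner join).
    events.flatMap(e => bySysId.get(e.sysId).map(s => (e, s)))
  }

  // Client-side GROUP BY model with a count over ERROR events only.
  def errorsByModel(joined: Seq[(Event, SystemInfo)]): Map[String, Int] =
    joined
      .collect { case (e, s) if e.severity == "ERROR" => s.model }
      .groupBy(identity)
      .map { case (model, hits) => model -> hits.size }

  def main(args: Array[String]): Unit = {
    val systems = Seq(SystemInfo("s1", "X100", "CA"), SystemInfo("s2", "X200", "NY"))
    val events = Seq(
      Event("s1", "ERROR", "disk failure"),
      Event("s1", "ERROR", "fan failure"),
      Event("s2", "INFO", "boot"),
      Event("s3", "ERROR", "unknown system")
    )
    println(errorsByModel(join(events, systems))) // Map(X100 -> 2)
  }
}
```

This works for small inputs, but both sides must fit in one process's memory, which is exactly the limitation a distributed engine removes.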
  • 18. What Do I Do Now?
  • 20. Spark
      – Fast, general-purpose cluster computing framework for processing large datasets
      – APIs in Scala, Java, Python, SQL, and R
  • 21. Integrated Libraries for a Variety of Tasks: Spark Core, Spark SQL, GraphX, Spark Streaming, MLlib & Spark ML
  • 22. One Minor Problem!
      – Spark does not have built-in support for C*
      – It has built-in support for HDFS, S3, and JDBC-compliant databases
  • 23. Spark Cassandra Connector
      – Open source library for integrating Spark with C*
      – Enables a Spark application to process data in C* just like data from the built-in data sources
  • 24. Spark with C*
      – Enables ad hoc analytics
      – CQL limitations no longer apply
      – Query data using SQL/HiveQL
          – Filter on any column
          – Aggregations
          – GROUP BY
  • 25. Ad Hoc Analytics in the Spark Shell
  • 26. Launch the Spark Shell
      /path/to/spark/bin/spark-shell --master spark://host:7077 \
        --packages com.datastax.spark:spark-cassandra-connector_2.10:1.4.0
  • 27. Create a DataFrame
      val events = sqlContext.read
        .format("org.apache.spark.sql.cassandra")
        .options(Map("keyspace" -> "test", "table" -> "event"))
        .load()
  • 28. Fire Queries
      // Keep the table in memory across the queries below
      events.cache()
      events.select("ts", "module", "message").where($"severity" === "ERROR").show
      events.select("ts", "severity", "message").where($"module" === "m1").show
      events.select("ts", "message").where($"severity" === "ERROR" && $"module" === "m1").show
      // groupBy(...).count() returns a new DataFrame; .show prints its rows
      events.groupBy("severity").count().show
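To make explicit what each DataFrame call computes, here is a hedged plain-Scala equivalent over an in-memory `Seq`, with no Spark involved. The `QuerySemantics` object, its `Event` case class, and the function names are hypothetical. It also covers the two-column GROUP BY run later from Beeline. Spark evaluates the same logic in parallel across partitions, and because none of these filters match a C* partition key, it may have to scan the whole table.

```scala
// Hypothetical plain-Scala equivalents of the spark-shell queries on slide 28.
object QuerySemantics {
  final case class Event(ts: Long, severity: String, module: String, message: String)

  // select("ts", "module", "message").where($"severity" === "ERROR")
  def errors(events: Seq[Event]): Seq[(Long, String, String)] =
    events.filter(_.severity == "ERROR").map(e => (e.ts, e.module, e.message))

  // select("ts", "message").where($"severity" === "ERROR" && $"module" === module)
  def errorsIn(events: Seq[Event], module: String): Seq[(Long, String)] =
    events.filter(e => e.severity == "ERROR" && e.module == module).map(e => (e.ts, e.message))

  // groupBy("severity").count()
  def countBySeverity(events: Seq[Event]): Map[String, Long] =
    events.groupBy(_.severity).map { case (severity, rows) => severity -> rows.size.toLong }

  // SELECT module, severity, count(1) FROM event GROUP BY module, severity
  def countByModuleSeverity(events: Seq[Event]): Map[(String, String), Long] =
    events.groupBy(e => (e.module, e.severity)).map { case (key, rows) => key -> rows.size.toLong }
}
```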
  • 29. Spark SQL JDBC/ODBC Server
      – Analyze data in C* with just SQL/HiveQL
      – Command-line shell: Beeline
      – Graphical SQL client: SQuirreL SQL
      – Data visualization applications: Tableau, Zoomdata, Qlik
  • 30. Ad Hoc Analytics with the Spark SQL JDBC/ODBC Server
  • 31. Start the Spark SQL JDBC Server
      /path/to/spark/sbin/start-thriftserver.sh --master spark://hostname:7077 \
        --packages com.datastax.spark:spark-cassandra-connector_2.10:1.4.0
  • 32. Launch Beeline from a Terminal
      /path/to/spark/bin/beeline
  • 33. Connect to the Spark SQL JDBC Server
      beeline> !connect jdbc:hive2://localhost:10000
  • 34. Create a Temporary Table
      CREATE TEMPORARY TABLE event
      USING org.apache.spark.sql.cassandra
      OPTIONS (
        keyspace "test",
        table "event"
      );
  • 35. Query Data with SQL/HiveQL
      CACHE TABLE event;
      SELECT severity, count(1) AS total FROM event GROUP BY severity;
      SELECT module, severity, count(1) FROM event GROUP BY module, severity;
  • 36. Caveats
      – Latency
      – A Spark query may require an expensive table scan
          – Reads every row
          – Disk I/O is slow
  • 37. Reduce the Impact of Slow Disk I/O
      – Cache tables
      – Replace HDDs with SSDs
      – Add more nodes
  • 38. Recommendations
      – Known queries requiring sub-second response time
          – Query C* directly
          – Create query-specific tables
          – Pre-aggregate data
      – Ad hoc queries: Spark
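"Pre-aggregate data" means doing the aggregation at ingest time so that a known query reads one precomputed value instead of scanning rows. A hedged plain-Scala sketch follows, keeping a count per (sys_id, dt, severity). The `PreAggregation` object and `SeverityCounts` class are hypothetical; in C* a table with a counter column is one possible backing store, but that is an assumption rather than the design shown in the talk.

```scala
import scala.collection.mutable

// Hypothetical sketch of "Pre-aggregate Data": update a count keyed by
// (sys_id, dt, severity) as each event is ingested, so a known query such as
// "errors for system X on day D" is a single key lookup rather than a scan.
object PreAggregation {
  final case class Event(sysId: String, dt: String, severity: String)

  final class SeverityCounts {
    // Missing keys read as 0, so the first increment needs no special case.
    private val counts =
      mutable.Map.empty[(String, String, String), Long].withDefaultValue(0L)

    def record(e: Event): Unit =
      counts((e.sysId, e.dt, e.severity)) += 1L

    def get(sysId: String, dt: String, severity: String): Long =
      counts((sysId, dt, severity))
  }

  def main(args: Array[String]): Unit = {
    val agg = new SeverityCounts
    Seq(
      Event("s1", "2015-09-01", "ERROR"),
      Event("s1", "2015-09-01", "ERROR"),
      Event("s1", "2015-09-01", "INFO")
    ).foreach(agg.record)
    println(agg.get("s1", "2015-09-01", "ERROR")) // 2
    println(agg.get("s1", "2015-09-02", "ERROR")) // 0
  }
}
```

The read cost no longer depends on how many events a system logged, but each new aggregate must be planned and maintained in advance, which is why the slide reserves Spark for queries that were not anticipated.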