Stratio Meta 
An efficient distributed datahub with batch and 
streaming query capabilities 
Daniel Higuero 
Alvaro Agea 
dhiguero@stratio.com 
alvaro@stratio.com 
#CassandraSummit 2014
Stratio Crossdata 
An efficient distributed datahub with batch and 
streaming query capabilities 
Daniel Higuero 
Alvaro Agea 
dhiguero@stratio.com 
alvaro@stratio.com 
Who are we? 
STRATIO 
• Stratio is a Big Data company
• Founded in 2013
• Commercially launched in 2014
• 50+ employees in Madrid
• Office in San Francisco
• Certified Spark distribution
We love… 
Cassandra 
• P2P architecture
• Read/write performance
• Fault tolerance
• Easy to deploy
• CQL
Agenda
• Introduction
• Crossdata architecture
• Metadata management
• Streaming sources
• Full text search
• Spark and Crossdata
• ODBC
• The future
Introduction 
o Big Data analysis is commonly associated with batch processing
• Users aiming to combine batch and stream processing have to rely on tailor-made architectures
o Users buy Big Data platforms, but
• How do I start?
• What is my entry point to the platform?
What do our clients demand?
o Easy deployment
o Easy administration
o Read/write performance
o Easy-to-learn query language
o Integration with BI tools
o Join operations
o Support for streaming sources
o Integration with other data stores
o Ability to query data without thinking about the schema (non-indexed data)
What do our clients demand?
✓ Easy deployment
✓ Easy administration
✓ Read/write performance
✓ Easy-to-learn query language
o Integration with BI tools
o Join operations
o Support for streaming sources
o Integration with other data stores
o Ability to query data without thinking about the schema (non-indexed data)
What do our clients demand?
✓ Easy deployment
✓ Easy administration
✓ Read/write performance
✓ Easy-to-learn query language
✓ Integration with BI tools
✓ Join operations
✓ Support for streaming sources
✓ Integration with other data stores
✓ Ability to query data without thinking about the schema (non-indexed data)
Crossdata 
o A new technology that:
• Is not limited by the underlying datastore capabilities
• Leverages Spark to perform non-natively supported operations
• Supports batch and streaming queries
• Supports multiple clusters and technologies
Our architecture 
Connecting to the outside world 
o Crossdata defines an IConnector extension interface
o Users can easily add new connectors to support
• Different datastores
• Different processing engines
• Different versions
o Each connector declares its capabilities
Our planner will choose the best connector for each query
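The connector-selection idea can be sketched as follows. This is a minimal illustration, not Crossdata's actual `IConnector` API: the `Connector` class, the capability names, the per-connector `cost` and the rule "cheapest connector that supports every operation" are all hypothetical assumptions chosen to make the concept concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    """Hypothetical stand-in for a connector plugged in through an IConnector-like interface."""
    name: str
    datastore: str
    capabilities: frozenset  # operations this connector can execute
    cost: int                # hypothetical relative cost; lower is preferred


def choose_connector(connectors, datastore, required_ops):
    """Return the cheapest connector for `datastore` that supports every required operation."""
    candidates = [
        c for c in connectors
        if c.datastore == datastore and set(required_ops) <= c.capabilities
    ]
    if not candidates:
        raise LookupError(f"no connector for {datastore} supports {sorted(required_ops)}")
    return min(candidates, key=lambda c: c.cost)


connectors = [
    # A native-driver connector: cheap, but limited to simple operations.
    Connector("cassandra-native", "C*", frozenset({"PROJECT", "FILTER_PK"}), cost=1),
    # A Spark-based connector: more expensive, but supports GROUP BY and joins.
    Connector("cassandra-spark", "C*",
              frozenset({"PROJECT", "FILTER_PK", "GROUP_BY", "INNER_JOIN"}), cost=5),
]

print(choose_connector(connectors, "C*", {"PROJECT"}).name)              # cassandra-native
print(choose_connector(connectors, "C*", {"PROJECT", "GROUP_BY"}).name)  # cassandra-spark
```

Declaring capabilities as data keeps the planner generic: adding a new datastore, engine or version means registering one more connector rather than changing the planner.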
Query execution 
[Figure: Parsing → Validation → Planning → Execution; the planned query is executed through one of several connectors (Connector1, Connector2, Connector3) in front of C*]
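The four stages can be illustrated with a toy pipeline. This is a sketch under strong assumptions: it only understands `SELECT <cols> FROM <catalog>.<table>`, and `CATALOG`, `DATA`, the step names and the helper functions are hypothetical; Crossdata's real parser, validator and planner are far richer.

```python
import re

# Hypothetical metadata used by the validator: table -> column names.
CATALOG = {"app.users": ["name", "email"]}
# Hypothetical in-memory data standing in for the datastore behind a connector.
DATA = {"app.users": [{"name": "ana", "email": "ana@example.com"}]}


def parse(query):
    """Parsing: accept only 'SELECT <cols> FROM <catalog>.<table>' (illustrative subset)."""
    m = re.fullmatch(r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+\.\w+)\s*;?\s*", query, re.IGNORECASE)
    if m is None:
        raise ValueError("unsupported statement")
    return {"columns": [c.strip() for c in m.group(1).split(",")], "table": m.group(2)}


def validate(stmt):
    """Validation: reject statements that reference unknown tables or columns."""
    if stmt["table"] not in CATALOG:
        raise ValueError(f"unknown table {stmt['table']}")
    columns = list(CATALOG[stmt["table"]]) if stmt["columns"] == ["*"] else stmt["columns"]
    unknown = set(columns) - set(CATALOG[stmt["table"]])
    if unknown:
        raise ValueError(f"unknown columns {sorted(unknown)}")
    return {"columns": columns, "table": stmt["table"]}


def plan(stmt):
    """Planning: turn the validated statement into an ordered list of steps."""
    return [("scan", stmt["table"]), ("project", stmt["columns"])]


def execute(steps):
    """Execution: run the steps (here against in-memory data instead of a connector)."""
    rows = []
    for op, arg in steps:
        if op == "scan":
            rows = list(DATA[arg])
        elif op == "project":
            rows = [{c: r[c] for c in arg} for r in rows]
    return rows


def run(query):
    return execute(plan(validate(parse(query))))


print(run("SELECT email FROM app.users;"))  # [{'email': 'ana@example.com'}]
```

Validating against stored metadata before planning means errors such as an unknown column are reported without touching any datastore.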
Multi-cluster support 
o Stratio Crossdata offers the possibility of accessing a single catalog across a set of datastores
• Multiple clusters can coexist to optimize platform performance
- E.g., production cluster, test cluster, write-optimized cluster, read-optimized cluster, etc.
• Each table is stored in a single datastore
Logical and physical mapping 
SELECT * FROM app.users;
[Figure: the logical App catalog (users, Test and old_users tables) mapped onto physical datastores: C* production, C* development and other datastores]
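A minimal sketch of resolving logical table names to physical clusters, assuming a simple lookup table. The figure does not survive, so which table lives on which cluster below (and the name `app.test`) is a hypothetical assignment; the only property taken from the slides is that each table is stored in exactly one datastore.

```python
# Hypothetical catalog: logical table -> the single cluster that stores it.
TABLE_LOCATION = {
    "app.users": "C* production",
    "app.test": "C* development",
    "app.old_users": "Other datastores",
}


def resolve(table):
    """Map a logical table name used in a query to the one cluster holding it."""
    try:
        return TABLE_LOCATION[table]
    except KeyError:
        raise LookupError(f"table {table!r} is not registered in the catalog") from None


def clusters_for(tables):
    """Return the set of clusters a multi-table query needs to reach."""
    return {resolve(t) for t in tables}


print(resolve("app.users"))  # C* production
```

Because the mapping has exactly one entry per table, a query written against the catalog never needs to name a cluster; moving a table only changes this mapping.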
Metadata Management
Metadata in the era of schemaless NoSQL datastores
o Some datastores are schemaless, but our applications are not!
• Flexible schemas vs. schemaless
• Crossdata provides a Metadata Manager that stores schemas for any datasource
- Remember ODBC and those BI tools
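To illustrate why stored schemas matter for schemaless sources, here is a minimal schema-registry sketch. The functions, the table `app.events` and the use of Python types as column types are hypothetical; it only shows the idea of keeping an application-level schema that tools such as ODBC clients can describe and that rows can be checked against.

```python
# Hypothetical schema registry: application-level schemas for tables whose
# underlying datastore does not enforce one.
SCHEMAS = {}


def register_schema(table, columns):
    """Store the column -> Python type mapping for a table."""
    SCHEMAS[table] = dict(columns)


def describe(table):
    """Return (column, type name) pairs, as a BI/ODBC client would need them."""
    return [(col, typ.__name__) for col, typ in SCHEMAS[table].items()]


def conforms(table, row):
    """True if the row has exactly the registered columns with the registered types."""
    schema = SCHEMAS[table]
    return row.keys() == schema.keys() and all(
        isinstance(row[col], typ) for col, typ in schema.items()
    )


register_schema("app.events", {"user": str, "clicks": int})
print(describe("app.events"))                                 # [('user', 'str'), ('clicks', 'int')]
print(conforms("app.events", {"user": "ana", "clicks": 3}))    # True
print(conforms("app.events", {"user": "ana", "clicks": "3"}))  # False
```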
Metadata management 
[Figure: Metadata Manager, Metadata Store (Infinispan), Connector and C* production]
• Updated metadata information is maintained among Crossdata servers using Infinispan
• If the connector does not support metadata operations, those operations are skipped
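The "skip if unsupported" rule can be sketched as below. This is a hypothetical illustration, not Crossdata code: the class names, the `supports_metadata` flag and the return values are assumptions, a plain dict stands in for the Infinispan-backed store, and it assumes the metadata is still recorded centrally even when the connector skips the operation.

```python
class CassandraConnector:
    """Hypothetical connector that can execute metadata (DDL) operations."""
    supports_metadata = True

    def __init__(self):
        self.tables = {}

    def create_table(self, table, columns):
        self.tables[table] = dict(columns)


class NoMetadataConnector:
    """Hypothetical connector with no support for metadata operations."""
    supports_metadata = False


class MetadataManager:
    """Records metadata centrally and forwards it to connectors that support it."""

    def __init__(self, store):
        # `store` stands in for the metadata store shared by Crossdata servers
        # (Infinispan in the slides); a plain dict is used here.
        self.store = store

    def create_table(self, table, columns, connector):
        self.store[table] = dict(columns)           # always recorded in the metadata store
        if getattr(connector, "supports_metadata", False):
            connector.create_table(table, columns)  # executed on the datastore
            return "created"
        return "skipped"                            # datastore-side operation skipped


store = {}
manager = MetadataManager(store)
cassandra = CassandraConnector()
print(manager.create_table("app.users", {"name": "text"}, cassandra))            # created
print(manager.create_table("app.logs", {"line": "text"}, NoMetadataConnector()))  # skipped
```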
Streaming sources 
Managing streaming sources 
o Nowadays, use cases typically expect some type of streaming data source
• Streaming data is ephemeral by nature
• In Stratio Crossdata, we defined the ephemeral table abstraction to work with streaming sources as if they were classical RDBMS tables
[Figure: a streaming source with schema {schema:{col1:…},…} exposed as an ephemeral table (col1:text, col2:int, col3:int, col4:text) that feeds Streaming_query0 … Streaming_queryn]
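A minimal sketch of the ephemeral table idea, assuming a bounded in-memory buffer. The column types come from the slide's figure; the `EphemeralTable` class, the row-count retention limit and the type-coercion step are hypothetical choices meant only to show a stream being exposed as a typed, queryable table whose contents expire.

```python
from collections import deque

# Column types taken from the slide's figure: col1:text, col2:int, col3:int, col4:text.
SCHEMA = {"col1": str, "col2": int, "col3": int, "col4": str}


class EphemeralTable:
    """Hypothetical sketch of a typed, table-like view over a streaming source."""

    def __init__(self, schema, max_rows=1000):
        self.schema = schema
        # Streaming data is ephemeral: only the most recent rows are kept;
        # appending beyond max_rows silently drops the oldest row.
        self.rows = deque(maxlen=max_rows)

    def ingest(self, event):
        """Coerce an incoming event (e.g. a parsed JSON message) into a typed row."""
        row = {col: typ(event[col]) for col, typ in self.schema.items()}
        self.rows.append(row)
        return row

    def select(self, predicate):
        """Query the currently retained rows like a regular table."""
        return [r for r in self.rows if predicate(r)]


table = EphemeralTable(SCHEMA, max_rows=2)
for i, name in enumerate(["a", "b", "c"]):
    table.ingest({"col1": name, "col2": str(i), "col3": i * 10, "col4": "x"})
print(table.select(lambda r: r["col2"] >= 1))
```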
Streaming queries 
o Streaming queries are infinite by definition
• A time window is defined to create a batch-like view of the rows ingested by the system in that period
• The user launches queries specifying a processing time window
- Crossdata provides methods to list and stop running streaming queries
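The batch-like view per time window can be sketched with tumbling (fixed, non-overlapping) windows. This is an illustration under assumptions: timestamps are integer arrival times in seconds (processing time), and `tumbling_windows` is a hypothetical helper, not part of Crossdata.

```python
from collections import defaultdict


def tumbling_windows(events, size_s):
    """Group (arrival_timestamp, row) pairs into non-overlapping windows of size_s seconds.

    Each window is a finite, batch-like set of rows that a query can process
    like a table; the key is the window's start time.
    """
    windows = defaultdict(list)
    for ts, row in events:
        start = ts - (ts % size_s)  # start of the window this event falls into
        windows[start].append(row)
    return dict(sorted(windows.items()))


events = [(0, "a"), (120, "b"), (299, "c"), (300, "d"), (601, "e")]
print(tumbling_windows(events, 300))  # {0: ['a', 'b', 'c'], 300: ['d'], 600: ['e']}
```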
Streaming queries: window syntax
SELECT fieldGroup, avg(field2)
FROM eph_table
WITH WINDOW 5 minutes
WHERE field1 = 100 AND field2 > 100
GROUP BY fieldGroup;
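A streaming query like the one above produces one result per window. The sketch below shows what is computed for the rows of a single 5-minute window, assuming rows arrive as Python dicts; `windowed_avg` is a hypothetical helper written to mirror the WHERE, GROUP BY and avg clauses, not Crossdata's execution engine.

```python
from collections import defaultdict


def windowed_avg(window_rows):
    """For one window's rows, compute the equivalent of:
    SELECT fieldGroup, avg(field2) ... WHERE field1 = 100 AND field2 > 100 GROUP BY fieldGroup
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in window_rows:
        if r["field1"] == 100 and r["field2"] > 100:  # WHERE clause
            sums[r["fieldGroup"]] += r["field2"]       # GROUP BY fieldGroup
            counts[r["fieldGroup"]] += 1
    return {g: sums[g] / counts[g] for g in counts}    # avg(field2) per group


rows = [
    {"fieldGroup": "a", "field1": 100, "field2": 200},
    {"fieldGroup": "a", "field1": 100, "field2": 300},
    {"fieldGroup": "b", "field1": 100, "field2": 50},   # filtered out: field2 <= 100
    {"fieldGroup": "b", "field1": 7, "field2": 500},    # filtered out: field1 != 100
]
print(windowed_avg(rows))  # {'a': 250.0}
```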
Joining batch and streaming 
SELECT * FROM demo.temporal
WITH WINDOW 10 secs
INNER JOIN demo.users
ON users.name = temporal.name;
[Figure: the query is split into a streaming part (SELECT * FROM demo.temporal WITH WINDOW 10 secs) and a batch part (SELECT * FROM demo.users), combined by INNER JOIN ON users.name = temporal.name]
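The join of one streaming window with a batch table can be sketched as a hash join. The choice of a hash join, the column-prefixing scheme and the sample rows are assumptions for illustration; the slides do not state which join strategy Crossdata uses.

```python
def join_window_with_table(window_rows, table_rows, key="name"):
    """INNER JOIN of one window of streaming rows with a batch table on `key` (hash join)."""
    index = {}
    for u in table_rows:  # build side: the batch table (demo.users)
        index.setdefault(u[key], []).append(u)
    joined = []
    for t in window_rows:  # probe side: rows of the current 10-second window
        for u in index.get(t[key], []):
            joined.append({
                **{f"users.{k}": v for k, v in u.items()},
                **{f"temporal.{k}": v for k, v in t.items()},
            })
    return joined


users = [{"name": "ana", "age": 30}, {"name": "bob", "age": 40}]
window = [{"name": "ana", "clicks": 3}, {"name": "eve", "clicks": 1}]
print(join_window_with_table(window, users))
```

Building the hash table on the batch side means it can be reused for every new window, while only the small window is scanned each time.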
Full text search 
Full text search with Lucene
o Clients request the ability to perform full-text searches
o We have developed an integration between Lucene and Cassandra
o C* users can now enjoy all Lucene features:
• Full-text searches, range queries, fuzzy queries…
https://github.com/Stratio/stratio-cassandra
Stratio Lucene 2i 
[Figure: a ring of C* nodes, each with its own local Lucene index]
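The "local index per node" layout can be sketched with a tiny inverted index on each node and a scatter-gather search. Everything here is hypothetical: a whitespace tokenizer replaces Lucene's analysis, round-robin placement replaces Cassandra's token-based placement, and the coordinator simply merges the per-node results.

```python
from collections import defaultdict


class Node:
    """Hypothetical C* node holding only its local rows and a local inverted index."""

    def __init__(self):
        self.rows = {}
        self.index = defaultdict(set)  # term -> keys of local rows containing it

    def write(self, key, text):
        self.rows[key] = text
        for term in text.lower().split():
            self.index[term].add(key)  # the index is updated on the same node as the data

    def search(self, term):
        return {k: self.rows[k] for k in self.index.get(term.lower(), ())}


def cluster_search(nodes, term):
    """Coordinator side: send the query to every node and merge the local results."""
    results = {}
    for node in nodes:
        results.update(node.search(term))
    return results


nodes = [Node() for _ in range(3)]
docs = {"k1": "Stratio Crossdata", "k2": "Cassandra summit", "k3": "Crossdata and Cassandra"}
for i, (key, text) in enumerate(sorted(docs.items())):
    nodes[i % len(nodes)].write(key, text)  # stand-in for token-based placement

print(sorted(cluster_search(nodes, "crossdata")))  # ['k1', 'k3']
```

Keeping each index next to the data it covers means writes never cross the network just to update an index, at the cost of querying every node at read time.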
Full text search queries 
o With Crossdata, we simplify:
• The creation syntax
• The query syntax, using the MATCH operator
CREATE FULLTEXT INDEX ON app.users(name, email);
SELECT * FROM app.users
WHERE email MATCH '*@stratio.com';
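The wildcard semantics of the MATCH example can be approximated with Python's `fnmatch`. This is only an approximation chosen for illustration: like Lucene wildcard queries, `*` matches any sequence and `?` a single character, but `fnmatch` also treats `[...]` as a character class and performs no text analysis (tokenizing, lowercasing). `match` and the sample rows are hypothetical.

```python
from fnmatch import fnmatchcase


def match(rows, column, pattern):
    """Approximate `WHERE <column> MATCH '<pattern>'` for wildcard patterns."""
    return [r for r in rows if fnmatchcase(r[column], pattern)]


users = [
    {"name": "Daniel", "email": "dhiguero@stratio.com"},
    {"name": "Alvaro", "email": "alvaro@stratio.com"},
    {"name": "Guest", "email": "guest@example.org"},
]
print([u["name"] for u in match(users, "email", "*@stratio.com")])  # ['Daniel', 'Alvaro']
```

In the real integration, the Lucene index answers this query without scanning every row; the linear scan here only illustrates which rows qualify.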
Spark & Stratio Crossdata
Why Spark? 
o Stratio Crossdata uses Spark to perform non-natively supported operations
o Spark brings several benefits over Hadoop:
• In-memory processing
• RDD abstraction
• Simpler API
• Increased flexibility (e.g., no need for identity mapping)
What about Spark SQL? 
o Different approach to query execution
• We only use Spark when it speeds up queries
- Native drivers are faster for simple queries
- Spark SQL has limited RDD sources
• Avoid some Spark limitations
- Several batch and streaming contexts in a single JVM (SPARK-2243)
Query approach 
[Figure: SparkSQL approach: SparkSQL → Spark → Cassandra. Crossdata approach: Stratio Crossdata → Spark or native driver → Cassandra]
Our Cassandra-Spark integration 
o Project started in June 2013
- With the objective of providing a method to interact with Cassandra from Spark
- Initial approach based on the Hadoop InputFormat interface
- Current version uses the native DataStax Java driver
https://github.com/Stratio/stratio-deep
Our Cassandra-Spark integration 
o Benchmark in progress comparing our solution with the DataStax Spark driver
• Results are highly influenced by the split size
• Initial results are promising for the Stratio Spark integration using DataStax default values
- Group by: up to 40% faster
- Join: up to 17% faster
• Stay tuned for the benchmark publication!
Spark vs Lucene 2i 
[Chart: query time vs. records per node, comparing Spark and Lucene 2i]
ODBC 
Stratio Crossdata ODBC 
o Well-known interface standard (for BI tools, external apps, …)
o We have implemented it for Crossdata using the Simba SDK
o ODBC opens the full potential of Stratio Crossdata to the external world
o Currently tested with Tableau, QlikView and MS Excel
One ODBC for all datastores!
The future 
The future 
o Security
o Query optimizer and smart query planner
o Leverage system statistics
o Support for UDFs
o Become an Apache project
https://github.com/Stratio/stratio-meta
We are looking for an Apache Champion 
Can you help us?
A wish list for Cassandra 
o Ability to stop running queries
o Interactive users are unpredictable
o Some exception paths are not clear or defined (e.g., secondary indexes)
o Distribute some of the operations currently performed on the coordinator
• E.g., aggregations like count(*)
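The last wish-list item refers to partial aggregation: each node computes a local result and the coordinator only combines small partials. The sketch below illustrates that pattern for `count(*)`; it is not a description of how Cassandra works today, the helper names are hypothetical, and it assumes each token range is counted on exactly one replica so rows are not double-counted.

```python
def local_count(rows):
    """Runs on each node: count only the rows of the token ranges it is responsible for."""
    return sum(1 for _ in rows)


def distributed_count(partitions):
    """Coordinator: sum one small partial count per node instead of
    receiving and counting every row itself."""
    return sum(local_count(p) for p in partitions)


partitions = [range(1000), range(250), []]  # hypothetical per-node data
print(distributed_count(partitions))  # 1250
```

The same pattern extends to other aggregates whose partial results can be merged, e.g. avg by shipping (sum, count) pairs rather than per-node averages.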
DataStax Academy
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
DataStax Academy
 
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
DataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
DataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
DataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
DataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
DataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
DataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
DataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
DataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
DataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
DataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
DataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
DataStax Academy
 

Recently uploaded (20)

Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 

Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch and Streaming Query Capabilities

  • 1. Stratio Meta: An efficient distributed datahub with batch and streaming query capabilities. Daniel Higuero (dhiguero@stratio.com), Alvaro Agea (alvaro@stratio.com). #CassandraSummit-2014
  • 2. Stratio Crossdata: An efficient distributed datahub with batch and streaming query capabilities. Daniel Higuero (dhiguero@stratio.com), Alvaro Agea (alvaro@stratio.com).
  • 3. Who are we? STRATIO • Stratio is a Big Data company • Founded in 2013 • Commercially launched in 2014 • 50+ employees in Madrid • Office in San Francisco • Certified Spark distribution
  • 4. We love… Cassandra • P2P architecture • Read/write performance • Fault tolerance • Easy to deploy • CQL
  • 5. Agenda • Introduction • Crossdata architecture • Metadata management • Streaming sources • Full text search • Spark and Crossdata • ODBC • The future
  • 6. Introduction o Big Data analysis is commonly associated with batch processing • Users aiming to combine batch and stream processing have to rely on tailor-made architectures o Users buy Big Data platforms, but: • How do I start? • What is my entry point to the platform?
  • 7. What do our clients demand? o Easy deployment o Easy administration o Read/write performance o Easy-to-learn query language o Integration with BI tools o Join operations o Support for streaming sources o Integration with other data stores o Ability to query data without thinking about the schema (non-indexed data)
  • 8. What do our clients demand? ✓ Easy deployment ✓ Easy administration ✓ Read/write performance ✓ Easy-to-learn query language o Integration with BI tools o Join operations o Support for streaming sources o Integration with other data stores o Ability to query data without thinking about the schema (non-indexed data)
  • 9. What do our clients demand? ✓ Easy deployment ✓ Easy administration ✓ Read/write performance ✓ Easy-to-learn query language ✓ Integration with BI tools ✓ Join operations ✓ Support for streaming sources ✓ Integration with other data stores ✓ Ability to query data without thinking about the schema (non-indexed data)
  • 10. Crossdata o A new technology that: • Is not limited by the underlying datastore capabilities • Leverages Spark to perform non-natively supported operations • Supports batch and streaming queries • Supports multiple clusters and technologies
  • 12. Connecting to the outside world o Crossdata defines an IConnector extension interface o Users can easily add new connectors to support: • Different datastores • Different processing engines • Different versions o Each connector declares its capabilities. Our planner will choose the best connector for each query.
  • 13. Query execution. [Diagram: Parsing → Validation → Planning → Execution; components shown: C*, Connector1, Connector2, Connector3.] Our planner will choose the best connector for each query.
  • 14. Multi-cluster support o Stratio Crossdata offers the possibility of accessing a single catalog across a set of datastores. • Multiple clusters can coexist to optimize platform performance (e.g., production cluster, test cluster, write-optimized cluster, read-optimized cluster, etc.) • Each table is stored in exactly one datastore
  • 15. Logical and physical mapping. [Diagram: the query SELECT * FROM app.users; runs against the App catalog, whose tables (users, Test, old_users) map onto physical stores: C* production, C* development, and other datastores.]
  • 17. Metadata in the era of schemaless NoSQL datastores o Some datastores are schemaless, but our applications are not! • Flexible schemas vs. schemaless • Crossdata provides a metadata manager that stores schemas for any data source. Remember ODBC and those BI tools.
  • 18. Metadata management. [Diagram: Connector, C* production, Metadata Store, Infinispan, Metadata Manager; steps 1 and 2.] Updated metadata information is maintained among Crossdata servers using Infinispan. If the connector does not support metadata operations, those are skipped.
  • 20. Managing streaming sources o Nowadays, use cases expect some type of streaming data source • Streaming data has an ephemeral nature • In Stratio Crossdata we defined the ephemeral table abstraction to work with streaming sources as classical RDBMS tables. [Diagram: a streaming source with schema {schema:{col1:…},…} is exposed as a table with columns col1:text, col2:int, col3:int, col4:text, queried by Streaming_query0 … Streaming_queryn.]
  • 21. Streaming queries o Streaming queries are infinite by definition • A time window is defined to create a batch-like view of the rows ingested by the system in that period • The user launches queries specifying a processing time window. Crossdata provides methods to list and stop running streaming queries.
  • 22. Streaming queries: window syntax: SELECT fieldGroup, avg(Field2) FROM eph_table WITH WINDOW 5 minutes WHERE field1 = 100 AND field2 > 100 GROUP BY fieldGroup;
  • 23. Joining batch and streaming: SELECT * FROM demo.temporal WITH WINDOW 10 secs INNER JOIN demo.users ON users.name = temporal.name; [Diagram: the windowed stream (SELECT * FROM demo.temporal WITH WINDOW 10 secs) is joined with the batch table (SELECT * FROM demo.users) via INNER JOIN ON users.name = temporal.name.]
  • 25. Full text search with Lucene o Clients request the ability to perform full-text searches o We have developed an integration between Lucene and Cassandra o C* users can now enjoy all Lucene features: • Full-text searches, range queries, fuzzy queries… https://github.com/Stratio/stratio-cassandra
  • 26. Stratio Lucene 2i. [Diagram: a ring of C* nodes, each with its own local Lucene index.]
  • 27. Full text search queries o With Crossdata, we simplify: • The creation syntax • The query syntax, using the MATCH operator: CREATE FULLTEXT INDEX ON app.users(name, email); SELECT * FROM app.users WHERE email MATCH '*@stratio.com';
  • 29. Why Spark? o Stratio Crossdata uses Spark to perform non-natively supported operations o Spark brings several benefits over Hadoop: o In-memory processing o RDD abstraction o Simpler API o Increased flexibility (e.g., no need for identity mapping)
  • 30. What about Spark SQL? o Different approach to query execution • We only use Spark when it speeds up queries: native drivers are faster for simple queries, and Spark SQL has limited RDD sources • Avoid some Spark limitations • Several batch and streaming contexts in a single JVM (SPARK-2243)
  • 31. Query approach. [Diagram: SparkSQL approach: SparkSQL → Spark → Cassandra. Stratio Crossdata approach: Stratio Crossdata → Spark or native driver → Cassandra.]
  • 32. Our Cassandra-Spark integration o Project started in June 2013 • With the objective of providing a method to interact with Cassandra from Spark • Initial approach based on the Hadoop InputFormat interface • Current version uses the native DataStax Java driver https://github.com/Stratio/stratio-deep
  • 33. Our Cassandra-Spark integration o Benchmark in progress comparing our solution with the DataStax Spark driver • Results highly influenced by the split size • Initial results are promising for the Stratio Spark integration using DataStax default values • Group by: up to 40% faster • Join: up to 17% faster • Stay tuned for the benchmark publication!
  • 34. Spark vs Lucene 2i. [Chart: query time for Spark and Lucene 2i as a function of records per node.]
  • 36. Stratio Crossdata ODBC o Well-known interface standard (for BI tools, external apps, …) o We have implemented it for Crossdata using the Simba SDK o ODBC opens the full potential of Stratio Crossdata to the external world o Currently tested with Tableau, QlikView and MS Excel. One ODBC for all datastores!
  • 38. The future o Security o Query optimizer and smart query planner o Leverage system statistics o Support for UDFs o Become an Apache project https://github.com/Stratio/stratio-meta
  • 39. We are looking for an Apache Champion. Can you help us?
  • 40. A wish list for Cassandra o Ability to stop running queries o Interactive users are unpredictable o Some exception paths are not clear or defined (e.g., secondary indexes) o Distribute some of the operations currently performed on the coordinator • E.g., aggregations like count(*)
  • 41. Stratio Crossdata: An efficient distributed datahub with batch and streaming query capabilities. Daniel Higuero (dhiguero@stratio.com), Alvaro Agea (alvaro@stratio.com).
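Slides 12 and 13 say that each connector declares its capabilities and that the planner picks the best connector for each query. Below is a minimal Python sketch of that selection step, assuming capabilities are plain operation names and that "best" means the lowest cost among connectors that can run the whole query on the target cluster. `Connector`, `choose_connector`, the operation names, and the `cost` field are all hypothetical; this is not Crossdata's IConnector API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    """Hypothetical connector description: where it can run and what it can do natively."""
    name: str
    clusters: frozenset
    capabilities: frozenset
    cost: int  # hypothetical relative cost; lower is preferred


def choose_connector(connectors, cluster, required_ops):
    """Planning step: among connectors attached to `cluster` whose capabilities
    cover every required operation, return the cheapest one (or None)."""
    candidates = [
        c for c in connectors
        if cluster in c.clusters and frozenset(required_ops) <= c.capabilities
    ]
    return min(candidates, key=lambda c: c.cost, default=None)


# Example registry (hypothetical names, capabilities, and costs).
NATIVE_C = Connector("cassandra-native", frozenset({"production"}),
                     frozenset({"PROJECT", "FILTER_PK", "LIMIT"}), cost=1)
SPARK_C = Connector("spark", frozenset({"production", "development"}),
                    frozenset({"PROJECT", "FILTER_PK", "LIMIT", "JOIN", "GROUP_BY"}),
                    cost=10)
REGISTRY = [NATIVE_C, SPARK_C]
```

With this registry, a simple projection on the production cluster goes to the native connector, while a join falls through to the Spark connector; a real planner would likely rank candidates using statistics rather than a fixed cost.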
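Slides 14 and 15 describe one logical catalog spread over several clusters, with each table stored in exactly one datastore. The sketch below resolves a query such as `SELECT * FROM app.users;` to its physical location. The `CATALOG` contents (which table lives on which cluster), `PhysicalTable`, and `resolve` are hypothetical illustrations, and the regular expression only handles a single catalog-qualified table after `FROM`.

```python
import re
from typing import NamedTuple


class PhysicalTable(NamedTuple):
    datastore: str
    cluster: str
    table: str


# Hypothetical logical-to-physical mapping for the "app" catalog.
# Each logical table points to exactly one datastore.
CATALOG = {
    "app.users": PhysicalTable("cassandra", "production", "users"),
    "app.test": PhysicalTable("cassandra", "development", "test"),
    "app.old_users": PhysicalTable("other", "archive", "old_users"),
}

# Captures the first catalog-qualified table name after FROM, e.g. "app.users".
FROM_RE = re.compile(r"\bFROM\s+([A-Za-z_]\w*\.[A-Za-z_]\w*)", re.IGNORECASE)


def resolve(query: str) -> PhysicalTable:
    """Return where the table referenced by `query` physically lives."""
    match = FROM_RE.search(query)
    if match is None:
        raise ValueError("query has no catalog-qualified FROM clause")
    name = match.group(1).lower()
    if name not in CATALOG:
        raise KeyError(f"unknown table {name}")
    return CATALOG[name]
```

Because users only see `app.users`, a table can later be moved to another cluster by changing its catalog entry, without rewriting queries.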
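Slides 17 and 18 describe a metadata manager that keeps a schema for every data source and skips metadata operations the connector cannot perform. A minimal sketch of that behavior follows, assuming the schema is recorded before the operation is forwarded (the slide's step order is not spelled out). A plain dict stands in for the shared store; Infinispan replication across Crossdata servers is not modeled. All class names and the `supports_metadata` flag are hypothetical.

```python
class SchemalessConnector:
    """Hypothetical connector for a datastore with no metadata operations."""
    supports_metadata = False


class CatalogConnector:
    """Hypothetical connector for a datastore that can create tables (e.g. C*)."""
    supports_metadata = True

    def __init__(self):
        self.tables = {}

    def create_table(self, name, schema):
        self.tables[name] = dict(schema)


class MetadataManager:
    def __init__(self):
        # Stand-in for the shared metadata store (kept in sync with
        # Infinispan in the slides; no replication here).
        self.store = {}

    def create_table(self, name, schema, connector):
        # Record the schema first, so even schemaless sources get one.
        self.store[name] = dict(schema)
        # Forward the operation only if the connector supports metadata operations.
        if getattr(connector, "supports_metadata", False):
            connector.create_table(name, schema)
            return "forwarded"
        return "skipped"

    def schema_of(self, name):
        return self.store[name]
```

The point of the design is that BI tools connected through ODBC can always ask the manager for a schema, even when the underlying datastore has none.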
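Slides 21 and 22 turn an infinite stream into batch-like views with a time window. The sketch below evaluates the slide-22 query over a finite list of `(timestamp_seconds, row)` events, assuming non-overlapping (tumbling) 5-minute windows keyed by their start time. A real streaming engine would emit each window's result when the window closes rather than process a complete list; `windowed_avg` is a hypothetical name.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # "WITH WINDOW 5 minutes"


def windowed_avg(events, window_seconds=WINDOW_SECONDS):
    """Evaluate the slide-22 query over (timestamp, row) events.

    Rows are bucketed into non-overlapping windows, filtered with
    field1 = 100 AND field2 > 100, grouped by fieldGroup, and averaged.
    Returns {window_start: {fieldGroup: avg(field2)}}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for ts, row in events:
        if row["field1"] == 100 and row["field2"] > 100:
            window_start = ts - ts % window_seconds
            buckets[window_start][row["fieldGroup"]].append(row["field2"])
    return {
        start: {group: sum(vals) / len(vals) for group, vals in groups.items()}
        for start, groups in sorted(buckets.items())
    }
```

Windows in which every row is filtered out produce no entry, mirroring a GROUP BY over an empty input.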
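Slide 23 joins a 10-second window of the `demo.temporal` stream with the batch table `demo.users` on `name`. One way to picture it: cut the stream into windows, then run an ordinary inner join of each window against the table. The sketch uses a hash join, a technique chosen here for illustration and not confirmed by the slides; `hash_join`, `stream_join`, and the `table.column` output keys are hypothetical.

```python
def hash_join(temporal_rows, users):
    """INNER JOIN ... ON users.name = temporal.name for one window's rows."""
    # Index the batch side by the join key.
    users_by_name = {}
    for user in users:
        users_by_name.setdefault(user["name"], []).append(user)
    joined = []
    for t in temporal_rows:
        for u in users_by_name.get(t["name"], []):
            row = {f"temporal.{k}": v for k, v in t.items()}
            row.update({f"users.{k}": v for k, v in u.items()})
            joined.append(row)
    return joined


def stream_join(events, users, window_seconds=10):
    """Group (timestamp, row) events into windows and join each with users."""
    windows = {}
    for ts, row in events:
        windows.setdefault(ts - ts % window_seconds, []).append(row)
    return {start: hash_join(rows, users) for start, rows in sorted(windows.items())}
```

Stream rows without a matching user (such as an unknown name) are dropped, as an INNER JOIN requires.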
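Slides 25 to 27 describe Lucene-backed secondary indexes: each C* node indexes only its own rows, so a `MATCH` query has to be sent to every node and the partial results merged. The sketch below models that scatter-gather shape with Python's `fnmatch` wildcards in place of Lucene, so there is no tokenization, scoring, or fuzzy matching; each whole column value is one term. `Node` and `cluster_match` are hypothetical names.

```python
from fnmatch import fnmatchcase


class Node:
    """Hypothetical C* node: its own rows plus a local index on chosen columns."""

    def __init__(self, indexed_columns):
        self.indexed_columns = tuple(indexed_columns)
        self.rows = {}   # primary key -> row
        self.index = {}  # column -> {term -> set of primary keys}

    def insert(self, pk, row):
        # Drop index entries for a previous version of this row, then re-index.
        old = self.rows.get(pk)
        if old is not None:
            for col in self.indexed_columns:
                self.index[col][str(old.get(col, "")).lower()].discard(pk)
        self.rows[pk] = row
        for col in self.indexed_columns:
            term = str(row.get(col, "")).lower()
            self.index.setdefault(col, {}).setdefault(term, set()).add(pk)

    def match(self, column, pattern):
        """Local wildcard search over whole (lower-cased) column values."""
        hits = set()
        for term, pks in self.index.get(column, {}).items():
            if fnmatchcase(term, pattern.lower()):
                hits |= pks
        return [self.rows[pk] for pk in sorted(hits)]


def cluster_match(nodes, column, pattern):
    """Scatter the query to every node's local index and gather the results."""
    results = []
    for node in nodes:
        results.extend(node.match(column, pattern))
    return results
```

For example, `cluster_match(nodes, "email", "*@stratio.com")` plays the role of `WHERE email MATCH '*@stratio.com'`.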
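Slides 29 to 31 explain that Crossdata only brings in Spark for operations the datastore cannot run natively, because native drivers are faster for simple queries. A minimal sketch of that split follows, assuming a query is described by a set of operation names and ignoring operation order. The `NATIVE_OPS` set is a hypothetical capability list for a Cassandra connector (CQL in 2014 had no joins and no GROUP BY), and `plan` is a hypothetical name.

```python
# Hypothetical operations a Cassandra native connector can run itself.
NATIVE_OPS = frozenset({"PROJECT", "FILTER_PK", "LIMIT"})


def plan(query_ops, native_ops=NATIVE_OPS):
    """Decide whether a query can go straight to the native driver.

    Returns (engine, pushed_down, left_for_spark): operations the datastore
    supports are pushed down; anything else is left for Spark.
    """
    ops = frozenset(query_ops)
    pushed_down = sorted(ops & native_ops)
    left_for_spark = sorted(ops - native_ops)
    engine = "native" if not left_for_spark else "native+spark"
    return engine, pushed_down, left_for_spark
```

A projection with a partition-key filter stays on the native driver, while adding a JOIN or GROUP BY routes the remaining work to Spark, matching the Crossdata path in the slide-31 diagram.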