SlideShare a Scribd company logo
End User Panel on Real-Time Data Analytics
Building Predictive Applications with
Real-Time Data Pipelines and Streamliner
Eric Frenkiel, CEO and Co-Founder, MemSQL
Going Real-Time is the Next Phase for Big Data
More
Devices
More
Interconnectivity
More
User Demand
…and companies are at risk of being left behind
MemSQL Architecture
St ream in g Da ta W areh o u se
Streaming
Integrated streaming
with Streamliner
Database
High volume transactions
for structured and
unstructured data
Data Warehouse
Fast, scalable
SQL for immediate
analytics
Applications and Technology Trends
Real-Time Analytics Risk-Management Personalization
Portfolio Tracking
Monitoring and
Detection
Internet of Things | Real-Time Data Pipelines | Operationalizing Apache Spark
Put Apache Spark in the fast lane.
Persist. Perform. Perfect.
Changing the Way the World Invests
Noah Zucker, Vice President – Tactical Engineering, Novus Partners
Scalable Portfolio Intelligence with MemSQL
 100+ Investment Managers, $2 Trillion AUM
 Research Platform: 10,000+ Institutions
 Founded 2007, Privately Held
We help investors discover their true
investment acumen and risk
About Novus
True Investment Acumen and Risk…at Scale
Top-Tier Client List
 24/7 ETL Handholding
 Overnight Failure =
Business Hours Slowdown
 Scala worker pool limited
by the database
 Non-trivial code changes
needed to shard and scale
Before MemSQL…
Today’s Portfolio Intelligence…Right Now
Before MemSQL:
With MemSQL:
90 Min.
2 Min.
Customer
Data
Persistent
StoreETL Analytics
(Scala)
First-Class JSON Support…Happy Developers
memsql> select * from tasks t where t.task::uid::%clientId = 7;
+---------+---------------------------------------------------------------+
| task_id | task |
+---------+---------------------------------------------------------------+
| 3 | {"uid":{"clientId":7,"id":1009,"which":"P"},"user":"noahlz"} |
+---------+---------------------------------------------------------------+
1 row in set (0.00 sec)
Salat
 Client team focuses on
service, not ETL
 Predictable application
performance
 Scala workers: 12  126
 Add servers to scale –
No code changes needed
With MemSQL…
http://www.novus.com
http://tech.novus.com
@NovusCode
Ian Hansen, Software Engineering Manager
Digital Ocean
ETL Tools for Small Teams
Problem: Business Intelligence Slows as We Grow
 Data lives in SQL
 Easy to ask new questions in SQL
 But… Business Intelligence tasks taking longer
 Database isn’t built for quick aggregations
Solution: Scale-out SQL Database
 SQL team stays powerful
 Quick to iterate with quick answers
 Prepare for the future!
Problem: Data isn’t in MemSQL
Plus
 You don’t have an engineer on
your team
 It’s hard to get an engineer’s time
 You’ve got a job to do…
(which is taking more and more
time)
Solution: ETL Using REPLACE INTO
 MySQL SQL flavor (available in MemSQL)
 Handles new rows and updates on rows
 Easy to write
• Query source database then replace into target database
 Many other scale-out SQL databases don’t have
equivalent
Problem: Now Load JSON Event Data
 ~300K events per day
 Many different types of JSON events
Solution: MemSQL Loader + JSON Type
 Only loads new files (or files
whose content has changed)
 Parallelizes the process
 Transformation script
simple: return id and raw json data
 SQL team unaffected by new
JSON events
./memsql-loader load /opt/events/**
--table events
--script=/opt/events-etl
--file-id-column file_id
--columns id,data
Problem: Processing Data on Select
 Need computed value in SQL query
 Computing the value slows down queries
 Computed value used on many queries
• e.g. domain from a URL string
Solution: Persistent Columns
 Pre-compute result and
save it on the row
 Automatically updated if
row changes
 No need to alter ETL
pipeline
ALTER TABLE events
ADD COLUMN (
referring_domain AS
substring_index(substring(data::$re
ferrer, (locate('//',
data::$referrer)) + 2), '/', 1)
PERSISTED varchar(255)
)
Solution: Persistent Columns
Use pre-computed value in select
memsql> select data, referring_domain from events limit 2;
+-------------------------------------+------------------+
| data | referring_domain |
+-------------------------------------+------------------+
| {"referrer":"http://example.com/b"} | example.com |
| {"referrer":"http://example.com/a"} | example.com |
+-------------------------------------+------------------+
Tools
 REPLACE INTO syntax
 JSON native type
 MemSQL Loader
 Persistent columns
 Now, MemSQL Streamliner
We Want More Data
We are Hiring
Mike DePrizio, Senior Architect, Akamai Technologies
Unlocking Revenue with In-Memory Technology
We are the leading provider of
cloud services for delivering,
optimizing and securing online
content and business applications
$1.96B
Revenue
1,300
Locations
5,000+
Customers
5,100+
Employees
CORPORATE STATS (2014):
OUR HISTORY:
Founded 1998 and rooted in MIT
technology—solving Internet
congestion with math not hardware
The Business of Billing
Billing domino effect
 Akamai  Customers  Sub-customers
Daily billing requires:
 Fast data delivery
 Accurate data
Old Model New Model
Generating a bill at end of month for
customer services
Generating a bill at the end of every
day for sub-customer services
Current Billing Data Management
Gather logs from 190,000+ servers in 1400 locations in 110
countries
 Multiple PBs/day aggregate/reduce into relevant billing data feed
 Typical data record: 3 key fields plus metrics
 Load resulting data record into our RDBMS system
Greatest Challenges
 Current system cannot handle expected throughput
 Difficult to quickly scale up existing environments
 New model will generate 10x+ data
Deploying MemSQL
Application
Daily Sub-customer billing
Problem
Existing RDMS pipeline loads were maxed out at 150-
300K upserts/second, could not keep up with projected
size of new billing model
Results
MemSQL cluster performs at 1.9
million upserts/second, allowing
transition from monthly to daily billing
Billing Data resource
usage statistics
INSERT... ON
DUPLICATE KEY
UPDATE...
(1.9 million/sec)
Billing Application
• Compute sub-customer
charges daily
• Roll up sub-customer usage by
customer/cloud provider
• More sophisticated platform
offers customers better
service, partners new business
opportunities
Results Speak for Themselves
 2M upserts/second on AWS EC2
instances
 Scalability on commodity hardware
 Meeting our billing windows
 Unlocking revenue
 Adapt PoC for real-world
situations
 Continue scaling linearly
 Optimize results with small
cluster deployment
What Next?
Eric Frenkiel, MemSQL CEO and co-founder
September 30, 2015 • New York, NY
Introducing MemSQL Streamliner
 One click deployment of
integrated Apache Spark
 Put Spark in the Fast Lane
• GUI pipeline setup
• Multiple data pipelines
• Real-time transformation
 Eliminates batch ETL
 Open source on GitHub
Introducing the MemSQL Streamliner
Simple Deployment Process
Application
1. Deploy MemSQL
Cluster
In-Memory | Distributed | Relational
Application
2. Deploy Spark
Cluster
Application
Kafka Connects to Each Node
Cluster
Application
Streamliner Architecture
First of many integrated Apache Spark solutions
Other
Real-Time Data
Sources Application
Apache Spark
Future Solution
Future Machine
Learning Solution
STREAMLINER
Streamliner ETL Detail
Other
Real-Time Data
Sources Application
Apache Spark
Future Solution
Future Machine
Learning Solution
STREAMLINER
STREAMLINER
Custom
Future Extractor
JSON
Custom
Future Transformer
Extract Transform Load
Building Predictive Applications
Streamliner
Input
User Jar
SAS Generated PMML
Industrial
Equipment
Sensor Data
S1 S2 S3 P1 P2 P3
Scoring Real-Time Data
with Predictive Models
Sensor 1 Predictive Model 1
Streamliner Benefits
 Build end-to-end data pipelines in minutes
 Reduce data latency from days or hours to ZERO
 Support thousands of concurrent users running real-time
queries
 Give users immediate access to fresh data via innovative
applications
THE GAME
See MemSQL Streamliner in Action at Booth #831
Ad

More Related Content

What's hot (19)

Scylla Summit 2018: Worry-free ingestion - flow-control of writes in Scylla
Scylla Summit 2018: Worry-free ingestion - flow-control of writes in ScyllaScylla Summit 2018: Worry-free ingestion - flow-control of writes in Scylla
Scylla Summit 2018: Worry-free ingestion - flow-control of writes in Scylla
ScyllaDB
 
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache KafkaKafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
confluent
 
Elk meetup boston - logz.io
Elk meetup boston -  logz.ioElk meetup boston -  logz.io
Elk meetup boston - logz.io
tomerlevy9
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
confluent
 
Leveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelinesLeveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelines
Rose Toomey
 
Easily create dashboards to manage your databases with OVH
Easily create dashboards to manage your databases with OVH Easily create dashboards to manage your databases with OVH
Easily create dashboards to manage your databases with OVH
OVHcloud
 
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair UpdatesScylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
ScyllaDB
 
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Till Rohrmann
 
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Spark Summit
 
Deploying Confluent Platform for Production
Deploying Confluent Platform for ProductionDeploying Confluent Platform for Production
Deploying Confluent Platform for Production
confluent
 
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
HostedbyConfluent
 
Deploying and Operating KSQL
Deploying and Operating KSQLDeploying and Operating KSQL
Deploying and Operating KSQL
confluent
 
Streaming and Messaging
Streaming and MessagingStreaming and Messaging
Streaming and Messaging
Xin Wang
 
Operational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos ChristineOperational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos Christine
Spark Summit
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
Monal Daxini
 
Akka Streams - From Zero to Kafka
Akka Streams - From Zero to KafkaAkka Streams - From Zero to Kafka
Akka Streams - From Zero to Kafka
Mark Harrison
 
Prometheus on AWS
Prometheus on AWSPrometheus on AWS
Prometheus on AWS
Mitsuhiro Tanda
 
Cassandra - Tips And Techniques
Cassandra - Tips And TechniquesCassandra - Tips And Techniques
Cassandra - Tips And Techniques
Knoldus Inc.
 
Exploring KSQL Patterns
Exploring KSQL PatternsExploring KSQL Patterns
Exploring KSQL Patterns
confluent
 
Scylla Summit 2018: Worry-free ingestion - flow-control of writes in Scylla
Scylla Summit 2018: Worry-free ingestion - flow-control of writes in ScyllaScylla Summit 2018: Worry-free ingestion - flow-control of writes in Scylla
Scylla Summit 2018: Worry-free ingestion - flow-control of writes in Scylla
ScyllaDB
 
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache KafkaKafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
confluent
 
Elk meetup boston - logz.io
Elk meetup boston -  logz.ioElk meetup boston -  logz.io
Elk meetup boston - logz.io
tomerlevy9
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
confluent
 
Leveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelinesLeveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelines
Rose Toomey
 
Easily create dashboards to manage your databases with OVH
Easily create dashboards to manage your databases with OVH Easily create dashboards to manage your databases with OVH
Easily create dashboards to manage your databases with OVH
OVHcloud
 
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair UpdatesScylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
ScyllaDB
 
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Till Rohrmann
 
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Spark Summit
 
Deploying Confluent Platform for Production
Deploying Confluent Platform for ProductionDeploying Confluent Platform for Production
Deploying Confluent Platform for Production
confluent
 
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
HostedbyConfluent
 
Deploying and Operating KSQL
Deploying and Operating KSQLDeploying and Operating KSQL
Deploying and Operating KSQL
confluent
 
Streaming and Messaging
Streaming and MessagingStreaming and Messaging
Streaming and Messaging
Xin Wang
 
Operational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos ChristineOperational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos Christine
Spark Summit
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
Monal Daxini
 
Akka Streams - From Zero to Kafka
Akka Streams - From Zero to KafkaAkka Streams - From Zero to Kafka
Akka Streams - From Zero to Kafka
Mark Harrison
 
Cassandra - Tips And Techniques
Cassandra - Tips And TechniquesCassandra - Tips And Techniques
Cassandra - Tips And Techniques
Knoldus Inc.
 
Exploring KSQL Patterns
Exploring KSQL PatternsExploring KSQL Patterns
Exploring KSQL Patterns
confluent
 

Similar to Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics (20)

Data & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real TimeData & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real Time
SingleStore
 
Whats New Sql Server 2008 R2 Cw
Whats New Sql Server 2008 R2 CwWhats New Sql Server 2008 R2 Cw
Whats New Sql Server 2008 R2 Cw
Eduardo Castro
 
Whats New Sql Server 2008 R2
Whats New Sql Server 2008 R2Whats New Sql Server 2008 R2
Whats New Sql Server 2008 R2
Eduardo Castro
 
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
Lightbend
 
Real-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational DatabasesReal-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational Databases
SingleStore
 
Microsoft SQL Server - SQL Server Migrations Presentation
Microsoft SQL Server - SQL Server Migrations PresentationMicrosoft SQL Server - SQL Server Migrations Presentation
Microsoft SQL Server - SQL Server Migrations Presentation
Microsoft Private Cloud
 
BI 2008 Simple
BI 2008 SimpleBI 2008 Simple
BI 2008 Simple
llangit
 
Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...
Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...
Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...
RightScale
 
Managing and Deploying High Performance Computing Clusters using Windows HPC ...
Managing and Deploying High Performance Computing Clusters using Windows HPC ...Managing and Deploying High Performance Computing Clusters using Windows HPC ...
Managing and Deploying High Performance Computing Clusters using Windows HPC ...
Saptak Sen
 
Introduction to microsoft sql server 2008 r2
Introduction to microsoft sql server 2008 r2Introduction to microsoft sql server 2008 r2
Introduction to microsoft sql server 2008 r2
Eduardo Castro
 
The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkThe Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with Spark
SingleStore
 
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginnersSQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
Tobias Koprowski
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everything
Lew Tucker
 
Hp Polyserve Database Utility For Sql Server Consolidation
Hp Polyserve Database Utility For Sql Server ConsolidationHp Polyserve Database Utility For Sql Server Consolidation
Hp Polyserve Database Utility For Sql Server Consolidation
CB UTBlog
 
SPL_ALL_EN.pptx
SPL_ALL_EN.pptxSPL_ALL_EN.pptx
SPL_ALL_EN.pptx
政宏 张
 
Attunity Efficient ODR For Sql Server Using Attunity CDC Suite For SSIS Slide...
Attunity Efficient ODR For Sql Server Using Attunity CDC Suite For SSIS Slide...Attunity Efficient ODR For Sql Server Using Attunity CDC Suite For SSIS Slide...
Attunity Efficient ODR For Sql Server Using Attunity CDC Suite For SSIS Slide...
Melissa Kolodziej
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
Databricks
 
Adaptive Server Farms for the Data Center
Adaptive Server Farms for the Data CenterAdaptive Server Farms for the Data Center
Adaptive Server Farms for the Data Center
elliando dias
 
CV Chandrajit Samanta
CV Chandrajit SamantaCV Chandrajit Samanta
CV Chandrajit Samanta
Chandrajit Samanta ([email protected])
 
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
LarryZaman
 
Data & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real TimeData & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real Time
SingleStore
 
Whats New Sql Server 2008 R2 Cw
Whats New Sql Server 2008 R2 CwWhats New Sql Server 2008 R2 Cw
Whats New Sql Server 2008 R2 Cw
Eduardo Castro
 
Whats New Sql Server 2008 R2
Whats New Sql Server 2008 R2Whats New Sql Server 2008 R2
Whats New Sql Server 2008 R2
Eduardo Castro
 
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
Lightbend
 
Real-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational DatabasesReal-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational Databases
SingleStore
 
Microsoft SQL Server - SQL Server Migrations Presentation
Microsoft SQL Server - SQL Server Migrations PresentationMicrosoft SQL Server - SQL Server Migrations Presentation
Microsoft SQL Server - SQL Server Migrations Presentation
Microsoft Private Cloud
 
BI 2008 Simple
BI 2008 SimpleBI 2008 Simple
BI 2008 Simple
llangit
 
Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...
Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...
Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...
RightScale
 
Managing and Deploying High Performance Computing Clusters using Windows HPC ...
Managing and Deploying High Performance Computing Clusters using Windows HPC ...Managing and Deploying High Performance Computing Clusters using Windows HPC ...
Managing and Deploying High Performance Computing Clusters using Windows HPC ...
Saptak Sen
 
Introduction to microsoft sql server 2008 r2
Introduction to microsoft sql server 2008 r2Introduction to microsoft sql server 2008 r2
Introduction to microsoft sql server 2008 r2
Eduardo Castro
 
The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkThe Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with Spark
SingleStore
 
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginnersSQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
Tobias Koprowski
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everything
Lew Tucker
 
Hp Polyserve Database Utility For Sql Server Consolidation
Hp Polyserve Database Utility For Sql Server ConsolidationHp Polyserve Database Utility For Sql Server Consolidation
Hp Polyserve Database Utility For Sql Server Consolidation
CB UTBlog
 
SPL_ALL_EN.pptx
SPL_ALL_EN.pptxSPL_ALL_EN.pptx
SPL_ALL_EN.pptx
政宏 张
 
Attunity Efficient ODR For Sql Server Using Attunity CDC Suite For SSIS Slide...
Attunity Efficient ODR For Sql Server Using Attunity CDC Suite For SSIS Slide...Attunity Efficient ODR For Sql Server Using Attunity CDC Suite For SSIS Slide...
Attunity Efficient ODR For Sql Server Using Attunity CDC Suite For SSIS Slide...
Melissa Kolodziej
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
Databricks
 
Adaptive Server Farms for the Data Center
Adaptive Server Farms for the Data CenterAdaptive Server Farms for the Data Center
Adaptive Server Farms for the Data Center
elliando dias
 
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
LarryZaman
 
Ad

More from SingleStore (20)

Five ways database modernization simplifies your data life
Five ways database modernization simplifies your data lifeFive ways database modernization simplifies your data life
Five ways database modernization simplifies your data life
SingleStore
 
How Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and AnalyticsHow Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and Analytics
SingleStore
 
Architecting Data in the AWS Ecosystem
Architecting Data in the AWS EcosystemArchitecting Data in the AWS Ecosystem
Architecting Data in the AWS Ecosystem
SingleStore
 
Building the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeBuilding the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free Life
SingleStore
 
Converging Database Transactions and Analytics
Converging Database Transactions and Analytics Converging Database Transactions and Analytics
Converging Database Transactions and Analytics
SingleStore
 
Building a Machine Learning Recommendation Engine in SQL
Building a Machine Learning Recommendation Engine in SQLBuilding a Machine Learning Recommendation Engine in SQL
Building a Machine Learning Recommendation Engine in SQL
SingleStore
 
MemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastMemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks Webcast
SingleStore
 
Introduction to MemSQL
Introduction to MemSQLIntroduction to MemSQL
Introduction to MemSQL
SingleStore
 
An Engineering Approach to Database Evaluations
An Engineering Approach to Database EvaluationsAn Engineering Approach to Database Evaluations
An Engineering Approach to Database Evaluations
SingleStore
 
Building a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed ArchitectureBuilding a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed Architecture
SingleStore
 
Stream Processing with Pipelines and Stored Procedures
Stream Processing with Pipelines  and Stored ProceduresStream Processing with Pipelines  and Stored Procedures
Stream Processing with Pipelines and Stored Procedures
SingleStore
 
Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017
SingleStore
 
Image Recognition on Streaming Data
Image Recognition  on Streaming DataImage Recognition  on Streaming Data
Image Recognition on Streaming Data
SingleStore
 
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image RecognitionSpark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
SingleStore
 
The State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and BeyondThe State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and Beyond
SingleStore
 
How Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data ManagementHow Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data Management
SingleStore
 
Teaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AITeaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AI
SingleStore
 
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid CloudGartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
SingleStore
 
Gartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming DataGartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming Data
SingleStore
 
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and SparkSpark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
SingleStore
 
Five ways database modernization simplifies your data life
Five ways database modernization simplifies your data lifeFive ways database modernization simplifies your data life
Five ways database modernization simplifies your data life
SingleStore
 
How Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and AnalyticsHow Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and Analytics
SingleStore
 
Architecting Data in the AWS Ecosystem
Architecting Data in the AWS EcosystemArchitecting Data in the AWS Ecosystem
Architecting Data in the AWS Ecosystem
SingleStore
 
Building the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeBuilding the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free Life
SingleStore
 
Converging Database Transactions and Analytics
Converging Database Transactions and Analytics Converging Database Transactions and Analytics
Converging Database Transactions and Analytics
SingleStore
 
Building a Machine Learning Recommendation Engine in SQL
Building a Machine Learning Recommendation Engine in SQLBuilding a Machine Learning Recommendation Engine in SQL
Building a Machine Learning Recommendation Engine in SQL
SingleStore
 
MemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastMemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks Webcast
SingleStore
 
Introduction to MemSQL
Introduction to MemSQLIntroduction to MemSQL
Introduction to MemSQL
SingleStore
 
An Engineering Approach to Database Evaluations
An Engineering Approach to Database EvaluationsAn Engineering Approach to Database Evaluations
An Engineering Approach to Database Evaluations
SingleStore
 
Building a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed ArchitectureBuilding a Fault Tolerant Distributed Architecture
Building a Fault Tolerant Distributed Architecture
SingleStore
 
Stream Processing with Pipelines and Stored Procedures
Stream Processing with Pipelines  and Stored ProceduresStream Processing with Pipelines  and Stored Procedures
Stream Processing with Pipelines and Stored Procedures
SingleStore
 
Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017Curriculum Associates Strata NYC 2017
Curriculum Associates Strata NYC 2017
SingleStore
 
Image Recognition on Streaming Data
Image Recognition  on Streaming DataImage Recognition  on Streaming Data
Image Recognition on Streaming Data
SingleStore
 
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image RecognitionSpark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
Spark Summit Dublin 2017 - MemSQL - Real-Time Image Recognition
SingleStore
 
The State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and BeyondThe State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and Beyond
SingleStore
 
How Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data ManagementHow Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data Management
SingleStore
 
Teaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AITeaching Databases to Learn in the World of AI
Teaching Databases to Learn in the World of AI
SingleStore
 
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid CloudGartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud
SingleStore
 
Gartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming DataGartner Catalyst 2017: Image Recognition on Streaming Data
Gartner Catalyst 2017: Image Recognition on Streaming Data
SingleStore
 
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and SparkSpark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
SingleStore
 
Ad

Recently uploaded (20)

Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
How to join illuminati Agent in uganda call+256776963507/0741506136
How to join illuminati Agent in uganda call+256776963507/0741506136How to join illuminati Agent in uganda call+256776963507/0741506136
How to join illuminati Agent in uganda call+256776963507/0741506136
illuminati Agent uganda call+256776963507/0741506136
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia
Alexander Romero Arosquipa
 
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjksPpt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
panchariyasahil
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjksPpt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
panchariyasahil
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 

Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics

  • 1. End User Panel on Real-Time Data Analytics Building Predictive Applications with Real-Time Data Pipelines and Streamliner Eric Frenkiel, CEO and Co-Founder, MemSQL
  • 2. Going Real-Time is the Next Phase for Big Data More Devices More Interconnectivity More User Demand …and companies are at risk of being left behind
  • 3. MemSQL Architecture St ream in g Da ta W areh o u se Streaming Integrated streaming with Streamliner Database High volume transactions for structured and unstructured data Data Warehouse Fast, scalable SQL for immediate analytics
  • 4. Applications and Technology Trends Real-Time Analytics Risk-Management Personalization Portfolio Tracking Monitoring and Detection Internet of Things | Real-Time Data Pipelines | Operationalizing Apache Spark
  • 5. Put Apache Spark in the fast lane. Persist. Perform. Perfect.
  • 6. Changing the Way the World Invests Noah Zucker, Vice President – Tactical Engineering, Novus Partners Scalable Portfolio Intelligence with MemSQL
  • 7.  100+ Investment Managers, $2 Trillion AUM  Research Platform: 10,000+ Institutions  Founded 2007, Privately Held We help investors discover their true investment acumen and risk About Novus
  • 8. True Investment Acumen and Risk…at Scale
  • 10.  24/7 ETL Handholding  Overnight Failure = Business Hours Slowdown  Scala worker pool limited by the database  Non-trivial code changes needed to shard and scale Before MemSQL…
  • 11. Today’s Portfolio Intelligence…Right Now Before MemSQL: With MemSQL: 90 Min. 2 Min. Customer Data Persistent StoreETL Analytics (Scala)
  • 12. First-Class JSON Support…Happy Developers memsql> select * from tasks t where t.task::uid::%clientId = 7; +---------+---------------------------------------------------------------+ | task_id | task | +---------+---------------------------------------------------------------+ | 3 | {"uid":{"clientId":7,"id":1009,"which":"P"},"user":"noahlz"} | +---------+---------------------------------------------------------------+ 1 row in set (0.00 sec) Salat
  • 13.  Client team focuses on service, not ETL  Predictable application performance  Scala workers: 12  126  Add servers to scale – No code changes needed With MemSQL…
  • 15. Ian Hansen, Software Engineering Manager Digital Ocean ETL Tools for Small Teams
  • 16. Problem: Business Intelligence Slows as We Grow  Data lives in SQL  Easy to ask new questions in SQL  But… Business Intelligence tasks taking longer  Database isn’t built for quick aggregations
  • 17. Solution: Scale-out SQL Database  SQL team stays powerful  Quick to iterate with quick answers  Prepare for the future!
  • 18. Problem: Data isn’t in MemSQL Plus  You don’t have an engineer on your team  It’s hard to get an engineer’s time  You’ve got a job to do… (which is taking more and more time)
  • 19. Solution: ETL Using REPLACE INTO  MySQL SQL flavor (available in MemSQL)  Handles new rows and updates on rows  Easy to write • Query source database then replace into target database  Many other scale-out SQL databases don’t have equivalent
  • 20. Problem: Now Load JSON Event Data  ~300K events per day  Many different types of JSON events
  • 21. Solution: MemSQL Loader + JSON Type  Only loads new files (or files whose content has changed)  Parallelizes the process  Transformation script simple: return id and raw json data  SQL team unaffected by new JSON events ./memsql-loader load /opt/events/** --table events --script=/opt/events-etl --file-id-column file_id --columns id,data
  • 22. Problem: Processing Data on Select  Need computed value in SQL query  Computing the value slows down queries  Computed value used on many queries • e.g. domain from a URL string
  • 23. Solution: Persistent Columns  Pre-compute result and save it on the row  Automatically updated if row changes  No need to alter ETL pipeline ALTER TABLE events ADD COLUMN ( referring_domain AS substring_index(substring(data::$re ferrer, (locate('//', data::$referrer)) + 2), '/', 1) PERSISTED varchar(255) )
  • 24. Solution: Persistent Columns Use pre-computed value in select memsql> select data, referring_domain from events limit 2; +-------------------------------------+------------------+ | data | referring_domain | +-------------------------------------+------------------+ | {"referrer":"http://example.com/b"} | example.com | | {"referrer":"http://example.com/a"} | example.com | +-------------------------------------+------------------+
  • 25. Tools  REPLACE INTO syntax  JSON native type  MemSQL Loader  Persistent columns  Now, MemSQL Streamliner
  • 26. We Want More Data
  • 28. Mike DePrizio, Senior Architect, Akamai Technologies Unlocking Revenue with In-Memory Technology
  • 29. We are the leading provider of cloud services for delivering, optimizing and securing online content and business applications $1.96B Revenue 1,300 Locations 5,000+ Customers 5,100+ Employees CORPORATE STATS (2014): OUR HISTORY: Founded 1998 and rooted in MIT technology—solving Internet congestion with math not hardware
  • 30. The Business of Billing Billing domino effect  Akamai  Customers  Sub-customers Daily billing requires:  Fast data delivery  Accurate data Old Model New Model Generating a bill at end of month for customer services Generating a bill at the end of every day for sub-customer services
  • 31. Current Billing Data Management Gather logs from 190,000+ servers in 1400 locations in 110 countries  Multiple PBs/day aggregate/reduce into relevant billing data feed  Typical data record: 3 key fields plus metrics  Load resulting data record into our RDBMS system
  • 32. Greatest Challenges  Current system cannot handle expected throughput  Difficult to quickly scale up existing environments  New model will generate 10x+ data
  • 33. Deploying MemSQL Application Daily Sub-customer billing Problem Existing RDMS pipeline loads were maxed out at 150- 300K upserts/second, could not keep up with projected size of new billing model Results MemSQL cluster performs at 1.9 million upserts/second, allowing transition from monthly to daily billing Billing Data resource usage statistics INSERT... ON DUPLICATE KEY UPDATE... (1.9 million/sec) Billing Application • Compute sub-customer charges daily • Roll up sub-customer usage by customer/cloud provider • More sophisticated platform offers customers better service, partners new business opportunities
  • 34. Results Speak for Themselves  2M upserts/second on AWS EC2 instances  Scalability on commodity hardware  Meeting our billing windows  Unlocking revenue
  • 35.  Adapt PoC for real-world situations  Continue scaling linearly  Optimize results with small cluster deployment What Next?
  • 36. Eric Frenkiel, MemSQL CEO and co-founder September 30, 2015 • New York, NY Introducing MemSQL Streamliner
  • 37.  One click deployment of integrated Apache Spark  Put Spark in the Fast Lane • GUI pipeline setup • Multiple data pipelines • Real-time transformation  Eliminates batch ETL  Open source on GitHub Introducing the MemSQL Streamliner
  • 39. 1. Deploy MemSQL Cluster In-Memory | Distributed | Relational Application
  • 41. Kafka Connects to Each Node Cluster Application
  • 42. Streamliner Architecture First of many integrated Apache Spark solutions Other Real-Time Data Sources Application Apache Spark Future Solution Future Machine Learning Solution STREAMLINER
  • 43. Streamliner ETL Detail Other Real-Time Data Sources Application Apache Spark Future Solution Future Machine Learning Solution STREAMLINER STREAMLINER Custom Future Extractor JSON Custom Future Transformer Extract Transform Load
  • 44. Building Predictive Applications Streamliner Input User Jar SAS Generated PMML Industrial Equipment Sensor Data S1 S2 S3 P1 P2 P3 Scoring Real-Time Data with Predictive Models Sensor 1 Predictive Model 1
  • 45. Streamliner Benefits  Build end-to-end data pipelines in minutes  Reduce data latency from days or hours to ZERO  Support thousands of concurrent users running real-time queries  Give users immediate access to fresh data via innovative applications
  • 46. THE GAME See MemSQL Streamliner in Action at Booth #831

Editor's Notes