Advanced data science algorithms applied to scalable stream processing by David Piris and Ignacio García
Advanced data science algorithms
applied to scalable stream processing
David Piris Valenzuela
Nacho García Fernández
Ignacio.g.Fernandez@treelogic.com
@0xNacho
david.piris@treelogic.com
@davidpiris
About Treelogic
- R&D-intensive company whose mission is to adapt technological knowledge to improve quality standards in our daily life
- 8 ongoing H2020 projects (coordinating 3 of them)
- 8 ongoing FP7 projects (coordinating 5 of them)
- Focused on providing Big Data analytics worldwide
- Internal organization:
  Research lines
  - Big Data
  - Computer vision
  - Data science
  - Social media analysis
  - Security
  ICT solutions
  - Security & Safety
  - Justice
  - Health
  - Transport
  - Financial Services
  - Tailored ICT solutions
CONTENTS
1. WHY WE NEED BIG DATA
2. BIG DATA: SOLUTIONS
3. BIG DATA: REAL-TIME PROCESSING
4. INCREMENTAL ALGORITHMS
5. WHAT WE WANT
6. WHAT WE NEED
   1. A stream processing engine
   2. Online incremental algorithms
   3. A distributed data storage system
   4. A use case
   5. A visualization layer
Why we need Big Data
- Public and private sector companies store a huge amount of data
- Countries maintain huge databases storing data on:
  - Population
  - Medical records
  - Taxes
  - Online transactions
  - Mobile transactions
  - Social networks
In a single day, tweets generate 12 TB!
2.5 exabytes of data are produced every day! That is equivalent to:
- 530 million songs
- 150 million iPhones
- 5 million laptops
- 90 years of HD video
How can we manage all this data?
Big Data: Solutions
First, we can process the whole historical repository and extract value from the stored data:
- Batch architecture
- MapReduce
- Hadoop ecosystem
Batch processing with Hadoop takes a long time, and the need to process ingested data and display results as quickly as possible has brought new architectures and tools:
- Lambda architecture
- Spark (in-memory rather than disk-based processing)
Big data: real-time processing
- Faster results
- Accurate results
- Lower costs
- Satisfied consumers
As mentioned earlier, we need to extract and visualize information in near real time.
- Flink as the processing engine
  - Stream processing
  - Windowing with event-time semantics
  - Both streaming and batch processing
Kappa architecture
- Batch layer removed
- Only one codebase needs to be maintained
- No need for a batch layer
- No disk access in the processing engine, which keeps latency low
Big data: available tools
Incremental algorithms
- BI and BA people always want to perform some common operations to extract value from data and visualize it
- We have tools for these operations in relational or batch environments
- How can we obtain the average of a data stream that changes every minute, every second, or even every millisecond?
- The usual average operation is designed for historical repositories, where the input data does not change once the computation starts
- Do we have tools that make this possible in a real-time deployment?
The answer is no!
Flink lets us work with a new concept: window processing. We can define and configure "small time pieces" (windows) and apply operations to, or manipulate, the data that falls within each of them.
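To make the idea of "small time pieces" concrete, here is a minimal plain-Java sketch (not Flink code) of tumbling event-time windows: each reading is assigned to the fixed-size, non-overlapping window that contains its timestamp, and an average is computed per window. `TumblingWindows`, `Reading` and `averagePerWindow` are hypothetical names chosen for illustration (Java 16+ because of the `record`).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of tumbling (fixed-size, non-overlapping) windows.
final class TumblingWindows {

    record Reading(long timestampMs, double value) {}

    // Start of the window containing the timestamp. Math.floorMod keeps the
    // result correct for negative timestamps as well.
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    // Average per window, keyed by window start and ordered by time.
    static Map<Long, Double> averagePerWindow(List<Reading> readings, long windowSizeMs) {
        Map<Long, double[]> sums = new TreeMap<>(); // window start -> {sum, count}
        for (Reading r : readings) {
            double[] acc = sums.computeIfAbsent(
                    windowStart(r.timestampMs(), windowSizeMs), k -> new double[2]);
            acc[0] += r.value();
            acc[1] += 1;
        }
        Map<Long, Double> result = new TreeMap<>();
        sums.forEach((start, acc) -> result.put(start, acc[0] / acc[1]));
        return result;
    }
}
```

In Flink itself the same grouping is expressed declaratively with a tumbling event-time window assigner, and the engine also handles watermarks and late data, which this sketch ignores.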
With Flink and windowing…
- These algorithms consume data streams and can update their results in parallel without having to store the processed data
- Using checkpoints in windowing allows us to store the result of processing the previous window
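A minimal sketch of what "updating results without saving the processed data" can look like, assuming a running mean and variance: Welford's update keeps only a count, a mean and a sum of squared deviations, and the pairwise merge formula of Chan et al. combines partial states computed in parallel (or restored from a checkpoint). `RunningStats` is a hypothetical name; this is a standard technique, not necessarily the authors' implementation.

```java
// Running mean/variance whose whole state is three numbers (Welford), plus a
// merge step (Chan et al.) for combining partial results.
final class RunningStats {
    private long count;
    private double mean;
    private double m2; // sum of squared deviations from the current mean

    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean); // uses the old and the new mean
    }

    // Combine another partial state (e.g. from a parallel task) into this one.
    void merge(RunningStats other) {
        if (other.count == 0) return;
        if (count == 0) {
            count = other.count;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long n = count + other.count;
        double delta = other.mean - mean;
        mean += delta * other.count / n;
        m2 += other.m2 + delta * delta * ((double) count * other.count / n);
        count = n;
    }

    long count() { return count; }
    double mean() { return mean; }

    // Population variance; use m2 / (count - 1) for the sample variance.
    double variance() { return count > 0 ? m2 / count : Double.NaN; }
}
```

Because the state is only three numbers, it is cheap to checkpoint. Merging the states of 2, 4, 4, 4 and 5, 5, 7, 9 gives mean 5 and population variance 4, the same as a single pass over all eight values.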
Our analytics & visualization solution, implemented in a real-time architecture
If you are a BI or BA professional... we care about you!
- Currently, we have implemented:
  - Average
  - Mode
  - Variance
  - Correlation
  - Covariance
  - Min
  - Max
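As an illustration of how covariance and correlation can be kept up to date in one pass, here is a sketch using the co-moment form of Welford's update; min and max are tracked alongside because they are trivially incremental. `RunningCovariance` is a hypothetical name and this is a textbook technique, not necessarily how the listed algorithms are implemented.

```java
// One-pass covariance and Pearson correlation for (x, y) pairs, plus min/max of x.
final class RunningCovariance {
    private long n;
    private double meanX, meanY;
    private double m2X, m2Y; // sums of squared deviations from the means
    private double cXY;      // co-moment: sum of (x - meanX) * (y - meanY)
    private double minX = Double.POSITIVE_INFINITY;
    private double maxX = Double.NEGATIVE_INFINITY;

    void add(double x, double y) {
        n++;
        double dx = x - meanX; // deviations from the previous means
        double dy = y - meanY;
        meanX += dx / n;
        meanY += dy / n;
        m2X += dx * (x - meanX);
        m2Y += dy * (y - meanY);
        cXY += dx * (y - meanY); // old x-deviation times new y-deviation
        minX = Math.min(minX, x);
        maxX = Math.max(maxX, x);
    }

    // Population covariance; divide by (n - 1) instead for the sample covariance.
    double covariance() { return n > 0 ? cXY / n : Double.NaN; }

    // Pearson correlation; NaN if either variable is constant.
    double correlation() {
        double denom = Math.sqrt(m2X * m2Y);
        return denom > 0 ? cXY / denom : Double.NaN;
    }

    double minX() { return minX; }
    double maxX() { return maxX; }
}
```

The mode is different: it needs a count per distinct value, so its state grows with the number of distinct values rather than staying constant.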
- We are currently working on:
  - Median
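The median is harder than the statistics above: an exact running median cannot be kept in constant space, because any value seen so far may later become the median. A common exact approach, sketched here as an assumption rather than the authors' design, keeps the lower half of the values in a max-heap and the upper half in a min-heap; approximate quantile sketches trade exactness for bounded memory. `RunningMedian` is a hypothetical name.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Exact running median with two heaps. Every value is retained.
final class RunningMedian {
    private final PriorityQueue<Double> lower =
            new PriorityQueue<>(Collections.reverseOrder()); // max-heap: smaller half
    private final PriorityQueue<Double> upper = new PriorityQueue<>(); // min-heap: larger half

    void add(double x) {
        if (lower.isEmpty() || x <= lower.peek()) lower.add(x);
        else upper.add(x);
        // Keep lower the same size as upper, or exactly one element larger.
        if (lower.size() > upper.size() + 1) upper.add(lower.poll());
        else if (upper.size() > lower.size()) lower.add(upper.poll());
    }

    double median() {
        if (lower.isEmpty()) throw new IllegalStateException("no values yet");
        return lower.size() > upper.size()
                ? lower.peek()
                : (lower.peek() + upper.peek()) / 2.0;
    }
}
```

Each insertion costs O(log n) and reading the median costs O(1), but memory grows with the stream, which is why the median is usually computed per window or approximated.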
- On the roadmap:
  - Standard deviation
  - Order by
  - Discretization
  - Contains
  - Split
  - Validate range values
  - Set default value to a specific output
Apache Flink vs Apache Spark
Apache Flink:
- Pure streaming for all workloads
- Job optimizer
- Low latency, high throughput
- Global, session, time-based and count-based window criteria
- Automatic memory management
Apache Spark:
- Micro-batches for all workloads
- No job optimizer
- Higher latency than Flink
- Time-based window criteria
- Configurable memory management (Spark 1.6+ has moved towards automatic memory management)
Incremental algorithms in Flink
- Default behavior in Apache Flink:
- With incremental algorithms:
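These two bullets presumably contrast two ways of evaluating a window. In Flink, a window evaluated with a full-window function keeps every element in state until the window fires, whereas an incremental aggregate (a ReduceFunction, or the AggregateFunction interface available in later Flink releases) folds each element into an accumulator as it arrives. The plain-Java sketch below shows that contrast; `Aggregator` is a local, hypothetical interface that reuses the four method names of Flink's AggregateFunction, not the Flink class itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in with the same method names as Flink's AggregateFunction.
interface Aggregator<IN, ACC, OUT> {
    ACC createAccumulator();
    ACC add(IN value, ACC acc);
    OUT getResult(ACC acc);
    ACC merge(ACC a, ACC b);
}

// Average as an incremental aggregate: the accumulator is {sum, count}.
final class AverageAggregator implements Aggregator<Double, double[], Double> {
    public double[] createAccumulator() { return new double[] {0.0, 0.0}; }
    public double[] add(Double value, double[] acc) { acc[0] += value; acc[1] += 1; return acc; }
    public Double getResult(double[] acc) { return acc[1] == 0 ? Double.NaN : acc[0] / acc[1]; }
    public double[] merge(double[] a, double[] b) { a[0] += b[0]; a[1] += b[1]; return a; }
}

final class WindowEvaluation {
    // Buffering: every element is stored until the window fires, then processed.
    static double averageByBuffering(Iterable<Double> window) {
        List<Double> buffer = new ArrayList<>();
        for (double v : window) buffer.add(v);
        double sum = 0;
        for (double v : buffer) sum += v;
        return buffer.isEmpty() ? Double.NaN : sum / buffer.size();
    }

    // Incremental: each element updates a constant-size accumulator and is dropped.
    static <IN, ACC, OUT> OUT aggregate(Iterable<IN> window, Aggregator<IN, ACC, OUT> agg) {
        ACC acc = agg.createAccumulator();
        for (IN v : window) acc = agg.add(v, acc);
        return agg.getResult(acc);
    }
}
```

Both paths produce the same average; the difference is that the incremental one never holds more than two numbers of state per window.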
Apache Kudu
- Provides a combination of fast inserts/updates and efficient columnar scans to enable real-time analytic workloads
- A new complement to HDFS and HBase
- Designed for use cases that require fast analytics on fast data
- Low query latency
- v1.0.1 was released on October 11, 2016
PROTEUS: a steel-making scenario
- The steel industry is a key sector for the European community
- PROTEUS was introduced last year at Big Data Spain by Treelogic *
- Hot strip mills sometimes produce steel with defects
- Predict coil parameters (thickness, width, flatness) using real-time and historical data
- Detecting defective coils at an early stage saves money, because the production process can be modified or stopped
- The proposed architecture is being validated in this project
- Data in motion: 7,870 variables sampled every 500 ms
- Data at rest: 700,000 records per variable; 500 GB of time series and flatness maps
* https://www.youtube.com/watch?v=EIH7HLyqhfE
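A quick back-of-the-envelope check of the data-in-motion volume, assuming each of the 7,870 variables emits exactly one sample every 500 ms:

```latex
\text{rate} = 7870~\text{variables} \times \frac{1~\text{sample}}{0.5~\text{s}}
  = 15\,740~\text{samples/s},
\qquad
15\,740~\text{samples/s} \times 3600~\text{s/h} = 56\,664\,000~\text{samples/h}
```

At roughly 5.7 × 10^7 samples per hour, recomputing statistics from scratch over the full history is impractical, which is the motivation for the incremental algorithms above.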
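WebSockets
- WebSocket is a communication protocol that provides full-duplex communication channels over a single TCP connection
- Much lower per-message overhead than issuing repeated HTTP requests, because the connection stays open
- The protocol is standardized by the IETF (RFC 6455) and its API by the W3C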
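To make the overhead claim concrete: once the opening HTTP handshake has upgraded the connection, each WebSocket message is framed with only 2 to 14 bytes of header (RFC 6455, section 5.2) instead of a full HTTP request and response. The sketch below encodes a single unmasked text frame as a server would send it; `WebSocketFrames.textFrame` is a hypothetical helper, and client-to-server frames would additionally need the 4-byte masking key, which it omits.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Encodes one unmasked, unfragmented WebSocket text frame (server-to-client).
final class WebSocketFrames {
    static byte[] textFrame(String text) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x81); // FIN bit set + opcode 0x1 (text frame)
        int len = payload.length;
        if (len < 126) {
            out.write(len); // mask bit 0, 7-bit payload length
        } else if (len <= 0xFFFF) {
            out.write(126); // a 16-bit extended length follows
            out.write((len >>> 8) & 0xFF);
            out.write(len & 0xFF);
        } else {
            out.write(127); // a 64-bit extended length follows (big-endian)
            long l = len;
            for (int shift = 56; shift >= 0; shift -= 8) {
                out.write((int) ((l >>> shift) & 0xFF));
            }
        }
        out.writeBytes(payload);
        return out.toByteArray();
    }
}
```

For example, the message "hi" becomes 4 bytes on the wire: 0x81, 0x02, 'h', 'i'.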
Apache Flink & WebSockets
- Data sinks consume DataSets and are used to store or return them
- Flink comes with a variety of built-in output formats that are encapsulated behind operations on the DataSet:
  - writeAsText()
  - writeAsFormattedText()
  - writeAsCsv()
  - print()
  - write()
- We've developed a WebsocketSink that enables Flink to send outputs to a given WebSocket endpoint
  - Based on the javax-websocket-client-api 1.1 spec
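A minimal sketch of the lifecycle such a sink follows, under stated assumptions: connect once in `open()`, send one text message per record in `invoke()`, and release the connection in `close()`, mirroring the methods a custom Flink `RichSinkFunction` overrides. `TextChannel` and `WebsocketSinkSketch` are hypothetical names, not the authors' WebsocketSink; the connection is abstracted so the sketch runs without Flink or a server, whereas a real implementation would wrap a javax.websocket `Session` and call `getBasicRemote().sendText(...)`.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical connection abstraction (a real one would wrap a javax.websocket Session).
interface TextChannel extends AutoCloseable {
    void sendText(String message);

    @Override
    void close(); // narrowed: no checked exception
}

// Hypothetical sink: one connection per sink instance, one message per record.
final class WebsocketSinkSketch<T> {
    private final Supplier<TextChannel> connector; // opens a connection to the endpoint
    private final Function<T, String> serializer;  // e.g. record -> JSON text
    private TextChannel channel;

    WebsocketSinkSketch(Supplier<TextChannel> connector, Function<T, String> serializer) {
        this.connector = connector;
        this.serializer = serializer;
    }

    void open() {
        channel = connector.get();
    }

    void invoke(T value) {
        if (channel == null) throw new IllegalStateException("open() was not called");
        channel.sendText(serializer.apply(value));
    }

    void close() {
        if (channel != null) {
            channel.close();
            channel = null;
        }
    }
}
```

Opening the connection once per sink instance, rather than per record, is what keeps the per-message cost down to a single WebSocket frame.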
Incremental architecture: our approach
ProteicJS
https://github.com/proteus-h2020/proteic/
ProteicJS: Visualizations
ProteicJS: Research on visualization
- We are currently researching new ways of visualizing data and ML models
ProteicJS & Apache Flink
How to get it all
https://github.com/proteus-h2020/proteus-docker
The Future of Data Warehousing, Data Science and Machine Learning
ModusOptimum
 
Simplified Machine Learning, Text, and Graph Analytics with Pivotal Greenplum
Simplified Machine Learning, Text, and Graph Analytics with Pivotal GreenplumSimplified Machine Learning, Text, and Graph Analytics with Pivotal Greenplum
Simplified Machine Learning, Text, and Graph Analytics with Pivotal Greenplum
VMware Tanzu
 
Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoop
Remas Ittahir
 
Miguel Angel Perdiguero - Head of BIG data & analytics Atos Iberia - semanain...
Miguel Angel Perdiguero - Head of BIG data & analytics Atos Iberia - semanain...Miguel Angel Perdiguero - Head of BIG data & analytics Atos Iberia - semanain...
Miguel Angel Perdiguero - Head of BIG data & analytics Atos Iberia - semanain...
COIICV
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?
OVHcloud
 
Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and Roadmap
Neo4j
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
confluent
 
Big Data LDN 2018: FORTUNE 100 LESSONS ON ARCHITECTING DATA LAKES FOR REAL-TI...
Big Data LDN 2018: FORTUNE 100 LESSONS ON ARCHITECTING DATA LAKES FOR REAL-TI...Big Data LDN 2018: FORTUNE 100 LESSONS ON ARCHITECTING DATA LAKES FOR REAL-TI...
Big Data LDN 2018: FORTUNE 100 LESSONS ON ARCHITECTING DATA LAKES FOR REAL-TI...
Matt Stubbs
 
Ad

More from Big Data Spain (20)

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data Spain
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Big Data Spain
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
Big Data Spain
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Big Data Spain
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Big Data Spain
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Big Data Spain
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Big Data Spain
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Big Data Spain
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
Big Data Spain
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
Big Data Spain
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Big Data Spain
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
Big Data Spain
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Big Data Spain
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Big Data Spain
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Big Data Spain
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Big Data Spain
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
Big Data Spain
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Big Data Spain
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
Big Data Spain
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Big Data Spain
 
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data Spain
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Big Data Spain
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
Big Data Spain
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Big Data Spain
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Big Data Spain
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Big Data Spain
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Big Data Spain
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Big Data Spain
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
Big Data Spain
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
Big Data Spain
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Big Data Spain
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
Big Data Spain
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Big Data Spain
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Big Data Spain
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Big Data Spain
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Big Data Spain
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
Big Data Spain
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Big Data Spain
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
Big Data Spain
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Big Data Spain
 

Recently uploaded (20)

SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 

Advanced data science algorithms applied to scalable stream processing by David Piris and Ignacio García

  • 2. Advanced data science algorithms applied to scalable stream processing. David Piris Valenzuela (david.piris@treelogic.com, @davidpiris); Nacho García Fernández (Ignacio.g.Fernandez@treelogic.com, @0xNacho)
  • 3. About Treelogic. An R&D-intensive company whose mission is to adapt technological knowledge to improve quality standards in our daily life. It has 8 ongoing H2020 projects (coordinating 3 of them) and 8 ongoing FP7 projects (coordinating 5 of them), and is focused on providing Big Data analytics all over the world. Internal organization: research lines (Big Data, computer vision, data science, social media analysis, security) and ICT solutions (security & safety, justice, health, transport, financial services, ICT tailored solutions).
  • 4. CONTENTS: 1. WHY WE NEED BIG DATA; 2. BIG DATA: SOLUTIONS; 3. BIG DATA: REAL-TIME PROCESSING; 4. INCREMENTAL ALGORITHMS; 5. WHAT WE WANT; 6. WHAT WE NEED (6.1 A stream processing engine; 6.2 Online incremental algorithms; 6.3 A distributed data storage system; 6.4 A use case; 6.5 A visualization layer)
  • 6. Why we need Big Data
  • 7. Why we need Big Data. Public- and private-sector organizations store huge amounts of data. Countries maintain large databases with data on population, medical records, taxes, online transactions, mobile transactions, and social networks. In a single day, tweets alone generate 12 TB of data!
  • 8. Why we need Big Data. 2.5 exabytes of data are produced every day! That is the equivalent of 530,000,000 songs, 150,000,000 iPhones, 5 million laptops, or 90 years of HD video.
  • 9. Why we need Big Data. How can we manage all this data?
  • 11. Big Data: Solutions. First, we can manage the entire historical repository and extract value from the stored data: batch architecture, MapReduce, and the Hadoop ecosystem.
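The following single-process Python sketch is a minimal illustration of the batch model named on slide 11. It runs a word count through the three MapReduce stages: map, shuffle, and reduce. This is not Hadoop's API. The function names are hypothetical, and the sketch assumes the whole input fits in memory.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group intermediate values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Aggregate all values collected for a single key.
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```

For example, `word_count(["big data", "fast data"])` returns `{'big': 1, 'data': 2, 'fast': 1}`. In a real cluster, map and reduce tasks run on different nodes, and the shuffle moves data over the network.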
  • 13. Big Data: Solutions. Batch processing with Hadoop takes a long time, and the need to process ingested data and display results as quickly as possible brought new architectures and tools: the Lambda architecture and Spark (in-memory vs on-disk processing).
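The Lambda architecture mentioned on slide 13 pairs a slow batch layer with a fast speed layer and merges the two at query time. The toy class below sketches that idea for per-key counts. The class and method names are hypothetical. The single-threaded design also ignores events that would arrive while a real batch job is still running.

```python
from collections import Counter


class LambdaView:
    """Toy Lambda architecture for per-key event counts (hypothetical names)."""

    def __init__(self):
        self.master_dataset = []     # immutable, append-only record of every event
        self.batch_view = Counter()  # output of the last full batch recomputation
        self.speed_view = Counter()  # incremental counts for events since that run

    def ingest(self, key):
        # Each event is stored for the batch layer and counted by the speed layer.
        self.master_dataset.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        # Slow path: recompute from all stored data, then discard the speed view,
        # whose events are now covered by the batch view.
        self.batch_view = Counter(self.master_dataset)
        self.speed_view = Counter()

    def query(self, key):
        # Serving layer: merge batch and real-time results at query time.
        return self.batch_view[key] + self.speed_view[key]
```

The counting logic lives in two places here: the speed layer and the batch recomputation. Keeping those two code paths consistent is the maintenance cost that the Kappa architecture aims to remove.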
  • 16. Big data: real-time processing. Faster results; accurate results; lower cost; satisfied consumers.
  • 17. Big data: real-time processing. As mentioned earlier, we need to extract and visualize information in near real time…
  • 18. Big data: real-time processing. Flink as the processing engine: stream processing; windowing with event-time semantics; both streaming and batch processing.
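Slide 18 mentions windowing with event-time semantics. The sketch below shows the idea in plain Python rather than through Flink's API:

- Events carry their own timestamps and are assigned to fixed-size tumbling windows.
- A window's sum is emitted only once a watermark passes the window's end. Here the watermark is the highest timestamp seen minus an assumed bound on out-of-orderness.
- An event whose window has already fired is dropped.

The names `size` and `max_out_of_orderness` are illustrative assumptions. Flink's exact firing rules may differ.

```python
from collections import defaultdict


def tumbling_event_time_windows(events, size, max_out_of_orderness):
    """Sum (timestamp, value) events per tumbling event-time window [start, start + size)."""
    open_windows = defaultdict(int)
    watermark = float("-inf")
    results = []
    for ts, value in events:
        start = ts - ts % size
        if start + size <= watermark:
            continue  # late event: its window has already fired, so drop it
        open_windows[start] += value
        # Bounded out-of-orderness: no event older than this is expected any more.
        watermark = max(watermark, ts - max_out_of_orderness)
        ready = sorted(s for s in open_windows if s + size <= watermark)
        for s in ready:
            results.append((s, s + size, open_windows.pop(s)))
    # End of input: flush the windows that are still open.
    for s in sorted(open_windows):
        results.append((s, s + size, open_windows[s]))
    return results
```

With `size=5` and `max_out_of_orderness=2`, the stream `[(1, 1), (3, 2), (6, 5), (2, 4), (12, 1), (4, 10)]` yields `[(0, 5, 7), (5, 10, 5), (10, 15, 1)]`. The out-of-order event at time 2 is still counted, but the event at time 4 arrives after its window fired and is dropped.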
  • 19. Big data: real-time processing. Kappa architecture: the batch layer is removed, so only one set of code needs to be maintained.
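The Kappa idea on slide 19 can be sketched in a few lines of Python. One processing function serves both live traffic and reprocessing, which is just a replay of a durable log from an earlier offset. Everything here is a hypothetical stand-in: the `Log` class replaces the replayable log (a role often played by a system such as Apache Kafka), and the record fields `sensor` and `value` are illustrative.

```python
from typing import Dict, List, Optional


class Log:
    """In-memory stand-in for a durable, replayable, append-only log."""

    def __init__(self) -> None:
        self._records: List[dict] = []

    def append(self, record: dict) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the appended record

    def read_from(self, offset: int) -> List[dict]:
        return self._records[offset:]


def process(records: List[dict], state: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """The single piece of business logic: a running total of `value` per `sensor`."""
    state = {} if state is None else state
    for record in records:
        state[record["sensor"]] = state.get(record["sensor"], 0.0) + record["value"]
    return state
```

Live processing calls `process(log.read_from(next_offset), state)` as records arrive. Reprocessing after a logic change replays `log.read_from(0)` into a fresh state with the same function, instead of maintaining a separate batch job.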
  • 20. Big data: real-time processing. No need for a batch layer; avoid using disk in the processing engine, since it adds latency.
  • 23. Incremental algorithms. BI and BA people always want to perform some common operations to extract value from the data and visualize it. We have operational tools for this in relational or batch environments. But how can we compute the average of a data stream that changes every second, every minute, or even every millisecond? The usual average operation is designed for historical repositories, where the input does not change once the computation starts. Do we have tools that make this possible in a real-time deployment?
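One standard answer to the "average of a changing stream" question is the incremental mean. It keeps only a count and the current mean and updates them once per arriving element, using mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n. The sketch below is illustrative only. The class name `RunningMean` is hypothetical, and this is not necessarily how the authors implement it.

```python
class RunningMean:
    """Incremental mean: constant state, one O(1) update per element."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        # mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n
        # No past values are stored, and no unbounded running sum is kept.
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean
```

After feeding 2, 4, and 9, the mean is 5.0, and at every intermediate step the current average is already available.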
  • 25. Incremental algorithms. Flink lets us work with a window processing concept: we can define and configure "small time pieces" and then run operations on, or transform, the data that falls within each of those time spans.
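The "small time pieces" are configured by a window size and, for overlapping windows, a slide. The sketch below assigns each event to every sliding window that contains it. It follows the same idea as Flink's sliding event-time window assignment, though it is plain Python, not Flink's API. It then keeps only a (sum, count) accumulator per window to produce a per-window average. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sliding_window_starts(ts: int, size_ms: int, slide_ms: int) -> List[int]:
    # An event belongs to every window [start, start + size_ms) containing it.
    starts: List[int] = []
    start = ts - (ts % slide_ms)  # latest window starting at or before ts
    while start > ts - size_ms:
        starts.append(start)
        start -= slide_ms
    return starts


def sliding_window_average(events: Iterable[Tuple[int, float]],
                           size_ms: int, slide_ms: int) -> Dict[int, float]:
    # Keep a [sum, count] accumulator per window instead of the raw events.
    acc: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0.0])
    for ts, value in events:
        for start in sliding_window_starts(ts, size_ms, slide_ms):
            acc[start][0] += value
            acc[start][1] += 1
    return {start: s / n for start, (s, n) in sorted(acc.items())}
```

With 1-second windows sliding every 500 ms, each event contributes to two windows. Setting the slide equal to the size gives non-overlapping (tumbling) windows instead.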
  • 27. Incremental algorithms. These algorithms consume data streams and can update their results in parallel without having to store the processed data. Using checkpoints in windowing lets us store the result of the previous window's processing.
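Two properties from the slide can be illustrated together. First, partial results computed in parallel can be merged without the raw data. Second, only a small state needs to be checkpointed. The sketch uses Welford's update for a single stream and the pairwise merge formula of Chan et al. to combine two partial variances. `MomentState` and its JSON snapshot are hypothetical. Flink's real checkpointing stores operator state through its own state backends, not like this.

```python
import json
from dataclasses import dataclass


@dataclass
class MomentState:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def add(self, x: float) -> None:
        # Welford's update: numerically stable, O(1) per element.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "MomentState") -> "MomentState":
        # Combine two partial states computed in parallel (Chan et al.).
        if other.count == 0:
            return MomentState(self.count, self.mean, self.m2)
        if self.count == 0:
            return MomentState(other.count, other.mean, other.m2)
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return MomentState(n, mean, m2)

    def variance(self) -> float:
        # Population variance; use m2 / (count - 1) for the sample variance.
        return self.m2 / self.count if self.count else 0.0

    def snapshot(self) -> str:
        # Only these three numbers need checkpointing, not the raw data.
        return json.dumps([self.count, self.mean, self.m2])

    @staticmethod
    def restore(blob: str) -> "MomentState":
        count, mean, m2 = json.loads(blob)
        return MomentState(count, mean, m2)
```

Two workers processing 1..4 and 5..8 respectively can merge into the exact variance of 1..8 (5.25) without ever exchanging the eight values.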
  • 28. Incremental algorithms. Our analytics and visualization solution, implemented on a real-time architecture.
  • 29. Incremental algorithms. If you are a BI or BA professional... we care about you!
  • 30. Incremental algorithms. Currently implemented: average, mode, variance, correlation, covariance, min, max.
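Covariance and correlation can also be maintained in one pass by tracking a co-moment alongside the two means, using the same Welford-style trick as the variance. Min, max, and mode need even simpler state. The classes below are a hypothetical sketch of these textbook single-pass formulas, not Treelogic's implementation. Note that an exact mode needs one counter per distinct value, so its state grows with the number of distinct values rather than staying constant.

```python
import math
from collections import Counter
from typing import Hashable, Optional


class OnlineCorrelation:
    """Single-pass covariance and Pearson correlation of (x, y) pairs."""

    def __init__(self) -> None:
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # sum of squared deviations of x
        self.m2_y = 0.0  # sum of squared deviations of y
        self.c_xy = 0.0  # co-moment: sum of (x - mean_x) * (y - mean_y)

    def update(self, x: float, y: float) -> None:
        self.n += 1
        dx = x - self.mean_x  # deviation from the *old* mean of x
        self.mean_x += dx / self.n
        dy = y - self.mean_y
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)  # old x-deviation * new y-deviation

    def covariance(self) -> float:
        return self.c_xy / self.n if self.n else 0.0  # population covariance

    def correlation(self) -> float:
        denom = math.sqrt(self.m2_x * self.m2_y)
        return self.c_xy / denom if denom else 0.0


class OnlineExtremaAndMode:
    """Running min, max and (exact) mode."""

    def __init__(self) -> None:
        self.minimum = math.inf
        self.maximum = -math.inf
        self.counts: Counter = Counter()

    def update(self, value: float) -> None:
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        self.counts[value] += 1  # one counter per distinct value

    def mode(self) -> Optional[Hashable]:
        return self.counts.most_common(1)[0][0] if self.counts else None
```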
  • 31. Incremental algorithms. Currently working on: median.
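The median is harder than the previous operations because an exact streaming median cannot be kept in constant memory. One common exact approach uses two heaps: a max-heap for the lower half of the values and a min-heap for the upper half. The median is then read from the heap tops, with O(log n) cost per update. This is a generic sketch, with the hypothetical class name `RunningMedian`, and not the authors' work in progress. Bounded-memory alternatives exist, but they return approximate quantiles.

```python
import heapq
from typing import List


class RunningMedian:
    """Exact running median with two heaps (O(log n) per update)."""

    def __init__(self) -> None:
        self._low: List[float] = []   # lower half, stored negated (max-heap)
        self._high: List[float] = []  # upper half (min-heap)

    def add(self, x: float) -> None:
        # Route through the lower heap so every low value <= every high value.
        heapq.heappush(self._low, -x)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance: the lower half holds the same count or one extra.
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self) -> float:
        if not self._low:
            raise ValueError("median of an empty stream")
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2
```

Feeding 5, 1, 9, 2 gives medians 5, 3.0, 5, and 3.5 after each step.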
  • 32. Incremental algorithms. On the roadmap: standard deviation, order by, discretization, contains, split, validate range values, and set a default value for a specific output.
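Some roadmap items follow directly from state already kept for other operations. For example, the standard deviation is the square root of the incremental variance (count and sum of squared deviations). Others are per-element transforms. The sketch below shows one possible reading of "standard deviation", "discretization", and "validate range values / set default value". This interpretation is our assumption, since the slide only lists the names. All functions are hypothetical.

```python
import math
from typing import Optional


def std_from_moments(count: int, m2: float) -> float:
    # Population standard deviation from the O(1) state an incremental
    # variance keeps: count and sum of squared deviations (m2).
    return math.sqrt(m2 / count) if count else 0.0


def discretize(value: float, lower: float, upper: float, bins: int) -> int:
    # Assumed semantics: equal-width bins 0 .. bins-1 over [lower, upper].
    if not lower <= value <= upper:
        raise ValueError("value outside the discretization range")
    index = int((value - lower) / (upper - lower) * bins)
    return min(index, bins - 1)  # value == upper goes to the last bin


def validate_range(value: Optional[float], lower: float, upper: float,
                   default: float) -> float:
    # Assumed semantics: missing or out-of-range readings get a default.
    if value is None or not lower <= value <= upper:
        return default
    return value
```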
  • 34. Apache Flink vs Apache Spark. Apache Flink: pure streams for all workloads; job optimizer; low latency, high throughput; global, session, time- and count-based window criteria; automatic memory management. Apache Spark: micro-batches for all workloads; no job optimizer; higher latency than Flink; time-based window criteria; configurable memory management (Spark 1.6+ has moved toward automatic memory management).
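Two of the Flink-only window criteria in the comparison are easy to state precisely. A count-based window fires after a fixed number of elements. A session window groups events until no event arrives for a configured gap. The sketch below is plain Python with hypothetical function names, not Flink's API. As an assumption mirroring Flink's tumbling count windows, an incomplete trailing count window is not emitted.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def count_windows(values: Sequence[T], size: int) -> List[List[T]]:
    # Tumbling count windows: emit one window per `size` elements;
    # an incomplete trailing window is not emitted.
    return [list(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def session_windows(timestamps: List[int], gap_ms: int) -> List[List[int]]:
    # A session closes once no event arrives for at least gap_ms.
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap_ms:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions
```

Session windows have no fixed length. Their boundaries depend on the data itself, which is why they suit bursty activity better than purely time-based criteria.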
  • 38. Incremental algorithms in Flink. [Diagram: default behavior in Apache Flink] [Diagram: behavior with incremental algorithms]
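The slide's diagrams are not available here. The usual contrast they refer to is between a generic window function, which receives all buffered elements of a window when it fires, and an incremental (reduce/fold-style) aggregation, which folds each element into a small state as it arrives. The two functions below are a hypothetical plain-Python illustration of that difference for an average. Both give the same result, but their state grows very differently.

```python
from typing import Iterable, List


def buffered_window_average(values: Iterable[float]) -> float:
    # "Buffer then compute": keep every element of the window and
    # evaluate once the window fires. State grows with window size.
    buffer: List[float] = list(values)
    return sum(buffer) / len(buffer)


def incremental_window_average(values: Iterable[float]) -> float:
    # Incremental: fold each element into (sum, count) as it arrives.
    # State stays two numbers regardless of window size.
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total / count
```

Both functions assume a non-empty window, which holds for windows that are created only when an element arrives.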
  • 41. Apache Kudu. Provides a combination of fast inserts/updates and efficient columnar scans to enable real-time analytic workloads; a new complement to HDFS and HBase; designed for use cases that require fast analytics on fast (rapidly changing) data; low query latency; v1.0.1 was released on October 11, 2016.
  • 43. PROTEUS: a steel-making scenario. The steel industry is a key sector for the European community. PROTEUS was introduced last year at Big Data Spain by Treelogic.* Hot strip mills sometimes produce steel with defects. The goal is to predict coil parameters (thickness, width, flatness) using real-time and historical data. Detecting defective coils at an early stage saves money, because the production process can be modified or stopped. The proposed architecture is being validated in this project. Data-in-motion: 7,870 variables sampled every 500 ms. Data-at-rest: 700,000 records for each variable, plus 500 GB of time series and flatness maps. * https://www.youtube.com/watch?v=EIH7HLyqhfE
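To put the data-in-motion figure in perspective, 7,870 variables each sampled every 500 ms give the following ingestion rate (simple arithmetic from the numbers on the slide, counting one value per variable per sample):

$$\frac{7870\ \text{values}}{0.5\ \text{s}} = 15{,}740\ \text{values/s}, \qquad 15{,}740 \times 3600 = 56{,}664{,}000\ \text{values/h}$$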
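  • 46. Websockets. WebSocket is a computer communication protocol that provides full-duplex communication channels over a single TCP connection. It is much faster than HTTP for pushing frequent updates, because the connection stays open and either side can send data at any time without issuing a new request. Its API is standardized by the W3C (the protocol itself is specified by the IETF in RFC 6455).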
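The single long-lived connection starts as an HTTP request that is upgraded during an opening handshake. In that handshake, the server proves it understood the WebSocket request by hashing the client's `Sec-WebSocket-Key` together with a fixed GUID from RFC 6455 and returning the result in `Sec-WebSocket-Accept`. The sketch below computes only that header value. The function name is hypothetical, while the GUID and the test vector come from the RFC. After the handshake, data flows as frames over the same TCP connection.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    # Server side: base64(SHA-1(key + GUID)), sent back in the
    # Sec-WebSocket-Accept header of the "101 Switching Protocols" response.
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```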
  • 47. Apache Flink & Websockets. Data sinks consume DataSets and are used to store or return them. Flink comes with a variety of built-in output formats that are encapsulated behind operations on the DataSet: writeAsText(), writeAsFormattedText(), writeAsCsv(), print(), write(). We've developed a WebsocketSink that enables Flink to send its output to a given websocket endpoint. It is based on the javax-websocket-client-api 1.1 spec.
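The general shape of such a sink can be sketched by mirroring the open/invoke/close lifecycle of Flink's (Java) RichSinkFunction. The sink opens one connection per parallel instance, sends one message per record, and closes the connection on shutdown. This Python sketch is hypothetical and is not Treelogic's WebsocketSink. `WebsocketSinkSketch`, the `Transport` interface, and `RecordingTransport` are illustrative names only, and the real sink uses a javax.websocket client rather than an injected transport. JSON serialization of records is also an assumption.

```python
import json
from typing import Any, Callable, List, Optional, Protocol


class Transport(Protocol):
    # Hypothetical minimal interface of a websocket client session.
    def send_text(self, message: str) -> None: ...
    def close(self) -> None: ...


class WebsocketSinkSketch:
    """One connection per sink instance, one text message per record."""

    def __init__(self, connect: Callable[[], Transport]) -> None:
        self._connect = connect  # factory that opens a session (assumed)
        self._session: Optional[Transport] = None

    def open(self) -> None:
        # Connect once, when the sink instance starts.
        self._session = self._connect()

    def invoke(self, record: Any) -> None:
        # Serialize each incoming record and push it to the endpoint.
        if self._session is None:
            raise RuntimeError("sink used before open()")
        self._session.send_text(json.dumps(record))

    def close(self) -> None:
        if self._session is not None:
            self._session.close()
            self._session = None


class RecordingTransport:
    # In-memory stand-in to exercise the sink without a network.
    def __init__(self) -> None:
        self.sent: List[str] = []
        self.closed = False

    def send_text(self, message: str) -> None:
        self.sent.append(message)

    def close(self) -> None:
        self.closed = True
```

Opening the connection in `open()` rather than per record avoids a handshake for every element, which is where the websocket latency advantage would otherwise be lost.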
  • 52. ProteicJS: researching visualization. We are currently researching new ways of visualizing data and ML models.
  • 54. How to get it all: https://github.com/proteus-h2020/proteus-docker
  • 55. Advanced data science algorithms applied to scalable stream processing. David Piris Valenzuela, Nacho García Fernández. Ignacio.g.Fernandez@treelogic.com @0xNacho david.piris@treelogic.com @davidpiris