SlideShare a Scribd company logo
APACHE IGNITE AS A DATA PROCESSING HUB
ROMAN SHTYKH
CYBERAGENT, INC.
See all the presentations from the In-Memory Computing
Summit at http://imcsummit.org
INTRODUCTION
ABOUT ME
Roman Shtykh
 R&D Engineer at CyberAgent, Inc.
 Areas of focus
 Data streaming and NLP
 Committer on the Apache Ignite and MyBatis projects
 Judoka
 @rshtykh
CYBERAGENT, INC.
 Internet ads
 Games
 Media
 Investing
25%
13%
52%
3%
7%
Games
Media
Internet ads
Investing
Other
* As of Sep 2015
AMEBA SERVICES
・ Monthly visitors (DUB total):
6 billion*
・ Number of member users :
about 39 million*
CyberAgent, Inc.
Ameba Services
* As of Dec 2014
• Games
• Community services
• Content curation
• Other
AMEBA SERVICES
Ameba Pigg
CONTENTS
 Apache Ignite
 Feed your data
 Log Aggregation with Apache Flume
 Integration with Apache Ignite
 Streaming Data with Apache Kafka
 Data Pipeline with Kafka and Ignite: Example
APACHE IGNITE
 “High-performance, integrated and distributed in-memory platform for computing and transacting on
large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or
flash-based technologies.”
 High performance, unlimited scalability and resiliency
 High-performance transactions and fast analytics
 Hadoop Acceleration, Apache Spark
 Apache project
https://ignite.apache.org/
MAKING APACHE IGNITE A DATA PROCESSING HUB
 Question: How to feed data?
 A simple solution: Create a client node
MAKING APACHE IGNITE A DATA PROCESSING HUB
 Question: How to feed data?
 A simple solution: Create a client node
 Is it reliable?
 Does it scale?
 Ignite-only solution?
 Does it keep your operational costs low?
MAKING APACHE IGNITE A DATA PROCESSING HUB
 Question: How to feed data?
 A simple solution: Create a client node
 Is it reliable?
 Does it scale?
 Ignite-only solution?
 Does it keep your operational costs low?
LOG AGGREGATION
WITH APACHE
FLUME
LOG AGGREGATION WITH APACHE FLUME
 Flume
 “Distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log
data.”
 Scalable
 Flexible
 Robust and fault tolerant
 Declarative configuration
 Apache project
DATA FLOW IN FLUME
Source Sink
Agent
ChannelIncoming data
to another Agent
or Destination
DATA FLOW IN FLUME (REPLICATION/MULTIPLEXING)
Source
Sink
Agent
Channel
Incoming data
SinkChannelChannel Selector
DATA FLOW IN FLUME (RELIABILITY)
 No data is lost (configurable)
Source Sink
Agent
ChannelIncoming data
Source tx Sink tx
LOG TRANSFER AT AMEBA
Ameba
Service
Aggregato
r
Aggregato
r
Aggregat
or
Monitoring
Recommend
er System
Elastic
Search
Hadoop
Batch processing
HBase
Stream Processing
(Onix)
Stream Processing
(HBaseSink)
Ameba
Service
Ameba
Service
LOG TRANSFER AT AMEBA
 Web Hosts
 More than 1600
 Size
 5.0 TB/day (raw)
 Traffic at peak
 160Mbps (compressed)
IGNITE SINK
 Reads Flume events from a channel
 With a user-implemented pluggable transformer converts them into cacheable entries
 Adding it requires no modification to the existing architecture
FLUME ⇒ IGNITE (1)
Source
Ignite
Sink
Agent
ChannelIncoming data
new connection
FLUME ⇒ IGNITE (2)
Source
Ignite
Sink
Agent
ChannelIncoming data
Sink tx
start tx
FLUME ⇒ IGNITE (3)
Source
Ignite
Sink
Agent
ChannelIncoming data
Sink tx
take
event send events
ENABLING FLUME SINK
 Steps
1. Implement EventTransformer
 convert Flume events into cacheable entries (java.util.Map<K, V>)
2. Put transformer’s jar to ${FLUME_HOME}/plugins.d/ignite/lib
3. Put IgniteSink and Ignite core jar files to ${FLUME_HOME}/plugins.d/ignite/libext
4. Set up a Flume agent
 Sink setup
a1.sinks.k1.type = org.apache.ignite.stream.flume.IgniteSink
a1.sinks.k1.igniteCfg = /some-path/ignite.xml
a1.sinks.k1.cacheName = testCache
a1.sinks.k1.eventTransformer = my.company.MyEventTransformer
a1.sinks.k1.batchSize = 100
FLUME SINKS
 HDFS
 THRIFT
 AVRO
 HBASE
 ElasticSearch
 IRC
 IGNITE
APACHE FLUME & APACHE IGNITE
 If you do data aggregation with Flume
 Adding an Ignite cluster is as simple as writing a simple data transformer and deploying a new Flume agent
 If you store your data (and do computations) in Ignite
 Improving data injection becomes easy with Flume sink
 Combining Apache Flume and Ignite makes/keeps your data pipeline (both aggregation and processing)
 Scalable
 Reliable
 Highly-Performant
STREAMING DATA
WITH APACHE KAFKA
APACHE KAFKA
“Publish-subscribe messaging rethought as a distributed commit log”
 Low latency
 High Throughput
 Partitioned and Replicated
 Kafka is an essential component of any data pipeline today
http://kafka.apache.org/
APACHE KAFKA
 Messages are grouped in topics
 Each partition is a log
 Each partition is managed by a broker
(when replicated, one broker is the partition leader)
 Producers & consumers (consumer groups)
 Used for
 Log aggregation
 Activity tracking
 Monitoring
 Stream processing
http://kafka.apache.org/documentation.html
KAFKA CONNECT
 Designed for large scale stream data integration using Kafka
 Provides an abstraction from communication with your Kafka cluster
 Offset management
 Delivery semantics
 Fault tolerance
 Monitoring, etc.
 Worker (scalability & fault tolerance)
 Connector (task config)
 Task (thread)
 Standalone & Distributed execution models
http://www.confluent.io/blog/apache-kafka-0.9-is-released
INGESTING DATA STREAMS
 Two ways
 Kafka Streamer
 Sink Connector
SQL queries Distributed closures
Transactions
Connect
ETL
STREAMING VIA SINK CONNECTOR
 Configure your connector
 Configure Kafka Connect worker
 Start your connector
# connector
name=my-ignite-connector
connector.class=IgniteSinkConnector
tasks.max=2
topics=someTopic1,someTopic2
# cache
cacheName=myCache
cacheAllowOverwrite=true
igniteCfg=/some-path/ignite.xml
$ bin/connect-standalone.sh myconfig/connect-standalone.properties myconfig/ignite-connector.properties
STREAMING VIA SINK CONNECTOR
 Easy data pipeline
 Records from Kafka are written to Ignite grid via high-performance IgniteDataStreamer
 At-least-once delivery guarantee
 As of 1.6, start a new connector to write to a different cache
a b c d e
0 1 2 … Kafka offsets
a.key, a.val
b.key, b.val
…
a2 b2 c2 d2 e2
INGESTING DATA STREAMS
 Bi-directional streaming
SQL queries Distributed closures
Transactions
Connect
Events
Continuous queries
ConnectSin
k
Sourc
e
STREAMING BACK TO KAFKA
 Listening to cache events
 PUT
 READ
 REMOVED
 EXPIRED, etc.
 Remote filtering can be enabled
 Kafka Connect offsets are ignored
 Currently, no delivery guarantees
evt1
evt2
evt3
as records
ENABLING SOURCE CONNECTOR
 Configure your connector
 Define a remote filter if needed
cacheFilterCls=MyCacheEventFilter
 Make sure that event listening is enabled
on the server nodes
 Configure Kafka Connect worker
 Start your connector
#connector
name=ignite-src-connector
connector.class=org.apache.ignite.stream.kafka.connect.IgniteSourceConn
ector
tasks.max=2
#topics, events
topicNames=test
cacheEvts=put,removed
#cache
cacheName=myCache
igniteCfg=myconfig/ignite.xml
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.ignite.stream.kafka.connect.serialization.CacheEventCo
nverter
APACHE KAFKA & APACHE IGNITE
 If you do data streaming with Kafka
 Adding an Ignite cluster is as simple as writing a configuration file (and creating a filter if you need it for source)
 If you store your data (and do computations) in Ignite
 Improving data injection and listening for events on data becomes easy with Kafka Connectors
 Combining Apache Kafka and Ignite makes/keeps your data pipeline
 Scalable
 Reliable
 Highly-Performant
 Covers a wide range of ETL contexts
DATA PIPELINE WITH
KAFKA AND IGNITE
EXAMPLE
DATA PIPELINE WITH KAFKA AND IGNITE
 Requirements
 instant processing and analysis
 scalable and resilient to failures
 low latency
 high throughput
 flexibility
DATA PIPELINE WITH KAFKA AND IGNITE
 Filter and aggregate events
data Flume
filter/transform
data
slow down on heavy loads
more channels/layers
DATA PIPELINE WITH KAFKA AND IGNITE
data
filter
transfor
m
etc.
• Parsimonious resource use
• Replay enabled
• More operations on streams
• Flexibility
Other
source
s
DATA PIPELINE WITH KAFKA AND IGNITE
 Filter and aggregate events
 Store events
 Notify about updates on aggregates
data
filter
transfor
m
etc.
Connectors
DATA PIPELINE WITH KAFKA AND IGNITE
 Filter and aggregate events
 Store events
 Notify about updates on aggregates
data
filter
transfor
m
etc.
Connectors
DATA PIPELINE WITH KAFKA AND IGNITE
 Improving ads delivery
clicks
impressions
ads
Ads
delivery
Ads
recommender
storage/
computatio
n
Image
storage
data & computation
in one place
DATA PIPELINE WITH KAFKA AND IGNITE
 Improving ads delivery
 Better network utilization and reliability
clicks
impression
s
ads
Ads
delivery
Ads
recommende
r
storage/
computatio
n
Image
storag
e
Anomaly
detection
OTHER
INTEGRATIONS
OTHER COMPLETED INTEGRATIONS
 CAMEL
 MQTT
 STORM
 FLINK SINK
 TWITTER
THE END
Ad

More Related Content

What's hot (19)

Beyond Relational
Beyond RelationalBeyond Relational
Beyond Relational
Lynn Langit
 
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
DataWorks Summit/Hadoop Summit
 
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
Spark Summit
 
GridGain 6.0: Open Source In-Memory Computing Platform - Nikita Ivanov
GridGain 6.0: Open Source In-Memory Computing Platform - Nikita IvanovGridGain 6.0: Open Source In-Memory Computing Platform - Nikita Ivanov
GridGain 6.0: Open Source In-Memory Computing Platform - Nikita Ivanov
JAXLondon2014
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
Big Data Spain
 
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Spark Summit
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
Sohil Jain
 
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
In-Memory Computing Summit
 
Architecture at Scale
Architecture at ScaleArchitecture at Scale
Architecture at Scale
Elasticsearch
 
Self-Service Provisioning and Hadoop Management with Apache Ambari
Self-Service Provisioning and  Hadoop Management with Apache AmbariSelf-Service Provisioning and  Hadoop Management with Apache Ambari
Self-Service Provisioning and Hadoop Management with Apache Ambari
DataWorks Summit
 
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS MeetupChallenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Andrei Savu
 
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Spark Summit
 
Kinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-diveKinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-dive
Yifeng Jiang
 
Big Data Tools in AWS
Big Data Tools in AWSBig Data Tools in AWS
Big Data Tools in AWS
Shu-Jeng Hsieh
 
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoTApache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
Denis Magda
 
HDInsight Informative articles
HDInsight Informative articlesHDInsight Informative articles
HDInsight Informative articles
Karan Gulati
 
Bursting on-premise analytic workloads to Amazon EMR using Alluxio
Bursting on-premise analytic workloads to Amazon EMR using AlluxioBursting on-premise analytic workloads to Amazon EMR using Alluxio
Bursting on-premise analytic workloads to Amazon EMR using Alluxio
Alluxio, Inc.
 
In Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging serviceIn Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging service
DataWorks Summit/Hadoop Summit
 
Cloud native data platform
Cloud native data platformCloud native data platform
Cloud native data platform
Li Gao
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond Relational
Lynn Langit
 
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
DataWorks Summit/Hadoop Summit
 
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
Spark Summit
 
GridGain 6.0: Open Source In-Memory Computing Platform - Nikita Ivanov
GridGain 6.0: Open Source In-Memory Computing Platform - Nikita IvanovGridGain 6.0: Open Source In-Memory Computing Platform - Nikita Ivanov
GridGain 6.0: Open Source In-Memory Computing Platform - Nikita Ivanov
JAXLondon2014
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
Big Data Spain
 
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Spark Summit
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
Sohil Jain
 
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
In-Memory Computing Summit
 
Architecture at Scale
Architecture at ScaleArchitecture at Scale
Architecture at Scale
Elasticsearch
 
Self-Service Provisioning and Hadoop Management with Apache Ambari
Self-Service Provisioning and  Hadoop Management with Apache AmbariSelf-Service Provisioning and  Hadoop Management with Apache Ambari
Self-Service Provisioning and Hadoop Management with Apache Ambari
DataWorks Summit
 
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS MeetupChallenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Andrei Savu
 
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Spark Summit
 
Kinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-diveKinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-dive
Yifeng Jiang
 
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoTApache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
Denis Magda
 
HDInsight Informative articles
HDInsight Informative articlesHDInsight Informative articles
HDInsight Informative articles
Karan Gulati
 
Bursting on-premise analytic workloads to Amazon EMR using Alluxio
Bursting on-premise analytic workloads to Amazon EMR using AlluxioBursting on-premise analytic workloads to Amazon EMR using Alluxio
Bursting on-premise analytic workloads to Amazon EMR using Alluxio
Alluxio, Inc.
 
Cloud native data platform
Cloud native data platformCloud native data platform
Cloud native data platform
Li Gao
 

Viewers also liked (20)

August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
Yahoo Developer Network
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
DataStax
 
Apache ignite Datagrid
Apache ignite DatagridApache ignite Datagrid
Apache ignite Datagrid
Surinder Mehra
 
IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...
IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...
IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...
In-Memory Computing Summit
 
JCConf 2016 - Cloud Computing Applications - Hazelcast, Spark and Ignite
JCConf 2016 - Cloud Computing Applications - Hazelcast, Spark and IgniteJCConf 2016 - Cloud Computing Applications - Hazelcast, Spark and Ignite
JCConf 2016 - Cloud Computing Applications - Hazelcast, Spark and Ignite
Joseph Kuo
 
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
In-Memory Computing Summit
 
scale14x-bigtop-overview-roadmap
scale14x-bigtop-overview-roadmapscale14x-bigtop-overview-roadmap
scale14x-bigtop-overview-roadmap
Nate D'Amico
 
Processamento em Big Data
Processamento em Big DataProcessamento em Big Data
Processamento em Big Data
Luiz Henrique Zambom Santana
 
Petrolook Dataflow Diagram
Petrolook Dataflow DiagramPetrolook Dataflow Diagram
Petrolook Dataflow Diagram
mriddel
 
Hadoop 3.0 features
Hadoop 3.0 featuresHadoop 3.0 features
Hadoop 3.0 features
anand murari
 
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent MemoryIMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
In-Memory Computing Summit
 
10 Things About Spark
10 Things About Spark 10 Things About Spark
10 Things About Spark
Roger Brinkley
 
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage TierIMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X PlatformIMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
In-Memory Computing Summit
 
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Vasia Kalavri
 
The Evolution of Apache Kylin
The Evolution of Apache KylinThe Evolution of Apache Kylin
The Evolution of Apache Kylin
DataWorks Summit/Hadoop Summit
 
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
Yahoo Developer Network
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
DataStax
 
Apache ignite Datagrid
Apache ignite DatagridApache ignite Datagrid
Apache ignite Datagrid
Surinder Mehra
 
IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...
IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...
IMCSummit 2015 - Day 2 Developer Track - Anatomy of an In-Memory Data Fabric:...
In-Memory Computing Summit
 
JCConf 2016 - Cloud Computing Applications - Hazelcast, Spark and Ignite
JCConf 2016 - Cloud Computing Applications - Hazelcast, Spark and IgniteJCConf 2016 - Cloud Computing Applications - Hazelcast, Spark and Ignite
JCConf 2016 - Cloud Computing Applications - Hazelcast, Spark and Ignite
Joseph Kuo
 
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
In-Memory Computing Summit
 
scale14x-bigtop-overview-roadmap
scale14x-bigtop-overview-roadmapscale14x-bigtop-overview-roadmap
scale14x-bigtop-overview-roadmap
Nate D'Amico
 
Petrolook Dataflow Diagram
Petrolook Dataflow DiagramPetrolook Dataflow Diagram
Petrolook Dataflow Diagram
mriddel
 
Hadoop 3.0 features
Hadoop 3.0 featuresHadoop 3.0 features
Hadoop 3.0 features
anand murari
 
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent MemoryIMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
In-Memory Computing Summit
 
10 Things About Spark
10 Things About Spark 10 Things About Spark
10 Things About Spark
Roger Brinkley
 
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage TierIMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X PlatformIMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
In-Memory Computing Summit
 
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Vasia Kalavri
 
Ad

Similar to IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub (20)

Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
Timothy Spann
 
Connect K of SMACK:pykafka, kafka-python or?
Connect K of SMACK:pykafka, kafka-python or?Connect K of SMACK:pykafka, kafka-python or?
Connect K of SMACK:pykafka, kafka-python or?
Micron Technology
 
OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
Timothy Spann
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
Joe Stein
 
Streaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache KafkaStreaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache Kafka
Attunity
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Timothy Spann
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
Paolo Castagna
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
Brian Ritchie
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
Timothy Spann
 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Timothy Spann
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQL
Nick Dearden
 
Apache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformApache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platform
confluent
 
Apache Kafka - A Distributed Streaming Platform
Apache Kafka - A Distributed Streaming PlatformApache Kafka - A Distributed Streaming Platform
Apache Kafka - A Distributed Streaming Platform
Paolo Castagna
 
DBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data Lakes
Timothy Spann
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
Knoldus Inc.
 
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Timothy Spann
 
Learn Apache Kafka Online | Comprehensive Kafka Course & Training
Learn Apache Kafka Online | Comprehensive Kafka Course & TrainingLearn Apache Kafka Online | Comprehensive Kafka Course & Training
Learn Apache Kafka Online | Comprehensive Kafka Course & Training
Accentfuture
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
sureshraj43
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
Timothy Spann
 
Connect K of SMACK:pykafka, kafka-python or?
Connect K of SMACK:pykafka, kafka-python or?Connect K of SMACK:pykafka, kafka-python or?
Connect K of SMACK:pykafka, kafka-python or?
Micron Technology
 
OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
Timothy Spann
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
Joe Stein
 
Streaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache KafkaStreaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache Kafka
Attunity
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Timothy Spann
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
Paolo Castagna
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
Brian Ritchie
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
Timothy Spann
 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Timothy Spann
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQL
Nick Dearden
 
Apache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformApache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platform
confluent
 
Apache Kafka - A Distributed Streaming Platform
Apache Kafka - A Distributed Streaming PlatformApache Kafka - A Distributed Streaming Platform
Apache Kafka - A Distributed Streaming Platform
Paolo Castagna
 
DBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data Lakes
Timothy Spann
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
Knoldus Inc.
 
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Timothy Spann
 
Learn Apache Kafka Online | Comprehensive Kafka Course & Training
Learn Apache Kafka Online | Comprehensive Kafka Course & TrainingLearn Apache Kafka Online | Comprehensive Kafka Course & Training
Learn Apache Kafka Online | Comprehensive Kafka Course & Training
Accentfuture
 
Ad

More from In-Memory Computing Summit (19)

IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise GradeIMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of StatelessnessIMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...
IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...
IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...
In-Memory Computing Summit
 
IMC Summit 2016 Keynote - Arthur Sainio - NVDIMM: Changes are Here So What’s ...
IMC Summit 2016 Keynote - Arthur Sainio - NVDIMM: Changes are Here So What’s ...IMC Summit 2016 Keynote - Arthur Sainio - NVDIMM: Changes are Here So What’s ...
IMC Summit 2016 Keynote - Arthur Sainio - NVDIMM: Changes are Here So What’s ...
In-Memory Computing Summit
 
IMC Summit 2016 Keynote - Robert Barr - In Memory Computing for Financial Ser...
IMC Summit 2016 Keynote - Robert Barr - In Memory Computing for Financial Ser...IMC Summit 2016 Keynote - Robert Barr - In Memory Computing for Financial Ser...
IMC Summit 2016 Keynote - Robert Barr - In Memory Computing for Financial Ser...
In-Memory Computing Summit
 
IMC Summit 2016 Keynote - Jason Stamper - In-Memory: The Foundation of the In...
IMC Summit 2016 Keynote - Jason Stamper - In-Memory: The Foundation of the In...IMC Summit 2016 Keynote - Jason Stamper - In-Memory: The Foundation of the In...
IMC Summit 2016 Keynote - Jason Stamper - In-Memory: The Foundation of the In...
In-Memory Computing Summit
 
IMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory Easy
IMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory EasyIMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory Easy
IMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory Easy
In-Memory Computing Summit
 
IMCSummit 2016 Keynote - Abe Kleinfeld - The In-Memory Computing Landscape: L...
IMCSummit 2016 Keynote - Abe Kleinfeld - The In-Memory Computing Landscape: L...IMCSummit 2016 Keynote - Abe Kleinfeld - The In-Memory Computing Landscape: L...
IMCSummit 2016 Keynote - Abe Kleinfeld - The In-Memory Computing Landscape: L...
In-Memory Computing Summit
 
Accelerating the Hadoop data stack with Apache Ignite, Spark and Bigtop
Accelerating the Hadoop data stack with Apache Ignite, Spark and BigtopAccelerating the Hadoop data stack with Apache Ignite, Spark and Bigtop
Accelerating the Hadoop data stack with Apache Ignite, Spark and Bigtop
In-Memory Computing Summit
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise GradeIMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of StatelessnessIMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...
IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...
IMC Summit 2016 Breakout - William Bain - Implementing Extensible Data Struct...
In-Memory Computing Summit
 
IMC Summit 2016 Keynote - Arthur Sainio - NVDIMM: Changes are Here So What’s ...
IMC Summit 2016 Keynote - Arthur Sainio - NVDIMM: Changes are Here So What’s ...IMC Summit 2016 Keynote - Arthur Sainio - NVDIMM: Changes are Here So What’s ...
IMC Summit 2016 Keynote - Arthur Sainio - NVDIMM: Changes are Here So What’s ...
In-Memory Computing Summit
 
IMC Summit 2016 Keynote - Robert Barr - In Memory Computing for Financial Ser...
IMC Summit 2016 Keynote - Robert Barr - In Memory Computing for Financial Ser...IMC Summit 2016 Keynote - Robert Barr - In Memory Computing for Financial Ser...
IMC Summit 2016 Keynote - Robert Barr - In Memory Computing for Financial Ser...
In-Memory Computing Summit
 
IMC Summit 2016 Keynote - Jason Stamper - In-Memory: The Foundation of the In...
IMC Summit 2016 Keynote - Jason Stamper - In-Memory: The Foundation of the In...IMC Summit 2016 Keynote - Jason Stamper - In-Memory: The Foundation of the In...
IMC Summit 2016 Keynote - Jason Stamper - In-Memory: The Foundation of the In...
In-Memory Computing Summit
 
IMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory Easy
IMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory EasyIMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory Easy
IMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory Easy
In-Memory Computing Summit
 
IMCSummit 2016 Keynote - Abe Kleinfeld - The In-Memory Computing Landscape: L...
IMCSummit 2016 Keynote - Abe Kleinfeld - The In-Memory Computing Landscape: L...IMCSummit 2016 Keynote - Abe Kleinfeld - The In-Memory Computing Landscape: L...
IMCSummit 2016 Keynote - Abe Kleinfeld - The In-Memory Computing Landscape: L...
In-Memory Computing Summit
 
Accelerating the Hadoop data stack with Apache Ignite, Spark and Bigtop
Accelerating the Hadoop data stack with Apache Ignite, Spark and BigtopAccelerating the Hadoop data stack with Apache Ignite, Spark and Bigtop
Accelerating the Hadoop data stack with Apache Ignite, Spark and Bigtop
In-Memory Computing Summit
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
In-Memory Computing Summit
 

Recently uploaded (20)

What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025
kashifyounis067
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
Andre Hora
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
EASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License CodeEASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License Code
aneelaramzan63
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Dele Amefo
 
Solidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license codeSolidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license code
aneelaramzan63
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025
kashifyounis067
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
Andre Hora
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
EASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License CodeEASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License Code
aneelaramzan63
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Dele Amefo
 
Solidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license codeSolidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license code
aneelaramzan63
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 

IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub

  • 1. APACHE IGNITE AS A DATA PROCESSING HUB ROMAN SHTYKH CYBERAGENT, INC. See all the presentations from the In-Memory Computing Summit at http://imcsummit.org
  • 3. ABOUT ME Roman Shtykh  R&D Engineer at CyberAgent, Inc.  Areas of focus  Data streaming and NLP  Committer on the Apache Ignite and MyBatis projects  Judoka  @rshtykh
  • 4. CYBERAGENT, INC.  Internet ads  Games  Media  Investing 25% 13% 52% 3% 7% Games Media Internet ads Investing Other * As of Sep 2015
  • 5. AMEBA SERVICES ・ Monthly visitors (DUB total): 6 billion* ・ Number of member users : about 39 million* CyberAgent, Inc. Ameba Services * As of Dec 2014 • Games • Community services • Content curation • Other
  • 7. CONTENTS  Apache Ignite  Feed your data  Log Aggregation with Apache Flume  Integration with Apache Ignite  Streaming Data with Apache Kafka  Data Pipeline with Kafka and Ignite: Example
  • 8. APACHE IGNITE  “High-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash-based technologies.”  High performance, unlimited scalability and resiliency  High-performance transactions and fast analytics  Hadoop Acceleration, Apache Spark  Apache project https://ignite.apache.org/
  • 9. MAKING APACHE IGNITE A DATA PROCESSING HUB  Question: How to feed data?  A simple solution: Create a client node
  • 10. MAKING APACHE IGNITE A DATA PROCESSING HUB  Question: How to feed data?  A simple solution: Create a client node  Is it reliable?  Does it scale?  Ignite-only solution?  Does it keep your operational costs low?
  • 11. MAKING APACHE IGNITE A DATA PROCESSING HUB  Question: How to feed data?  A simple solution: Create a client node  Is it reliable?  Does it scale?  Ignite-only solution?  Does it keep your operational costs low?
  • 13. LOG AGGREGATION WITH APACHE FLUME  Flume  “Distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.”  Scalable  Flexible  Robust and fault tolerant  Declarative configuration  Apache project
  • 14. DATA FLOW IN FLUME Source Sink Agent ChannelIncoming data to another Agent or Destination
  • 15. DATA FLOW IN FLUME (REPLICATION/MULTIPLEXING) Source Sink Agent Channel Incoming data SinkChannelChannel Selector
  • 16. DATA FLOW IN FLUME (RELIABILITY)  No data is lost (configurable) Source Sink Agent ChannelIncoming data Source tx Sink tx
  • 17. LOG TRANSFER AT AMEBA Ameba Service Aggregato r Aggregato r Aggregat or Monitoring Recommend er System Elastic Search Hadoop Batch processing HBase Stream Processing (Onix) Stream Processing (HBaseSink) Ameba Service Ameba Service
  • 18. LOG TRANSFER AT AMEBA  Web Hosts  More than 1600  Size  5.0 TB/day (raw)  Traffic at peak  160Mbps (compressed)
  • 19. IGNITE SINK  Reads Flume events from a channel  With a user-implemented pluggable transformer converts them into cacheable entries  Adding it requires no modification to the existing architecture
  • 20. FLUME ⇒ IGNITE (1) Source Ignite Sink Agent ChannelIncoming data new connection
  • 21. FLUME ⇒ IGNITE (2) Source Ignite Sink Agent ChannelIncoming data Sink tx start tx
  • 22. FLUME ⇒ IGNITE (3) Source Ignite Sink Agent ChannelIncoming data Sink tx take event send events
  • 23. ENABLING FLUME SINK  Steps 1. Implement EventTransformer  convert Flume events into cacheable entries (java.util.Map<K, V>) 2. Put transformer’s jar to ${FLUME_HOME}/plugins.d/ignite/lib 3. Put IgniteSink and Ignite core jar files to ${FLUME_HOME}/plugins.d/ignite/libext 4. Set up a Flume agent  Sink setup a1.sinks.k1.type = org.apache.ignite.stream.flume.IgniteSink a1.sinks.k1.igniteCfg = /some-path/ignite.xml a1.sinks.k1.cacheName = testCache a1.sinks.k1.eventTransformer = my.company.MyEventTransformer a1.sinks.k1.batchSize = 100
  • 24. FLUME SINKS  HDFS  THRIFT  AVRO  HBASE  ElasticSearch  IRC  IGNITE
  • 25. APACHE FLUME & APACHE IGNITE  If you do data aggregation with Flume  Adding an Ignite cluster is as simple as writing a simple data transformer and deploying a new Flume agent  If you store your data (and do computations) in Ignite  Improving data injection becomes easy with Flume sink  Combining Apache Flume and Ignite makes/keeps your data pipeline (both aggregation and processing)  Scalable  Reliable  Highly-Performant
  • 27. APACHE KAFKA “Publish-subscribe messaging rethought as a distributed commit log”  Low latency  High Throughput  Partitioned and Replicated  Kafka is an essential component of any data pipeline today http://kafka.apache.org/
  • 28. APACHE KAFKA  Messages are grouped in topics  Each partition is a log  Each partition is managed by a broker (when replicated, one broker is the partition leader)  Producers & consumers (consumer groups)  Used for  Log aggregation  Activity tracking  Monitoring  Stream processing http://kafka.apache.org/documentation.html
  • 29. KAFKA CONNECT  Designed for large scale stream data integration using Kafka  Provides an abstraction from communication with your Kafka cluster  Offset management  Delivery semantics  Fault tolerance  Monitoring, etc.  Worker (scalability & fault tolerance)  Connector (task config)  Task (thread)  Standalone & Distributed execution models http://www.confluent.io/blog/apache-kafka-0.9-is-released
  • 30. INGESTING DATA STREAMS  Two ways  Kafka Streamer  Sink Connector SQL queries Distributed closures Transactions Connect ETL
  • 31. STREAMING VIA SINK CONNECTOR  Configure your connector  Configure Kafka Connect worker  Start your connector # connector name=my-ignite-connector connector.class=IgniteSinkConnector tasks.max=2 topics=someTopic1,someTopic2 # cache cacheName=myCache cacheAllowOverwrite=true igniteCfg=/some-path/ignite.xml $ bin/connect-standalone.sh myconfig/connect-standalone.properties myconfig/ignite-connector.properties
  • 32. STREAMING VIA SINK CONNECTOR  Easy data pipeline  Records from Kafka are written to Ignite grid via high-performance IgniteDataStreamer  At-least-once delivery guarantee  As of 1.6, start a new connector to write to a different cache a b c d e 0 1 2 … Kafka offsets a.key, a.val b.key, b.val … a2 b2 c2 d2 e2
  • 33. INGESTING DATA STREAMS  Bi-directional streaming SQL queries Distributed closures Transactions Connect Events Continuous queries ConnectSin k Sourc e
  • 34. STREAMING BACK TO KAFKA  Listening to cache events  PUT  READ  REMOVED  EXPIRED, etc.  Remote filtering can be enabled  Kafka Connect offsets are ignored  Currently, no delivery guarantees evt1 evt2 evt3 as records
  • 35. ENABLING SOURCE CONNECTOR  Configure your connector  Define a remote filter if needed cacheFilterCls=MyCacheEventFilter  Make sure that event listening is enabled on the server nodes  Configure Kafka Connect worker  Start your connector #connector name=ignite-src-connector connector.class=org.apache.ignite.stream.kafka.connect.IgniteSourceConn ector tasks.max=2 #topics, events topicNames=test cacheEvts=put,removed #cache cacheName=myCache igniteCfg=myconfig/ignite.xml key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.ignite.stream.kafka.connect.serialization.CacheEventCo nverter
  • 36. APACHE KAFKA & APACHE IGNITE  If you do data streaming with Kafka  Adding an Ignite cluster is as simple as writing a configuration file (and creating a filter if you need it for source)  If you store your data (and do computations) in Ignite  Improving data injection and listening for events on data becomes easy with Kafka Connectors  Combining Apache Kafka and Ignite makes/keeps your data pipeline  Scalable  Reliable  Highly-Performant  Covers a wide range of ETL contexts
  • 37. DATA PIPELINE WITH KAFKA AND IGNITE EXAMPLE
  • 38. DATA PIPELINE WITH KAFKA AND IGNITE  Requirements  instant processing and analysis  scalable and resilient to failures  low latency  high throughput  flexibility
  • 39. DATA PIPELINE WITH KAFKA AND IGNITE  Filter and aggregate events data Flume filter/transform data slow down on heavy loads more channels/layers
  • 40. DATA PIPELINE WITH KAFKA AND IGNITE data filter transfor m etc. • Parsimonious resource use • Replay enabled • More operations on streams • Flexibility Other source s
  • 41. DATA PIPELINE WITH KAFKA AND IGNITE  Filter and aggregate events  Store events  Notify about updates on aggregates data filter transfor m etc. Connectors
  • 42. DATA PIPELINE WITH KAFKA AND IGNITE  Filter and aggregate events  Store events  Notify about updates on aggregates data filter transfor m etc. Connectors
  • 43. DATA PIPELINE WITH KAFKA AND IGNITE  Improving ads delivery clicks impressions ads Ads delivery Ads recommender storage/ computatio n Image storage data & computation in one place
  • 44. DATA PIPELINE WITH KAFKA AND IGNITE  Improving ads delivery  Better network utilization and reliability clicks impression s ads Ads delivery Ads recommende r storage/ computatio n Image storag e Anomaly detection
  • 46. OTHER COMPLETED INTEGRATIONS  CAMEL  MQTT  STORM  FLINK SINK  TWITTER