SlideShare a Scribd company logo
Building a Real-Time
Data Processing Pipeline
Using Apache Kafka, Kafka Connect,
Elasticsearch, and Kibana
Paul Brebner
Instaclustr—Technology Evangelist
Instaclustr Sponsored Booth Presentation
30 September ApacheCon 2020
©Instaclustr Pty Limited, 2020
Blogs (54): www.instaclustr.com/paul-brebner/
Who Am I? What do I do?
1 year ago (ApacheCon Europe 2019)—Look it’s a light!
©Instaclustr Pty Limited, 2020
Who Is Instaclustr?
©Instaclustr Pty Limited, 2020
A complete ecosystem to support mission
critical applications.
Instaclustr Managed Platform
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
on the
Instaclustr
Managed
Platform
©Instaclustr Pty Limited, 2020
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
on the
Instaclustr
Managed
Platform
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
on the
Instaclustr
Managed
Platform
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
on the
Instaclustr
Managed
Platform
on multiple
cloud providers
This talk…
Focuses on three
recent additions
to our managed
platform:
• Kafka Connect
• Elasticsearch
• Kibana
©Instaclustr Pty Limited, 2020
• Technology Overview
• What’s the Story?
• Data sources
• Provisioning clusters
• Configuring Kafka source and sink connectors
• Elasticsearch mappings
• Kibana Visualizations
• Elasticsearch Ingest Pipeline
• Kibana Maps
• Handling failure
Overview
©Instaclustr Pty Limited, 2020
In general integration can be—complicated…
©Instaclustr Pty Limited, 2020
● Zero-code integration
● High availability
● Elastic scaling independent of Kafka
Source Connectors Sink Connectors
Kafka Connect cluster
Syslog
Kafka Cluster
and many more.. and many more..
Sources Sinks
Or—Easy, with Kafka Connect
What?
Why?
● Distributed solution to integrate Kafka with
other heterogeneous data sources/stores.
● Connectors (source or sink) handle
specifics of particular integrations
o Source Kafka
o Kafka Sink
©Instaclustr Pty Limited, 2020
Elasticsearch—scalable
search of indexed
documents
Kibana—visualization
Open Distro for
Elasticsearch—100% Apache
2.0 licensed
Documents
Indices
Managed Elasticsearch + Kibana
©Instaclustr Pty Limited, 2020
What’s The Story?
©Instaclustr Pty Limited, 2020
What’s The Story?
©Instaclustr Pty Limited, 2020
Kafka Summit—CDC
built Kafka COVID-19
pipeline in < 30 days
What’s The Story?
©Instaclustr Pty Limited, 2020
Instaclustr consultants built
an integration demo using
public climate change data
via REST connectors running
on docker
Kafka Summit—CDC
built Kafka COVID-19
pipeline in < 30 days
What’s The Story?
©Instaclustr Pty Limited, 2020
Instaclustr consultants built
an integration demo using
public climate change data
via REST connectors running
on docker
Idea: Use streaming REST
public data sources
AND deploy on Instaclustr
managed platform
Kafka Summit—CDC
built Kafka COVID-19
pipeline in < 30 days
What’s The Story?
©Instaclustr Pty Limited, 2020
Instaclustr consultants built
an integration demo using
public climate change data
via REST connectors running
on docker
Idea: Use streaming REST
public data sources
AND deploy on Instaclustr
managed platform
Look for public streaming
REST APIs with easy to use
JSON data format, complete
data, interesting domain,
not political or apocalyptic…
Impossible?
Kafka Summit—CDC
built Kafka COVID-19
pipeline in < 30 days
https://oceanservice.noaa.gov/
Success! Tides follow Lunar Day
USA Tidal Data
National Oceanic and Atmospheric Administration
©Instaclustr Pty Limited, 2020
Bonus, NOAA tidal map https://tidesandcurrents.noaa.gov/map/
©Instaclustr Pty Limited, 2020
Bonus, NOAA tidal map https://tidesandcurrents.noaa.gov/map/
What’s here?
©Instaclustr Pty Limited, 2020
©Instaclustr Pty Limited, 2020
©Instaclustr Pty Limited, 2020
API description https://api.tidesandcurrents.noaa.gov/api/prod/
©Instaclustr Pty Limited, 2020
REST Example
Specify station ID, data type and datum
(I used water level, mean sea level), latest data point, JSON
Call
https://api.tidesandcurrents.noaa.gov/api/prod/datagetter?date=latest&station=8724580&
product=water_level&datum=msl&units=metric&time_zone=gmt&application=instaclustr&
format=json
Returns
{"metadata": {
"id":"8724580",
"name":"Key West",
"lat":"24.5508”,
"lon":"-81.8081"},
"data":[{
"t":"2020-09-24 04:18",
"v":"0.597",
"s":"0.005", "f":"1,0,0,0", "q":"p"}]}
©Instaclustr Pty Limited, 2020
REST call
JSON result
Let’s start the pipeline using this
REST API for data sources…
©Instaclustr Pty Limited, 2020
What Else Do We Need?
The Instaclustr Console
Provision Kafka and
Kafka Connect clusters
©Instaclustr Pty Limited, 2020
What Else Do We Need?
Select cloud
provider, region,
instance size and
number, security etc.
©Instaclustr Pty Limited, 2020
What Else Do We Need?
Tell Kafka connect
cluster which Kafka
cluster to use, then
provision
©Instaclustr Pty Limited, 2020
Your IP
Now we have a Kafka and Kafka Connect clusters
©Instaclustr Pty Limited, 2020
Next, find a REST connector, deploy to S3 bucket, tell connect cluster
which bucket, configure connector and run
REST source
connector
Tides Topic
REST call
JSON result
(Automatically created)
©Instaclustr Pty Limited, 2020
BYO connectors instructions
https://www.instaclustr.com/support/documentation/kafka-
connect/accessing-and-using-kafka-connect/updating-custom-
connectors/
curl https://connectorClusterIP:8083/connectors -k -u name:password -X POST -H 'Content-Type: application/json' -d '
{
"name": "source_rest_tide_1",
"config": {
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter":"org.apache.kafka.connect.storage.StringConverter",
"connector.class": "com.tm.kafka.connect.rest.RestSourceConnector",
"tasks.max": "1",
"rest.source.poll.interval.ms": "600000",
"rest.source.method": "GET",
"rest.source.url":
"https://api.tidesandcurrents.noaa.gov/api/prod/datagetter?date=latest&station=8454000&product=water_level&datum=
msl&units=metric&time_zone=gmt&application=instaclustr&format=json",
"rest.source.headers": "Content-Type:application/json,Accept:application/json",
"rest.source.topic.selector": "com.tm.kafka.connect.rest.selector.SimpleTopicSelector",
"rest.source.destination.topics": "tides-topic"
}
}'
REST source connector configuration including connector
name, class, URL, topic
©Instaclustr Pty Limited, 2020
Polls every 10 minutes, writes result to Kafka topic, picked 5 sensors
to use, so 5 connector instances running.
Now have tidal data coming into the tides topic, what next?
REST source
connector
Tides Topic
REST call
JSON result
{"metadata": {
"id":"8724580",
"name":"Key West",
"lat":"24.5508”,
"lon":"-81.8081"},
"data":[{
"t":"2020-09-24 04:18",
"v":"0.597"}]}
©Instaclustr Pty Limited, 2020
Next - Provision Elasticsearch+Kibana clusters
©Instaclustr Pty Limited, 2020
And configure the included Elasticsearch sink connector
to send data to Elasticsearch
REST source
connector
Tides Topic
REST call
JSON result
{"metadata": {
"id":"8724580",
"name":"Key West",
"lat":"24.5508”,
"lon":"-81.8081"},
"data":[{
"t":"2020-09-24 04:18",
"v":"0.597"}]}
Elastic sink connector Tides Index
©Instaclustr Pty Limited, 2020
curl https://connectorClusterIP:8083/connectors -k -u name:password -X POST -H 'Content-Type: application/json' -d '
{
"name" : "elastic-sink-tides",
"config" :
{
"connector.class" : "com.datamountaineer.streamreactor.connect.elastic7.ElasticSinkConnector",
"tasks.max" : 3,
"topics" : "tides",
"connect.elastic.hosts" : ”ip",
"connect.elastic.port" : 9201,
"connect.elastic.kcql" : "INSERT INTO tides-index SELECT * FROM tides-topic",
"connect.elastic.use.http.username" : ”elasticName",
"connect.elastic.use.http.password" : ”elasticPassword"
}
}'
Configure sink connector name, class, index and topic.
The index is created with default mappings if it doesn’t already exist.
©Instaclustr Pty Limited, 2020
REST source
connector
Tides Topic
REST call
JSON result
{"metadata": {
"id":"8724580",
"name":"Key West",
"lat":"24.5508”,
"lon":"-81.8081"},
"data":[{
"t":"2020-09-24 04:18",
"v":"0.597"}]}
Elastic sink connector Tides Index
Great! It’s All Working!? Sort Of!
Tide data arriving in Tides Index!
But, in default index mappings, everything is a String.
To graph them as time series by name need a custom mapping.
©Instaclustr Pty Limited, 2020
{"metadata": {
"id":”String",
"name":”String",
"lat":”String”,
"lon":”String"},
"data":[{
"t":”String",
"v":”String"}]}
curl -u elasticName:elasticPassword ”elasticURL:9201/tides-index" -X PUT -H 'Content-Type: application/json' -d'
{
"mappings" : {
"properties" : {
"data" : {
"properties" : {
"t" : { "type" : "date",
"format" : "yyyy-MM-dd HH:mm"
},
"v" : { "type" : "double" },
"f" : { "type" : "text" },
"q" : { "type" : "text" },
"s" : { "type" : "text" }
}
},
"metadata" : {
"properties" : {
"id" : { "type" : "text" },
"lat" : { "type" : "text" },
"long" : { "type" : "text" },
"name" : { "type" : ”keyword" } }}}} }'
Custom mapping “t” is a date, “v” is a double, and “name” is a keyword.
©Instaclustr Pty Limited, 2020
• Every time you
• Change an Elasticsearch index mapping, you have to
• Delete the index
• Index all the data again
• But where does the data come from?
• Two options:
• Using a Kafka sink connector the data is already in the
Kafka topic, so just replay it, or,
• Use Elasticsearch reindex operation
• The hard part is over, now…
Reindexing!
©Instaclustr Pty Limited, 2020
Start Kibana With A Single Click
©Instaclustr Pty Limited, 2020
©Instaclustr Pty Limited, 2020
Visualization Steps
1: Index Pattern (to get data from Elasticsearch)
Settings -> Index Patterns -> Create Index Pattern -> Define ->
Configure with “t” as timefilter field
2. Create Visualization (to create a graph type)
Visualizations -> Create Visualization -> New Visualization ->
Line -> Choose Source = pattern from 1
3. Configure Graph Settings (to display data correctly)
Select time range, select aggregation for y-axis = average ->
data.v -> select Buckets -> Split series metadata.name -> X-axis
-> Data Histogram = data.t
©Instaclustr Pty Limited, 2020
Time (x axis) vs. average (over 30m) tide level (relative to
average level) in meters for the 5 sample stations
©Instaclustr Pty Limited, 2020
Showing Lunar Day (24 hours 50 minutes)
Lunar Day (24h 50m)
©Instaclustr Pty Limited, 2020
Tidalrange
Showing Tidal Range (high tide – low tide)
©Instaclustr Pty Limited, 2020
By R. Ray, NASA Goddard Space Flight Center, Jet Propulsion Laboratory, Scientific Visualization Studio - TOPEX/Poseidon:
Revealing Hidden Tidal Energy, Public Domain
Tide range varies depending on moon, sun, local geography, and weather!
©Instaclustr Pty Limited, 2020
By R. Ray, NASA Goddard Space Flight Center, Jet Propulsion Laboratory, Scientific Visualization Studio - TOPEX/Poseidon:
Revealing Hidden Tidal Energy, Public Domain
Neah Bay is near here
©Instaclustr Pty Limited, 2020
By R. Ray, NASA Goddard Space Flight Center, Jet Propulsion Laboratory, Scientific Visualization Studio - TOPEX/Poseidon:
Revealing Hidden Tidal Energy, Public Domain
Australia’s Biggest Tide is here
©Instaclustr Pty Limited, 2020
Tides of over 11 meters are forced through two narrow passes
creating the popular tourist attraction known as the Horizontal
Waterfalls in the Kimberley's Talbot Bay.
Next, a map to show the sensor locations to understand tidal ranges
(Photo by Richard Costin)
©Instaclustr Pty Limited, 2020
©Instaclustr Pty Limited, 2020
But, there are no geo-points in the data!
Mapping Steps
1. Add geo-point field to index mapping
2. Create Elasticsearch ingest pipeline to construct new field
3. Add as default ingest pipeline to index
Problem:
Elasticsearch doesn’t
recognize separate lat
lon fields as geo-points
Solution:
Add an Elasticsearch
ingest pipeline to pre-
process documents
before they are
indexed
(Need to reindex again)
©Instaclustr Pty Limited, 2020
curl -u elasticName:elasticPassword ”elasticURL:9201/tides-index" -X PUT -H 'Content-Type: application/json' -d'
{
"mappings" : {
"properties" : {
"data" : {
"properties" : {
"t" : { "type" : "date",
"format" : "yyyy-MM-dd HH:mm"
},
"v" : { "type" : "double" },
"f" : { "type" : "text" },
"q" : { "type" : "text" },
"s" : { "type" : "text" }
}
},
"metadata" : {
"properties" : {
"id" : { "type" : "text" },
"lat" : { "type" : "text" },
"long" : { "type" : "text" },
"location" : { "type" : "geo_point" },
"name" : { "type" : ”keyword" } }}}} }'
1. Add a new “location” field with a geo_point data type to the mapping and index
©Instaclustr Pty Limited, 2020
curl -u elasticName:elasticPassword ”elasticURL:9201/ _ingest/pipeline/locationPipe" -X PUT -H 'Content-Type:
application/json' -d'
{
"description" : ”construct geo-point String field",
"processors" : [
{
"set" : {
"field": "metadata.location",
"value": "{{metadata.lat}},{{metadata.lon}}"
}
}
]
}
'
2. Create new ingest pipeline to construct new location geo-point
String from existing lat lon fields
©Instaclustr Pty Limited, 2020
3. Add locationPipe as default pipeline to the index
curl -u elasticName:elasticPassword ”elasticURL:9201/tides-index/_settings?pretty" -X PUT -H 'Content-Type:
application/json' -d'
{
"index" : {
"default_pipeline" : ”locationPipe"
}
}
'
©Instaclustr Pty Limited, 2020
REST source
connector
Tides Topic
REST call
JSON result
{"metadata": {
"id":"8724580",
"name":"Key West",
"lat":"24.5508”,
"lon":"-81.8081"},
"data":[{
"t":"2020-09-24 04:18",
"v":"0.597"}]}
Elastic sink connector Tides Index
Now we have a pipeline transforming the raw data and adding
geo-point location data in Elasticsearch
{"metadata": {
"id":"8724580",
"name":"Key West",
"lat":"24.5508”,
"lon":"-81.8081”,
”location”: “24.5508,-81.8081”},
"data":[{
"t":"2020-09-24 04:18",
"v":"0.597"}]}
LocationPipe
ingestor
©Instaclustr Pty Limited, 2020
Mapping Visualization Steps
1. Create Visualization
Visualizations -> Create visualization -> New Coordinate Map
-> Select index patterns -> Visualization with default map
2. Configure Graph Settings (to display data correctly)
Select Metrics -> Aggregation (min) -> Field -> data.v -> Buckets -> Geo
coordinates -> Geohash -> Field -> metadata.location
Reuse existing
index pattern
©Instaclustr Pty Limited, 2020
©Instaclustr Pty Limited, 2020
Map showing sensor locations and min values over last week
Add your own custom Web Map Service (WMS) layers
URL https://services.nationalmap.gov/arcgis/services/USGSNAIPPlus/MapServer/WMSServer
Layers 1,2,3,5,6,7,9,10,11,13,14,15,17,18,19,21,22,23,25,26,27,29,30,31,32
©Instaclustr Pty Limited, 2020
REST source
connector
Tides Topic
REST call
JSON result
{"error": {"message":"No
data was found. This
product may not be
offered at this station
at the requested
time."}}
Elastic sink connector Tides Index
What can go wrong? REST call can return error message, but doesn’t treat it as an
error so it’s sent to Tides Topic.
LocationPipe
ingestor
©Instaclustr Pty Limited, 2020
REST source
connector
Tides Topic
REST call
JSON result
{"error": {"message":"No
data was found. This
product may not be
offered at this station
at the requested
time."}}
Elastic sink connector Tides Index
Elastic sink connector tries to read the error message and fails to FAILED state.
Exceptions viewable in the Kafka connect logs topic.
Exceptions viewable in the Kafka connect logs topic.
LocationPipe
ingestor
©Instaclustr Pty Limited, 2020
X
FAILED
X
Connect logs topic
REST source
connector
Tides Topic
REST call
JSON result
{"error": {"message":"No
data was found. This
product may not be
offered at this station
at the requested
time."}}
Elastic sink connector Tides Index
Current workaround is to monitor and regularly restart failed connectors.
Exceptions viewable in the Kafka connect logs topic.
LocationPipe
ingestor
©Instaclustr Pty Limited, 2020
FAILED?
RUNNING
Restart!
X
REST source
connector
Tides Topic
REST call
JSON result
{"error": {"message":"No
data was found. This
product may not be
offered at this station
at the requested
time."}}
Elastic sink connector Tides Index
Better solution - if connectors support KIP-298 “Error Handling in Connect” (not all do)
then configure to ignore input errors.
Errors sent to ”dead letter” topic.
LocationPipe
ingestor
©Instaclustr Pty Limited, 2020
Ignore
Dead letter topic
• Instaclustr consultants, Kafka and Elasticsearch dev teams ,
graphic design and marketing teams
• Zeke, Mussa, Michael, Hendra, Rob, Harvey, Jill, Gina and
more!
• Try us out! Build the same or your own pipeline with our
free trial at Instaclustr.com
Thanks to…
©Instaclustr Pty Limited, 2020
©Instaclustr Pty Limited, 2020
www.instaclustr.com
info@instaclustr.com
@instaclustr
THANK
YOU!
Ad

More Related Content

What's hot (20)

GCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and ProcessingGCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and Processing
confluent
 
Cloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-eventsCloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-events
Neil Avery
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache Kafka
Guido Schmutz
 
Supply Chain Optimization with Apache Kafka
Supply Chain Optimization with Apache KafkaSupply Chain Optimization with Apache Kafka
Supply Chain Optimization with Apache Kafka
Kai Wähner
 
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesEvent Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Kai Wähner
 
Apache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel IndustryApache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel Industry
Kai Wähner
 
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
Kai Wähner
 
Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)
Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)
Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)
Kai Wähner
 
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
confluent
 
Rediscovering the Value of Apache Kafka® in Modern Data Architecture
Rediscovering the Value of Apache Kafka® in Modern Data ArchitectureRediscovering the Value of Apache Kafka® in Modern Data Architecture
Rediscovering the Value of Apache Kafka® in Modern Data Architecture
confluent
 
Apache Kafka® and Analytics in a Connected IoT World
Apache Kafka® and Analytics in a Connected IoT WorldApache Kafka® and Analytics in a Connected IoT World
Apache Kafka® and Analytics in a Connected IoT World
confluent
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming Visualisation
Guido Schmutz
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Kai Wähner
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
confluent
 
Operational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in KafkaOperational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in Kafka
confluent
 
A guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update ConferenceA guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update Conference
Eldert Grootenboer
 
KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache Kafka
Kai Wähner
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
Kai Wähner
 
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureServerless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Kai Wähner
 
Top use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache KafkaTop use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache Kafka
confluent
 
GCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and ProcessingGCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and Processing
confluent
 
Cloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-eventsCloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-events
Neil Avery
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache Kafka
Guido Schmutz
 
Supply Chain Optimization with Apache Kafka
Supply Chain Optimization with Apache KafkaSupply Chain Optimization with Apache Kafka
Supply Chain Optimization with Apache Kafka
Kai Wähner
 
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesEvent Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Kai Wähner
 
Apache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel IndustryApache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel Industry
Kai Wähner
 
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
Kai Wähner
 
Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)
Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)
Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)
Kai Wähner
 
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
confluent
 
Rediscovering the Value of Apache Kafka® in Modern Data Architecture
Rediscovering the Value of Apache Kafka® in Modern Data ArchitectureRediscovering the Value of Apache Kafka® in Modern Data Architecture
Rediscovering the Value of Apache Kafka® in Modern Data Architecture
confluent
 
Apache Kafka® and Analytics in a Connected IoT World
Apache Kafka® and Analytics in a Connected IoT WorldApache Kafka® and Analytics in a Connected IoT World
Apache Kafka® and Analytics in a Connected IoT World
confluent
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming Visualisation
Guido Schmutz
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Kai Wähner
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
confluent
 
Operational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in KafkaOperational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in Kafka
confluent
 
A guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update ConferenceA guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update Conference
Eldert Grootenboer
 
KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache Kafka
Kai Wähner
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
Kai Wähner
 
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureServerless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Kai Wähner
 
Top use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache KafkaTop use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache Kafka
confluent
 

Similar to Building a real-time data processing pipeline using Apache Kafka, Kafka Connect, Elasticsearch and Kibana (20)

2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
Timothy Spann
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
Guido Schmutz
 
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
confluent
 
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQLIngesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Guido Schmutz
 
Cloud Platform for IoT
Cloud Platform for IoTCloud Platform for IoT
Cloud Platform for IoT
Naoto Umemori
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlowSpinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
PaulBrebner2
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
All Things Open
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and Furure
DataStax Academy
 
FIWARE Wednesday Webinars - Short Term History within Smart Systems
FIWARE Wednesday Webinars - Short Term History within Smart SystemsFIWARE Wednesday Webinars - Short Term History within Smart Systems
FIWARE Wednesday Webinars - Short Term History within Smart Systems
FIWARE
 
Guillotina: The Asyncio REST Resource API
Guillotina: The Asyncio REST Resource APIGuillotina: The Asyncio REST Resource API
Guillotina: The Asyncio REST Resource API
Nathan Van Gheem
 
Confluent-Ably-AWS-ID-2023 - GSlide.pptx
Confluent-Ably-AWS-ID-2023 - GSlide.pptxConfluent-Ably-AWS-ID-2023 - GSlide.pptx
Confluent-Ably-AWS-ID-2023 - GSlide.pptx
Ahmed791434
 
MQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan Bocutiu
MQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan BocutiuMQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan Bocutiu
MQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan Bocutiu
landoop
 
Web of things introduction
Web of things introductionWeb of things introduction
Web of things introduction
承翰 蔡
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Spinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaSpinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache Kafka
Paul Brebner
 
Episode 3: Kubernetes and Big Data Services
Episode 3: Kubernetes and Big Data ServicesEpisode 3: Kubernetes and Big Data Services
Episode 3: Kubernetes and Big Data Services
Mesosphere Inc.
 
Open Data and CKAN Data Catalogues
Open Data and CKAN Data CataloguesOpen Data and CKAN Data Catalogues
Open Data and CKAN Data Catalogues
david-read
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
Timothy Spann
 
Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena
InfluxData
 
Model-driven Telemetry: The Foundation of Big Data Analytics
Model-driven Telemetry: The Foundation of Big Data AnalyticsModel-driven Telemetry: The Foundation of Big Data Analytics
Model-driven Telemetry: The Foundation of Big Data Analytics
Cisco Canada
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
Timothy Spann
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
Guido Schmutz
 
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
confluent
 
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQLIngesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Guido Schmutz
 
Cloud Platform for IoT
Cloud Platform for IoTCloud Platform for IoT
Cloud Platform for IoT
Naoto Umemori
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlowSpinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
PaulBrebner2
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
All Things Open
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and Furure
DataStax Academy
 
FIWARE Wednesday Webinars - Short Term History within Smart Systems
FIWARE Wednesday Webinars - Short Term History within Smart SystemsFIWARE Wednesday Webinars - Short Term History within Smart Systems
FIWARE Wednesday Webinars - Short Term History within Smart Systems
FIWARE
 
Guillotina: The Asyncio REST Resource API
Guillotina: The Asyncio REST Resource APIGuillotina: The Asyncio REST Resource API
Guillotina: The Asyncio REST Resource API
Nathan Van Gheem
 
Confluent-Ably-AWS-ID-2023 - GSlide.pptx
Confluent-Ably-AWS-ID-2023 - GSlide.pptxConfluent-Ably-AWS-ID-2023 - GSlide.pptx
Confluent-Ably-AWS-ID-2023 - GSlide.pptx
Ahmed791434
 
MQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan Bocutiu
MQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan BocutiuMQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan Bocutiu
MQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan Bocutiu
landoop
 
Web of things introduction
Web of things introductionWeb of things introduction
Web of things introduction
承翰 蔡
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Spinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaSpinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache Kafka
Paul Brebner
 
Episode 3: Kubernetes and Big Data Services
Episode 3: Kubernetes and Big Data ServicesEpisode 3: Kubernetes and Big Data Services
Episode 3: Kubernetes and Big Data Services
Mesosphere Inc.
 
Open Data and CKAN Data Catalogues
Open Data and CKAN Data CataloguesOpen Data and CKAN Data Catalogues
Open Data and CKAN Data Catalogues
david-read
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
Timothy Spann
 
Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena
InfluxData
 
Model-driven Telemetry: The Foundation of Big Data Analytics
Model-driven Telemetry: The Foundation of Big Data AnalyticsModel-driven Telemetry: The Foundation of Big Data Analytics
Model-driven Telemetry: The Foundation of Big Data Analytics
Cisco Canada
 
Ad

More from Paul Brebner (20)

Streaming More For Less With Apache Kafka Tiered Storage
Streaming More For Less With Apache Kafka Tiered StorageStreaming More For Less With Apache Kafka Tiered Storage
Streaming More For Less With Apache Kafka Tiered Storage
Paul Brebner
 
30 Of My Favourite Open Source Technologies In 30 Minutes
30 Of My Favourite Open Source Technologies In 30 Minutes30 Of My Favourite Open Source Technologies In 30 Minutes
30 Of My Favourite Open Source Technologies In 30 Minutes
Paul Brebner
 
Superpower Your Apache Kafka Applications Development with Complementary Open...
Superpower Your Apache Kafka Applications Development with Complementary Open...Superpower Your Apache Kafka Applications Development with Complementary Open...
Superpower Your Apache Kafka Applications Development with Complementary Open...
Paul Brebner
 
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Paul Brebner
 
Architecting Applications With Multiple Open Source Big Data Technologies
Architecting Applications With Multiple Open Source Big Data TechnologiesArchitecting Applications With Multiple Open Source Big Data Technologies
Architecting Applications With Multiple Open Source Big Data Technologies
Paul Brebner
 
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
Paul Brebner
 
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining PhilosophersApache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Paul Brebner
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Paul Brebner
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
 
A Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaA Visual Introduction to Apache Kafka
A Visual Introduction to Apache Kafka
Paul Brebner
 
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Paul Brebner
 
Grid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialGrid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and Potential
Paul Brebner
 
Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...
Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's
Paul Brebner
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
Paul Brebner
 
Streaming More For Less With Apache Kafka Tiered Storage
Streaming More For Less With Apache Kafka Tiered StorageStreaming More For Less With Apache Kafka Tiered Storage
Streaming More For Less With Apache Kafka Tiered Storage
Paul Brebner
 
30 Of My Favourite Open Source Technologies In 30 Minutes
30 Of My Favourite Open Source Technologies In 30 Minutes30 Of My Favourite Open Source Technologies In 30 Minutes
30 Of My Favourite Open Source Technologies In 30 Minutes
Paul Brebner
 
Superpower Your Apache Kafka Applications Development with Complementary Open...
Superpower Your Apache Kafka Applications Development with Complementary Open...Superpower Your Apache Kafka Applications Development with Complementary Open...
Superpower Your Apache Kafka Applications Development with Complementary Open...
Paul Brebner
 
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Paul Brebner
 
Architecting Applications With Multiple Open Source Big Data Technologies
Architecting Applications With Multiple Open Source Big Data TechnologiesArchitecting Applications With Multiple Open Source Big Data Technologies
Architecting Applications With Multiple Open Source Big Data Technologies
Paul Brebner
 
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
Paul Brebner
 
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining PhilosophersApache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Paul Brebner
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Paul Brebner
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
 
A Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaA Visual Introduction to Apache Kafka
A Visual Introduction to Apache Kafka
Paul Brebner
 
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Paul Brebner
 
Grid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialGrid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and Potential
Paul Brebner
 
Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...
Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's
Paul Brebner
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
Paul Brebner
 
Ad

Recently uploaded (20)

Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 

Building a real-time data processing pipeline using Apache Kafka, Kafka Connect, Elasticsearch and Kibana

  • 1. Building a Real-Time Data Processing Pipeline Using Apache Kafka, Kafka Connect, Elasticsearch, and Kibana Paul Brebner Instaclustr—Technology Evangelist Instaclustr Sponsored Booth Presentation 30 September ApacheCon 2020 ©Instaclustr Pty Limited, 2020
  • 2. Blogs (54): www.instaclustr.com/paul-brebner/ Who Am I? What do I do? 1 year ago (ApacheCon Europe 2019)—Look it’s a light! ©Instaclustr Pty Limited, 2020
  • 4. A complete ecosystem to support mission critical applications. Instaclustr Managed Platform ©Instaclustr Pty Limited, 2020
  • 10. Open Source Big Data Technologies on the Instaclustr Managed Platform ©Instaclustr Pty Limited, 2020
  • 11. ©Instaclustr Pty Limited, 2020 Open Source Big Data Technologies on the Instaclustr Managed Platform
  • 12. ©Instaclustr Pty Limited, 2020 Open Source Big Data Technologies on the Instaclustr Managed Platform
  • 13. ©Instaclustr Pty Limited, 2020 Open Source Big Data Technologies on the Instaclustr Managed Platform on multiple cloud providers
  • 14. This talk… Focuses on three recent additions to our managed platform: • Kafka Connect • Elasticsearch • Kibana ©Instaclustr Pty Limited, 2020
  • 15. • Technology Overview • What’s the Story? • Data sources • Provisioning clusters • Configuring Kafka source and sink connectors • Elasticsearch mappings • Kibana Visualizations • Elasticsearch Ingest Pipeline • Kibana Maps • Handling failure Overview ©Instaclustr Pty Limited, 2020
  • 16. In general integration can be—complicated… ©Instaclustr Pty Limited, 2020
  • 17. ● Zero-code integration ● High availability ● Elastic scaling independent of Kafka Source Connectors Sink Connectors Kafka Connect cluster Syslog Kafka Cluster and many more.. and many more.. Sources Sinks Or—Easy, with Kafka Connect What? Why? ● Distributed solution to integrate Kafka with other heterogeneous data sources/stores. ● Connectors (source or sink) handle specifics of particular integrations o Source Kafka o Kafka Sink ©Instaclustr Pty Limited, 2020
  • 18. Elasticsearch—scalable search of indexed documents Kibana—visualization Open Distro for Elasticsearch—100% Apache 2.0 licensed Documents Indices Managed Elasticsearch + Kibana ©Instaclustr Pty Limited, 2020
  • 20. What’s The Story? ©Instaclustr Pty Limited, 2020 Kafka Summit—CDC built Kafka COVID-19 pipeline in < 30 days
  • 21. What’s The Story? ©Instaclustr Pty Limited, 2020 Instaclustr consultants built an integration demo using public climate change data via REST connectors running on docker Kafka Summit—CDC built Kafka COVID-19 pipeline in < 30 days
  • 22. What’s The Story? ©Instaclustr Pty Limited, 2020 Instaclustr consultants built an integration demo using public climate change data via REST connectors running on docker Idea: Use streaming REST public data sources AND deploy on Instaclustr managed platform Kafka Summit—CDC built Kafka COVID-19 pipeline in < 30 days
  • 23. What’s The Story? ©Instaclustr Pty Limited, 2020 Instaclustr consultants built an integration demo using public climate change data via REST connectors running on docker Idea: Use streaming REST public data sources AND deploy on Instaclustr managed platform Look for public streaming REST APIs with easy to use JSON data format, complete data, interesting domain, not political or apocalyptic… Impossible? Kafka Summit—CDC built Kafka COVID-19 pipeline in < 30 days
  • 24. https://oceanservice.noaa.gov/ Success! Tides follow Lunar Day USA Tidal Data National Oceanic and Atmospheric Administration ©Instaclustr Pty Limited, 2020
  • 25. Bonus, NOAA tidal map https://tidesandcurrents.noaa.gov/map/ ©Instaclustr Pty Limited, 2020
  • 26. Bonus, NOAA tidal map https://tidesandcurrents.noaa.gov/map/ What’s here? ©Instaclustr Pty Limited, 2020
  • 30. REST Example Specify station ID, data type and datum (I used water level, mean sea level), latest data point, JSON Call https://api.tidesandcurrents.noaa.gov/api/prod/datagetter?date=latest&station=8724580& product=water_level&datum=msl&units=metric&time_zone=gmt&application=instaclustr& format=json Returns {"metadata": { "id":"8724580", "name":"Key West", "lat":"24.5508”, "lon":"-81.8081"}, "data":[{ "t":"2020-09-24 04:18", "v":"0.597", "s":"0.005", "f":"1,0,0,0", "q":"p"}]} ©Instaclustr Pty Limited, 2020
  • 31. REST call JSON result Let’s start the pipeline using this REST API for data sources… ©Instaclustr Pty Limited, 2020
  • 32. What Else Do We Need? The Instaclustr Console Provision Kafka and Kafka Connect clusters ©Instaclustr Pty Limited, 2020
  • 33. What Else Do We Need? Select cloud provider, region, instance size and number, security etc. ©Instaclustr Pty Limited, 2020
  • 34. What Else Do We Need? Tell Kafka connect cluster which Kafka cluster to use, then provision ©Instaclustr Pty Limited, 2020 Your IP
  • 35. Now we have a Kafka and Kafka Connect clusters ©Instaclustr Pty Limited, 2020
  • 36. Next, find a REST connector, deploy to S3 bucket, tell connect cluster which bucket, configure connector and run REST source connector Tides Topic REST call JSON result (Automatically created) ©Instaclustr Pty Limited, 2020 BYO connectors instructions https://www.instaclustr.com/support/documentation/kafka- connect/accessing-and-using-kafka-connect/updating-custom- connectors/
  • 37. curl https://connectorClusterIP:8083/connectors -k -u name:password -X POST -H 'Content-Type: application/json' -d ' { "name": "source_rest_tide_1", "config": { "key.converter":"org.apache.kafka.connect.storage.StringConverter", "value.converter":"org.apache.kafka.connect.storage.StringConverter", "connector.class": "com.tm.kafka.connect.rest.RestSourceConnector", "tasks.max": "1", "rest.source.poll.interval.ms": "600000", "rest.source.method": "GET", "rest.source.url": "https://api.tidesandcurrents.noaa.gov/api/prod/datagetter?date=latest&station=8454000&product=water_level&datum= msl&units=metric&time_zone=gmt&application=instaclustr&format=json", "rest.source.headers": "Content-Type:application/json,Accept:application/json", "rest.source.topic.selector": "com.tm.kafka.connect.rest.selector.SimpleTopicSelector", "rest.source.destination.topics": "tides-topic" } }' REST source connector configuration including connector name, class, URL, topic ©Instaclustr Pty Limited, 2020
  • 38. Polls every 10 minutes, writes result to Kafka topic, picked 5 sensors to use, so 5 connector instances running. Now have tidal data coming into the tides topic, what next? REST source connector Tides Topic REST call JSON result {"metadata": { "id":"8724580", "name":"Key West", "lat":"24.5508”, "lon":"-81.8081"}, "data":[{ "t":"2020-09-24 04:18", "v":"0.597"}]} ©Instaclustr Pty Limited, 2020
  • 39. Next - Provision Elasticsearch+Kibana clusters ©Instaclustr Pty Limited, 2020
  • 40. And configure the included Elasticsearch sink connector to send data to Elasticsearch REST source connector Tides Topic REST call JSON result {"metadata": { "id":"8724580", "name":"Key West", "lat":"24.5508”, "lon":"-81.8081"}, "data":[{ "t":"2020-09-24 04:18", "v":"0.597"}]} Elastic sink connector Tides Index ©Instaclustr Pty Limited, 2020
  • 41. curl https://connectorClusterIP:8083/connectors -k -u name:password -X POST -H 'Content-Type: application/json' -d ' { "name" : "elastic-sink-tides", "config" : { "connector.class" : "com.datamountaineer.streamreactor.connect.elastic7.ElasticSinkConnector", "tasks.max" : 3, "topics" : "tides", "connect.elastic.hosts" : ”ip", "connect.elastic.port" : 9201, "connect.elastic.kcql" : "INSERT INTO tides-index SELECT * FROM tides-topic", "connect.elastic.use.http.username" : ”elasticName", "connect.elastic.use.http.password" : ”elasticPassword" } }' Configure sink connector name, class, index and topic. The index is created with default mappings if it doesn’t already exist. ©Instaclustr Pty Limited, 2020
  • 42. REST source connector Tides Topic REST call JSON result {"metadata": { "id":"8724580", "name":"Key West", "lat":"24.5508”, "lon":"-81.8081"}, "data":[{ "t":"2020-09-24 04:18", "v":"0.597"}]} Elastic sink connector Tides Index Great! It’s All Working!? Sort Of! Tide data arriving in Tides Index! But, in default index mappings, everything is a String. To graph them as time series by name need a custom mapping. ©Instaclustr Pty Limited, 2020 {"metadata": { "id":”String", "name":”String", "lat":”String”, "lon":”String"}, "data":[{ "t":”String", "v":”String"}]}
  • 43. curl -u elasticName:elasticPassword ”elasticURL:9201/tides-index" -X PUT -H 'Content-Type: application/json' -d' { "mappings" : { "properties" : { "data" : { "properties" : { "t" : { "type" : "date", "format" : "yyyy-MM-dd HH:mm" }, "v" : { "type" : "double" }, "f" : { "type" : "text" }, "q" : { "type" : "text" }, "s" : { "type" : "text" } } }, "metadata" : { "properties" : { "id" : { "type" : "text" }, "lat" : { "type" : "text" }, "long" : { "type" : "text" }, "name" : { "type" : ”keyword" } }}}} }' Custom mapping “t” is a date, “v” is a double, and “name” is a keyword. ©Instaclustr Pty Limited, 2020
  • 44. • Every time you • Change an Elasticsearch index mapping, you have to • Delete the index • Index all the data again • But where does the data come from? • Two options: • Using a Kafka sink connector the data is already in the Kafka topic, so just replay it, or, • Use Elasticsearch reindex operation • The hard part is over, now… Reindexing! ©Instaclustr Pty Limited, 2020
  • 45. Start Kibana With A Single Click ©Instaclustr Pty Limited, 2020
  • 47. Visualization Steps 1: Index Pattern (to get data from Elasticsearch) Settings -> Index Patterns -> Create Index Pattern -> Define -> Configure with “t” as timefilter field 2. Create Visualization (to create a graph type) Visualizations -> Create Visualization -> New Visualization -> Line -> Choose Source = pattern from 1 3. Configure Graph Settings (to display data correctly) Select time range, select aggregation for y-axis = average -> data.v -> select Buckets -> Split series metadata.name -> X-axis -> Data Histogram = data.t ©Instaclustr Pty Limited, 2020
  • 48. Time (x axis) vs. average (over 30m) tide level (relative to average level) in meters for the 5 sample stations ©Instaclustr Pty Limited, 2020
  • 49. Showing Lunar Day (24 hours 50 minutes) Lunar Day (24h 50m) ©Instaclustr Pty Limited, 2020
  • 50. Tidalrange Showing Tidal Range (high tide – low tide) ©Instaclustr Pty Limited, 2020
  • 51. By R. Ray, NASA Goddard Space Flight Center, Jet Propulsion Laboratory, Scientific Visualization Studio - TOPEX/Poseidon: Revealing Hidden Tidal Energy, Public Domain Tide range varies depending on moon, sun, local geography, and weather! ©Instaclustr Pty Limited, 2020
  • 52. By R. Ray, NASA Goddard Space Flight Center, Jet Propulsion Laboratory, Scientific Visualization Studio - TOPEX/Poseidon: Revealing Hidden Tidal Energy, Public Domain Neah Bay is near here ©Instaclustr Pty Limited, 2020
  • 53. By R. Ray, NASA Goddard Space Flight Center, Jet Propulsion Laboratory, Scientific Visualization Studio - TOPEX/Poseidon: Revealing Hidden Tidal Energy, Public Domain Australia’s Biggest Tide is here ©Instaclustr Pty Limited, 2020
  • 54. Tides of over 11 meters are forced through two narrow passes creating the popular tourist attraction known as the Horizontal Waterfalls in the Kimberley's Talbot Bay. Next, a map to show the sensor locations to understand tidal ranges (Photo by Richard Costin) ©Instaclustr Pty Limited, 2020
  • 55. ©Instaclustr Pty Limited, 2020 But, there are no geo-points in the data!
  • 56. Mapping Steps 1. Add geo-point field to index mapping 2. Create Elasticsearch ingest pipeline to construct new field 3. Add as default ingest pipeline to index Problem: Elasticsearch doesn’t recognize separate lat lon fields as geo-points Solution: Add an Elasticsearch ingest pipeline to pre- process documents before they are indexed (Need to reindex again) ©Instaclustr Pty Limited, 2020
  • 57. curl -u elasticName:elasticPassword ”elasticURL:9201/tides-index" -X PUT -H 'Content-Type: application/json' -d' { "mappings" : { "properties" : { "data" : { "properties" : { "t" : { "type" : "date", "format" : "yyyy-MM-dd HH:mm" }, "v" : { "type" : "double" }, "f" : { "type" : "text" }, "q" : { "type" : "text" }, "s" : { "type" : "text" } } }, "metadata" : { "properties" : { "id" : { "type" : "text" }, "lat" : { "type" : "text" }, "long" : { "type" : "text" }, "location" : { "type" : "geo_point" }, "name" : { "type" : ”keyword" } }}}} }' 1. Add a new “location” field with a geo_point data type to the mapping and index ©Instaclustr Pty Limited, 2020
  • 58. curl -u elasticName:elasticPassword ”elasticURL:9201/ _ingest/pipeline/locationPipe" -X PUT -H 'Content-Type: application/json' -d' { "description" : ”construct geo-point String field", "processors" : [ { "set" : { "field": "metadata.location", "value": "{{metadata.lat}},{{metadata.lon}}" } } ] } ' 2. Create new ingest pipeline to construct new location geo-point String from existing lat lon fields ©Instaclustr Pty Limited, 2020
  • 59. 3. Add locationPipe as default pipeline to the index curl -u elasticName:elasticPassword ”elasticURL:9201/tides-index/_settings?pretty" -X PUT -H 'Content-Type: application/json' -d' { "index" : { "default_pipeline" : ”locationPipe" } } ' ©Instaclustr Pty Limited, 2020
  • 60. REST source connector Tides Topic REST call JSON result {"metadata": { "id":"8724580", "name":"Key West", "lat":"24.5508”, "lon":"-81.8081"}, "data":[{ "t":"2020-09-24 04:18", "v":"0.597"}]} Elastic sink connector Tides Index Now we have a pipeline transforming the raw data and adding geo-point location data in Elasticsearch {"metadata": { "id":"8724580", "name":"Key West", "lat":"24.5508”, "lon":"-81.8081”, ”location”: “24.5508,-81.8081”}, "data":[{ "t":"2020-09-24 04:18", "v":"0.597"}]} LocationPipe ingestor ©Instaclustr Pty Limited, 2020
  • 61. Mapping Visualization Steps 1. Create Visualization Visualizations -> Create visualization -> New Coordinate Map -> Select index patterns -> Visualization with default map 2. Configure Graph Settings (to display data correctly) Select Metrics -> Aggregation (min) -> Field -> data.v -> Buckets -> Geo coordinates -> Geohash -> Field -> metadata.location Reuse existing index pattern ©Instaclustr Pty Limited, 2020
  • 62. ©Instaclustr Pty Limited, 2020 Map showing sensor locations and min values over last week
  • 63. Add your own custom Web Map Service (WMS) layers URL https://services.nationalmap.gov/arcgis/services/USGSNAIPPlus/MapServer/WMSServer Layers 1,2,3,5,6,7,9,10,11,13,14,15,17,18,19,21,22,23,25,26,27,29,30,31,32 ©Instaclustr Pty Limited, 2020
  • 64. REST source connector Tides Topic REST call JSON result {"error": {"message":"No data was found. This product may not be offered at this station at the requested time."}} Elastic sink connector Tides Index What can go wrong? REST call can return error message, but doesn’t treat it as an error so it’s sent to Tides Topic. LocationPipe ingestor ©Instaclustr Pty Limited, 2020
  • 65. REST source connector Tides Topic REST call JSON result {"error": {"message":"No data was found. This product may not be offered at this station at the requested time."}} Elastic sink connector Tides Index Elastic sink connector tries to read the error message and fails to FAILED state. Exceptions viewable in the Kafka connect logs topic. Exceptions viewable in the Kafka connect logs topic. LocationPipe ingestor ©Instaclustr Pty Limited, 2020 X FAILED X Connect logs topic
  • 66. REST source connector Tides Topic REST call JSON result {"error": {"message":"No data was found. This product may not be offered at this station at the requested time."}} Elastic sink connector Tides Index Current workaround is to monitor and regularly restart failed connectors. Exceptions viewable in the Kafka connect logs topic. LocationPipe ingestor ©Instaclustr Pty Limited, 2020 FAILED? RUNNING Restart! X
  • 67. REST source connector Tides Topic REST call JSON result {"error": {"message":"No data was found. This product may not be offered at this station at the requested time."}} Elastic sink connector Tides Index Better solution - if connectors support KIP-298 “Error Handling in Connect” (not all do) then configure to ignore input errors. Errors sent to ”dead letter” topic. LocationPipe ingestor ©Instaclustr Pty Limited, 2020 Ignore Dead letter topic
  • 68. • Instaclustr consultants, Kafka and Elasticsearch dev teams , graphic design and marketing teams • Zeke, Mussa, Michael, Hendra, Rob, Harvey, Jill, Gina and more! • Try us out! Build the same or your own pipeline with our free trial at Instaclustr.com Thanks to… ©Instaclustr Pty Limited, 2020