SlideShare a Scribd company logo
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Part 3 of 3: Deep Dive
Getting started with streaming analytics
Javier Ramirez
AWS Developer Advocate
@supercoco9
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Agenda
Working with Avro and schemas
Time windows, session windows, and accumulators
Complex Event Processing
Streaming analytics with SQL
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Ingestion/in-stream storage: Apache Kafka
A distributed streaming platform
Concepts:
Producers
Topics
Brokers
Consumers
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Apache Avro
Apache Avro™ is a data serialization system.
Avro provides:
• Rich data structures.
• A compact, fast, binary data format.
• A container file, to store persistent data.
• Remote procedure call (RPC).
• Simple integration with dynamic languages. Code
generation is not required
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Confluent Schema Registry
https://docs.confluent.io/current/schema-registry/index.html
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Stream Processing: Apache Flink
Stateful computation over Data Streams
Concepts:
Job Manager/Workers
Source
DataStream
Transforms/Operators
TableAPI/SQL
Sinks
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Different types of streaming windows
13:00 14:008:00 9:00 10:00 11:00 12:00
Processing
Time
Event Time
Processing
Time
11:0010:00 15:0014:0013:0012:00
11:0010:00 15:0014:0013:0012:00
Input
Output
Event Time
Processing
Time 11:0
0
10:0
0
15:0
0
14:0
0
13:0
0
12:0
0
11:0
0
10:0
0
15:0
0
14:0
0
13:0
0
12:0
0
Input
Output
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Apache Flink Windows
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
FlinkCEP
Complex Event Processing for Flink
FlinkCEP is the Complex Event Processing (CEP) library
implemented on top of Flink.
It allows you to detect event patterns in an endless stream
of events, giving you the opportunity to get hold of what’s
important in your data.
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Kinesis Data Analytics for SQL
• Sub-second end to end processing latencies
• SQL steps can be chained together in serial or parallel
steps
• Build applications with one or hundreds of queries
• Pre-built functions include everything from sum and
count distinct to machine learning algorithms
• Aggregations run continuously using window operators
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Automatic schema discovery which
works for CSV and JSON data
Supports multiple event types,
arbitrary object nesting, single level
of array nesting
Connect to Streaming Data Sources
Easily connect to Kinesis Data streams and
Kinesis Data Firehose delivery streams
Amazon Kinesis
Data Streams
Amazon Kinesis
Data Firehose
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Built-in AWS Lambda integration provides flexible pre-
processing ahead of SQL code for:
• Normalizing different event types
• Converting other data formats (AVRO, Protobuf, ZIP) to JSON and CSV
• Custom enrichment from database tables or API calls
Pre-process Data Streams Using AWS Lambda
AWS Lambda function
raw data
Amazon Kinesis Data Analytics application
transformed
data
SQL
code
source destination
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Fast, iterative development with SQL templates in console to get started
Interactive SQL Editor
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Window Types
• Sliding, tumbling, and custom windows
• Tumbling windows are fixed size and grouped keys do not overlap
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
• Add a SQL table to your streaming application from Amazon S3
• Periodically update the table by calling the update application API
Enrich your Data Stream using Amazon S3 Data
In-application stream
Amazon Kinesis Data Analytics application
SQL code joining
table and stream
streaming source destination
Amazon
S3
In-application table
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Getting Started
https://ci.apache.org/projects/flink/flink-docs-stable/
Apache Flink official docs
https://docs.aws.amazon.com/msk/latest/developerguide/what-is-msk.html
Getting started with Apache Kafka/Amazon MSK
https://aws.amazon.com/kinesis/
Amazon Kinesis Services for streaming data
https://aws.amazon.com/elasticsearch-service/
Amazon ElasticSearch Service
https://kafka.apache.org/documentation/
Apache Kafka official docs
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
ThanksJavier Ramirez
AWS Developer Advocate
@supercoco9
Ad

More Related Content

Similar to Getting started with streaming analytics: Deep Dive (7)

Genomics on aws-webinar-april2018
Genomics on aws-webinar-april2018Genomics on aws-webinar-april2018
Genomics on aws-webinar-april2018
Brendan Bouffler
 
Building API Driven Microservices
Building API Driven MicroservicesBuilding API Driven Microservices
Building API Driven Microservices
Chris Munns
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
AWS Riyadh User Group
 
The serverless LAMP stack
The serverless LAMP stackThe serverless LAMP stack
The serverless LAMP stack
⛷️ Ben Smith
 
Re cap2018
Re cap2018Re cap2018
Re cap2018
Richard Harvey
 
AWS Outposts Update
AWS Outposts UpdateAWS Outposts Update
AWS Outposts Update
AWS Daily News
 
Serverless use cases with AWS Lambda - More Serverless Event
Serverless use cases with AWS Lambda - More Serverless EventServerless use cases with AWS Lambda - More Serverless Event
Serverless use cases with AWS Lambda - More Serverless Event
Boaz Ziniman
 
Genomics on aws-webinar-april2018
Genomics on aws-webinar-april2018Genomics on aws-webinar-april2018
Genomics on aws-webinar-april2018
Brendan Bouffler
 
Building API Driven Microservices
Building API Driven MicroservicesBuilding API Driven Microservices
Building API Driven Microservices
Chris Munns
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
AWS Riyadh User Group
 
Serverless use cases with AWS Lambda - More Serverless Event
Serverless use cases with AWS Lambda - More Serverless EventServerless use cases with AWS Lambda - More Serverless Event
Serverless use cases with AWS Lambda - More Serverless Event
Boaz Ziniman
 

More from javier ramirez (20)

The Future of Fast Databases: Lessons from a Decade of QuestDB
The Future of Fast Databases: Lessons from a Decade of QuestDBThe Future of Fast Databases: Lessons from a Decade of QuestDB
The Future of Fast Databases: Lessons from a Decade of QuestDB
javier ramirez
 
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
javier ramirez
 
How We Added Replication to QuestDB - JonTheBeach
How We Added Replication to QuestDB - JonTheBeachHow We Added Replication to QuestDB - JonTheBeach
How We Added Replication to QuestDB - JonTheBeach
javier ramirez
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
¿Se puede vivir del open source? T3chfest
¿Se puede vivir del open source? T3chfest¿Se puede vivir del open source? T3chfest
¿Se puede vivir del open source? T3chfest
javier ramirez
 
QuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series databaseQuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series database
javier ramirez
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
javier ramirez
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
javier ramirez
 
Deduplicating and analysing time-series data with Apache Beam and QuestDB
Deduplicating and analysing time-series data with Apache Beam and QuestDBDeduplicating and analysing time-series data with Apache Beam and QuestDB
Deduplicating and analysing time-series data with Apache Beam and QuestDB
javier ramirez
 
Your Database Cannot Do this (well)
Your Database Cannot Do this (well)Your Database Cannot Do this (well)
Your Database Cannot Do this (well)
javier ramirez
 
Your Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic DatabaseYour Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic Database
javier ramirez
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
javier ramirez
 
QuestDB-Community-Call-20220728
QuestDB-Community-Call-20220728QuestDB-Community-Call-20220728
QuestDB-Community-Call-20220728
javier ramirez
 
Processing and analysing streaming data with Python. Pycon Italy 2022
Processing and analysing streaming  data with Python. Pycon Italy 2022Processing and analysing streaming  data with Python. Pycon Italy 2022
Processing and analysing streaming data with Python. Pycon Italy 2022
javier ramirez
 
QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...
javier ramirez
 
Servicios e infraestructura de AWS y la próxima región en Aragón
Servicios e infraestructura de AWS y la próxima región en AragónServicios e infraestructura de AWS y la próxima región en Aragón
Servicios e infraestructura de AWS y la próxima región en Aragón
javier ramirez
 
Primeros pasos en desarrollo serverless
Primeros pasos en desarrollo serverlessPrimeros pasos en desarrollo serverless
Primeros pasos en desarrollo serverless
javier ramirez
 
How AWS is reinventing the cloud
How AWS is reinventing the cloudHow AWS is reinventing the cloud
How AWS is reinventing the cloud
javier ramirez
 
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAMAnalitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
javier ramirez
 
Getting started with streaming analytics
Getting started with streaming analyticsGetting started with streaming analytics
Getting started with streaming analytics
javier ramirez
 
The Future of Fast Databases: Lessons from a Decade of QuestDB
The Future of Fast Databases: Lessons from a Decade of QuestDBThe Future of Fast Databases: Lessons from a Decade of QuestDB
The Future of Fast Databases: Lessons from a Decade of QuestDB
javier ramirez
 
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
javier ramirez
 
How We Added Replication to QuestDB - JonTheBeach
How We Added Replication to QuestDB - JonTheBeachHow We Added Replication to QuestDB - JonTheBeach
How We Added Replication to QuestDB - JonTheBeach
javier ramirez
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
¿Se puede vivir del open source? T3chfest
¿Se puede vivir del open source? T3chfest¿Se puede vivir del open source? T3chfest
¿Se puede vivir del open source? T3chfest
javier ramirez
 
QuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series databaseQuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series database
javier ramirez
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
javier ramirez
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
javier ramirez
 
Deduplicating and analysing time-series data with Apache Beam and QuestDB
Deduplicating and analysing time-series data with Apache Beam and QuestDBDeduplicating and analysing time-series data with Apache Beam and QuestDB
Deduplicating and analysing time-series data with Apache Beam and QuestDB
javier ramirez
 
Your Database Cannot Do this (well)
Your Database Cannot Do this (well)Your Database Cannot Do this (well)
Your Database Cannot Do this (well)
javier ramirez
 
Your Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic DatabaseYour Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic Database
javier ramirez
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
javier ramirez
 
QuestDB-Community-Call-20220728
QuestDB-Community-Call-20220728QuestDB-Community-Call-20220728
QuestDB-Community-Call-20220728
javier ramirez
 
Processing and analysing streaming data with Python. Pycon Italy 2022
Processing and analysing streaming  data with Python. Pycon Italy 2022Processing and analysing streaming  data with Python. Pycon Italy 2022
Processing and analysing streaming data with Python. Pycon Italy 2022
javier ramirez
 
QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...
javier ramirez
 
Servicios e infraestructura de AWS y la próxima región en Aragón
Servicios e infraestructura de AWS y la próxima región en AragónServicios e infraestructura de AWS y la próxima región en Aragón
Servicios e infraestructura de AWS y la próxima región en Aragón
javier ramirez
 
Primeros pasos en desarrollo serverless
Primeros pasos en desarrollo serverlessPrimeros pasos en desarrollo serverless
Primeros pasos en desarrollo serverless
javier ramirez
 
How AWS is reinventing the cloud
How AWS is reinventing the cloudHow AWS is reinventing the cloud
How AWS is reinventing the cloud
javier ramirez
 
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAMAnalitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
javier ramirez
 
Getting started with streaming analytics
Getting started with streaming analyticsGetting started with streaming analytics
Getting started with streaming analytics
javier ramirez
 
Ad

Recently uploaded (20)

Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
Cleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdfCleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdf
alcinialbob1234
 
03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia
Alexander Romero Arosquipa
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
How to join illuminati Agent in uganda call+256776963507/0741506136
How to join illuminati Agent in uganda call+256776963507/0741506136How to join illuminati Agent in uganda call+256776963507/0741506136
How to join illuminati Agent in uganda call+256776963507/0741506136
illuminati Agent uganda call+256776963507/0741506136
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
Cleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdfCleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdf
alcinialbob1234
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
Ad

Getting started with streaming analytics: Deep Dive

  • 1. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Part 3 of 3: Deep Dive Getting started with streaming analytics Javier Ramirez AWS Developer Advocate @supercoco9
  • 2. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Agenda Working with Avro and schemas Time windows, session windows, and accumulators Complex Event Processing Streaming analytics with SQL
  • 3. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Ingestion/in-stream storage: Apache Kafka A distributed streaming platform Concepts: Producers Topics Brokers Consumers
  • 4. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Apache Avro Apache Avro™ is a data serialization system. Avro provides: • Rich data structures. • A compact, fast, binary data format. • A container file, to store persistent data. • Remote procedure call (RPC). • Simple integration with dynamic languages. Code generation is not required
  • 5. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Confluent Schema Registry https://docs.confluent.io/current/schema-registry/index.html
  • 6. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Stream Processing: Apache Flink Stateful computation over Data Streams Concepts: Job Manager/Workers Source DataStream Transforms/Operators TableAPI/SQL Sinks
  • 7. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Different types of streaming windows 13:00 14:008:00 9:00 10:00 11:00 12:00 Processing Time Event Time Processing Time 11:0010:00 15:0014:0013:0012:00 11:0010:00 15:0014:0013:0012:00 Input Output Event Time Processing Time 11:0 0 10:0 0 15:0 0 14:0 0 13:0 0 12:0 0 11:0 0 10:0 0 15:0 0 14:0 0 13:0 0 12:0 0 Input Output
  • 8. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Apache Flink Windows https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html
  • 9. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential FlinkCEP Complex Event Processing for Flink FlinkCEP is the Complex Event Processing (CEP) library implemented on top of Flink. It allows you to detect event patterns in an endless stream of events, giving you the opportunity to get hold of what’s important in your data.
  • 10. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Kinesis Data Analytics for SQL • Sub-second end to end processing latencies • SQL steps can be chained together in serial or parallel steps • Build applications with one or hundreds of queries • Pre-built functions include everything from sum and count distinct to machine learning algorithms • Aggregations run continuously using window operators
  • 11. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Automatic schema discovery which works for CSV and JSON data Supports multiple event types, arbitrary object nesting, single level of array nesting Connect to Streaming Data Sources Easily connect to Kinesis Data streams and Kinesis Data Firehose delivery streams Amazon Kinesis Data Streams Amazon Kinesis Data Firehose
  • 12. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Built-in AWS Lambda integration provides flexible pre- processing ahead of SQL code for: • Normalizing different event types • Converting other data formats (AVRO, Protobuf, ZIP) to JSON and CSV • Custom enrichment from database tables or API calls Pre-process Data Streams Using AWS Lambda AWS Lambda function raw data Amazon Kinesis Data Analytics application transformed data SQL code source destination
  • 13. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Fast, iterative development with SQL templates in console to get started Interactive SQL Editor
  • 14. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Window Types • Sliding, tumbling, and custom windows • Tumbling windows are fixed size and grouped keys do not overlap
  • 15. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential • Add a SQL table to your streaming application from Amazon S3 • Periodically update the table by calling the update application API Enrich your Data Stream using Amazon S3 Data In-application stream Amazon Kinesis Data Analytics application SQL code joining table and stream streaming source destination Amazon S3 In-application table
  • 16. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential Getting Started https://ci.apache.org/projects/flink/flink-docs-stable/ Apache Flink official docs https://docs.aws.amazon.com/msk/latest/developerguide/what-is-msk.html Getting started with Apache Kafka/Amazon MSK https://aws.amazon.com/kinesis/ Amazon Kinesis Services for streaming data https://aws.amazon.com/elasticsearch-service/ Amazon ElasticSearch Service https://kafka.apache.org/documentation/ Apache Kafka official docs
  • 17. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential ThanksJavier Ramirez AWS Developer Advocate @supercoco9