Wellington AWS Meetup
Introduction to
Kinesis
Who Am I?
• Team Leader/Architect
in Business Intelligence/databases
• 17 years' experience
• MCSE BI, OCP DBA, MCDBA
• AWS-ASA-2505
Who are OptimalBI?
• Wellington based BI Consultancy
• “Making Information Visible”
Talk Outline
1. Why do we need Kinesis?
2. What is Kinesis?
3. Demo
4. How does it fit into an
existing data warehouse?
5. When to use Kinesis
Big Data
1. Volume
2. Velocity
3. Variety
Kinesis is an answer to
Velocity
Machine learning looks simple:
Data is collected,
magic happens,
and we output it to our users
Traditional Business Intelligence
Data Store → Data Warehouse → Query Tool
• Periodic, batch Extract-Transform-Load
• Persistent data source
• High latency
Internet of Things
• Large number of sensors.
• Self registering
• Pushing data
• May or may not retain any
historic data.
= Only one chance to get data
Batch ETL
• Data needs to wait
somewhere between loads.
• If data is loaded during only six hours
of each day, the load needs four times as
much hardware as a load spread over 24 hours.
• Latency of hours
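The hardware figure on this slide is simple arithmetic. A minimal sketch, assuming the daily data volume is fixed and load throughput scales linearly with hardware (the function name is illustrative, not from the talk):

```python
def hardware_multiplier(load_hours_per_day: float) -> float:
    """Capacity needed relative to loading evenly across all 24 hours."""
    if not 0 < load_hours_per_day <= 24:
        raise ValueError("load_hours_per_day must be in (0, 24]")
    return 24 / load_hours_per_day


# A six-hour load window needs four times the capacity of a continuous load.
print(hardware_multiplier(6))
```

A streaming load corresponds to the 24-hour case, where the multiplier is 1.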
DIY Streaming ETL
“Realtime” “ETL” cluster
DIY Streaming ETL 2.0
Add a queue
DIY Streaming ETL 3+
Cluster more
Getting messy, still problems
Problems with DIY Streaming ETL
1. Message queues deliver each message
once. To fan out to many readers, the
application in front needs to know about
each of them and queue the same
message repeatedly.
2. Order of message delivery is not
guaranteed.
3. If the program reading data crashes
partway through aggregating, messages
are lost.
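The first and third problems can be illustrated with a toy model. This is a sketch in plain Python, not any real queue or Kinesis API; both class names are hypothetical. It contrasts a deliver-once queue with an ordered log in which every reader keeps its own position.

```python
from collections import deque


class DeliverOnceQueue:
    """Each message goes to whichever reader asks first, then it is gone."""

    def __init__(self):
        self._messages = deque()

    def put(self, message):
        self._messages.append(message)

    def get(self):
        return self._messages.popleft() if self._messages else None


class ReplayableLog:
    """Messages stay in arrival order; each reader tracks its own position."""

    def __init__(self):
        self._messages = []

    def put(self, message):
        self._messages.append(message)

    def read(self, position):
        """Return (messages from position onward, next position)."""
        return self._messages[position:], len(self._messages)


queue, log = DeliverOnceQueue(), ReplayableLog()
for reading in ("t=1", "t=2", "t=3"):
    queue.put(reading)
    log.put(reading)

# Queue: two readers split the messages between them.
dashboard_saw = [queue.get()]
archiver_saw = [queue.get(), queue.get()]

# Log: both readers see every message, in order.
dashboard_all, _ = log.read(0)
archiver_all, archiver_pos = log.read(0)

# A reader that crashed mid-aggregation restarts from its last saved position.
replayed, _ = log.read(0)
```

With the log, adding a reader needs no change to the producer, and a crash costs only the time to re-read.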
What is Kinesis
• Kinesis is like a message queue,
but more scalable and with multiple
readers of each message.
• Kinesis is like a NoSQL database, but
with message delivery and daily purging.
• Kinesis is like an Enterprise Service Bus
focused on Analytics.
• For a limited, if common, use case
Kinesis is the best of all.
Kinesis Qualities
• Scalable
• Elastic
• Durable
• Fault Tolerant
• Replayable
Kinesis Components
• Each Queue/DB is called a Stream
• Each stream scales by adding Shards
• Each Shard provides 1 MB/s in and
2 MB/s out
• Shards are only $0.44/day, so autoscale
them to give some safety margin
• Also pay about 2 cents per million puts
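The figures on this slide are enough for a rough sizing estimate. A minimal sketch, assuming the slide's per-shard limits and prices (current AWS pricing may differ) and a hypothetical 25% headroom factor as the safety margin:

```python
import math

SHARD_IN_MB_S = 1.0          # per-shard write capacity, from the slide
SHARD_OUT_MB_S = 2.0         # per-shard read capacity, from the slide
SHARD_USD_PER_DAY = 0.44     # slide's figure; check current pricing
USD_PER_MILLION_PUTS = 0.02  # "about 2 cents per million puts"


def shards_needed(in_mb_s, out_mb_s, headroom=1.25):
    """Shards required to cover both write and read throughput with headroom."""
    by_in = math.ceil(in_mb_s * headroom / SHARD_IN_MB_S)
    by_out = math.ceil(out_mb_s * headroom / SHARD_OUT_MB_S)
    return max(1, by_in, by_out)


def daily_cost_usd(shards, puts_per_day):
    """Shard charge plus put charge for one day."""
    return shards * SHARD_USD_PER_DAY + puts_per_day / 1_000_000 * USD_PER_MILLION_PUTS


# Example: 3 MB/s written, three consumers each reading the full stream (9 MB/s).
shards = shards_needed(3, 9)
print(shards, round(daily_cost_usd(shards, 50_000_000), 2))
```

In this example the read side, not the write side, determines the shard count, which is typical when several consumers read the same stream.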
Kinesis Client Library
• Kinesis expects you to write bespoke
producer and consumer programs
• KCL provides automatic multi-threading
with one worker thread per shard.
• Similar to Hadoop: the framework handles
the heavy lifting, and the bespoke program
does the “reduce”.
• You have to autoscale the EC2 groups.
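The division of labour described here can be sketched as follows. This is a plain-Python illustration of the idea, not the KCL API: `run_workers` stands in for the framework (one thread per shard, plus checkpointing) and `add_to_total` is the bespoke "reduce". All names and the sample data are hypothetical.

```python
import threading


def run_workers(shards, process):
    """Run process(shard_id, record) on one thread per shard.

    Returns the last fully processed record index for each shard.
    """
    checkpoints = {}
    lock = threading.Lock()

    def worker(shard_id, records):
        for sequence_number, record in enumerate(records):
            process(shard_id, record)
            with lock:
                checkpoints[shard_id] = sequence_number

    threads = [threading.Thread(target=worker, args=item) for item in shards.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return checkpoints


totals = {}
totals_lock = threading.Lock()


def add_to_total(shard_id, record):
    """The bespoke part: a running sum per shard."""
    with totals_lock:
        totals[shard_id] = totals.get(shard_id, 0) + record


shards = {"shard-0": [1, 2, 3], "shard-1": [10, 20]}
checkpoints = run_workers(shards, add_to_total)
```

After a crash, a worker would resume each shard from the record after its checkpoint rather than from the start.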
Kinesis Application
[Diagram: Amazon Kinesis and three Auto Scaling groups of instances]
Existing Kinesis Connectors
Sending:
• HTTP POST
• AWS SDK
• Log4j
• Flume
• Fluentd
Reading:
• Get* APIs
• Amazon Kinesis Client Library + Connector Library
• Apache Storm
• Amazon Elastic MapReduce
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-kinesis.html
Standard AWS Demo Script
1. HIVE already running in EMR
2. Create Kinesis Stream
3. Start Producer
4. Configure HIVE as consumer
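Step 3 of the script, the producer, might look like the sketch below. It is not the demo's actual code: the stream name, record layout, and `FakeKinesisClient` are assumptions made so the example runs without AWS access. The fake mirrors the keyword arguments of the `put_record` call on the AWS SDK's Kinesis client.

```python
import json


class FakeKinesisClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_record(self, StreamName, Data, PartitionKey):
        self.calls.append((StreamName, Data, PartitionKey))
        return {"SequenceNumber": str(len(self.calls))}


def send_readings(client, stream_name, readings):
    """Send each reading as JSON, partitioned by sensor id. Returns the count sent."""
    sent = 0
    for reading in readings:
        client.put_record(
            StreamName=stream_name,
            Data=json.dumps(reading).encode("utf-8"),
            PartitionKey=reading["sensor_id"],
        )
        sent += 1
    return sent


client = FakeKinesisClient()
sent = send_readings(
    client,
    "demo-stream",  # assumed stream name
    [{"sensor_id": "a1", "temp": 21.5}, {"sensor_id": "b2", "temp": 19.0}],
)
```

Against a real stream, the same function should work with a client created by `boto3.client("kinesis")`, given installed credentials and an existing stream. Using the sensor id as the partition key keeps each sensor's readings in order on a single shard.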
Integrating Kinesis into an
existing Data Warehouse
1. Access data in near real-time
2. Facilitate more-traditional ETL
3. Archive
Introduction to AWS Kinesis
Near Real-time Data
1. Analyze individual transactions
2. Send alerts for both individual
transactions and trends
3. Aggregate to feed a
live dashboard
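A consumer covering the first two points could be as small as the sketch below. The thresholds, window size, and function name are illustrative assumptions: it raises one alert for any single large transaction and another when the rolling average of recent transactions runs high.

```python
from collections import deque


def monitor(amounts, window=3, single_limit=1000.0, trend_limit=500.0):
    """Return (index, kind, value) alerts for large transactions and high averages."""
    recent = deque(maxlen=window)
    alerts = []
    for index, amount in enumerate(amounts):
        if amount > single_limit:
            alerts.append((index, "single", amount))
        recent.append(amount)
        average = sum(recent) / len(recent)
        # Only judge the trend once the window is full.
        if len(recent) == window and average > trend_limit:
            alerts.append((index, "trend", average))
    return alerts


alerts = monitor([100, 200, 1500, 400, 50, 60])
```

Here the 1500 transaction triggers a "single" alert immediately and keeps the trend alert firing until it leaves the three-item window.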
Facilitate Traditional ETL
1. Write lightly transformed data to
S3 to batch COPY into Redshift
2. Pre-compute aggregates, then
write them to S3
3. Provide a durable, replayable
buffer in front of traditional ETL
tools.
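The first point amounts to micro-batching. A minimal sketch, assuming newline-delimited JSON as the file format and a deliberately tiny batch size; writing each payload to S3 and issuing the Redshift COPY are not shown:

```python
import json


def batch_for_copy(records, max_records=2):
    """Yield newline-delimited JSON payloads holding at most max_records each."""
    batch = []
    for record in records:
        batch.append(json.dumps(record, sort_keys=True))
        if len(batch) == max_records:
            yield "\n".join(batch) + "\n"
            batch = []
    if batch:  # flush the final partial batch
        yield "\n".join(batch) + "\n"


payloads = list(batch_for_copy([{"id": 1}, {"id": 2}, {"id": 3}]))
```

Each payload would become one S3 object. In practice the batch would be bounded by size or elapsed time rather than a fixed record count.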
Archive
1. In addition to using your data,
Kinesis makes it easy to log the
full incoming data set to S3.
2. An object store makes more
sense for write-once/read-never
data than a database.
When to use Kinesis
1. Internet of Things (IoT) workloads.
2. You need near-real-time
access to data.
3. You have more than one
consumer for each piece of
data.
Thanks
1. Our sponsors:
• API Talent
• AWS
• OptimalPeople
2. Bronwyn and Wyn
3. AWS for images on slides

Editor's Notes

  • #24: Near-line recommendations, fault-analysis
  • #27: Drinking from the firehose; low latency; multiple outputs