Introduction to AWS Kinesis

Wellington AWS Meetup
Introduction to Kinesis

Who Am I?
• Team Leader/Architect
in Business Intelligence/databases
• 17 years' experience.
• MCSE BI, OCP DBA, MCDBA
• AWS-ASA-2505

Who are OptimalBI?
• Wellington based BI Consultancy
• “Making Information Visible”

Talk Outline
1. Why do we need Kinesis?
2. What is Kinesis?
3. Demo
4. How does it fit into an
existing data warehouse?
5. When to use Kinesis

Big Data
1. Volume
2. Velocity
3. Variety

Kinesis is an answer to
Velocity
Machine learning looks simple:
Data is collected,
magic happens,
and we output it to our users

Traditional Business
Intelligence
[Diagram: Data Store -> Data Warehouse -> Query Tool]
• Periodic, batch Extract-Transform-Load (ETL).
• Persistent data source
• High latency

Internet of Things
• Large number of sensors.
• Self registering
• Pushing data
• May or may not retain any
historic data.
= Only one chance to get data

Batch ETL
• Data needs to wait
somewhere between loads.
• If data is loaded for only six hours
per day, then four times as
much hardware is needed.
• Latency of hours
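
The hardware point above is simple arithmetic: a full day of data squeezed into a shorter load window needs proportionally more throughput. A minimal sketch, assuming data arrives at a steady rate all day (the function name is illustrative, not from the talk):

```python
def batch_capacity_multiplier(load_hours_per_day: float) -> float:
    """Throughput needed by a batch window, relative to processing
    the same data continuously over 24 hours.

    Assumption: data arrives at a steady rate throughout the day.
    """
    if not 0 < load_hours_per_day <= 24:
        raise ValueError("load_hours_per_day must be in (0, 24]")
    return 24 / load_hours_per_day


print(batch_capacity_multiplier(6))  # 4.0
```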

DIY Streaming ETL
“Realtime” “ETL” cluster

DIY Streaming ETL 2.0
Add a queue

DIY Streaming ETL 3+
Cluster more
Getting messy, still problems

Problems with DIY Streaming ETL
1. Message queues deliver each message
once. If you want to fan out to many
readers, the application in front needs to
know about each of them and queue the
same message repeatedly.
2. Order of message delivery is not
guaranteed.
3. If the program reading data crashes
partway through aggregating, messages
are lost.

What is Kinesis?
• Kinesis is like a message queue,
but more scalable and with multiple
readers of each message.
• Kinesis is like a NoSQL database, but
with message delivery and daily purging.
• Kinesis is like an Enterprise Service Bus
focused on Analytics.
• For a limited, if common, use case
Kinesis is the best of all.
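
The comparisons above can be illustrated with a toy in-memory model. This is not the Kinesis API: `ToyStream` and its methods are hypothetical names, and the sketch only shows why an append-only log with per-reader positions allows several independent readers and replay after a crash.

```python
class ToyStream:
    """Toy append-only log. NOT the Kinesis API, only an illustration."""

    def __init__(self):
        self._records = []

    def put(self, record):
        """Append a record and return its sequence number."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, position, limit=10):
        """Return (records, next_position) without removing anything."""
        batch = self._records[position:position + limit]
        return batch, position + len(batch)


stream = ToyStream()
for reading in ["t=20.1", "t=20.4", "t=35.0"]:
    stream.put(reading)

# Two independent readers each keep their own position,
# so the producer does not need to know about either of them.
dashboard_batch, dashboard_pos = stream.read(0)
archive_batch, archive_pos = stream.read(0)

# A reader that crashed can start again from its last saved position.
replayed, _ = stream.read(1)

print(dashboard_batch == archive_batch)  # True
print(replayed)  # ['t=20.4', 't=35.0']
```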

Kinesis Qualities
• Scalable
• Elastic
• Durable
• Fault Tolerant
• Replayable

Kinesis Components
• Each Queue/DB is called a Stream
• Each stream scales by adding Shards
• Each Shard provides 1 MB/s in and
2 MB/s out
• Shards are only $0.44/day, so autoscale
them to give some safety margin
• Also pay about 2 cents per million puts
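
The figures above are enough for a rough sizing estimate. A minimal sketch, assuming the per-shard limits and prices quoted on this slide (they may no longer match current AWS pricing) and a hypothetical 25% safety margin:

```python
import math

# Figures quoted on the slide; check current AWS limits and pricing.
SHARD_IN_MB_PER_S = 1.0
SHARD_OUT_MB_PER_S = 2.0
SHARD_USD_PER_DAY = 0.44
USD_PER_MILLION_PUTS = 0.02


def shards_needed(in_mb_per_s, out_mb_per_s, safety_margin=0.25):
    """Shards required to cover both write and read throughput.

    safety_margin is an assumed headroom factor, not an AWS setting.
    """
    by_in = in_mb_per_s * (1 + safety_margin) / SHARD_IN_MB_PER_S
    by_out = out_mb_per_s * (1 + safety_margin) / SHARD_OUT_MB_PER_S
    return max(1, math.ceil(max(by_in, by_out)))


def daily_cost_usd(shards, puts_per_day):
    """Shard-hours plus put charges for one day, in US dollars."""
    put_cost = puts_per_day / 1_000_000 * USD_PER_MILLION_PUTS
    return shards * SHARD_USD_PER_DAY + put_cost


# Example: 3 MB/s written, read by three consumers (9 MB/s out in total).
shards = shards_needed(in_mb_per_s=3, out_mb_per_s=9)
print(shards)  # 6
print(round(daily_cost_usd(shards, puts_per_day=50_000_000), 2))  # 3.64
```

In this example the read side, not the write side, determines the shard count, which is worth checking whenever several consumers read the same stream.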

Kinesis Client Library
• Kinesis expects you to write bespoke
producer and consumer programs
• KCL provides automatic multi-threading
with one worker thread per shard.
• Similar to Hadoop: the framework handles
the heavy lifting, and the bespoke program
does the “reduce”.
• You have to autoscale the EC2 groups.

Kinesis Application
[Diagram: Amazon Kinesis with three Auto Scaling groups of instances]

Existing Kinesis Connectors
Sending:
• HTTP POST
• AWS SDK
• Log4j
• Flume
• Fluentd
Reading:
• Get* APIs
• Amazon Kinesis Client Library + Connector Library
• Apache Storm
• Amazon Elastic MapReduce

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-kinesis.html
Standard AWS Demo Script
1. HIVE already running in EMR
2. Create Kinesis Stream
3. Start Producer
4. Configure HIVE as consumer
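
The AWS demo script itself is not reproduced here. As a rough idea of what a producer in step 3 does, the sketch below puts one JSON record onto a stream using the `put_record` call from the AWS SDK for Python (boto3). The stream name, sensor ID, and payload are made up for illustration, and `RecordingClient` is a stand-in so the example runs without AWS credentials; with a real account you would pass `boto3.client("kinesis")` instead.

```python
import json


def send_reading(client, stream_name, sensor_id, reading):
    """Put one JSON-encoded reading onto a stream.

    `client` is anything with a Kinesis-style put_record method,
    for example boto3.client("kinesis").
    """
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(reading).encode("utf-8"),
        # Records with the same partition key go to the same shard.
        PartitionKey=sensor_id,
    )


class RecordingClient:
    """Stand-in client (hypothetical) that records calls instead of sending."""

    def __init__(self):
        self.calls = []

    def put_record(self, **kwargs):
        self.calls.append(kwargs)
        # Fake response shaped like the real one; the values are invented.
        return {"ShardId": "shardId-000000000000",
                "SequenceNumber": str(len(self.calls))}


client = RecordingClient()
send_reading(client, "demo-stream", "sensor-42", {"temp_c": 21.5})
print(client.calls[0]["PartitionKey"])  # sensor-42
```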

Integrating Kinesis into an
existing Data Warehouse
1. Access data in near real-time
2. Facilitate more-traditional ETL
3. Archive

Near Real-time Data
1. Analyze individual transactions
2. Send alerts for both individual
transactions and trends
3. Aggregate to feed a
live dashboard
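
The alerting and dashboard cases above can be sketched with plain Python. This is an illustration only: the event shape `(epoch_seconds, amount)`, the one-minute window, and the threshold are assumptions, and a real consumer would receive its events from the stream rather than from a list.

```python
from collections import defaultdict


def aggregate_per_minute(events):
    """Sum amounts into one-minute buckets keyed by the bucket start time.

    `events` is an iterable of (epoch_seconds, amount) pairs.
    """
    totals = defaultdict(float)
    for epoch_seconds, amount in events:
        window_start = int(epoch_seconds // 60) * 60
        totals[window_start] += amount
    return dict(totals)


def find_alerts(events, threshold):
    """Return the individual events whose amount exceeds the threshold."""
    return [event for event in events if event[1] > threshold]


events = [(0, 10.0), (30, 5.0), (61, 7.0), (119, 1.0), (120, 500.0)]
print(aggregate_per_minute(events))  # {0: 15.0, 60: 8.0, 120: 500.0}
print(find_alerts(events, threshold=100.0))  # [(120, 500.0)]
```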

Facilitate Traditional ETL
1. Write lightly transformed data to
S3 to batch COPY into Redshift
2. Pre-compute aggregates, then
write them to S3
3. Provide a durable, replayable
buffer in front of traditional ETL
tools.
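
For the first option, a consumer typically groups records into files before a batch load. A minimal sketch of that grouping step, assuming newline-delimited JSON as the file format and a fixed record count per file (both are illustrative choices); uploading to S3 and running the Redshift COPY are left out:

```python
import json


def ndjson_batches(records, max_records_per_file):
    """Yield newline-delimited JSON strings, one per file to upload."""
    if max_records_per_file < 1:
        raise ValueError("max_records_per_file must be at least 1")
    batch = []
    for record in records:
        batch.append(json.dumps(record, sort_keys=True))
        if len(batch) == max_records_per_file:
            yield "\n".join(batch) + "\n"
            batch = []
    if batch:  # flush the final, partly filled file
        yield "\n".join(batch) + "\n"


records = [{"id": i, "temp_c": 20 + i} for i in range(5)]
files = list(ndjson_batches(records, max_records_per_file=2))
print(len(files))  # 3
```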

Archive
1. In addition to using your data,
Kinesis makes it easy to log the
full incoming data set to S3.
2. An object store makes more
sense for write-once/read-never
data than a database.

When to use Kinesis
1. Internet of Things (IoT)
2. You need near-real-time
access to data.
3. You have more than one
consumer for each piece of
data.

Thanks
1. Our sponsors:
• API Talent
• AWS
• OptimalPeople
2. Bronwyn and Wyn
3. AWS for images on slides
