© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Customer Journey with Streaming Data on AWS
Amazon Web Services (AWS) offers over 165 fully featured cloud services from data
centers globally. AWS launched its first data streaming service, Amazon Kinesis Data
Streams, over five years ago. Now, customers are using streaming data across most
AWS services, including two that support running Apache Flink: Amazon EMR and
Amazon Kinesis Data Analytics. In this keynote, we will describe how customers and
their use of streaming data have evolved on AWS. We will look at how streaming data
and Apache Flink are used externally and internally at AWS, and where we see usage
of Apache Flink growing.
Rahul Pathak
General Manager of Databases, Analytics, and Blockchain
Amazon Web Services
Customer Journey with Streaming Data on AWS
Building with our customers
• 2013: First fully managed streaming service, Amazon Kinesis Data Streams
• 2016: Support for Apache Flink in Amazon EMR
• 2018: Support for Apache Flink-based apps in Amazon Kinesis Data Analytics
• 2019: 50+ fully managed streaming capabilities deployed globally in 22 AWS Regions
Streaming data on AWS in 2013
Internal and external customers were struggling with high-volume data and needed:
• Low-latency, continuous data capture
• Durable storage to quickly get data from unreliable sources
• Scale to handle large volumes of data cost-effectively
AWS Metering and Billing
What did customers look like in the past?
The “Average” customer… was attracted by ease of use
• Java developer
• Had one application processing 10s of millions of events per day
• Application performed extract, buffer, and load to Amazon S3
The “Large” customer… was attracted by performance and scale
• Had distributed systems experience and was familiar with Hadoop
• Had two applications processing billions of events per day
• Applications performed advanced ETL such as joins
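The “extract, buffer, and load to Amazon S3” pattern that most early applications implemented can be sketched as a small batching loader. This is a minimal illustration under stated assumptions, not any customer's actual code: `BufferedLoader` is a hypothetical name, and `sink` is a placeholder for whatever writes one batch to Amazon S3 (no AWS API is called here). Batches are cut when a record-count or byte-size threshold is reached.

```python
from typing import Callable, List


class BufferedLoader:
    """Hypothetical extract-buffer-load helper (illustrative sketch).

    Records are appended to an in-memory buffer and handed to `sink` as one
    newline-delimited batch when either threshold is reached. `sink` stands
    in for an upload step, e.g. writing one object per batch to Amazon S3.
    """

    def __init__(self, sink: Callable[[bytes], None],
                 max_records: int = 500, max_bytes: int = 1_000_000) -> None:
        self.sink = sink
        self.max_records = max_records
        self.max_bytes = max_bytes
        self._buffer: List[bytes] = []
        self._size = 0

    def add(self, record: bytes) -> None:
        # Buffer the record, then flush if either threshold has been hit.
        self._buffer.append(record)
        self._size += len(record)
        if len(self._buffer) >= self.max_records or self._size >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        # Emit all buffered records as one batch and reset the buffer.
        if not self._buffer:
            return
        self.sink(b"\n".join(self._buffer))
        self._buffer = []
        self._size = 0


if __name__ == "__main__":
    batches: List[bytes] = []
    loader = BufferedLoader(batches.append, max_records=3)
    for i in range(7):
        loader.add(f"event-{i}".encode())
    loader.flush()  # flush the remainder on shutdown
    print(len(batches))
```

Buffering trades a little latency for far fewer, larger writes to the object store; a production loader would typically also flush on a timer so that a quiet stream does not hold records indefinitely.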
Early streaming data customers on AWS
• Streaming extract and load to Amazon S3
• 50 billion daily ad impressions, sub-50 ms responses
• 100 GB/day clickstreams from 250+ sites
Supercell Delivers World-Class Mobile Games
Supercell is a Finnish game company known for the hit
games Clash of Clans, Hay Day, and Boom Beach.
“The world of gaming never sleeps ... We owe every player a great
experience, and AWS is our platform to make that happen.”
Sami Yliharju
Services Lead, Supercell
• Started with a non-streaming architecture
• Uses streaming data for faster ETL and analytics
• Started with archival apps and kept adding use cases
Supercell’s data pipeline circa 2012
Supercell’s streaming pipeline circa 2018
Simplify common use cases
Every customer built a streaming delivery app
• Load streaming data into streams, data lakes, and warehouses
• Zero administration and seamless elasticity
• Direct-to-data store integration
• Serverless continuous data transformations
Make advanced use cases accessible
Few customers were able to move to real-time analytics
• Analyze data streams in real time
• Interact with streaming data in real time using Apache Flink-based apps
• Build fully managed and serverless stream processing applications
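As a rough illustration of “analyze data streams in real time”, the sketch below counts events per key in fixed, non-overlapping (tumbling) windows, one of the most common stream-processing aggregations. It is plain Python, not the Apache Flink or Kinesis Data Analytics API; `tumbling_window_counts` is a hypothetical helper, and it assumes events arrive in timestamp order, whereas Flink handles out-of-order events with watermarks.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_ms: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Count events per key in tumbling windows of `window_ms` milliseconds.

    Hypothetical helper for illustration. Assumes (timestamp_ms, key) events
    arrive in timestamp order; there is no watermarking or late-data handling.
    Yields (window_start_ms, {key: count}) each time a window closes.
    """
    current_start: Optional[int] = None
    counts: Counter = Counter()
    for ts_ms, key in events:
        start = ts_ms - (ts_ms % window_ms)  # align timestamp to window boundary
        if current_start is not None and start != current_start:
            # An event from a newer window arrived: emit the finished window.
            yield current_start, dict(counts)
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # emit the last open window


if __name__ == "__main__":
    clicks = [(0, "home"), (4_000, "cart"), (9_999, "home"), (10_000, "home")]
    for start, per_page in tumbling_window_counts(clicks, window_ms=10_000):
        print(start, per_page)
```

With 10-second windows, the first three clicks fall into the window starting at 0 ms and the last one into the window starting at 10,000 ms. A Flink application expresses the same aggregation with keyed windows and adds fault-tolerant state and handling for out-of-order events.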
Apache Flink at AWS
Customers run Apache Flink on several AWS services, with varying degrees
of flexibility and management:
• Amazon Kinesis Data Analytics
• Amazon Elastic Kubernetes Service
• Amazon EMR
Streaming data on AWS today
Streaming data on AWS in 2019
• <40% delivery apps
• 3 apps
• 3 data stores
What do customers do today?
The “Average” customer… is attracted by ease of use and rich features
• New to streaming data (~50% of customers are still net new to streaming)
• Has several applications processing 100s of millions of events per day
The “Large” customer… is attracted to the above but needs flexibility and elasticity
• Uses many languages, including SQL, Java, and Python
• Has 10s of applications processing 10s of billions of events per day
The “Platform” customer… is attracted to all of the above plus performance
• Builds abstracted platforms on top of streaming services
• Has 100s of applications processing trillions of events per day
Streaming services are foundational at Amazon
• Amazon Go video analytics
• Amazon.com online catalog
• Amazon CloudWatch logs
• Amazon S3 events
• AWS metering
Customers built a wide variety of use cases
• Near-real-time home valuation (Zestimates)
• Live clickstream dashboards refreshed in under 10 seconds
• 1 billion events per week from connected devices
• Real-time game event analytics
• Built an event-driven microservices architecture
• StreamHub
• Online stylist processing 10 million events/day
• Facilitate communications between 100+ microservices
• IoT predictive analytics
• Log analytics for a real-time “single pane of glass”
• Serverless event bus and ingestion pipeline
Autodesk makes software for people who make things
Autodesk, a leading provider of 3D design and
engineering software, wants to do more than create
and deliver software.
“Ultimately, we are improving our
software products and offering
better service to our customers
because of the real-time visibility
we’re getting into log data.”
Tommy Li
Senior Software Architect, Autodesk
• Provides cloud services for its design software
• Uses streaming analytics to monitor and improve its customer experience
Autodesk’s Streaming Architecture
• Amazon Kinesis Data Streams
• Amazon EC2
• Amazon Elastic Container Service
• AWS Lambda
• Amazon Kinesis Data Firehose
• Amazon Kinesis Data Analytics
• Amazon Elasticsearch Service
• Amazon Athena
• Amazon CloudWatch
AWS Data Center Services (DCS) uses Apache Flink
DCS maintains and monitors electrical and physical
topologies for all AWS data centers.
“We chose Apache Flink because it
provided simple, extensible interfaces and
scalability. We chose Kinesis Data Analytics
because of its guaranteed uptime, simple
deployment mode, and lower ops cost.”
AWS Data Center Team
• Writes software to interface with equipment in data centers
• Data includes electrical power draw, water usage, weather, fan speeds, and host temperatures
• Uses insights to drive down the cost of data center operations
AWS Data Center Services
• Data is captured from Kinesis Data Streams and via change data capture (CDC)
from a NoSQL store (Amazon DynamoDB Streams)
• Analytics are calculated using Apache Flink on Kinesis Data Analytics
• Analytics range from drift in circuit breaker settings to power utilization, and much more
What will customers look like in the future?
The “Average” customer… doesn’t even know they are using Flink or a
streaming data service
The “Large” customer… has teams across the company with varying levels of
technical sophistication
The “Platform” customer… is any company, no longer requiring teams of
engineers and years of investment
You may already be here, but streaming is still new for most
Apache Flink at AWS in the future
• Flink is the fastest-growing framework for reading from Kinesis Data Streams
• Usage is still relatively small compared with the simplest solutions
(e.g., KafkaConsumer, Kinesis clients) running on Amazon EC2
• Excited to work with the community on further simplifying running Flink,
both on AWS and anywhere else
It is still Day 1
Thank you!

