Store stream data on Data Lake
Principal Data Architect at Home24
Data Services: Search, Recommendations, Ranking
Worked on: Here Maps, Sapo.pt, DataJet, Xing, …
Scala, Perl, Prolog, Java, SQL, R, …
AWS: Step-Functions, Lambda Function, EMR, EC2,
Batch, SQS, SNS, Firehose, Athena, API Gateway, ...
home24.tech.blog
● 15 people of 12 nationalities
● Serverless Lovers. For data ingestion we have:
● AWS Technologies: Step-Functions, Cloud-Formation, Lambda Functions,
Athena, EMR, Redshift, S3, ...
                              Production                Development
Number of Lambdas             625                       2311
Number of Step Functions      113                       490
Consumed time (a month)       3,383,525 sec (39 days)   5,371,037 sec (62 days)
Number of requests (a month)  2,014,203 requests        3,300,118 requests
● The majority of our Streams have a low message rate
● The Big Stream doesn’t have an easily predictable rate of
messages and can peak to 100 messages/sec
● We will have many more low rate Streams
Main requirements
● Store new Stream Data in Raw S3 Bucket
● Refine Raw S3 Bucket data to a Refined S3 Bucket
● Malformed messages shall not stop the flow
● A notification shall be sent on bad data
● Data must be refined in less than 10 minutes
Other
● Be able to replay many days of data quickly
● For development, every developer shall be able to deploy their own version
independently
Requirements
● Collect data from SNS
● The data must be stored in S3 as received
● File sizes must be easy to process on Lambda (< 10 MB)
● At least 1 file per minute must be created
Architecture
● An SQS Queue collects all data from the SNS
● A Lambda copies the data from the SQS Queue to a Firehose
● The Lambda Function is invoked once a minute via a CloudWatch Event
● Firehose merges the data and creates files on the Raw S3 Bucket
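The copy step above can be sketched as follows. This is a minimal illustration, not the deck's actual Lambda: `drain_queue`, `chunks`, the `max_messages` cap and the trailing newline delimiter are assumptions. The `sqs` and `firehose` arguments are expected to behave like the boto3 SQS and Firehose clients (`receive_message`, `delete_message_batch`, `put_record_batch`). The sketch does not handle Firehose's per-call payload size limit.

```python
FIREHOSE_MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call
SQS_MAX_BATCH = 10        # SQS receives and batch-deletes at most 10 messages per call


def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def drain_queue(sqs, firehose, queue_url, stream_name, max_messages=5000):
    """Copy up to `max_messages` from the queue to Firehose; return how many were copied.

    A message is deleted from SQS only after Firehose accepted it, so anything
    that fails stays in the queue and is retried on a later invocation.
    """
    messages = []
    while len(messages) < max_messages:
        wanted = min(SQS_MAX_BATCH, max_messages - len(messages))
        response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=wanted)
        batch = response.get("Messages", [])
        if not batch:
            break  # queue is empty
        messages.extend(batch)

    copied = 0
    for group in chunks(messages, FIREHOSE_MAX_BATCH):
        # Assumption: a trailing newline keeps one message per line in the merged file.
        records = [{"Data": (m["Body"] + "\n").encode("utf-8")} for m in group]
        result = firehose.put_record_batch(DeliveryStreamName=stream_name, Records=records)
        # RequestResponses is in the same order as Records; failed entries carry an ErrorCode.
        accepted = [m for m, r in zip(group, result["RequestResponses"]) if "ErrorCode" not in r]
        for to_delete in chunks(accepted, SQS_MAX_BATCH):
            entries = [
                {"Id": str(i), "ReceiptHandle": m["ReceiptHandle"]}
                for i, m in enumerate(to_delete)
            ]
            sqs.delete_message_batch(QueueUrl=queue_url, Entries=entries)
        copied += len(accepted)
    return copied
```

In a Lambda handler the two clients would typically come from `boto3.client("sqs")` and `boto3.client("firehose")`. Passing them in as arguments keeps the function testable without AWS access.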
Requirement
● When some messages cannot be processed, send a notification.
Architecture
● The data is deleted from the SQS Queue after a successful copy to Firehose
● In case of error, the messages end up in the Dead-Letter Queue
● A non-empty Dead-Letter Queue means there is an error in the data
● After fixing the Lambda function, one can always copy the messages back to the Raw SQS Queue
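The Dead-Letter Queue check and the copy back to the Raw queue could look like the sketch below. The function names are hypothetical, and `sqs` is assumed to behave like a boto3 SQS client (`get_queue_attributes`, `receive_message`, `send_message`, `delete_message`). How the notification is sent is not covered here.

```python
def dead_letter_count(sqs, dlq_url):
    """Return the approximate number of messages waiting in the Dead-Letter Queue."""
    response = sqs.get_queue_attributes(
        QueueUrl=dlq_url, AttributeNames=["ApproximateNumberOfMessages"]
    )
    # SQS returns attribute values as strings.
    return int(response["Attributes"]["ApproximateNumberOfMessages"])


def redrive(sqs, dlq_url, target_url):
    """Move every message from the Dead-Letter Queue back to the target queue."""
    moved = 0
    while True:
        response = sqs.receive_message(QueueUrl=dlq_url, MaxNumberOfMessages=10)
        messages = response.get("Messages", [])
        if not messages:
            return moved
        for message in messages:
            # Send first, delete second: a crash in between duplicates a message
            # instead of losing it.
            sqs.send_message(QueueUrl=target_url, MessageBody=message["Body"])
            sqs.delete_message(QueueUrl=dlq_url, ReceiptHandle=message["ReceiptHandle"])
            moved += 1
```

A count greater than zero from `dead_letter_count` is the signal that something in the data needs attention. `redrive` is meant to be run by hand once the Lambda has been fixed.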
Requirements
● Decompress or decode data (zip, deflate, gz, base64, ...)
● Normalize fields (dates for example)
● Add metadata
● Convert all to JSON
● Store on S3
Architecture
● When a new file is created on the Raw S3 Bucket, a message is sent to SQS via SNS
● The Lambda Function is invoked once a minute via a CloudWatch Event and processes all unprocessed files
● A file with the same key as the Raw file is created on the Refined S3 Bucket
● Messages that fail to process end up in the Dead-Letter Queue
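The refining steps listed above can be illustrated with a small pure-Python sketch. Everything specific here is an assumption rather than the deck's implementation: the function names, the `created_at` date field, the `_meta` key, one record per line, and the idea that the list of encodings is known per Stream. Reading from and writing to S3 is left out.

```python
import base64
import gzip
import json
import zlib
from datetime import datetime, timezone

DECODERS = {
    "base64": base64.b64decode,
    "gz": gzip.decompress,
    "deflate": zlib.decompress,
}


def decode_payload(data, encodings):
    """Undo the given encodings in order, e.g. ["base64", "gz"]."""
    for name in encodings:
        data = DECODERS[name](data)
    return data


def normalize_date(value):
    """Return an ISO 8601 UTC timestamp from epoch seconds or an ISO 8601 string."""
    if isinstance(value, (int, float)):
        moment = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        moment = datetime.fromisoformat(value)
        if moment.tzinfo is None:
            moment = moment.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return moment.astimezone(timezone.utc).isoformat()


def refine_lines(lines, encodings, source_key, date_field="created_at"):
    """Refine raw lines; return (refined JSON lines, lines that could not be processed)."""
    refined, rejected = [], []
    for line in lines:
        try:
            record = json.loads(decode_payload(line, encodings))
            record[date_field] = normalize_date(record[date_field])
            record["_meta"] = {"source_key": source_key}
            refined.append(json.dumps(record, sort_keys=True))
        except (ValueError, KeyError, TypeError, OSError, EOFError, zlib.error) as error:
            # A malformed message must not stop the flow: set it aside and go on.
            rejected.append((line, repr(error)))
    return refined, rejected
```

The refined lines would be written to the Refined bucket under the same key as the Raw file, and a non-empty `rejected` list is what would trigger the notification or the Dead-Letter Queue path.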
Requirements
● Replay multiple days of data
Architecture
● A Lambda Function lists files on the Raw S3 Bucket and sends messages to SQS
● Since the files in Raw and Refined have the same key, the refined files will always overwrite the existing ones
● The execution time of the Refiner Lambda will rise and the Refiner Lambdas will work in parallel
Parallelism:
● Our Lambda goes to ~190 sec, with 3 Lambdas running in parallel
● 9198 S3 objects
● 30 GB of GZip data, 10 GB/hour
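A replay along these lines might be sketched as below. The function name and the JSON message body are hypothetical (a real S3 notification has a different, larger shape), and `s3` and `sqs` are assumed to behave like the boto3 clients (`get_paginator("list_objects_v2")`, `send_message_batch`). Partial batch failures reported by SQS are not handled.

```python
import json


def replay(s3, sqs, bucket, prefix, queue_url):
    """Queue one message per Raw object under `prefix`; return the number queued."""
    paginator = s3.get_paginator("list_objects_v2")
    pending, queued = [], 0

    def flush():
        nonlocal pending, queued
        if pending:
            # SendMessageBatch accepts at most 10 entries, each with a unique Id.
            entries = [{"Id": str(i), "MessageBody": body} for i, body in enumerate(pending)]
            sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
            queued += len(pending)
            pending = []

    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            # Hypothetical message body: just enough for a Refiner to find the file.
            pending.append(json.dumps({"bucket": bucket, "key": obj["Key"]}))
            if len(pending) == 10:
                flush()
    flush()
    return queued
```

Because the Refined file is written under the same key as the Raw file, running a replay twice for the same prefix simply overwrites the same objects.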
Requirement
● Developers shall be able to deploy their Stream Processors
● No interaction with an external team shall be required
Architecture
● We created an internal SNS where we clone the external messages
● SNS can write to multiple SQS Queues
● With some CloudFormation magic, every developer can deploy their own Environment
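One way to picture the per-developer Environment is a CloudFormation template that gives each developer their own queue subscribed to the internal SNS topic. The sketch below builds such a template as a Python dictionary. The function name, the logical resource names and the queue naming scheme are hypothetical, and the deck does not show its actual templates.

```python
import json


def developer_stack(developer, topic_arn):
    """Build a CloudFormation template with one SQS queue subscribed to the shared topic."""
    queue_id = "RawQueue"
    queue_arn = {"Fn::GetAtt": [queue_id, "Arn"]}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            queue_id: {
                "Type": "AWS::SQS::Queue",
                # Hypothetical naming scheme: one queue per developer.
                "Properties": {"QueueName": f"{developer}-raw-stream"},
            },
            "RawQueuePolicy": {
                # Lets the SNS topic deliver messages into the developer's queue.
                "Type": "AWS::SQS::QueuePolicy",
                "Properties": {
                    "Queues": [{"Ref": queue_id}],
                    "PolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [{
                            "Effect": "Allow",
                            "Principal": {"Service": "sns.amazonaws.com"},
                            "Action": "sqs:SendMessage",
                            "Resource": queue_arn,
                            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
                        }],
                    },
                },
            },
            "RawSubscription": {
                "Type": "AWS::SNS::Subscription",
                "Properties": {
                    "TopicArn": topic_arn,
                    "Protocol": "sqs",
                    "Endpoint": queue_arn,
                },
            },
        },
    }


def render(developer, topic_arn):
    """Return the template as JSON text, ready to be deployed as a stack."""
    return json.dumps(developer_stack(developer, topic_arn), indent=2)
```

Each developer deploys the rendered template as a separate stack, so a new subscription to the internal topic needs no change on the side of the team that owns the external messages.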
             EC2                                 Lambda
CPU / Price  1 t2.nano (5% vCPU and 500 MB)      Considering 3 seconds a minute with
             0.0063*24*30 = 4.536 $/month        the highest memory (2 vCPU and 1536 MB)
                                                 3*60*24*30*10*(0.000002501+0.0000002)
                                                 = 3.5 $/month
DevOps       Higher                              Low
Scale        Scales up to 1 vCPU while it has    Out of the box up to a certain level.
             credits. To have more vCPUs you     2 vCPU * 5 Lambdas = 10 vCPUs
             need to use more expensive
             instance types or implement
             autoscaling.
Price-wise, Lambda seems a good solution. For our problems, 10 vCPUs are
clearly more than enough.
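The arithmetic behind the two monthly prices can be checked with a few lines of Python. The first two functions reproduce the formulas exactly as written on the slide. The third is an alternative reading and an assumption: it treats 0.0000002 as a per-request fee charged once per invocation (one invocation a minute) rather than once per 100 ms unit. The function names are illustrative.

```python
HOURS_PER_MONTH = 24 * 30
MINUTES_PER_MONTH = 60 * 24 * 30


def ec2_monthly(price_per_hour=0.0063):
    """t2.nano running all month: 0.0063 * 24 * 30."""
    return price_per_hour * HOURS_PER_MONTH


def lambda_monthly_as_on_slide(seconds_per_minute=3,
                               price_per_100ms=0.000002501,
                               price_per_request=0.0000002):
    """Slide formula: 3*60*24*30*10*(0.000002501+0.0000002)."""
    units_of_100ms = seconds_per_minute * MINUTES_PER_MONTH * 10
    return units_of_100ms * (price_per_100ms + price_per_request)


def lambda_monthly_per_invocation(seconds_per_minute=3,
                                  price_per_100ms=0.000002501,
                                  price_per_request=0.0000002):
    """Assumed variant: duration billed per 100 ms, request fee once per invocation."""
    units_of_100ms = seconds_per_minute * MINUTES_PER_MONTH * 10
    invocations = MINUTES_PER_MONTH  # one invocation per minute
    return units_of_100ms * price_per_100ms + invocations * price_per_request


if __name__ == "__main__":
    print(f"EC2 t2.nano:              {ec2_monthly():.3f} $/month")
    print(f"Lambda (slide formula):   {lambda_monthly_as_on_slide():.3f} $/month")
    print(f"Lambda (per invocation):  {lambda_monthly_per_invocation():.3f} $/month")
```

Under either reading Lambda stays below the t2.nano price, so the conclusion of the comparison does not change.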
             Kinesis                                  SQS
Slow stream  2 Shards: 24.5 $/month                   Requests: 2.07 $/month
             Puts: 0.042 $/month
Fast stream  3 Shards: 36.7 $/month                   Requests: 51.8 $/month
             Puts: 1.1 $/month
Errors       Errors have to be controlled externally  Errors go to the Dead-Letter Queue
We analyze our 2 types of data stream:
● Slow Stream: 1 message/second (2.6 million requests/month)
● Fast Stream: 25 messages/second (64.8 million requests/month)
with spikes of 100 messages/second
On SQS you pay for PUTs and GETs; on Kinesis you pay for PUTs
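The message counts and most of the totals in the table can be reproduced with the sketch below. The unit prices are assumptions, chosen because they give back the slide's figures: 0.40 $ per million SQS requests and 0.017 $ per Kinesis shard-hour. The Kinesis PUT cost is left out because its unit price cannot be recovered unambiguously from the rounded totals.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 30
HOURS_PER_MONTH = 24 * 30


def monthly_messages(rate_per_second):
    """Messages per 30-day month at a constant rate."""
    return rate_per_second * SECONDS_PER_MONTH


def sqs_monthly(rate_per_second, price_per_million=0.40):
    """SQS cost with one PUT and one GET per message (assumed unit price)."""
    requests = 2 * monthly_messages(rate_per_second)
    return requests / 1_000_000 * price_per_million


def kinesis_shards_monthly(shards, price_per_shard_hour=0.017):
    """Kinesis shard cost for a full month (assumed unit price, PUTs not included)."""
    return shards * price_per_shard_hour * HOURS_PER_MONTH


if __name__ == "__main__":
    for name, rate, shards in [("Slow", 1, 2), ("Fast", 25, 3)]:
        print(f"{name} stream: {monthly_messages(rate):,} messages/month, "
              f"SQS {sqs_monthly(rate):.2f} $/month, "
              f"Kinesis shards {kinesis_shards_monthly(shards):.2f} $/month")
```

The shard cost is a fixed monthly amount, while the SQS cost grows with the message count, which is why SQS is the cheaper option for the slow stream and the more expensive one for the fast stream.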
● You just pay for what you use
● Scalability is not an issue at our message volume (peak of 100
messages/second)
○ SQS and Firehose can easily process that volume of messages
○ Multiple Lambdas can work in parallel in case of high traffic or
replay.
● Separate Lambdas per Stream make the logs easier to understand
● Separate environments simplify the developers' work
● Data is on S3 and it can be queried via Athena, EMR, Redshift
Spectrum, ...
Questions
Answers
Understanding the Tor Network and Exploring the Deep Web
nabilajabin35
 
Smart Mobile App Pitch Deck丨AI Travel App Presentation Template
Smart Mobile App Pitch Deck丨AI Travel App Presentation TemplateSmart Mobile App Pitch Deck丨AI Travel App Presentation Template
Smart Mobile App Pitch Deck丨AI Travel App Presentation Template
yojeari421237
 
Computers Networks Computers Networks Computers Networks
Computers Networks Computers Networks Computers NetworksComputers Networks Computers Networks Computers Networks
Computers Networks Computers Networks Computers Networks
Tito208863
 
project_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptxproject_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptx
redzuriel13
 
APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025
APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025
APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025
APNIC
 
IT Services Workflow From Request to Resolution
IT Services Workflow From Request to ResolutionIT Services Workflow From Request to Resolution
IT Services Workflow From Request to Resolution
mzmziiskd
 
APNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC Update, presented at NZNOG 2025 by Terry SweetserAPNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC
 
highend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptxhighend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptx
elhadjcheikhdiop
 
Determining Glass is mechanical textile
Determining  Glass is mechanical textileDetermining  Glass is mechanical textile
Determining Glass is mechanical textile
Azizul Hakim
 
5-Proses-proses Akuisisi Citra Digital.pptx
5-Proses-proses Akuisisi Citra Digital.pptx5-Proses-proses Akuisisi Citra Digital.pptx
5-Proses-proses Akuisisi Citra Digital.pptx
andani26
 
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHostingTop Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
steve198109
 
Best web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you businessBest web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you business
steve198109
 
OSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description fOSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description f
cbr49917
 
(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security
aluacharya169
 
Perguntas dos animais - Slides ilustrados de múltipla escolha
Perguntas dos animais - Slides ilustrados de múltipla escolhaPerguntas dos animais - Slides ilustrados de múltipla escolha
Perguntas dos animais - Slides ilustrados de múltipla escolha
socaslev
 

Store stream data on Data Lake

  • 2. Principal Data Architect at Home24 ● Data Services: Search, Recommendations, Ranking ● Worked on: Here Maps, Sapo.pt, DataJet, Xing, … ● Scala, Perl, Prolog, Java, SQL, R, … ● AWS: Step-Functions, Lambda Function, EMR, EC2, Batch, SQS, SNS, Firehose, Athena, API Gateway, ...
  • 5. ● 15 people of 12 nationalities ● Serverless lovers ● AWS technologies: Step-Functions, CloudFormation, Lambda Functions, Athena, EMR, Redshift, S3, ... ● For data ingestion we have, in Production: 625 Lambdas, 113 Step Functions, 3,383,525 sec of consumed time a month (39 days), 2,014,203 requests a month ● In Development: 2311 Lambdas, 490 Step Functions, 5,371,037 sec of consumed time a month (62 days), 3,300,118 requests a month
  • 6. ● The majority of our Streams have a low message rate ● The Big Stream does not have an easily predictable message rate and can peak at 100 messages/sec ● We will have many more low-rate Streams
  • 7. Main requirements ● Store new Stream Data in the Raw S3 Bucket ● Refine the Raw S3 Bucket data into a Refined S3 Bucket ● Wrongly formatted messages shall not stop the flow ● A notification shall be sent on bad data ● Data must be refined in less than 10 minutes Other requirements ● Able to replay many days of data fast ● For development, every developer shall be able to deploy their own version independently
  • 11. Requirements ● Collect data from SNS ● The data must be stored in S3 as received ● File size must be easy to process on Lambda (< 10 MB) ● At least 1 file per minute must be created Architecture ● An SQS Queue collects all data from the SNS ● A Lambda copies the data from the SQS Queue to a Firehose ● The Lambda Function is invoked once a minute via a CloudWatch Event ● Firehose merges the data and creates files on the Raw S3 Bucket
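The copy step of this ingestion path can be sketched as follows. This is a minimal illustration, not the deck's actual code: it assumes boto3-style `sqs` and `firehose` clients are passed in, and the function name, `queue_url`, `stream_name` and `max_batches` are hypothetical. Messages are deleted from the queue only after Firehose has accepted them, so anything that keeps failing stays on the queue.

```python
def drain_queue_to_firehose(sqs, firehose, queue_url, stream_name, max_batches=100):
    """Copy SQS messages to Firehose; delete only the ones Firehose accepted."""
    copied = 0
    for _ in range(max_batches):
        # SQS returns at most 10 messages per receive call.
        response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
        messages = response.get("Messages", [])
        if not messages:
            break  # nothing left to copy in this invocation

        # The body is forwarded unchanged so the raw data is stored as received.
        records = [{"Data": m["Body"].encode("utf-8")} for m in messages]
        result = firehose.put_record_batch(
            DeliveryStreamName=stream_name, Records=records
        )

        # RequestResponses is aligned with Records; failed entries carry an ErrorCode.
        accepted = [
            m
            for m, r in zip(messages, result["RequestResponses"])
            if "ErrorCode" not in r
        ]
        if accepted:
            sqs.delete_message_batch(
                QueueUrl=queue_url,
                Entries=[
                    {"Id": str(i), "ReceiptHandle": m["ReceiptHandle"]}
                    for i, m in enumerate(accepted)
                ],
            )
        copied += len(accepted)
    return copied
```

In a Lambda handler the two clients would typically come from `boto3.client("sqs")` and `boto3.client("firehose")`; passing them in keeps the loop easy to exercise with fakes.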
  • 16. Requirement ● When some messages cannot be processed, send a notification. Architecture ● The data is deleted from the SQS Queue after a successful copy to Firehose ● In case of error, the messages end up on the Dead-Letter Queue ● A non-empty Dead-Letter Queue means there is an error in the data ● After fixing the Lambda function, one can always copy the messages back to the Raw SQS Queue
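Copying messages back from the Dead-Letter Queue could look roughly like this. It is a sketch under assumptions: a boto3-style `sqs` client is passed in, and `redrive`, `dlq_url`, `target_url` and `max_batches` are hypothetical names. Each message is deleted from the Dead-Letter Queue only after it has been sent to the target queue.

```python
def redrive(sqs, dlq_url, target_url, max_batches=100):
    """Move messages from a dead-letter queue back to the queue they came from."""
    moved = 0
    for _ in range(max_batches):
        response = sqs.receive_message(QueueUrl=dlq_url, MaxNumberOfMessages=10)
        messages = response.get("Messages", [])
        if not messages:
            break  # dead-letter queue is empty
        for message in messages:
            # Send first, delete second: a crash in between duplicates a
            # message instead of losing it.
            sqs.send_message(QueueUrl=target_url, MessageBody=message["Body"])
            sqs.delete_message(
                QueueUrl=dlq_url, ReceiptHandle=message["ReceiptHandle"]
            )
            moved += 1
    return moved
```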
  • 21. Requirements ● Decompress data (zip, deflate, gz, base64, ...) ● Normalize fields (dates for example) ● Add metadata ● Convert everything to JSON ● Store on S3 Architecture ● When a new file is created on the Raw S3 Bucket, a message is sent to SQS via SNS ● The Lambda Function is invoked once a minute via a CloudWatch Event and processes all unprocessed files ● A file with the same key as the Raw file is created on the Refined S3 Bucket ● Messages that fail to process end up on the Dead-Letter Queue
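The refinement steps listed above can be illustrated with a small sketch. Everything specific here is an assumption made for the example: the record layout (base64-encoded gzip containing a JSON object), the `created_at` epoch-seconds field, the `_meta` key and both function names are hypothetical. The second function shows one way to keep a badly formatted record from stopping the flow.

```python
import base64
import gzip
import json
from datetime import datetime, timezone


def refine_record(raw: bytes, source_key: str) -> str:
    """Decode, decompress, normalise and annotate one raw record."""
    payload = gzip.decompress(base64.b64decode(raw))
    record = json.loads(payload)
    # Normalise a (hypothetical) epoch-seconds field to ISO 8601 in UTC.
    record["created_at"] = datetime.fromtimestamp(
        record["created_at"], tz=timezone.utc
    ).isoformat()
    # Add metadata that points back to the raw object.
    record["_meta"] = {"source_key": source_key}
    return json.dumps(record, sort_keys=True)


def refine_lines(lines, source_key):
    """Refine every record; collect failures instead of aborting the batch."""
    refined, rejected = [], []
    for line in lines:
        try:
            refined.append(refine_record(line, source_key))
        except Exception as exc:  # any bad record is set aside, not fatal
            rejected.append((line, repr(exc)))
    return refined, rejected
```

The `rejected` list is what a real refiner would report, for example by letting the corresponding message end up on the Dead-Letter Queue.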
  • 26. Requirements ● Replay multiple days of data Architecture ● A Lambda Function lists the files on the Raw S3 Bucket and sends messages to SQS ● Since the files in Raw and Refined have the same key, the replayed files always overwrite the existing ones ● The execution time of the Refiner Lambda will rise and the Refiner Lambdas will work in parallel Parallelism ● Our Lambda goes to ~190 sec, with 3 Lambdas running in parallel ● 9198 S3 objects ● 30 GB of GZip data, 10 GB/hour
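A replay function along these lines might be sketched as below. It assumes boto3-style `s3` and `sqs` clients are passed in; the function name, its parameters and the JSON message body are hypothetical (the real pipeline would more likely reuse the S3 notification format the Refiner already understands).

```python
import json


def replay(s3, sqs, bucket, prefix, queue_url):
    """List raw objects under a prefix and enqueue one message per object."""
    sent = 0
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3.list_objects_v2(**kwargs)
        keys = [obj["Key"] for obj in page.get("Contents", [])]
        # An SQS batch holds at most 10 entries.
        for start in range(0, len(keys), 10):
            chunk = keys[start:start + 10]
            entries = [
                {"Id": str(i), "MessageBody": json.dumps({"bucket": bucket, "key": k})}
                for i, k in enumerate(chunk)
            ]
            # A production version should also inspect the "Failed" list
            # in the response and retry those entries.
            sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
            sent += len(chunk)
        if not page.get("IsTruncated"):
            return sent
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

Because the refined file is written under the same key as the raw one, running this twice for the same prefix only overwrites existing refined files.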
  • 30. Requirement ● Developers shall be able to deploy their Stream Processors ● No interaction with an external team shall be required Architecture ● We created an internal SNS where we clone the external messages ● SNS can write to multiple SQS Queues ● With the same CloudFormation magic, every developer can deploy their own environment
  • 34. EC2 vs. Lambda ● CPU / Price (EC2): 1 t2.nano (5% vCPU and 500 MB), 0.0063*24*30 = $4.536/month ● CPU / Price (Lambda): considering 3 seconds a minute with the highest memory (2 vCPU and 1536 MB), 3*60*24*30*10*(0.000002501+0.0000002) = $3.5/month ● DevOps: higher on EC2, low on Lambda ● Scale (EC2): scales to 1 vCPU while it has credits; to have more vCPUs you need to use more expensive instance types or implement autoscaling ● Scale (Lambda): out of the box until a certain level, 2 vCPU * 5 Lambdas = 10 vCPUs ● Price-wise, Lambda seems a good solution. For our problems, 10 vCPUs is clearly more than enough.
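The price arithmetic on this slide can be re-run as a quick check. The numbers are taken from the slide; the interpretation is an assumption: the factor 10 is read as the number of 100 ms billing units per second, 0.000002501 as the price per 100 ms at 1536 MB, and 0.0000002 as the per-request fee, which the slide's formula adds to every 100 ms unit.

```python
HOURS_PER_MONTH = 24 * 30
MINUTES_PER_MONTH = 60 * 24 * 30

# EC2: one t2.nano billed per hour (hourly price as quoted on the slide).
ec2_monthly = 0.0063 * HOURS_PER_MONTH

# Lambda: 3 seconds of work per minute, counted in 100 ms units (assumed).
units_per_month = 3 * 10 * MINUTES_PER_MONTH
lambda_monthly = units_per_month * (0.000002501 + 0.0000002)

print(f"EC2:    ${ec2_monthly:.3f}/month")
print(f"Lambda: ${lambda_monthly:.2f}/month")
```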
  • 37. Kinesis vs. SQS ● We analyzed our 2 types of data stream: the Slow Stream, 1 message/sec (2.6 million requests/month), and the Fast Stream, 25 messages/sec (64.8 million requests/month) with spikes of 100 messages/sec ● Slow stream (Kinesis): 2 shards $24.5/month, PUTs $0.042/month ● Slow stream (SQS): requests $2.07/month ● Fast stream (Kinesis): 3 shards $36.7/month, PUTs $1.1/month ● Fast stream (SQS): requests $51.8/month ● Errors (Kinesis): errors have to be controlled externally ● Errors (SQS): errors go to the Dead-Letter Queue ● On SQS you pay for PUTs and GETs; on Kinesis you pay for PUTs
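The SQS figures on this slide can be reproduced with a short calculation. The request price of $0.40 per million is an assumption (it is not stated on the slide, but it matches the quoted totals), as is counting exactly one PUT and one GET per message; the function name is hypothetical.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 30
PRICE_PER_MILLION_REQUESTS = 0.40  # assumed SQS price in dollars, not from the slide


def sqs_monthly_cost(messages_per_second):
    """Monthly SQS cost when every message costs one PUT and one GET."""
    messages = messages_per_second * SECONDS_PER_MONTH
    requests = 2 * messages  # one send plus one receive per message
    return requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS


print(f"Slow stream: ${sqs_monthly_cost(1):.2f}/month")   # 2.6 million messages
print(f"Fast stream: ${sqs_monthly_cost(25):.2f}/month")  # 64.8 million messages
```

Batching receives (up to 10 messages per request) would lower the GET side of this estimate; the sketch deliberately mirrors the slide's simpler one-request-per-message view.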
  • 38. ● You only pay for what you use ● Scalability is not an issue at our message volume (at most 100 messages/second) ○ SQS and Firehose can easily process that volume of messages ○ Multiple Lambdas can work in parallel in case of high traffic or replay ● Separate Lambdas per Stream make the logs easier to understand ● Separate environments simplify the developers' work ● The data is on S3 and can be queried via Athena, EMR, Redshift Spectrum, ...