Introduction to
Streaming & Messaging
AWS Big Data Demystified
Flume, Kafka, SQS, Kinesis Streams, Firehose & Analytics
Omid Vahdaty, Big Data Ninja
What is batch processing?
The execution of a series of programs, each on a set or "batch" of inputs, rather than on a single input (which
would instead be a custom job).
What is streaming?
Streaming data is data that is generated continuously by thousands of data sources, which typically send in
the data records simultaneously and in small sizes (on the order of kilobytes).
Streaming VS. Batch Processing
           | Batch | Stream
Data scope | Query the entire batch, with slight delay | Query the most recent events, defined in a time window
Data size  | Large data sets | A few individual records
Latency    | Minutes, hours | Seconds, milliseconds
Analysis   | Complex analytics | Basic: aggregations, metrics, etc.
Streaming vs messaging
● Stream - when you need to do complex analytics in flight, e.g. a voting
application. It is about processing an infinite input stream (in contrast to batch processing, which is applied to finite
inputs).
● Message - when you need to perform an operation per event, e.g. logging.
● https://stackoverflow.com/questions/41744506/difference-between-stream-processing-and-message-processing
Challenges with Streaming Data
● Processing layer
○ Consuming data
○ Processing data
○ Notifying storage layer what to do.
● Storage layer
○ Ordering mechanism
○ Strong Consistency mechanism
● In general MUST have features:
○ scalability
○ data durability
○ fault tolerance
Messaging VS Streaming?
● Messaging: a framed, message-based protocol.
● E.g. 3 messages sent will look like:
○ Hello world
○ Hello world
○ Hello world
● Streaming: an unframed, byte-stream-based protocol.
● E.g. the same 3 messages sent may arrive in arbitrary chunks:
○ Hell
○ o wo
○ rld Hel
○ lo wor
○ ldHello wo
○ rld
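One common way to recover message boundaries on top of an unframed byte stream is to prefix each message with its length. The sketch below is illustrative only: the 4-byte big-endian prefix, the `Deframer` class and the chunk size of 7 are assumptions made for this example, not something the slides prescribe.

```python
import struct
from typing import List


def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(message)) + message


class Deframer:
    """Rebuilds whole messages from arbitrarily sized chunks of a byte stream."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add a chunk and return every message that is now complete."""
        self._buffer.extend(chunk)
        messages = []
        while len(self._buffer) >= 4:
            (length,) = struct.unpack(">I", self._buffer[:4])
            if len(self._buffer) < 4 + length:
                break  # the rest of this message has not arrived yet
            messages.append(bytes(self._buffer[4:4 + length]))
            del self._buffer[:4 + length]
        return messages


# Three framed messages, delivered in 7-byte chunks that ignore message boundaries.
wire = b"".join(frame(b"Hello world") for _ in range(3))
chunks = [wire[i:i + 7] for i in range(0, len(wire), 7)]

deframer = Deframer()
received = []
for chunk in chunks:
    received.extend(deframer.feed(chunk))
```

After the loop, `received` holds the three original messages even though no chunk lined up with a message boundary.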
Messaging
Open Source: Kafka, Flume
AWS: SQS
Flume
Flume Pros:
● Good documentation with many existing implementation patterns to follow
● Easy integration with existing monitoring frameworks
● Integration with Cloudera Manager to monitor Flume
Flume
Flume Cons:
● Event-centric rather than stream-centric
● Calculating capacity is not an exact science; it is confirmed through trials
● Throughput depends on the channel's backing store
● Flume lacks clear scaling and resiliency configurations (trivial with Kafka and Kinesis)
Kafka
Kafka Pros:
● High achievable ingest rates with clear scaling pattern
● High resiliency via distributed replicas with little impact on throughput
Kafka Cons:
● No current framework for monitoring and configuring producers
Flume VS. Kafka
                       | Flume | Kafka
Choose when you desire | No need for customization; need out-of-the-box components such as the HDFS sink | Need a custom-made, high-availability delivery system
Velocity               | High | Higher
Event processing
                    | Flume | Kafka
Original motivation | A distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. Built around the Hadoop ecosystem. | A general-purpose distributed publish-subscribe messaging system; a multi-consumer, ultra-high-availability messaging system.
Data flow           | Push | Pull
Event availability  | JDBC database channel, file channel. Losing a Flume agent = losing data. | Replication of your event data by design.
Commercial support  | Cloudera | Cloudera
Collectors built in | Yes | Just the messaging
Use Case: Kafka and Flume combined
● Flume supports a Kafka source, a Kafka channel, and a Kafka sink
● So, take advantage of both and combine them to fit your needs.
Use Case: Kafka as a Channel
AWS SQS
● a fast, reliable, scalable, fully managed message queuing service
● decouple the components of a cloud application, move data between diverse, distributed
application components without losing messages and without requiring each component to be
always available.
● high throughput with at-least-once processing; FIFO queues are also available
● all messages are stored redundantly across multiple servers and data centers.
● Start with three API calls: SendMessage, ReceiveMessage, and DeleteMessage. Additional
APIs are available to provide advanced functionality.
● Queues
○ Standard queues offer maximum throughput, best-effort ordering, and at-least-once
delivery.
○ FIFO queues are designed to ensure strict ordering and exactly-once processing, with
limited throughput.
● scales dynamically
● Authentication mechanisms
AWS SQS use cases
● Messaging semantics (such as message-level ack/fail) and visibility timeout. For example, you have
a queue of work items and want to track the successful completion of each item independently. Amazon
SQS tracks the ack/fail, so the application does not have to maintain a persistent checkpoint/cursor.
Amazon SQS will delete acked messages and redeliver failed messages after a configured visibility
timeout.
● Individual message delay. For example, you have a job queue and need to schedule individual jobs
with a delay. With Amazon SQS, you can configure individual messages to have a delay of up to 15
minutes.
● Dynamically increasing concurrency/throughput at read time. For example, you have a work
queue and want to add more readers until the backlog is cleared. With Amazon Kinesis, by contrast, you can
scale up only to a sufficient number of shards, and you need to provision enough shards ahead
of time.
● Leveraging Amazon SQS's ability to scale transparently. For example, you buffer requests and
the load changes as a result of occasional load spikes or the natural growth of your business. Because
each buffered request can be processed independently, Amazon SQS can scale transparently to handle
the load without any provisioning instructions from you.
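The ack/fail and visibility-timeout behaviour described in this list can be modelled in a few lines. This is a toy in-memory model, not the AWS API: the `ToyQueue` class, its method names and the explicit `now` argument are assumptions made for illustration, and real SQS issues a fresh receipt handle on every receive, which is simplified away here.

```python
import itertools


class ToyQueue:
    """In-memory model of ack/fail with a visibility timeout (not the AWS API)."""

    def __init__(self, visibility_timeout: float) -> None:
        self.visibility_timeout = visibility_timeout
        self._messages = {}    # handle -> message body
        self._visible_at = {}  # handle -> time at which the message is visible again
        self._ids = itertools.count(1)

    def send_message(self, body: str, now: float) -> None:
        handle = next(self._ids)
        self._messages[handle] = body
        self._visible_at[handle] = now

    def receive_message(self, now: float):
        """Return (handle, body) for one visible message and hide it, or None."""
        for handle, body in self._messages.items():
            if self._visible_at[handle] <= now:
                self._visible_at[handle] = now + self.visibility_timeout
                return handle, body
        return None

    def delete_message(self, handle: int) -> None:
        """Acknowledge a message so that it is never redelivered."""
        self._messages.pop(handle, None)
        self._visible_at.pop(handle, None)


queue = ToyQueue(visibility_timeout=30)
queue.send_message("job-1", now=0)
first = queue.receive_message(now=0)         # worker A takes the job
hidden = queue.receive_message(now=10)       # still invisible to other workers
redelivered = queue.receive_message(now=31)  # worker A never acked, so it reappears
queue.delete_message(redelivered[0])         # worker B acks the job
after_ack = queue.receive_message(now=100)   # nothing left to deliver
```

The point of the model is that the consumer keeps no checkpoint of its own: an unacknowledged message simply becomes visible again once the timeout passes.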
Typical SQS use case: decoupling app layers.
Streaming
AWS Kinesis:
Streams, Firehose, Analytics
AWS Kinesis (streams)
● build custom applications that process or analyze streams
● continuously capture and store terabytes of data per hour
● Hundreds of sources
● allows for real-time data processing
● Easy to use, get started in minutes
○ Kinesis Client Library
○ Kinesis Producer Library
● allows you to have multiple Applications processing the same stream concurrently.
● The throughput can scale from megabytes to terabytes per hour
● synchronously replicates your streaming data across three Availability Zones (AZs)
● preserves your data for up to 7 days
AWS Kinesis (streams) use cases
● Log and Event collection
● Mobile Data collection
● Real Time Analytics
○ when loading data from transactional databases into data warehouses.
○ Multi-stage processing using specialized algorithms
○ stream partitioning for finer control over scaling
● Gaming Data feed
AWS Kinesis (streams) use cases
● Routing related records to the same record processor (as in
streaming MapReduce). For example, counting and aggregation
are simpler when all records for a given key are routed to
the same record processor.
● Ordering of records. For example, you want to transfer log data
from the application host to the processing/archival host while
maintaining the order of log statements.
● Ability for multiple applications to consume the same stream
concurrently. For example, you have one application that updates
a real-time dashboard and another that archives data to Amazon
Redshift. You want both applications to consume data from the
same stream concurrently and independently.
● Ability to consume records in the same order a few hours
later. For example, you have a billing application and an audit
application that runs a few hours behind the billing application.
Because Amazon Kinesis stores data for 24 hours by default, you can run
the audit application up to 24 hours behind the billing application.
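The first use case above, routing all records for a key to the same processor, rests on hashing the partition key. Kinesis maps the MD5 hash of the partition key onto a 128-bit range that is divided among the shards; the sketch below assumes the shards split that range into equal parts, and the function name `shard_for` and the sample vote events are hypothetical.

```python
import hashlib
from collections import Counter, defaultdict

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming equal-sized hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


# Hypothetical vote events: (partition key, candidate).
events = [("user-1", "A"), ("user-2", "B"), ("user-1", "A"), ("user-3", "A")]

# Each shard keeps its own per-key counts; no cross-shard merge is needed
# because a given key always lands on the same shard.
per_shard = defaultdict(Counter)
for key, _candidate in events:
    per_shard[shard_for(key, shard_count=4)][key] += 1
```

Because `user-1` always hashes to the same shard, both of its events are counted by a single record processor.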
AWS Kinesis (streams)
Kinesis Pros:
● High achievable ingest rates with clear scaling pattern
● Similar throughput and resiliency characteristics to Kafka
● Integrates with other AWS services like EMR and Data Pipeline.
Kinesis Cons:
● No current framework for monitoring and configuring producers
● Cloud service only. Possible increase in latency from source to Kinesis.
AWS Kinesis (streams)
AWS Kinesis Firehose
● the easiest way to load streaming data into AWS.
● capture, transform, and load streaming data
○ integrates with Kinesis Analytics, S3, Redshift, and Elasticsearch Service
○ Serverless transformation of raw data (Lambda function)
■ E.g. transform a log file into CSV format
● Firehose can back up all untransformed records to your S3 bucket concurrently while delivering transformed records to
the destination, provided that you enable source record backup
● enabling near real-time analytics
● Easy to use.
● Monitoring options.
● Limits
○ 20 delivery streams per region
○ Each delivery stream
■ 2,000 transactions per second
■ 5,000 records per second
■ 5 MB/s
■ Supports 24 hours of replay in case of downtime
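The Lambda transformation mentioned in this list (log line to CSV) could look roughly like the sketch below. The event and response shapes (`records`, `recordId`, `result`, base64-encoded `data`) follow the Firehose data-transformation contract; the log format `<timestamp> <level> <message>` and the helper `log_line_to_csv` are assumptions invented for this example.

```python
import base64
import csv
import io


def log_line_to_csv(line: str) -> str:
    """Turn '<timestamp> <level> <message>' into one CSV row (hypothetical log format)."""
    timestamp, level, message = line.strip().split(" ", 2)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow([timestamp, level, message])
    return out.getvalue()


def lambda_handler(event, context):
    """Transform each incoming record and report a per-record result."""
    output = []
    for record in event["records"]:
        raw = base64.b64decode(record["data"]).decode("utf-8")
        try:
            transformed = log_line_to_csv(raw)
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(transformed.encode("utf-8")).decode("ascii"),
            })
        except ValueError:
            # Lines that do not match the expected shape are flagged, not silently dropped.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Returning `ProcessingFailed` with the original payload keeps malformed lines traceable instead of losing them.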
Kinesis Firehose agent
● Java software application that sends data to Streams/Firehose
● monitors a set of files for new data and then sends it to Streams/Firehose
● It handles file rotation, checkpointing, and retries upon failure.
● supports Amazon CloudWatch so that you can closely monitor and troubleshoot the data flow from
the agent.
● Data processing options:
○ SINGLELINE – This option converts a multi-line record to a single line record by removing
newline characters, and leading and trailing spaces.
○ CSVTOJSON – This option converts a record from delimiter separated format to JSON
format.
○ LOGTOJSON – This option converts a record from several commonly used log formats to
JSON format. Currently supported log formats are Apache Common Log, Apache Combined
Log, Apache Error Log, and RFC3164 (syslog).
● https://github.com/awslabs/amazon-kinesis-agent
● Amazon Kinesis Firehose will only output to Amazon S3 buckets and Amazon Redshift clusters in the same region.
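To make the CSVTOJSON option above concrete, here is a rough Python illustration of the kind of conversion it describes. This is not the agent's code: the function `csv_to_json`, its `field_names` and `delimiter` parameters, and the sample record are assumptions made for the example.

```python
import json
from typing import List


def csv_to_json(record: str, field_names: List[str], delimiter: str = ",") -> str:
    """Pair each delimited value with a field name and emit a JSON object."""
    values = record.rstrip("\n").split(delimiter)
    return json.dumps(dict(zip(field_names, values)))


converted = csv_to_json("1.2.3.4,GET,200\n", ["ip", "method", "status"])
```

Values stay strings in this sketch; any type conversion would be a further design decision.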
Write a Java agent to Firehose
● AWS Java SDK
● Firehose API
○ Single record: PutRecord
○ Batch: PutRecordBatch.
● Key concepts:
○ Firehose delivery stream
○ Data producer - e.g. a web server creating logs.
○ Record: The data of interest that your data producer sends to a Firehose delivery stream. A record can be as
large as 1000 KB.
○ buffer size (in MB )
○ buffer interval (seconds)
● Java examples:
○ http://docs.aws.amazon.com/firehose/latest/dev/writing-with-sdk.html
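Whatever SDK is used, a producer calling PutRecordBatch has to group records into batches that respect the service limits. The sketch below shows only that grouping logic, in Python rather than Java. The 1000 KB per-record ceiling comes from the key concepts above; the per-call limits of 500 records and 4 MiB are assumptions recalled from the Firehose API reference and should be checked against the current documentation.

```python
from typing import Iterable, Iterator, List

MAX_RECORD_BYTES = 1000 * 1024     # per-record ceiling stated above (1000 KB)
MAX_BATCH_RECORDS = 500            # assumed per-call record limit
MAX_BATCH_BYTES = 4 * 1024 * 1024  # assumed per-call size limit


def batches(records: Iterable[bytes]) -> Iterator[List[bytes]]:
    """Group records into batches that stay within the count and size limits."""
    batch: List[bytes] = []
    batch_bytes = 0
    for record in records:
        if len(record) > MAX_RECORD_BYTES:
            raise ValueError("record exceeds the per-record size limit")
        batch_is_full = len(batch) == MAX_BATCH_RECORDS
        would_overflow = batch_bytes + len(record) > MAX_BATCH_BYTES
        if batch and (batch_is_full or would_overflow):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += len(record)
    if batch:
        yield batch


small = [b"x" * 10] * 1200
small_batch_sizes = [len(b) for b in batches(small)]
```

Each yielded batch would then be handed to a single PutRecordBatch call, with retry handling for any records the service reports as failed.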
Firehose and Redshift use case
Streams VS. Firehose
● Data producers: logs, web, mobile
● Data consumers: EMR, S3, Redshift
● Stream delivery: Streams & Firehose
● Key concepts for Streams:
■ Basic unit: shard
● 1 MB/s ingress, 2 MB/s egress per shard
● 10 shards? 10x the throughput.
● default limit of 10 shards per region
● no upper limit to the number of shards or streams in an account (the default limit can be raised).
● Partition keys determine which shard in a stream a record is routed to
● Sequence numbers are unique identifiers for records inserted into a shard. They increase
monotonically, and are specific to individual shards.
● Streaming data is replicated by Kinesis across three separate Availability Zones
● data is available in a stream for 24 hours by default
■ Streams API to control scale, up to TBs per hour
■ Monitoring is available through Amazon CloudWatch.
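The per-shard figures above (1 MB ingress and 2 MB egress, read here as per-second rates) translate into a simple sizing calculation. The function name `shards_needed` and the example workloads are made up for illustration, and other per-shard limits, such as records per second, are deliberately left out of this sketch.

```python
import math

INGRESS_MB_PER_SHARD = 1  # MB/s written per shard (figure from the list above)
EGRESS_MB_PER_SHARD = 2   # MB/s read per shard (figure from the list above)


def shards_needed(ingress_mb_per_s: float, egress_mb_per_s: float) -> int:
    """Return the smallest shard count that covers both write and read throughput."""
    by_ingress = math.ceil(ingress_mb_per_s / INGRESS_MB_PER_SHARD)
    by_egress = math.ceil(egress_mb_per_s / EGRESS_MB_PER_SHARD)
    return max(1, by_ingress, by_egress)


balanced = shards_needed(10, 20)    # writes and reads both call for 10 shards
read_heavy = shards_needed(5, 30)   # reads dominate: 30 / 2 = 15 shards
```

A read-heavy workload, for example several consumer applications on one stream, can therefore need more shards than the write rate alone suggests.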
Streams VS. Firehose
● Data producers: logs, web, mobile
● Data consumers: EMR, S3, Redshift
● Stream delivery: Streams & Firehose
● Key concepts for Firehose:
■ can scale to gigabytes of streaming data per second
■ batching, encrypting and compressing of data
■ automatically scale to meet demand, which is in contrast to Kinesis Streams
Stream VS. Firehose
                  | Streams | Firehose
Purpose           | Real-time processing of streaming big data. "Real time", "custom". | Real-time processing of streaming big data; it builds on the existing Kinesis framework. "Zero administration", "direct", no need to write code.
Loading methods   | HTTPS, the Kinesis Producer Library, the Kinesis Client Library, the Kinesis Agent, Java SDK | HTTPS, the Kinesis Producer Library, the Kinesis Client Library, the Kinesis Agent, Java SDK
Transform methods | | Encryption, compression, Lambda
Stream VS. Firehose
           | Streams | Firehose
Target     | S3, Redshift, DynamoDB, Elasticsearch, Apache Storm, Kibana | S3, Redshift
Replay     | Default 24 hours, up to 7 days; data is replicated to 3 AZs automatically |
Monitoring | CloudWatch | CloudWatch
Scaling    | Manual | Automatic
Kinesis Streams VS. SQS
           | Kinesis Streams | SQS
Purpose    | Real-time processing of streaming big data | Message queue to store messages transmitted between distributed application components
Features   | Routing of records using a given key, ordering of records, the ability for multiple clients to read messages from the same stream concurrently | Messaging semantics so that your application can track the successful completion of work items in a queue
Scale      | Manual | Auto
Redundancy | 3 AZs by default, replay of messages up to 7 days |
Kinesis Analytics : in-flight analytics.
● process streaming data in real time with standard SQL
● Amazon Kinesis Analytics enables you to create and run SQL queries on streaming data
● Easy 3 steps
1. Configure input stream (Kinesis stream, Kinesis Firehose)
a. Schema is created automatically
b. Manually change the schema if you like
2. Write SQL query
3. Configure output stream: S3, Redshift, Elasticsearch
● Elastic: scales up and down
● Managed service
● Standard SQL
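In-flight analytics of this kind usually means aggregating over time windows rather than over the whole data set. Kinesis Analytics expresses that in SQL; the sketch below shows the same idea, a tumbling (fixed, non-overlapping) window count, in plain Python. It is a conceptual stand-in, not Kinesis Analytics syntax, and the function name and sample vote events are hypothetical.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key within fixed, non-overlapping time windows."""
    windows = defaultdict(Counter)
    for timestamp, key in events:
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical vote events: (seconds since start, candidate).
votes = [(0, "A"), (12, "B"), (59, "A"), (60, "A"), (75, "B"), (130, "B")]
per_minute = tumbling_window_counts(votes, 60)
# → {0: {"A": 2, "B": 1}, 60: {"A": 1, "B": 1}, 120: {"B": 1}}
```

A real streaming engine would emit each window as soon as it closes instead of waiting for the input to end.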
Kinesis Analytics : in-flight analytics.
Stay in touch...
● Omid Vahdaty
● +972-54-2384178
● https://amazon-aws-big-data-demystified.ninja/
● https://www.meetup.com/AWS-Big-Data-Demystified/
● https://www.facebook.com/groups/amazon.aws.big.data.demystified/
AWS Big Data Demystified #1: Big data architecture lessons learned
Omid Vahdaty
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystified
Omid Vahdaty
 
Emr zeppelin & Livy demystified
Emr zeppelin & Livy demystifiedEmr zeppelin & Livy demystified
Emr zeppelin & Livy demystified
Omid Vahdaty
 
Zeppelin and spark sql demystified
Zeppelin and spark sql demystifiedZeppelin and spark sql demystified
Zeppelin and spark sql demystified
Omid Vahdaty
 
Aws s3 security
Aws s3 securityAws s3 security
Aws s3 security
Omid Vahdaty
 
Introduction to aws dynamo db
Introduction to aws dynamo dbIntroduction to aws dynamo db
Introduction to aws dynamo db
Omid Vahdaty
 
Hive vs. Impala
Hive vs. ImpalaHive vs. Impala
Hive vs. Impala
Omid Vahdaty
 
Introduction to ETL process
Introduction to ETL process Introduction to ETL process
Introduction to ETL process
Omid Vahdaty
 
Cloud Architecture best practices
Cloud Architecture best practicesCloud Architecture best practices
Cloud Architecture best practices
Omid Vahdaty
 
Multi Cloud Challanges Review
Multi Cloud Challanges ReviewMulti Cloud Challanges Review
Multi Cloud Challanges Review
Omid Vahdaty
 
Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup
Omid Vahdaty
 
Couchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data DemystifiedCouchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data Demystified
Omid Vahdaty
 
Machine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data DemystifiedMachine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data Demystified
Omid Vahdaty
 
Machine Learning Essentials Demystified part1 | Big Data Demystified
Machine Learning Essentials Demystified part1 | Big Data DemystifiedMachine Learning Essentials Demystified part1 | Big Data Demystified
Machine Learning Essentials Demystified part1 | Big Data Demystified
Omid Vahdaty
 
The technology of fake news between a new front and a new frontier | Big Dat...
The technology of fake news  between a new front and a new frontier | Big Dat...The technology of fake news  between a new front and a new frontier | Big Dat...
The technology of fake news between a new front and a new frontier | Big Dat...
Omid Vahdaty
 
Making your analytics talk business | Big Data Demystified
Making your analytics talk business | Big Data DemystifiedMaking your analytics talk business | Big Data Demystified
Making your analytics talk business | Big Data Demystified
Omid Vahdaty
 
BI STRATEGY FROM A BIRD'S EYE VIEW (How to become a trusted advisor) | Omri H...
BI STRATEGY FROM A BIRD'S EYE VIEW (How to become a trusted advisor) | Omri H...BI STRATEGY FROM A BIRD'S EYE VIEW (How to become a trusted advisor) | Omri H...
BI STRATEGY FROM A BIRD'S EYE VIEW (How to become a trusted advisor) | Omri H...
Omid Vahdaty
 
AI and Big Data in Health Sector Opportunities and challenges | Big Data Demy...
AI and Big Data in Health Sector Opportunities and challenges | Big Data Demy...AI and Big Data in Health Sector Opportunities and challenges | Big Data Demy...
AI and Big Data in Health Sector Opportunities and challenges | Big Data Demy...
Omid Vahdaty
 
Aerospike meetup july 2019 | Big Data Demystified
Aerospike meetup july 2019 | Big Data DemystifiedAerospike meetup july 2019 | Big Data Demystified
Aerospike meetup july 2019 | Big Data Demystified
Omid Vahdaty
 
ALIGNING YOUR BI OPERATIONS WITH YOUR CUSTOMERS' UNSPOKEN NEEDS, by Eyal Stei...
ALIGNING YOUR BI OPERATIONS WITH YOUR CUSTOMERS' UNSPOKEN NEEDS, by Eyal Stei...ALIGNING YOUR BI OPERATIONS WITH YOUR CUSTOMERS' UNSPOKEN NEEDS, by Eyal Stei...
ALIGNING YOUR BI OPERATIONS WITH YOUR CUSTOMERS' UNSPOKEN NEEDS, by Eyal Stei...
Omid Vahdaty
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
Omid Vahdaty
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystified
Omid Vahdaty
 
Emr zeppelin & Livy demystified
Emr zeppelin & Livy demystifiedEmr zeppelin & Livy demystified
Emr zeppelin & Livy demystified
Omid Vahdaty
 
Zeppelin and spark sql demystified
Zeppelin and spark sql demystifiedZeppelin and spark sql demystified
Zeppelin and spark sql demystified
Omid Vahdaty
 
Introduction to aws dynamo db
Introduction to aws dynamo dbIntroduction to aws dynamo db
Introduction to aws dynamo db
Omid Vahdaty
 
Introduction to ETL process
Introduction to ETL process Introduction to ETL process
Introduction to ETL process
Omid Vahdaty
 
Cloud Architecture best practices
Cloud Architecture best practicesCloud Architecture best practices
Cloud Architecture best practices
Omid Vahdaty
 
Multi Cloud Challanges Review
Multi Cloud Challanges ReviewMulti Cloud Challanges Review
Multi Cloud Challanges Review
Omid Vahdaty
 
Ad

Recently uploaded (20)

Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Journal of Soft Computing in Civil Engineering
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptxExplainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
MahaveerVPandit
 
Mathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdfMathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdf
TalhaShahid49
 
Machine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptxMachine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptx
rajeswari89780
 
Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...
Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...
Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...
Journal of Soft Computing in Civil Engineering
 
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E..."Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
Infopitaara
 
International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)
samueljackson3773
 
Artificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptxArtificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptx
aditichinar
 
AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)
Vəhid Gəruslu
 
Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...
Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...
Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...
Journal of Soft Computing in Civil Engineering
 
Level 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical SafetyLevel 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical Safety
JoseAlbertoCariasDel
 
Data Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptxData Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptx
RushaliDeshmukh2
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
DSP and MV the Color image processing.ppt
DSP and MV the  Color image processing.pptDSP and MV the  Color image processing.ppt
DSP and MV the Color image processing.ppt
HafizAhamed8
 
Introduction to Zoomlion Earthmoving.pptx
Introduction to Zoomlion Earthmoving.pptxIntroduction to Zoomlion Earthmoving.pptx
Introduction to Zoomlion Earthmoving.pptx
AS1920
 
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdfRICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
MohamedAbdelkader115
 
Data Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptxData Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptx
RushaliDeshmukh2
 
IntroSlides-April-BuildWithAI-VertexAI.pdf
IntroSlides-April-BuildWithAI-VertexAI.pdfIntroSlides-April-BuildWithAI-VertexAI.pdf
IntroSlides-April-BuildWithAI-VertexAI.pdf
Luiz Carneiro
 
The Gaussian Process Modeling Module in UQLab
The Gaussian Process Modeling Module in UQLabThe Gaussian Process Modeling Module in UQLab
The Gaussian Process Modeling Module in UQLab
Journal of Soft Computing in Civil Engineering
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptxExplainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
MahaveerVPandit
 
Mathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdfMathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdf
TalhaShahid49
 
Machine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptxMachine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptx
rajeswari89780
 
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E..."Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
Infopitaara
 
International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)
samueljackson3773
 
Artificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptxArtificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptx
aditichinar
 
AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)
Vəhid Gəruslu
 
Level 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical SafetyLevel 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical Safety
JoseAlbertoCariasDel
 
Data Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptxData Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptx
RushaliDeshmukh2
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
DSP and MV the Color image processing.ppt
DSP and MV the  Color image processing.pptDSP and MV the  Color image processing.ppt
DSP and MV the Color image processing.ppt
HafizAhamed8
 
Introduction to Zoomlion Earthmoving.pptx
Introduction to Zoomlion Earthmoving.pptxIntroduction to Zoomlion Earthmoving.pptx
Introduction to Zoomlion Earthmoving.pptx
AS1920
 
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdfRICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
MohamedAbdelkader115
 
Data Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptxData Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptx
RushaliDeshmukh2
 
IntroSlides-April-BuildWithAI-VertexAI.pdf
IntroSlides-April-BuildWithAI-VertexAI.pdfIntroSlides-April-BuildWithAI-VertexAI.pdf
IntroSlides-April-BuildWithAI-VertexAI.pdf
Luiz Carneiro
 

Amazon AWS Big Data Demystified | Introduction to streaming and messaging: Flume, Kafka, SQS, Kinesis

  • 1. Introduction to Streaming & Messaging AWS Big Data Demystified Flume, Kafka, SQS, Kinesis Streams & Firehose & Analytics Omid Vahdaty, Big Data Ninja
  • 2. What is batch processing? The execution of a series of programs, each on a set or "batch" of inputs, rather than on a single input (which would instead be a custom job).
  • 3. What is streaming? Streaming data is data that is generated continuously by thousands of data sources, which typically send the data records simultaneously and in small sizes (on the order of kilobytes).
  • 4. Streaming VS. Batch Processing ● Data scope: Batch - query the entire batch, with slight delay; Stream - query the most recent events defined in a time window. ● Data size: Batch - large data sets; Stream - a few individual records. ● Latency: Batch - minutes, hours; Stream - seconds, milliseconds. ● Analysis: Batch - complex analytics; Stream - basic: aggregations, metrics etc.
  • 5. Streaming vs messaging ● Stream - when you need to do complex analytics in flight, e.g. a voting application. It is about processing an infinite input stream (in contrast to batch processing, which is applied to finite inputs). ● Message - when you need to perform an operation per event, e.g. logging. ● https://stackoverflow.com/questions/41744506/difference-between-stream-processing-and-message-processing
  • 6. Challenges with Streaming Data ● Processing layer ○ Consuming data ○ Processing data ○ Notifying storage layer what to do. ● Storage layer ○ Ordering mechanism ○ Strong Consistency mechanism ● In general MUST have features: ○ scalability ○ data durability ○ fault tolerance
  • 7. Messaging VS Streaming? ● Messaging: framed, message-based protocol. ● E.g. 3 messages sent will look like: ○ Hello world ○ Hello world ○ Hello world ● Streaming: unframed, byte-stream-based protocol. ● E.g. 3 messages sent will look like: ○ Hell ○ o wo ○ rldHel ○ lo wor ○ ldHello wo ○ rld
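The framed versus unframed distinction on this slide can be made concrete with a small sketch. It assumes a 4-byte big-endian length prefix as the framing scheme, which is one common choice and not something the slides specify; the function names `frame` and `deframe` are hypothetical.

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload

def deframe(buffer: bytearray) -> list:
    """Pull every complete message out of buffer, leaving partial data in place."""
    messages = []
    while len(buffer) >= 4:
        (length,) = struct.unpack(">I", bytes(buffer[:4]))
        if len(buffer) < 4 + length:
            break  # the rest of this message has not arrived yet
        messages.append(bytes(buffer[4:4 + length]))
        del buffer[:4 + length]
    return messages

# Three messages written back to back onto one byte stream.
wire = b"".join(frame(b"Hello world") for _ in range(3))

# The transport hands over arbitrary 7-byte chunks that ignore message boundaries.
received = []
pending = bytearray()
for start in range(0, len(wire), 7):
    pending.extend(wire[start:start + 7])
    received.extend(deframe(pending))

print(received)
```

The receiver only sees arbitrary chunks, yet the length prefix lets it recover the three original messages. A messaging system does this framing for you; a raw byte stream leaves it to the application.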
  • 9. Flume Flume Pros: ● Good documentation with many existing implementation patterns to follow ● Easy integration with existing monitoring framework ● Integration with Cloudera Manager to monitor Flume
  • 10. Flume Flume Cons: ● Event-centric rather than stream-centric ● Calculating capacity is not an exact science; it has to be confirmed through trials ● Throughput depends on the channel backing store. ● Flume lacks clear scaling and resiliency configurations (trivial with Kafka and Kinesis)
  • 11. Kafka Kafka Pros: ● High achievable ingest rates with clear scaling pattern ● High resiliency via distributed replicas with little impact on throughput Kafka Cons: ● No current framework for monitoring and configuring producers
  • 12. Flume VS. Kafka ● Choose when: Flume - no need for customization; you need out-of-the-box components such as an HDFS sink. Kafka - you need a custom-made, high-availability delivery system. ● Velocity: Flume - high; Kafka - higher. ● Event processing
  • 13. Flume VS. Kafka ● Original motivation: Flume - a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store; built around the Hadoop ecosystem. Kafka - a general-purpose distributed publish-subscribe messaging system; a multi-consumer, ultra-high-availability messaging system. ● Data flow: Flume - push; Kafka - pull. ● Event availability: Flume - JDBC database channel, file channel; losing a Flume agent means losing data. Kafka - replication of your event data by design. ● Commercial support: Flume - Cloudera; Kafka - Cloudera. ● Collectors built in: Flume - yes; Kafka - just the messaging.
  • 14. Use Case: Kafka and Flume combined ● Flume supports: Kafka source, Kafka channel, Kafka sink ● So take advantage of both and combine them to suit your needs.
  • 15. Use Case: Kafka as a Channel
  • 16. AWS SQS ● a fast, reliable, scalable, fully managed message queuing service ● decouple the components of a cloud application, move data between diverse, distributed application components without losing messages and without requiring each component to be always available. ● high throughput and at-least-once processing, and FIFO queues ● all messages are stored redundantly across multiple servers and data centers. ● Start with three API calls : SendMessage, ReceiveMessage, and DeleteMessage. Additional APIs are available to provide advanced functionality. ● Queues ○ Standard queues offer maximum throughput, best-effort ordering, and at-least-once delivery. ○ FIFO queues are designed to ensure strict ordering and exactly-once processing, with limited throughput. ● scales dynamically ● Authentication mechanisms
  • 17. AWS SQS use cases ● Messaging semantics (such as message-level ack/fail) and visibility timeout. For example, you have a queue of work items and want to track the successful completion of each item independently. Amazon SQS tracks the ack/fail, so the application does not have to maintain a persistent checkpoint/cursor. Amazon SQS will delete acked messages and redeliver failed messages after a configured visibility timeout. ● Individual message delay. For example, you have a job queue and need to schedule individual jobs with a delay. With Amazon SQS, you can configure individual messages to have a delay of up to 15 minutes. ● Dynamically increasing concurrency/throughput at read time. For example, you have a work queue and want to add more readers until the backlog is cleared. With Amazon Kinesis, by contrast, you can scale up to a sufficient number of shards, but you need to provision enough shards ahead of time. ● Leveraging Amazon SQS's ability to scale transparently. For example, you buffer requests and the load changes as a result of occasional load spikes or the natural growth of your business. Because each buffered request can be processed independently, Amazon SQS can scale transparently to handle the load.
  • 18. Typical SQS use case: decoupling app layers.
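The ack/fail and visibility-timeout behaviour described above can be illustrated with a toy in-memory model. This is not the SQS API: `ToyQueue` and its methods are hypothetical, time is passed in explicitly as `now` so the example is deterministic, and real SQS adds receipt handles, retention and much more.

```python
from typing import Optional, Tuple

class ToyQueue:
    """Toy model of a queue with a visibility timeout (not the real SQS API)."""

    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self._messages = {}         # message id -> body
        self._invisible_until = {}  # message id -> time it becomes visible again
        self._next_id = 0

    def send(self, body: str) -> int:
        message_id = self._next_id
        self._next_id += 1
        self._messages[message_id] = body
        self._invisible_until[message_id] = 0.0
        return message_id

    def receive(self, now: float) -> Optional[Tuple[int, str]]:
        """Return one visible message and hide it for the visibility timeout."""
        for message_id, body in self._messages.items():
            if self._invisible_until[message_id] <= now:
                self._invisible_until[message_id] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id: int) -> None:
        """Acknowledge a message: it will never be delivered again."""
        self._messages.pop(message_id, None)
        self._invisible_until.pop(message_id, None)

queue = ToyQueue(visibility_timeout=30)
queue.send("job-1")

first = queue.receive(now=0)        # a worker picks up the job
hidden = queue.receive(now=10)      # still in flight, so nothing is visible
redelivered = queue.receive(now=30) # the worker never acked: the job reappears
queue.delete(redelivered[0])        # this time the worker acks
empty = queue.receive(now=100)      # acked messages are gone for good
```

The consumer keeps no checkpoint of its own. A message that is received but never deleted simply becomes visible again once the timeout expires, which is what gives at-least-once processing.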
  • 20. AWS Kinesis (streams) ● Build custom applications that process or analyze streams ● Continuously capture and store terabytes of data per hour ● Hundreds of sources ● Allows for real-time data processing ● Easy to use, get started in minutes ○ Kinesis Client Library ○ Kinesis Producer Library ● Allows multiple applications to process the same stream concurrently. ● Throughput can scale from megabytes to terabytes per hour ● Synchronously replicates your streaming data across three AZs ● Preserves your data for up to 7 days
  • 21. AWS Kinesis (streams) use cases ● Log and Event collection ● Mobile Data collection ● Real Time Analytics ○ when loading data from transactional databases into data warehouses. ○ Multi-stage processing using specialized algorithms ○ stream partitioning for finer control over scaling ● Gaming Data feed
  • 22. AWS Kinesis (streams) use cases ● Routing related records to the same record processor (as in streaming MapReduce). For example, counting and aggregation are simpler when all records for a given key are routed to the same record processor. ● Ordering of records. For example, you want to transfer log data from the application host to the processing/archival host while maintaining the order of log statements. ● Ability for multiple applications to consume the same stream concurrently. For example, you have one application that updates a real-time dashboard and another that archives data to Amazon Redshift. You want both applications to consume data from the same stream concurrently and independently. ● Ability to consume records in the same order a few hours later. For example, you have a billing application and an audit application that runs a few hours behind the billing application. Because Amazon Kinesis stores data for 24 hours by default (up to 7 days), you can run the audit application up to 24 hours behind the billing application with the default retention.
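Routing all records for a key to the same record processor relies on hashing the partition key. Kinesis maps partition keys to 128-bit integers with MD5 and assigns each record to the shard whose hash key range contains that value. The sketch below assumes the shards split the hash space into equal ranges, which need not hold after shards have been split or merged; `shard_for` is a hypothetical helper, not an SDK call.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide

def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming equally sized hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE

# Every record carrying the same key lands on the same shard, so per-key
# counting and per-key ordering can be handled by a single record processor.
for key in ["user-1", "user-2", "user-1", "user-3", "user-1"]:
    print(key, "-> shard", shard_for(key, shard_count=4))
```

Ordering is therefore only guaranteed within a shard, which is why the choice of partition key decides which records stay in order relative to each other.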
  • 23. AWS Kinesis (streams) Kinesis Pros: ● High achievable ingest rates with clear scaling pattern ● Similar throughput and resiliency characteristics to Kafka ● Integrates with other AWS services like EMR and Data Pipeline. Kinesis Cons: ● No current framework for monitoring and configuring producers ● Cloud service only. Possible increase in latency from source to Kinesis.
  • 25. AWS Kinesis Firehose ● The easiest way to load streaming data into AWS. ● Capture, transform, and load streaming data ○ Integrates with Kinesis Analytics, S3, Redshift, Elasticsearch Service ○ Serverless transformation of raw data (Lambda function) ■ E.g. transform a log file into CSV format ● When source record backup is enabled, Firehose can back up all untransformed records to your S3 bucket concurrently while delivering transformed records to the destination. ● Enables near real-time analytics ● Easy to use. ● Monitoring options. ● Limits ○ 20 streams per region ○ Each stream ■ 2,000 transactions per second ■ 5,000 records per second ■ 5 MB/s ■ Supports 24-hour replay in case of downtime
  • 26. Kinesis Firehose agent ● Java software application that sends data to Streams/Firehose ● Monitors a set of files for new data and then sends it to Streams/Firehose ● It handles file rotation, checkpointing, and retries upon failure. ● Supports Amazon CloudWatch so that you can closely monitor and troubleshoot the data flow from the agent. ● Data processing options: ○ SINGLELINE – This option converts a multi-line record to a single line record by removing newline characters, and leading and trailing spaces. ○ CSVTOJSON – This option converts a record from delimiter separated format to JSON format. ○ LOGTOJSON – This option converts a record from several commonly used log formats to JSON format. Currently supported log formats are Apache Common Log, Apache Combined Log, Apache Error Log, and RFC3164 (syslog). ● https://github.com/awslabs/amazon-kinesis-agent ● Amazon Kinesis Firehose will only output to Amazon S3 buckets and Amazon Redshift clusters in the same region.
  • 27. Write a Java agent for Firehose ● AWS Java SDK ● Firehose API ○ Single record: PutRecord ○ Batch: PutRecordBatch ● Key concepts: ○ Firehose delivery stream ○ Data producer - e.g. a web server producing logs. ○ Record: the data of interest that your data producer sends to a Firehose delivery stream. A record can be as large as 1000 KB. ○ Buffer size (in MB) ○ Buffer interval (in seconds) ● Java examples: ○ http://docs.aws.amazon.com/firehose/latest/dev/writing-with-sdk.html
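The buffer size and buffer interval concepts listed above can be illustrated with a toy model: records accumulate until either the size threshold or the time threshold is reached, and the whole buffer is then delivered as one object. This is a sketch of the idea only. `DeliveryBuffer` is hypothetical, it counts bytes rather than megabytes, and time is passed in as `now` to keep it deterministic; it does not call the Firehose API.

```python
class DeliveryBuffer:
    """Toy model of size-or-interval buffering (not the Firehose API)."""

    def __init__(self, max_bytes: int, max_seconds: float):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.delivered = []      # one entry per delivered batch
        self._records = []
        self._size = 0
        self._opened_at = None   # arrival time of the first buffered record

    def put(self, record: bytes, now: float) -> None:
        if self._opened_at is None:
            self._opened_at = now
        self._records.append(record)
        self._size += len(record)
        self._flush_if_due(now)

    def tick(self, now: float) -> None:
        """Called periodically so a quiet stream still gets delivered."""
        self._flush_if_due(now)

    def _flush_if_due(self, now: float) -> None:
        if not self._records:
            return
        too_big = self._size >= self.max_bytes
        too_old = now - self._opened_at >= self.max_seconds
        if too_big or too_old:
            self.delivered.append(b"".join(self._records))
            self._records, self._size, self._opened_at = [], 0, None

buffer = DeliveryBuffer(max_bytes=10, max_seconds=60)
buffer.put(b"aaaa", now=0)
buffer.put(b"bbbb", now=1)
buffer.put(b"cccc", now=2)   # 12 bytes buffered: the size threshold triggers delivery
buffer.put(b"dd", now=3)
buffer.tick(now=62)          # only 59 seconds old: nothing happens yet
buffer.tick(now=63)          # 60 seconds old: the interval threshold triggers delivery
```

Larger buffers mean fewer, bigger objects at the destination at the cost of higher delivery latency, which is the trade-off the two settings control.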
  • 29. Streams VS. Firehose ● Data producers: logs, web, mobile ● Data consumers: EMR, S3, Redshift ● Stream delivery: Streams & Firehose ● Key concepts for Streams: ■ Basic unit: shard ● 1 MB/s ingress, 2 MB/s egress per shard ● 10 shards? 10x the performance. ● Default limit of 10 shards per region ● No limit to the number of shards or streams in an account. ● Partition keys are used to identify different shards in a stream ● Sequence numbers are unique identifiers for records inserted into a shard. They increase monotonically, and are specific to individual shards. ● Streaming data is replicated by Kinesis across three separate availability zones ● Data is available in a stream for 24 hours ■ Streams API to control scale, up to TBs per hour ■ Monitoring is available through Amazon CloudWatch.
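Because capacity scales linearly with the shard count, sizing a stream is simple arithmetic. The sketch below uses the per-shard figures from the slide (1 MB/s in, 2 MB/s out) and ignores the separate per-shard limit on records per second; `shards_needed` is a hypothetical helper.

```python
import math

INGRESS_MB_PER_SHARD = 1.0  # write capacity per shard, MB/s (from the slide)
EGRESS_MB_PER_SHARD = 2.0   # read capacity per shard, MB/s (from the slide)

def shards_needed(ingress_mb_per_s: float, egress_mb_per_s: float) -> int:
    """Smallest shard count that covers both the write and the read throughput."""
    by_ingress = math.ceil(ingress_mb_per_s / INGRESS_MB_PER_SHARD)
    by_egress = math.ceil(egress_mb_per_s / EGRESS_MB_PER_SHARD)
    return max(1, by_ingress, by_egress)

# Producers write 3 MB/s and four consumers each read the full stream,
# so the stream must serve 12 MB/s of reads: egress is the bottleneck.
print(shards_needed(ingress_mb_per_s=3, egress_mb_per_s=12))  # prints 6
```

When several applications read the same stream, total egress grows with the number of consumers, so the read side often determines the shard count.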
  • 30. Streams VS. Firehose ● Data producers: logs , web, mobile ● Data consumers: EMR , S3, redshift ● Stream delivery : streams & firehose ● Key concepts for Firehose: ■ can scale to gigabytes of streaming data per second ■ batching, encrypting and compressing of data ■ automatically scale to meet demand, which is in contrast to Kinesis Streams
  • 31. Stream VS. Firehose ● Purpose: Streams - real-time processing of streaming big data; "real time", "custom". Firehose - real-time processing of streaming big data; it builds on the existing Kinesis framework; "zero administration", "direct"; no need to write code. ● Loading methods (both): HTTPS, the Kinesis Producer Library, the Kinesis Client Library, the Kinesis Agent, and the Java SDK. ● Transform methods: encryption, compression, Lambda
  • 32. Stream VS. Firehose ● Target: Streams - S3, Redshift, DynamoDB, Elasticsearch, Apache Storm, Kibana; Firehose - S3, Redshift. ● Replay: default 24 hours, up to 7 days; data is replicated to 3 AZs automatically. ● Monitoring: Streams - CloudWatch; Firehose - CloudWatch. ● Scaling: Streams - manual; Firehose - automatic.
  • 33. Kinesis Streams VS. SQS ● Purpose: Kinesis Streams - real-time processing of streaming big data; SQS - a message queue to store messages transmitted between distributed application components. ● Features: Kinesis Streams - routing of records using a given key, ordering of records, and the ability for multiple clients to read messages from the same stream concurrently; SQS - messaging semantics so that your application can track the successful completion of work items in a queue. ● Scale: Kinesis Streams - manual; SQS - auto. ● Redundancy: 3 AZs by default, replay of messages up to 7 days
  • 34. Kinesis Analytics: in-flight analytics. ● Process streaming data in real time with standard SQL ● Amazon Kinesis Analytics enables you to create and run SQL queries on streaming data ● 3 easy steps 1. Configure the input stream (Kinesis stream, Kinesis Firehose) a. Schema is created automatically b. Change the schema manually if you like 2. Write the SQL query 3. Configure the output stream: S3, Redshift, Elasticsearch ● Elastic: scales up and down ● Managed service ● Standard SQL
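In-flight analytics usually means aggregating over a time window instead of over a finite table. Kinesis Analytics expresses this in SQL; the sketch below shows the same idea in plain Python as a tumbling (fixed, non-overlapping) window count per key. It is an analogy for what such a query computes, not Kinesis Analytics syntax, and `tumbling_counts` is a hypothetical helper.

```python
from collections import Counter

def tumbling_counts(events, window_seconds: int) -> dict:
    """Count events per (window start, key) over fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp, key in events:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

# (timestamp in seconds, key) pairs, e.g. votes for candidates "a" and "b".
events = [(0, "a"), (5, "b"), (9, "a"), (10, "a"), (19, "b"), (21, "a")]
result = tumbling_counts(events, window_seconds=10)
```

A real streaming engine emits each window's result as soon as the window closes rather than after reading all input, but the grouping logic is the same.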
  • 35. Kinesis Analytics : in-flight analytics.
  • 36. Stay in touch... ● Omid Vahdaty ● +972-54-2384178 ● https://amazon-aws-big-data-demystified.ninja/ ● https://www.meetup.com/AWS-Big-Data-Demystified/ ● https://www.facebook.com/groups/amazon.aws.big.data.demystified/