© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Radhika Ravirala, Solutions Architect, AWS
August 17, 2017
Serverless Big Data Architectures
Serverless Data Analytics
Agenda
Cloud Architecture Evolution – Why Serverless
Data and Analytics Flow
Key Services Overview
Design Patterns
Call to Action
Cloud Architecture Evolution
Virtualized: Virtualized Servers
Managed: Managed Platforms
Serverless: Serverless Analytics
Serverless characteristics
• No servers to provision or manage
• Scales with usage
• Never pay for idle
• Availability and fault tolerance built in
Data and Analytics Flow
• Ingest/Collect
• Store
• Analyze/Process
• Visualization/Consume
• Orchestrate/Transform
What Is the Temperature of Your Data / Access?
AWS Big Data Services
Ingest/Collect: Bulk Transport, File/Object Upload, Streaming Ingest, Commits
Store: Transactional, NoSQL, Data Lake, Streaming Storage
Analyze/Process: Batch Analytics, Interactive Querying, Machine Learning/Deep Learning, Realtime Analytics, …
Visualization/Consume: B.I. Tools, Data Science Notebooks, Dashboards
Orchestration/Transform: Batch ETL/ELT, Realtime ETL/ELT, Transactional/CDC
AWS Big Data Services, by deployment model
The same capability map as the previous slide, with each capability marked as Serverless, Managed, or Virtualized.
AWS Big Data Services
Stages: Ingest/Collect, Store, Analyze/Process, Visualization/Consume, Orchestration/Transform
Services shown: AWS Snowball, ISV Connectors, Kinesis Firehose, S3 Transfer Acceleration, Kinesis Streams, AWS DMS (CDC), S3, DynamoDB, Redshift, Aurora, Amazon Elasticsearch, EMR, EC2, AWS Lambda, Kinesis Analytics, Amazon Athena, Amazon QuickSight, AWS Glue, AWS Step Functions
Legend: marked services are serverless.
Key Services Overview
Big Data Storage for Virtually All AWS Services
Amazon S3
• Store anything
• Object storage
• Scalable
• 99.999999999% durability
• Extremely low cost
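To put the durability figure in perspective, here is a small worked calculation. It is only a sketch: it assumes the eleven-nines figure is read as an average annual, per-object durability, and the function names and the object count are illustrative rather than anything taken from the deck.

```python
from fractions import Fraction

# Durability quoted on the slide: 99.999999999% (eleven nines).
# Assumption: this is read as an average annual, per-object figure.
ANNUAL_LOSS_RATE = Fraction(1, 10**11)  # 1 - 0.99999999999, kept exact


def expected_annual_object_loss(object_count: int) -> Fraction:
    """Expected number of objects lost per year at the quoted durability."""
    return object_count * ANNUAL_LOSS_RATE


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_object_loss(object_count)


if __name__ == "__main__":
    stored = 10_000_000  # hypothetical number of stored objects
    print(expected_annual_object_loss(stored))  # 1/10000
    print(years_per_lost_object(stored))        # 10000
```

Under these assumptions, ten million stored objects correspond to an expected loss of one object roughly every ten thousand years. Exact fractions are used so the result is not blurred by floating-point rounding.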
Amazon DynamoDB
Fast & Flexible NoSQL Database Service
• NoSQL Database
• Seamless scalability
• Zero admin
• Single digit millisecond latency
Amazon Kinesis
Real-time Streaming Platform
• Streams, Firehose, Analytics
• Real-time processing
• High throughput; elastic
• Easy to use
• Integration with S3, EMR, Redshift, DynamoDB
Amazon Kinesis: Streaming Data Made Easy
Services make it easy to capture, deliver and process streams on AWS
Amazon Kinesis Streams
• For technical developers
• Build your own custom applications that process or analyze streaming data
Amazon Kinesis Firehose
• For all developers, data scientists
• Easily load massive volumes of streaming data into S3, Amazon Redshift and Amazon Elasticsearch
Amazon Kinesis Analytics
• For all developers, data scientists
• Easily analyze data streams using standard SQL queries
AWS Lambda
Serverless Compute
• Run your code in the cloud: fully managed and highly available
• Triggered through API or state changes in your setup
• Scales automatically to match the incoming event rate
• Node.js (JavaScript), Python, Java, and C#
• Charged per 100ms execution time
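The per-100ms charging model can be illustrated with a short sketch. It assumes the billed duration is the execution time rounded up to the next 100 ms increment, as stated on the slide; the helper name is hypothetical and no prices are modeled.

```python
import math

BILLING_INCREMENT_MS = 100  # per the slide: charged per 100 ms of execution


def billed_duration_ms(actual_ms: float) -> int:
    """Round an execution time up to the next 100 ms billing increment."""
    if actual_ms < 0:
        raise ValueError("duration cannot be negative")
    return math.ceil(actual_ms / BILLING_INCREMENT_MS) * BILLING_INCREMENT_MS


if __name__ == "__main__":
    for elapsed in (42.5, 100, 101):
        print(elapsed, "->", billed_duration_ms(elapsed))
```

The practical consequence is that a function finishing in 101 ms is billed the same as one finishing in 200 ms, so shaving a few milliseconds only matters when it crosses an increment boundary.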
Amazon Athena
Interactive Query Service
• Query directly from Amazon S3
• Use ANSI SQL
• Serverless
• Multiple Data Formats
• Pay per query
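As a rough sketch of what "query directly from Amazon S3" looks like from code, the snippet below builds the arguments for Athena's StartQueryExecution API without sending them. The table, database, and bucket names are hypothetical, and the helper `build_athena_request` is an illustration, not part of any AWS SDK.

```python
from typing import Any, Dict


def build_athena_request(sql: str, database: str, output_s3_uri: str) -> Dict[str, Any]:
    """Build keyword arguments for the Athena StartQueryExecution API."""
    if not output_s3_uri.startswith("s3://"):
        raise ValueError("Athena writes results to an S3 location")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }


# Hypothetical table, database, and bucket names, for illustration only.
request = build_athena_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="weblogs",
    output_s3_uri="s3://example-athena-results/queries/",
)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
```

Keeping the request construction separate from the network call makes the query easy to inspect and test before anything is billed.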
AWS Glue (New!)
Fully Managed ETL Service
• Catalog data sources
• Identify data formats & data types
• Error handling
• Manage and scale resources
• Generate ETL code
• Schedule and execute ETL jobs
AWS Glue: services
Data Catalog
• Hive metastore compatible metadata repository of data sources.
• Crawls data source to infer table, data type, partition format.
Job Execution
• Runs jobs in Spark containers, with automatic scaling based on SLA.
• Glue is serverless: only pay for the resources you consume.
Job Authoring
• Generates Python code to move data from source to destination.
• Edit with your favorite IDE; share code snippets using Git.
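To give a feel for what inferring a partition format can involve, here is a minimal sketch that reads Hive-style `key=value` segments out of S3 object keys. It is not Glue's crawler logic; the function names and the sample keys are assumptions made for illustration.

```python
from typing import Dict, List


def parse_hive_partitions(object_key: str) -> Dict[str, str]:
    """Extract key=value partition segments from an S3 object key."""
    partitions: Dict[str, str] = {}
    # Every path segment except the final file name may carry a partition.
    for segment in object_key.split("/")[:-1]:
        name, sep, value = segment.partition("=")
        if sep and name and value:
            partitions[name] = value
    return partitions


def infer_partition_columns(object_keys: List[str]) -> List[str]:
    """Return partition column names shared by every key, in path order."""
    if not object_keys:
        return []
    parsed = [parse_hive_partitions(key) for key in object_keys]
    return [col for col in parsed[0] if all(col in p for p in parsed)]


if __name__ == "__main__":
    keys = [  # hypothetical object keys
        "logs/year=2017/month=08/part-0000.json",
        "logs/year=2017/month=09/part-0000.json",
    ]
    print(infer_partition_columns(keys))
```

A catalog that records these columns lets query engines skip whole prefixes when a filter names a partition value.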
Amazon QuickSight
Business Intelligence
• Fast and cloud-powered
• Easy to use, no infrastructure to manage
• Scales to 100s of thousands of users
• Quick calculations with SPICE
• 1/10th the cost of legacy BI software
Serverless Design Patterns
Real-time Analytics
Stages: Ingest/Collect, Store, Analyze/Process, Visualization/Consume
Components shown: Producer, Apache Kafka, Kinesis Streams, DynamoDB Streams, SQS, KCL, AWS Lambda, Spark Streaming, Apache Storm, Apache Flink, Kinesis Analytics, Amazon SNS (Notifications), Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon ES
Output labels: Alert, Analytics, Output KPI
Legend: Serverless, Managed, Virtualized
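One serverless path through this pattern is Kinesis Streams feeding AWS Lambda, which raises alerts. The sketch below shows a Python handler for that path. The base64-encoded `kinesis.data` field follows the Kinesis event format Lambda delivers; the JSON payload shape, the `temperature` field, and the threshold are hypothetical.

```python
import base64
import json
from typing import Any, Dict, List

TEMPERATURE_LIMIT = 80.0  # hypothetical alert threshold


def handler(event: Dict[str, Any], context: Any) -> List[Dict[str, Any]]:
    """Decode Kinesis records and return the readings that should raise an alert."""
    alerts = []
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded under kinesis.data.
        payload = base64.b64decode(record["kinesis"]["data"])
        reading = json.loads(payload)
        if reading.get("temperature", 0.0) > TEMPERATURE_LIMIT:
            alerts.append(reading)
    # A real function might publish these to Amazon SNS; here they are returned.
    return alerts


def sample_event(readings: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a minimal Kinesis-style event for local testing (illustrative)."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(json.dumps(r).encode()).decode()}}
            for r in readings
        ]
    }


if __name__ == "__main__":
    event = sample_event([
        {"sensor": "a", "temperature": 91.5},
        {"sensor": "b", "temperature": 20.0},
    ])
    print(handler(event, None))
```

Returning the alerts instead of publishing them keeps the sketch runnable without AWS credentials; in a deployed function the return value would typically be replaced by a call to a notification service.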
Interactive Queries
Stages: Ingest/Collect, Store, Analyze/Process, Visualization/Consume
Components shown: Producer, Amazon S3, Amazon Redshift, Amazon EMR (Presto, Impala, Spark), Amazon Athena, QuickSight
Label: Interactive
Legend: Serverless, Managed, Virtualized
Data Lake Reference Architecture
• Data Ingestion: get your data into S3 quickly and securely (Snowball, Database Migration Service, Kinesis Firehose, Direct Connect)
• Central Storage: secure, cost-effective storage in Amazon S3 (S3)
• Catalog & Search: access and search metadata (DynamoDB, Elasticsearch)
• Access & User Interface: give your users easy and secure access (API Gateway, Identity & Access Management, Cognito)
• Processing & Analytics: use of predictive and prescriptive analytics to gain better understanding (QuickSight, Amazon AI, EMR, Redshift, Athena, Kinesis, RDS)
• Protect and Secure: use entitlements to ensure data is secure and users’ identities are verified (Security Token Service, CloudWatch, CloudTrail, Key Management Service)
Legend: marked services are serverless.
Data Lake and Real-time Analytics
Components shown:
• Data Sources; Transactional Data
• Amazon Kinesis Streams & Firehose
• Amazon S3 (Data Lake)
• AWS Glue (Clusterless ETL)
• Amazon Athena (Clusterless SQL Query)
• Amazon EMR (Hadoop / Spark)
• Amazon Redshift (Data Warehouse)
• Streaming Analytics Tools: AWS Lambda, Amazon Kinesis Analytics, Spark Streaming on EMR, Apache Storm on EMR, Apache Flink on EMR
• Serving Tier: Amazon DynamoDB (NoSQL Database), Amazon Elasticsearch Service, Amazon ElastiCache Redis, Amazon Aurora (Relational Database)
• Data Science Sandbox: Amazon Machine Learning (Predictive Analytics), Any Open Source Tool of Choice on EC2
• Visualization / Reporting
Serverless ETL
Stages: Store, Transform, Store, Analyze/Process, Visualize/Consume
Components shown: Amazon S3, Apache Kafka, Kinesis Streams, DynamoDB Streams, DynamoDB, Amazon EMR (Spark, Flink), AWS Glue, AWS Lambda, ISV, Redshift, AWS Glue Data Catalog, Hive M/D
Serverless fits nicely into big data platforms
• AWS Serverless Big Data Services
• Complements existing big data flows
• Focus on the analytics and not on infrastructure or servers
• Don’t focus on the scaling, availability, and undifferentiated heavy lifting
• Pay only for what you use
• Easily try out different tools, analytics, and solutions
DEMO