© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved
Asif Abbasi – Specialist SA Analytics
03.Nov.2019
Cutting to the chase for Machine Learning
Analytics Ecosystem & AWS Lake Formation
@masifabbasi
Agenda
• Analytics Ecosystem
• Why did we build AWS Lake Formation?
• What is AWS Lake Formation?
• How does AWS Lake Formation help you?
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Analytics Value Stream
idea → insight
COLLECT → STORE → PROCESS/ANALYZE → CONSUME
time to first answer
Agile Analytics
• Experiment
• Invest in and scale up success
• Fail fast
• Adapt and evolve rapidly
A Modern Data Platform
• Treat data as a reusable asset
(while keeping the cost of reuse low)
• Apply an open set of processing
strategies in parallel
Data Lake
Store all your data, forever,
at every stage of its lifecycle
Apply it using the right tool for the job
Storage Foundations: Amazon S3
Events and Lifecycle Management
Storage classes by data temperature:
• Standard: active data
• Standard - Infrequent Access: infrequently accessed data
• Amazon Glacier: archive data
Object lifecycle events: Create, Delete
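The tiering on this slide could be expressed as an S3 lifecycle configuration. The following is a minimal sketch, assuming a hypothetical `logs/` prefix and illustrative thresholds (30, 90, and 365 days) that do not come from the slides. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts; no AWS call is made here.

```python
def build_lifecycle_configuration(prefix, ia_after_days, glacier_after_days, delete_after_days):
    """Build a lifecycle configuration for one key prefix.

    All thresholds are illustrative assumptions, not values from the slides.
    """
    if not ia_after_days < glacier_after_days < delete_after_days:
        raise ValueError("thresholds must increase: IA < Glacier < delete")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Active data stays in Standard, then moves to colder tiers.
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                # The "Delete" end of the lifecycle.
                "Expiration": {"Days": delete_after_days},
            }
        ]
    }


config = build_lifecycle_configuration("logs/", 30, 90, 365)
```

With boto3 installed and credentials configured, `config` could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration` for a bucket you own.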
Benefits of Using S3 as Your Data Lake Foundation
• Unlimited number of objects and
volume
• 99.99% availability
• 99.999999999% durability
• Versioning
• Tiered storage via lifecycle policies
• SSL, client/server-side encryption
at rest
• Low cost (~ $2700/month/100 TB)
• Natively supported by big data
frameworks (e.g. Spark, Hive,
Presto)
• Decouples storage and compute
• Run transient compute clusters
(with Amazon EC2 Spot
Instances)
• Multiple, heterogeneous
clusters can use same data
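The cost figure above can be turned into a rough per-GB rate for back-of-the-envelope estimates. This is a small arithmetic sketch, assuming 1 TB = 1024 GB and treating the slide's approximate figure as exact; actual S3 pricing varies by region, storage class, and date.

```python
GB_PER_TB = 1024  # assumption: binary terabytes


def implied_price_per_gb(monthly_cost_usd, terabytes):
    """Per-GB monthly rate implied by a total monthly cost."""
    return monthly_cost_usd / (terabytes * GB_PER_TB)


def monthly_cost(terabytes, price_per_gb):
    """Monthly storage cost for a given volume at a flat per-GB rate."""
    return terabytes * GB_PER_TB * price_per_gb


# Rate implied by the slide's "~ $2700/month/100 TB" figure.
rate = implied_price_per_gb(2700, 100)
print(f"implied rate: ${rate:.4f} per GB-month")
print(f"250 TB at that rate: ${monthly_cost(250, rate):,.2f} per month")
```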
Ingest
Kinesis Firehose
Data Catalog: Discover and Govern Your Data
Glue Data Catalog
Hive metastore-compatible, highly-available
metadata repository:
• Search metadata for data discovery
• Connection info – JDBC URLs, credentials
• Classification for identifying and parsing
files
• Versioning of table metadata as schemas
evolve and other metadata are updated
• Table definitions – usable by Redshift,
Athena, Glue, EMR
Populate using Hive DDL, bulk import, or
automatically through crawlers.
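One of the population routes named above is Hive DDL. The sketch below generates a `CREATE EXTERNAL TABLE` statement for a Parquet dataset; the database, table, column names, and S3 location are all hypothetical placeholders, and the function is an illustration rather than part of any AWS SDK.

```python
def hive_external_table_ddl(database, table, columns, partition_columns, location):
    """Render Hive DDL for an external, Parquet-backed table.

    columns and partition_columns are lists of (name, hive_type) pairs.
    """
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    ddl = f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n  {cols}\n)"
    if partition_columns:
        parts = ", ".join(f"`{name}` {dtype}" for name, dtype in partition_columns)
        ddl += f"\nPARTITIONED BY ({parts})"
    ddl += f"\nSTORED AS PARQUET\nLOCATION '{location}'"
    return ddl


# Hypothetical table definition for illustration only.
ddl = hive_external_table_ddl(
    database="sales_db",
    table="orders",
    columns=[("order_id", "bigint"), ("amount", "double")],
    partition_columns=[("order_date", "string")],
    location="s3://example-bucket/orders/",
)
print(ddl)
```

The resulting statement could be run in a Hive-compatible engine that uses the catalog as its metastore; a crawler would instead infer the same definition from the files.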
Catalog and Query Multiple Sources
Access Control and Auditing
[Diagram: IAM; service API access to Amazon S3, Amazon DynamoDB, Amazon EMR, Amazon Kinesis, and Amazon Athena; AWS CloudTrail]
Identity and Access Management
• Security at the API and data level
3rd party ecosystem security tools
• BlueTalon, Apache Ranger, Sentry, etc.
Storage-level access logging and audit
• AWS CloudTrail logs API calls
• S3 access logging
• Log analytics with Athena
Encryption
• AWS Server-Side Encryption
• AWS Key Management Service
• AWS CloudHSM
• Custom materials providers
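To make "security at the API and data level" concrete, here is a minimal sketch of an IAM policy document that grants read-only access to a single S3 prefix. The bucket name and prefix are hypothetical; the document structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`, `Condition`) is the standard IAM JSON policy format.

```python
import json


def read_only_prefix_policy(bucket, prefix):
    """Build an IAM policy allowing list and read on one S3 prefix only."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, narrowed by key prefix.
                "Sid": "ListPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Reading is an object-level action on keys under the prefix.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


policy = read_only_prefix_policy("example-data-lake", "curated/sales")
print(json.dumps(policy, indent=2))
```

Note how the permission is expressed in terms of buckets and objects; the Lake Formation slides later in the deck contrast this with table- and column-level grants.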
Process and Analyze
Kinesis Firehose
Athena
Query Service
Job Authoring with AWS Glue
• Python/Scala code
generated by AWS Glue
• Connect a notebook or
IDE to AWS Glue
• Existing code brought into
AWS Glue
Job Execution with AWS Glue
• Schedule-based
• Event-based
• On demand
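The three execution modes above map roughly onto AWS Glue trigger types. The sketch below builds request dictionaries in the shape of boto3's Glue `create_trigger` parameters, assuming that "event-based" here means starting a job when another job succeeds. Trigger and job names are hypothetical, and nothing is sent to AWS.

```python
def scheduled_trigger(name, job_name, cron_expression):
    """Schedule-based: run a job on a cron schedule."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron_expression})",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def conditional_trigger(name, job_name, upstream_job):
    """Event-based (assumed meaning): run after an upstream job succeeds."""
    return {
        "Name": name,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": upstream_job, "State": "SUCCEEDED"}
            ]
        },
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def on_demand_trigger(name, job_name):
    """On demand: the trigger fires only when started explicitly."""
    return {"Name": name, "Type": "ON_DEMAND", "Actions": [{"JobName": job_name}]}


# Hypothetical job names for illustration.
nightly = scheduled_trigger("nightly-load", "load_orders", "0 2 * * ? *")
chained = conditional_trigger("after-load", "clean_orders", "load_orders")
manual = on_demand_trigger("manual-backfill", "load_orders")
```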
Serverless Queries: Amazon Athena
• Interactive queries
• ANSI SQL
• No infrastructure or administration
• Zero spin up time
• Query data in its raw format
• Avro, text, CSV, JSON, weblogs, AWS
service logs
• Convert to an optimized form like ORC
or Parquet for the best performance
and lowest cost
• No loading of data, no ETL required
• Stream data directly from Amazon S3
• Take advantage of Amazon S3 durability
and availability
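The advice to convert raw data to ORC or Parquet can be carried out in Athena itself with a CREATE TABLE AS SELECT (CTAS) statement. The following sketch only builds the SQL text; table names and the S3 location are hypothetical placeholders.

```python
def ctas_to_parquet(source_table, target_table, external_location, partition_columns=()):
    """Render an Athena CTAS statement that rewrites a table as Parquet."""
    properties = [
        "format = 'PARQUET'",
        f"external_location = '{external_location}'",
    ]
    if partition_columns:
        quoted = ", ".join(f"'{c}'" for c in partition_columns)
        properties.append(f"partitioned_by = ARRAY[{quoted}]")
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH ({', '.join(properties)})\n"
        f"AS SELECT * FROM {source_table}"
    )


# Hypothetical names for illustration.
sql = ctas_to_parquet(
    source_table="raw_db.weblogs_csv",
    target_table="curated_db.weblogs_parquet",
    external_location="s3://example-bucket/curated/weblogs/",
    partition_columns=["year"],
)
print(sql)
```

One caveat on the design: Athena expects partition columns to come last in the SELECT list, so `SELECT *` is only suitable when the source table already orders its columns that way; otherwise list the columns explicitly.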
Data Warehousing: Amazon Redshift
Managed, massively parallel, petabyte-scale, relational data warehouse
Scale from 160GB to 2PB online
Automatic streaming backup/restore to S3,
Automatic failover and recovery
ANSI SQL interface
Load data from S3, DynamoDB and EMR
Query Exabytes of Data: Redshift Spectrum
Run Amazon Redshift SQL queries against exabytes of data in Amazon S3
[Diagram: JDBC/ODBC clients; Amazon Redshift; numbered nodes 1, 2, 3, 4, ... N; Data Catalog or Apache Hive Metastore; Amazon S3 exabyte-scale object storage]
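Querying S3 data from Redshift starts with an external schema that points at a catalog database. The sketch below renders that statement and a follow-up query as SQL text; the schema, database, table, and IAM role ARN are hypothetical placeholders, and the code does not connect to a cluster.

```python
def external_schema_sql(schema, catalog_database, iam_role_arn):
    """Render CREATE EXTERNAL SCHEMA for a data catalog database."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_database}'\n"
        f"IAM_ROLE '{iam_role_arn}'"
    )


def spectrum_query(schema, table, limit=10):
    """Render a simple query against a table in the external schema."""
    return f"SELECT * FROM {schema}.{table} LIMIT {int(limit)}"


# Hypothetical identifiers for illustration.
schema_sql = external_schema_sql(
    "spectrum", "sales_db", "arn:aws:iam::123456789012:role/ExampleSpectrumRole"
)
query_sql = spectrum_query("spectrum", "orders", limit=5)
print(schema_sql)
print(query_sql)
```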
Managed Hadoop Framework: Amazon EMR
Scalable compute clusters as a
service
• Create ephemeral clusters –
only pay for the compute
you need
• Integrated with S3 (EMRFS)
• Use Spot Instances to
reduce processing time
AND costs
• Dynamically scale up and
down
• Data encryption at-rest and
in-transit
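An ephemeral cluster with Spot capacity could be described as follows. This is a sketch of a request in the shape of boto3's EMR `run_job_flow` parameters; the cluster name, release label, instance types, and log location are illustrative assumptions, and the role names are the EMR defaults, which may not exist in every account.

```python
def transient_cluster_request(name, release_label, core_count, log_uri):
    """Describe a cluster that shuts down once its steps have finished."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "master",
                    "InstanceRole": "MASTER",
                    "Market": "ON_DEMAND",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 1,
                },
                {
                    # Spot capacity for the workers; data lives in S3, not HDFS.
                    "Name": "core-spot",
                    "InstanceRole": "CORE",
                    "Market": "SPOT",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": core_count,
                },
            ],
            # Ephemeral: terminate when there are no more steps to run.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = transient_cluster_request(
    "nightly-etl", "emr-5.28.0", 4, "s3://example-bucket/emr-logs/"
)
```

Adding more Spot workers is what lets a job finish sooner without a proportional cost increase, which is the "time and costs" point on the slide; Spot capacity can be reclaimed, so this suits workloads that keep their data in S3.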
Business Analytics Service: Amazon QuickSight
Deep integration with AWS data sources: Amazon RDS, Aurora, Amazon Redshift, Amazon Athena, Amazon S3, flat files
• Fully managed
• 1/10 cost of traditional BI solutions
• Integrated with AWS data sources
and third-party sources
• SPICE – Super-fast, Parallel, In-memory
Calculation Engine
• Collaborate, share and publish data
sets and analyses
Building a Data Strategy on AWS
Kinesis Firehose
Athena
Query Service
An Open Ecosystem of Applications
Processing & Analytics
BI & Data Visualization
Kinesis Streams & Firehose
Batch: EMR (Hadoop, Spark, Presto), Redshift (Data Warehouse), Athena (Query Service), AWS Batch
Predictive
Real-time: AWS Lambda, Apache Storm on EMR, Apache Flink on EMR, Spark Streaming on EMR, Elasticsearch Service, Kinesis Analytics, Kinesis Streams
ElastiCache, DAX
Online/Transactional: DynamoDB (NoSQL), Aurora (Relational), Neptune (Graph)
Data Lakes
A data lake is a centralized repository that allows
you to store all your structured and unstructured
data at any scale
Why data lakes?
Data Lakes provide:
Relational and non-relational data
Scale out to exabytes (EB)
Diverse set of analytics and machine learning tools
Work on data without any data movement
Designed for low cost storage and analytics
[Diagram: sources (OLTP, ERP, CRM, LOB; devices, web, sensors, social) feed a data warehouse and a data lake with a catalog; consumers include business intelligence, machine learning, DW queries, big data processing, and interactive and real-time analytics]
Any analytic workload, any scale, at the lowest possible cost
Data movement (on-premises): AWS Direct Connect, AWS Snowball, AWS Snowmobile, AWS Database Migration Service
Data movement (real-time): AWS IoT Core, Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Kinesis Video Streams
Machine learning: Amazon SageMaker, AWS Deep Learning AMIs, Amazon Rekognition, Amazon Lex, AWS DeepLens, Amazon Comprehend, Amazon Translate, Amazon Transcribe, Amazon Polly
Analytics: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch Service, Amazon Kinesis, Amazon QuickSight
Typical steps of building a data lake
1. Set up storage
2. Move data
3. Cleanse, prep, and catalog data
4. Configure and enforce security and compliance policies
5. Make data available for analytics
Building data lakes can still take months
Data preparation accounts for ~80% of the work
Building training sets
Cleaning and organizing data
Collecting data sets
Mining data for patterns
Refining algorithms
Other
Sample of steps required
Find sources
Create Amazon Simple Storage Service (Amazon S3) locations
Configure access policies
Map tables to Amazon S3 locations
ETL jobs to load and clean data
Create metadata access policies
Configure access from analytics services
Rinse and repeat for other:
data sets, users, and end-services
And more:
manage and monitor ETL jobs
update metadata catalog as data changes
update policies across services as users and permissions change
manually maintain cleansing scripts
create audit processes for compliance
…
Manual | Error-prone | Time consuming
Enforce security policies
across multiple services
Gain and manage new
insights
Identify, ingest, clean, and
transform data
Build a secure data lake in days
AWS Lake Formation
How it works
Register existing data or import new
Amazon S3 forms the storage layer for
Lake Formation
Register existing S3 buckets that
contain your data
Ask Lake Formation to create required
S3 buckets and import data into them
Data is stored in your account. You
have direct access to it. No lock-in.
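Registering an existing bucket with Lake Formation comes down to one API request. The sketch below builds that request in the shape of boto3's Lake Formation `register_resource` parameters; the bucket name and prefix are hypothetical, and using the service-linked role is an assumption made for brevity.

```python
def register_bucket_request(bucket, prefix=""):
    """Build a register_resource request for a bucket or a prefix within it."""
    arn = f"arn:aws:s3:::{bucket}"
    if prefix:
        arn += "/" + prefix.strip("/")
    # Assumption: let Lake Formation use its service-linked role to reach S3.
    return {"ResourceArn": arn, "UseServiceLinkedRole": True}


whole_bucket = register_bucket_request("example-data-lake")
one_prefix = register_bucket_request("example-data-lake", "curated/")
```

The data stays where it already is; registration only tells Lake Formation which S3 locations it manages access for.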
[Diagram: Lake Formation (data import, crawlers, ML-based data prep, Data Catalog, access control) on top of data lake storage]
Easily load data to your data lake
[Diagram: blueprints load logs and databases into data lake storage as one-shot or incremental imports; Lake Formation components: data import, crawlers, ML-based data prep, Data Catalog, access control]
With blueprints
You
1. Point us to the source
2. Tell us the location to load to in your data lake
3. Specify how often you want to load the data
Blueprints
1. Discover the source table(s) schema
2. Automatically convert to the target data format
3. Automatically partition the data based on the partitioning schema
4. Keep track of data that was already processed
5. You can customize any of the above
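Two of the blueprint behaviours listed above, partitioning and keeping track of already-processed data, can be illustrated with a small in-memory model. This is a conceptual sketch only: it assumes a monotonically increasing `id` column as the bookmark key and Hive-style `key=value` partition paths, and it does not reflect how Lake Formation or AWS Glue implement these features internally.

```python
def incremental_load(source_rows, bookmark, key="id"):
    """Return rows newer than the bookmark, plus the advanced bookmark."""
    new_rows = [r for r in source_rows if bookmark is None or r[key] > bookmark]
    if new_rows:
        bookmark = max(r[key] for r in new_rows)
    return new_rows, bookmark


def partition_path(row, partition_keys):
    """Build a Hive-style partition path such as 'year=2019/month=11'."""
    return "/".join(f"{k}={row[k]}" for k in partition_keys)


# Hypothetical source table for illustration.
source = [
    {"id": 1, "year": 2019, "month": 10},
    {"id": 2, "year": 2019, "month": 11},
]

first_batch, bookmark = incremental_load(source, bookmark=None)   # one-shot load
source.append({"id": 3, "year": 2019, "month": 11})
second_batch, bookmark = incremental_load(source, bookmark)       # incremental load

for row in second_batch:
    print(partition_path(row, ["year", "month"]), row)
```

The second call returns only the row added after the first load, which is the difference between the one-shot and incremental modes.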
Blueprints build on AWS Glue
Easily de-duplicate your data with ML transforms
Fuzzy de-duplication – under the hood
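The slide content for this topic did not survive extraction, so the following only illustrates the general idea of fuzzy de-duplication. It substitutes a plain string-similarity measure from Python's standard library (`difflib.SequenceMatcher`) for the ML transform; the threshold of 0.85 and the sample records are assumptions, and this is not the algorithm the service uses.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Flatten a record into one lowercase string for comparison."""
    return " ".join(str(v).lower().strip() for v in record.values())


def similarity(a, b):
    """Similarity ratio between two records, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def deduplicate(records, threshold=0.85):
    """Keep the first record of each group of near-duplicates.

    Compares every record with every kept record, so this is quadratic
    and only suitable for small inputs.
    """
    kept = []
    for record in records:
        if not any(similarity(record, k) >= threshold for k in kept):
            kept.append(record)
    return kept


# Hypothetical customer records; the first two differ by one letter.
records = [
    {"name": "Jon Smith", "city": "Seattle"},
    {"name": "John Smith", "city": "Seattle"},
    {"name": "Maria Garcia", "city": "Austin"},
]
unique = deduplicate(records)
print(unique)
```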
Fuzzy de-duplication – Innovations
400M+
7.5B+
2.5
Secure once, access in multiple ways
[Diagram: an admin defines access control in Lake Formation over the Data Catalog and data lake storage]
Security permissions in Lake Formation
Control data access with simple
grant and revoke permissions
Specify permissions on tables and
columns rather than on buckets
and objects
Easily view policies granted to a
particular user
Audit all data access in one place
Security permissions in Lake Formation
Search and view permissions
granted to a user, role, or group in
one place
Verify permissions granted to a
user
Easily revoke policies for a user
Grant table and column-level permissions
User 1
User 2
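A table- or column-level grant could look like the sketch below, which builds requests in the shape of boto3's Lake Formation `grant_permissions` parameters. The principal ARNs, database, table, and column names are hypothetical, and which user sees which columns is an assumption, since the slide's screenshot is not available.

```python
def grant_select_request(principal_arn, database, table, columns=None):
    """Build a SELECT grant on a whole table or on specific columns."""
    if columns:
        resource = {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        }
    else:
        resource = {"Table": {"DatabaseName": database, "Name": table}}
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": ["SELECT"],
    }


# Assumed scenario: user 1 reads the whole table, user 2 only two columns.
user1_grant = grant_select_request(
    "arn:aws:iam::123456789012:user/user1", "sales_db", "orders"
)
user2_grant = grant_select_request(
    "arn:aws:iam::123456789012:user/user2", "sales_db", "orders",
    columns=["order_id", "order_date"],
)
```

Compared with an S3 bucket policy, the grant names a table and columns rather than buckets and objects, which is the distinction the preceding slides draw.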
Security – deep dive
[Diagram labels: User; IAM users, roles; Active Directory; Amazon S3]
Search and collaborate across multiple users
Text-based, faceted search
across all metadata
Add attributes such as data
owners, stewards, and others as
table properties
Add data sensitivity level,
column definitions, and others
as column properties
Text-based search and filtering
Query data in Amazon Athena
Audit and monitor in real time
See detailed alerts in the
console
Download audit logs for
further analytics
Data ingest and catalog
notifications also published to
Amazon CloudWatch Events
Example: a data lake in 3 easy steps
Step 1: Blueprints to ingest data
Monitor the import
Imported data as table in the data lake
Step 2: Grant permissions to securely share data
Step 3: Run query in Amazon Athena
AWS Lake Formation Pricing
No additional charges – Only pay for the
underlying services used.
Customer interest
“We are very excited about the launch of AWS Lake
Formation, which provides a central point of control to
easily load, clean, secure, and catalog data from thousands
of clients to our AWS-based data lake, dramatically
reducing our operational load. … Additionally, AWS Lake
Formation will be HIPAA compliant from day one …”
Aaron Symanski, CTO, Change Healthcare
“I can’t wait for my team to get our hands on AWS Lake
Formation. With an enterprise-ready option like Lake
Formation, we will be able to spend more time deriving
value from our data rather than doing the heavy lifting
involved in manually setting up and managing our data
lake.”
Joshua Couch, VP Engineering, Fender Digital
Thank you!
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Formation

  • 1. Cutting to the chase for Machine Learning: Analytics Ecosystem & AWS Lake Formation • Asif Abbasi, Specialist SA Analytics • 03.Nov.2019 • @masifabbasi • © 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 2. Agenda • Analytics Ecosystem • Why did we build AWS Lake Formation? • What is AWS Lake Formation? • How does AWS Lake Formation help you?
  • 3. Analytics Value Stream • From idea to insight: COLLECT, STORE, PROCESS/ANALYZE, CONSUME • Time to first answer
  • 4. Agile Analytics • Experiment • Invest in and scale up success • Fail fast • Adapt and evolve rapidly
  • 5. A Modern Data Platform • Treat data as a reusable asset (while keeping the cost of reuse low) • Apply an open set of processing strategies in parallel
  • 6. Data Lake • Store all your data, forever, at every stage of its lifecycle • Apply it using the right tool for the job
  • 7. Storage Foundations: Amazon S3
  • 8. Events and Lifecycle Management • Standard: active data • Standard - Infrequent Access: infrequently accessed data • Amazon Glacier: archive data • Events: Create, Delete
  • 9. Benefits of Using S3 as Your Data Lake Foundation • Unlimited number of objects and volume • 99.99% availability • 99.999999999% durability • Versioning • Tiered storage via lifecycle policies • SSL, client/server-side encryption at rest • Low cost (~$2700/month/100 TB) • Natively supported by big data frameworks (e.g. Spark, Hive, Presto) • Decouples storage and compute • Run transient compute clusters (with Amazon EC2 Spot Instances) • Multiple, heterogeneous clusters can use the same data
  • 10. Ingest • Kinesis Firehose
  • 11. Data Catalog: Discover and Govern Your Data • Kinesis Firehose
  • 12. Glue Data Catalog • Hive metastore-compatible, highly available metadata repository • Search metadata for data discovery • Connection info: JDBC URLs, credentials • Classification for identifying and parsing files • Versioning of table metadata as schemas evolve and other metadata are updated • Table definitions usable by Redshift, Athena, Glue, EMR • Populate using Hive DDL, bulk import, or automatically through crawlers
  • 13. Catalog and Query Multiple Sources
  • 14. Access Control and Auditing • Kinesis Firehose
  • 15. Access Control and Auditing • Diagram labels: IAM; Amazon S3; Amazon DynamoDB; Amazon EMR; Amazon Kinesis; Amazon Athena; Service API Access; AWS CloudTrail • Identity and Access Management: security at the API and data level • 3rd party ecosystem security tools: BlueTalon, Apache Ranger, Sentry, etc. • Storage-level access logging and audit: AWS CloudTrail logs API calls; S3 access logging; log analytics with Athena • Encryption: AWS Server-Side Encryption; AWS Key Management Service; AWS CloudHSM; custom materials providers
  • 16. Process and Analyze • Kinesis Firehose • Athena Query Service
  • 17. Job Authoring with AWS Glue • Python/Scala code generated by AWS Glue • Connect a notebook or IDE to AWS Glue • Existing code brought into AWS Glue
  • 18. Job Execution with AWS Glue • Schedule-based • Event-based • On demand
  • 19. Serverless Queries: Amazon Athena • Interactive queries • ANSI SQL • No infrastructure or administration • Zero spin-up time • Query data in its raw format: AVRO, Text, CSV, JSON, weblogs, AWS service logs • Convert to an optimized form like ORC or Parquet for the best performance and lowest cost • No loading of data, no ETL required • Stream data directly from Amazon S3 • Take advantage of Amazon S3 durability and availability
  • 20. Data Warehousing: Amazon Redshift • Managed, massively parallel, petabyte-scale, relational data warehouse • Scale from 160 GB to 2 PB online • Automatic streaming backup/restore to S3 • Automatic failover and recovery • ANSI SQL interface • Load data from S3, DynamoDB and EMR
  • 21. Query Exabytes of Data: Redshift Spectrum • Run Amazon Redshift SQL queries against exabytes of data in Amazon S3 • Diagram labels: Amazon Redshift; JDBC/ODBC; 1, 2, 3, 4 ... N; Amazon S3 (exabyte-scale object storage); Data Catalog; Apache Hive Metastore
  • 22. Managed Hadoop Framework: Amazon EMR • Scalable compute clusters as a service • Create ephemeral clusters: only pay for the compute you need • Integrated with S3 (EMRFS) • Use spot instances to reduce processing time AND costs • Dynamically scale up and down • Data encryption at rest and in transit
  • 23. Business Analytics Service: Amazon QuickSight • Deep integration with AWS data sources: Amazon RDS, Aurora, Amazon Redshift, Amazon Athena, Amazon S3, flat files • Fully managed • 1/10 cost of traditional BI solutions • Integrated with AWS data sources and third-party sources • SPICE: Super-fast, Parallel, In-memory Calculation Engine • Collaborate, share and publish data sets and analyses
  • 24. Building a Data Strategy on AWS • Kinesis Firehose • Athena Query Service
  • 25. An Open Ecosystem of Applications • Diagram labels: Processing & Analytics; BI & Data Visualization; Kinesis Streams & Firehose; Batch; EMR (Hadoop, Spark, Presto); Redshift Data Warehouse; Athena Query Service; AWS Batch; Predictive; Real-time; AWS Lambda; Apache Storm on EMR; Apache Flink on EMR; Spark Streaming on EMR; Elasticsearch Service; Kinesis Analytics, Kinesis Streams; ElastiCache; DAX; Online/Transactional; DynamoDB (NoSQL); Aurora (Relational); Neptune (Graph)
  • 26. Data Lakes
  • 27. A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale
  • 28. Why data lakes? • Data lakes provide: relational and non-relational data; scale-out to EBs; a diverse set of analytics and machine learning tools; work on data without any data movement; design for low cost storage and analytics • Diagram labels: OLTP, ERP, CRM, LOB; Data Warehouse; Business Intelligence; Data Lake; Devices, Web, Sensors, Social; Catalog; Machine Learning; DW Queries; Big data processing; Interactive; Real-time
  • 29. Any analytic workload, any scale, at the lowest possible cost • Data Movement (on-premises): AWS Direct Connect, AWS Snowball, AWS Snowmobile, AWS Database Migration Service • Data Movement (real-time): AWS IoT Core, Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Kinesis Video Streams • Machine Learning: Amazon SageMaker, AWS Deep Learning AMIs, Amazon Rekognition, Amazon Lex, AWS DeepLens, Amazon Comprehend, Amazon Translate, Amazon Transcribe, Amazon Polly • Analytics: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch Service, Amazon Kinesis, Amazon QuickSight
  • 30. Typical steps of building a data lake • 1. Set up storage • 2. Move data • 3. Cleanse, prep, and catalog data • 4. Configure and enforce security and compliance policies • 5. Make data available for analytics
  • 31. Building data lakes can still take months
  • 32. Data preparation accounts for ~80% of the work • Chart categories: building training sets; cleaning and organizing data; collecting data sets; mining data for patterns; refining algorithms; other
  • 33. Sample of steps required • Find sources • Create Amazon Simple Storage Service (Amazon S3) locations • Configure access policies • Map tables to Amazon S3 locations • ETL jobs to load and clean data • Create metadata access policies • Configure access from analytics services • Rinse and repeat for other data sets, users, and end-services • And more: manage and monitor ETL jobs; update metadata catalog as data changes; update policies across services as users and permissions change; manually maintain cleansing scripts; create audit processes for compliance; … • Manual | Error-prone | Time consuming
  • 34. AWS Lake Formation: Build a secure data lake in days • Identify, ingest, clean, and transform data • Enforce security policies across multiple services • Gain and manage new insights
  • 35. How it works
  • 36. Register existing data or import new • Amazon S3 forms the storage layer for Lake Formation • Register existing S3 buckets that contain your data • Ask Lake Formation to create required S3 buckets and import data into them • Data is stored in your account. You have direct access to it. No lock-in. • Diagram labels: Data Lake Storage; Data Catalog; Access Control; Data import; Lake Formation; Crawlers; ML-based data prep
  • 37. Easily load data to your data lake • Sources: logs, DBs • Blueprints: one-shot, incremental • Diagram labels: Data Lake Storage; Data Catalog; Access Control; Data import; Lake Formation; Crawlers; ML-based data prep
  • 38. With blueprints • You: 1. Point us to the source 2. Tell us the location to load to in your data lake 3. Specify how often you want to load the data • Blueprints: 1. Discover the source table(s) schema 2. Automatically convert to the target data format 3. Automatically partition the data based on the partitioning schema 4. Keep track of data that was already processed 5. You can customize any of the above
  • 39. Blueprints build on AWS Glue
  • 40. Easily de-duplicate your data with ML transforms
  • 41. Fuzzy de-duplication: under the hood
  • 42. Fuzzy de-duplication: innovations • 400M+ • 7.5B+ • 2.5
  • 43. Secure once, access in multiple ways • Diagram labels: Data Lake Storage; Data Catalog; Access Control; Lake Formation; Admin
  • 44. Security permissions in Lake Formation • Control data access with simple grant and revoke permissions • Specify permissions on tables and columns rather than on buckets and objects • Easily view policies granted to a particular user • Audit all data access in one place
  • 45. Security permissions in Lake Formation • Search and view permissions granted to a user, role, or group in one place • Verify permissions granted to a user • Easily revoke policies for a user
  • 46. Grant table and column-level permissions • User 1 • User 2
  • 47. Security: deep dive • Diagram labels: User; IAM users, roles; Active Directory; Amazon S3
  • 48. Search and collaborate across multiple users • Text-based, faceted search across all metadata • Add attributes like data owners, stewards, and others as table properties • Add data sensitivity level, column definitions, and others as column properties • Text-based search and filtering • Query data in Amazon Athena
  • 49. Audit and monitor in real time • See detailed alerts in the console • Download audit logs for further analytics • Data ingest and catalog notifications are also published to Amazon CloudWatch Events
  • 50. Example: a data lake in 3 easy steps
  • 51. Step 1: Blueprints to ingest data
  • 52. Monitor the import
  • 53. Imported data as table in the data lake
  • 54. Step 2: Grant permissions to securely share data
  • 55. Step 3: Run query in Amazon Athena
  • 56. AWS Lake Formation Pricing • No additional charges: only pay for the underlying services used.
  • 57. Customer interest • “We are very excited about the launch of AWS Lake Formation, which provides a central point of control to easily load, clean, secure, and catalog data from thousands of clients to our AWS-based data lake, dramatically reducing our operational load. … Additionally, AWS Lake Formation will be HIPAA compliant from day one …” Aaron Symanski, CTO, Change Healthcare • “I can’t wait for my team to get our hands on AWS Lake Formation. With an enterprise-ready option like Lake Formation, we will be able to spend more time deriving value from our data rather than doing the heavy lifting involved in manually setting up and managing our data lake.” Joshua Couch, VP Engineering, Fender Digital
  • 58. Thank you!
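
Steps 2 and 3 of the example in the deck (grant permissions, then run a query in Amazon Athena) can be sketched as request payloads. This is a minimal illustration and not taken from the slides: the database, table, column names, role ARN, and results bucket are all hypothetical, and the code only builds the keyword arguments that the boto3 `grant_permissions` and `start_query_execution` calls accept, so it runs without AWS credentials.

```python
# Sketch only: every name below (database, table, columns, ARN, bucket)
# is a hypothetical placeholder, not something defined in the slides.


def build_column_grant(principal_arn, database, table, columns):
    """Build keyword arguments for Lake Formation GrantPermissions.

    Grants SELECT on specific columns of a catalog table, rather than
    on S3 buckets or objects.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def build_athena_query(database, table, columns, output_location):
    """Build keyword arguments for Athena StartQueryExecution."""
    column_list = ", ".join(columns)
    return {
        "QueryString": f"SELECT {column_list} FROM {table} LIMIT 10",
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical analyst role that may read only two columns.
visible_columns = ["order_id", "amount"]

grant = build_column_grant(
    principal_arn="arn:aws:iam::111122223333:role/ExampleAnalyst",
    database="sales_lake",
    table="orders",
    columns=visible_columns,
)

query = build_athena_query(
    database="sales_lake",
    table="orders",
    columns=visible_columns,
    output_location="s3://example-athena-results/",
)

# With credentials configured, the payloads could be sent like this:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**grant)
#   boto3.client("athena").start_query_execution(**query)
```

Keeping the payloads as plain dictionaries makes the grant easy to review before it is applied; the permission is expressed on a table and its columns, which matches the deck's point about not managing bucket and object policies directly.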