SlideShare a Scribd company logo
Asanka Padmakumara
Business Intelligence Consultant,
• Blog: asankap.wordpress.com
• Linked In: linkedin.com/in/asankapadmakumara
• Twitter: @asanka_e
• Facebook: facebook.com/asankapk
Move Your On-
Prem Data to a
Lake in the
Clouds
Agenda
• Where are we right now?
• Why we need to go for Data Lake?
• What is Azure Data Lake?
• How do we get there?
• Demo
• Q & A
Where are we right now?
What are the challenges?
• Limited storage
• Limited processing power
• High hardware cost
• High maintains cost
• No disaster recovery
• Availability and reliability issues
• Scalability issues
• Security
• Solution: Azure Data Lake
What is Azure Data Lake?
• Highly scalable data storage and analytics service
• Intended for big data storage and analysis
• A faster and efficient solution than on-prem data centers
• Three services:
Analytics
Storage
HDInsight
(“managed clusters”)
Azure Data Lake Analytics
Azure Data Lake Storage
Azure Data Lake Architecture
Azure Data Lake Store
• Built for Hadoop
• Compatible with most components in Hadoop Eco-
systems
• Web HDFS API
• Unlimited storage, petabyte files
• Performance-tuned for big data analytics
• High throughput, IOPs
• Multiple parts of a file in multiple servers:
Parallel reading
• Enterprise-ready: Highly-available and secure
• All Data, One Place
• Any Data in native format
• No schema, No prior processing
Optimized for Big Data Analytics
• Multiple copies of same file in
improve reading
• Locally-redundant
(multiple copies of data in one Azure
region)
• Parallel reading and writing
• Configurable throughput
• No Limitation in file size or storage
Secure Data in Azure Data Lake Store
• Authentication
• Azure Active Directory
• All AAD features
• End-user authentication or Service-to-service authentication
• Access Control
• POSIX-style permissions
• Read, Write, Execute
• ACLs can be enabled on the root folder, on subfolders, and on individual files.
• Encryption
• Encryption at rest
• Encryption at transit -HTTPS
How to ingest data to Azure Data Lake Store
• Small Data Sets
• Azure Portal
• Azure Power Shell
• Azure – Cross Platform CLI 2.0
• Data Lake Tools For Visual Studio
• Streamed data
• Azure Stream Analytics
• Azure HDInsight Storm
• Data Lake Store .NET SDK
• Relational data
• Apache Sqoop
• Azure Data Factory
• Large Data Set
• Azure Power Shell
• Azure – Cross Platform CLI 2.0
• Azure Data Lake Store .NET SDK
• Azure Data Factory
• Really Large Data Sets
• Azure ExpressRoute
• Azure Import/Export service
How it different from Azure Blob Storage
Azure Data Lake Store Azure Blob Storage
Purpose
Optimized storage for big data analytics
workloads
General purpose
Use Case
Batch, interactive, streaming analytics and
machine learning data such as log files, IoT
data, click streams, large datasets
Any type of text or binary data, such
as application back end, backup data,
media storage for streaming and
general purpose data
Key Concepts
Contains folders, which in turn contains data
stored as files
Contains containers, which in turn has
data in the form of blobs
Size limits
No limits on account sizes, file sizes or number
of files
500 TiB
Geo-redundancy
Locally-redundant (multiple copies of data in
one Azure region)
Locally redundant (LRS), globally
redundant (GRS), read-access globally
redundant (RA-GRS).
Azure Data Lake Analytics
• Massive processing power
• Adjustable parallelism
• No server, VM, Cluster to
maintain.
• Pay for the Job
• Use existing .Net, R and
Python libraries.
• New language : U-SQL
C#SQL
U-SQL
• Combination of Declarative Logic of SQL and Procedure
logic of C#
• Case sensitive
• “Schema on Read”
U-SQL
@ExtraRuns =
SELECT IPLYear, Bowler,
SUM( string.IsNullOrWhiteSpace(ExtraRuns)? 0:
Convert.ToInt32(ExtraRuns)
) AS ExtraRuns,
ExtraType
FROM @MatchData
GROUP BY IPLYear,Bowler,ExtraType;
How do we go there? Azure Data Factory
Your feedbacks are essential to me ..!
Demo/ Q&A
Pricing
• Pay-as-you-go
• For a 1 TB storage, for a month = $39.94
• Monthly commitment packages
• For a 1 TB storage, for a month = $35
• Usage base:
https://azure.microsoft.com/en-us/pricing/details/data-lake-store/
Usage Price
Write operations (per 10,000) $0.05
Read operations (per 10,000) $0.004
Delete operations Free
Transaction size limit No limit
Ad

More Related Content

What's hot (18)

DBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data ApplicationsDBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data Applications
decode2016
 
Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
Simplifying And Accelerating Data Access for Python With Dremio and Apache ArrowSimplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
PyData
 
Bleeding Edge Databases
Bleeding Edge DatabasesBleeding Edge Databases
Bleeding Edge Databases
Lynn Langit
 
HDInsight Informative articles
HDInsight Informative articlesHDInsight Informative articles
HDInsight Informative articles
Karan Gulati
 
Azure document db/Cosmos DB
Azure document db/Cosmos DBAzure document db/Cosmos DB
Azure document db/Cosmos DB
Mohit Chhabra
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
Laurent Leturgez
 
Configuration in azure done right
Configuration in azure done rightConfiguration in azure done right
Configuration in azure done right
Rick van den Bosch
 
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...
Avinash Ramineni
 
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Databricks
 
Presto: Fast SQL on Everything
Presto: Fast SQL on EverythingPresto: Fast SQL on Everything
Presto: Fast SQL on Everything
David Phillips
 
R in Power BI
R in Power BIR in Power BI
R in Power BI
Eric Bragas
 
Cloud native data platform
Cloud native data platformCloud native data platform
Cloud native data platform
Li Gao
 
The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...
The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...
The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...
Databricks
 
Azure CosmosDB
Azure CosmosDBAzure CosmosDB
Azure CosmosDB
Fernando Mejía
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Cloudian
 
Cosmosdb graph
Cosmosdb graphCosmosdb graph
Cosmosdb graph
Mohit Chhabra
 
Azure SQL Data Warehouse
Azure SQL Data Warehouse Azure SQL Data Warehouse
Azure SQL Data Warehouse
Antonios Chatzipavlis
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In Practice
Dremio Corporation
 
DBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data ApplicationsDBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data Applications
decode2016
 
Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
Simplifying And Accelerating Data Access for Python With Dremio and Apache ArrowSimplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
PyData
 
Bleeding Edge Databases
Bleeding Edge DatabasesBleeding Edge Databases
Bleeding Edge Databases
Lynn Langit
 
HDInsight Informative articles
HDInsight Informative articlesHDInsight Informative articles
HDInsight Informative articles
Karan Gulati
 
Azure document db/Cosmos DB
Azure document db/Cosmos DBAzure document db/Cosmos DB
Azure document db/Cosmos DB
Mohit Chhabra
 
Configuration in azure done right
Configuration in azure done rightConfiguration in azure done right
Configuration in azure done right
Rick van den Bosch
 
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...
Avinash Ramineni
 
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Databricks
 
Presto: Fast SQL on Everything
Presto: Fast SQL on EverythingPresto: Fast SQL on Everything
Presto: Fast SQL on Everything
David Phillips
 
Cloud native data platform
Cloud native data platformCloud native data platform
Cloud native data platform
Li Gao
 
The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...
The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...
The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...
Databricks
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Cloudian
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In Practice
Dremio Corporation
 

Similar to Move your on prem data to a lake in a Lake in Cloud (20)

4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
PROIDEA
 
CC -Unit4.pptx
CC -Unit4.pptxCC -Unit4.pptx
CC -Unit4.pptx
Revathiparamanathan
 
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefullySQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
Md Kamaruzzaman
 
An intro to Azure Data Lake
An intro to Azure Data LakeAn intro to Azure Data Lake
An intro to Azure Data Lake
Rick van den Bosch
 
Survey of the Microsoft Azure Data Landscape
Survey of the Microsoft Azure Data LandscapeSurvey of the Microsoft Azure Data Landscape
Survey of the Microsoft Azure Data Landscape
Ike Ellis
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
James Serra
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Fwdays
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Rukmani Gopalan
 
5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics
Jen Stirrup
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
Amazon Web Services Korea
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Michael Rys
 
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenJ1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
MS Cloud Summit
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
Martin Bém
 
Azure Data Platform Overview.pdf
Azure Data Platform Overview.pdfAzure Data Platform Overview.pdf
Azure Data Platform Overview.pdf
Dustin Vannoy
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3
Simon Ambridge
 
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Accelerating Business Intelligence Solutions with Microsoft Azure   passAccelerating Business Intelligence Solutions with Microsoft Azure   pass
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Jason Strate
 
Accesso ai dati con Azure Data Platform
Accesso ai dati con Azure Data PlatformAccesso ai dati con Azure Data Platform
Accesso ai dati con Azure Data Platform
Luca Di Fino
 
Building Big Data Solutions with Azure Data Lake.10.11.17.pptx
Building Big Data Solutions with Azure Data Lake.10.11.17.pptxBuilding Big Data Solutions with Azure Data Lake.10.11.17.pptx
Building Big Data Solutions with Azure Data Lake.10.11.17.pptx
thando80
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL
David Smelker
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
Institute of Contemporary Sciences
 
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
PROIDEA
 
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefullySQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
Md Kamaruzzaman
 
Survey of the Microsoft Azure Data Landscape
Survey of the Microsoft Azure Data LandscapeSurvey of the Microsoft Azure Data Landscape
Survey of the Microsoft Azure Data Landscape
Ike Ellis
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
James Serra
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Fwdays
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Rukmani Gopalan
 
5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics
Jen Stirrup
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
Amazon Web Services Korea
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Michael Rys
 
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenJ1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
MS Cloud Summit
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
Martin Bém
 
Azure Data Platform Overview.pdf
Azure Data Platform Overview.pdfAzure Data Platform Overview.pdf
Azure Data Platform Overview.pdf
Dustin Vannoy
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3
Simon Ambridge
 
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Accelerating Business Intelligence Solutions with Microsoft Azure   passAccelerating Business Intelligence Solutions with Microsoft Azure   pass
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Jason Strate
 
Accesso ai dati con Azure Data Platform
Accesso ai dati con Azure Data PlatformAccesso ai dati con Azure Data Platform
Accesso ai dati con Azure Data Platform
Luca Di Fino
 
Building Big Data Solutions with Azure Data Lake.10.11.17.pptx
Building Big Data Solutions with Azure Data Lake.10.11.17.pptxBuilding Big Data Solutions with Azure Data Lake.10.11.17.pptx
Building Big Data Solutions with Azure Data Lake.10.11.17.pptx
thando80
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL
David Smelker
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
Institute of Contemporary Sciences
 
Ad

Recently uploaded (20)

Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
Cleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdfCleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdf
alcinialbob1234
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
How iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost FundsHow iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost Funds
ireneschmid345
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
How to join illuminati Agent in uganda call+256776963507/0741506136
How to join illuminati Agent in uganda call+256776963507/0741506136How to join illuminati Agent in uganda call+256776963507/0741506136
How to join illuminati Agent in uganda call+256776963507/0741506136
illuminati Agent uganda call+256776963507/0741506136
 
03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia
Alexander Romero Arosquipa
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
Cleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdfCleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdf
alcinialbob1234
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
How iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost FundsHow iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost Funds
ireneschmid345
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
Ad

Move your on prem data to a lake in a Lake in Cloud

  • 1. Asanka Padmakumara Business Intelligence Consultant, • Blog: asankap.wordpress.com • Linked In: linkedin.com/in/asankapadmakumara • Twitter: @asanka_e • Facebook: facebook.com/asankapk
  • 2. Move Your On- Prem Data to a Lake in the Clouds
  • 3. Agenda • Where are we right now? • Why we need to go for Data Lake? • What is Azure Data Lake? • How do we get there? • Demo • Q & A
  • 4. Where are we right now?
  • 5. What are the challenges? • Limited storage • Limited processing power • High hardware cost • High maintains cost • No disaster recovery • Availability and reliability issues • Scalability issues • Security • Solution: Azure Data Lake
  • 6. What is Azure Data Lake? • Highly scalable data storage and analytics service • Intended for big data storage and analysis • A faster and efficient solution than on-prem data centers • Three services: Analytics Storage HDInsight (“managed clusters”) Azure Data Lake Analytics Azure Data Lake Storage
  • 7. Azure Data Lake Architecture
  • 8. Azure Data Lake Store • Built for Hadoop • Compatible with most components in Hadoop Eco- systems • Web HDFS API • Unlimited storage, petabyte files • Performance-tuned for big data analytics • High throughput, IOPs • Multiple parts of a file in multiple servers: Parallel reading • Enterprise-ready: Highly-available and secure • All Data, One Place • Any Data in native format • No schema, No prior processing
  • 9. Optimized for Big Data Analytics • Multiple copies of same file in improve reading • Locally-redundant (multiple copies of data in one Azure region) • Parallel reading and writing • Configurable throughput • No Limitation in file size or storage
  • 10. Secure Data in Azure Data Lake Store • Authentication • Azure Active Directory • All AAD features • End-user authentication or Service-to-service authentication • Access Control • POSIX-style permissions • Read, Write, Execute • ACLs can be enabled on the root folder, on subfolders, and on individual files. • Encryption • Encryption at rest • Encryption at transit -HTTPS
  • 11. How to ingest data to Azure Data Lake Store • Small Data Sets • Azure Portal • Azure Power Shell • Azure – Cross Platform CLI 2.0 • Data Lake Tools For Visual Studio • Streamed data • Azure Stream Analytics • Azure HDInsight Storm • Data Lake Store .NET SDK • Relational data • Apache Sqoop • Azure Data Factory • Large Data Set • Azure Power Shell • Azure – Cross Platform CLI 2.0 • Azure Data Lake Store .NET SDK • Azure Data Factory • Really Large Data Sets • Azure ExpressRoute • Azure Import/Export service
  • 12. How it different from Azure Blob Storage Azure Data Lake Store Azure Blob Storage Purpose Optimized storage for big data analytics workloads General purpose Use Case Batch, interactive, streaming analytics and machine learning data such as log files, IoT data, click streams, large datasets Any type of text or binary data, such as application back end, backup data, media storage for streaming and general purpose data Key Concepts Contains folders, which in turn contains data stored as files Contains containers, which in turn has data in the form of blobs Size limits No limits on account sizes, file sizes or number of files 500 TiB Geo-redundancy Locally-redundant (multiple copies of data in one Azure region) Locally redundant (LRS), globally redundant (GRS), read-access globally redundant (RA-GRS).
  • 13. Azure Data Lake Analytics • Massive processing power • Adjustable parallelism • No server, VM, Cluster to maintain. • Pay for the Job • Use existing .Net, R and Python libraries. • New language : U-SQL
  • 14. C#SQL U-SQL • Combination of Declarative Logic of SQL and Procedure logic of C# • Case sensitive • “Schema on Read” U-SQL @ExtraRuns = SELECT IPLYear, Bowler, SUM( string.IsNullOrWhiteSpace(ExtraRuns)? 0: Convert.ToInt32(ExtraRuns) ) AS ExtraRuns, ExtraType FROM @MatchData GROUP BY IPLYear,Bowler,ExtraType;
  • 15. How do we go there? Azure Data Factory
  • 16. Your feedbacks are essential to me ..!
  • 18. Pricing • Pay-as-you-go • For a 1 TB storage, for a month = $39.94 • Monthly commitment packages • For a 1 TB storage, for a month = $35 • Usage base: https://azure.microsoft.com/en-us/pricing/details/data-lake-store/ Usage Price Write operations (per 10,000) $0.05 Read operations (per 10,000) $0.004 Delete operations Free Transaction size limit No limit

Editor's Notes

  • #5: On prem Lots of data Limited space Maintain the servers Lot of processing power
  • #6: Grow hardware on demand Upgrade instantly Availability and readability : Multiple copies of data, Down time for maintains, hardware familiar causes business issues Increase decrease hardware on demand Ability to fail fast, if fail , no need of hardware Ability move into latest technologies Scalability: take time to scale
  • #9: Apache Hadoop file system compatible with Hadoop Distributed File System (HDFS) Works with application which support Web HDFS 3 copy in a single IOPS: input output operations per seconds
  • #10: Automatically optimized for any throughput
  • #14: 250 AU max: 1 AU= 2 core cpu, 6 GB ram Pay As you go: Price: 1 Au for 1 Hour 2$ Monthly : 100 Au , 100$
  • #15: Declarative logic Procedure logic sql to query, C# to customize Case sensitive C# data type C# comparison Some commonly used SQL keywords, including WHILE, UPDATE, and MERGE are not supported in U-SQL
  • #16: A cloud integration service Workflow called Pipelines Activities in pipeline Integration Run time: self hosted Activities : Copy data, run ssis packages, execute SPs, Execute U-SQL queries Price: no of activity runs and data moment hours Or SSIS runtime based on VM and time