Running cost effective big data workloads with
Azure Synapse and Azure Data Lake Storage
James Baker, @BigJamesBigData
Michael Rys, @MikeDoesBigData
Agenda
1. Modernize your big data workloads
2. Cost improvements with Synapse and ADLS
3. Demo
4. .NET for Apache Spark
Traditional on-prem analytics pipeline
[Diagram: business/custom apps and operational databases feed an enterprise data warehouse and data marts through ETL steps, which serve reporting, analytics, and data mining]
Modern data warehouse
[Diagram: sources are logs (structured), media (unstructured), files (unstructured), and business/custom apps (structured); stages are Ingest, Store, Prep & train, and Model & serve; services shown are Azure Data Factory, Azure Data Lake Storage, Azure Databricks, Azure SQL Data Warehouse, and Power BI]
Modern data warehouse with Azure Synapse
[Diagram: logs (structured), media (unstructured), files (unstructured), and business/custom apps (structured) flow into Azure Synapse Analytics and Power BI, with Azure Data Lake Storage as the store]
Modern data warehouse with Azure Synapse
[Diagram: logs (structured), media (unstructured), files (unstructured), and business/custom apps (structured) as sources; Azure Synapse layers shown are analytics runtimes (SQL), common data estate, shared metadata, and unified experience (Synapse Studio); Azure Data Lake Storage as the store; Power BI]
Synapse component benefits
Apache Spark
• .NET support: reduces training costs for big data .NET developers
• Hyperspace materialized views: require smaller, cheaper clusters to achieve the same tasks
Synapse SQL
• Existing tooling and skills: no requirement for retraining or new tooling to work with familiar T-SQL environments
• Provisioned model
  • Workload groups: maximize resource utilization
  • Materialized views and result set cache: smaller, cheaper clusters
• Serverless model
  • Pay exactly for what you use: no overprovisioning
  • No clusters to monitor and manage: lower maintenance costs
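The trade-off between the provisioned and serverless models can be sketched as a small break-even calculation. This is only an illustration: it assumes the serverless model is billed per TB of data processed and the provisioned model per hour the pool is running, and the prices and function names are hypothetical placeholders, not Azure list prices.

```python
def serverless_cost(tb_processed: float, price_per_tb: float) -> float:
    """Pay-per-use: cost grows only with the data actually processed."""
    return tb_processed * price_per_tb


def provisioned_cost(hours_running: float, price_per_hour: float) -> float:
    """Provisioned: cost grows with the time the pool is kept running."""
    return hours_running * price_per_hour


def break_even_tb(hours_running: float, price_per_hour: float,
                  price_per_tb: float) -> float:
    """Data volume at which both models cost the same."""
    return provisioned_cost(hours_running, price_per_hour) / price_per_tb


# Hypothetical prices, chosen only to keep the arithmetic easy to follow.
PRICE_PER_TB = 5.0
PRICE_PER_HOUR = 2.0

if __name__ == "__main__":
    hours = 100.0
    threshold = break_even_tb(hours, PRICE_PER_HOUR, PRICE_PER_TB)
    print(f"Serverless is cheaper below {threshold} TB processed")
```

Under these assumed prices, a workload that processes less data than the break-even volume in the period is cheaper on the serverless model; above it, a provisioned pool becomes the cheaper option.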
Cost optimization with Azure Data Lake Storage
• Disaggregated compute and storage with shared metadata layer
• Lifecycle management for optimizing TCO
• Lower compute resources because of high performance
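As an illustration of the lifecycle management point, an age-based tiering rule could look roughly like the sketch below. The tier names follow the hot/cool/archive access tiers; the thresholds and the `choose_tier` helper are hypothetical and only mimic the kind of decision a lifecycle policy makes. They are not the service's policy format.

```python
from datetime import date

# Hypothetical thresholds (days since last modification); tune per workload.
COOL_AFTER_DAYS = 30
ARCHIVE_AFTER_DAYS = 180


def choose_tier(last_modified: date, today: date) -> str:
    """Return the storage tier a simple age-based rule would pick."""
    age_days = (today - last_modified).days
    if age_days > ARCHIVE_AFTER_DAYS:
        return "archive"
    if age_days > COOL_AFTER_DAYS:
        return "cool"
    return "hot"


if __name__ == "__main__":
    today = date(2020, 9, 22)
    for modified in (date(2020, 9, 15), date(2020, 7, 1), date(2019, 9, 1)):
        print(modified, "->", choose_tier(modified, today))
```

Data that is rarely touched moves to cheaper tiers over time, which is where the TCO saving comes from.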
Azure Synapse and ADLS: Integration benefits
• Shared security: data stewards apply security policies only once
• Shared metadata: data owners define the data model only once
• Query Acceleration: smaller, cheaper clusters
• Integrated workload monitoring: reduced monitoring and diagnosis costs
• Single management portal: lowers training costs
Shared Metadata Experience
Spark databases and tables backed by Parquet automatically become available in:
• Synapse SQL serverless
• Synapse SQL provisioned
• Synapse Pipelines
as external tables of the same name.
Auto-Expose Metadata Objects
Spark compute (Synapse Hive Metastore):
  CREATE DATABASE DBS1
  CREATE TABLE DBS1.T1
  SELECT * FROM DBS1.T1
Synapse SQL serverless (Serverless SQL System Catalog):
  CREATE DATABASE DBS1
  CREATE EXTERNAL TABLE DBS1.dbo.T1
  SELECT * FROM DBS1.dbo.T1
Synapse SQL provisioned instance (Provisioned Synapse SQL DB):
  CREATE SCHEMA $DBS1
  CREATE EXTERNAL TABLE $DBS1.T1
  SELECT * FROM $DBS1.T1
Benefits:
• No need to run orchestration jobs to move data or metadata between computes
• No duplication of data at the storage level
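The naming pattern in the statements above (DBS1.T1 in Spark, DBS1.dbo.T1 in serverless SQL, $DBS1.T1 in provisioned SQL) can be captured in a small helper. This is a sketch that only mirrors the names on the slide; `exposed_names` is a hypothetical function, and details such as case handling in the actual service are not covered here.

```python
def exposed_names(spark_db: str, spark_table: str) -> dict:
    """Map a Spark table to the name shown on the slide for each engine."""
    return {
        # Spark addresses the table as database.table.
        "spark": f"{spark_db}.{spark_table}",
        # Serverless SQL: same database name, table under the dbo schema.
        "serverless": f"{spark_db}.dbo.{spark_table}",
        # Provisioned SQL: the Spark database appears as a $-prefixed schema.
        "provisioned": f"${spark_db}.{spark_table}",
    }


if __name__ == "__main__":
    for engine, name in exposed_names("DBS1", "T1").items():
        print(f"{engine}: SELECT * FROM {name}")
```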
Synapse integration with ADLS Query Acceleration
Reduces total cost of ownership because analytics
frameworks don’t need to parse and load as much data
Delivers performance improvements due to less data
transferred over network
Optimize access to structured data by filtering data directly in
the storage service
Analytics queries typically require only ~20% of total data read
Deeply integrated into Azure Synapse Analytics for
improved performance and cost
[Diagram: numbered request flow (steps 1 to 5) between Azure Synapse Analytics and Query Acceleration over the data in Azure Data Lake Storage]
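The effect of filtering inside the storage service can be illustrated with a local simulation. The sketch below does not call any Azure API: `filter_in_storage` is a hypothetical stand-in for storage-side filtering, and the generated data is made up so that one row in five matches, loosely echoing the ~20% figure above.

```python
import csv
import io


def make_csv(n_rows: int) -> str:
    """Build a small CSV 'blob'; every fifth row is tagged lang == 'en'."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "lang", "text"])
    for i in range(n_rows):
        lang = "en" if i % 5 == 0 else "other"
        writer.writerow([i, lang, f"tweet {i}"])
    return buf.getvalue()


def filter_in_storage(blob: str, column: str, value: str) -> str:
    """Stand-in for storage-side filtering: return only the matching rows."""
    reader = csv.DictReader(io.StringIO(blob))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[column] == value:
            writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    blob = make_csv(1000)
    filtered = filter_in_storage(blob, "lang", "en")
    # Character counts equal byte counts for this ASCII-only data.
    print(f"without filtering: {len(blob)} bytes sent to the analytics engine")
    print(f"with filtering:    {len(filtered)} bytes sent to the analytics engine")
```

The analytics engine then parses only the filtered result, which is the source of the cost and performance benefit the slide describes.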
Demo: .NET for Spark and shared
metadata experience in Azure
Synapse
Michael Rys, @MikeDoesBigData
[Demo overview diagram]
• Input: Twitter CSV files
• Data prep with Spark Scala
• Analysis with interactive .NET for Spark notebook
• Seamless analysis with Synapse SQL
• Using Query Acceleration and Synapse shared metadata
Questions about Michael (@MikeDoesBigData):
• What has Michael been up to? (Mentions, topics)
• Who was interacting with Michael?
ANNOUNCING: .NET for Apache Spark v1.0 is released!
• First-class C# and F# bindings to Apache Spark, bringing the power of big data analytics to .NET developers
• Apache Spark 2.4/3.0: Data Frames, Structured Streaming, Delta Lake
• .NET Standard 2.0: C# and F#, ML.NET
• Performance optimized with Apache Arrow and HW Vectorization
• First class integration in Azure Synapse: Batch Submission, Interactive .NET notebooks
Learn more at http://dot.net/Spark
Call to action
• Check out sessions:
  • OD217 - Real-time analytics and BI using Azure Synapse Link for Azure Cosmos DB by Euan Garden
  • DB111 - Building real-time enterprise analytics solutions with Azure Synapse Analytics by Saveen Reddy
• Introduction to Azure Data Lake Storage - https://aka.ms/adls
• Get started with Azure Synapse - https://aka.ms/azuresynapse
• Learn more about .NET for Apache Spark - https://dot.net/Spark
• Leverage Informatica + Microsoft - DW migration offer: https://aka.ms/SynapseInformaticaPOV
• Connect with us on Twitter:
  Tweet us: @MikeDoesBigData @BigJamesBigData
  Tag with: #OD220
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ignite 2020, OD220)
Ad

More Related Content

What's hot (20)

DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
Lace Lofranco
 
Azure Storage Services - Part 01
Azure Storage Services - Part 01Azure Storage Services - Part 01
Azure Storage Services - Part 01
Neeraj Kumar
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene Polonichko
Alex Tumanoff
 
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Cathrine Wilhelmsen
 
Azure Data Factory Data Flow
Azure Data Factory Data FlowAzure Data Factory Data Flow
Azure Data Factory Data Flow
Mark Kromer
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
James Serra
 
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
Timothy McAliley
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?
ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?
ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?
Albert Hoitingh
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
Matthew W. Bowers
 
Introduction to Azure SQL DB
Introduction to Azure SQL DBIntroduction to Azure SQL DB
Introduction to Azure SQL DB
Christopher Foot
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Cathrine Wilhelmsen
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Microsoft Azure Technical Overview
Microsoft Azure Technical OverviewMicrosoft Azure Technical Overview
Microsoft Azure Technical Overview
gjuljo
 
Next Generation Data Integration with Azure Data Factory
Next Generation Data Integration with Azure Data FactoryNext Generation Data Integration with Azure Data Factory
Next Generation Data Integration with Azure Data Factory
Tom Kerkhove
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
James Serra
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
BizTalk360
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
James Serra
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
Laurent Leturgez
 
Microsoft Fabric Intro D Koutsanastasis
Microsoft Fabric Intro D KoutsanastasisMicrosoft Fabric Intro D Koutsanastasis
Microsoft Fabric Intro D Koutsanastasis
Uni Systems S.M.S.A.
 
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
Lace Lofranco
 
Azure Storage Services - Part 01
Azure Storage Services - Part 01Azure Storage Services - Part 01
Azure Storage Services - Part 01
Neeraj Kumar
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene Polonichko
Alex Tumanoff
 
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Cathrine Wilhelmsen
 
Azure Data Factory Data Flow
Azure Data Factory Data FlowAzure Data Factory Data Flow
Azure Data Factory Data Flow
Mark Kromer
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
James Serra
 
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
Timothy McAliley
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?
ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?
ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?
Albert Hoitingh
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
Matthew W. Bowers
 
Introduction to Azure SQL DB
Introduction to Azure SQL DBIntroduction to Azure SQL DB
Introduction to Azure SQL DB
Christopher Foot
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Cathrine Wilhelmsen
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Microsoft Azure Technical Overview
Microsoft Azure Technical OverviewMicrosoft Azure Technical Overview
Microsoft Azure Technical Overview
gjuljo
 
Next Generation Data Integration with Azure Data Factory
Next Generation Data Integration with Azure Data FactoryNext Generation Data Integration with Azure Data Factory
Next Generation Data Integration with Azure Data Factory
Tom Kerkhove
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
James Serra
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
BizTalk360
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
James Serra
 
Microsoft Fabric Intro D Koutsanastasis
Microsoft Fabric Intro D KoutsanastasisMicrosoft Fabric Intro D Koutsanastasis
Microsoft Fabric Intro D Koutsanastasis
Uni Systems S.M.S.A.
 

Similar to Running cost effective big data workloads with Azure Synapse and ADLS (MS Ignite 2020, OD220) (20)

Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Michael Rys
 
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Michael Rys
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Michael Rys
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
Trivadis
 
Modern Analytics Academy - Data Modeling (1).pptx
Modern Analytics Academy - Data Modeling (1).pptxModern Analytics Academy - Data Modeling (1).pptx
Modern Analytics Academy - Data Modeling (1).pptx
ssuser290967
 
Synapse for mere mortals
Synapse for mere mortalsSynapse for mere mortals
Synapse for mere mortals
Michael Stephenson
 
Azure Data Platform Overview.pdf
Azure Data Platform Overview.pdfAzure Data Platform Overview.pdf
Azure Data Platform Overview.pdf
Dustin Vannoy
 
Lake Database Database Template Map Data in Azure Synapse Analytics
Lake Database  Database Template  Map Data in Azure Synapse AnalyticsLake Database  Database Template  Map Data in Azure Synapse Analytics
Lake Database Database Template Map Data in Azure Synapse Analytics
Erwin de Kreuk
 
Day 1 - Technical Bootcamp azure synapse analytics
Day 1 - Technical Bootcamp azure synapse analyticsDay 1 - Technical Bootcamp azure synapse analytics
Day 1 - Technical Bootcamp azure synapse analytics
Armand272
 
Analytics in the Cloud
Analytics in the CloudAnalytics in the Cloud
Analytics in the Cloud
Ross McNeely
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
Data Con LA
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Rukmani Gopalan
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Databricks
 
introduction to azure synapse analytics.
introduction to azure synapse analytics.introduction to azure synapse analytics.
introduction to azure synapse analytics.
GravenGuan
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
FedoRam1
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
Martin Bém
 
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdfHow Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
Addend Analytics
 
Azure Databricks - An Introduction 2019 Roadshow.pptx
Azure Databricks - An Introduction 2019 Roadshow.pptxAzure Databricks - An Introduction 2019 Roadshow.pptx
Azure Databricks - An Introduction 2019 Roadshow.pptx
pascalsegoul
 
Azure satpn19 time series analytics with azure adx
Azure satpn19   time series analytics with azure adxAzure satpn19   time series analytics with azure adx
Azure satpn19 time series analytics with azure adx
Riccardo Zamana
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data Platform
Shu-Jeng Hsieh
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Michael Rys
 
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Michael Rys
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Michael Rys
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
Trivadis
 
Modern Analytics Academy - Data Modeling (1).pptx
Modern Analytics Academy - Data Modeling (1).pptxModern Analytics Academy - Data Modeling (1).pptx
Modern Analytics Academy - Data Modeling (1).pptx
ssuser290967
 
Azure Data Platform Overview.pdf
Azure Data Platform Overview.pdfAzure Data Platform Overview.pdf
Azure Data Platform Overview.pdf
Dustin Vannoy
 
Lake Database Database Template Map Data in Azure Synapse Analytics
Lake Database  Database Template  Map Data in Azure Synapse AnalyticsLake Database  Database Template  Map Data in Azure Synapse Analytics
Lake Database Database Template Map Data in Azure Synapse Analytics
Erwin de Kreuk
 
Day 1 - Technical Bootcamp azure synapse analytics
Day 1 - Technical Bootcamp azure synapse analyticsDay 1 - Technical Bootcamp azure synapse analytics
Day 1 - Technical Bootcamp azure synapse analytics
Armand272
 
Analytics in the Cloud
Analytics in the CloudAnalytics in the Cloud
Analytics in the Cloud
Ross McNeely
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
Data Con LA
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Rukmani Gopalan
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Databricks
 
introduction to azure synapse analytics.
introduction to azure synapse analytics.introduction to azure synapse analytics.
introduction to azure synapse analytics.
GravenGuan
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
FedoRam1
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
Martin Bém
 
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdfHow Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
Addend Analytics
 
Azure Databricks - An Introduction 2019 Roadshow.pptx
Azure Databricks - An Introduction 2019 Roadshow.pptxAzure Databricks - An Introduction 2019 Roadshow.pptx
Azure Databricks - An Introduction 2019 Roadshow.pptx
pascalsegoul
 
Azure satpn19 time series analytics with azure adx
Azure satpn19   time series analytics with azure adxAzure satpn19   time series analytics with azure adx
Azure satpn19 time series analytics with azure adx
Riccardo Zamana
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data Platform
Shu-Jeng Hsieh
 
Ad

More from Michael Rys (20)

Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
Michael Rys
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Michael Rys
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Michael Rys
 
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Michael Rys
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Michael Rys
 
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Michael Rys
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Michael Rys
 
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
Michael Rys
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Michael Rys
 
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
Michael Rys
 
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
Michael Rys
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
Michael Rys
 
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Michael Rys
 
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLTaming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Michael Rys
 
Killer Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLKiller Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQL
Michael Rys
 
ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)
Michael Rys
 
U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)
Michael Rys
 
U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)
Michael Rys
 
U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)
Michael Rys
 
U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)
Michael Rys
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
Michael Rys
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Michael Rys
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Michael Rys
 
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Michael Rys
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Michael Rys
 
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Michael Rys
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Michael Rys
 
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...

Running cost effective big data workloads with Azure Synapse and ADLS (MS Ignite 2020, OD220)

  • 1. Running cost effective big data workloads with Azure Synapse and Azure Data Lake Storage James Baker, @BigJamesBigData Michael Rys, @MikeDoesBigData
  • 2. Agenda 1. Modernize your big data workloads 2. Cost improvements with Synapse and ADLS 3. Demo 4. .NET for Apache Spark
  • 3. Traditional on-prem analytics pipeline Operational database Business/custom apps Operational database Operational database Enterprise data warehouse Data mart Data mart Data mart ETL ETL ETL ETL ETL ETL ETL Reporting Analytics Data mining
  • 4. Modern data warehouse Logs (structured) Media (unstructured) Files (unstructured) Business/custom apps (structured) Ingest Prep & train Model & serve Store Azure Data Lake Storage Azure SQL Data Warehouse Azure Databricks Azure Data Factory Power BI
  • 5. Modern data warehouse with Azure Synapse Logs (structured) Media (unstructured) Files (unstructured) Business/custom apps (structured) Azure Synapse Analytics Power BI Store Azure Data Lake Storage
  • 6. Modern data warehouse with Azure Synapse Logs (structured) Media (unstructured) Files (unstructured) Business/custom apps (structured) Analytics runtimes SQL Common data estate Shared meta data Unified experience Synapse Studio Store Azure Data Lake Storage Power BI
  • 7. Synapse component benefits. Apache Spark: .NET support, reducing training costs for big data .NET developers; Hyperspace materialized views, requiring smaller, cheaper clusters to achieve the same tasks. Synapse SQL: existing tooling and skills, with no requirement for retraining or new tooling to work with familiar T-SQL environments. Provisioned model: workload groups maximize resource utilization; materialized views and result set cache allow smaller, cheaper clusters. Serverless model: paying exactly for what you use, with no overprovisioning; no clusters to monitor and manage, so lower maintenance costs.
  • 8. Cost optimization with Azure Data Lake Storage Disaggregated compute and storage with shared metadata layer Lifecycle management for optimizing TCO Lower compute resources because of high performance
  • 9. Azure Synapse and ADLS: Integration benefits Integrated workload monitoring Shared security Query Acceleration Shared metadata Single management portal Data stewards apply security policies only once Data owners define data model once only Smaller, cheaper clusters Reduced monitoring and diagnosis costs Lowers training costs
  • 10. Synapse SQL - Serverless Shared Metadata Experience. Spark databases and tables backed by Parquet become automatically available in Synapse SQL serverless, Synapse SQL provisioned, and Synapse Pipelines as external tables of the same name. Auto-exposed metadata objects (from the diagram): CREATE DATABASE DBS1; CREATE SCHEMA $DBS1; CREATE DATABASE DBS1; CREATE TABLE DBS1.T1; CREATE EXTERNAL TABLE DBS1.dbo.T1; CREATE EXTERNAL TABLE $DBS1.T1. Benefits: no need to run orchestration jobs to move data or metadata between computes; no duplication of data at the storage level. Example queries: SELECT * FROM DBS1.dbo.T1; SELECT * FROM DBS1.T1; SELECT * FROM $DBS1.T1. Diagram components: Spark compute, Synapse Hive Metastore, serverless SQL system catalog, provisioned Synapse SQL DB, Synapse SQL provisioned instance.
  • 11. Synapse integration with ADLS Query Acceleration. Reduces total cost of ownership because analytics frameworks don't need to parse and load as much data. Delivers performance improvements due to less data transferred over the network. Optimizes access to structured data by filtering data directly in the storage service. Analytics queries typically require only ~20% of total data read. Deeply integrated into Azure Synapse Analytics for improved performance and cost. (Diagram: Azure Data Lake Storage, Query Acceleration, Azure Synapse Analytics.)
  • 12. Demo: .NET for Spark and shared metadata experience in Azure Synapse Michael Rys, @MikeDoesBigData
  • 13. Demo: .NET for Spark and shared metadata experience in Azure Synapse Analysis with interactive .NET for Spark Notebook Data prep with Spark Scala Twitter CSV files Seamless analysis with Synapse SQL What has Michael been up to? Mentions Topics Who was interacting with Michael? Michael @MikeDoesBigData Using Query Acceleration Synapse Shared Meta Data
  • 14. ANNOUNCING: .NET for Apache Spark v1.0 is released! First-class C# and F# bindings to Apache Spark, bringing the power of big data analytics to .NET developers. Apache Spark 2.4/3.0: Data Frames, Structured Streaming, Delta Lake. .NET Standard 2.0: C# and F#. ML.NET. Performance optimized with Apache Arrow and HW vectorization. First-class integration in Azure Synapse: batch submission, interactive .NET notebooks. Learn more at http://dot.net/Spark
  • 15. Call to action. Check out sessions: OD217 - Real-time analytics and BI using Azure Synapse Link for Azure Cosmos DB by Euan Garden; DB111 - Building real-time enterprise analytics solutions with Azure Synapse Analytics by Saveen Reddy. Introduction to Azure Data Lake Storage: https://aka.ms/adls. Get started with Azure Synapse: https://aka.ms/azuresynapse. Learn more about .NET for Apache Spark: https://dot.net/Spark. Leverage Informatica + Microsoft DW migration offer: https://aka.ms/SynapseInformaticaPOV. Connect with us on Twitter: tweet us @MikeDoesBigData and @BigJamesBigData, tag with #OD220.
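
The naming pattern on the shared metadata slide (slide 10) can be sketched in a few lines of Python. This is only an illustration of the names shown on the slide, not a Synapse API: the function names `exposed_names` and `select_all` are hypothetical, and assigning the three-part name `DBS1.dbo.T1` to serverless and the `$DBS1` schema to provisioned is an interpretation of the diagram rather than something the slide text states outright.

```python
def exposed_names(spark_db: str, spark_table: str) -> dict:
    """Map a Parquet-backed Spark table to the names shown on slide 10.

    Assumption (read from the diagram, not stated in the text):
    - serverless SQL exposes the Spark database as a database with a dbo schema
    - provisioned SQL exposes the Spark database as a schema prefixed with '$'
    """
    return {
        "spark": f"{spark_db}.{spark_table}",
        "sql_serverless": f"{spark_db}.dbo.{spark_table}",
        "sql_provisioned": f"${spark_db}.{spark_table}",
    }


def select_all(engine: str, spark_db: str, spark_table: str) -> str:
    """Build the SELECT statement for one engine, mirroring the slide's queries."""
    return f"SELECT * FROM {exposed_names(spark_db, spark_table)[engine]}"


if __name__ == "__main__":
    # Print the query each engine would use for the slide's DBS1.T1 example.
    for engine in ("spark", "sql_serverless", "sql_provisioned"):
        print(engine, "->", select_all(engine, "DBS1", "T1"))
```

The point of the sketch is that the table is defined once in Spark and every engine reaches the same Parquet files under a predictable name, so no orchestration job has to copy data or restate the table definition.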

Editor's Notes

  • #4: Establish the baseline. The need to provision for max utilization/max consumption. Architectural brittleness in moving data across physical stores.
  • #5: Pay-for-consumption model. Compute elasticity. Data evolves 'in place' within a ubiquitous storage service.
  • #6: Encapsulates the MDW pattern within the Synapse service. Retains the benefits of pay for consumption and a ubiquitous store.
  • #7: Unified experience leveraging a heterogeneous set of tools/frameworks. A shared metadata service means that table definitions do not need to be restated as pipeline flows.
  • #8: Spark: .NET for Spark is included by default. SQL Serverless: pay for use only, which means no under-utilized clusters running; no clusters means reduced maintenance costs; cost control features (caps on usage) avoid cost blowout. SQL Provisioned: workload groups provide query isolation with maximum utilization; workload: prioritize queries; materialized views, indexing, and result set cache are critically important for minimizing IO: less data read, less data cached, and less data processed mean smaller, cheaper clusters.
  • #10: Talk about the benefits of the tight integration between Synapse & ADLS: reduced retraining costs due to use of a familiar T-SQL environment and a single pane of glass; only need to apply metadata once (metastore + access control); accelerated IO integration with QA, where improved performance means improved cost.
  • #12: Analytics workloads require you to work with huge amounts of data, but the data actually used is typically only 20% of the total, so you end up processing more data than you should. With Query Acceleration for ADLS, the filtering is done in the storage layer itself, which helps save cost and improve performance. Query Acceleration has been used by a lot of customers and is now generally available in all regions.
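
The effect described in the Query Acceleration note can be illustrated with a toy comparison in Python. This is a sketch under stated assumptions, not the ADLS Query Acceleration API: the sample rows, the `is_michael` predicate, and both function names are hypothetical, and "rows transferred" stands in for bytes moved from storage to the analytics engine. The sample is built so that 2 of 10 rows match, mirroring the ~20% figure from the slide.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]

# Hypothetical sample data: 10 rows, of which 2 (20%) match the predicate.
ROWS: List[Row] = [
    {"id": i, "author": "MikeDoesBigData" if i % 5 == 0 else "other"}
    for i in range(10)
]


def is_michael(row: Row) -> bool:
    """Hypothetical filter predicate used by the analytics query."""
    return row["author"] == "MikeDoesBigData"


def read_all_then_filter(
    rows: List[Row], predicate: Callable[[Row], bool]
) -> Tuple[List[Row], int]:
    """Without pushdown: every row leaves storage, the engine filters afterwards."""
    transferred = list(rows)
    return [r for r in transferred if predicate(r)], len(transferred)


def filter_in_storage(
    rows: List[Row], predicate: Callable[[Row], bool]
) -> Tuple[List[Row], int]:
    """With pushdown: the storage layer filters, only matching rows are transferred."""
    transferred = [r for r in rows if predicate(r)]
    return transferred, len(transferred)


if __name__ == "__main__":
    # Compare how many rows cross the network in each approach.
    _, moved_without = read_all_then_filter(ROWS, is_michael)
    _, moved_with = filter_in_storage(ROWS, is_michael)
    print(f"rows transferred without pushdown: {moved_without}")
    print(f"rows transferred with pushdown: {moved_with}")
```

Both functions return the same matching rows; only the amount of data that has to be moved and parsed by the compute side differs, which is where the note's cost and performance savings come from.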