Michael Rys
Principal Program Manager, Big Data @ Microsoft
@MikeDoesBigData, {mrys, usql}@microsoft.com
U-SQL Federated Distributed Queries
U-SQL Federated Distributed Queries (SQLBits 2016)
Query data where it lives
Easily query data in multiple Azure data stores without moving it to a single store
Benefits
• Avoid moving large amounts of data across the network between stores
• Single view of data irrespective of physical location
• Minimize data proliferation issues caused by maintaining multiple copies
• Single query language for all data
• Each data store maintains its own sovereignty
• Design choices based on the need
• Push SQL expressions to remote SQL sources
  • Filters
  • Joins
[Diagram: U-SQL queries from Azure Data Lake Analytics over Azure Data Lake Storage, Azure Storage Blobs, Azure SQL DB, Azure SQL Data Warehouse, and Azure SQL in VMs]
Federated queries
• Minimize data proliferation through data consolidation
• Same U-SQL over all Azure data (WASB, SQL Azure)
• Efficient and reliable execution strategies
• Striving to maintain semantic equivalence
• Design choices based on requirements:
  • Schema-less design
    • fast time-to-query and exploratory analysis
  • Schematized design
    • protect applications from data source changes
• Advanced federated query capabilities:
  • Built-in decisions to optimize for performance
    • push downs of joins, predicates, projection
  • Control when and what to push down
    • Prevent data source overload
    • Provide control over semantics
Data sources and external tables
• Secure credential management
• Data sources to manage connections and remoting of queries
• Schematized design: external tables to provide early bound tables for federated queries
Create secret in PowerShell
New-AzureRMDataLakeAnalyticsCatalogSecret
Create credential
CREATE CREDENTIAL Secret
WITH USER_NAME = "user@server", IDENTITY = "Secret";
Create external data source on
• Azure SQL DB
• Azure SQL DW
• SQL Server in Azure VM
CREATE DATA SOURCE SQL_PATIENTS FROM SQLSERVER WITH
( PROVIDER_STRING =
"Database=DB;Trusted_Connection=False;Encrypt=False"
, CREDENTIAL = Secret
, REMOTABLE_TYPES = (bool, byte, short, string, DateTime)
);
External tables (optional)
CREATE EXTERNAL TABLE sql_patients (
[custkey] int,
[name] string,
[address] string
) FROM SQL_PATIENTS LOCATION "dbo.patients";
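The slide lists three kinds of remote source but shows only the SQLSERVER form. As a hedged sketch, the Azure SQL DB variant might look like the following. The names SQLDB_PATIENTS and PatientsDb are hypothetical, the credential Secret is the one from the slide, and adding int to REMOTABLE_TYPES is an assumption made here because the external table's custkey column is an int.

```usql
// Sketch only: SQLDB_PATIENTS and PatientsDb are hypothetical names.
// "Secret" is the credential created on the slide.
CREATE DATA SOURCE IF NOT EXISTS SQLDB_PATIENTS
FROM AZURESQLDB
WITH
(
    PROVIDER_STRING = "Database=PatientsDb;Trusted_Connection=False;Encrypt=True",
    CREDENTIAL = Secret,
    // int is added here (assumption) so that expressions on custkey
    // become candidates for remoting.
    REMOTABLE_TYPES = (bool, byte, short, int, string, DateTime)
);
```

An Azure SQL Data Warehouse source would use FROM AZURESQLDW in the same position.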
Federated queries
• Queries have to be in a different script from the one that creates the data source
• Pass-through queries to execute statements in the remote language
• Schema-less design: query data source location
• Schematized design: query external tables
• Semantics of federated queries close to U-SQL and C#
Pass-Through Query
@alive_patients =
SELECT *
FROM EXTERNAL SQL_PATIENTS EXECUTE @"
SELECT name
, CASE WHEN is_alive = 1
THEN 'Alive' ELSE 'Deceased' END AS status
, address, nationkey, phone
FROM dbo.patients";
Query Data Source Location
@patients = SELECT *
FROM EXTERNAL master.SQL_PATIENTS LOCATION "dbo.patients";
Query External Tables
@patients = SELECT * FROM EXTERNAL master.dbo.sql_patients;
Execution
• U-SQL Semantics
• Pushes predicates and even joins based on remotable types
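To make the role of remotable types concrete, here is a minimal sketch using the SQL_PATIENTS data source and the dbo.patients location from the slides. The filter values and output paths are hypothetical, the columns are assumed to match the external table definition shown earlier, and the comments about where each predicate runs reflect a reading of the slide and notes rather than a guarantee about the optimizer.

```usql
// Sketch only. SQL_PATIENTS and dbo.patients come from the slides;
// the filter values and output paths are hypothetical.
@patients =
    SELECT *
    FROM EXTERNAL master.SQL_PATIENTS LOCATION "dbo.patients";

// string is in the slide's REMOTABLE_TYPES list, so this predicate
// is a candidate for being evaluated at the remote SQL source.
@by_name =
    SELECT name, address
    FROM @patients
    WHERE name == "Smith";

// int is not in that list, so this predicate is presumably
// evaluated by U-SQL after the rows have been fetched.
@by_key =
    SELECT name, address
    FROM @patients
    WHERE custkey == 42;

OUTPUT @by_name TO "/output/patients_by_name.csv" USING Outputters.Csv();
OUTPUT @by_key TO "/output/patients_by_key.csv" USING Outputters.Csv();
```

Comparisons use C# syntax (==), in line with the slide's point that federated query semantics stay close to U-SQL and C#.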
http://aka.ms/AzureDataLake


Editor's Notes

  • #4:
    DATA SOURCE: Represents a remote data source such as Azure SQL Database. You have to specify all the details (connection string, credentials, etc.) required to connect to it and issue queries.
    EXTERNAL TABLE: A local table, with columns defined in C# types, that redirects queries issued against it to the remote table it is based on. U-SQL does the type conversion automatically. External tables let you impose a specific schema on the remote data, shielding you from remote schema changes. You can issue queries that join external and local tables.
    PASS-THROUGH queries: These queries are issued directly against the remote data source in the syntax of the remote data source (for example, T-SQL for Azure SQL Database).
    REMOTABLE_TYPES: For every external data source you have to specify the list of remotable types. This list constrains the types of queries that will be remoted. Example: REMOTABLE_TYPES = (bool, byte, short, ushort, int, decimal);
    LAZY METADATA LOADING: Here the remote data is schematized only when the query is actually issued to the remote data source. Your program must be able to deal with remote schema changes.
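
The note that remote and local data can be joined can be sketched as follows. This is an illustration under assumptions: the local side is a rowset extracted from a hypothetical CSV file (/data/visits.csv with columns custkey and visit_date) rather than a catalog table, and the remote columns custkey and name are taken from the external table definition on the slides.

```usql
// Remote rows, read through the data source location as on the slides.
@patients =
    SELECT *
    FROM EXTERNAL master.SQL_PATIENTS LOCATION "dbo.patients";

// Local data: hypothetical CSV file of visits in the Data Lake store.
@visits =
    EXTRACT custkey int,
            visit_date DateTime
    FROM "/data/visits.csv"
    USING Extractors.Csv();

// Join the remote and local rowsets on the shared key.
@joined =
    SELECT p.name, v.visit_date
    FROM @patients AS p
         INNER JOIN @visits AS v
         ON p.custkey == v.custkey;

OUTPUT @joined
TO "/output/patient_visits.csv"
USING Outputters.Csv();
```

Because the data source is created in a separate script, this script contains only the query and assumes SQL_PATIENTS already exists in the master database.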