SlideShare a Scribd company logo
SQLBits 2016
Azure Data Lake &
U-SQL
Michael Rys, @MikeDoesBigData
http://www.azure.com/datalake
{mrys, usql}@microsoft.com
Azure Data Lake
U-SQL
U-SQL Intro (SQLBits 2016)
•
•
•
 hard to work with anything other than
structured data
 difficult to extend with custom code
 User often has to
care about scale and performance
 SQL is 2nd class within string
 Often no code reuse/
sharing across queries
Get benefits of both!
Makes it easy for you by unifying:
• Unstructured and structured data processing
• Declarative SQL and custom imperative Code
• Local and remote Queries
• Increase productivity and agility from Day 1 and
at Day 100 for YOU!
U-SQL Intro (SQLBits 2016)
Extend U-SQL with C#/.NET
Built-in operators,
function, aggregates
C# expressions (in SELECT expressions)
User-defined aggregates (UDAGGs)
User-defined functions (UDFs)
User-defined operators (UDOs)
U-SQL Intro (SQLBits 2016)
DECLARE Constants/U-SQL Types
DECLARE @text1 string = "Hello World";
DECLARE @text2 string = @"Hello World";
DECLARE @text3 char = 'a';
DECLARE @text4 string = "BEGIN" + @text1 + "END";
DECLARE @text5 string = string.Format("BEGIN{0}END", @text1);
DECLARE @numeric1 sbyte = 0;
DECLARE @numeric2 short = 1;
DECLARE @numeric3 int = 2;
DECLARE @numeric4 long = 3L;
DECLARE @numeric5 float = 4.0f;
DECLARE @numeric6 double = 5.0;
DECLARE @d1 DateTime = System.DateTime.Parse("1979/03/31");
DECLARE @d2 DateTime = DateTime.Now;
DECLARE @misc1 bool = true;
DECLARE @misc2 Guid = System.Guid.Parse("BEF7A4E8-F583-4804-9711-7E608215EBA6");
DECLARE @misc4 byte [] = new byte[] { 0, 1, 2, 3, 4};
DECLARE @map SqlMap<string,string> =
new SqlMap<string,string> {{"key","value"}, {"key","value"});
DECLARE @arr SqlArray<string> = new SqlArray<string> {"val1", "val2"});
DECLARE @nullableint int? = null;
text
numeric
Date/time
Other
U-SQL Language Philosophy
Declarative Query and Transformation Language:
• Uses SQL’s SELECT FROM WHERE with GROUP
BY/Aggregation, Joins, SQL Analytics functions
• Optimizable, Scalable
Expression-flow programming style:
• Easy to use functional lambda composition
• Composable, globally optimizable
Operates on Unstructured & Structured Data
• Schema on read over files
• Relational metadata objects (e.g. database, table)
Extensible from ground up:
• Type system is based on C#
• Expression language IS C#
• User-defined functions (U-SQL and C#)
• User-defined Aggregators (C#)
• User-defined Operators (UDO) (C#)
U-SQL provides the Parallelization and Scale-out
Framework for Usercode
• EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER,
COMBINER, APPLIER
Federated query across distributed data sources
REFERENCE MyDB.MyAssembly;
CREATE TABLE T( cid int, first_order DateTime
, last_order DateTime, order_count int
, order_amount float );
@o = EXTRACT oid int, cid int, odate DateTime, amount float
FROM "/input/orders.txt"
USING Extractors.Csv();
@c = EXTRACT cid int, name string, city string
FROM "/input/customers.txt"
USING Extractors.Csv();
@j = SELECT c.cid, MIN(o.odate) AS firstorder
, MAX(o.date) AS lastorder, COUNT(o.oid) AS ordercnt
, AGG<MyAgg.MySum>(c.amount) AS totalamount
FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid
WHERE c.city.StartsWith("New")
&& MyNamespace.MyFunction(o.odate) > 10
GROUP BY c.cid;
OUTPUT @j TO "/output/result.txt"
USING new MyData.Write();
INSERT INTO T SELECT * FROM @j;
2
 Automatic "in-lining"
 optimized out-of-
the-box
 Per job
parallelization
 visibility into execution
 Heatmap to identify
bottlenecks

More Related Content

What's hot (20)

PPTX
Killer Scenarios with Data Lake in Azure with U-SQL
Michael Rys
 
PPTX
Using C# with U-SQL (SQLBits 2016)
Michael Rys
 
PPTX
U-SQL Learning Resources (SQLBits 2016)
Michael Rys
 
PPTX
Microsoft's Hadoop Story
Michael Rys
 
PPTX
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
Michael Rys
 
PPTX
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
Michael Rys
 
PPTX
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Michael Rys
 
PPTX
U-SQL - Azure Data Lake Analytics for Developers
Michael Rys
 
PPTX
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Michael Rys
 
PPTX
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Michael Rys
 
PPTX
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Michael Rys
 
PPTX
U-SQL Partitioned Data and Tables (SQLBits 2016)
Michael Rys
 
PPTX
U-SQL Does SQL (SQLBits 2016)
Michael Rys
 
PPTX
U-SQL Query Execution and Performance Tuning
Michael Rys
 
PPTX
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Michael Rys
 
PPTX
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Michael Rys
 
PPT
Sqlite
Kumar
 
PDF
DMann-SQLDeveloper4Reporting
David Mann
 
PPTX
Rdbms
Parthiv Prem
 
PDF
Introduction to SQLite: The Most Popular Database in the World
jkreibich
 
Killer Scenarios with Data Lake in Azure with U-SQL
Michael Rys
 
Using C# with U-SQL (SQLBits 2016)
Michael Rys
 
U-SQL Learning Resources (SQLBits 2016)
Michael Rys
 
Microsoft's Hadoop Story
Michael Rys
 
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
Michael Rys
 
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
Michael Rys
 
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Michael Rys
 
U-SQL - Azure Data Lake Analytics for Developers
Michael Rys
 
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Michael Rys
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Michael Rys
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Michael Rys
 
U-SQL Partitioned Data and Tables (SQLBits 2016)
Michael Rys
 
U-SQL Does SQL (SQLBits 2016)
Michael Rys
 
U-SQL Query Execution and Performance Tuning
Michael Rys
 
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Michael Rys
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Michael Rys
 
Sqlite
Kumar
 
DMann-SQLDeveloper4Reporting
David Mann
 
Introduction to SQLite: The Most Popular Database in the World
jkreibich
 

Similar to U-SQL Intro (SQLBits 2016) (20)

PPTX
3 CityNetConf - sql+c#=u-sql
Łukasz Grala
 
PPTX
Azure Data Lake and U-SQL
Michael Rys
 
PPTX
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Jason L Brugger
 
PDF
USQ Landdemos Azure Data Lake
Trivadis
 
PPTX
Azure data lake sql konf 2016
Kenneth Michael Nielsen
 
PDF
Introduction to Azure Data Lake
Antonios Chatzipavlis
 
PPTX
Azure Data Lake and Azure Data Lake Analytics
Waqas Idrees
 
PDF
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
MS Cloud Summit
 
PPTX
Azure Data Lake Intro (SQLBits 2016)
Michael Rys
 
PPTX
Tokyo azure meetup #2 big data made easy
Tokyo Azure Meetup
 
PDF
USQL Trivadis Azure Data Lake Event
Trivadis
 
PDF
Talavant Data Lake Analytics
Sean Forgatch
 
PPTX
Dive Into Azure Data Lake - PASS 2017
Ike Ellis
 
PPTX
NDC Minnesota - Analyzing StackExchange data with Azure Data Lake
Tom Kerkhove
 
PPT
SQL Server 2008 Overview
Eric Nelson
 
PPT
What's New for Developers in SQL Server 2008?
ukdpe
 
PPTX
SQL Server Data Tools
Marco Parenzan
 
PPTX
SQL Server Workshop for Developers - Visual Studio Live! NY 2012
Andrew Brust
 
PPT
ordbms.ppt
HODCA1
 
PPTX
An intro to Azure Data Lake
Rick van den Bosch
 
3 CityNetConf - sql+c#=u-sql
Łukasz Grala
 
Azure Data Lake and U-SQL
Michael Rys
 
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Jason L Brugger
 
USQ Landdemos Azure Data Lake
Trivadis
 
Azure data lake sql konf 2016
Kenneth Michael Nielsen
 
Introduction to Azure Data Lake
Antonios Chatzipavlis
 
Azure Data Lake and Azure Data Lake Analytics
Waqas Idrees
 
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
MS Cloud Summit
 
Azure Data Lake Intro (SQLBits 2016)
Michael Rys
 
Tokyo azure meetup #2 big data made easy
Tokyo Azure Meetup
 
USQL Trivadis Azure Data Lake Event
Trivadis
 
Talavant Data Lake Analytics
Sean Forgatch
 
Dive Into Azure Data Lake - PASS 2017
Ike Ellis
 
NDC Minnesota - Analyzing StackExchange data with Azure Data Lake
Tom Kerkhove
 
SQL Server 2008 Overview
Eric Nelson
 
What's New for Developers in SQL Server 2008?
ukdpe
 
SQL Server Data Tools
Marco Parenzan
 
SQL Server Workshop for Developers - Visual Studio Live! NY 2012
Andrew Brust
 
ordbms.ppt
HODCA1
 
An intro to Azure Data Lake
Rick van den Bosch
 
Ad

More from Michael Rys (8)

PPTX
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Michael Rys
 
PPTX
Big Data Processing with .NET and Spark (SQLBits 2020)
Michael Rys
 
PPTX
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Michael Rys
 
PPTX
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Michael Rys
 
PPTX
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Michael Rys
 
PPTX
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Michael Rys
 
PPTX
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Michael Rys
 
PPTX
U-SQL Query Execution and Performance Basics (SQLBits 2016)
Michael Rys
 
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Michael Rys
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Michael Rys
 
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Michael Rys
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Michael Rys
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Michael Rys
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Michael Rys
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Michael Rys
 
U-SQL Query Execution and Performance Basics (SQLBits 2016)
Michael Rys
 
Ad

Recently uploaded (20)

PPTX
apidays Singapore 2025 - Designing for Change, Julie Schiller (Google)
apidays
 
PPTX
ER_Model_with_Diagrams_Presentation.pptx
dharaadhvaryu1992
 
PDF
Product Management in HealthTech (Case Studies from SnappDoctor)
Hamed Shams
 
PDF
OPPOTUS - Malaysias on Malaysia 1Q2025.pdf
Oppotus
 
PDF
1750162332_Snapshot-of-Indias-oil-Gas-data-May-2025.pdf
sandeep718278
 
PPTX
apidays Singapore 2025 - The Quest for the Greenest LLM , Jean Philippe Ehre...
apidays
 
PPTX
apidays Munich 2025 - Building Telco-Aware Apps with Open Gateway APIs, Subhr...
apidays
 
PDF
Data Retrieval and Preparation Business Analytics.pdf
kayserrakib80
 
PDF
Avatar for apidays apidays PRO June 07, 2025 0 5 apidays Helsinki & North 2...
apidays
 
PDF
Driving Employee Engagement in a Hybrid World.pdf
Mia scott
 
PPT
Growth of Public Expendituuure_55423.ppt
NavyaDeora
 
PDF
What does good look like - CRAP Brighton 8 July 2025
Jan Kierzyk
 
PPTX
apidays Helsinki & North 2025 - APIs at Scale: Designing for Alignment, Trust...
apidays
 
PDF
apidays Singapore 2025 - Trustworthy Generative AI: The Role of Observability...
apidays
 
PDF
OOPs with Java_unit2.pdf. sarthak bookkk
Sarthak964187
 
PDF
apidays Helsinki & North 2025 - Monetizing AI APIs: The New API Economy, Alla...
apidays
 
PDF
apidays Singapore 2025 - From API Intelligence to API Governance by Harsha Ch...
apidays
 
PDF
A GraphRAG approach for Energy Efficiency Q&A
Marco Brambilla
 
PDF
Simplifying Document Processing with Docling for AI Applications.pdf
Tamanna36
 
PDF
NIS2 Compliance for MSPs: Roadmap, Benefits & Cybersecurity Trends (2025 Guide)
GRC Kompas
 
apidays Singapore 2025 - Designing for Change, Julie Schiller (Google)
apidays
 
ER_Model_with_Diagrams_Presentation.pptx
dharaadhvaryu1992
 
Product Management in HealthTech (Case Studies from SnappDoctor)
Hamed Shams
 
OPPOTUS - Malaysias on Malaysia 1Q2025.pdf
Oppotus
 
1750162332_Snapshot-of-Indias-oil-Gas-data-May-2025.pdf
sandeep718278
 
apidays Singapore 2025 - The Quest for the Greenest LLM , Jean Philippe Ehre...
apidays
 
apidays Munich 2025 - Building Telco-Aware Apps with Open Gateway APIs, Subhr...
apidays
 
Data Retrieval and Preparation Business Analytics.pdf
kayserrakib80
 
Avatar for apidays apidays PRO June 07, 2025 0 5 apidays Helsinki & North 2...
apidays
 
Driving Employee Engagement in a Hybrid World.pdf
Mia scott
 
Growth of Public Expendituuure_55423.ppt
NavyaDeora
 
What does good look like - CRAP Brighton 8 July 2025
Jan Kierzyk
 
apidays Helsinki & North 2025 - APIs at Scale: Designing for Alignment, Trust...
apidays
 
apidays Singapore 2025 - Trustworthy Generative AI: The Role of Observability...
apidays
 
OOPs with Java_unit2.pdf. sarthak bookkk
Sarthak964187
 
apidays Helsinki & North 2025 - Monetizing AI APIs: The New API Economy, Alla...
apidays
 
apidays Singapore 2025 - From API Intelligence to API Governance by Harsha Ch...
apidays
 
A GraphRAG approach for Energy Efficiency Q&A
Marco Brambilla
 
Simplifying Document Processing with Docling for AI Applications.pdf
Tamanna36
 
NIS2 Compliance for MSPs: Roadmap, Benefits & Cybersecurity Trends (2025 Guide)
GRC Kompas
 

U-SQL Intro (SQLBits 2016)

  • 1. SQLBits 2016 Azure Data Lake & U-SQL Michael Rys, @MikeDoesBigData http://www.azure.com/datalake {mrys, usql}@microsoft.com
  • 5.  hard to work with anything other than structured data  difficult to extend with custom code
  • 6.  User often has to care about scale and performance  SQL is 2nd class within string  Often no code reuse/ sharing across queries
  • 7. Get benefits of both! Makes it easy for you by unifying: • Unstructured and structured data processing • Declarative SQL and custom imperative Code • Local and remote Queries • Increase productivity and agility from Day 1 and at Day 100 for YOU!
  • 9. Extend U-SQL with C#/.NET Built-in operators, function, aggregates C# expressions (in SELECT expressions) User-defined aggregates (UDAGGs) User-defined functions (UDFs) User-defined operators (UDOs)
  • 11. DECLARE Constants/U-SQL Types DECLARE @text1 string = "Hello World"; DECLARE @text2 string = @"Hello World"; DECLARE @text3 char = 'a'; DECLARE @text4 string = "BEGIN" + @text1 + "END"; DECLARE @text5 string = string.Format("BEGIN{0}END", @text1); DECLARE @numeric1 sbyte = 0; DECLARE @numeric2 short = 1; DECLARE @numeric3 int = 2; DECLARE @numeric4 long = 3L; DECLARE @numeric5 float = 4.0f; DECLARE @numeric6 double = 5.0; DECLARE @d1 DateTime = System.DateTime.Parse("1979/03/31"); DECLARE @d2 DateTime = DateTime.Now; DECLARE @misc1 bool = true; DECLARE @misc2 Guid = System.Guid.Parse("BEF7A4E8-F583-4804-9711-7E608215EBA6"); DECLARE @misc4 byte [] = new byte[] { 0, 1, 2, 3, 4}; DECLARE @map SqlMap<string,string> = new SqlMap<string,string> {{"key","value"}, {"key","value"}); DECLARE @arr SqlArray<string> = new SqlArray<string> {"val1", "val2"}); DECLARE @nullableint int? = null; text numeric Date/time Other
  • 12. U-SQL Language Philosophy Declarative Query and Transformation Language: • Uses SQL’s SELECT FROM WHERE with GROUP BY/Aggregation, Joins, SQL Analytics functions • Optimizable, Scalable Expression-flow programming style: • Easy to use functional lambda composition • Composable, globally optimizable Operates on Unstructured & Structured Data • Schema on read over files • Relational metadata objects (e.g. database, table) Extensible from ground up: • Type system is based on C# • Expression language IS C# • User-defined functions (U-SQL and C#) • User-defined Aggregators (C#) • User-defined Operators (UDO) (C#) U-SQL provides the Parallelization and Scale-out Framework for Usercode • EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER, COMBINER, APPLIER Federated query across distributed data sources REFERENCE MyDB.MyAssembly; CREATE TABLE T( cid int, first_order DateTime , last_order DateTime, order_count int , order_amount float ); @o = EXTRACT oid int, cid int, odate DateTime, amount float FROM "/input/orders.txt" USING Extractors.Csv(); @c = EXTRACT cid int, name string, city string FROM "/input/customers.txt" USING Extractors.Csv(); @j = SELECT c.cid, MIN(o.odate) AS firstorder , MAX(o.date) AS lastorder, COUNT(o.oid) AS ordercnt , AGG<MyAgg.MySum>(c.amount) AS totalamount FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid WHERE c.city.StartsWith("New") && MyNamespace.MyFunction(o.odate) > 10 GROUP BY c.cid; OUTPUT @j TO "/output/result.txt" USING new MyData.Write(); INSERT INTO T SELECT * FROM @j;
  • 13. 2  Automatic "in-lining"  optimized out-of- the-box  Per job parallelization  visibility into execution  Heatmap to identify bottlenecks

Editor's Notes

  • #5: Add velocity?
  • #6: Hard to operate on unstructured data: Even Hive requires meta data to be created to operate on unstructured data. Adding Custom Java functions, aggregators and SerDes is involving a lot of steps and often access to server’s head node and differs based on type of operation. Requires many tools and steps. Some examples: Hive UDAgg Code and compile .java into .jar Extend AbstractGenericUDAFResolver class: Does type checking, argument checking and overloading Extend GenericUDAFEvaluator class: implements logic in 8 methods. - Deploy: Deploy jar into class path on server Edit FunctionRegistry.java to register as built-in Update the content of show functions with ant Hive UDF (as of v0.13) Code Load JAR into head node or at URI CREATE FUNCTION USING JAR to register and load jar into classpath for every function (instead of registering jar and just use the functions)
  • #7: Spark supports Custom “inputters and outputters” for defining custom RDDs No UDAGGs Simple integration of UDFs but only for duration of program. No reuse/sharing. Cloud dataflow? Requires has to care about scale and perf Spark UDAgg Is not yet supported ( SPARK-3947) Spark UDF Write inline function def westernState(state: String) = Seq("CA", "OR", "WA", "AK").contains(state) for SQL usage need to register the table customerTable.registerTempTable("customerTable") Register each UDF sqlContext.udf.register("westernState", westernState _) Call it val westernStates = sqlContext.sql("SELECT * FROM customerTable WHERE westernState(state)")
  • #8: Offers Auto-scaling and performance Operates on unstructured data without tables needed Easy to extend declaratively with custom code: consistent model for UDO, UDF and UDAgg. Easy to query remote sources even without external tables U-SQL UDAgg Code and compile .cs file: Implement IAggregate’s 3 methods :Init(), Accumulate(), Terminate() C# takes case of type checking, generics etc. Deploy: Tooling: one click registration in user db of assembly By Hand: Copy file to ADL CREATE ASSEMBLY to register assembly Use via AGG<MyNamespace.MyAggregate<T>>(a) U-SQL UDF Code in C#, register assembly once, call by C# name.
  • #9: Remove SCOPE for external customers?
  • #10: Extensions require .NET assemblies to be registered with a database
  • #13: Use for language experts