Michael Rys
Principal Program Manager, Big Data @ Microsoft
@MikeDoesBigData, {mrys, usql}@microsoft.com
U-SQL User-Defined Operators (UDOs)
Extend U-SQL with C#/.NET
Built-in operators, functions, aggregates
C# expressions (in SELECT expressions)
User-defined aggregates (UDAGGs)
User-defined functions (UDFs)
User-defined operators (UDOs)
What are UDOs?
User-Defined Extractors
User-Defined Outputters
User-Defined Processors
• Take one row and produce one row
• Pass-through versus transforming
User-Defined Appliers
• Take one row and produce 0 to n rows
• Used with OUTER/CROSS APPLY
User-Defined Combiners
• Combines rowsets (like a user-defined join)
User-Defined Reducers
• Take n rows and produce 1 row
Called with explicit U-SQL Syntax that takes a UDO
instance (created as part of the execution):
• EXTRACT
• OUTPUT
• PROCESS
• COMBINE
• REDUCE
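As a rough illustration of the "one row in, 0 to n rows out" contract of an applier, the sketch below splits a delimited string column into one output row per token. It assumes the IApplier base class, IRow, IUpdatableRow and the SqlUserDefinedApplier attribute from Microsoft.Analytics.Interfaces; the class name TagSplitter and the column names "tags" and "tag" are hypothetical and do not come from the slides.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Analytics.Interfaces;

// Hypothetical applier: emits one output row per token found in an input column.
[SqlUserDefinedApplier]
public class TagSplitter : IApplier
{
    private readonly char _delim;

    // Parameters are supplied where the UDO instance is created in the script.
    public TagSplitter(char delim = ';')
    {
        _delim = delim;
    }

    public override IEnumerable<IRow> Apply(IRow input, IUpdatableRow output)
    {
        // "tags" is an assumed input column name.
        string tags = input.Get<string>("tags");
        if (string.IsNullOrEmpty(tags))
        {
            yield break; // produce zero rows for this input row
        }

        foreach (string tag in tags.Split(new[] { _delim }, StringSplitOptions.RemoveEmptyEntries))
        {
            // "tag" is an assumed output column name; the result is set by name.
            output.Set<string>("tag", tag.Trim());
            yield return output.AsReadOnly();
        }
    }
}
```

In a script, such a class would appear on the right-hand side of CROSS APPLY, roughly as CROSS APPLY new MyNamespace.TagSplitter(';') AS s(tag string), where MyNamespace and the alias s are placeholders as well.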
UDO/UDT Tips and Warnings
• Use:
• READONLY clause to allow pushing predicates through UDOs
• REQUIRED clause to allow column pruning through UDOs
• PRESORT (coming)
• Use SELECT with UDFs instead of PROCESS
• Use User-defined Aggregators instead of REDUCE
• Hint Cardinality if you use CROSS APPLY and it chooses the wrong plan
• Learn to use Windowing Functions (OVER expression)
• Use SQL.MAP and SQL.ARRAY instead of C# Dictionary and array
• Some use-cases for PROCESS/REDUCE/COMBINE:
• The logic needs to dynamically access the input and/or output schema, e.g., to create a JSON doc for the data in the row where the columns are not known a priori.
• Your UDF-based solution creates too much memory pressure and you can write your code more memory-efficiently in a UDO
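The first use case above (dynamic schema access) could look something like the following sketch of a processor that walks whatever columns the input row happens to have and writes them into a single string column. It assumes the IProcessor base class, IColumn, and the SqlUserDefinedProcessor attribute from Microsoft.Analytics.Interfaces; the class name RowToJsonProcessor and the output column "json" are hypothetical, and the escaping is deliberately minimal rather than a complete JSON encoder.

```csharp
using System.Collections.Generic;
using Microsoft.Analytics.Interfaces;

// Hypothetical processor: serializes all input columns of a row into one string column.
[SqlUserDefinedProcessor]
public class RowToJsonProcessor : IProcessor
{
    public override IRow Process(IRow input, IUpdatableRow output)
    {
        var fields = new List<string>();

        // The input schema is only discovered at run time.
        foreach (IColumn col in input.Schema)
        {
            object value = input.Get<object>(col.Name);
            // Every non-null value is written as a quoted string for simplicity.
            string text = value == null ? "null" : "\"" + Escape(value.ToString()) + "\"";
            fields.Add("\"" + Escape(col.Name) + "\":" + text);
        }

        // "json" is an assumed output column declared in the PRODUCE clause.
        output.Set<string>("json", "{" + string.Join(",", fields) + "}");
        return output.AsReadOnly();
    }

    // Escapes backslash and double quote only; not a full JSON encoder.
    private static string Escape(string s)
    {
        return s.Replace("\\", "\\\\").Replace("\"", "\\\"");
    }
}
```

Because the column list is never hard-coded, the same class can be reused across rowsets with different schemas, which is hard to express with a scalar UDF in a SELECT.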
What are UDFs and UDAGGs?
• UDFs are user-defined C# scalar functions that can be called like any scalar C# function
• UDAGGs are user-defined aggregators
• Called by special syntax AGG<…>
• Enables templatized user-defined aggregators
• UDFs, UDAGGs and UDOs must be provided by a referenced assembly
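To make the UDF bullet concrete, here is a minimal sketch: a UDF is simply a public static C# method in a referenced assembly. The namespace MyNamespace, the class StringUdfs and the method Normalize are hypothetical names chosen for illustration.

```csharp
namespace MyNamespace
{
    // Hypothetical helper class; a UDF is an ordinary public static method.
    public static class StringUdfs
    {
        // Returns the input trimmed and upper-cased, or null for null input.
        public static string Normalize(string s)
        {
            return s == null ? null : s.Trim().ToUpperInvariant();
        }
    }
}
```

Once the assembly is registered and referenced, a call such as SELECT MyNamespace.StringUdfs.Normalize(name) AS name FROM @X; would invoke it like any other C# expression, where name is an assumed column of @X.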
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
UDO model
• Marking UDOs
• Parameterizing UDOs
• UDO signature
• UDO-specific processing pattern
• Rowsets and their schemas in UDOs
• Setting results
• By position
• By name
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.Analytics.Interfaces;

[SqlUserDefinedExtractor]
public class DriverExtractor : IExtractor
{
    private byte[] _row_delim;
    private string _col_delim;
    private Encoding _encoding;

    // Define a non-default constructor since I want to pass in my own parameters
    public DriverExtractor(string row_delim = "\r\n", string col_delim = ",",
                           Encoding encoding = null)
    {
        _encoding = encoding == null ? Encoding.UTF8 : encoding;
        _row_delim = _encoding.GetBytes(row_delim);
        _col_delim = col_delim;
    } // DriverExtractor

    // Converting text to target schema
    private void OutputValueAtCol_I(string c, int i, IUpdatableRow outputrow)
    {
        var schema = outputrow.Schema;
        if (schema[i].Type == typeof(int))
        {
            var tmp = Convert.ToInt32(c);
            outputrow.Set(i, tmp); // set the result by position
        }
        ...
    } // OutputValueAtCol_I

    public override IEnumerable<IRow> Extract(IUnstructuredReader input,
                                              IUpdatableRow outputrow)
    {
        // Split the input stream into rows, then each row into columns
        foreach (var row in input.Split(_row_delim))
        {
            using (var s = new StreamReader(row, _encoding))
            {
                int i = 0;
                foreach (var c in s.ReadToEnd().Split(new[] { _col_delim }, StringSplitOptions.None))
                {
                    OutputValueAtCol_I(c, i++, outputrow);
                } // foreach
            } // using
            yield return outputrow.AsReadOnly();
        } // foreach
    } // Extract
} // class DriverExtractor
UDAGG model
• UDAGG extends IAggregate interface
• Requires implementation of Init(), Accumulate(), and Terminate() methods
• Can have multiple arguments
• Can be generic
• Called with special syntax to provide support for generic UDAGGs
public class MyCountAggregate : IAggregate<int, long>
{
    // long matches the declared result type and avoids int overflow
    private long count;
    public override void Init() { count = 0; }
    // Adds each input value to the running total
    public override void Accumulate(int i) { count += i; }
    public override long Terminate() { return count; }
}

public class MyTwoArgAggregate : IAggregate<string, long, int>
{
    public override void Init() {…}
    public override void Accumulate(string s, long l) {…}
    public override int Terminate() {…}
}

public class GenericListAggregate<T1, TResult> : IAggregate<T1, TResult>
    where TResult : IList<T1>, new()
{
    private TResult result;
    public override void Init() { this.result = new TResult(); }
    public override void Accumulate(T1 t1) { this.result.Add(t1); }
    public override TResult Terminate() { return this.result; }
}

SELECT AGG<MyNamespace.MyCountAggregate>(a) AS ms FROM @X;
Additional Resources
Documentation
U-SQL UDO Expressions: https://msdn.microsoft.com/en-us/library/azure/mt621319.aspx
U-SQL OUTPUT Statement: https://msdn.microsoft.com/en-us/library/azure/mt621334.aspx
U-SQL UDO Programmer’s Guide: Under development
U-SQL Performance Presentation: http://www.slideshare.net/MichaelRys/usql-query-execution-and-performance-tuning
Sample Projects
https://github.com/Azure/usql/tree/master/Examples/AmbulanceDemos/AmbulanceDemos/2-Ambulance-Structured%20Data
https://github.com/Azure/usql/tree/master/Examples/TweetAnalysis
http://aka.ms/AzureDataLake

Editor's Notes

  • #3: C# is the extension story for U-SQL
    • Expressions in SELECT statement
    • User-defined operators (UDOs)
    • User-defined functions (UDFs)
    • User-defined aggregates (UDAGGs)
    • User-defined types (UDTs)
    UDOs are central to the U-SQL user experience
    UDFs, UDAGGs, UDOs and UDTs require assemblies to be registered (one-time cost, fixed assembly version)
    UDFs, UDAGGs, UDOs and UDTs are automatically available after referencing the assembly in the script
    One version of an assembly per database
    Assemblies with the same short name are not allowed
    Tooling provides a code-behind and auto-deploy experience