SlideShare a Scribd company logo
Michael Rys
Principal Program Manager, Big Data @ Microsoft
@MikeDoesBigData, {mrys, usql}@microsoft.com
U-SQL Partitioned Data and Tables
Data
Partitioning
Files
Tables
Partitioning of unstructured data
• Use File Sets to provide semantic partition pruning
Table Partitioning and Distribution
• Fine grained (horizontal) partitioning/distribution
• Distributes within a partition (together with clustering) to
keep same data values close
• Choose for:
• Join alignment, partition size, filter selectivity
• Coarse grained (vertical) partitioning
• Based on Partition keys
• Partition is addressable in language
• Query predicates will allow partition pruning
Distribution
Scheme
When to use?
HASH(keys) Automatic Hash for fast item lookup
DIRECT HASH(id) Exact control of hash bucket value
RANGE(keys) Keeps ranges together
ROUND ROBIN To get equal distribution (if others give skew)
Partitions,
Distributions and
Clusters L
o
g
i
c
a
l
PARTITION (@date1) PARTITION (@date2) PARTITION (@date3)
TABLE T ( key …, C …, date DateTime, …
, INDEX i CLUSTERED (key, C)
PARTITIONED BY BUCKETS (date)
HASH (key) INTO 4)
P
h
y
s
i
c
a
l
HASH DISTRIBUTION 1
HASH DISTRIBUTION 2
HASH DISTRIBUTION 3
HASH DISTRIBUTION 1
HASH DISTRIBUTION 1
HASH DISTRIBUTION 2
HASH DISTRIBUTION 3
HASH DISTRIBUTION 4 HASH DISTRIBUTION 3
C1
C2
C3
C1
C2
C4
C5
C4
C6
C6
C7
C8
C7
C5
C6
C9
C10
C1
C3
/catalog/…/tables/Guid(T)/
Guid(T.p1).ss Guid(T.p2).ss Guid(T.p3).ss
U-SQL Partitioned Data and Tables (SQLBits 2016)
ADL Store Basics
A VERY BIG FILE
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
Files are split apart into Extents.
For availability and reliability, extents
are replicated (3 copies).
Enables:
• Parallel read
• Parallel write
Extent
As file size increases, more opportunities for
parallelism
Vertex
Extent Vertex
Extent Vertex
Extent Vertex
Small File Bigger File
Search engine clicks data set
A log of how many clicks a certain domain got within a session
SessionID Domain Clicks
3 cnn.com 9
1 whitehouse.gov 14
2 facebook.com 8
3 reddit.com 78
2 microsoft.com 1
1 facebook.com 5
3 microsoft.com 11
Data Partitioning Compared
Extent 2 Extent 3Extent 1
File
Keys (Domain) are scattered across
the extents
Extent 2 Extent 3
FB
WH
CNN
FB
WH
CNN
FB
WH
CNN
WH
WH
WH
CNN
CNN
CNN
FB
FB
FB
Extent 1
U-SQL Table partitioned on Domain
The keys are now “close together” also
the index tells U-SQL exactly which
extents contain the key
CREATE TABLE MyDB.dbo.ClickData
(
SessionId int,
Domain string,
Clinks int,
INDEX idx1
CLUSTERED (Domain ASC)
PARTITIONED BY HASH (Domain) INTO 3
);
INSERT INTO MyDB.dbo.ClickData
SELECT *
FROM @clickdata;
Creating and Filling a U-SQL Table
Find all the rows for cnn.com
@ClickData =
SELECT
Session int,
Domain string,
Clicks int
FROM “/clickdata.tsv”
USING Extractors.Tsv();
@rows = SELECT *
FROM @ClickData
WHERE Domain == “cnn.com”;
OUTPUT @rows
TO “/output.tsv”
USING Outputters.tsv();
@ClickData =
SELECT *
FROM MyDB.dbo.ClickData;
@rows = SELECT *
FROM @ClickData
WHERE Domain == “cnn.com”;
OUTPUT @rows
TO “/output.tsv”
USING Outputters.tsv();
File U-SQL Table partitioned on Domain
Read Read
Write Write Write
Read
Filter Filter Filter
CNN,
FB,
WH
EXTENT 1 EXTENT 2 EXTENT 3
CNN,
FB,
WH
CNN,
FB,
WH
Because “CNN” could be
anywhere, all extents must be
read.
Read
Write
Filter
FB
EXTENT 1 EXTENT 2 EXTENT 3
WH CNN
Thanks to “Partition Elimination”
and the U-SQL Table, the job only
reads from the extent that is
known to have the relevant key
File U-SQL Table Distributed by Domain
How many clicks per domain?
@rows = SELECT
Domain,
SUM(Clicks) AS TotalClicks
FROM @ClickData
GROUP BY Domain;
File
Read Read
Partition Partition
Full Agg
Write
Full Agg
Write
Full Agg
Write
Read
Partition
Partial Agg Partial Agg Partial Agg
CNN,
FB,
WH
EXTENT 1 EXTENT 2 EXTENT 3
CNN,
FB,
WH
CNN,
FB,
WH
U-SQL Table Distributed by Domain
Read Read
Full Agg Full Agg
Write Write
Read
Full Agg
Write
FB
EXTENT 1
WH
EXTENT 2
CNN
EXTENT 3
Expensive!
Benefits of
Partitioned Tables
Benefits
• Partitions are addressable
• Enables finer-grained data lifecycle management at
partition level
• Manage parallelism in querying by number of partitions
• Query predicates provide partition elimination
• Predicate has to be constant-foldable
Use partitioned tables for
• Managing large amounts of incrementally growing
structured data
• Queries with strong locality predicates
• point in time, for specific market etc
• Managing windows of data
• provide data for last x months for processing
Benefits of
Distribution in
Tables
Benefits
• Design for most frequent/costly queries
• Manage data skew in partition/table
• Manage parallelism in querying (by number of
distributions)
• Manage minimizing data movement in joins
• Provide distribution seeks and range scans for query
predicates (distribution bucket elimination)
Distribution in tables is mandatory, chose according to
desired benefits
Benefits of
Clustered Index
in Distribution
Benefits
• Design for most frequent/costly queries
• Manage data skew in distribution bucket
• Provide locality of same data values
• Provide seeks and range scans for query predicates (index
lookup)
Clustered index in tables is mandatory, chose according to
desired benefits
Pro Tip:
Distribution keys should be prefix of Clustered Index keys
// TABLE(s) - Structured Files (24 hours daily log impressions)
CREATE TABLE Impressions (Day DateTime, Market string, ClientId int, ...
INDEX IX CLUSTERED(Market, ClientId)
PARTITIONED BY
BUCKETS (Day)
HASH(Market, ClientId) INTO 100
);
DECLARE @today DateTime = DateTime.Parse("2015/10/30");
// Market = Vertical Partitioning
ALTER TABLE Impressions ADD PARTITION (@today);
// …
// Daily INSERT(s)
INSERT INTO Impressions(Market, ClientId)
PARTITION(@today)
SELECT * FROM @Q
;
// …
// Both levels are elimination (H+V)
@Impressions =
SELECT * FROM dbo.Impressions
WHERE
Market == "en"
AND Day == @today
;
U-SQL Optimizations
Partition Elimination – TABLE(s)
Partition Elimination
• Horizontal and vertical partitioning
• Horizontal is traditional within file (range, hash, robin)
• Vertical is across files (bucketing)
• Immutable file system
• Design according to your access patterns
Enumerate all partitions filtering for today
30.ss
30.1.ss
29.ss
28.ss
29.1.ss
Impressions
…
deen
jp
de
PE across files + within each file
@Inpressions =
SELECT * FROM
searchDM.SML.PageView(@start, @end) AS PageView
OPTION(LOWDISTINCTNESS=Query)
;
// Q1(A,B)
@Sessions =
SELECT
ClientId,
Query,
SUM(PageClicks) AS Clicks
FROM
@Impressions
GROUP BY
Query, ClientId
;
// Q2(B)
@Display =
SELECT * FROM @Sessions
INNER JOIN @Campaigns
ON @Sessions.Query == @Campaigns.Query
;
U-SQL Optimizations
Partitioning – Minimize (re)partitions
Input must be partitioned on:
(Query)
Input must be partitioned on:
(Query) or (ClientId) or (Query, ClientId)
Optimizer wants to partition only once
But Query could be skewed
Data Partitioning
• Re-Partitioning is very expensive
• Many U-SQL operators can handle multiple partitioning choices
• Optimizer bases decision upon estimations
Wrong statistics may result in worse query performance
// Unstructured (24 hours daily log impressions)
@Huge = EXTRACT ClientId int, ...
FROM @"wasb://ads@wcentralus/2015/10/30/{*}.nif"
;
// Small subset (ie: ForgetMe opt out)
@Small = SELECT * FROM @Huge
WHERE Bing.ForgetMe(x,y,z)
OPTION(ROWCOUNT=500)
;
// Result (not enough info to determine simple Broadcast
join)
@Remove = SELECT * FROM Bing.Sessions
INNER JOIN @Small ON Sessions.Client ==
@Small.Client
;
U-SQL Optimizations
Partitioning - Cardinality
Broadcast JOIN right?
Broadcast is now a candidate.
Wrong statistics may result in worse query performance
=> CREATE STATISTICS
Optimizer has no stats this is small...
U-SQL Partitioned Data and Tables (SQLBits 2016)
Partitioned
tables
Use partitioned tables for
querying parts of large
amounts of incrementally
growing structured data
Get partition elimination
optimizations with the
right query predicates
Creating partition table
CREATE TABLE PartTable(id int, event_date DateTime, lat float, long float
, INDEX idx CLUSTERED (vehicle_id ASC)
PARTITIONED BY BUCKETS (event_date) HASH (vehicle_id) INTO 4);
Creating partitions
DECLARE @pdate1 DateTime = new DateTime(2014, 9, 14, 00,00,00,00,DateTimeKind.Utc);
DECLARE @pdate2 DateTime = new DateTime(2014, 9, 15, 00,00,00,00,DateTimeKind.Utc);
ALTER TABLE vehiclesP ADD PARTITION (@pdate1), PARTITION (@pdate2);
Loading data into partitions dynamically
DECLARE @date1 DateTime = DateTime.Parse("2014-09-14");
DECLARE @date2 DateTime = DateTime.Parse("2014-09-16");
INSERT INTO vehiclesP ON INTEGRITY VIOLATION IGNORE
SELECT vehicle_id, event_date, lat, long FROM @data
WHERE event_date >= @date1 AND event_date <= @date2;
• Filters and inserts clean data only, ignore “dirty” data
Loading data into partitions statically
ALTER TABLE vehiclesP ADD PARTITION (@pdate1), PARTITION (@baddate);
INSERT INTO vehiclesP ON INTEGRITY VIOLATION MOVE TO @baddate
SELECT vehicle_id, lat, long FROM @data
WHERE event_date >= @date1 AND event_date <= @date2;
• Filters and inserts clean data only, put “dirty” data into special partition
Data “Skew”
(aka “a vertex is receiving too much data”)
0
5,000,000
10,000,000
15,000,000
20,000,000
25,000,000
30,000,000
35,000,000
40,000,000
California
Texas
NewYork
Florida
Illinois
Pennsylvania
Ohio
Georgia
Michigan
NorthCarolina
NewJersey
Virginia
Washington
Massachusetts
Arizona
Indiana
Tennessee
Missouri
Maryland
Wisconsin
Minnesota
Colorado
Alabama
SouthCarolina
Louisiana
Kentucky
Oregon
Oklahoma
Connecticut
Iowa
Mississippi
Arkansas
Kansas
Utah
Nevada
NewMexico
Nebraska
WestVirginia
Idaho
Hawaii
Maine
NewHampshire
RhodeIsland
Montana
Delaware
SouthDakota
Alaska
NorthDakota
DistrictofColumbia
Vermont
Wyoming
Population by State
Data Skew
U-SQL Table partitioned on
Domain
Relatively even distribution
Extent 2 Extent 3
WH
CNNFB
Extent 1
U-SQL Table partitioned on Domain
Skewed Distribution
Extent 2 Extent 3
WH CNNFB
Extent 1
Why is this a problem?
• Vertexes have a 5 hour runtime limit!
• Your UDO may excessively allocate memory.
• Your memory usage may not be obvious due to garbage collection
Diagnostics with Data Skew
Data Skew Graph A lot of data brought
to a couple of vertexes
What are your Options?
• Re-partition your input data to get a better distribution
• Use a different partitioning scheme
• Pick a different key
• Use more than one key for partitioning
• Use Data Hints to identify “low distinctness” in keys
@rows =
SELECT
Gender,
AGG<MyAgg>(Income) AS Result
FROM
@HugeInput
GROUP BY Gender;
Gender==Female
@HugeInput
Vertex 0 Vertex 1
Gender==Male
What are your Options?
• Use a Recursive Aggregator (if possible)
• If a Row-Level combiner mode (if possible)
A non-recursive operation
VERTEX 1
1 2 3 4 5 6 7 8 36
Implement a custom SUM aggregator…Implement a custom SUM aggregator…
A recursive operation
Vertex 3Vertex 2Vertex 1
1 2 3 4 5 6 7 8
6 15 15
36
Not all operations can be made recursive!
High-Level Performance Advice
Learn U-SQL
Leverage Native U-SQL Constructs first
UDOs are Evil 
Can’t optimize UDOs like pure U-SQL
code.
Understand your Data
Volume, Distribution, Partitioning,
Growth
Additional
Resources
Documentation
Tables and Partitions: https://msdn.microsoft.com/en-
us/library/azure/mt621324.aspx
Statistics: https://msdn.microsoft.com/en-
us/library/azure/mt621312.aspx
U-SQL Performance Presentation:
http://www.slideshare.net/MichaelRys/usql-query-execution-
and-performance-tuning
Sample Data
https://github.com/Azure/usql/blob/master/Examples/Samples
/Data/AmbulanceData
Sample Project
https://github.com/Azure/usql/tree/master/Examples/Ambulan
ceDemos
http://aka.ms/AzureDataLake
Ad

More Related Content

What's hot (20)

U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
Michael Rys
 
Killer Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLKiller Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQL
Michael Rys
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
Michael Rys
 
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLTaming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Michael Rys
 
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
Michael Rys
 
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Michael Rys
 
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
Michael Rys
 
ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)
Michael Rys
 
Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)
Michael Rys
 
U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)
Michael Rys
 
U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)
Michael Rys
 
U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)
Michael Rys
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop Story
Michael Rys
 
Be A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data PipelineBe A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data Pipeline
Chester Chen
 
Discardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With HadoopDiscardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With Hadoop
Julian Hyde
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Julian Hyde
 
Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQL
kristinferrier
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Julian Hyde
 
What's new in Mondrian 4?
What's new in Mondrian 4?What's new in Mondrian 4?
What's new in Mondrian 4?
Julian Hyde
 
Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)
Julian Hyde
 
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
Michael Rys
 
Killer Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLKiller Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQL
Michael Rys
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
Michael Rys
 
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLTaming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Michael Rys
 
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
Michael Rys
 
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Michael Rys
 
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
Michael Rys
 
ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)
Michael Rys
 
Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)
Michael Rys
 
U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)
Michael Rys
 
U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)
Michael Rys
 
U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)
Michael Rys
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop Story
Michael Rys
 
Be A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data PipelineBe A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data Pipeline
Chester Chen
 
Discardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With HadoopDiscardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With Hadoop
Julian Hyde
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Julian Hyde
 
Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQL
kristinferrier
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Julian Hyde
 
What's new in Mondrian 4?
What's new in Mondrian 4?What's new in Mondrian 4?
What's new in Mondrian 4?
Julian Hyde
 
Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)
Julian Hyde
 

Viewers also liked (14)

U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for Developers
Michael Rys
 
U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)
Michael Rys
 
Azure Data Lake and U-SQL
Azure Data Lake and U-SQLAzure Data Lake and U-SQL
Azure Data Lake and U-SQL
Michael Rys
 
U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)
Michael Rys
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
Michael Rys
 
U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)
Michael Rys
 
Azure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep DiveAzure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep Dive
Ilyas F ☁☁☁
 
How do we read film?
How do we read film?How do we read film?
How do we read film?
sudhishkamath
 
Knowledge to action: changing the dynamic between patients and providers - en...
Knowledge to action: changing the dynamic between patients and providers - en...Knowledge to action: changing the dynamic between patients and providers - en...
Knowledge to action: changing the dynamic between patients and providers - en...
Paul Gallant
 
Sage Success Story Island Lake Tesort
Sage Success Story Island Lake TesortSage Success Story Island Lake Tesort
Sage Success Story Island Lake Tesort
David Beard
 
Salt Water Pool - Is it just a Fad?
Salt Water Pool - Is it just a Fad?Salt Water Pool - Is it just a Fad?
Salt Water Pool - Is it just a Fad?
Chlorine Genie Inc
 
حضرت امام حسین رضی اللہ تعالی کا عظیم الشان مقام - Hazrat Imam Hussain (Razia...
حضرت امام حسین رضی اللہ تعالی کا عظیم الشان مقام - Hazrat Imam Hussain (Razia...حضرت امام حسین رضی اللہ تعالی کا عظیم الشان مقام - Hazrat Imam Hussain (Razia...
حضرت امام حسین رضی اللہ تعالی کا عظیم الشان مقام - Hazrat Imam Hussain (Razia...
muzaffertahir9
 
え?まだMAMPで消耗してんの?
え?まだMAMPで消耗してんの?え?まだMAMPで消耗してんの?
え?まだMAMPで消耗してんの?
Takayuki Miyauchi
 
20161223_Mobility as a Service - postgrau smart mobility UPC_PDF
20161223_Mobility as a Service - postgrau smart mobility UPC_PDF20161223_Mobility as a Service - postgrau smart mobility UPC_PDF
20161223_Mobility as a Service - postgrau smart mobility UPC_PDF
Josep Laborda
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for Developers
Michael Rys
 
U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)
Michael Rys
 
Azure Data Lake and U-SQL
Azure Data Lake and U-SQLAzure Data Lake and U-SQL
Azure Data Lake and U-SQL
Michael Rys
 
U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)
Michael Rys
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
Michael Rys
 
U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)
Michael Rys
 
Azure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep DiveAzure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep Dive
Ilyas F ☁☁☁
 
How do we read film?
How do we read film?How do we read film?
How do we read film?
sudhishkamath
 
Knowledge to action: changing the dynamic between patients and providers - en...
Knowledge to action: changing the dynamic between patients and providers - en...Knowledge to action: changing the dynamic between patients and providers - en...
Knowledge to action: changing the dynamic between patients and providers - en...
Paul Gallant
 
Sage Success Story Island Lake Tesort
Sage Success Story Island Lake TesortSage Success Story Island Lake Tesort
Sage Success Story Island Lake Tesort
David Beard
 
Salt Water Pool - Is it just a Fad?
Salt Water Pool - Is it just a Fad?Salt Water Pool - Is it just a Fad?
Salt Water Pool - Is it just a Fad?
Chlorine Genie Inc
 
حضرت امام حسین رضی اللہ تعالی کا عظیم الشان مقام - Hazrat Imam Hussain (Razia...
حضرت امام حسین رضی اللہ تعالی کا عظیم الشان مقام - Hazrat Imam Hussain (Razia...حضرت امام حسین رضی اللہ تعالی کا عظیم الشان مقام - Hazrat Imam Hussain (Razia...
حضرت امام حسین رضی اللہ تعالی کا عظیم الشان مقام - Hazrat Imam Hussain (Razia...
muzaffertahir9
 
え?まだMAMPで消耗してんの?
え?まだMAMPで消耗してんの?え?まだMAMPで消耗してんの?
え?まだMAMPで消耗してんの?
Takayuki Miyauchi
 
20161223_Mobility as a Service - postgrau smart mobility UPC_PDF
20161223_Mobility as a Service - postgrau smart mobility UPC_PDF20161223_Mobility as a Service - postgrau smart mobility UPC_PDF
20161223_Mobility as a Service - postgrau smart mobility UPC_PDF
Josep Laborda
 
Ad

Similar to U-SQL Partitioned Data and Tables (SQLBits 2016) (20)

MySQL Indexes
MySQL IndexesMySQL Indexes
MySQL Indexes
Anton Zhukov
 
MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra
QUONTRASOLUTIONS
 
Informix partitioning interval_rolling_window_table
Informix partitioning interval_rolling_window_tableInformix partitioning interval_rolling_window_table
Informix partitioning interval_rolling_window_table
Keshav Murthy
 
What's New for Developers in SQL Server 2008?
What's New for Developers in SQL Server 2008?What's New for Developers in SQL Server 2008?
What's New for Developers in SQL Server 2008?
ukdpe
 
SQL Server 2008 Overview
SQL Server 2008 OverviewSQL Server 2008 Overview
SQL Server 2008 Overview
Eric Nelson
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databases
Julian Hyde
 
Physical Design and Development
Physical Design and DevelopmentPhysical Design and Development
Physical Design and Development
Er. Nawaraj Bhandari
 
Less08 Schema
Less08 SchemaLess08 Schema
Less08 Schema
vivaankumar
 
vFabric SQLFire Introduction
vFabric SQLFire IntroductionvFabric SQLFire Introduction
vFabric SQLFire Introduction
Jags Ramnarayan
 
Lecture05sql 110406195130-phpapp02
Lecture05sql 110406195130-phpapp02Lecture05sql 110406195130-phpapp02
Lecture05sql 110406195130-phpapp02
Lalit009kumar
 
Sql server lesson7
Sql server lesson7Sql server lesson7
Sql server lesson7
Ala Qunaibi
 
Training on Microsoft SQL Server(older version).pptx
Training on Microsoft SQL Server(older version).pptxTraining on Microsoft SQL Server(older version).pptx
Training on Microsoft SQL Server(older version).pptx
naibedyakar00
 
Datastax day 2016 : Cassandra data modeling basics
Datastax day 2016 : Cassandra data modeling basicsDatastax day 2016 : Cassandra data modeling basics
Datastax day 2016 : Cassandra data modeling basics
Duyhai Doan
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
Cobus Bernard
 
"Using Indexes in SQL Server 2008" by Alexander Korotkiy, part 1
"Using Indexes in SQL Server 2008" by Alexander Korotkiy, part 1 "Using Indexes in SQL Server 2008" by Alexander Korotkiy, part 1
"Using Indexes in SQL Server 2008" by Alexander Korotkiy, part 1
Andriy Krayniy
 
Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"
Lviv Startup Club
 
Database Performance
Database PerformanceDatabase Performance
Database Performance
Boris Hristov
 
Presentation_BigData_NenaMarin
Presentation_BigData_NenaMarinPresentation_BigData_NenaMarin
Presentation_BigData_NenaMarin
n5712036
 
Introduction to Dating Modeling for Cassandra
Introduction to Dating Modeling for CassandraIntroduction to Dating Modeling for Cassandra
Introduction to Dating Modeling for Cassandra
DataStax Academy
 
Basics of Microsoft Business Intelligence and Data Integration Techniques
Basics of Microsoft Business Intelligence and Data Integration TechniquesBasics of Microsoft Business Intelligence and Data Integration Techniques
Basics of Microsoft Business Intelligence and Data Integration Techniques
Valmik Potbhare
 
MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra
QUONTRASOLUTIONS
 
Informix partitioning interval_rolling_window_table
Informix partitioning interval_rolling_window_tableInformix partitioning interval_rolling_window_table
Informix partitioning interval_rolling_window_table
Keshav Murthy
 
What's New for Developers in SQL Server 2008?
What's New for Developers in SQL Server 2008?What's New for Developers in SQL Server 2008?
What's New for Developers in SQL Server 2008?
ukdpe
 
SQL Server 2008 Overview
SQL Server 2008 OverviewSQL Server 2008 Overview
SQL Server 2008 Overview
Eric Nelson
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databases
Julian Hyde
 
vFabric SQLFire Introduction
vFabric SQLFire IntroductionvFabric SQLFire Introduction
vFabric SQLFire Introduction
Jags Ramnarayan
 
Lecture05sql 110406195130-phpapp02
Lecture05sql 110406195130-phpapp02Lecture05sql 110406195130-phpapp02
Lecture05sql 110406195130-phpapp02
Lalit009kumar
 
Sql server lesson7
Sql server lesson7Sql server lesson7
Sql server lesson7
Ala Qunaibi
 
Training on Microsoft SQL Server(older version).pptx
Training on Microsoft SQL Server(older version).pptxTraining on Microsoft SQL Server(older version).pptx
Training on Microsoft SQL Server(older version).pptx
naibedyakar00
 
Datastax day 2016 : Cassandra data modeling basics
Datastax day 2016 : Cassandra data modeling basicsDatastax day 2016 : Cassandra data modeling basics
Datastax day 2016 : Cassandra data modeling basics
Duyhai Doan
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
Cobus Bernard
 
"Using Indexes in SQL Server 2008" by Alexander Korotkiy, part 1
"Using Indexes in SQL Server 2008" by Alexander Korotkiy, part 1 "Using Indexes in SQL Server 2008" by Alexander Korotkiy, part 1
"Using Indexes in SQL Server 2008" by Alexander Korotkiy, part 1
Andriy Krayniy
 
Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"
Lviv Startup Club
 
Database Performance
Database PerformanceDatabase Performance
Database Performance
Boris Hristov
 
Presentation_BigData_NenaMarin
Presentation_BigData_NenaMarinPresentation_BigData_NenaMarin
Presentation_BigData_NenaMarin
n5712036
 
Introduction to Dating Modeling for Cassandra
Introduction to Dating Modeling for CassandraIntroduction to Dating Modeling for Cassandra
Introduction to Dating Modeling for Cassandra
DataStax Academy
 
Basics of Microsoft Business Intelligence and Data Integration Techniques
Basics of Microsoft Business Intelligence and Data Integration TechniquesBasics of Microsoft Business Intelligence and Data Integration Techniques
Basics of Microsoft Business Intelligence and Data Integration Techniques
Valmik Potbhare
 
Ad

More from Michael Rys (8)

Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Michael Rys
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
Michael Rys
 
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Michael Rys
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Michael Rys
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Michael Rys
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Michael Rys
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Michael Rys
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Michael Rys
 
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Michael Rys
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
Michael Rys
 
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Michael Rys
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Michael Rys
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Michael Rys
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Michael Rys
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Michael Rys
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Michael Rys
 

Recently uploaded (20)

EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
Calories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptxCalories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptx
TijiLMAHESHWARI
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
History of Science and Technologyandits source.pptx
History of Science and Technologyandits source.pptxHistory of Science and Technologyandits source.pptx
History of Science and Technologyandits source.pptx
balongcastrojo
 
presentation of first program exist.pptx
presentation of first program exist.pptxpresentation of first program exist.pptx
presentation of first program exist.pptx
MajidAzeemChohan
 
Call illuminati Agent in uganda+256776963507/0741506136
Call illuminati Agent in uganda+256776963507/0741506136Call illuminati Agent in uganda+256776963507/0741506136
Call illuminati Agent in uganda+256776963507/0741506136
illuminati Agent uganda call+256776963507/0741506136
 
shit yudh slideshare power likha point presen
shit yudh slideshare power likha point presenshit yudh slideshare power likha point presen
shit yudh slideshare power likha point presen
vishalgurjar11229
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docxMASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
santosh162
 
AllContacts Vs AllSubscribers - SFMC.pptx
AllContacts Vs AllSubscribers - SFMC.pptxAllContacts Vs AllSubscribers - SFMC.pptx
AllContacts Vs AllSubscribers - SFMC.pptx
bpkr84
 
PRE-NATAL GRnnnmnnnnmmOWTH seminar[1].pptx
PRE-NATAL GRnnnmnnnnmmOWTH seminar[1].pptxPRE-NATAL GRnnnmnnnnmmOWTH seminar[1].pptx
PRE-NATAL GRnnnmnnnnmmOWTH seminar[1].pptx
JayeshTaneja4
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
How to join illuminati Agent in uganda call+256776963507/0741506136
How to join illuminati Agent in uganda call+256776963507/0741506136How to join illuminati Agent in uganda call+256776963507/0741506136
How to join illuminati Agent in uganda call+256776963507/0741506136
illuminati Agent uganda call+256776963507/0741506136
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
KNN_Logistic_Regression_Presentation_Styled.pptx
KNN_Logistic_Regression_Presentation_Styled.pptxKNN_Logistic_Regression_Presentation_Styled.pptx
KNN_Logistic_Regression_Presentation_Styled.pptx
sonujha1980712
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
Calories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptxCalories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptx
TijiLMAHESHWARI
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
History of Science and Technologyandits source.pptx
History of Science and Technologyandits source.pptxHistory of Science and Technologyandits source.pptx
History of Science and Technologyandits source.pptx
balongcastrojo
 
presentation of first program exist.pptx
presentation of first program exist.pptxpresentation of first program exist.pptx
presentation of first program exist.pptx
MajidAzeemChohan
 
shit yudh slideshare power likha point presen
shit yudh slideshare power likha point presenshit yudh slideshare power likha point presen
shit yudh slideshare power likha point presen
vishalgurjar11229
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docxMASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
santosh162
 
AllContacts Vs AllSubscribers - SFMC.pptx
AllContacts Vs AllSubscribers - SFMC.pptxAllContacts Vs AllSubscribers - SFMC.pptx
AllContacts Vs AllSubscribers - SFMC.pptx
bpkr84
 
PRE-NATAL GRnnnmnnnnmmOWTH seminar[1].pptx
PRE-NATAL GRnnnmnnnnmmOWTH seminar[1].pptxPRE-NATAL GRnnnmnnnnmmOWTH seminar[1].pptx
PRE-NATAL GRnnnmnnnnmmOWTH seminar[1].pptx
JayeshTaneja4
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
KNN_Logistic_Regression_Presentation_Styled.pptx
KNN_Logistic_Regression_Presentation_Styled.pptxKNN_Logistic_Regression_Presentation_Styled.pptx
KNN_Logistic_Regression_Presentation_Styled.pptx
sonujha1980712
 

U-SQL Partitioned Data and Tables (SQLBits 2016)

  • 1. Michael Rys Principal Program Manager, Big Data @ Microsoft @MikeDoesBigData, {mrys, usql}@microsoft.com U-SQL Partitioned Data and Tables
  • 2. Data Partitioning Files Tables Partitioning of unstructured data • Use File Sets to provide semantic partition pruning Table Partitioning and Distribution • Fine grained (horizontal) partitioning/distribution • Distributes within a partition (together with clustering) to keep same data values close • Choose for: • Join alignment, partition size, filter selectivity • Coarse grained (vertical) partitioning • Based on Partition keys • Partition is addressable in language • Query predicates will allow partition pruning Distribution Scheme When to use? HASH(keys) Automatic Hash for fast item lookup DIRECT HASH(id) Exact control of hash bucket value RANGE(keys) Keeps ranges together ROUND ROBIN To get equal distribution (if others give skew)
  • 3. Partitions, Distributions and Clusters L o g i c a l PARTITION (@date1) PARTITION (@date2) PARTITION (@date3) TABLE T ( key …, C …, date DateTime, … , INDEX i CLUSTERED (key, C) PARTITIONED BY BUCKETS (date) HASH (key) INTO 4) P h y s i c a l HASH DISTRIBUTION 1 HASH DISTRIBUTION 2 HASH DISTRIBUTION 3 HASH DISTRIBUTION 1 HASH DISTRIBUTION 1 HASH DISTRIBUTION 2 HASH DISTRIBUTION 3 HASH DISTRIBUTION 4 HASH DISTRIBUTION 3 C1 C2 C3 C1 C2 C4 C5 C4 C6 C6 C7 C8 C7 C5 C6 C9 C10 C1 C3 /catalog/…/tables/Guid(T)/ Guid(T.p1).ss Guid(T.p2).ss Guid(T.p3).ss
  • 5. ADL Store Basics A VERY BIG FILE 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 Files are split apart into Extents. For availability and reliability, extents are replicated (3 copies). Enables: • Parallel read • Parallel write
  • 6. Extent As file size increases, more opportunities for parallelism Vertex Extent Vertex Extent Vertex Extent Vertex Small File Bigger File
  • 7. Search engine clicks data set A log of how many clicks a certain domain got within a session SessionID Domain Clicks 3 cnn.com 9 1 whitehouse.gov 14 2 facebook.com 8 3 reddit.com 78 2 microsoft.com 1 1 facebook.com 5 3 microsoft.com 11
  • 8. Data Partitioning Compared Extent 2 Extent 3Extent 1 File Keys (Domain) are scattered across the extents Extent 2 Extent 3 FB WH CNN FB WH CNN FB WH CNN WH WH WH CNN CNN CNN FB FB FB Extent 1 U-SQL Table partitioned on Domain The keys are now “close together” also the index tells U-SQL exactly which extents contain the key
  • 9. CREATE TABLE MyDB.dbo.ClickData ( SessionId int, Domain string, Clinks int, INDEX idx1 CLUSTERED (Domain ASC) PARTITIONED BY HASH (Domain) INTO 3 ); INSERT INTO MyDB.dbo.ClickData SELECT * FROM @clickdata; Creating and Filling a U-SQL Table
  • 10. Find all the rows for cnn.com @ClickData = SELECT Session int, Domain string, Clicks int FROM “/clickdata.tsv” USING Extractors.Tsv(); @rows = SELECT * FROM @ClickData WHERE Domain == “cnn.com”; OUTPUT @rows TO “/output.tsv” USING Outputters.tsv(); @ClickData = SELECT * FROM MyDB.dbo.ClickData; @rows = SELECT * FROM @ClickData WHERE Domain == “cnn.com”; OUTPUT @rows TO “/output.tsv” USING Outputters.tsv(); File U-SQL Table partitioned on Domain
  • 11. Read Read Write Write Write Read Filter Filter Filter CNN, FB, WH EXTENT 1 EXTENT 2 EXTENT 3 CNN, FB, WH CNN, FB, WH Because “CNN” could be anywhere, all extents must be read. Read Write Filter FB EXTENT 1 EXTENT 2 EXTENT 3 WH CNN Thanks to “Partition Elimination” and the U-SQL Table, the job only reads from the extent that is known to have the relevant key File U-SQL Table Distributed by Domain
  • 12. How many clicks per domain? @rows = SELECT Domain, SUM(Clicks) AS TotalClicks FROM @ClickData GROUP BY Domain;
  • 13. File Read Read Partition Partition Full Agg Write Full Agg Write Full Agg Write Read Partition Partial Agg Partial Agg Partial Agg CNN, FB, WH EXTENT 1 EXTENT 2 EXTENT 3 CNN, FB, WH CNN, FB, WH U-SQL Table Distributed by Domain Read Read Full Agg Full Agg Write Write Read Full Agg Write FB EXTENT 1 WH EXTENT 2 CNN EXTENT 3 Expensive!
  • 14. Benefits of Partitioned Tables Benefits • Partitions are addressable • Enables finer-grained data lifecycle management at partition level • Manage parallelism in querying by number of partitions • Query predicates provide partition elimination • Predicate has to be constant-foldable Use partitioned tables for • Managing large amounts of incrementally growing structured data • Queries with strong locality predicates • point in time, for specific market etc • Managing windows of data • provide data for last x months for processing
  • 15. Benefits of Distribution in Tables Benefits • Design for most frequent/costly queries • Manage data skew in partition/table • Manage parallelism in querying (by number of distributions) • Manage minimizing data movement in joins • Provide distribution seeks and range scans for query predicates (distribution bucket elimination) Distribution in tables is mandatory, chose according to desired benefits
  • 16. Benefits of Clustered Index in Distribution Benefits • Design for most frequent/costly queries • Manage data skew in distribution bucket • Provide locality of same data values • Provide seeks and range scans for query predicates (index lookup) Clustered index in tables is mandatory, chose according to desired benefits Pro Tip: Distribution keys should be prefix of Clustered Index keys
  • 17. // TABLE(s) - Structured Files (24 hours daily log impressions) CREATE TABLE Impressions (Day DateTime, Market string, ClientId int, ... INDEX IX CLUSTERED(Market, ClientId) PARTITIONED BY BUCKETS (Day) HASH(Market, ClientId) INTO 100 ); DECLARE @today DateTime = DateTime.Parse("2015/10/30"); // Market = Vertical Partitioning ALTER TABLE Impressions ADD PARTITION (@today); // … // Daily INSERT(s) INSERT INTO Impressions(Market, ClientId) PARTITION(@today) SELECT * FROM @Q ; // … // Both levels are elimination (H+V) @Impressions = SELECT * FROM dbo.Impressions WHERE Market == "en" AND Day == @today ; U-SQL Optimizations Partition Elimination – TABLE(s) Partition Elimination • Horizontal and vertical partitioning • Horizontal is traditional within file (range, hash, robin) • Vertical is across files (bucketing) • Immutable file system • Design according to your access patterns Enumerate all partitions filtering for today 30.ss 30.1.ss 29.ss 28.ss 29.1.ss Impressions … deen jp de PE across files + within each file
  • 18. @Inpressions = SELECT * FROM searchDM.SML.PageView(@start, @end) AS PageView OPTION(LOWDISTINCTNESS=Query) ; // Q1(A,B) @Sessions = SELECT ClientId, Query, SUM(PageClicks) AS Clicks FROM @Impressions GROUP BY Query, ClientId ; // Q2(B) @Display = SELECT * FROM @Sessions INNER JOIN @Campaigns ON @Sessions.Query == @Campaigns.Query ; U-SQL Optimizations Partitioning – Minimize (re)partitions Input must be partitioned on: (Query) Input must be partitioned on: (Query) or (ClientId) or (Query, ClientId) Optimizer wants to partition only once But Query could be skewed Data Partitioning • Re-Partitioning is very expensive • Many U-SQL operators can handle multiple partitioning choices • Optimizer bases decision upon estimations Wrong statistics may result in worse query performance
  • 19. // Unstructured (24 hours daily log impressions) @Huge = EXTRACT ClientId int, ... FROM @"wasb://ads@wcentralus/2015/10/30/{*}.nif" ; // Small subset (ie: ForgetMe opt out) @Small = SELECT * FROM @Huge WHERE Bing.ForgetMe(x,y,z) OPTION(ROWCOUNT=500) ; // Result (not enough info to determine simple Broadcast join) @Remove = SELECT * FROM Bing.Sessions INNER JOIN @Small ON Sessions.Client == @Small.Client ; U-SQL Optimizations Partitioning - Cardinality Broadcast JOIN right? Broadcast is now a candidate. Wrong statistics may result in worse query performance => CREATE STATISTICS Optimizer has no stats this is small...
  • 21. Partitioned tables Use partitioned tables for querying parts of large amounts of incrementally growing structured data Get partition elimination optimizations with the right query predicates Creating partition table CREATE TABLE PartTable(id int, event_date DateTime, lat float, long float , INDEX idx CLUSTERED (vehicle_id ASC) PARTITIONED BY BUCKETS (event_date) HASH (vehicle_id) INTO 4); Creating partitions DECLARE @pdate1 DateTime = new DateTime(2014, 9, 14, 00,00,00,00,DateTimeKind.Utc); DECLARE @pdate2 DateTime = new DateTime(2014, 9, 15, 00,00,00,00,DateTimeKind.Utc); ALTER TABLE vehiclesP ADD PARTITION (@pdate1), PARTITION (@pdate2); Loading data into partitions dynamically DECLARE @date1 DateTime = DateTime.Parse("2014-09-14"); DECLARE @date2 DateTime = DateTime.Parse("2014-09-16"); INSERT INTO vehiclesP ON INTEGRITY VIOLATION IGNORE SELECT vehicle_id, event_date, lat, long FROM @data WHERE event_date >= @date1 AND event_date <= @date2; • Filters and inserts clean data only, ignore “dirty” data Loading data into partitions statically ALTER TABLE vehiclesP ADD PARTITION (@pdate1), PARTITION (@baddate); INSERT INTO vehiclesP ON INTEGRITY VIOLATION MOVE TO @baddate SELECT vehicle_id, lat, long FROM @data WHERE event_date >= @date1 AND event_date <= @date2; • Filters and inserts clean data only, put “dirty” data into special partition
  • 22. Data “Skew” (aka “a vertex is receiving too much data”)
  • 24. Data Skew U-SQL Table partitioned on Domain Relatively even distribution Extent 2 Extent 3 WH CNNFB Extent 1 U-SQL Table partitioned on Domain Skewed Distribution Extent 2 Extent 3 WH CNNFB Extent 1
  • 25. Why is this a problem? • Vertexes have a 5 hour runtime limit! • Your UDO may excessively allocate memory. • Your memory usage may not be obvious due to garbage collection
  • 27. Data Skew Graph A lot of data brought to a couple of vertexes
  • 28. What are your Options? • Re-partition your input data to get a better distribution • Use a different partitioning scheme • Pick a different key • Use more than one key for partitioning • Use Data Hints to identify “low distinctness” in keys
  • 29. @rows = SELECT Gender, AGG<MyAgg>(Income) AS Result FROM @HugeInput GROUP BY Gender;
  • 31. What are your Options? • Use a Recursive Aggregator (if possible) • If a Row-Level combiner mode (if possible)
  • 32. A non-recursive operation VERTEX 1 1 2 3 4 5 6 7 8 36 Implement a custom SUM aggregator…Implement a custom SUM aggregator…
  • 33. A recursive operation Vertex 3Vertex 2Vertex 1 1 2 3 4 5 6 7 8 6 15 15 36 Not all operations can be made recursive!
  • 35. Learn U-SQL Leverage Native U-SQL Constructs first UDOs are Evil  Can’t optimize UDOs like pure U-SQL code. Understand your Data Volume, Distribution, Partitioning, Growth
  • 36. Additional Resources Documentation Tables and Partitions: https://msdn.microsoft.com/en- us/library/azure/mt621324.aspx Statistics: https://msdn.microsoft.com/en- us/library/azure/mt621312.aspx U-SQL Performance Presentation: http://www.slideshare.net/MichaelRys/usql-query-execution- and-performance-tuning Sample Data https://github.com/Azure/usql/blob/master/Examples/Samples /Data/AmbulanceData Sample Project https://github.com/Azure/usql/tree/master/Examples/Ambulan ceDemos

Editor's Notes

  • #21: https://github.com/Azure/usql/tree/master/Examples/AmbulanceDemos/AmbulanceDemos/5-Ambulance-StreamSets-PartitionedTables