SlideShare a Scribd company logo
ETL for the Masses
Régis Baccaro – IBM
@regbac
Our Sponsors
Introduction

Régis Baccaro

@regbac

http://Theblobfarm.wordpress.com
http://Thelovefarm.wordpress.com

regis@baccaro.com
•
•
•
•
•

Founder and lead organizer of SQL Saturday Denmark
PASS Regional Mentor
Works for IBM
Passionate about the community
.Net developer, BI dude, SharePoint fellow and accidental DBA
Agenda
• Power Query and the M language
• E and T and L with Power Query
• Data refresh techniques with PQ
• Next step
Introduction
• Power Query
• Get data experience
• Filter and combine
• Embedded M for repeatable mashup

• Power Query Formula Language (aka M)
•
•
•
•
•

Mostly pure
Higher-order
Dynamically typed
Partially lazy 
Functional programming language
Elements of language
• Expressions – central construct
• Evaluated to a single vlaue

• Values
•
•
•
•
•

Primitives
List – ordered seq.
Record – set of fields
Table
Function
Evaluation
• Excel-like (surprise !)
• Nested records
• In Records
• In Lists

• Lazy evaluation
• Lists and Records (and let)

• Eager evaluation
• Everything else
Functions and Standard Library
• Mapping from a set of values to a single value
• (named parameters) => function body

• Common set of definitions
Operators
• Meaning varies depending on kind of value

• & = text or list concatenation and records merge
Metadata
• Information about a value that is associated with a value
• A record
• Exists for every value
• Unobtrusive way to add information
• Accessed with Value.Metadata
Let .....in expression
• So far only literal values
• Let allows a set of value to be:
• Computed
• Named
• Used in subsequent expressions that follows the in
let
in

Source = Web.Page(Web.Contents("http://www.cvr.dk/Site/Forms/CompanySearch/CompanySearch.aspx?......),
RowCount = Table.RowCount(Source)
RowCount
IF expression
• Select between 2 expression based on logical condition
Error expression
• When an expression evaluation cannot yield a value
• Raised with error
• Handled with try
• Produces an Error record
• try...otherwise Used with default values
Keywords and Operators
• and as each else error false if in is let meta not
otherwise or section shared then true try type
#binary #date #datetime #datetimezone #duration
#infinity #nan #sections #shared #table #time
• , ; = < <= > >= <> + - * / & ( ) [ ] { } @ ! ? =>
.. ...
The ”E” - Why is Power Query great for Extracting data
• Multiple data sources

Hey wait ! Where is PDW ?
Query folding - A step toward declarative ETL approach
• Declarative vs Imperative
• Query folding similar to predicate pushdown
• Does Power Query have a Query Optimizer ?
• Demo
Query folding - the unofficial list:
• SQL Databases
• OData and OData based sources, such
as the Windows Azure Marketplace
and SharePoint Lists
• Active Directory
• HDFS.Files, Folder.Files, and
Folder.Contents (for basic operations
on paths)

•
•
•
•

Column removal
Renaming
Joins
Type conversions
Real life scenario – ETL for the masses
• Seen a lot of demos
• Build a lot of demos
• They are always so clean !
Real life scenario
Transform
• M is how the magic happens!
• Data manipulation
• Records
• Lists
• Tables

• Merging
• Function calls
What about our scenario?
• Where should I get my data from?
• Pure Excel
• Excel and MDS/DQS/SSIS/SQL
• Web, SQL, XML, ?

• Let me show you ! Input
• (cvr web)
Let’s go to homegrown data?
• Bad web service
• Bad HTML structure
• Let’s go with local data that we can control

Isolated DB

• SQL Server
• Excel

• Let’s Query!
Local storage
Clean up before you merge!
• DQS
Knowledge base with CVR
+ Cleansing project with LinkedIn input
________________________________________
= Demo2.1_AndreasStrandbyClean

+

• Hit ratio increased...

Hit

250

Total

100%
90%
80%

200

70%
60%

150

50%

=

40%

100

30%
20%

50

10%
0

0%

Clean
join

Nested Merge
join
Smarter Power Query
• Expression.Evaluate()
• Examples
• Load query text from file
• Load function from file
• Passing parameters (as constants)

• Demo
Refreshing Power Query data
• Different solutions
• All with flaws !
Refreshing Power Query data – with VB6 !
• Back from 2006
Plus

Minus

Can be scheduled

VB6 – are you kidding ?

More robust than the non-technical
solution

• From Kim GreenLee
Refreshing Power Query data – with PowerShell

Plus

Minus

Robust

Hard to troubleshoot
Can not run in a task in windows task
scheduler unless the user has checked
that the user has to be logged on to run
Refreshing Power Query data – The non-technical way
• Let me show you !
Plus

Minus

Very easy

Not very corporate !
The spreadsheet needs to be open
Excel file not saved
Locked out when it refreshes
Refreshing Power Query data – The non-technical way part 2
• Let me show you !
Plus

Minus

Very easy

Not very corporate !

Uses technique from previous

The spreadsheet needs to be open
Refreshing Power Query data – with SSIS

Plus

Minus

Robust

Requires a SQL Server (wait, it’s a plus!)
Needs a SSIS / C# developer
Refreshing Power Query data – with SSIS
• Using DQS for cleansing input

• Let me show you !
How is Power query going to be used?
• Data store accumulating interesting data points
• Hook into read only data for reporting purposes or data marts
• One file to accumulate (Produce)
• Multiple files or programs to report (Consume)
• I don’t believe in “Data Steward”
• I believe someone will be in charge of procuring and monitoring
data stores of disparate data (such as IT or DBA’s).
Conclusion
• A step toward declarative ETL approach
• Still much work to do !
We have
• A declarative data integration language
• Only surfaced in Power Query
• Can push data to an Excel spreadsheet
Imagine.....
• Connection to heterogenous data sources
THANK YOU!
@REGBAC
HTTP://THEBLOBFARM.WORDPRESS.COM
REGIS@BACCARO.COM
Ad

More Related Content

What's hot (20)

PowerPivot and PowerQuery
PowerPivot and PowerQueryPowerPivot and PowerQuery
PowerPivot and PowerQuery
in4400
 
October2019 release
October2019 releaseOctober2019 release
October2019 release
Berkovich Consulting
 
Power bi introduction
Power bi introductionPower bi introduction
Power bi introduction
Bishwadeb Dey
 
Power BI - WHat It Is, How It Works, and Why It Matters
Power BI -  WHat It Is, How It Works, and Why It MattersPower BI -  WHat It Is, How It Works, and Why It Matters
Power BI - WHat It Is, How It Works, and Why It Matters
John White
 
Ai in power platform
Ai in power platform Ai in power platform
Ai in power platform
Berkovich Consulting
 
How to Get Lightning Fast Answers with Power BI Q&A and Cortana
How to Get Lightning Fast Answers with Power BI Q&A and CortanaHow to Get Lightning Fast Answers with Power BI Q&A and Cortana
How to Get Lightning Fast Answers with Power BI Q&A and Cortana
Vishal Pawar
 
Dax & sql in power bi
Dax & sql in power biDax & sql in power bi
Dax & sql in power bi
Berkovich Consulting
 
PowerPivot, Power View and SharePoint Server
PowerPivot, Power View and SharePoint ServerPowerPivot, Power View and SharePoint Server
PowerPivot, Power View and SharePoint Server
SPC Adriatics
 
Power BI
Power BIPower BI
Power BI
Stéphane Fréchette
 
Leveraging Microsoft Power BI To Support Enterprise Business Intelligence
Leveraging Microsoft Power BI To Support Enterprise Business IntelligenceLeveraging Microsoft Power BI To Support Enterprise Business Intelligence
Leveraging Microsoft Power BI To Support Enterprise Business Intelligence
Rightpoint
 
Powerbimadesimple 150206194215-conversion-gate01
Powerbimadesimple 150206194215-conversion-gate01Powerbimadesimple 150206194215-conversion-gate01
Powerbimadesimple 150206194215-conversion-gate01
Zeeshan Ikram
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
James Serra
 
Best practices to deliver data analytics to the business with power bi
Best practices to deliver data analytics to the business with power biBest practices to deliver data analytics to the business with power bi
Best practices to deliver data analytics to the business with power bi
Satya Shyam K Jayanty
 
August2019 release PowerBI
August2019 release PowerBI August2019 release PowerBI
August2019 release PowerBI
Berkovich Consulting
 
SqlSaturday#699 Power BI - Create a dashboard from zero to hero
SqlSaturday#699 Power BI - Create a dashboard from zero to heroSqlSaturday#699 Power BI - Create a dashboard from zero to hero
SqlSaturday#699 Power BI - Create a dashboard from zero to hero
Vishal Pawar
 
Power BI for CEO
Power BI for CEOPower BI for CEO
Power BI for CEO
Vishal Pawar
 
Primer on Power BI 201506
Primer on Power BI 201506Primer on Power BI 201506
Primer on Power BI 201506
Mark Tabladillo
 
Power BI - Finally I can make decisions based on facts
Power BI - Finally I can make decisions based on factsPower BI - Finally I can make decisions based on facts
Power BI - Finally I can make decisions based on facts
Ulysses Maclaren
 
Power bi
Power biPower bi
Power bi
Jeff Hart
 
Tableau vs PowerBI
Tableau vs PowerBITableau vs PowerBI
Tableau vs PowerBI
walnutmercury
 
PowerPivot and PowerQuery
PowerPivot and PowerQueryPowerPivot and PowerQuery
PowerPivot and PowerQuery
in4400
 
Power bi introduction
Power bi introductionPower bi introduction
Power bi introduction
Bishwadeb Dey
 
Power BI - WHat It Is, How It Works, and Why It Matters
Power BI -  WHat It Is, How It Works, and Why It MattersPower BI -  WHat It Is, How It Works, and Why It Matters
Power BI - WHat It Is, How It Works, and Why It Matters
John White
 
How to Get Lightning Fast Answers with Power BI Q&A and Cortana
How to Get Lightning Fast Answers with Power BI Q&A and CortanaHow to Get Lightning Fast Answers with Power BI Q&A and Cortana
How to Get Lightning Fast Answers with Power BI Q&A and Cortana
Vishal Pawar
 
PowerPivot, Power View and SharePoint Server
PowerPivot, Power View and SharePoint ServerPowerPivot, Power View and SharePoint Server
PowerPivot, Power View and SharePoint Server
SPC Adriatics
 
Leveraging Microsoft Power BI To Support Enterprise Business Intelligence
Leveraging Microsoft Power BI To Support Enterprise Business IntelligenceLeveraging Microsoft Power BI To Support Enterprise Business Intelligence
Leveraging Microsoft Power BI To Support Enterprise Business Intelligence
Rightpoint
 
Powerbimadesimple 150206194215-conversion-gate01
Powerbimadesimple 150206194215-conversion-gate01Powerbimadesimple 150206194215-conversion-gate01
Powerbimadesimple 150206194215-conversion-gate01
Zeeshan Ikram
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
James Serra
 
Best practices to deliver data analytics to the business with power bi
Best practices to deliver data analytics to the business with power biBest practices to deliver data analytics to the business with power bi
Best practices to deliver data analytics to the business with power bi
Satya Shyam K Jayanty
 
SqlSaturday#699 Power BI - Create a dashboard from zero to hero
SqlSaturday#699 Power BI - Create a dashboard from zero to heroSqlSaturday#699 Power BI - Create a dashboard from zero to hero
SqlSaturday#699 Power BI - Create a dashboard from zero to hero
Vishal Pawar
 
Primer on Power BI 201506
Primer on Power BI 201506Primer on Power BI 201506
Primer on Power BI 201506
Mark Tabladillo
 
Power BI - Finally I can make decisions based on facts
Power BI - Finally I can make decisions based on factsPower BI - Finally I can make decisions based on facts
Power BI - Finally I can make decisions based on facts
Ulysses Maclaren
 

Similar to ETL for the masses with Power Query and M (20)

Build a modern data platform.pptx
Build a modern data platform.pptxBuild a modern data platform.pptx
Build a modern data platform.pptx
Ike Ellis
 
Introduction to SharePoint for SQLserver DBAs
Introduction to SharePoint for SQLserver DBAsIntroduction to SharePoint for SQLserver DBAs
Introduction to SharePoint for SQLserver DBAs
Steve Knutson
 
Data Ingestion Engine
Data Ingestion EngineData Ingestion Engine
Data Ingestion Engine
Adam Doyle
 
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Spark Summit
 
Migration from Redshift to Spark
Migration from Redshift to SparkMigration from Redshift to Spark
Migration from Redshift to Spark
Sky Yin
 
Text Mining & Sentiment Analysis with Power BI & Azure
Text Mining & Sentiment Analysis with Power BI & AzureText Mining & Sentiment Analysis with Power BI & Azure
Text Mining & Sentiment Analysis with Power BI & Azure
Sanil Mhatre
 
Data modeling trends for analytics
Data modeling trends for analyticsData modeling trends for analytics
Data modeling trends for analytics
Ike Ellis
 
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
European SharePoint Conference
 
Taming the shrew Power BI
Taming the shrew Power BITaming the shrew Power BI
Taming the shrew Power BI
Kellyn Pot'Vin-Gorman
 
1. SQL Server forSharePoint geeksA gentle introductionThomas Vochten • Septem...
1. SQL Server forSharePoint geeksA gentle introductionThomas Vochten • Septem...1. SQL Server forSharePoint geeksA gentle introductionThomas Vochten • Septem...
1. SQL Server forSharePoint geeksA gentle introductionThomas Vochten • Septem...
BIWUG
 
Taming the shrew, Optimizing Power BI Options
Taming the shrew, Optimizing Power BI OptionsTaming the shrew, Optimizing Power BI Options
Taming the shrew, Optimizing Power BI Options
Kellyn Pot'Vin-Gorman
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
Thang Bui (Bob)
 
Data Modeling on Azure for Analytics
Data Modeling on Azure for AnalyticsData Modeling on Azure for Analytics
Data Modeling on Azure for Analytics
Ike Ellis
 
Breaking data
Breaking dataBreaking data
Breaking data
Terry Bunio
 
DMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4ReportingDMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4Reporting
David Mann
 
Data modeling trends for Analytics
Data modeling trends for AnalyticsData modeling trends for Analytics
Data modeling trends for Analytics
Ike Ellis
 
Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
elliando dias
 
Hadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerHadoop for the Absolute Beginner
Hadoop for the Absolute Beginner
Ike Ellis
 
Power BI Live Data Sets
Power BI Live Data SetsPower BI Live Data Sets
Power BI Live Data Sets
Jens Vestergaard
 
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems
 
Build a modern data platform.pptx
Build a modern data platform.pptxBuild a modern data platform.pptx
Build a modern data platform.pptx
Ike Ellis
 
Introduction to SharePoint for SQLserver DBAs
Introduction to SharePoint for SQLserver DBAsIntroduction to SharePoint for SQLserver DBAs
Introduction to SharePoint for SQLserver DBAs
Steve Knutson
 
Data Ingestion Engine
Data Ingestion EngineData Ingestion Engine
Data Ingestion Engine
Adam Doyle
 
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Spark Summit
 
Migration from Redshift to Spark
Migration from Redshift to SparkMigration from Redshift to Spark
Migration from Redshift to Spark
Sky Yin
 
Text Mining & Sentiment Analysis with Power BI & Azure
Text Mining & Sentiment Analysis with Power BI & AzureText Mining & Sentiment Analysis with Power BI & Azure
Text Mining & Sentiment Analysis with Power BI & Azure
Sanil Mhatre
 
Data modeling trends for analytics
Data modeling trends for analyticsData modeling trends for analytics
Data modeling trends for analytics
Ike Ellis
 
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
European SharePoint Conference
 
1. SQL Server forSharePoint geeksA gentle introductionThomas Vochten • Septem...
1. SQL Server forSharePoint geeksA gentle introductionThomas Vochten • Septem...1. SQL Server forSharePoint geeksA gentle introductionThomas Vochten • Septem...
1. SQL Server forSharePoint geeksA gentle introductionThomas Vochten • Septem...
BIWUG
 
Taming the shrew, Optimizing Power BI Options
Taming the shrew, Optimizing Power BI OptionsTaming the shrew, Optimizing Power BI Options
Taming the shrew, Optimizing Power BI Options
Kellyn Pot'Vin-Gorman
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
Thang Bui (Bob)
 
Data Modeling on Azure for Analytics
Data Modeling on Azure for AnalyticsData Modeling on Azure for Analytics
Data Modeling on Azure for Analytics
Ike Ellis
 
DMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4ReportingDMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4Reporting
David Mann
 
Data modeling trends for Analytics
Data modeling trends for AnalyticsData modeling trends for Analytics
Data modeling trends for Analytics
Ike Ellis
 
Hadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerHadoop for the Absolute Beginner
Hadoop for the Absolute Beginner
Ike Ellis
 
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems
 
Ad

Recently uploaded (20)

Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Ad

ETL for the masses with Power Query and M

  • 1. ETL for the Masses Régis Baccaro – IBM @regbac
  • 3. Introduction Régis Baccaro @regbac http://Theblobfarm.wordpress.com http://Thelovefarm.wordpress.com [email protected] • • • • • Founder and lead organizer of SQL Saturday Denmark PASS Regional Mentor Works for IBM Passionate about the community .Net developer, BI dude, SharePoint fellow and accidental DBA
  • 4. Agenda • Power Query and the M language • E and T and L with Power Query • Data refresh techniques with PQ • Next step
  • 5. Introduction • Power Query • Get data experience • Filter and combine • Embedded M for repeatable mashup • Power Query Formula Language (aka M) • • • • • Mostly pure Higher-order Dynamically typed Partially lazy  Functional programming language
  • 6. Elements of language • Expressions – central construct • Evaluated to a single vlaue • Values • • • • • Primitives List – ordered seq. Record – set of fields Table Function
  • 7. Evaluation • Excel-like (surprise !) • Nested records • In Records • In Lists • Lazy evaluation • Lists and Records (and let) • Eager evaluation • Everything else
  • 8. Functions and Standard Library • Mapping from a set of values to a single value • (named parameters) => function body • Common set of definitions
  • 9. Operators • Meaning varies depending on kind of value • & = text or list concatenation and records merge
  • 10. Metadata • Information about a value that is associated with a value • A record • Exists for every value • Unobtrusive way to add information • Accessed with Value.Metadata
  • 11. Let .....in expression • So far only literal values • Let allows a set of value to be: • Computed • Named • Used in subsequent expressions that follows the in let in Source = Web.Page(Web.Contents("http://www.cvr.dk/Site/Forms/CompanySearch/CompanySearch.aspx?......), RowCount = Table.RowCount(Source) RowCount
  • 12. IF expression • Select between 2 expression based on logical condition
  • 13. Error expression • When an expression evaluation cannot yield a value • Raised with error • Handled with try • Produces an Error record • try...otherwise Used with default values
  • 14. Keywords and Operators • and as each else error false if in is let meta not otherwise or section shared then true try type #binary #date #datetime #datetimezone #duration #infinity #nan #sections #shared #table #time • , ; = < <= > >= <> + - * / & ( ) [ ] { } @ ! ? => .. ...
  • 15. The ”E” - Why is Power Query great for Extracting data • Multiple data sources Hey wait ! Where is PDW ?
  • 16. Query folding - A step toward declarative ETL approach • Declarative vs Imperative • Query folding similar to predicate pushdown • Does Power Query have a Query Optimizer ? • Demo Query folding - the unofficial list: • SQL Databases • OData and OData based sources, such as the Windows Azure Marketplace and SharePoint Lists • Active Directory • HDFS.Files, Folder.Files, and Folder.Contents (for basic operations on paths) • • • • Column removal Renaming Joins Type conversions
  • 17. Real life scenario – ETL for the masses • Seen a lot of demos • Build a lot of demos • They are always so clean !
  • 19. Transform • M is how the magic happens! • Data manipulation • Records • Lists • Tables • Merging • Function calls
  • 20. What about our scenario? • Where should I get my data from? • Pure Excel • Excel and MDS/DQS/SSIS/SQL • Web, SQL, XML, ? • Let me show you ! Input • (cvr web)
  • 21. Let’s go to homegrown data? • Bad web service • Bad HTML structure • Let’s go with local data that we can control Isolated DB • SQL Server • Excel • Let’s Query! Local storage
  • 22. Clean up before you merge! • DQS Knowledge base with CVR + Cleansing project with LinkedIn input ________________________________________ = Demo2.1_AndreasStrandbyClean + • Hit ratio increased... Hit 250 Total 100% 90% 80% 200 70% 60% 150 50% = 40% 100 30% 20% 50 10% 0 0% Clean join Nested Merge join
  • 23. Smarter Power Query • Expression.Evaluate() • Examples • Load query text from file • Load function from file • Passing parameters (as constants) • Demo
  • 24. Refreshing Power Query data • Different solutions • All with flaws !
  • 25. Refreshing Power Query data – with VB6 ! • Back from 2006 Plus Minus Can be scheduled VB6 – are you kidding ? More robust than the non-technical solution • From Kim GreenLee
  • 26. Refreshing Power Query data – with PowerShell Plus Minus Robust Hard to troubleshoot Can not run in a task in windows task scheduler unless the user has checked that the user has to be logged on to run
  • 27. Refreshing Power Query data – The non-technical way • Let me show you ! Plus Minus Very easy Not very corporate ! The spreadsheet needs to be open Excel file not saved Locked out when it refreshes
  • 28. Refreshing Power Query data – The non-technical way part 2 • Let me show you ! Plus Minus Very easy Not very corporate ! Uses technique from previous The spreadsheet needs to be open
  • 29. Refreshing Power Query data – with SSIS Plus Minus Robust Requires a SQL Server (wait, it’s a plus!) Needs a SSIS / C# developer
  • 30. Refreshing Power Query data – with SSIS • Using DQS for cleansing input • Let me show you !
  • 31. How is Power query going to be used? • Data store accumulating interesting data points • Hook into read only data for reporting purposes or data marts • One file to accumulate (Produce) • Multiple files or programs to report (Consume) • I don’t believe in “Data Steward” • I believe someone will be in charge of procuring and monitoring data stores of disparate data (such as IT or DBA’s).
  • 32. Conclusion • A step toward declarative ETL approach • Still much work to do ! We have • A declarative data integration language • Only surfaced in Power Query • Can push data to an Excel spreadsheet Imagine..... • Connection to heterogenous data sources