DWI - Lecture - 3 - SQL Pivot Case Grouping
Data Warehouses
• Agenda:
Short reminder
BI ecosystem
Data / information / knowledge
Introduction to Tableau
Querying data using SQL – data analysis perspective
Business Advantage
Driven by data
Enterprises today are driven by data, or, to be more precise, by information that
is gleaned from data
• It sheds light on what is unknown, reduces uncertainty, and turns decision-making from an
art into a science.
But whether it’s Big Data or just plain old data, it requires a lot of work
before it becomes something useful
This is where you need data integration to unify and manage the data, data
warehousing to store and stage it, and BI to present it to decision-makers in
an understandable way.
• It can be a long and complicated process, but there is a path – there are guidelines and best
practices.
OLTP / OLAP
• Goal is to:
Allow for measuring specific operations
Develop an integrated perspective
Enhance competitive advantage
Data Warehouses - 2021/22
Business Intelligence
Every day your business creates an overwhelming amount and variety of data. In order to make smarter decisions,
identify problems and be profitable, you need methods and tools to turn your data into actionable insights.
Business Intelligence
What is business intelligence?
• BI is the process of collecting, storing and
analyzing data from business operations.
• BI provides comprehensive business metrics,
in near-real time, to support better decision
making.
• “… an umbrella term that includes the applications, infrastructure and tools,
and best practices that enable access to and analysis of information to
improve and optimize decisions and performance.”
• CIO.com defines business intelligence as:
“Companies use BI to improve decision making, cut costs and identify new
business opportunities. BI is more than just corporate reporting and more
than a set of tools to coax data out of enterprise systems. CIOs use BI to
identify inefficient business processes that are ripe for re-engineering.”
Business Intelligence
• The data-information-knowledge-wisdom (DIKW) hierarchy
“Typically information is defined in terms of data,
knowledge in terms of information, and wisdom in
• Information
the result of collecting and organizing data in a way that
establishes relationships between data items, which thereby
provides context and meaning.
• Knowledge
the concept of understanding information based on recognized
patterns in a way that provides insight into the information.
It is closely linked to doing and implies know-how and
understanding (Davenport & Prusak 2000)
Data / Information / Knowledge
• A dummy example
Data:
It’s raining
Information:
The temperature outside dropped 5 degrees, the humidity went up
by 5% in one hour and then it started raining at 3 pm.
• Information:
then you get everything ready by washing, peeling, and cutting
up the vegetables, cutting up the chicken, and opening the cans of
broth, you put it all in the pot and turn on the heat where it cooks
and becomes soup.
• Knowledge:
Now the soup is ready to be put into bowls and eaten
In the DW/BI world business people consume the information in
reports to gain knowledge that helps them make informed business
decisions
Data / Information / Knowledge
• Two different angles:
As per the contextual
concept,
moves from a phase of
gathering data parts (data),
the connection of raw data
parts (information),
formation of whole
meaningful contents
(knowledge) and
conceptualize and joining
source: http://certguidance.com
Data / Information / Knowledge
• Involves
infrastructure of managing data
• Involves
analytical components, such as data warehousing, online
• Incorporates
Tools to perform analytical processing
Turning Knowledge into Actionable Plans
• Actionable Knowledge
Discovered knowledge is of little value if there is no value-
producing action that can be taken as a consequence of
gaining that knowledge.
This means that the business-technology partnership must
Transactional Data
[Figure: charts of Order Amount / ID]
Analytical Data
• Analytical Data
Each data item is identifiable by a unique set of dimension
values
dimension1, dimension2, …, dimensionN → measure
dimension1, dimension2, …, dimensionN → measure1, …, measureM
N-dimensional structure
      C1    C2    …
P1    100   105   …
P2    0     120   …
Analytical Data
• Pivot Table
Pivot tables allow you to quickly summarize and analyze
complex data
Follows the analytical stance on data
dimension-wise rather than transaction-wise
Dynamic creation of aggregations and summaries
Time  Product  C1    C2    …   *
T1    P1       100   105   …   205
T1    …        …     …     …   …
T1    *        100   105   …   *
T2    P1       0     0     …   0
T2    P2       0     120   …   120
T2    *        0     120   …   *
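The dimension-wise view behind a pivot table can be sketched in a few lines of Python. This is only an illustration: the three transaction rows are sample values chosen to match the small table above, and the helper name `pivot` is hypothetical.

```python
from collections import defaultdict

# Transaction-wise rows: (product, customer, time, sales).
# Sample values only, chosen to mirror the slide's small table.
rows = [
    ("P1", "C1", "T1", 100),
    ("P1", "C2", "T1", 105),
    ("P2", "C2", "T2", 120),
]

def pivot(rows, row_idx, col_idx, value_idx=3):
    """Sum the measure for every (row dimension, column dimension) pair."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_idx]][r[col_idx]] += r[value_idx]
    return {k: dict(v) for k, v in table.items()}

# The same transactions summarized along two different pairs of dimensions.
by_product_customer = pivot(rows, row_idx=0, col_idx=1)
by_time_customer = pivot(rows, row_idx=2, col_idx=1)
print(by_product_customer)
print(by_time_customer)
```

Choosing other indexes for rows and columns gives another summary of the same data, which is what the Rows / Columns / Values areas of a pivot table do interactively.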
Analytical Data
Product  Customer  Time  Sales
P1       C1        T1    100
Analytical Data
Customer  FirstName  …
C1        C1         …
C2        C2         …

Product  Name  …
P1       P1    …
P2       P2    …

Product  Customer  Time  Sales
P1       C1        T1    100
Analytical Data
• Pivot Table
Rows
Columns
Values
• “Relationships are a
dynamic, flexible way to
combine data from multiple
tables for analysis”
No join type
Just specify matching
attributes
Automated
• https://www.tableau.com/learn/tutorials/on-demand/getting-started-10-0?playlist=200556
• CASE
Evaluates a list of conditions and returns one of multiple
possible result expressions
similar to SWITCH in programming languages
Can be used in
• CASE example:
SELECT ProductNumber,
CASE ProductLine
WHEN 'R' THEN 'Road'
WHEN 'M' THEN 'Mountain'
WHEN 'T' THEN 'Touring'
WHEN 'S' THEN 'Other sale items'
END AS Category -- column alias chosen for illustration
FROM Production.Product;
• CASE example:
SELECT ProductNumber, Name AS ProductName,
Road = CASE ProductLine WHEN 'R' THEN 'X' ELSE '' END,
Mountain = CASE ProductLine WHEN 'M' THEN 'X' ELSE '' END,
Touring = CASE ProductLine WHEN 'T' THEN 'X' ELSE '' END
FROM Production.Product
• CASE example:
SELECT Title,
CASE WHEN Title IN ('Ms.','Mrs.','Miss') THEN 'Female'
WHEN Title = 'Mr.' THEN 'Male'
ELSE 'Unknown' END AS Gender
FROM Person.Person
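The two CASE forms used above (simple and searched) can be tried without a SQL Server instance. The sketch below runs them on an in-memory SQLite database through Python's `sqlite3` module; SQLite is only a stand-in for T-SQL here, and the table contents and product numbers are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductNumber TEXT, ProductLine TEXT)")
# Hypothetical sample rows; the last product has no product line (NULL).
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [("P-001", "R"), ("P-002", "M"), ("P-003", None)],
)

# Simple CASE: one expression is compared against a list of values.
simple = conn.execute(
    """
    SELECT ProductNumber,
           CASE ProductLine
               WHEN 'R' THEN 'Road'
               WHEN 'M' THEN 'Mountain'
               ELSE 'Not for sale'
           END AS Category
    FROM Product
    ORDER BY ProductNumber
    """
).fetchall()

# Searched CASE: independent boolean conditions, evaluated in order.
searched = conn.execute(
    """
    SELECT ProductNumber,
           CASE WHEN ProductLine IS NULL THEN 'No line'
                WHEN ProductLine IN ('R', 'M') THEN 'Bike'
                ELSE 'Other'
           END AS Kind
    FROM Product
    ORDER BY ProductNumber
    """
).fetchall()

print(simple)
print(searched)
```

A NULL never matches a WHEN value in the simple form, so the NULL row falls through to ELSE; the searched form can test for it explicitly with IS NULL.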
• PIVOT
Create pivot table in SQL
SELECT
[non-pivoted column(s)], -- optional
[pivoted column(s)]
FROM ( SELECT query producing sql data for pivot ) AS
TableAlias
PIVOT (
aggregate_function(value column)
FOR [pivot column] IN ([pivoted column(s)])
) AS PivotAlias
• PIVOT construct
Quite complex queries
All cases need to be known in advance
Example:
SELECT 'AverageCost' AS Cost_Sorted_By_Production_Days,
[0], [1], [2], [3], [4]
…
• PIVOT example:
SELECT
CustomerID, TotalDue, DATEPART(quarter, OrderDate)
AS QR
FROM Sales.SalesOrderHeader
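• PIVOT example:
SELECT CustomerID, [1], [2], [3], [4]
FROM
(SELECT CustomerID, TotalDue, DATEPART(quarter,
OrderDate) AS QR
FROM Sales.SalesOrderHeader
) AS SourceTable
PIVOT (
SUM(TotalDue) -- aggregate assumed for illustration
FOR QR IN ([1], [2], [3], [4])
) AS PivotTable;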
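A quarter-per-column pivot like the one above can also be written with CASE inside an aggregate, which works in engines without a PIVOT operator. The sketch below uses SQLite through Python's `sqlite3` as a stand-in; the `Orders` table, its rows, and the choice of SUM as the aggregate are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source data: the quarter (QR) is already extracted per order.
conn.execute("CREATE TABLE Orders (CustomerID INTEGER, TotalDue REAL, QR INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 10.0, 1), (1, 5.0, 1), (1, 7.0, 3), (2, 20.0, 4)],
)

# As with PIVOT, every output column has to be known in advance.
quarters = [1, 2, 3, 4]
columns = ", ".join(
    f"SUM(CASE WHEN QR = {q} THEN TotalDue END) AS Q{q}" for q in quarters
)

pivoted = conn.execute(
    f"SELECT CustomerID, {columns} FROM Orders "
    "GROUP BY CustomerID ORDER BY CustomerID"
).fetchall()

for row in pivoted:
    print(row)
```

Quarters without any order come back as NULL (`None` in Python), because the CASE expression yields NULL for non-matching rows and SUM ignores NULLs.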
• UNPIVOT example:
SELECT Color, DTM, Value
FROM
(SELECT Color, One, Two, Three, More FROM pvt) p
UNPIVOT
(Value FOR DTM IN (One, Two, Three, More) ) AS unpvt;
GO
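UNPIVOT turns columns back into rows. Where the operator is not available, the same result can be approximated with one SELECT per column combined by UNION ALL. The sketch below does this in SQLite via Python's `sqlite3`; the `pvt` table is reduced to two value columns and filled with made-up numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, shortened version of the pivoted table (only One and Two).
conn.execute("CREATE TABLE pvt (Color TEXT, One INTEGER, Two INTEGER)")
conn.executemany(
    "INSERT INTO pvt VALUES (?, ?, ?)", [("Red", 3, None), ("Blue", 1, 4)]
)

unpivoted = conn.execute(
    """
    SELECT Color, DTM, Value FROM (
        SELECT Color, 'One' AS DTM, One AS Value FROM pvt
        UNION ALL
        SELECT Color, 'Two' AS DTM, Two AS Value FROM pvt
    )
    WHERE Value IS NOT NULL      -- T-SQL UNPIVOT also drops NULL cells
    ORDER BY Color, DTM
    """
).fetchall()

for row in unpivoted:
    print(row)
```

Each former column name becomes a value of the new DTM column, and each non-NULL cell becomes one row.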
Grouping Sets
• Available groupings
TIME, PRODUCT, LOCATION, SUPPLIER
• GROUPING SETS
Efficiently replaces a series of UNION-ed queries
The issue of NULL values
The new grouping functions generate NULL values at the subtotal
levels
The result then mixes generated NULLs (subtotal markers) with real NULLs
from the data itself
GROUPING(column) returns 1 for a generated NULL and 0 otherwise
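To make the UNION equivalence and the NULL issue concrete, the sketch below emulates GROUP BY GROUPING SETS ((year, brand), (year)) with two UNION ALL-ed queries. It runs on SQLite through Python's `sqlite3`, which is only a stand-in because SQLite has no GROUPING SETS; the table contents and the `is_subtotal` flag column are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, brand TEXT, qty INTEGER)")
# One row deliberately has a real NULL brand in the data itself.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(2021, "A", 10), (2021, None, 5), (2022, "A", 7)],
)

# is_subtotal acts as a grouping indicator: 1 marks a generated NULL,
# 0 marks a detail row (whose brand may still be a real NULL).
rows = conn.execute(
    """
    SELECT year, brand, SUM(qty) AS qty, 0 AS is_subtotal
    FROM sales GROUP BY year, brand
    UNION ALL
    SELECT year, NULL, SUM(qty), 1
    FROM sales GROUP BY year
    ORDER BY year, is_subtotal, brand
    """
).fetchall()

for row in rows:
    print(row)
```

For 2021 the output contains two rows with a NULL brand: one is the real NULL from the data (qty 5), the other is the generated subtotal (qty 15). Only the flag tells them apart.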
• ROLL-UP
produces a result set that contains subtotal rows
in addition to the regularly grouped rows
ROLLUP builds its subtotals level by level, following the order of
columns in the GROUP BY clause
• ROLL-UP
GROUP BY ROLLUP (a, b, c) is equivalent to
GROUP BY GROUPING SETS
((a, b, c), (a, b), (a), ())
N elements of the ROLLUP translate to (N+1) grouping sets
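The N+1 rule can be checked with a small Python helper that lists the grouping sets a ROLLUP expands to. The function name `rollup_grouping_sets` is hypothetical; it only enumerates column lists and does not run any SQL.

```python
def rollup_grouping_sets(columns):
    """Return the grouping sets implied by GROUP BY ROLLUP(columns).

    ROLLUP drops columns from the right, one at a time, down to the
    empty set (the grand total).
    """
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]

sets_abc = rollup_grouping_sets(["a", "b", "c"])
print(sets_abc)       # the four grouping sets for ROLLUP (a, b, c)
print(len(sets_abc))  # N + 1 for N = 3
```

Because columns are only ever dropped from the right, the column order inside ROLLUP matters: ROLLUP (a, b) and ROLLUP (b, a) produce different subtotals.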
• ROLL-UP example:
SELECT year, brand, SUM(qty) FROM sales
GROUP BY ROLLUP(year, brand);
• GROUP BY ROLLUP
Only hierarchical grouping is used
SELECT Time, Product, Location, Supplier, SUM(Gain) FROM Sales
GROUP BY ROLLUP (Time, Product, Location, Supplier);
• GROUP BY ROLLUP
SELECT [Academic year], Name, AVG(Grade) FROM [Students grades] AS G
GROUP BY ROLLUP([Academic year], Name);
• CUBE operator:
contains all the subtotal rows of a roll-up
in addition contains all cross-tabulation rows
• CUBE:
GROUP BY CUBE (a, b, c)
is equivalent to
GROUP BY GROUPING SETS ((a, b, c), (a, b), (a, c), (b, c), (a), (b),
(c), ())
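CUBE expands to every subset of its columns, so N columns give 2^N grouping sets. The helper below is a hypothetical Python sketch that only enumerates those subsets; it does not execute SQL.

```python
from itertools import combinations

def cube_grouping_sets(columns):
    """Return every subset of columns, largest first, as CUBE groups them."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]

sets_abc = cube_grouping_sets(["a", "b", "c"])
print(sets_abc)       # all subsets of (a, b, c), down to the empty set
print(len(sets_abc))  # 2 ** 3
```

With four dimensions such as Time, Product, Location and Supplier, CUBE yields 16 grouping sets where ROLLUP yields 5, which is why CUBE results grow quickly.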
• CUBE example:
SELECT year, brand, SUM(qty) FROM sales
GROUP BY CUBE (year, brand);
• GROUP BY CUBE
All grouping combinations are used
SELECT Time, Product, Location, Supplier, SUM(Gain) FROM Sales
GROUP BY CUBE (Time, Product, Location, Supplier);
Sum = 205
Window
• Query
select CustomerName,
SUM(OrderAmt) OVER () as NoParm
from XXX
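An empty OVER () makes the whole result set the window, so the aggregate is repeated on every row without collapsing the rows. The sketch below shows this on SQLite through Python's `sqlite3` (window functions need SQLite 3.25 or newer); the `Orders` table and its two rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (CustomerName TEXT, OrderAmt INTEGER)")
# Hypothetical customers and amounts.
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [("Ann", 100), ("Bob", 105)])

# OVER () with no arguments: every row sees the grand total next to
# its own detail values, unlike GROUP BY, which would return one row.
rows = conn.execute(
    """
    SELECT CustomerName, OrderAmt, SUM(OrderAmt) OVER () AS NoParm
    FROM Orders
    ORDER BY CustomerName
    """
).fetchall()

for row in rows:
    print(row)
```

Both detail rows are kept and each carries the same total in the NoParm column.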
• Can be rewritten:
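• Example
select
P.name, PS.Name, PC.Name, AVG(P.ListPrice) OVER () as AVG2
from
[Production].[Product] AS P
-- joins assumed from the PS / PC aliases
JOIN [Production].[ProductSubcategory] AS PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID
JOIN [Production].[ProductCategory] AS PC ON PS.ProductCategoryID = PC.ProductCategoryID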
• OVER clause
defines the partitioning and ordering of rows (i.e. a window)
for the above functions to operate on – hence, called window
functions.
PARTITION BY
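• Example
select
P.name, PS.Name, PC.Name, AVG(P.ListPrice) OVER () as AVG2,
AVG(P.ListPrice) OVER (Partition by P.Name) as AVG3
from
[Production].[Product] AS P
-- joins assumed from the PS / PC aliases
JOIN [Production].[ProductSubcategory] AS PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID
JOIN [Production].[ProductCategory] AS PC ON PS.ProductCategoryID = PC.ProductCategoryID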
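The difference between OVER () and OVER (PARTITION BY …) is easiest to see side by side. The sketch below uses SQLite through Python's `sqlite3` (3.25 or newer) with a simplified, hypothetical `Product` table that stores the category name directly instead of joining to category tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in table; names and prices are made up.
conn.execute("CREATE TABLE Product (Name TEXT, Category TEXT, ListPrice REAL)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [
        ("Bike A", "Bikes", 100.0),
        ("Bike B", "Bikes", 300.0),
        ("Helmet", "Accessories", 50.0),
    ],
)

rows = conn.execute(
    """
    SELECT Name,
           AVG(ListPrice) OVER () AS AvgAll,                        -- one window: all rows
           AVG(ListPrice) OVER (PARTITION BY Category) AS AvgInCat  -- one window per category
    FROM Product
    ORDER BY Name
    """
).fetchall()

for row in rows:
    print(row)
```

AvgAll is identical on every row, while AvgInCat changes whenever the category changes. Partitioning by a column that is unique per row would simply return each row's own value.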
• Ranking
Ranking functions return a ranking value for each row in a
partition
Depending on the function that is used
some rows might receive the same value as other rows.
• ROW_NUMBER
returns a sequential row number for each row within the partition,
based on the partition and order.
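How the ranking functions treat ties can be compared on a tiny data set. The sketch below runs ROW_NUMBER, RANK and DENSE_RANK on SQLite through Python's `sqlite3` (3.25 or newer); the `Scores` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (name TEXT, score INTEGER)")
# Two rows tie on the highest score.
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?)",
    [("a", 90), ("b", 90), ("c", 80), ("d", 70)],
)

rows = conn.execute(
    """
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS rn,   -- always unique
           RANK()       OVER (ORDER BY score DESC)       AS rnk,  -- ties share, gaps follow
           DENSE_RANK() OVER (ORDER BY score DESC)       AS drnk  -- ties share, no gaps
    FROM Scores
    ORDER BY rn
    """
).fetchall()

for row in rows:
    print(row)
```

ROW_NUMBER needs a tie-breaker (here `name`) to be deterministic; RANK skips the value 2 after the tie, while DENSE_RANK continues with 2.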
• NTILE
groups the data into ordered and ranked groups
based on the ORDER BY clause
based on the number of groups used
specified as function’s argument
if you specify 4 groups to produce a quartile ranking,
4 ranked values (1-4) will be assigned to each group based on the
• NTILE
If a PARTITION BY clause is used, the NTILE ranking will
be applied within each partition
As with the other ranking functions, the ORDER BY clause is
required to use this function
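A quartile ranking with NTILE(4) can be sketched as follows, again on SQLite through Python's `sqlite3` (3.25 or newer). The `Payments` table and its six amounts are made up; six rows are used on purpose so that the groups cannot all have the same size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payments (amount INTEGER)")
conn.executemany(
    "INSERT INTO Payments VALUES (?)", [(10,), (20,), (30,), (40,), (50,), (60,)]
)

# NTILE(4) splits the ordered rows into 4 groups and numbers them 1-4.
rows = conn.execute(
    """
    SELECT amount, NTILE(4) OVER (ORDER BY amount) AS quartile
    FROM Payments
    ORDER BY amount
    """
).fetchall()

tiles = [quartile for _, quartile in rows]
print(tiles)
```

When the row count is not divisible by the number of groups, the earlier groups receive the extra rows, so the first two groups hold two rows each and the last two hold one.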
• Example:
• Functions:
Position Value Functions:
LAG, LEAD, FIRST_VALUE and LAST_VALUE
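The position value functions read a value from another row of the window. A minimal sketch on SQLite through Python's `sqlite3` (3.25 or newer), with a hypothetical `Monthly` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Monthly (month INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO Monthly VALUES (?, ?)", [(1, 10), (2, 15), (3, 12)])

rows = conn.execute(
    """
    SELECT month,
           LAG(amt)         OVER (ORDER BY month) AS prev_amt,   -- previous row
           LEAD(amt)        OVER (ORDER BY month) AS next_amt,   -- next row
           FIRST_VALUE(amt) OVER (ORDER BY month) AS first_amt,
           LAST_VALUE(amt)  OVER (
               ORDER BY month
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_amt
    FROM Monthly
    ORDER BY month
    """
).fetchall()

for row in rows:
    print(row)
```

LAG and LEAD return NULL where there is no previous or next row. LAST_VALUE is given an explicit frame here because the default frame ends at the current row, which would make it return the current row's own value.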
• Percentile Functions:
CUME_DIST, PERCENT_RANK
return a ranking value
as the ranking percentages move from lowest to highest, each
group’s percent value includes all of the earlier values in the
calculation as well
CUME_DIST
(0,1] that represents the percent of rows that have a value less
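Both percentile functions can be compared on a small set with a tie. The sketch below uses SQLite through Python's `sqlite3` (3.25 or newer); the `Vals` table and its four values are hypothetical. CUME_DIST is the share of rows with a value less than or equal to the current one, and PERCENT_RANK is (rank - 1) / (rows - 1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Vals (v INTEGER)")
conn.executemany("INSERT INTO Vals VALUES (?)", [(10,), (20,), (20,), (30,)])

rows = conn.execute(
    """
    SELECT v,
           CUME_DIST()    OVER (ORDER BY v) AS cume_dist,     -- in (0, 1]
           PERCENT_RANK() OVER (ORDER BY v) AS percent_rank   -- in [0, 1]
    FROM Vals
    ORDER BY v
    """
).fetchall()

for row in rows:
    print(row)
```

The two tied rows share the same values: three of the four rows are less than or equal to 20, and their rank of 2 gives a percent rank of one third.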
• Sum up
Case
Pivot/UnPivot
Window
OVER clause
Ranking
ROW_NUMBER, RANK / DENSE_RANK, NTILE
Bibliography
• David Loshin, Business Intelligence: The Savvy Manager's Guide, Getting Onboard with Emerging IT, Morgan Kaufmann Publishers, 2003
• Swain Scheps, Business Intelligence for Dummies, Wiley Publishing, 2008
• Reza Rad, Microsoft SQL Server 2014 Business Intelligence Development Beginner's Guide, Packt Publishing, 2014
• Paul Weinberg, James Groff and Andrew Oppel, SQL The Complete Reference, Third Edition, McGraw-Hill, 2010
• Scott Shaw and Kathi Kellenberger, Beginning T-SQL 2012