
DWI - Lecture - 3 - SQL Pivot Case Groupping

The document discusses business intelligence (BI) and data warehousing. It defines BI as the process of collecting, storing, and analyzing business data to provide metrics and insights that support decision making. Data warehousing is used to consolidate enterprise data to create a common data model and enable consistent reporting. The goal of BI is to allow measuring operations, developing an integrated perspective, and enhancing competitive advantage through informed decisions.

Uploaded by

Khush Saraswat

Lecture 3

Data Warehouses

Data Warehouses - 2021/22


Lecture Goals

• Agenda:
➢ Short reminder
▪ BI ecosystem
▪ Data / information / knowledge
➢ Introduction to Tableau
➢ Querying data using SQL – data analysis perspective
▪ PIVOT
▪ CASE
▪ Grouping queries – Grouping Sets / Rollup / Cube

Reminder


Operational database is not enough
Special knowledge about a business situation

• conveys business advantage
• allows for proper responses and decisions

21st-century information processing

• we should learn how to think about a company's data as a corporate information asset, one that can be manipulated in different ways to corporate benefit


Business Advantage
Driven by data

Enterprises are adept at gathering all sorts of data about their customers, prospects, internal business processes, suppliers, partners, and competitors.
• Capturing data, however, is just the beginning.

Enterprises today are driven by data, or, to be more precise, information that is gleaned from data.
• It sheds light on what is unknown, it reduces uncertainty, and it turns decision-making from an art to a science.

But whether it’s Big Data or just plain old data, it requires a lot of work before it is actually something useful.
• You would not want to eat a cup of flour, but baked into a cake with butter, eggs, and sugar for the right amount of time at the right temperature it is transformed into something delicious.
• Raw data is not suitable for business people who need it to make decisions.
• It is inconsistent, incomplete, outdated, unformatted, and riddled with errors.
• Raw data needs integration, design, modelling, architecting, and other work before it can be transformed into consumable information.

This is where you need data integration to unify and manage the data, data warehousing to store and stage it, and BI to present it to decision-makers in an understandable way.
• It can be a long and complicated process, but there is a path: there are guidelines and best practices.
OLTP / OLAP

Business Intelligence
• BI environment is used by decision makers to:
➢ Understand
▪ the status of the organization
➢ Collaborate
▪ on a shared view of data, business and presentation logic
➢ Reduce
▪ the time to decision

• Goal is to:
➢ Allow for measuring specific operations
➢ Develop an integrated perspective
➢ Enhance competitive advantage
Business Intelligence
Every day your business creates an overwhelming amount and variety of data. In order to make smarter decisions,
identify problems and be profitable, you need methods and tools to turn your data into actionable insights.
Business Intelligence
What is business intelligence?
• BI is the process of collecting, storing and analyzing data from business operations.
• BI provides comprehensive business metrics, in near-real time, to support better decision making.

You can create performance benchmarks, spot market trends, increase compliance, and improve almost every aspect of your business with better business intelligence.
Business Intelligence
• BI
➢ parses all the data generated by a business
➢ and presents findings in easy-to-digest reports, summaries, performance measures, dashboards, graphs, charts, maps and trends
➢ to provide users with detailed intelligence about the state of the business and drive management decisions.

• Delivers reports, dashboards and scorecards tailored to each user’s role and populated with metrics aligned with strategic objectives and goals.

• This top-down style is powered by a classic data warehousing structure
➢ consolidates enterprise data and enforces information consistency by transforming shared data into a common data model (schema) and BI semantic layer (metadata).
Business Intelligence
• BI is focused on creating operational efficiency through access to data, enabling individuals to perform their job functions most effectively.

• BI also includes analysis of historical data from multiple sources, enabling informed decision making as well as problem identification and resolution.

The main use of business intelligence is to help business units, managers, top executives and other operational workers make better-informed decisions backed up with accurate data
➢ e.g. help spot new business opportunities, cut costs, or identify inefficient processes that need reengineering
Business Intelligence
Traditional Business Intelligence originally emerged in the 1960s as a system of sharing information across organizations.

It further developed in the 1980s alongside computer models for decision-making and turning data into insights, before becoming a specific offering from BI teams with IT-reliant service solutions.

Modern BI solutions prioritize flexible self-service analysis, governed data on trusted platforms, empowered business users, and speed to insight.
Business Intelligence
• Over the past few years, business intelligence has evolved to include more processes and activities to help improve performance:
➢ Data mining: Using databases, statistics and machine learning to uncover trends in large datasets.
➢ Reporting: Sharing data analysis with stakeholders so they can draw conclusions and make decisions.
➢ Performance metrics and benchmarking: Comparing current performance data to historical data to track performance against goals, typically using customized dashboards.
➢ Descriptive analytics: Using preliminary data analysis to find out what happened.
➢ Querying: Asking the data specific questions, with BI pulling the answers from the datasets.
➢ Statistical analysis: Taking the results from descriptive analytics and further exploring the data using statistics, such as how this trend happened and why.
➢ Data visualization: Turning data analysis into visual representations such as charts, graphs, and histograms to more easily consume data.
➢ Visual analysis: Exploring data through visual storytelling to communicate insights on the fly and stay in the flow of analysis.
➢ Data preparation: Compiling multiple data sources, identifying the dimensions and measurements, and preparing the data for analysis.
Business Intelligence



source: learn.g2.com
Business Intelligence
• Not a silver bullet
➢ BI can also be referred to as “descriptive analytics”, as it only shows past and current state:
▪ it doesn’t say what to do, but what is or was.
➢ The responsibility to take action still lies in the hands of the executives.
➢ This methodology of “test, look at the data, adjust” is at the heart and soul of business intelligence.
▪ It’s all about using data to get a clearer understanding of reality, so that your company can make more strategically sound decisions
▪ instead of relying only on hunch or gut instinct or some corporate inertia.

• Before real value can be extracted from the data, each business needs to establish a data management process ahead of time.
Business Intelligence

• Business intelligence focuses on descriptive analytics
➢ BI prioritizes descriptive analytics, which provides a summary of historical and present data to show what has happened or what is currently happening.
➢ BI answers the questions “what” and “how” so you can replicate what works and change what does not.

• Business analytics focuses on predictive analytics
➢ BA, however, prioritizes predictive analytics, which uses data mining, modeling and machine learning to determine the likelihood of future outcomes.
➢ BA answers the question “why” so it can make more educated predictions about what will happen.
Business Intelligence
• Real-world applications of BI and BA
➢ assume you have an e-commerce business: an online store with clothes
➢ Business intelligence
▪ provides helpful reports of the past and current state of your business
▪ E.g. BI tells you that sales of your new t-shirt design have spiked in Poland in the past three weeks
▪ As a result, you can decide to order more of these t-shirts to keep up with demand
➢ Business analytics
▪ asks, “Why did sales of the new t-shirt spike in Poland?”
▪ By mining your website’s traffic data, you might learn that a majority of traffic has come from a post by a popular fashion blogger who wore your t-shirt at a popular gala/fashion event
▪ This insight might suggest sending complimentary merchandise to a few other prominent fashion bloggers
▪ You can use the previous sales information to anticipate how many products you will need to order to keep up with demand if the bloggers were to post about your products.
Business Intelligence

• Business intelligence is continually evolving according to business needs and technology
➢ As companies strive to be more data-driven, efforts to share data and collaborate will increase.
➢ Data visualization will be even more essential to work together across teams and departments.
➢ Artificial intelligence and machine learning usage in BI will continue to grow, and businesses can integrate the insights from AI into a broader BI strategy.
Business Intelligence

source: bi-survey.com
Business Intelligence

Teradata views business intelligence as
• “a way for businesses to avoid the less-than-helpful results of inaccurate, insufficient data analysis using a procedural and technical infrastructure that collects, stores, and analyzes data produced by a company’s activities”

Gartner defines business intelligence as
• “… an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.”

CIO.com defines business intelligence as
• “Companies use BI to improve decision making, cut costs and identify new business opportunities. BI is more than just corporate reporting and more than a set of tools to coax data out of enterprise systems. CIOs use BI to identify inefficient business processes that are ripe for re-engineering.”
Business Intelligence

• The Data Warehousing Institute defines BI as:
➢ The processes, technologies, and tools needed to
▪ turn data into information,
▪ information into knowledge,
▪ and knowledge into plans
▪ that drive profitable business action.
Data / Information / Knowledge

In the business world, knowledge is not just power. It is the lifeblood of a thriving enterprise. Knowledge comes from information, and that, in turn, comes from data.
Information asset
• The data-information-knowledge-wisdom (DIKW) hierarchy
➢ is a model or construct that has been used widely within information science and knowledge management.
➢ is referred to variously as the ‘Knowledge Hierarchy’, the ‘Information Hierarchy’ and the ‘Knowledge Pyramid’
➢ is one of the fundamental, widely recognized and ‘taken-for-granted’ models in the information and knowledge literatures.

• Arises in two separate contexts:
➢ managing information in business process settings
➢ as a logico-conceptual construct demanding analysis and explication
▪ Martin Frické
Information asset

• The data-information-knowledge-wisdom (DIKW) hierarchy
➢ “Typically information is defined in terms of data, knowledge in terms of information, and wisdom in terms of knowledge.”
▪ Jennifer Rowley (2007)

• We will focus on the DIK part


Data / Information / Knowledge
• Data is raw, random, and unorganized.
➢ Data is just a set of signals or symbols that represent properties of objects, events and their environments.
➢ It may be server logs, user behavior events, or any other data set.
➢ The prime example of data and data acquisition is provided by automatic instrument systems; an unmanned weather station, for instance, may record daily maximum and minimum temperatures
▪ such recordings are data
➢ It’s unorganized and unprocessed (J. Rowley).

• Information is data that has been organized, structured, and processed.

• Information is what you use to gain knowledge.


Data / Information / Knowledge

source: https://www.ontotext.com/knowledgehub/fundamentals/dikw-pyramid/
Data / Information / Knowledge
• Data is raw, random, and unorganized.
• Information is data that has been organized, structured, and processed.
➢ You get information when you start to make data useful
➢ Data itself is of no value until it is transformed into a relevant form
➢ This is relevant, or usable, or significant, or meaningful, or processed, data
➢ In short, information is data with meaning.
➢ For example, were an enquiry to be “what is the average temperature for July?”
▪ there may be individual temperatures explicitly recorded as data, but perhaps not the average temperature; however, the average temperature can be calculated or inferred from the data about individual temperatures.

• Information is what you use to gain knowledge.
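
The July-temperature example can be sketched in a few lines. This is only an illustration: the readings, dates and the helper name `average_for_month` are hypothetical and do not come from the lecture.

```python
from statistics import mean

# Hypothetical daily maximum temperatures (deg C) from a weather station: raw data.
readings = [
    ("2021-06-30", 21.0),
    ("2021-07-01", 24.0),
    ("2021-07-02", 26.0),
    ("2021-07-03", 25.0),
    ("2021-08-01", 28.0),
]


def average_for_month(rows, month_prefix):
    """Derive information (a monthly average) from the individual recordings."""
    values = [temp for day, temp in rows if day.startswith(month_prefix)]
    return mean(values) if values else None


# The average is not stored anywhere; it is calculated from the data.
july_average = average_for_month(readings, "2021-07")
print(july_average)
```

The average only becomes available once the raw recordings are filtered and aggregated, which is the step from data to information described above.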


Data / Information / Knowledge
• Data is raw, random, and unorganized.

• Information is data that has been organized, structured, and processed.

• Information is what you use to gain knowledge.
➢ Users of this hierarchy often construe knowledge as know-how, or skill, rather than knowledge in the sense of the know-that of propositional knowledge
▪ Knowledge is know-how, for example, how a system works. It is what makes possible the transformation of information into instructions. It makes control of a system possible. To control a system is to make it work efficiently.
▪ It implicitly requires learning.
▪ It means that we can take data, categorize and process it to generate information, then organize all this information in a way that makes it useful.
Data / Information / Knowledge
• Data
➢ a collection of raw value elements or facts used for calculating, reasoning, or measuring; may be collected, stored, or processed but not put into a context from which any meaning can be inferred.
➢ “unstructured facts and figures that have the least impact on the typical manager” (Thierauf, 1999)

• Information
➢ the result of collecting and organizing data in a way that establishes relationships between data items, which thereby provides context and meaning.
➢ for data to become information, it must be contextualized, categorized, calculated and condensed (Davenport & Prusak, 2000)

• Knowledge
➢ the concept of understanding information based on recognized patterns in a way that provides insight into information.
➢ is closely linked to doing and implies know-how and understanding (Davenport & Prusak, 2000)
Data / Information / Knowledge

• A dummy example
➢ Data:
▪ It’s raining
➢ Information:
▪ The temperature outside dropped 5 degrees, the humidity went up by 5% in one hour and then it started raining at 3 pm.
➢ Knowledge:
▪ A quick increase in the humidity, accompanied by a temperature drop caused by lower pressure areas, will likely make the atmosphere unable to hold the moisture, and it will rain. As such, I should take an umbrella.
Data / Information / Knowledge
• Data
➢ a collection of ingredients sitting on the counter: carrots, onions, leeks, garlic, and potatoes from the farmer’s market, and a package of chicken, a box of rice, and some cans of broth from the grocery store.
➢ In the data warehousing (DW)/BI world, this is like source data from different operational systems.

• Information:
➢ you get everything ready by washing, peeling, and cutting up the vegetables, cutting up the chicken, and opening the cans of broth; then you put it all in the pot and turn on the heat, where it cooks and becomes soup.
➢ In the DW/BI world, the data has been moved into the ETL (extract, transform, and load) system and is transformed into information.

• Knowledge:
➢ Now the soup is ready to be put into bowls and eaten
➢ In the DW/BI world, business people consume the information in reports to gain knowledge that helps them make informed business decisions
Data / Information / Knowledge
• Two different angles:
➢ As per the contextual concept,
▪ moves from a phase of gathering data parts (data), through the connection of raw data parts (information) and the formation of whole meaningful contents (knowledge), to conceptualizing and joining those whole meaningful contents (wisdom).
➢ From the understanding perspective
▪ as a process starting with researching & absorbing, doing, interacting, and reflecting

source: http://certguidance.com


Turning Data into Information

• The process of turning data into information
➢ the process of determining what data is to be collected and managed and in what context.

• Involves
➢ infrastructure of managing data
▪ incorporates the hardware platforms, relational or other type of database systems, and associated software tools.
➢ infrastructure of presenting data
▪ query and reporting tools that provide access to the data.
Turning Information into Knowledge

• We accumulate piles of information, which are then analyzed in many ways until some critical bits of knowledge are created
➢ What makes that knowledge critical is that it can be used to form a plan of action for solving some business problem

• Involves
➢ analytical components, such as data warehousing, online analytical processing (OLAP), data quality, data profiling, business rule analysis, and data mining.

• Incorporates
➢ Tools to perform analytical processing
Turning Knowledge into Actionable Plans

• Being able to take action based on the intelligence that we have learned is the key point of any BI strategy
➢ If you are using BI for micromarketing,
▪ finding the right customer for your product is irrelevant if you do not have a plan to contact that customer.
➢ If you are using BI for fraud detection,
▪ finding the pattern of fraud is of little value if your organization does not do something to prevent that fraudulent behavior.
Turning Knowledge into Actionable Plans

• Actionable Knowledge
➢ Discovered knowledge is of little value if there is no value-producing action that can be taken as a consequence of gaining that knowledge.
➢ This means that business and technology must work together as partners, not just to act on discovered intelligence but to do so in a timely fashion so as to derive the greatest benefit.
From Data to Knowledge

Data should be treated as a strategic resource.
• data is the "fuel" driving the automation of a business operation,
• a company uses computers to help run its business.

A BI environment should aid the transformation of data into a strategic resource.
• A part of that process is being able to understand and subsequently build the business case for the value of information.
Queries
OLAP perspective



Querying OLAP

• OLAP query languages
➢ Getting from OLAP to the data
▪ As in the relational model, through queries
➢ In OLTP we have SQL as the standard query language
➢ However, OLAP operations are hard to express in SQL
➢ There is no standard query language for OLAP
▪ Though MDX is somewhat popular
➢ Typical reporting/analytical OLAP applications require cross-tabs and pivots
▪ Multidimensional data
Transactional data

• Identified by transaction keys
➢ each data item is identified by a unique key
▪ unique key → attribute, attribute, attribute
➢ 1-dimensional structure

• A point in such a space
➢ Is defined by a unique key
Transactional Data

[Charts: Order Amount / ID; Order Amount / Day; AVG(Order Amount) / Day]
Analytical Data

• Analytical Data
➢ Each data item is identifiable by a unique set of dimension values
▪ dimension1, dimension2, …, dimensionN → measure
▪ dimension1, dimension2, …, dimensionN → measure1, …, measureM
➢ N-dimensional structure

• Point in such a structure
➢ Is identified by an intersection of dimensions
▪ Dimensions’ values
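
The contrast between the two views can be sketched with plain dictionaries. The sales figures match the sample used on the following slides; the order IDs and the helper name `sales_at` are hypothetical.

```python
# Transactional view: one unique key identifies a whole record.
orders = {
    1: {"product": "P1", "customer": "C1", "time": "T1", "sales": 100},
    2: {"product": "P1", "customer": "C2", "time": "T1", "sales": 105},
    3: {"product": "P2", "customer": "C2", "time": "T2", "sales": 120},
}

# Analytical view: (dimension1, ..., dimensionN) -> measure.
cube = {}
for row in orders.values():
    point = (row["product"], row["customer"], row["time"])
    cube[point] = cube.get(point, 0) + row["sales"]


def sales_at(product, customer, time):
    """Return the measure at an intersection of dimension values (0 if empty)."""
    return cube.get((product, customer, time), 0)


print(sales_at("P1", "C2", "T1"))
```

Here a point is addressed by its dimension values rather than by an order ID, so an empty intersection such as (P2, C1, T1) is still a valid question to ask.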
Analytical Data

      C1    C2    …
P1    100   105   …
P2    0     120   …
…     …     …     …
Analytical Data

• Pivot Table
➢ Pivot tables allow you to quickly summarize and analyze complex data
➢ Follows the analytical stance on data
▪ dimension-wise rather than transaction-wise
➢ Dynamic creation of aggregations and summaries

• What if the source data is in transactional form?
➢ Enforce OLAP perspective
Analytical Data

• Analytical data:

Product   Customer   Time   Sales
P1        C1         T1     100
P1        C2         T1     105
P2        C2         T2     120

Dimension tables:

Customer   FirstName   …
C1         C1          …
C2         C2          …

Product   Name   …
P1        P1     …
P2        P2     …

Cube view, one Product × Customer slice per Time value (* = total):

T1    C1    C2    …    *
P1    100   105   …    205
P2    0     0     …    0
…     …     …     …    …
*     100   105   …    *

T2    C1    C2    …    *
P1    0     0     …    0
P2    0     120   …    120
…     …     …     …    …
*     0     120   …    *
Analytical Data

• Pivot Table
➢ Rows
➢ Columns
➢ Values
➢ Pages / Filters
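
A minimal sketch of how these four pivot-table roles interact, assuming the three sample sales rows from the Analytical Data slides. The function name `pivot` and its `time_filter` parameter are hypothetical and not part of any tool discussed in the lecture.

```python
from collections import defaultdict

# (product, customer, time, sales): the sample fact rows.
facts = [
    ("P1", "C1", "T1", 100),
    ("P1", "C2", "T1", 105),
    ("P2", "C2", "T2", 120),
]


def pivot(rows, time_filter=None):
    """Rows = product, Columns = customer, Values = SUM(sales), Filter = time."""
    table = defaultdict(lambda: defaultdict(int))
    for product, customer, time, sales in rows:
        if time_filter is not None and time != time_filter:
            continue  # the page/filter field removes rows before aggregation
        table[product][customer] += sales
    return {product: dict(cols) for product, cols in table.items()}


all_time = pivot(facts)
only_t1 = pivot(facts, time_filter="T1")
print(all_time)
print(only_t1)
```

Swapping which field plays which role (for example, customers as rows and time as columns) changes only the keys used in `table`, which is why pivot tables can re-aggregate dynamically.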
Using Tableau Desktop

Very brief introduction to data visualization in Tableau Desktop
Business Intelligence

• One of the more common ways to present business intelligence is through data visualization.
➢ Humans are visual creatures and very in tune with patterns or differences in colors.
➢ Data visualizations show data in a way that is more accessible and understandable.

• Visualizations are often compiled into dashboards, which can quickly tell a certain story and highlight trends or patterns that may not be discovered easily when manually analyzing the raw data.
➢ This accessibility also enables more conversations around the data, leading to broader business impact.
Using Tableau

• “Every data source that you create in Tableau has a data model. You can think of a data model as a diagram that tells Tableau how it should query data in the connected database tables.”
Using Tableau
• Tableau data model has two layers:
➢ Logical
▪ The default view
▪ You combine data in the logical layer using relationships
▪ No join types here
➢ Physical
▪ Double-click on a logical table to modify the physical layer
▪ You combine data between tables at the physical layer using joins and unions.
▪ Each logical table contains at least one physical table
Using Tableau

• “Relationships are a dynamic, flexible way to combine data from multiple tables for analysis”
➢ No join type
▪ Just specify matching attributes
➢ Automated
▪ Joins are deferred to the time and context of analysis
➢ Flexible
▪ Support many-to-many and full-outer joins
Using Tableau

• When you connect to data
➢ some prep work is needed
Using Tableau
• Quick demo
➢ Connect
➢ Data model
➢ Visualisation
▪ Basics, Filters, Charts, Format
▪ Aggregation, Types, Bins
▪ Worksheets, Dashboards, Stories

• https://www.tableau.com/learn/tutorials/on-demand/getting-started-10-0?playlist=200556


Using SQL
Querying OLAP



Querying OLAP

• Shortcomings of SQL/92 with regard to OLAP queries
➢ Hard or impossible to express in SQL
▪ Multiple aggregations
▪ Comparisons (with aggregation)
▪ Reporting features
➢ Performance penalty
▪ Poor execution of queries with many AND and OR conditions
➢ Return data
▪ Relation
SQL3

• SQL:1999 (SQL3) → SQL:2003
➢ Prepare SQL for OLAP queries
➢ New SQL commands
▪ GROUPING SETS
▪ ROLLUP
▪ CUBE
➢ New aggregate functions
➢ Queries of type “top k”
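
To see why multiple aggregations are awkward without these extensions, the sketch below builds the three grouping levels that a rollup over (product, customer) would return by gluing separate queries together with UNION ALL. It runs on SQLite through Python only so that it is self-contained; the table `sales` and its rows are hypothetical, and SQLite is assumed here not to offer ROLLUP itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("P1", "C1", 100), ("P1", "C2", 105), ("P2", "C2", 120)],
)

# One SELECT per grouping level; NULL marks a dimension that was rolled up.
# With the OLAP extensions, GROUP BY ROLLUP(product, customer) requests
# the same three levels in a single clause.
rollup_sql = """
SELECT product, customer, SUM(amount) AS total FROM sales GROUP BY product, customer
UNION ALL
SELECT product, NULL, SUM(amount) FROM sales GROUP BY product
UNION ALL
SELECT NULL, NULL, SUM(amount) FROM sales
"""
rollup_rows = conn.execute(rollup_sql).fetchall()
for row in rollup_rows:
    print(row)
```

Each extra level costs another pass over the table and another SELECT to maintain, which is the kind of overhead the new grouping constructs are meant to remove.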
Pivoting and Cases



Case

• CASE
➢ Evaluates a list of conditions and returns one of multiple possible result expressions
▪ similar to SWITCH in programming languages
➢ Can be used in
▪ statements such as SELECT, UPDATE, DELETE and SET
▪ clauses such as IN, WHERE, ORDER BY and HAVING
Case
• The CASE expression has two formats:
➢ The simple CASE expression compares an expression to a set of simple expressions to determine the result.
    CASE input_expression
        WHEN when_expression THEN result_expression [ ...n ]
        [ ELSE else_result_expression ]
    END
➢ The searched CASE expression evaluates a set of Boolean expressions to determine the result.
    CASE
        WHEN Boolean_expression THEN result_expression [ ...n ]
        [ ELSE else_result_expression ]
    END
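
Both formats can be tried side by side. The sketch below uses SQLite through Python purely so it runs without a server; the `product` table, its columns and the price thresholds are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, line TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("Bike A", "R", 1500.0), ("Bike B", "M", 400.0), ("Bell", None, 5.0)],
)

case_rows = conn.execute("""
    SELECT name,
           -- simple CASE: one input expression compared with each WHEN value
           CASE line WHEN 'R' THEN 'Road' WHEN 'M' THEN 'Mountain' ELSE 'Other' END,
           -- searched CASE: every WHEN carries its own Boolean expression
           CASE WHEN price >= 1000 THEN 'High'
                WHEN price >= 100  THEN 'Medium'
                ELSE 'Low' END
    FROM product
    ORDER BY name
""").fetchall()
for row in case_rows:
    print(row)
```

The row with a NULL product line falls through to ELSE in the simple form, because a comparison with NULL never evaluates to true.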
Case – querying data

• CASE example:
    SELECT ProductNumber,
        CASE ProductLine
            WHEN 'R' THEN 'Road'
            WHEN 'M' THEN 'Mountain'
            WHEN 'T' THEN 'Touring'
            WHEN 'S' THEN 'Other sale items'
            ELSE 'Not for sale'
        END AS Category, Name
    FROM Production.Product
    ORDER BY ProductNumber;
Case – querying data

• CASE example:
    SELECT ProductNumber, Name AS ProductName,
        Road = CASE ProductLine WHEN 'R' THEN 'X' ELSE '' END,
        Mountain = CASE ProductLine WHEN 'M' THEN 'X' ELSE '' END,
        Touring = CASE ProductLine WHEN 'T' THEN 'X' ELSE '' END
    FROM Production.Product


Case – querying data
• CASE
➢ SQL can produce a cross-tab via the CASE construct
▪ Quite a complex query
▪ All cases need to be known in advance
➢ Example:
    SELECT Store.state AS STATE,
        SUM(CASE WHEN Time.quarter = 'Q1' THEN sales END) AS Q1,
        ... ... ...
        SUM(CASE WHEN Time.quarter = 'Q4' THEN sales END) AS Q4
    FROM F, Store, Time
    WHERE F.storekey = Store.storekey AND F.timekey = Time.timekey
    GROUP BY Store.state
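
A runnable sketch of the same cross-tab idea, restricted to two quarters. The table and column names follow the example above, while the rows themselves are hypothetical; SQLite is used through Python only to keep the sketch self-contained, and explicit JOINs replace the comma-separated FROM list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Store (storekey INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE Time  (timekey  INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE F     (storekey INTEGER, timekey INTEGER, sales INTEGER);
    INSERT INTO Store VALUES (1, 'CA'), (2, 'NY');
    INSERT INTO Time  VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO F     VALUES (1, 1, 10), (1, 1, 5), (1, 2, 7), (2, 1, 3);
""")

# Each output column sums only the rows of its own quarter; a CASE without
# ELSE yields NULL for the other rows, and SUM ignores NULLs.
crosstab = conn.execute("""
    SELECT Store.state,
           SUM(CASE WHEN Time.quarter = 'Q1' THEN sales END) AS Q1,
           SUM(CASE WHEN Time.quarter = 'Q2' THEN sales END) AS Q2
    FROM F
    JOIN Store ON F.storekey = Store.storekey
    JOIN Time  ON F.timekey  = Time.timekey
    GROUP BY Store.state
    ORDER BY Store.state
""").fetchall()
for row in crosstab:
    print(row)
```

A state with no sales in a quarter gets NULL in that cell rather than 0; adding `ELSE 0` to the CASE would change that, at the cost of hiding the difference between "no sales rows" and "sales of zero".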
Case – querying data
• CASE example:
    SELECT
        CustomerID,
        SUM(CASE WHEN DATEPART(quarter, OrderDate) = 1 THEN TotalDue END) AS Q1,
        SUM(CASE WHEN DATEPART(quarter, OrderDate) = 2 THEN TotalDue END) AS Q2,
        SUM(CASE WHEN DATEPART(quarter, OrderDate) = 3 THEN TotalDue END) AS Q3,
        SUM(CASE WHEN DATEPART(quarter, OrderDate) = 4 THEN TotalDue END) AS Q4
    FROM Sales.SalesOrderHeader
    GROUP BY CustomerID
Case – querying data
• CASE example:
    SELECT
        CustomerID,
        SUM(CASE WHEN DATEPART(quarter, OrderDate) = 1 THEN TotalDue END) AS Q1,
        AVG(CASE WHEN DATEPART(quarter, OrderDate) = 2 THEN TotalDue END) AS Q2,
        MIN(CASE WHEN DATEPART(quarter, OrderDate) = 3 THEN TotalDue END) AS Q3,
        MAX(CASE WHEN DATEPART(quarter, OrderDate) = 4 THEN TotalDue END) AS Q4
    FROM Sales.SalesOrderHeader
    GROUP BY CustomerID
Case – querying data

• CASE example:
    SELECT Title,
        CASE WHEN Title IN ('Ms.', 'Mrs.', 'Miss') THEN 'Female'
             WHEN Title = 'Mr.' THEN 'Male'
             ELSE 'Unknown' END AS Gender
    FROM Person.Person


Case – querying data
• CASE example:
    SELECT TotalDollar = SUM(SubTotal), TotalOrders = COUNT(*), PriceRange =
        CASE
            WHEN SubTotal BETWEEN 0 AND 1000 THEN 'Small Order'
            WHEN SubTotal BETWEEN 1000.0001 AND 10000 THEN 'Average Order'
            ELSE 'Large Order'
        END
    FROM Sales.SalesOrderHeader
    GROUP BY
        CASE
            WHEN SubTotal BETWEEN 0 AND 1000 THEN 'Small Order'
            WHEN SubTotal BETWEEN 1000.0001 AND 10000 THEN 'Average Order'
            ELSE 'Large Order'
        END
    ORDER BY TotalDollar DESC
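
The query above has to repeat the whole CASE expression in GROUP BY. One possible alternative, sketched here on SQLite through Python with a hypothetical `orders` table, is to label each row in a derived table first and group by the label afterwards. This is a variation for illustration, not the query from the lecture, and it simplifies the range boundaries to `<=` comparisons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (subtotal REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [(200.0,), (800.0,), (5000.0,), (20000.0,)],
)

# The inner query assigns a bucket to every order; the outer query
# aggregates per bucket, so the CASE is written only once.
buckets = conn.execute("""
    SELECT price_range, SUM(subtotal) AS total_dollar, COUNT(*) AS total_orders
    FROM (
        SELECT subtotal,
               CASE WHEN subtotal <= 1000  THEN 'Small Order'
                    WHEN subtotal <= 10000 THEN 'Average Order'
                    ELSE 'Large Order' END AS price_range
        FROM orders
    ) AS labelled
    GROUP BY price_range
    ORDER BY total_dollar DESC
""").fetchall()
for row in buckets:
    print(row)
```

Writing the bucketing rule once avoids the risk of the SELECT and GROUP BY copies drifting apart when the thresholds change.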
Pivot

• PIVOT
➢ Create pivot table in SQL
    SELECT
        [non-pivoted column(s)], -- optional
        [pivoted column(s)]
    FROM ( SELECT query producing sql data for pivot ) AS TableAlias
    PIVOT (
        <aggregation function>(column for aggregation)
        FOR [<column name containing values for pivot table columns>]
        IN ( [first pivoted column], ..., [last pivoted column] )
    ) AS PivotTableAlias
    ORDER BY clause -- optional
Pivot
• PIVOT
➢ Create pivot table in SQL
    SELECT
        [non-pivoted column(s)], -- optional; data in rows
        [pivoted column(s)]      -- data in cols
    FROM
        -- where to get the data from
        ( SELECT query producing sql data for pivot ) AS TableAlias
    PIVOT (
        -- how to aggregate and what value to display
        <aggregation function>(column for aggregation)
        -- where and what data to get for columns
        FOR [<column name containing values for pivot table columns>]
        IN ( [first pivoted column], ..., [last pivoted column] )
    ) AS PivotTableAlias
    ORDER BY clause -- optional
Pivot
• PIVOT obtains its output result set as follows:
 Performs a GROUP BY on its input_table against the grouping
columns and produces one output row for each group.
 The grouping columns in the output row obtain the corresponding
column values for that group in the input_table.
 Generates values for the columns in the column list for each
output row by performing the following:
 Grouping additionally the rows generated in the GROUP BY in the
previous step against the pivot_column.
 For each output column in the column_list, selecting a subgroup
that satisfies the condition:

 pivot_column = CONVERT(<data type of pivot_column>,
'output_column')
 aggregate_function is evaluated against the value_column on this
subgroup and its result is returned as the value of the corresponding
output_column.
 If the subgroup is empty, SQL Server generates a null value for that
output_column.
 If the aggregate function is COUNT and the subgroup is empty, zero
(0) is returned.
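The process described above can be imitated with one conditional aggregate per output column. The sketch below is an illustration only: it runs on SQLite through Python's sqlite3 module (SQLite has no PIVOT operator), and the Orders table with its rows is a made-up example. It shows the two rules for empty subgroups: SUM yields NULL, COUNT yields 0.

```python
import sqlite3

# Hypothetical input_table: CustomerID is the grouping column,
# QR the pivot_column, TotalDue the value_column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (CustomerID INTEGER, QR INTEGER, TotalDue REAL)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 1, 100.0), (1, 1, 50.0), (1, 3, 70.0), (2, 2, 30.0)],
)

quarters = [1, 2, 3, 4]  # the column list must be known in advance

# One aggregate per output column; CASE selects the subgroup where
# pivot_column equals the output column.
sum_cols = ", ".join(
    f"SUM(CASE WHEN QR = {q} THEN TotalDue END) AS Q{q}" for q in quarters
)
count_cols = ", ".join(
    f"COUNT(CASE WHEN QR = {q} THEN TotalDue END) AS N{q}" for q in quarters
)

pivot_sum = conn.execute(
    f"SELECT CustomerID, {sum_cols} FROM Orders GROUP BY CustomerID ORDER BY CustomerID"
).fetchall()
pivot_count = conn.execute(
    f"SELECT CustomerID, {count_cols} FROM Orders GROUP BY CustomerID ORDER BY CustomerID"
).fetchall()

print(pivot_sum)    # None marks an empty subgroup
print(pivot_count)  # COUNT reports 0 for an empty subgroup
```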
Pivot
• PIVOT
 Create pivot table in SQL
 Example:
 SELECT
 'AverageCost' AS Cost_Sorted_By_Production_Days, -- data in rows
 [0], [1], [2], [3], [4] -- data in cols
 FROM -- where to get the data from
 ( SELECT DaysToManufacture, StandardCost FROM Production.Product ) AS SourceTable
 PIVOT ( -- how to aggregate and what value to display
 AVG(StandardCost)
 FOR DaysToManufacture IN ([0], [1], [2], [3], [4]) -- where and what data to get for columns
 ) AS PivotTable;
Pivot

• PIVOT construct
 Quite complex queries
 All pivoted values (the IN list) need to be known in advance
 Example:
 SELECT 'AverageCost' AS Cost_Sorted_By_Production_Days, [0], [1], [2], [3], [4]
 …
 PIVOT (
 AVG(StandardCost)
 FOR DaysToManufacture IN ([0], [1], [2], [3], [4])
 ) AS PivotTable;
Pivot – querying

• PIVOT example:
 SELECT
 CustomerID, TotalDue, DATEPART(quarter, OrderDate) AS QR
 FROM Sales.SalesOrderHeader
Pivot – querying

• PIVOT example:
 SELECT CustomerID, [1], [2], [3], [4]
 FROM
 (SELECT CustomerID, TotalDue, DATEPART(quarter, OrderDate) AS QR
 FROM Sales.SalesOrderHeader
 ) AS SourceTable
 PIVOT (
 Sum (TotalDue)
 FOR QR IN ( [1],[2],[3],[4] )
 ) AS PivotTable
 Order by CustomerID
Pivot – querying
• PIVOT example:
 SELECT * FROM(
 SELECT Color, ListPrice,
 CASE DaysToManufacture
 WHEN 1 THEN 'One'
 WHEN 2 THEN 'Two'
 WHEN 3 THEN 'Three'
 ELSE 'More'
 END as [DTM]
 FROM Production.Product
 ) AS ProductInfo
 PIVOT (
 AVG(ListPrice)
 FOR [DTM] IN ( [One],[Two],[Three],[More] )
 ) AS PivotTable
 ORDER BY Color
UnPivot

• Carries out the opposite operation to PIVOT by rotating columns of a table-valued expression into column values.

• Notice that UNPIVOT isn't the exact reverse of PIVOT.
 PIVOT carries out an aggregation and merges possibly multiple rows into a single row in the output.
 UNPIVOT doesn't reproduce the original table-valued expression result because rows have been merged.
 Null values in the input of UNPIVOT disappear in the output.
 When the values disappear, it shows that there may have been
original null values in the input before the PIVOT operation.
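The disappearance of NULL values can be seen in a small emulation. The sketch below assumes a made-up pivoted table `pvt` and runs on SQLite through Python's sqlite3 module; since SQLite has no UNPIVOT operator, each pivoted column is turned into rows with one SELECT, and the `IS NOT NULL` filter imitates the UNPIVOT behaviour described above.

```python
import sqlite3

# Hypothetical pivoted table with some NULL cells.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pvt (Color TEXT, One REAL, Two REAL, Three REAL, More REAL)")
conn.executemany(
    "INSERT INTO pvt VALUES (?, ?, ?, ?, ?)",
    [("Black", 10.0, None, 30.0, 5.0), ("Red", None, 20.0, None, 7.0)],
)

pivoted_columns = ["One", "Two", "Three", "More"]

# One SELECT per pivoted column: the column name becomes a value of DTM,
# the cell becomes Value. NULL cells are filtered out, as UNPIVOT does.
unpivot_sql = " UNION ALL ".join(
    f"SELECT Color, '{col}' AS DTM, {col} AS Value FROM pvt WHERE {col} IS NOT NULL"
    for col in pivoted_columns
)
unpivoted = conn.execute(unpivot_sql + " ORDER BY Color, DTM").fetchall()

for row in unpivoted:
    print(row)
```

The input holds 8 cells, 3 of them NULL, so only 5 rows come back; nothing in the output tells which (Color, DTM) pairs were dropped.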
UnPivot – querying
• SELECT
 Color, DTM, Value
• FROM
 (SELECT Color, One, Two, Three, More FROM pvt) p
• UNPIVOT
 (Value FOR DTM IN (One, Two, Three, More) ) AS unpvt;
• GO
Grouping Sets

Grouping Sets

• Multiple aggregations in SQL/92
 Create a 2D spreadsheet that shows sum of columns and rows
 each subtotal requires a separate aggregate query
[Figure: 2D spreadsheet with Name on one axis and Year on the other]
SELECT ..
Union all
SELECT ..
Union all
SELECT ..
Union all
SELECT ..
Grouping Sets

• Exemplar Star Schema

Grouping Sets

• SQL query – average grade by each professor:
 SELECT P.Name, AVG(G.Grade)
 FROM [Students grades] G, Professor P
 WHERE G.Professor = P.ID
 GROUP BY P.Name;
Grouping Sets

• SQL query – average grade by each professor and year:
 SELECT A.[Academic year], P.Name, AVG(G.Grade)
 FROM [Students grades] G, [Academic year] A, Professor P
 WHERE G.Professor = P.ID and G.[Academic year] = A.ID
 GROUP BY A.[Academic year], P.Name;
Grouping Sets
• Multiple aggregations in SQL/92
 Create a 2D spreadsheet that shows sum of columns and rows
 each subtotal requires a separate aggregate query
 e.g. sales by maker as well as car model
SELECT Name, Year, avg(Grade)
FROM T GROUP BY Name, Year
Union all
SELECT Name, '*', avg(Grade)
FROM T GROUP BY Name
Union all
SELECT '*', Year, avg(Grade)
FROM T GROUP BY Year
Union all
SELECT '*', '*', avg(Grade)
FROM T
Grouping Sets

• SQL3 extends SQL by:
 Extensions to the GROUP BY operator
 GROUP BY GROUPING SETS
 GROUPING SETS defines a list of sets of grouping columns and creates subtotals for each of them (a union of GROUP BY results)
 GROUP BY ROLLUP
 ROLLUP enables a SELECT statement to calculate multiple levels of subtotals across a specified group of dimensions
 GROUP BY CUBE
 CUBE takes a specified set of grouping columns and creates subtotals for all possible combinations of them
Grouping Sets

• Available groupings
 TIME, PRODUCT, LOCATION, SUPPLIER

Grouping Sets

• GROUPING SETS
 Efficiently replaces a series of UNION-ed queries
 The issue of NULL values
 The new grouping functions generate NULL values at the subtotal levels
 the result then contains both generated NULLs and real NULLs from the data itself
 How do we tell the difference?
 Through the GROUPING function GROUPING(.):
 returns 0 for NULL in the data
 returns 1 for generated NULL
Grouping Sets

• GROUP BY GROUPING SETS
 Explicitly state all required groupings
 Comma-separated list of groups - (dim1, dim2, …)
 SELECT Time, Product, Location, SUM(Gain)
 FROM Sales
 GROUP BY GROUPING SETS ((Time), (Product), (Location), ());
Grouping Sets
• GROUP BY GROUPING SETS
 SELECT Time, '*', '*', SUM(Gain)
 FROM Sales
 GROUP BY Time
 UNION ALL
 SELECT '*', Product, '*', SUM(Gain)
 FROM Sales
 GROUP BY Product
 UNION ALL
 SELECT '*', '*', Location, SUM(Gain)
 FROM Sales
 GROUP BY Location
 UNION ALL
 SELECT '*', '*', '*', SUM(Gain)
 FROM Sales;
Grouping Sets

• GROUP BY GROUPING SETS
 SELECT [Academic year], Name, AVG(Grade)
 FROM [Students grades]
 GROUP BY GROUPING SETS ([Academic year], Name);

• For such a query:
 Select
 P.name, PS.Name, PC.Name, AVG(P.ListPrice)
 From
 [Production].[Product] AS P
 inner join [Production].[ProductSubcategory] AS PS on P.ProductSubcategoryID = PS.ProductSubcategoryID
 inner join [Production].[ProductCategory] AS PC on PS.ProductCategoryID = PC.ProductCategoryID
 Group By
 grouping sets ( (PC.name), (PS.name), (P.name) )

• What do we get in response?


Grouping Sets

• ROLL-UP
 produces a result set that contains subtotal rows
 in addition to the regularly grouped rows
 ROLLUP gives an aggregate value by using the order of columns in a GROUP BY clause
 it removes one column from the grouping at a time
 always the last remaining one
Grouping Sets

• ROLL-UP
 GROUP BY ROLLUP (a, b, c) is equivalent to
 GROUP BY GROUPING SETS
 ((a, b, c), (a, b), (a), ())
 N elements of the ROLLUP translate to (N+1) grouping sets
 Order is significant to ROLLUP!
 GROUP BY ROLLUP (c, b, a)
 is equivalent to the grouping sets (c, b, a), (c, b), (c), ()
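The expansion of ROLLUP into grouping sets can be written down mechanically. The sketch below is an illustration under assumptions: `rollup_sets` and `grouping_sets_sql` are hypothetical helper names, the `sales` rows are made up, and the query runs on SQLite (which has no ROLLUP), so each grouping set becomes one SELECT of a UNION ALL, with NULL standing in for the rolled-up columns.

```python
import sqlite3

def rollup_sets(columns):
    """Grouping sets of ROLLUP(columns): every prefix, longest first."""
    return [tuple(columns[:k]) for k in range(len(columns), -1, -1)]

def grouping_sets_sql(table, columns, aggregate, sets):
    """Emulate GROUP BY GROUPING SETS (sets) with one SELECT per set."""
    parts = []
    for grouping_set in sets:
        select_list = ", ".join(
            col if col in grouping_set else f"NULL AS {col}" for col in columns
        )
        group_by = f" GROUP BY {', '.join(grouping_set)}" if grouping_set else ""
        parts.append(f"SELECT {select_list}, {aggregate} FROM {table}{group_by}")
    return " UNION ALL ".join(parts)

# Made-up sample data for the sales(year, brand, qty) table of the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, brand TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(2020, "A", 1), (2020, "B", 2), (2021, "A", 4)],
)

sets = rollup_sets(["year", "brand"])
sql = grouping_sets_sql("sales", ["year", "brand"], "SUM(qty) AS total", sets)
rollup_rows = conn.execute(sql).fetchall()

print(rollup_sets(["a", "b", "c"]))
for row in rollup_rows:
    print(row)
```

Because the sets are prefixes of the column list, reordering the ROLLUP arguments changes which subtotals are produced.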
Grouping Sets

• ROLL-UP example:
 SELECT year, brand, SUM(qty) FROM sales
 GROUP BY ROLLUP(year, brand);

Grouping Sets

• GROUP BY ROLLUP
 Only hierarchical grouping is used
 SELECT Time, Product, Location, Supplier, SUM(Gain) FROM Sales
 GROUP BY ROLLUP (Time, Product, Location, Supplier);

Grouping Sets
• GROUP BY ROLLUP
 SELECT Time, Product, Location, Supplier, SUM(Gain)
 FROM Sales
 GROUP BY Time, Product, Location, Supplier
 UNION ALL
 SELECT Time, Product, Location, '*', SUM(Gain)
 FROM Sales
 GROUP BY Time, Product, Location
 UNION ALL
 SELECT Time, Product, '*', '*', SUM(Gain)
 FROM Sales
 GROUP BY Time, Product
 UNION ALL
 SELECT Time, '*', '*', '*', SUM(Gain)
 FROM Sales
 GROUP BY Time
 UNION ALL
 SELECT '*', '*', '*', '*', SUM(Gain)
 FROM Sales;
Grouping Sets

• GROUP BY ROLLUP
 SELECT [Academic year], Name, AVG(Grade) FROM [Students grades] G
 GROUP BY ROLLUP([Academic year], Name);

• For such a query:
 Select
 P.name, PS.Name, PC.Name, AVG(P.ListPrice)
 From
 [Production].[Product] AS P
 inner join [Production].[ProductSubcategory] AS PS on P.ProductSubcategoryID = PS.ProductSubcategoryID
 inner join [Production].[ProductCategory] AS PC on PS.ProductCategoryID = PC.ProductCategoryID
 Group By
 rollup ( PC.name, PS.name, P.name )

• What do we get in response?


Grouping Sets

• CUBE operator:
 contains all the subtotal rows of a roll-up
 in addition contains all cross-tabulation rows
 Can also be thought of as a series of GROUPING SETS
 All combinations of the cubed grouping expressions are computed along with the grand total
 N elements of a CUBE translate to 2^N grouping sets
Grouping Sets

• CUBE:
 GROUP BY CUBE (a, b, c)
 is equivalent to
 GROUP BY GROUPING SETS ((a, b, c), (a, b), (a, c), (b, c), (a), (b), (c), ())
 SELECT
 year, brand, SUM(qty)
 FROM sales
 GROUP BY
 CUBE (year, brand);
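The grouping sets behind CUBE are simply all subsets of the column list. A minimal sketch follows; `cube_sets` is a hypothetical helper name, and the function only enumerates the sets, it does not run any query.

```python
from itertools import combinations

def cube_sets(columns):
    """Grouping sets of CUBE(columns): every subset, largest first."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]

abc = cube_sets(["a", "b", "c"])
print(abc)
print(len(abc))  # 2 ** 3 grouping sets

# Every ROLLUP prefix of the same columns is among the CUBE sets.
prefixes = [tuple(["a", "b", "c"][:k]) for k in range(3, -1, -1)]
print(all(prefix in abc for prefix in prefixes))
```

Unlike ROLLUP, the result does not depend on the order of the arguments, only the order in which the sets are listed does.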
Grouping Sets

• CUBE example:
 SELECT year, brand, SUM(qty) FROM sales
 GROUP BY CUBE (year, brand);

Grouping Sets

• GROUP BY CUBE
 All grouping combinations are used
 SELECT Time, Product, Location, Supplier, SUM(Gain) FROM Sales
 GROUP BY CUBE (Time, Product, Location, Supplier);

Grouping Sets
• GROUP BY CUBE
 SELECT [Academic year], Name, AVG(Grade) FROM [Students grades]
 GROUP BY CUBE([Academic year], Name);

• For such a query:
 Select
 P.name, PS.Name, PC.Name, AVG(P.ListPrice)
 From
 [Production].[Product] AS P
 inner join [Production].[ProductSubcategory] AS PS on P.ProductSubcategoryID = PS.ProductSubcategoryID
 inner join [Production].[ProductCategory] AS PC on PS.ProductCategoryID = PC.ProductCategoryID
 Group By
 cube (PC.name,PS.Name, P.Name)

• What do we get in response?


Windows

Window
• Window functions
 calculation functions similar to aggregate functions
 normal aggregating via the GROUP BY clause combines then hides the individual rows being aggregated
• The main advantage of using window functions over regular aggregate functions is:
 window functions do not cause rows to become grouped into a single output row
 rows retain their separate identities and an aggregated value will be added to each row
 window functions have access to individual rows and can add some of the attributes from those rows into the result set
Window

• Window functions operate on a set of rows and return a single aggregated value for each row.
 The term window describes the set of rows in the database on which the function will operate.

• A window function performs a calculation across a set of table rows that are somehow related to the current row.
 Behind the scenes, the window function is able to access more than just the current row of the query result.
Window

• Window functions
 allows you to segment a set of rows and then apply a function to that set of rows
• You get,
 ability to choose an aggregation function
 use other functions
 rankings, analytics

• Types of Window functions
 Aggregate Window Functions
 SUM(), MAX(), MIN(), AVG(), COUNT()
 Ranking Window Functions
 RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE()
 Value Window Functions
 LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()
Window

• How to define a window
 with the OVER clause
 you define the window over which the function will be applied
 by default the window is defined for the entire table
 the result is the same as if you had used an aggregation function without a GROUP BY clause
• Syntax
 window_function ( [ ALL ] expression )
 OVER ( [ PARTITION BY partition_list ] [ ORDER BY order_list ] )
• Arguments
 window_function
 Specifies the name of the window function.
 ALL
 ALL is an optional keyword. When ALL is included, all values are counted, including duplicates. DISTINCT is not supported in window functions.
 expression
 The target column or expression that the function operates on, i.e. the column for which we need an aggregated value. For example, a column containing the order amount, so that we can see the total of orders received.
 OVER
 Specifies the window clauses for aggregate functions.
 PARTITION BY partition_list
 Defines the window (the set of rows on which the window function operates). Provide a field or a comma-separated list of fields after PARTITION BY. If PARTITION BY is not specified, the entire table is treated as a single partition and values are aggregated accordingly.
 ORDER BY order_list
 Sorts the rows within each partition. If ORDER BY is not specified, the rows are not ordered and the function uses all rows of the partition.
Window
 select CustomerName,
 SUM(OrderAmt) OVER () as NoParm
 from XXX;
[Figure: result set; Sum = 205]
Window

• Query
 select CustomerName,
 SUM(OrderAmt) OVER () as NoParm
 from XXX
• Can be rewritten:
 select CustomerName, (select sum(OrderAmt) from XXX) as NoParm
 from XXX
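The equivalence of the two formulations can be checked on a toy table. This is a sketch only: it runs on SQLite (version 3.25 or later is assumed for window functions) through Python's sqlite3 module, and the rows of `XXX` are made-up values, not the data behind the slide's figure.

```python
import sqlite3

# Hypothetical contents of the slides' XXX table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE XXX (OrderID INTEGER, CustomerName TEXT, OrderAmt INTEGER)")
conn.executemany(
    "INSERT INTO XXX VALUES (?, ?, ?)",
    [(1, "Joe", 10), (2, "Joe", 20), (3, "Sam", 30)],
)

# Empty OVER (): the window is the whole result set.
windowed = conn.execute(
    "SELECT CustomerName, SUM(OrderAmt) OVER () AS NoParm FROM XXX ORDER BY OrderID"
).fetchall()

# The same value computed by a scalar subquery.
with_subquery = conn.execute(
    "SELECT CustomerName, (SELECT SUM(OrderAmt) FROM XXX) AS NoParm "
    "FROM XXX ORDER BY OrderID"
).fetchall()

print(windowed)
print(windowed == with_subquery)
```

Every row keeps its identity and carries the grand total next to it, which is the difference from a plain `SELECT SUM(OrderAmt) FROM XXX`.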
Window

• The OVER clause is what specifies a window function and must always be included in the statement.

The default in an OVER clause is the entire rowset.
Window

• Example
 select
 P.name, PS.Name, PC.Name, AVG(P.ListPrice) OVER () as AVG2
 from
 [Production].[Product] AS P
 inner join [Production].[ProductSubcategory] AS PS
 on P.ProductSubcategoryID = PS.ProductSubcategoryID
 inner join [Production].[ProductCategory] AS PC
 on PS.ProductCategoryID = PC.ProductCategoryID
Window

• The OVER clause
 takes three additional arguments to change the scope of the clause
 PARTITION BY, ORDER BY, and ROWS or RANGE.
 SQL
 OVER (
[ PARTITION BY value_expression , ... [ n ] ]
[ ORDER BY order_by_expression ]
[ <ROWS or RANGE clause> ]
)
Window
• OVER clause
 defines the partitioning and ordering of rows (i.e. a window) for the above functions to operate on – hence, called window functions.

• The OVER clause accepts the following arguments to define a window:
 PARTITION BY:
 Divides the query result set into partitions.
 The window function is applied to each partition separately.
 Multiple fields need to be separated by a comma.
 If the PARTITION BY is not specified
 then the grouping will be done on the entire table and values will be aggregated accordingly.
Window

• OVER clause
 defines the partitioning and ordering of rows (i.e. a window) for the above functions to operate on – hence, called window functions.

• The OVER clause accepts the following arguments to define a window:
 ORDER BY:
 Sorts the rows within each partition.
 If ORDER BY is not specified
 the rows are not ordered and the function uses all rows of the partition.
Window

• OVER clause
 defines the partitioning and ordering of rows (i.e. a window) for the above functions to operate on – hence, called window functions.

• The OVER clause accepts the following arguments to define a window:
 ROWS or RANGE clause:
 Further subsets the rows within the partition by specifying start and endpoints within the partition.
 The default for the ROWS or RANGE clause (when ORDER BY is specified) is
 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Window

• OVER with PARTITION BY
 is used to reduce the scope of the window to which the aggregation is applied
 divides the result set into partitions based on the criteria specified
 the aggregation function is then applied to each partition independently
 similar to the GROUP BY clause
 the PARTITION BY clause allows you to specify different partitions within the same select statement – multiple groupings
 you can use different partitions for different result set attributes
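Several partitionings can sit side by side in one SELECT, which GROUP BY cannot do. A minimal sketch, assuming a made-up `XXX` table and SQLite 3.25+ as a stand-in engine:

```python
import sqlite3

# Hypothetical contents of the slides' XXX table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE XXX (OrderID INTEGER, CustomerName TEXT, OrderDate TEXT, OrderAmt INTEGER)"
)
conn.executemany(
    "INSERT INTO XXX VALUES (?, ?, ?, ?)",
    [
        (1, "Joe", "2021-01-01", 10),
        (2, "Joe", "2021-01-02", 30),
        (3, "Sam", "2021-01-01", 60),
    ],
)

# Three different windows in the same SELECT: per customer, per day, whole table.
partitioned = conn.execute(
    """
    SELECT OrderID, CustomerName,
           SUM(OrderAmt) OVER (PARTITION BY CustomerName) AS PartByName,
           SUM(OrderAmt) OVER (PARTITION BY OrderDate)    AS PartByDate,
           SUM(OrderAmt) OVER ()                          AS NoParm
    FROM XXX
    ORDER BY OrderID
    """
).fetchall()

for row in partitioned:
    print(row)
```

Each output row belongs to one partition per window, so the per-customer and per-day totals can be combined with the row's own amount, for example to compute shares.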
OVER with PARTITION BY
Window
 select CustomerName
 ,SUM(OrderAmt) OVER () as NoParm -- window 1
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName) as PartByName -- window 2
 From XXX order by OrderID;
[Figure: windows 1 and 2 for the current customer Joe]
Window
 select CustomerName
 ,SUM(OrderAmt) OVER () as NoParm -- window 1
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName) as PartByName -- window 2
 From XXX order by OrderID;
[Figure: windows 1 and 2 for the current customer Sam]
Window
 select CustomerName
 ,SUM(OrderAmt) OVER () as NoParm
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName) as PartByName
 From XXX order by OrderID;

Window
 select CustomerName, OrderDate
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName, OrderDate) as
PartByName
 from XXX

Window

• Example
 select
 P.name, PS.Name, PC.Name, AVG(P.ListPrice) OVER () as AVG2,
 AVG(P.ListPrice) OVER (Partition by P.Name) as AVG3
 from
 [Production].[Product] AS P
 inner join [Production].[ProductSubcategory] AS PS
 on P.ProductSubcategoryID = PS.ProductSubcategoryID
 inner join [Production].[ProductCategory] AS PC
 on PS.ProductCategoryID = PC.ProductCategoryID
Window
 select CustomerName, OrderDate
 ,100*OrderAmt/SUM(OrderAmt) OVER (PARTITION BY CustomerName) as
PrcCustomer
 ,100*OrderAmt/SUM(OrderAmt) OVER (PARTITION BY OrderDate) as PrcDay
 from XXX

OVER with ORDER BY
Window

• OVER with ORDER BY
 defines the order in which the rows of each partition are processed by the window function.
 If no partition is specified then the partition is still the whole table
 If no additional constraints (ROWS or RANGE) are defined, the window is just
 Beginning – the first element
 End – the current element (last instance)
 equal elements are indistinguishable
Window
 select CustomerName, OrderDate
 , SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate) as VAL
 (A) from XXX order by OrderID;
 (B) from XXX order by CustomerName, OrderDate;
[Figure: result sets A and B]
Window
 select CustomerName
 ,SUM(OrderAmt) OVER (ORDER BY CustomerName) as OrderByName
 (A) from XXX order by OrderID;
 (B) from XXX order by OrderByName
[Figure: result sets A and B]
Window
 select CustomerName
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName) as PartByName -- window 1
 ,SUM(OrderAmt) OVER (ORDER BY CustomerName) as OrderByName -- window 2
 from XXX order by OrderID;
[Figure: windows 1 and 2 for the current customer Joe]
Window
 select CustomerName
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName) as PartByName -- window 1
 ,SUM(OrderAmt) OVER (ORDER BY CustomerName) as OrderByName -- window 2
 from XXX order by OrderID;
[Figure: windows 1 and 2 for the current customer Sam]
Window
 select CustomerName
 ,SUM(OrderAmt) OVER () as NoParm
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName) as PartByName
 ,SUM(OrderAmt) OVER (ORDER BY CustomerName) as OrderByName
 from XXX order by CustomerName

Window
 select CustomerName, OrderDate
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName) as PartByName
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName,OrderDate) as PartByNameDate
 ,SUM(OrderAmt) OVER (ORDER BY CustomerName,OrderDate) as OrderByName
 from XXX

Window
 select OrderDate, OrderAmount,
 SUM(OrderAmount) OVER (ORDER BY OrderDate) as VAL
 from XXX order by OrderDate;
OVER with ORDER BY and ROWS/RANGE
Window

• OVER with ROWS or RANGE
 specify a set of rows based on the position of the current row
 Because each window frame is anchored to the current row, the ROWS or RANGE specifications are based on their proximity to the current row.
 both clauses require the use of the ORDER BY clause
 ROWS clause
 limits the rows within a partition by specifying a fixed number of rows preceding or following the current row.
 RANGE clause
 logically limits the rows within a partition by specifying a range of values with respect to the value in the current row.
Window
• Some describe ROWS as a physical operator while RANGE is a logical operator.
 The ROWS clause limits the rows within a partition by specifying a fixed number of rows preceding or following the current row.
 The RANGE clause logically limits the rows within a partition by specifying a range of values with respect to the value in the current row.

• RANGE will include those rows in the window frame which have the same ORDER BY values as the current row.
 Thus, it’s possible that several rows get the same (duplicated) result with RANGE if the ORDER BY is not unique.

• Preceding and following rows are defined based on the ordering in the ORDER BY clause.
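The physical/logical distinction shows up as soon as the ORDER BY values contain ties. The sketch below assumes a made-up `XXX` table in which two orders share a date, and uses SQLite 3.25+ as a stand-in engine; the frame syntax used here is the same in T-SQL.

```python
import sqlite3

# Hypothetical data: orders 2 and 3 fall on the same day (a tie in ORDER BY).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE XXX (OrderID INTEGER, OrderDate TEXT, OrderAmt INTEGER)")
conn.executemany(
    "INSERT INTO XXX VALUES (?, ?, ?)",
    [(1, "2021-01-01", 10), (2, "2021-01-02", 20),
     (3, "2021-01-02", 30), (4, "2021-01-03", 40)],
)

rows = conn.execute(
    """
    SELECT OrderID,
           SUM(OrderAmt) OVER (ORDER BY OrderDate
               RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ByRange,
           SUM(OrderAmt) OVER (ORDER BY OrderDate
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ByRows
    FROM XXX
    ORDER BY OrderID
    """
).fetchall()

by_range = [r[1] for r in rows]
by_rows = [r[2] for r in rows]
print(by_range)  # tied rows share one frame and therefore one value
print(by_rows)   # tied rows get different values; which one is first is arbitrary
```

With RANGE both orders of the tied day see the whole day; with ROWS the running total grows row by row, and the order among the tied rows is not defined unless the ORDER BY is made unique.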
Window

• A number of key words can be used with this clause to define the window frame used by the window functions.
 CURRENT ROW:
 identifies the current row
 supported by ROWS and RANGE
 UNBOUNDED PRECEDING:
 the beginning of the partition
 supported by both ROWS and RANGE.
 UNBOUNDED FOLLOWING:
 the end of the partition
 supported by both ROWS and RANGE.
Window

• A number of key words can be used with this clause to define the window frame used by the window functions.
 n PRECEDING:
 This is used to specify the number of rows before the current row
 supported by the ROWS clause only.
 n FOLLOWING:
 This is used to specify the number of rows after the current row
 supported by the ROWS clause only.

• Preceding and following rows
 are defined based on the ordering in the ORDER BY clause
Window
• A number of key words can be used with this clause to define the window frame used by the window functions.
 BETWEEN X AND Y
 Used with either ROWS or RANGE to specify the lower (starting) and upper (ending) boundary points of the window.
 X - <window frame bound> defines the boundary starting point
 Y - <window frame bound> defines the boundary end point
 The upper bound cannot be smaller than the lower bound.
 BETWEEN rows
 defined based on the ordering in the ORDER BY clause
Window
• For example,
 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
 defines a window that is three rows in size, starting 2 rows before the current row and ending with (and including) the current row.
 ROWS BETWEEN 2 FOLLOWING AND 10 FOLLOWING
 defines a window that starts with the second row that follows the current row and ends with the tenth row that follows the current row
 RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 defines a window that starts with the current row and ends with the last row of the partition

• The window frame “RANGE … CURRENT ROW …”
 includes all rows that have the same values in the ORDER BY
expression as the current row.
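The frame specifications can be compared on a five-row table with a unique ordering. This is a sketch under assumptions: the table `T(day, amt)` and its values are invented for the illustration, and SQLite 3.25+ stands in for SQL Server. The third frame uses smaller offsets than the slide (1 and 2 FOLLOWING) so that it fits the tiny table.

```python
import sqlite3

# Hypothetical table: one amount per day, so the ordering has no ties.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (day INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO T VALUES (?, ?)", [(d, 10 * d) for d in range(1, 6)])

rows = conn.execute(
    """
    SELECT day,
           SUM(amt) OVER (ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)           AS Last3,
           SUM(amt) OVER (ORDER BY day
               RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)  AS Rest,
           SUM(amt) OVER (ORDER BY day
               ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING)           AS Next2
    FROM T
    ORDER BY day
    """
).fetchall()

last3 = [r[1] for r in rows]
rest = [r[2] for r in rows]
next2 = [r[3] for r in rows]
print(last3)
print(rest)
print(next2)  # the last row has an empty frame, so SUM yields NULL (None)
```

Near the partition edges a frame simply contains fewer rows; a frame that lies entirely outside the partition is empty.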
Window
 select CustomerName,OrderDate,OrderAmt
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as RangeRow
 from XXX
[Figure: window for the current customer Sam]
Window
 select CustomerName,OrderDate,OrderAmt
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate
RANGE CURRENT ROW) as RangeRow
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate
ROWS CURRENT ROW) as RowRow
 from XXX

Window
 select CustomerName
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate RANGE BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as RangeByDate
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as RowsByDate
 from XXX
Window
 select CustomerName
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate ROWS
BETWEEN 1 PRECEDING and CURRENT ROW) as RangeByDate
 ,SUM(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderDate ROWS
BETWEEN CURRENT ROW and 1 FOLLOWING) as RangeByDate
 from XXX

Window

• GROUP BY over a query with OVER:
 SELECT Sum(OrderAmt) AS SAMT, OrderDate
 ,SUM(SUM(OrderAmt)) OVER (ORDER BY OrderDate ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) as RangeRow
 From XXX
 group by OrderDate
Window
 SELECT SalesOrderID, ProductID, OrderQty
 ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS Total
 ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Avg"
 ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Count"
 ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Min"
 ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Max"
 FROM Sales.SalesOrderDetail
 WHERE SalesOrderID = 43659;

Window
 SELECT SalesOrderID, ProductID, OrderQty
 ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS Total
 ,CAST(1.*OrderQty/SUM(OrderQty) OVER(PARTITION BY SalesOrderID) *100 AS
DECIMAL(5,2))AS "Percent by ProductID"
 FROM Sales.SalesOrderDetail
 WHERE SalesOrderID = 43659;

Window
 SELECT BusinessEntityID, TerritoryID, DATEPART(yy,ModifiedDate) AS SalesYear, SalesYTD
 , AVG(SalesYTD) OVER (PARTITION BY TerritoryID
 ORDER BY DATEPART(yy,ModifiedDate)
 ) AS MovingAvg
 , SUM(SalesYTD) OVER (PARTITION BY TerritoryID
 ORDER BY DATEPART(yy,ModifiedDate)
 ) AS CumulativeTotal
 FROM Sales.SalesPerson
 WHERE TerritoryID IS NULL OR TerritoryID < 5
 ORDER BY TerritoryID,SalesYear;
Ranking

Ranking

• Ranking
 Ranking functions return a ranking value for each row in a partition
 Depending on the function that is used
 some rows might receive the same value as other rows.
 Ranking functions are nondeterministic
 Examples:
 ROW_NUMBER, RANK, DENSE_RANK, NTILE
Ranking

• ROW_NUMBER
 returns a row number for each row within the partition, based on the partition and order.
 requires the use of the ORDER BY clause.
 If PARTITION BY is used, then the row numbering starts over within each partition.
Ranking
 select CustomerName,OrderDate,OrderAmt
 ,ROW_NUMBER() OVER(ORDER BY OrderAmt DESC) RowNum
 ,ROW_NUMBER() OVER(PARTITION BY OrderDate ORDER BY
CustomerName) RowNum
 FROM lab.CTEOrders ORDER BY OrderDate

Ranking

• RANK and DENSE_RANK
 order the rows based on the specified partition and apply a rank or number to them.
 assign the same rank to “ties”.
 The difference is how each handles the next rank number in the series.
 RANK does a “true” ordering: it ranks by the number of preceding rows, so the rank numbers that follow a tie are skipped.
 DENSE_RANK keeps the tie as well, however it does not skip any
numbers in the sequence.
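The three numbering functions differ only in how they treat ties, which a four-row example makes visible. A minimal sketch, assuming a made-up `XXX` table with one tie and SQLite 3.25+ as a stand-in engine:

```python
import sqlite3

# Hypothetical data: orders 2 and 3 have the same amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE XXX (OrderID INTEGER, OrderAmt INTEGER)")
conn.executemany(
    "INSERT INTO XXX VALUES (?, ?)", [(1, 100), (2, 80), (3, 80), (4, 50)]
)

ranking = conn.execute(
    """
    SELECT OrderAmt,
           -- OrderID is added as a tie-breaker so the numbering is repeatable
           ROW_NUMBER() OVER (ORDER BY OrderAmt DESC, OrderID) AS RowNum,
           RANK()       OVER (ORDER BY OrderAmt DESC)          AS Rnk,
           DENSE_RANK() OVER (ORDER BY OrderAmt DESC)          AS DenseRnk
    FROM XXX
    ORDER BY OrderAmt DESC, OrderID
    """
).fetchall()

for row in ranking:
    print(row)
```

After the tie on 80, RANK continues with 4 (three rows precede the next one) while DENSE_RANK continues with 3.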
Ranking
 select CustomerName,OrderDate,OrderAmt
 ,RANK() OVER(ORDER BY CustomerName) RankByCust
 ,DENSE_RANK() OVER(ORDER BY CustomerName) DenseByCust
 ,RANK() OVER(PARTITION BY CustomerName ORDER BY OrderDate)
RankByCustDt
 ,DENSE_RANK() OVER(PARTITION BY CustomerName ORDER BY OrderDate)
DenseByCustDt
 from lab.CTEOrders order by CustomerName

Ranking

• NTILE
 groups the data into ordered and ranked groups
 based on the ORDER BY clause
 based on the number of groups used
 specified as the function’s argument
 if you specify 4 groups to produce a quartile ranking,
 each row is assigned one of 4 ranked values (1-4), the number of its group in the order
 In many cases, the total number of rows is not divisible by the number of groups chosen
 The function will always frontload the results so the earliest
groups will have the “extra” rows.
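The frontloading rule is easy to observe when the row count is not a multiple of the group count. The sketch below assumes an invented table `T` with ten rows and uses SQLite 3.25+ as a stand-in engine.

```python
import sqlite3
from collections import Counter

# Hypothetical table with 10 rows; 10 is not divisible by 4.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (n INTEGER)")
conn.executemany("INSERT INTO T VALUES (?)", [(i,) for i in range(1, 11)])

tiles = [
    tile
    for (tile,) in conn.execute(
        "SELECT NTILE(4) OVER (ORDER BY n) FROM T ORDER BY n"
    )
]
group_sizes = Counter(tiles)

print(tiles)
print(dict(group_sizes))  # the two "extra" rows go to groups 1 and 2
```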
Ranking

• NTILE
 If a PARTITION BY clause is used, the NTILE ranking will be applied within each partition
 As with the other ranking functions, the ORDER BY clause is required to use this function

• Example:
 NTILE(4) – quartile
 Each group represents ¼ of the dataset
 NTILE(100) – percentile
 Each group represents 1% of the dataset
Ranking
 select CustomerName,OrderDate,OrderAmt
 ,NTILE(4) OVER(ORDER BY OrderDate) NtileByDt
 ,NTILE(2) OVER(PARTITION BY OrderDate ORDER BY OrderAmt)
NtileByDtAmt
 from XXX order by OrderDate;

Analytical

Analytical
• Analytic functions fall into two primary categories:
 values at a position and percentiles

• Functions:
 Position Value Functions:
 LAG, LEAD, FIRST_VALUE and LAST_VALUE
 find a row in the partition and return the desired value from that row.
 Percentile Functions:
 CUME_DIST and PERCENT_RANK
 break the partition into percentiles and return a rank value for each row.
Position

• Position Value Functions:


 LAG, LEAD, FIRST_VALUE, LAST_VALUE
 Allow the current row to use values from other rows
 The LAG and LEAD functions
 let you specify the offset, i.e. how many rows to look forward (LEAD) or backward (LAG)
 support a default value for cases where the value returned would be null.
 do not support the use of ROWS or RANGE in the OVER clause.
 The FIRST_VALUE and LAST_VALUE functions
 let you further restrict the window within the partition using ROWS or RANGE if desired.

Position
 SELECT OrderID, OrderDate, OrderAmt, CustomerName
   ,LAG(OrderAmt) OVER (PARTITION BY CustomerName ORDER BY OrderID) as PrevOrdAmt
   ,LEAD(OrderAmt, 2) OVER (PARTITION BY CustomerName ORDER BY OrderID) as NextTwoOrdAmt
   ,LEAD(OrderDate, 2, '9999-12-31') OVER (PARTITION BY CustomerName ORDER BY OrderID) as NextTwoOrdDtNoNull
 FROM lab.CTEOrders
 order by CustomerName
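
As a hedged illustration of the offset and default arguments, the sketch below runs LAG and LEAD over three invented rows. The inline table and the column aliases are assumptions made for this example only.

```sql
-- Hypothetical data: three orders with amounts 100, 150, 120.
SELECT o.OrderID
      ,o.OrderAmt
       -- no default: the first row has no predecessor, so NULL
      ,LAG(o.OrderAmt) OVER (ORDER BY o.OrderID)        AS PrevAmt        -- NULL, 100, 150
       -- offset 1 with default 0: the last row has no successor, so 0
      ,LEAD(o.OrderAmt, 1, 0) OVER (ORDER BY o.OrderID) AS NextAmtOrZero  -- 150, 120, 0
       -- defaulting to the current amount makes the first difference 0
      ,o.OrderAmt - LAG(o.OrderAmt, 1, o.OrderAmt) OVER (ORDER BY o.OrderID)
                                                        AS ChangeFromPrev -- 0, 50, -30
FROM (VALUES (1, 100), (2, 150), (3, 120)) AS o(OrderID, OrderAmt)
ORDER BY o.OrderID;
```

Supplying a default avoids a separate ISNULL/COALESCE step when the result feeds further arithmetic.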

Position
 SELECT OrderID, OrderDate, OrderAmt, CustomerName
   ,FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerName ORDER BY OrderID) as FirstOrdDt
   ,LAST_VALUE(CustomerName) OVER (PARTITION BY OrderDate ORDER BY OrderID
       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as LastCustToOrdByDay
 FROM lab.CTEOrders
 order by OrderID
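
The explicit ROWS clause on LAST_VALUE matters: when ORDER BY is given without a frame, the default frame ends at the current row. The following sketch, using invented customer names rather than the lecture's data, contrasts the two behaviours.

```sql
-- Hypothetical data: three orders from three customers.
SELECT o.OrderID
      ,o.CustomerName
       -- default frame (RANGE UNBOUNDED PRECEDING .. CURRENT ROW):
       -- the "last" row of the frame is the current row itself
      ,LAST_VALUE(o.CustomerName) OVER (ORDER BY o.OrderID)
           AS LastDefaultFrame     -- Ann, Bob, Cid
       -- frame widened to the whole partition
      ,LAST_VALUE(o.CustomerName) OVER (ORDER BY o.OrderID
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
           AS LastWholePartition   -- Cid, Cid, Cid
FROM (VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid')) AS o(OrderID, CustomerName)
ORDER BY o.OrderID;
```

FIRST_VALUE is not affected in the same way, because the default frame always starts at the first row of the partition.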

Percentile

• Percentile Functions:
 CUME_DIST, PERCENT_RANK
 return a ranking value
 as the ranking percentages move from lowest to highest, each group's percent value also includes all of the earlier values in the calculation
 CUME_DIST
 returns a value in (0,1]: the fraction of rows in the group with a value less than or equal to the current row's value
 the first group's value equals its own share of the rows, so it is never 0
 PERCENT_RANK
 computes the rank of the value within a group as a percentage
 returns a value in [0,1]; the first group is always 0
Percentile
• SELECT Department, LastName, Rate
   ,CUME_DIST() OVER (PARTITION BY Department ORDER BY Rate) AS CumeDist
   ,PERCENT_RANK() OVER (PARTITION BY Department ORDER BY Rate) AS PctRank
 FROM HumanResources.vEmployeeDepartmentHistory AS edh
 INNER JOIN HumanResources.EmployeePayHistory AS e
   ON e.BusinessEntityID = edh.BusinessEntityID
 WHERE Department IN (N'Information Services', N'Document Control')
 ORDER BY Department, Rate DESC;

Percentile
 SELECT OrderDate, CustomerName
   ,CUME_DIST() OVER (Partition by CustomerName ORDER BY OrderDate) CumDist
   ,PERCENT_RANK() OVER (Partition by CustomerName ORDER BY OrderDate) PctRank
 from lab.CTEOrders
 order by CustomerName, OrderDate
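
To make the two percentile functions concrete, here is a worked sketch on four invented rates (the values and the inline table are assumptions for illustration). With N rows in the partition, CUME_DIST is (rows with a value less than or equal to the current one) / N, and PERCENT_RANK is (RANK - 1) / (N - 1).

```sql
-- Hypothetical data: four rates with one tie (20 appears twice), N = 4.
SELECT r.Rate
       -- rows <= current / 4
      ,CUME_DIST()    OVER (ORDER BY r.Rate) AS CumeDist  -- 1/4, 3/4, 3/4, 4/4
       -- (rank - 1) / (4 - 1), with ranks 1, 2, 2, 4
      ,PERCENT_RANK() OVER (ORDER BY r.Rate) AS PctRank   -- 0, 1/3, 1/3, 1
FROM (VALUES (10), (20), (20), (40)) AS r(Rate)
ORDER BY r.Rate;
```

The lowest value gets PERCENT_RANK 0 but a non-zero CUME_DIST, which matches the (0,1] and [0,1] ranges described above.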

Querying OLTP – OLAP Perspective

• Sum up
 Case
 Pivot/UnPivot
 Window
 OVER clause
 Ranking
 ROW_NUMBER, RANK / DENSE_RANK, NTILE
 Positions
 LAG, LEAD, FIRST_VALUE, LAST_VALUE
 Percentiles
 CUME_DIST, PERCENT_RANK

Thank you
