
B9_PowerBI_CB

The document introduces Power BI as a business intelligence tool for data analysis and visualization, highlighting its features such as easy data connectivity, real-time updates, and high security. It outlines a course structure focused on foundational skills in Power BI Desktop, including data connection, modeling, and dashboard creation, while emphasizing hands-on learning through a project-based approach. Additionally, it covers the Power BI workflow, installation options, and the importance of data transformation and shaping using Power Query.

FACULTY OF ECONOMICS

DEPARTMENT OF ACCOUNTING

APPLIED INFORMATICS

INTRODUCTION TO
POWER BI BASICS

Dr. LÊ NGỌC HIẾU


POWER BI BASICS


Introduction to Power BI

• Power BI is a business data analysis tool that makes it easy to share detailed insights: it connects to data easily and displays it quickly on dashboards, with dynamic, varied reports that update in real time.
• Power BI is a collection of apps and connectors. It turns unrelated data sources into coherent, visual, and interactive insights. Data sources are diverse, ranging from a simple Excel file or a table on a website to Azure or AWS.
• Ensuring a high level of security is an important factor in today's world.



OUTLINE

1 Introducing Power BI Desktop: Installing Power BI Desktop, exploring the Power BI workflow, comparing Power BI vs. Excel, etc.

2 Connecting & Shaping Data: Connecting to data, shaping & transforming tables, using profiling tools, editing, merging & appending queries, etc.

3 Creating a Data Model: Building relational models, creating table relationships, understanding cardinality and filter flow, etc.

THE PROJECT

THE SITUATION
You’ve just been hired as a Business Intelligence Analyst by AdventureWorks*, a global manufacturing company that produces cycling equipment and accessories

THE BRIEF
The management team needs a way to track KPIs (sales, revenue, profit, returns), compare regional performance, analyze product-level trends, and identify high-value customers.

All you’ve been given is a folder of raw csv files, which contain information about transactions, returns, products, customers, and sales territories.

THE OBJECTIVE
Use Power BI Desktop to:
• Connect and transform the raw data
• Build a relational data model
• Create calculated columns and measures with DAX
• Design an interactive dashboard to visualize the data
*This data is provided by Microsoft for informational purposes only as an aid to illustrate a concept. These samples are provided “as is” without warranty of any kind. The example
companies, organizations, products, domain names, e-mail addresses, people, places, and events depicted herein are fictitious, and no association with any real company,
organization, product, domain name, e-mail address, person, place, or event is intended or should be inferred.
SETTING EXPECTATIONS

1 What you see on your screen may not always match mine
• Power BI Desktop features are updated frequently, with new versions released each month
• NOTE: Power BI is currently only compatible with PC/Windows (not available for Mac)

2 This course is designed to help you build foundational skills
• Our goal is to help you build a deep foundational understanding of the Power BI desktop workflow; some topics may be simplified, and we won’t cover some advanced tools (M code, advanced DAX, R/Python visuals, etc.)

3 This is a hands-on and project-based learning experience
• You will get the most value out of this course if you follow along closely with the demos and assignments; we’ll be working through the entire BI workflow to create a professional-quality dashboard from scratch

4 We will not cover Power BI Service as part of this course
• This course focuses on Power BI Desktop specifically; online sharing and collaboration features (app.powerbi.com) require a separate account and are covered in-depth in a separate course
*Copyright Maven
Analytics, LLC
INTRODUCING POWER BI

MEET POWER BI

In this section we’ll introduce Power BI Desktop, review the download and installation process, adjust default settings, and explore the Power BI interface and workflow

TOPICS WE’LL COVER:
• Introducing Power BI
• Power BI vs. Excel
• Installation Options
• Adjusting Settings
• Interface & Workflow
• Helpful Resources

GOALS FOR THIS SECTION:
• Download and install Power BI Desktop, and adjust the settings for our course project
• Understand the role that Power BI plays within the broader Microsoft ecosystem
• Explore core components of the Power BI Desktop interface
• Review the business intelligence workflow that we’ll follow as we build our course project
MEET POWER BI

Microsoft Power BI is a self-service business intelligence platform, which includes both desktop and web-based applications for connecting, modeling, and visualizing data

Learn more at powerbi.microsoft.com

WHY POWER BI?

Connect, transform and load millions of rows of data
• Access data from virtually anywhere (database tables, flat files, web, cloud services, folders, etc.), and create fully automated workflows to extract, transform and load data for analysis

Build relational models to blend data from multiple sources
• Create table relationships to analyze holistic performance across an entire relational data model

Define complex calculations using Data Analysis Expressions (DAX)
• Enhance datasets and enable advanced analytics with powerful and portable DAX expressions

Bring data to life with interactive reports and dashboards
• Build professional-quality reports and dashboards with best-in-class visualization tools
EXCEL VS. POWER BI

EXCEL only: Spreadsheets, PivotTables, A1 Notation, Cell Formulas
EXCEL & POWER BI: Power Query, Data Model, DAX
POWER BI only: Report View, Custom Visuals, Interactive Dashboards, Power BI Service

• Excel and Power BI are built on top of the same analytics engines
• Power BI takes the same data transformation and modeling capabilities and adds powerful visualization and publishing tools
• Transitioning is easy; you can import an entire data model directly from Excel!

INSTALLING POWER BI DESKTOP

1) Download from Microsoft store (apps.microsoft.com)
• Windows handles automatic updates
• Updates only elements that have been changed
• Doesn’t require administrator access

2) Download manually from web (powerbi.microsoft.com/downloads)
• No automatic updates (allows version control)
• Downloads an executable installation file
• Administrator access may be required

3) Install as part of Microsoft 365 (microsoft.com/en-us/microsoft-365)
• Power BI Desktop is included as part of select enterprise Office/Microsoft 365 subscriptions
• If your company uses a compatible version of Microsoft 365, talk to an admin about getting access to Power BI

HEY THIS IS IMPORTANT!
You do NOT need to register for a Power BI Pro account to access Power BI Desktop
POWER BI SETTINGS

Global > Preview Features
• Select all available preview features by default (these change with each monthly release)

Current File > Data Load
Make sure the following options are NOT selected:
• Update or delete relationships when refreshing data
• Autodetect new relationships after data is loaded
• Time Intelligence > Auto date/time

Current File > Regional Settings
• Select “English (United States)” from the dropdown menu (this will align with the data in course project files)

HEY THIS IS IMPORTANT!
Options under CURRENT FILE need to be adjusted every time you open a new Power BI workbook (these settings do not persist across new .pbix files)
POWER BI WORKFLOW

Raw data is extracted and transformed in the Power Query editor, then loaded to the Power BI “front-end”

Power BI “Back-End”: Power Query Editor
Power BI “Front-End”: Model View, Data View, Report View
• Power Query Editor: Data is loaded & transformed in the Power Query Editor
• Model View: Data models are configured in the Model View
• Data View: Table features & calculations are added in the Data View
• Report View: Visuals & reports are designed in the Report View
HELPFUL RESOURCES

• The Help tab includes documentation, training videos, sample files, templates, and links to support blogs and communities
• The Microsoft Power BI blog (powerbi.microsoft.com/blog) publishes monthly summaries to showcase new features
• The Microsoft Power BI YouTube Channel publishes demos, feature summaries, and advanced tutorials (check out “Guy in a Cube” too!)
• Power BI User Groups (PBIUG) are communities of users, which include both local meet-ups and helpful online forums (pbiusergroup.com)
MONTHLY UPDATES

Power BI is updated monthly, so you may notice ongoing changes to settings, options, tools, etc. Reference the links below to stay up-to-date on product updates and new feature releases:

Power BI Desktop
https://docs.microsoft.com/en-us/power-bi/fundamentals/desktop-latest-update

Power BI Service
https://docs.microsoft.com/en-us/power-bi/fundamentals/service-whats-new

Power Platform
https://learn.microsoft.com/en-us/dynamics365/release-plans/
CONNECTING & SHAPING DATA

CONNECTING & SHAPING DATA

In this section we’ll connect to source files and cover some of the most common techniques for extracting, cleaning, and shaping data to prepare it for modeling and analysis

TOPICS WE’LL COVER:
• Intro to Power Query
• Data Connectors
• The Query Editor
• Connection Modes
• Data QA & Profiling
• Table Transformations
• Calendar Tools
• Combining Queries

GOALS FOR THIS SECTION:
• Explore Power BI’s query editor and understand the role that Power Query plays in the larger BI workflow
• Introduce different types of connectors and connectivity modes available for getting data into Power BI
• Review tools for checking data quality and key profiling metrics like column distribution, empty values, errors and outliers
• Transform tables using text, numerical and date/time tools, pivot and group records, and create new conditional columns
• Practice combining, modifying and refreshing queries
FRONT-END VS. BACK-END

Power BI Desktop essentially has two distinct environments: a front-end and a back-end
• The front-end includes the Data, Model & Report views, where most of the modeling, analysis and
visualization takes place
• The back-end includes the Power Query Editor, where raw data is extracted, transformed, and loaded to
the front-end (ETL)
BACK-END
• Connect & extract data using pre-built connectors
• Profile & QA the data to explore, clean and prepare it for modeling and analysis
• Transform & shape tables to add new features, modify values, group records, or sort and filter columns
• Merge or append queries to join and combine them prior to loading to the front-end
• Perform advanced transformations using custom M code (out of scope for this course)

FRONT-END
• Build data models by creating table relationships between primary and foreign keys
• Add calculated measures & columns using Data Analysis Expressions (DAX)
• Design reports to visualize the data and create interactive, dynamic dashboards
• Publish & share your Power BI workbooks using Power BI Service (cloud application)
TYPES OF DATA CONNECTORS

Power BI can connect to virtually any type of source data, including (but not limited to):
• Flat files & Folders (csv, text, xlsx, etc.)
• Databases (SQL, Access, Oracle, IBM, etc.)
• Power Platform (Datasets, Datamarts,
Dataflows, Dataverse, etc.)
• Azure (Azure SQL, Analysis Services,
Databricks, etc.)
• Online Services (SharePoint, GitHub,
Dynamics 365, Google Analytics,
Salesforce, Power BI Service, etc.)
• Other (Web feeds, R scripts, Spark,
Hadoop, etc.)
POWER QUERY EDITOR

• Query Editing Tools (table transformations, calculated columns, etc.)
• Formula Bar (this is “M” code)
• Table Name & Properties
• Queries Pane (list of all queries)
• Applied Steps (like a macro)
• Table Preview

*In older versions of Power BI, the Transform Data option may be
QUERY EDITING TOOLS
The HOME tab includes general settings and common table transformation
tools

The TRANSFORM tab includes tools to modify existing columns (splitting/grouping, transposing,
extracting text, etc.)

The ADD COLUMN tools create new columns (based on conditional rules, text operations,
calculations, dates, etc.)
BASIC TABLE TRANSFORMATIONS
• Sort values (A-Z, Low-High, etc.)
• Change data type (date, $, %, text, etc.)
• Promote headers
• Duplicate, move or rename columns
  Tip: Right-click column headers to access common tools
• Choose or remove columns
  Tip: use the “Remove Other Columns” option if you always want a specific set
• Keep or remove rows
  Tip: use “Remove Duplicates” to create new lookup tables from scratch
ASSIGNMENT: TABLE TRANSFORMATIONS

NEW MESSAGE
From: Ethan T. Langer (Analytics Manager)
Subject: Welcome aboard!

Hello, and welcome to the team!
We’re excited that you’ll be helping us develop our new internal reports in Power BI. Looks like you’ve already gotten started, but we have some new data to add to the model.
Could you please create two new queries to connect to the Product Category Lookup and Product Subcategory Lookup files attached, and help with a few modifications to the product table?
Thanks!
-ETL
Attached: Product Category Lookup, Product Subcategory Lookup

Key Objectives
1. Create queries to connect to the two new .csv files
2. Name your queries Product Category Lookup and Product Subcategory Lookup
3. Confirm that column headers have been promoted and that all data types are correct
4. Add a new column to extract all characters before the dash ("-") in the Product SKU column, and name it “SKU Type”
5. Update the SKU Type calculation above to return all characters before the second dash, instead of the first
6. Replace zeros (0) in the Product Style column with “NA”
7. Close and load to your data model
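Objectives 4 and 5 come down to "text before the Nth delimiter". The sketch below is a plain-Python analogy of that logic, not Power Query M code; the function name, the `skip` parameter, and the sample SKU are hypothetical, and returning the whole text when there are too few dashes is a choice made for this sketch.

```python
def text_before_delimiter(text: str, delimiter: str = "-", skip: int = 0) -> str:
    """Return everything before the (skip + 1)-th occurrence of `delimiter`.

    If the text has fewer delimiters than that, the whole text is returned
    (an assumption of this sketch).
    """
    parts = text.split(delimiter)
    if len(parts) <= skip + 1:
        return text
    return delimiter.join(parts[: skip + 1])


sku = "HL-U509-R"  # hypothetical Product SKU value

first_dash = text_before_delimiter(sku)           # objective 4: before the first dash
second_dash = text_before_delimiter(sku, skip=1)  # objective 5: before the second dash

print(first_dash)   # HL
print(second_dash)  # HL-U509
```

Moving from objective 4 to objective 5 changes only the number of delimiters to skip, which is why the assignment asks you to update the existing step rather than add a new one.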
SOLUTION: TABLE TRANSFORMATIONS
Solution Preview
PRO TIP: STORAGE & CONNECTION MODES

Power BI Desktop supports several types of storage and connection modes:


• Import: Tables are stored in-memory within Power BI and queries are fulfilled by cached data (default)
• DirectQuery: Tables are connected directly to the source and queries are executed on-demand at the
data source
• Composite Model (Dual): Tables come from a mix of Import and DirectQuery modes, or integrate
multiple DirectQuery tables
• Live Connection: Connect to pre-published Power BI datasets in Power BI Service or Azure Analysis
Services

Import
• Dataset is less than 1GB (after compression) & fast performance
• Source data does not change frequently
• No restrictions on Power Query, data modeling, and DAX functions

DirectQuery
• Dataset is too large to be stored in-memory
• Source data changes frequently and reports must reflect changes
• Company policy states that data can only be accessed from the original source

Composite Model
• Boost performance by setting appropriate storage for each table
• Combine a DirectQuery model with additional imported data
• Create a single model from two or more DirectQuery models

Live Connection
• Create one dataset that serves as a central source of truth
• Analyst teams can create different reports from the same source
• Multi-developer teams where one user builds the model and another works on visualization

Learn more: https://learn.microsoft.com/en-us/power-bi/connect-data/service-dataset-modes-understand
CONNECTING TO A DATABASE

Power Query can connect to data from various database sources including SQL Server,
MS Access, MySQL, PostgreSQL, Oracle, SAP, and more

• Get Data & enter credentials
• Select tables & transform
• Write custom or advanced queries with SQL statements (optional)

EXTRACTING DATA FROM THE WEB
Power Query includes a native Web connector for importing web-hosted files (csv,
xlsx, etc.) or scraping URLs for anything that Power Query can identify as a
structured table

https://en.wikipedia.org/wiki/List_of_asset_management_firms



DATA PROFILING: COLUMN QUALITY
Profiling tools like column quality, column distribution, and column profile allow you to
explore the quality, composition, and distribution of your data before loading it
into the Power BI front-end

Column quality shows the percentage of values within a column that are valid, contain errors, or are empty
• Hover over the column quality box to see the number of records in each category
• Click the options menu to remove duplicates, errors or empty values

PRO TIP: Profiling tools are a great way to quickly find and address common data quality issues in
one place, instead of having to manually apply multiple tools or filters
DATA PROFILING: COLUMN DISTRIBUTION

Column distribution provides a sample distribution of the data in a column
• Hover over the column distribution box to see the number of distinct & unique records
• Click the options menu to remove duplicates, errors or empty values
• Suggested action based on column distribution results

DATA PROFILING: COLUMN PROFILE

Column profile provides a more holistic view of the data in a column, including a sample distribution and profiling statistics

Column statistics provide more detailed profiling metrics, including:
• Count = 293 (total number of values in column)
• Distinct Count = 119 (total number of distinct values, whether they appear once or multiple times)
• Unique = 82 (total number of values that appear exactly once)
• Min & Max (lowest and highest observed values)
  Note: Typically only useful for

Hover over the value distribution bar for suggested transformations and additional options
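The difference between Count, Distinct Count, and Unique is easy to mix up, so here is a small plain-Python sketch of the three definitions given above. It is an illustration only, not how Power Query computes them; the function name and sample values are hypothetical.

```python
from collections import Counter


def profile_column(values):
    """Compute the three profiling counts described in the slide."""
    occurrences = Counter(values)
    return {
        # Count: total number of values in the column
        "count": len(values),
        # Distinct: number of different values, whether seen once or many times
        "distinct": len(occurrences),
        # Unique: number of values that appear exactly once
        "unique": sum(1 for n in occurrences.values() if n == 1),
    }


colors = ["Red", "Blue", "Red", "Black", "Silver", "Blue"]  # hypothetical column
stats = profile_column(colors)
print(stats)  # {'count': 6, 'distinct': 4, 'unique': 2}
```

Unique can never exceed Distinct, and Distinct can never exceed Count, which matches the slide's figures (82, 119, 293).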
TEXT TOOLS

• Split a text column based on a specific delimiter, number of characters, or other attributes
• Extract characters from text based on fixed lengths, first/last characters, ranges or delimiters
• Format a text column to upper, lower or proper case, or add a prefix or suffix
  Tip: Use “Trim” to eliminate leading & trailing spaces, or “Clean” to remove non-printable characters

HEY THIS IS IMPORTANT!
You can access many tools from both the Transform and Add Column menus - the difference is whether you want to ADD a new column or OVERWRITE an existing one
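To make the Trim, Clean, Extract, and Format tools concrete, here is a rough plain-Python analogy. It is not Power Query M code; the function names and sample values are hypothetical, and `domain_name` assumes each address contains an "@" followed by a ".".

```python
def clean_text(text: str) -> str:
    """Mimic "Clean" (drop non-printable characters) followed by "Trim"."""
    printable_only = "".join(ch for ch in text if ch.isprintable())
    return printable_only.strip()


def domain_name(email: str) -> str:
    """Extract the text between "@" and the next ".", then tidy it up."""
    after_at = email.split("@", 1)[1]      # raises IndexError if "@" is missing
    domain = after_at.split(".", 1)[0]     # text before the first "." after "@"
    return domain.replace("-", " ").title()  # proper case, dashes to spaces


print(clean_text("  Road Bike\t\n "))            # Road Bike
print(domain_name("jon24@adventure-works.com"))  # Adventure Works
```

Each function returns a new value and leaves its input alone, which corresponds to working from the Add Column menu rather than overwriting a column from the Transform menu.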
ASSIGNMENT: TEXT TOOLS

NEW MESSAGE
From: Ethan T. Langer (Analytics Manager)
Subject: Customer domains

Hi!
We’re looking to better understand where our customers may be coming from, based on their email domains.
Could you please create a new column in the customer table that will allow us to do this?
Thanks!
-ETL

Key Objectives
1. Duplicate the email address column and name it “Domain Name”
2. In the new column, remove all text/characters except for the domain name
3. Use transformation steps to clean up and capitalize the domain names (i.e. “Adventure Works”)
4. Save & Apply changes
SOLUTION: TEXT TOOLS
Solution Preview
NUMERICAL TOOLS

• Statistics functions allow you to evaluate basic stats for a selected column (sum, min/max, average, count, count distinct, etc.)
  Note: These tools return a SINGLE value, and are commonly used to explore a table rather than prepare it for loading
• Standard, Scientific and Trigonometry tools allow you to apply standard operations (addition, multiplication, division, etc.) or more advanced calculations (power, logarithm, sine, tangent, etc.) to each value in a column
  Note: Unlike the Statistics tools, these are applied to each row in the table
• Information tools allow you to define binary flags (1/0 or TRUE/FALSE) to mark rows as even, odd, positive or negative
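The key distinction above is between tools that collapse a column into a single value and tools that act on every row. A minimal plain-Python sketch of that contrast follows; the column values are hypothetical, and this is an analogy rather than Power Query itself.

```python
import statistics

product_costs = [12.5, 40.0, 40.0, 300.0]  # hypothetical column values
order_quantities = [1, 2, 3, 4]            # hypothetical column values

# Statistics tools: the whole column collapses to a SINGLE value
average_cost = statistics.mean(product_costs)
distinct_costs = len(set(product_costs))
maximum_cost = max(product_costs)

# Standard tools: the operation is applied to each row, so the column keeps its length
doubled_costs = [cost * 2 for cost in product_costs]

# Information tools: one TRUE/FALSE flag per row
is_even = [quantity % 2 == 0 for quantity in order_quantities]

print(average_cost)    # 98.125
print(distinct_costs)  # 3
print(doubled_costs)   # [25.0, 80.0, 80.0, 600.0]
print(is_even)         # [False, True, False, True]
```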
ASSIGNMENT: NUMERICAL TOOLS

NEW MESSAGE
From: Ethan T. Langer (Analytics Manager)
Subject: Need some stats for leadership

Hi again,
Leadership is asking us to validate some high-level stats about our products and customers. Can you please help me answer the following questions?
We don’t really need to store these values anywhere, so make sure to restore the tables back to their original state once you’re done pulling the stats.
Thank you!
-ETL

Key Objectives
1. What is our average product cost?
2. How many colors do we sell our products in?
3. How many distinct customers do we have?
4. What is the maximum annual customer income?
5. Return the tables to their original state
SOLUTION: NUMERICAL TOOLS
Solution Preview
1. What is our average product cost? ($413.66)
2. How many colors do we sell our products in? (10)
3. How many distinct customers do we have? (18,148)
4. What is the maximum annual customer income? ($170k)
5. Return the tables to their original state
DATE & TIME TOOLS

Date & Time tools are relatively straight-forward, and include the following options:
• Age: Difference between the current date and the date in each row
• Date Only: Removes the time component from a date/time field
• Year/Month/Quarter/Week/Day: Extracts individual components from a date field (time-specific options include Hour, Minute, Second, etc.)
• Earliest/Latest: Evaluates the earliest or latest date from a column as a single value (can only be accessed from the “Transform” menu)

Note: You will almost always want to perform these operations from the “Add Column” menu to build out new fields, rather than transforming an individual date/time column

PRO TIP: Load up a table containing a single date column and use Date tools to build out an entire calendar table

CREATING A CALENDAR TABLE

Use the Date options in the Add Column menu to quickly build out an entire calendar table from a list of dates

CHANGE TYPE WITH LOCALE

1) Left click the data type icon in the column header and select the Using Locale option
2) Select Date as the data type and English (United States) as the locale for all datasets in this course (regardless of your actual location)
3) Confirm that the data type is correctly recognized. You should see a calendar icon next to the column name in the header and no errors in the column

PRO TIP: ROLLING CALENDARS

1 Create a new blank query & name it “Rolling Calendar”
• Power Query: New Source > Blank Query
• Front end: Get Data > Blank Query

2 In the formula bar, type a “literal” to generate a start date:
• Format as: YYYY, MM, DD

3 Click the fx icon to add a custom step, and enter the following formula to generate a list of dates between the start date and the current day:

Note: If your first applied step is named something other than “Source”, use that name in your formula (this is common for non-US users)

4 Convert the resulting list into a Table and set the data type as a Date

5 Rename the column to “Date” and add calculated date columns (year, month, quarter, etc.) using the Add Column tools

ASSIGNMENT: CALENDAR TABLES

NEW MESSAGE
From: Ethan T. Langer (Analytics Manager)
Subject: New date fields

Hi,
We need to add a few fields to our calendar table to help us analyze sales trending over time.
Could you please add the following columns when you get a chance?
Thanks!
-ETL

Key Objectives
Add the following columns to the calendar table:
1. Month Name (e.g. “January”)
2. Month Number (e.g. “1”)
3. Start of Year (e.g. “1/1/2020”)
4. Year (e.g. “2020”)
SOLUTION: CALENDAR TABLES

Solution Preview
INDEX COLUMNS

Index Columns contain a list of sequential values that can be used to identify each unique row in a table (typically starting from 0 or 1)

These are often used to create unique IDs that can be used to form relationships between tables (more on that later!)
CONDITIONAL COLUMNS

Conditional Columns allow you to define new fields based on logical rules and conditions (IF/THEN statements)

Here we’re creating a conditional column named Quantity Type, which is based on Order Quantity:
• If Order Quantity = 1, Quantity Type = “Single Item”
• Else If Order Quantity > 1, Quantity Type = “Multiple Items”
• Else, Quantity Type = “Other”
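The three rules above translate directly into an if / else-if / else chain. A minimal plain-Python sketch follows (an analogy for the conditional column dialog, not the M code it generates; the function name is hypothetical).

```python
def quantity_type(order_quantity: int) -> str:
    """Apply the Quantity Type rules in order; the first match wins."""
    if order_quantity == 1:
        return "Single Item"
    elif order_quantity > 1:
        return "Multiple Items"
    else:
        return "Other"


order_quantities = [1, 3, 0]  # hypothetical Order Quantity values
print([quantity_type(q) for q in order_quantities])
# ['Single Item', 'Multiple Items', 'Other']
```

Rules are evaluated top to bottom, so the final “Other” branch only catches quantities that are neither 1 nor greater than 1 (zero or negative values).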
CALCULATED COLUMN BEST PRACTICES

As a best practice, table transformations and column calculations should ideally happen as close to the original data source as possible, to optimize performance and speed

UPSTREAM: Data Source > Power Query > Power BI Front-End > Published Reports :DOWNSTREAM

HEY THIS IS IMPORTANT!
This is not a strict rule or requirement but can significantly impact performance for very large or complex data models. Where you define calculations often depends on several factors (accessibility, complexity, business requirements, etc.), so we will practice creating columns using both Power Query and the Power BI front-end (DAX) throughout this course
GROUPING & AGGREGATING

Group By allows you to aggregate data at a different level or “grain” (i.e. group daily records into monthly, aggregate transactions by store, etc.)

Here we’re transforming a daily, transaction-level table into a summary of Total Quantity by Product Key

NOTE: Any fields not specified in the Group By settings are lost
GROUPING & AGGREGATING

This time we’re transforming the daily, transaction-level table into a summary of Total Quantity grouped by both Product Key and Customer Key (using the “Advanced” option)

NOTE: This is like creating a PivotTable in Excel and pulling in Sum of Order Quantity with Product Key and Customer Key as row labels
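Both Group By examples can be sketched in plain Python as "sum a value per combination of key fields". This is an analogy, not Power Query's implementation; the function name, field names, and sample rows are hypothetical.

```python
from collections import defaultdict


def group_sum(rows, keys, value_field, total_name="Total Quantity"):
    """Sum `value_field` for each distinct combination of the `keys` fields."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[value_field]
    # Only the grouping keys and the aggregate survive; other fields are lost
    return [{**dict(zip(keys, combo)), total_name: total}
            for combo, total in totals.items()]


sales = [  # hypothetical transaction-level rows
    {"OrderDate": "2020-01-01", "ProductKey": 214, "CustomerKey": 1, "OrderQuantity": 2},
    {"OrderDate": "2020-01-01", "ProductKey": 214, "CustomerKey": 2, "OrderQuantity": 1},
    {"OrderDate": "2020-01-02", "ProductKey": 310, "CustomerKey": 1, "OrderQuantity": 1},
    {"OrderDate": "2020-01-03", "ProductKey": 214, "CustomerKey": 1, "OrderQuantity": 3},
]

by_product = group_sum(sales, ["ProductKey"], "OrderQuantity")
by_product_and_customer = group_sum(sales, ["ProductKey", "CustomerKey"], "OrderQuantity")

print(by_product)
# [{'ProductKey': 214, 'Total Quantity': 6}, {'ProductKey': 310, 'Total Quantity': 1}]
```

Note that OrderDate is absent from both results, which is the "fields not specified are lost" behavior called out earlier.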
PIVOTING & UNPIVOTING
Pivoting describes the process of turning distinct row values into columns, and
unpivoting describes the process of turning distinct columns into rows

Imagine the table on a hinge; pivoting rotates it from vertical to horizontal, and unpivoting rotates it from horizontal to vertical

NOTE: Transpose works very similarly, but doesn’t recognize unique values; instead, the entire table is transformed so that each row becomes a column and vice versa
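A small plain-Python sketch may help show that pivoting and unpivoting are inverse operations. It is an analogy only; the function names, the "Attribute"/"Value" column names, and the sample table are assumptions made for this example.

```python
def unpivot(rows, id_field, attribute="Attribute", value="Value"):
    """Turn every non-id column into its own (attribute, value) row."""
    return [
        {id_field: row[id_field], attribute: column, value: cell}
        for row in rows
        for column, cell in row.items()
        if column != id_field
    ]


def pivot(rows, id_field, attribute="Attribute", value="Value"):
    """Turn distinct attribute values back into columns, one row per id."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row[id_field], {id_field: row[id_field]})
        record[row[attribute]] = row[value]
    return list(wide.values())


wide_table = [  # hypothetical sales by region, one column per year
    {"Region": "Europe", "2020": 100, "2021": 120},
    {"Region": "Pacific", "2020": 80, "2021": 95},
]

long_table = unpivot(wide_table, "Region")
print(long_table[0])  # {'Region': 'Europe', 'Attribute': '2020', 'Value': 100}
print(pivot(long_table, "Region") == wide_table)  # True
```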
MERGING QUERIES

Merging queries allows you to join tables based on a common column (like a lookup in Excel)

In this case we’re merging the Sales Data table with the Product Lookup table, which share a common Product Key column

NOTE: Merging adds columns to an existing table/query

HEY THIS IS IMPORTANT!
Just because you can merge tables, doesn’t mean you should! In many cases, it’s better to keep tables separate and define relationships between them in the data model (more on that soon!)
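As a rough plain-Python analogy, a merge behaves like a lookup that adds columns to each row of the first table. The sketch assumes a left outer join and a key that is unique in the lookup table; the function name and sample rows are hypothetical.

```python
def merge_left(left, right, key):
    """Left outer join: keep every `left` row and add the matching `right` columns."""
    lookup = {row[key]: row for row in right}  # assumes `key` is unique in `right`
    extra_columns = [c for c in (right[0] if right else {}) if c != key]
    merged = []
    for row in left:
        match = lookup.get(row[key], {})
        # Unmatched rows get None in the added columns
        merged.append({**row, **{c: match.get(c) for c in extra_columns}})
    return merged


sales_data = [  # hypothetical rows
    {"ProductKey": 214, "OrderQuantity": 2},
    {"ProductKey": 999, "OrderQuantity": 1},
]
product_lookup = [
    {"ProductKey": 214, "ProductName": "Sport-100 Helmet, Red"},
]

for row in merge_left(sales_data, product_lookup, "ProductKey"):
    print(row)
# {'ProductKey': 214, 'OrderQuantity': 2, 'ProductName': 'Sport-100 Helmet, Red'}
# {'ProductKey': 999, 'OrderQuantity': 1, 'ProductName': None}
```

The row count of the first table is unchanged; only the column count grows, which is the sense in which merging "adds columns".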
APPENDING QUERIES

Appending queries allows you to combine or stack tables sharing the exact same column structure and data types

Here we’re appending the AdventureWorks Sales 2020 table to the AdventureWorks Sales 2021 table, which is valid since they share identical table structures

NOTE: Appending adds rows to an existing table/query

PRO TIP: Use the Folder option (Get Data > More > Folder) to append all files within a specified folder (assuming they share the same structure); as you add new files, simply refresh the query and they will automatically append!
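In contrast to a merge, an append stacks rows. The plain-Python sketch below illustrates that idea; rejecting tables whose columns differ is a choice made for this sketch to mirror the "same column structure" requirement, not a statement about how Power Query itself reacts. All names and rows are hypothetical.

```python
def append_tables(*tables):
    """Stack the rows of several tables that share the same set of columns."""
    combined = []
    expected_columns = None
    for table in tables:
        for row in table:
            if expected_columns is None:
                expected_columns = set(row)
            elif set(row) != expected_columns:
                raise ValueError(f"Column mismatch: {sorted(row)}")
            combined.append(row)
    return combined


sales_2020 = [{"OrderDate": "2020-06-30", "ProductKey": 214, "OrderQuantity": 1}]
sales_2021 = [{"OrderDate": "2021-01-15", "ProductKey": 310, "OrderQuantity": 2}]

all_sales = append_tables(sales_2020, sales_2021)
print(len(all_sales))  # 2
```

The Folder option described above amounts to calling the same stacking step on every file found in a folder each time the query refreshes.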
PRO TIP: APPENDING FILES FROM A FOLDER
DATA SOURCE SETTINGS

Data Source Settings allow you to manage existing data connections, file paths
and permissions

HEY THIS IS IMPORTANT!
Connections to local files reference the exact file path, so if the file name or location changes you will need to update your data source settings
PRO TIP: DATA SOURCE PARAMETERS
Use parameters to dynamically manage and update connection paths in the
Power Query editor

• Parameter name (Name of the query/table)
• Parameter type (Any value, text, date, etc.)
• Parameter value (Any value, list, query)
• Parameter type (Default & current)
• Update Server & Database connection text values with parameters
REFRESHING QUERIES

By default, all queries will refresh when you use the Refresh command from the Home tab

From the Query Editor, uncheck Include in report refresh to exclude individual queries from the refresh

PRO TIP: Exclude queries from refresh that don’t change often (like lookups or static data tables)
PRO TIP: IMPORTING EXCEL MODELS

Already have a fully-built model in Excel?

You can import models built in Excel directly into Power BI Desktop using: Import > Power Query, Power Pivot, Power View

Imported models retain the following:
• Data source connections and queries
• Query editing procedures and applied steps
• Table relationships, hierarchies, field settings, etc.
• All calculated columns and DAX measures

PRO TIP: If you are more comfortable working in Excel, build your models there first then import to Power BI!

*In older versions of Power BI, this import option may be called “Excel Workbook Contents”
POWER QUERY BEST PRACTICES

Get organized before connecting and loading data
• Define clear and intuitive table/query names from the start, and establish an organized file/folder structure if you are working with local flat files to avoid changes to file names or paths

Disable report refresh for any static data sources
• There’s no need to constantly refresh data sources that don’t change, like lookups or static data tables

When working with large tables, only load the data you need
• Don’t include hourly data when you only need daily, or transaction-level data when you only need a product-level summary (extra data will only slow your report down!)
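
To make the "only load the data you need" point concrete, here is a minimal sketch that rolls hourly rows up to daily totals before they would be loaded. The rows and field layout are invented for illustration; in Power BI this kind of summarization would typically be done at the source or in Power Query.

```python
from collections import defaultdict

# Hypothetical hourly rows: (timestamp, product_id, quantity)
hourly_rows = [
    ("2021-01-01 09:00", 101, 2),
    ("2021-01-01 14:00", 101, 1),
    ("2021-01-01 16:00", 102, 4),
    ("2021-01-02 10:00", 101, 3),
]

daily_totals = defaultdict(int)
for timestamp, product_id, quantity in hourly_rows:
    day = timestamp[:10]  # keep only the YYYY-MM-DD part
    daily_totals[(day, product_id)] += quantity

# One row per day and product instead of one row per hour
daily_rows = [(day, pid, qty) for (day, pid), qty in sorted(daily_totals.items())]
print(daily_rows)
```

Four hourly rows collapse to three daily rows here; on real transaction data the reduction is usually far larger.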
CREATING A DATA MODEL


In this section we’ll cover foundational data modeling topics like normalization, fact and dimension tables, primary and foreign keys, relationship cardinality and filter flow

TOPICS WE’LL COVER:
• Data Modeling 101
• Normalization
• Facts & Dimensions
• Primary & Foreign Keys
• Cardinality
• Filter Flow
• Common Schemas
• Hierarchies

GOALS FOR THIS SECTION:
• Understand the basic principles of data modeling, including normalization, fact & dimension tables and common schemas
• Create table relationships using primary and foreign keys, and discuss different types of relationship cardinality
• Configure report filters and trace filter context as it flows between related tables in the model
• Explore data modeling options like hierarchies, data categories and hidden fields
WHAT IS A DATA MODEL?

This IS NOT a data model
• This is a collection of independent tables, which share no connections or relationships
• If you tried to visualize Orders and Returns by Product, this is what you’d get
WHAT IS A DATA MODEL?

This IS a data model!
• The tables are connected via relationships, based on a common field (Product Key)
• Now Sales and Returns data can be filtered using fields from the Product Lookup table!
DATABASE NORMALIZATION

Normalization is the process of organizing the tables and columns in a relational database to reduce redundancy and preserve data integrity. It’s commonly used to:
• Eliminate redundant data to decrease table sizes and improve processing speed & efficiency
• Minimize errors and anomalies from data modifications (inserting, updating or deleting records)
• Simplify queries and structure the database for meaningful analysis

In a normalized database, each table should serve a distinct and specific purpose (e.g. product information, transaction records, customer attributes or store details)

Models that aren’t normalized contain redundant, duplicate data. In this case, all of the product-specific fields could be stored in a separate table containing a unique record for each product id

This may not seem critical now, but minor inefficiencies can become major problems at scale!
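
The idea of moving product-specific fields into their own table can be sketched in a few lines. This is a plain Python illustration with made-up rows and column names, not something Power BI runs: it splits one flat table into a product lookup (one record per product id) and a slimmer fact table.

```python
# Hypothetical flat table: product attributes repeat on every sales row
flat_sales = [
    {"date": "2021-01-01", "product_id": 1, "brand": "Acme", "price": 2.5, "quantity": 3},
    {"date": "2021-01-02", "product_id": 1, "brand": "Acme", "price": 2.5, "quantity": 1},
    {"date": "2021-01-02", "product_id": 2, "brand": "Zest", "price": 4.0, "quantity": 2},
]

product_fields = ("brand", "price")  # columns that describe the product, not the sale

product_lookup = {}  # one record per product_id
sales_fact = []      # one record per sale, without the repeated attributes
for row in flat_sales:
    attributes = {field: row[field] for field in product_fields}
    # setdefault keeps the first record seen for each product_id
    stored = product_lookup.setdefault(row["product_id"], attributes)
    if stored != attributes:
        raise ValueError(f"conflicting attributes for product {row['product_id']}")
    sales_fact.append({k: v for k, v in row.items() if k not in product_fields})

print(product_lookup)
print(sales_fact)
```

The brand and price are now stored once per product rather than once per sale, and the fact table keeps only the date, the product_id key and the quantity.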
FACT & DIMENSION TABLES
Data models generally contain two types of tables: fact (“data”) tables, and
dimension (“lookup”) tables:
• Fact tables contain numerical values or metrics used for summarization (sales, orders, transactions,
pageviews, etc.)
• Dimension tables contain descriptive attributes used for filtering or grouping (products, customers,
dates, stores, etc.)

This Calendar Lookup table contains attributes about each date (month,
year, quarter, etc.)

This Product Lookup table contains attributes about each product_id (brand,
SKU, price, etc.)

This Fact table contains quantity values, along with date and product_id fields
PRIMARY & FOREIGN KEYS

[Figure: Fact Table, Calendar Lookup and Product Lookup]

• Foreign keys (FK) contain multiple instances of each value, and relate to primary keys in dimension tables
• Primary keys (PK) uniquely identify each row of the table, and relate to foreign keys in fact tables
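
A quick way to sanity-check a key column before relating two tables is sketched below. The tables and the helper names (`is_primary_key`, `orphaned_keys`) are hypothetical; the checks simply restate the definitions above in plain Python.

```python
# Hypothetical tables; the column names are illustrative only
product_lookup = [
    {"product_id": 1, "brand": "Acme"},
    {"product_id": 2, "brand": "Zest"},
]
sales = [
    {"date": "2021-01-01", "product_id": 1, "quantity": 3},
    {"date": "2021-01-02", "product_id": 1, "quantity": 1},
    {"date": "2021-01-02", "product_id": 2, "quantity": 2},
]


def is_primary_key(rows, column):
    """A primary key holds exactly one row per value."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def orphaned_keys(fact_rows, dim_rows, column):
    """Foreign key values in the fact table with no match in the dimension table."""
    return {row[column] for row in fact_rows} - {row[column] for row in dim_rows}


print(is_primary_key(product_lookup, "product_id"))  # True: one row per product
print(is_primary_key(sales, "product_id"))           # False: product 1 sold twice
print(orphaned_keys(sales, product_lookup, "product_id"))  # set(): every sale has a product
```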
RELATIONSHIPS VS. MERGED TABLES
Can’t I just merge queries or use lookup functions to pull everything into one single table?
- Anonymous confused man

[Figure: merged table combining the original Fact Table fields with attributes from the Calendar Lookup table and attributes from the Product Lookup table]

You can, but it’s extremely inefficient!
• Merging tables creates redundancy and often requires significantly more memory and processing power to analyze compared to a relational model with multiple small tables
THE MODEL VIEW

[Figure: Model view, with the following areas labeled]
• Menu Ribbon (Home, Help)
• Data / Field List
• Model canvas
• Model layout tabs
• Properties pane (Name, synonym, format, etc.)
• View Options (Zoom, Reset Layout, Fit to Page)
CREATING TABLE RELATIONSHIPS

OPTION 1: Click and drag to connect primary and foreign keys within the Model view
OPTION 2: Add or detect relationships using the Manage Relationships dialog box

MANAGING & EDITING RELATIONSHIPS

Launch the Manage Relationships dialog box or double-click a relationship to modify it

Editing tools allow you to activate or deactivate relationships and manage cardinality and filter direction (more on that soon!)
STAR & SNOWFLAKE SCHEMAS

A star schema is the simplest and most common type of data model, characterized by a single fact table surrounded by related dimension tables

A snowflake schema is an extension of a star, and includes relationships between dimension tables and related sub-dimension tables
ASSIGNMENT: TABLE RELATIONSHIPS
NEW MESSAGE
From: Dana Modelle (Analyst)
Subject: Need a favor…

Hey there,
Ethan shared the data model you’ve been working on, and we might have an issue…
Last night I left my laptop open, and my cat Dennis somehow got his paws on our model. Now all the relationships are gone!
Could you please rebuild the model, including all three product tables? I owe you one!
-Dana

Key Objectives
1. Delete all existing table relationships
2. Create a star schema by creating relationships between the Sales, Calendar, Customer, Product and Territories tables
3. Connect all three product tables (Product, Subcategory, Category) in a snowflake schema
4. Use the matrix visual to confirm that you can filter Order Quantity values using fields from each dimension table
SOLUTION: TABLE RELATIONSHIPS
Solution Preview
PRO TIP: ACTIVE & INACTIVE RELATIONSHIPS

The Sales Data table contains two date fields (Order Date & Stock
Date), but there can only be one active relationship to the Date key
in the Calendar table

You can set relationships to active or inactive from either the Edit Relationships dialog box or the Properties pane (you must deactivate one before activating another)
RELATIONSHIP CARDINALITY

Cardinality refers to the uniqueness of values in a column
• Ideally, all relationships in the data model should follow a one-to-many cardinality: one instance of each primary key, and many instances of each foreign key

In this example there is only ONE instance of each Product Key in the Product table (noted by a “1”), since each row contains attributes of a single product (name, SKU, description, price, etc.)

There are MANY instances of each Product Key in the Sales table (noted by an asterisk *), since there are multiple sales for each product
EXAMPLE: ONE-TO-ONE CARDINALITY
[Figure: Product Lookup and Price Lookup tables]

• Connecting the two tables above using product_id creates a one-to-one relationship, since each product ID only appears once in each table
• This isn’t necessarily a “bad” relationship, but you can simplify the model by merging the tables into a single, valid dimension table

NOTE: this still respects the rules of normalization, since all rows are unique and capture product-specific attributes
EXAMPLE: MANY-TO-MANY CARDINALITY
[Figure: Product Lookup and Sales tables]

• If we try to connect the tables above using product_id, we’ll get a many-to-many relationship warning since there are multiple instances of product_id in both tables
• Even if we force this relationship, how would we know which product was actually sold on each date: Cream Soda or Diet Cream Soda?
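
The cardinality types discussed in the last few slides come down to whether the key is unique on each side of the relationship. The sketch below is a simplified stand-in for that check, not Power BI's actual detection logic; the `cardinality` function and the sample key lists are assumptions made for illustration.

```python
def cardinality(left_keys, right_keys):
    """Classify a relationship by checking key uniqueness on each side."""
    left_unique = len(set(left_keys)) == len(left_keys)
    right_unique = len(set(right_keys)) == len(right_keys)
    if left_unique and right_unique:
        return "one-to-one"
    if left_unique:
        return "one-to-many"
    if right_unique:
        return "many-to-one"
    return "many-to-many"


# Hypothetical key columns
product_keys = [1, 2, 3]             # Product Lookup: one row per product
price_keys = [1, 2, 3]               # Price Lookup: one row per product
sales_keys = [1, 1, 2, 3, 3]         # Sales: many rows per product
duplicated_product_keys = [1, 1, 2]  # a lookup where two rows share one id

print(cardinality(product_keys, sales_keys))
print(cardinality(product_keys, price_keys))
print(cardinality(duplicated_product_keys, sales_keys))
```

The last call shows why a duplicated id in the lookup table is a problem: neither side identifies a single row, so the relationship becomes many-to-many.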
CONNECTING MULTIPLE FACT TABLES

This model contains two fact tables: Sales Data and Returns Data
• Since there is no primary/foreign key relationship, we can’t connect them directly to each other
• But we can connect each fact table to related lookups, which allows us to filter both sales and returns data using fields from any shared lookup tables
• We can view orders and returns by product since both tables relate to Product Lookup, but we can’t view returns by customer since no relationship exists

HEY THIS IS IMPORTANT! Generally speaking, fact tables should connect through shared dimension tables, not directly to each other
FILTER CONTEXT & FLOW

Here we have two data tables (Sales Data and Returns Data), connected to Territory Lookup

The arrows show the filter direction, and point from the one (1) side of the relationship to the many (*) side
• When you filter a table, that filter context is passed to any related “downstream” tables, following the arrow’s direction
• Filter context CANNOT flow “upstream”

PRO TIP: Arrange lookup tables above fact tables in your model as a visual reminder that filters always flow downstream
EXAMPLE: FILTER FLOW
In this model, the only way to filter both Sales and Returns data by Territory is to use the Territory Key from the lookup table, which is upstream and related to both fact tables
• Filtering using Territory Key from the Sales table yields incorrect Returns values, since the filter context can’t flow to any other table
• Filtering using Territory Key from the Returns table yields incorrect Sales values, and is limited to territories that exist in the returns table

[Figure: three matrices, filtering by Territory Lookup[Territory Key], by Sales Data[Territory Key] and by Returns Data[Territory Key]]
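
The difference between filtering from the lookup table and filtering from a fact table can be imitated with a small simulation. This is a deliberately simplified model of one-way filter flow, with invented quantities; it is not how the Power BI engine is implemented.

```python
# Hypothetical tables: the lookup holds one row per territory
territory_lookup = [1, 2, 3]
sales = [
    {"territory_key": 1, "quantity": 10},
    {"territory_key": 2, "quantity": 7},
    {"territory_key": 3, "quantity": 5},
]
returns = [{"territory_key": 1, "quantity": 2}]


def total(rows, key=None):
    """Sum quantity, optionally restricted to one territory key."""
    return sum(r["quantity"] for r in rows if key is None or r["territory_key"] == key)


# Filtering by the lookup key: the filter flows downstream to BOTH fact tables
by_lookup = {k: (total(sales, k), total(returns, k)) for k in territory_lookup}

# Filtering by the Sales key: only Sales is filtered, so Returns shows its
# unfiltered grand total on every row
sales_keys = sorted({r["territory_key"] for r in sales})
by_sales_key = {k: (total(sales, k), total(returns)) for k in sales_keys}

print(by_lookup)
print(by_sales_key)
```

In the second result every territory reports the same returns figure, which is the "incorrect Returns values" symptom described above.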
BI-DIRECTIONAL FILTERS

Updating the cross-filter direction from Single to Both allows filter context to flow in either direction
• In this example, filters applied to the Sales table can pass up to the Territory Lookup table, then down to Returns

*The “Apply security filter in both directions” option relates to row-level security (RLS) settings, which are out of scope for this course
EXAMPLE: BI-DIRECTIONAL FILTERS

With two-way cross-filtering enabled between Sales and Territory, we now see correct values using Territory Key from either table
• Filter context can now pass up to the Territory Lookup table, then downstream to Returns
• However, we still see incorrect values when filtering using Territory Key from the Returns table, since the filter context is isolated to that single table

[Figure: three matrices, filtering by Territory Lookup[Territory Key], by Sales Data[Territory Key] and by Returns Data[Territory Key]]
EXAMPLE: BI-DIRECTIONAL FILTERS

In this case, we’ve enabled two-way cross-filtering between the Returns and Territory tables
• As expected, we now see incorrect values when filtering using Territory Key from the Sales table, since the filter context is isolated to that single table
• While the values appear to be correct when filtering using Territory Key from the Returns table, we’re missing sales data from any territories that didn’t appear in the returns table (specifically Territories 2 & 3)

Territories 2 & 3 don’t exist in the Returns table, so they aren’t included in the filter context that passes to Territory Lookup and Sales

[Figure: three matrices, filtering by Territory Lookup[Territory Key], by Sales Data[Territory Key] and by Returns Data[Territory Key]]
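
The missing-territories effect can be reproduced with a toy example. The sketch assumes a two-way filter between Returns and Territory Lookup, so rows exist only for keys that appear in Returns; the quantities are invented and the logic is a simplification of real filter propagation.

```python
# Hypothetical data: returns were only recorded for territories 1 and 4
territory_lookup = [1, 2, 3, 4]
sales = [(1, 10), (2, 7), (3, 5), (4, 8)]  # (territory_key, quantity)
returns = [(1, 2), (4, 1)]


def total(rows, key):
    """Sum quantity for one territory key."""
    return sum(quantity for k, quantity in rows if k == key)


# Filtering by Returns[Territory Key]: only keys present in Returns produce a
# row, and each key passes up to the lookup and back down to Sales
returns_keys = sorted({key for key, _ in returns})
by_returns_key = {k: (total(sales, k), total(returns, k)) for k in returns_keys}

missing = sorted(set(territory_lookup) - set(returns_keys))
visible_sales = sum(sales_qty for sales_qty, _ in by_returns_key.values())
all_sales = sum(quantity for _, quantity in sales)

print(by_returns_key)
print(missing, visible_sales, all_sales)
```

Each visible row is correct, but the sales for territories 2 and 3 never appear, so the visible total understates actual sales.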
AMBIGUITY

Use two-way filters carefully, and only when necessary
• Using multiple two-way filters can cause ambiguity by introducing multiple filter paths between tables

In this example, filter context from the Product table can pass down to Returns and up to Territory Lookup, which would be filtered based on the Territory Keys passed from the Returns table

With an active relationship between Product and Sales as well, filter context could pass through either the Sales or Returns table to reach the Territory Lookup table, which could yield conflicting filter context

PRO TIP: Design your models with one-way filters and 1:many cardinality unless more complex relationships are absolutely necessary
HIDING FIELDS

Hide in Report View makes fields inaccessible from the Report tab, but still available in Data and Model views
• This can be controlled by right-clicking a field in the Data or Model view, or by selecting “Is hidden” in the Properties pane
• This is commonly used to prevent users from filtering using invalid fields, reduce clutter, or to hide irrelevant metrics from view

PRO TIP: Hide the foreign keys in fact tables to force users to filter using primary keys in dimension tables
ASSIGNMENT: FILTER FLOW
NEW MESSAGE
From: Dana Modelle (Analyst)
Subject: Larry’s gone rogue!

Hey there, we’ve got another problem.
Larry from Sales just sent me this screenshot. I think he must have downloaded our Power BI model and messed with some relationships, because I KNOW we had sales for product 338.
Can you help diagnose what’s going on, and prevent him from doing this again?
-Dana
P.S. Kevin says hi

Key Objectives
1. Replicate Larry’s matrix below to diagnose what he must have done to the model*
• Which product is #338?
• Why didn’t Larry’s matrix show any orders?
2. Hide any remaining foreign keys to prevent other users from making the same mistake

*Hint: you may need to temporarily change a relationship to bi-directional
SOLUTION: FILTER FLOW

Solution Preview
1. Larry must have changed the relationship between Returns Data and Product Lookup to bi-directional, and filtered his matrix using product_id from the Returns table
• Product 338 is a road bike (Road-650 Black, 44)
• Product 338 doesn’t exist in the Returns table, so it was excluded when that filter context passed to the Sales table

PRO TIP: MODEL LAYOUTS

Model layouts allow you to create custom views to show specific portions of
large, complex models
• Here we’ve created a Sales View displaying only tables related to sales, and a Returns View
displaying only tables related to returns (Note: this doesn’t actually create duplicate tables)
DATA FORMATS & CATEGORIES

Customize data formats from the Column tools menu in the Data view or the Properties pane in the Model view

Assign data categories for geospatial fields, URLs or barcodes
• This is commonly used to help Power BI map location-based fields like addresses, countries, cities, coordinates, zip codes, etc.
HIERARCHIES

Hierarchies are groups of columns that reflect multiple levels of granularity
• For example, a Geography hierarchy might include Country, State and City fields
• Hierarchies are treated as a single item in tables and reports, allowing users to “drill up” and “drill down” through each level

1. In the Data pane, right-click a field and select Create hierarchy
2. This hierarchy contains “Continent”, and is named “Territory Hierarchy”
3. Right-click another field (like “Country”) and select Add to Hierarchy (or drag it in!)
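
Drilling up and down a hierarchy amounts to grouping the same rows at a coarser or finer level. The sketch below imitates that with the Geography example (Country, State, City); the rows, the sales values and the `totals_at` helper are hypothetical.

```python
from collections import defaultdict

# Levels of a hypothetical Geography hierarchy, from coarsest to finest
hierarchy = ("country", "state", "city")

rows = [
    {"country": "USA", "state": "Oregon", "city": "Portland", "sales": 5},
    {"country": "USA", "state": "Oregon", "city": "Salem", "sales": 3},
    {"country": "USA", "state": "Texas", "city": "Austin", "sales": 4},
    {"country": "Canada", "state": "Ontario", "city": "Toronto", "sales": 6},
]


def totals_at(depth):
    """Total sales grouped by the first `depth` levels of the hierarchy."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[level] for level in hierarchy[:depth])
        totals[key] += row["sales"]
    return dict(totals)


print(totals_at(1))  # drilled up to country
print(totals_at(2))  # drilled down to state
print(totals_at(3))  # drilled down to city
```

Each deeper level splits the totals of the level above it, so the grand total stays the same at every depth.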
ASSIGNMENT: HIERARCHIES

NEW MESSAGE
From: Dana Modelle (Analyst)
Subject: Adding a date hierarchy

Good morning!
Hoping you can help with a quick request.
Since we’ll be doing a lot of time-series analysis, Ethan asked us to add a date hierarchy to the model so that users can quickly view trends at any level of granularity (year, month, day, etc.)
Please get that added before our afternoon call. Thanks!
-Dana

Key Objectives
1. Create a new hierarchy based on the Start of Year field, and name it “Date Hierarchy”
2. Right-click or drag to add fields until your hierarchy contains the following (in this order):
• Start of Year
• Start of Month
• Start of Week
• Date
3. Add your new hierarchy to the matrix visual (on rows) and practice drilling up and down between each level of granularity
SOLUTION: HIERARCHIES
Solution Preview
DATA MODEL BEST PRACTICES

Focus on building a normalized model from the start
• Leverage relationships and make sure that each table serves a clear, distinct purpose

Organize dimension tables above data tables in your model
• This serves as a visual reminder that filters always flow “downstream”

Avoid complex relationships unless absolutely necessary
• Aim to use 1-to-many table relationships and one-way filters whenever possible