Topic 3_ 2023_Data Visualization_std_1

The document provides an introduction to business analytics, focusing on descriptive statistics and data visualization techniques. It covers concepts such as percentiles, quartiles, z-scores, and the empirical rule, as well as methods for identifying outliers and creating effective data visualizations. Additionally, it discusses the principles of effective data dashboards and various chart types for presenting data.

Uploaded by

Hai Yen

Introduction to Business Analytics

Descriptive Statistics (cont.)

Descriptive statistics provide summarizing information on the characteristics and distribution of values.
♥ They allow analysts to have a quick glance at the central tendency and the degree of dispersion of values.
♥ They help describe data in a meaningful way such that patterns might emerge from the data.

6. Analyzing Distributions

Percentiles:
A percentile is the value of a variable at which a specified (approximate) percentage of observations are below that value.
The pth percentile tells us the point in the data where:
◦ Approximately p percent of the observations have values less than the pth percentile.
◦ Approximately (100 − p) percent of the observations have values greater than the pth percentile.

Assoc.Prof. Nguyen Vinh 1



6. Analyzing Distributions

Percentiles:
Illustration:
To determine the 85th percentile for the home sales data:
1. Arrange the data in ascending order:
108,000 138,000 138,000 142,000 186,000 199,500
208,000 254,000 254,000 257,500 298,000 456,250
2. Compute the location of the 85th percentile, with p = 85 and n = 12:
$L_{85} = \frac{p}{100}(n + 1) = \frac{85}{100}(12 + 1) = 11.05$
3. The interpretation: the 85th percentile is 5% of the way between the value in position 11 (298,000) and the value in position 12 (456,250), that is, 298,000 + 0.05(456,250 − 298,000) = 305,912.50.
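
The three steps of the percentile illustration can be sketched in Python. This is a minimal sketch, not a reference implementation: the function name `percentile` and the clamping of locations that fall outside positions 1 to n are assumptions made here, since the slides do not cover that case.

```python
def percentile(values, p):
    """Return the pth percentile using the location L_p = (p / 100) * (n + 1)."""
    data = sorted(values)            # step 1: arrange in ascending order
    n = len(data)
    location = p * (n + 1) / 100     # step 2: 1-based location, e.g. 11.05
    lower = int(location)            # position of the value just below L_p
    fraction = location - lower      # how far to move toward the next value
    # Assumption: clamp to the smallest/largest value when L_p falls
    # outside positions 1..n (not covered in the slides).
    if lower < 1:
        return data[0]
    if lower >= n:
        return data[-1]
    # step 3: interpolate between positions `lower` and `lower + 1`
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]
p85 = percentile(home_sales, 85)
print(f"85th percentile: {p85:,.2f}")
```

For the home sales data the location is 11.05, so the result lies 5% of the way from 298,000 to 456,250, roughly 305,912.50.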

6. Analyzing Distributions

Quartiles:
When the data is divided into four equal parts:
◦ Each part contains approximately 25% of the observations.
◦ Division points are referred to as quartiles.

Q1 = first quartile, or 25th percentile.
Q2 = second quartile, or 50th percentile (also the median).
Q3 = third quartile, or 75th percentile.

The difference between the third and first quartiles is often referred to as the interquartile range, or IQR.
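
As a hedged illustration, the quartiles and IQR of the home sales data can be computed with Python's standard library. The choice of `statistics.quantiles` with `method="exclusive"` is an assumption made here because it places the quartiles at locations k(n + 1)/4, which agrees with the (p/100)(n + 1) percentile location; other tools may use different conventions and give slightly different values.

```python
from statistics import quantiles

home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

# method="exclusive" puts the kth quartile at location k * (n + 1) / 4
# and interpolates linearly between neighbouring sorted values.
q1, q2, q3 = quantiles(home_sales, n=4, method="exclusive")
iqr = q3 - q1  # interquartile range: third quartile minus first quartile

print(f"Q1 = {q1:,.0f}, Q2 (median) = {q2:,.0f}, Q3 = {q3:,.0f}, IQR = {iqr:,.0f}")
```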


6. Analyzing Distributions

z-Scores:
The z-score measures the relative location of a value in the data set.
It helps to determine how far a particular value is from the mean relative to the data set’s standard deviation.
It is often called the standardized value.
If $x_1, x_2, \ldots, x_n$ is a sample of n observations, the z-score of the observation $x_i$ is
$z_i = \frac{x_i - \bar{x}}{s}$
where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.

Figure: Calculating z-Scores for the Home Sales Data in Excel
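
A minimal sketch of the z-score calculation for the home sales data, written in Python rather than Excel. It assumes the sample standard deviation (divisor n − 1), which is what `statistics.stdev` computes; the variable names are illustrative only.

```python
from statistics import mean, stdev

home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

x_bar = mean(home_sales)   # sample mean
s = stdev(home_sales)      # sample standard deviation (divides by n - 1)

# z-score: distance from the mean, measured in standard deviations
z_scores = [(x - x_bar) / s for x in home_sales]

for price, z in zip(home_sales, z_scores):
    print(f"{price:>9,}  z = {z:+.2f}")
```

A positive z-score means the value lies above the mean and a negative one means it lies below; the z-scores of a sample always sum to zero.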

6. Analyzing Distributions

Empirical Rule:
◦ When the data exhibit a symmetric bell-shaped distribution, the empirical rule can be used to determine the percentage of data values that are within a specified number of standard deviations of the mean.
◦ For data having a bell-shaped distribution:
  ◦ Approximately 68% of the data values will be within 1 standard deviation of the mean.
  ◦ Approximately 95% of the data values will be within 2 standard deviations of the mean.
  ◦ Almost all the data values will be within 3 standard deviations of the mean.

Figure: A Symmetric Bell-Shaped Distribution
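
The percentages in the empirical rule can be checked against an exactly normal distribution. The sketch below assumes normality (real bell-shaped data only approximate it) and uses the identity P(|Z| < k) = erf(k / √2); the helper name `share_within` is made up for this example.

```python
import math


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The three shares come out close to 68%, 95% and 99.7%, which is why the rule says "almost all" values lie within 3 standard deviations.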


6. Analyzing Distributions

Identifying Outliers:
◦ Outliers: Extreme values in a data set.
◦ They can be identified using standardized values (z-scores).
◦ Any data value with a z-score less than –3 or greater than +3 is an outlier.
◦ Such data values can then be reviewed to determine their accuracy and whether they belong in the data set.

Box Plots:
A box plot is a graphical summary of the distribution of data.
It is developed from the quartiles for a data set.

Figure: Box Plot for the Home Sales Data
Figure: Box Plot Created in Excel for Home Sales Data
Figure: Box Plots for Multiple Variables Created in Excel
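
The z-score rule for outliers can be sketched as a small filter. The function name `z_score_outliers` and its `threshold` parameter are assumptions introduced for illustration; the default of 3 follows the rule stated above, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev


def z_score_outliers(values, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation
    return [x for x in values if abs((x - x_bar) / s) > threshold]


home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

outliers = z_score_outliers(home_sales)
print(outliers)
```

With the ±3 rule no home sale is flagged: the largest price, 456,250, has a z-score of about 2.49. Flagged values would still need to be reviewed rather than dropped automatically.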


Data Visualization

Data visualization: the graphical representation of information and data (tables, charts, graphs).

Uses of data visualization:
◦ Helpful for identifying data errors.
◦ Reduces the size of your data set by highlighting important relationships and trends in the data.

Diagram labels: Data, Descriptive Statistics, Data Visualization, Regression Model, Optimization Model

Contents
Effective Design Techniques
Tables
Charts
Data Dashboards

1. Effective Design Techniques

◦ Data-ink ratio: measures the proportion of what Tufte terms “data-ink” to the total amount of ink used in a table or chart.
◦ Edward R. Tufte first described the data-ink ratio.
◦ Helpful for creating effective tables and charts for data visualization:
  ◦ Data-ink: Ink used in a table or chart that is necessary to convey the meaning of the data to the audience.
  ◦ Non-data-ink: Ink used in a table or chart that serves no useful purpose in conveying the data to the audience.

Scarf Sales by Day
Day  Sales    Day  Sales
  1    150     11    170
  2    170     12    160
  3    140     13    290
  4    150     14    200
  5    180     15    210
  6    180     16    110
  7    210     17     90
  8    230     18    140
  9    140     19    150
 10    200     20    230


2. Tables

◦ Tables should be used when:
1. The reader needs to refer to specific numerical values.
2. The reader needs to make precise comparisons between different values and not just relative comparisons.
3. The values being displayed have different units or very different magnitudes.

Figure: Combined Line Chart and Table for Monthly Costs and Revenues at Gossamer Industries


2. Tables
◦ Table Design Principles:
◦ Avoid using vertical lines in a table unless they are necessary for clarity.
◦ Horizontal lines are generally necessary only for separating column titles from data values or when indicating that a calculation has taken place.
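
As one possible illustration of these design principles, the sketch below renders a plain-text table with no vertical lines and a single horizontal rule under the column titles. The helper `render_table` is hypothetical, and the rows are the first five days of the Scarf Sales by Day data.

```python
def render_table(headers, rows):
    """Render a right-aligned text table with one rule under the header only."""
    columns = list(zip(headers, *rows))
    widths = [max(len(str(cell)) for cell in column) for column in columns]

    def fmt(cells):
        # Two spaces separate columns; no vertical lines are drawn.
        return "  ".join(str(cell).rjust(width) for cell, width in zip(cells, widths))

    header_line = fmt(headers)
    lines = [header_line, "-" * len(header_line)]  # the only horizontal line
    lines += [fmt(row) for row in rows]
    return "\n".join(lines)


scarf_sales = [(1, 150), (2, 170), (3, 140), (4, 150), (5, 180)]
table_text = render_table(("Day", "Sales"), scarf_sales)
print(table_text)
```

White space alone separates the columns, which keeps the non-data-ink to the one rule that marks off the column titles.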

2. Tables
◦ Crosstabulation: A useful type of table for describing data of two variables.
◦ PivotTable: A crosstabulation in Microsoft Excel.

Crosstabulation of Quality Rating and Meal Price for 300 Los Angeles Restaurants

                          Meal Price
Quality Rating  $10–19  $20–29  $30–39  $40–49  Total
Good                42      40       2       0     84
Very Good           34      64      46       6    150
Excellent            2      14      28      22     66
Total               78     118      76      28    300

The greatest number of restaurants in the sample (64) have a very good rating and a meal price in the $20–29 range.
Only two restaurants have an excellent rating and a meal price in the $10–19 range.
The right and bottom margins of the crosstabulation give the frequencies of quality rating and meal price separately.
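
A crosstabulation is a count of records for each pair of categories. The sketch below is an illustration only: the 300 individual restaurant records are not reproduced in these slides, so they are rebuilt here from the published cell counts, and then counted again with `collections.Counter` to recover the cells and both margins.

```python
from collections import Counter

# Cell counts taken from the crosstabulation (rating, meal price) -> frequency.
cell_counts = {
    ("Good", "$10–19"): 42, ("Good", "$20–29"): 40,
    ("Good", "$30–39"): 2, ("Good", "$40–49"): 0,
    ("Very Good", "$10–19"): 34, ("Very Good", "$20–29"): 64,
    ("Very Good", "$30–39"): 46, ("Very Good", "$40–49"): 6,
    ("Excellent", "$10–19"): 2, ("Excellent", "$20–29"): 14,
    ("Excellent", "$30–39"): 28, ("Excellent", "$40–49"): 22,
}

# Assumption: expand the counts into one (rating, price) record per restaurant,
# standing in for the raw data that a PivotTable would start from.
records = [pair for pair, count in cell_counts.items() for _ in range(count)]

crosstab = Counter(records)                               # cell frequencies
rating_totals = Counter(rating for rating, _ in records)  # right margin
price_totals = Counter(price for _, price in records)     # bottom margin

top_cell, top_count = crosstab.most_common(1)[0]
print(top_cell, top_count)
print(dict(rating_totals))
print(dict(price_totals))
```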


Tables (Practice PivotTable)

Figure: PivotTable Report for the Restaurant Data with Average Wait Times Added

3. Charts
◦ Charts (or graphs): Visual methods of displaying data.
◦ Scatter chart: Graphical presentation of the relationship between two quantitative variables.

3. Charts
◦ Scatter chart:

Sample Data for the San Francisco Electronics Store

Week  No. of Commercials (x)  Sales ($100s) (y)
   1                       2                 50
   2                       5                 57
   3                       1                 41
   4                       3                 54
   5                       4                 54
   6                       1                 38
   7                       5                 63
   8                       3                 48
   9                       4                 59
  10                       2                 46

• Copy the data in the file Electronics to a new Excel worksheet in columns A through C and rows 1 through 11.
1: Select cells B2:C11
2: Click the Insert tab in the Ribbon
3: Click the Insert Scatter (X,Y) or Bubble Chart button in the Charts group
4: When the list of scatter chart subtypes appears, click the Scatter button
5: Click the Design tab under the Chart Tools Ribbon
6: Click Add Chart Element in the Chart Layouts group
   Select Chart Title, and click Above Chart
   Click on the text box above the chart, and replace the text with Scatter Chart for the San Francisco Electronics Store


3. Charts
◦ Scatter chart:
7: Click Add Chart Element in the Chart Layouts group
   Select Axis Title, and click Primary Horizontal
   Click on the text box under the horizontal axis, and replace “Axis Title” with Number of Commercials
8: Click Add Chart Element in the Chart Layouts group
   Select Axis Title, and click Primary Vertical
   Click on the text box next to the vertical axis, and replace “Axis Title” with Sales ($100s)
9: Right-click on one of the horizontal grid lines in the body of the chart, and click Delete
10: Right-click on one of the vertical grid lines in the body of the chart, and click Delete

Figure: Scatter Chart for the San Francisco Electronics Store
Trendline: provides an approximation of the relationship between the variables.
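
To show what a trendline approximates, the sketch below fits a straight line to the San Francisco Electronics Store data. It assumes a linear trendline fitted by ordinary least squares; the slides do not say which trendline type is used, and `linear_trendline` is a name chosen for this example.

```python
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]     # x: number of commercials
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]  # y: sales in $100s


def linear_trendline(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = linear_trendline(commercials, sales)
print(f"sales ≈ {intercept:.2f} + {slope:.2f} × commercials")
```

Under these assumptions each additional commercial is associated with about 4.95 more units of sales in $100s, starting from a baseline of about 36.15.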

3. Charts
◦ Line chart:
Figure: Scatter Chart and Line Chart for Monthly Sales Data
◦ Sparkline: Special type of line chart:
- Minimalist type of line chart that can be placed directly into a cell in Excel.
- Contains no axes; it displays only the line for the data.
- Takes up very little space and can be effectively used to provide information on overall trends for time series data.
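
The idea of a sparkline can be imitated in plain text with Unicode block characters. This is only an analogy to Excel's sparklines, not how Excel draws them; the function `sparkline` is hypothetical, and the series is the Scarf Sales by Day data.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest


def sparkline(values):
    """Map each value to one block character scaled between min and max."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for a constant series
    top = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - lo) / span * top)] for v in values)


scarf_sales = [150, 170, 140, 150, 180, 180, 210, 230, 140, 200,
               170, 160, 290, 200, 210, 110, 90, 140, 150, 230]
spark = sparkline(scarf_sales)
print(spark)
```

Like a sparkline, the output has no axes or labels: it occupies one short line and shows only the overall shape of the series, with the peak on day 13 and the trough on day 17.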


3. Charts
Bar Charts: Use horizontal bars to display the magnitude of the
quantitative variable.
Column Charts: Use vertical bars to display the magnitude of the
quantitative variable.
Bar and column charts are very helpful in making comparisons
between categorical variables.

3. Charts
Pie chart: Common form of chart used to compare categorical data.
Bubble chart: Graphical means of visualizing three variables in a
two-dimensional graph that sometimes is a preferred alternative
to a 3-D graph.
Heat map: A two-dimensional graphical representation of data
that uses different shades of color to indicate magnitude.


3. Charts
PivotCharts in Excel:
PivotChart: To summarize and analyze data with both a
crosstabulation and charting, Excel pairs PivotCharts with
PivotTables.


4. Data Dashboards

◦ Data dashboard: Data-visualization tool that illustrates multiple metrics and automatically updates these metrics as new data become available.
◦ Provide key performance indicators (KPIs):
  ◦ Automobile dashboard: Current speed, fuel level, and oil pressure.
  ◦ Business dashboard: Financial position, inventory on hand, customer service metrics.

4. Data Dashboards

Principles of Effective Data Dashboards
◦ Should provide timely summary information on KPIs that are important to the user.
◦ Should present all KPIs on a single screen that a user can quickly scan to understand the business’s current state of operations.
◦ The KPIs displayed in the data dashboard should convey meaning to its user and be related to the decisions the user makes.
◦ A data dashboard should call attention to unusual measures that may require attention.
◦ Color should be used to call attention to specific values or to differentiate categorical variables, but the use of color should be restrained.

Figure 2.23: Data Dashboard for the Grogan Oil Information Technology Call Center
• The data dashboard, developed to monitor the performance of the call center, combines several displays to track the call center’s KPIs.
