Topic 3 2023 Data Visualization Std

The document discusses descriptive statistics, focusing on summarizing data characteristics and distributions through percentiles, quartiles, z-scores, and the empirical rule. It also covers data visualization techniques, including effective design principles for tables and charts, and the use of data dashboards to monitor key performance indicators. The importance of identifying outliers and utilizing various chart types for data representation is emphasized.

Descriptive Statistics (cont.)
Descriptive statistics provide summary information about the characteristics and distribution of values. They:
♥ give analysts a quick view of the central tendency and the degree of dispersion of values
♥ help describe data in a meaningful way so that patterns can emerge from the data
6. Analyzing Distributions
Percentiles:

A percentile is the value of a variable at which a specified (approximate) percentage of observations are below that value.

The pth percentile tells us the point in the data where:
◦ Approximately p percent of the observations have values less than the pth percentile.
◦ Approximately (100 − p) percent of the observations have values greater than the pth percentile.
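6. Analyzing Distributions
Percentiles:
Illustration:
To determine the 85th percentile for the home sales data:
1. Arrange the data in ascending order:
108,000 138,000 138,000 142,000 186,000 199,500
208,000 254,000 254,000 257,500 298,000 456,250

2. Compute the location L of the pth percentile for the n = 12 observations:
$L = \frac{p}{100}(n + 1) = \frac{85}{100}(12 + 1) = 11.05$

3. The interpretation: L = 11.05 places the 85th percentile 5% of the way between the values in positions 11 and 12:
$298{,}000 + 0.05\,(456{,}250 - 298{,}000) = 305{,}912.50$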
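The location rule above can be sketched in Python. This is a minimal illustration, not the method used in the slides' Excel workbook: the function name `percentile` and the clamping of locations that fall outside the data are assumptions made for this example.

```python
import math


def percentile(values, p):
    """Return the pth percentile using the location L = (p / 100) * (n + 1)."""
    data = sorted(values)
    n = len(data)
    location = p * (n + 1) / 100   # 1-based position; may be fractional
    # Assumption: clamp positions that fall outside the data to the extremes.
    if location <= 1:
        return data[0]
    if location >= n:
        return data[-1]
    lower = math.floor(location)   # position of the value just below L
    fraction = location - lower    # how far to move toward the next value
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

# 85th percentile: 5% of the way from position 11 to position 12
print(round(percentile(home_sales, 85), 2))
```

For the home sales data this interpolates between 298,000 and 456,250, giving a value of about 305,912.50.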

6. Analyzing Distributions
Quartiles:
When the data are divided into four equal parts:
◦ Each part contains approximately 25% of the observations.
◦ Division points are referred to as quartiles.

Q1 = first quartile, or 25th percentile.
Q2 = second quartile, or 50th percentile (also the median).
Q3 = third quartile, or 75th percentile.

The difference between the third and first quartiles is often referred to as
the interquartile range, or IQR.
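As a rough sketch, the quartiles and the IQR of the home sales data can be computed by treating each quartile as a percentile. The helper `percentile` is a hypothetical name, and the sketch assumes the location rule L = (p/100)(n + 1) with linear interpolation; other software may use a different convention and return slightly different quartiles.

```python
import math


def percentile(values, p):
    """pth percentile via the location L = (p / 100) * (n + 1) (assumed rule)."""
    data = sorted(values)
    n = len(data)
    location = p * (n + 1) / 100
    if location <= 1:              # assumption: clamp to the smallest value
        return data[0]
    if location >= n:              # assumption: clamp to the largest value
        return data[-1]
    lower = math.floor(location)
    return data[lower - 1] + (location - lower) * (data[lower] - data[lower - 1])


home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

q1 = percentile(home_sales, 25)    # first quartile
q2 = percentile(home_sales, 50)    # second quartile (median)
q3 = percentile(home_sales, 75)    # third quartile
iqr = q3 - q1                      # interquartile range

print(q1, q2, q3, iqr)
```

Under these assumptions Q1 = 139,000, Q2 = 203,750, Q3 = 256,625, and the IQR is 117,625.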
6. Analyzing Distributions
z-Scores:
The z-score measures the relative location of a value in the data set. It helps to determine how far a particular value is from the mean relative to the data set’s standard deviation.
Often called the standardized value.
In statistics, it expresses how many standard deviations an observation lies from the mean.
The z-score indicates whether a data point is typical for a given data set.
If $x_1, x_2, \ldots, x_n$ is a sample of n observations, the z-score of $x_i$ is
$z_i = \frac{x_i - \bar{x}}{s}$
where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.
Calculating z-Scores for the Home Sales Data in Excel
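The slides compute z-scores in Excel; the same calculation can be sketched in Python. This is only an illustration: the function name `z_scores` is an assumption, and the sketch uses the sample standard deviation (divisor n − 1) from the standard `statistics` module.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return the z-score (x - mean) / s for every observation."""
    x_bar = mean(values)           # sample mean
    s = stdev(values)              # sample standard deviation (n - 1 divisor)
    return [(x - x_bar) / s for x in values]


home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

for price, z in zip(home_sales, z_scores(home_sales)):
    print(f"{price:>9,}  z = {z:6.2f}")
```

A negative z-score marks a sale below the mean price and a positive z-score a sale above it; the largest sale, 456,250, lies roughly 2.5 standard deviations above the mean.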
6. Analyzing Distributions
Empirical Rule:
◦ When the distribution of data exhibits a symmetric bell-shaped distribution, the
empirical rule can be used to determine the percentage of data values that are
within a specified number of standard deviations of the mean.
◦ For data having a bell-shaped distribution:
◦ Approximately 68% of the data values will be within 1 standard deviation of the mean.
◦ Approximately 95% of the data values will be within 2 standard deviations of the mean.
◦ Almost all the data values will be within 3 standard deviations of the mean.
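The percentages in the empirical rule can be checked against an exactly normal distribution. The sketch below assumes a normal model, which is only an idealization of "bell-shaped" data, and the function name `share_within` is hypothetical; it uses `statistics.NormalDist` from the Python standard library (3.8 or later).

```python
from statistics import NormalDist


def share_within(k, mean=0.0, std_dev=1.0):
    """Share of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mean, std_dev)
    return dist.cdf(mean + k * std_dev) - dist.cdf(mean - k * std_dev)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

For a normal distribution the shares are close to 0.68, 0.95, and 0.997, which is where the "68%, 95%, almost all" wording comes from. Real data that are only roughly bell-shaped will match these figures only approximately.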
6. Analyzing Distributions
Empirical Rule:

A Symmetric Bell-Shaped Distribution


6. Analyzing Distributions
Identifying Outliers:

◦ Outliers: Extreme values in a data set.
◦ They can be identified using standardized values (z-scores).
◦ Any data value with a z-score less than –3 or greater than +3 is an outlier.
◦ Such data values can then be reviewed to determine their accuracy and
whether they belong in the data set.
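A minimal sketch of the z-score rule for outliers follows. The function name `find_outliers` and the second data set are hypothetical and made up purely for illustration; only the home sales figures come from the slides.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    x_bar = mean(values)
    s = stdev(values)              # sample standard deviation
    return [x for x in values if abs((x - x_bar) / s) > threshold]


home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]
print(find_outliers(home_sales))   # no home sale has |z| > 3

# Hypothetical data, invented for this example: twenty 10s and a single 100
hypothetical = [10] * 20 + [100]
print(find_outliers(hypothetical))
```

None of the home sales is flagged, because even the largest price has a z-score of about 2.5. In the hypothetical data the value 100 is flagged and would be reviewed before deciding whether it belongs in the data set.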
6. Analyzing Distributions
Box Plots:

A box plot is a graphical summary of the distribution of data, developed from the quartiles for a data set.

Box Plot for the Home Sales Data


Box Plot Created in Excel for Home Sales Data
Box Plots for Multiple Variables Created in Excel
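The numbers behind a box plot can be sketched as follows. Two assumptions are not stated in the slides: quartiles are computed with the location rule L = (p/100)(n + 1), and points beyond 1.5 × IQR from the box are treated as outliers, a common convention. The names `percentile` and `box_plot_summary` are hypothetical.

```python
import math


def percentile(values, p):
    """pth percentile via the location L = (p / 100) * (n + 1) (assumed rule)."""
    data = sorted(values)
    n = len(data)
    location = p * (n + 1) / 100
    if location <= 1:
        return data[0]
    if location >= n:
        return data[-1]
    lower = math.floor(location)
    return data[lower - 1] + (location - lower) * (data[lower] - data[lower - 1])


def box_plot_summary(values):
    """Quartiles, 1.5 * IQR limits (assumed convention), and points beyond them."""
    q1, median, q3 = (percentile(values, p) for p in (25, 50, 75))
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_limit or x > upper_limit]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "lower_limit": lower_limit, "upper_limit": upper_limit,
            "outliers": outliers}


home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]
summary = box_plot_summary(home_sales)
print(summary)
```

Under these assumptions the upper limit is 433,062.50, so the 456,250 sale is drawn as an outlier in the box plot even though its z-score is below 3. The two rules can disagree, which is why flagged values are reviewed rather than removed automatically.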
Data Visualization
Data visualization is the graphical representation of information and data (tables, charts, graphs).

Uses of data visualization:
◦ Helpful for identifying data errors.
◦ Condenses a large data set into a compact view by highlighting important relationships and trends in the data.
Contents
Data Visualization:
1. Effective Design Techniques
2. Tables
3. Charts
4. Data Dashboards
1. Effective Design Techniques
◦ Data-ink ratio: measures the proportion of what Tufte terms “data-ink” to the total amount of ink used in a table or chart.
◦ Edward R. Tufte first described the data-ink ratio.
◦ Helpful for creating effective tables and charts for data visualization:
  ◦ Data-ink: Ink used in a table or chart that is necessary to convey the meaning of the data to the audience.
  ◦ Non-data-ink: Ink used in a table or chart that serves no useful purpose in conveying the data to the audience.
1. Effective Design Techniques
◦ Data-ink ratio:

Scarf Sales by Day (the slide shows this table twice, side by side, with identical values)

Day  Sales    Day  Sales
  1    150     11    170
  2    170     12    160
  3    140     13    290
  4    150     14    200
  5    180     15    210
  6    180     16    110
  7    210     17     90
  8    230     18    140
  9    140     19    150
 10    200     20    230
2. Tables
◦ Tables should be used when:
1. The reader needs to refer to specific numerical values.
2. The reader needs to make precise comparisons between different values and not just relative comparisons.
3. The values being displayed have different units or very different magnitudes.
Combined Line Chart and Table for Monthly Costs and Revenues at
Gossamer Industries
2. Tables
◦ Table Design Principles:
◦ Avoid using vertical lines in a table unless they are necessary
for clarity.
◦ Horizontal lines are generally necessary only for separating
column titles from data values or when indicating that a
calculation has taken place.
2. Tables
◦ Crosstabulation: A useful type of table for describing data of two variables.
◦ PivotTable: A crosstabulation in Microsoft Excel.

Tables
Crosstabulation of Quality Rating and Meal Price for 300 Los Angeles Restaurants

                  Meal Price
Quality Rating    $10–19   $20–29   $30–39   $40–49   Total
Good                  42       40        2        0      84
Very Good             34       64       46        6     150
Excellent              2       14       28       22      66
Total                 78      118       76       28     300

The greatest number of restaurants in the sample (64) have a very good
rating and a meal price in the $20–29 range.
Only two restaurants have an excellent rating and a meal price in the $10–19
range.
The right and bottom margins of the crosstabulation give the frequencies of
quality rating and meal price separately.
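The margins of the crosstabulation can be recomputed from the cell counts. The sketch below is only an illustration with variable names chosen for this example; it starts from the counts in the table above, not from the underlying 300 restaurant records, which are not part of these slides.

```python
price_ranges = ["$10-19", "$20-29", "$30-39", "$40-49"]

# Cell counts from the crosstabulation, one list per quality rating
counts = {
    "Good":      [42, 40, 2, 0],
    "Very Good": [34, 64, 46, 6],
    "Excellent": [2, 14, 28, 22],
}

# Right margin: frequency of each quality rating
row_totals = {rating: sum(cells) for rating, cells in counts.items()}

# Bottom margin: frequency of each meal price range
column_totals = [sum(column) for column in zip(*counts.values())]

grand_total = sum(row_totals.values())

# Cell with the greatest number of restaurants
largest = max(
    ((rating, price, n)
     for rating, cells in counts.items()
     for price, n in zip(price_ranges, cells)),
    key=lambda cell: cell[2],
)

print(row_totals, column_totals, grand_total, largest)
```

The row totals (84, 150, 66) and column totals (78, 118, 76, 28) match the Total row and column of the table, and the largest cell is the 64 restaurants rated Very Good with a meal price of $20–29.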
Tables (Practice PivotTable)
PivotTable Report for the Restaurant Data with Average Wait Times Added
3. Charts
◦ Charts (or graphs): Visual methods of displaying data
Scatter chart: Graphical presentation of the relationship
between two quantitative variables.
3. Charts
◦ Scatter chart:
Sample Data for the San Francisco Electronics Store

Week   No. of Commercials (x)   Sales in $100s (y)
  1              2                      50
  2              5                      57
  3              1                      41
  4              3                      54
  5              4                      54
  6              1                      38
  7              5                      63
  8              3                      48
  9              4                      59
 10              2                      46
3. Charts
◦ Scatter chart:
• Copy the data in the file Electronics to a new Excel worksheet in columns A through C and rows 1 through 11.
1: Select cells B2:C11
2: Click the Insert tab in the Ribbon
3: Click the Insert Scatter (X,Y) or Bubble Chart button in the Charts group
4: When the list of scatter chart subtypes appears, click the Scatter button
5: Click the Design tab under the Chart Tools Ribbon
6: Click Add Chart Element in the Chart Layouts group
   Select Chart Title, and click Above Chart
   Click on the text box above the chart, and replace the text with Scatter Chart for the San Francisco Electronics Store
3. Charts
◦ Scatter chart:
7: Click Add Chart Element in the Chart Layouts group
   Select Axis Title, and click Primary Horizontal
   Click on the text box under the horizontal axis, and replace “Axis Title” with Number of Commercials
8: Click Add Chart Element in the Chart Layouts group
   Select Axis Title, and click Primary Vertical
   Click on the text box next to the vertical axis, and replace “Axis Title” with Sales ($100s)
9: Right-click on one of the horizontal grid lines in the body of the chart, and click Delete
10: Right-click on one of the vertical grid lines in the body of the chart, and click Delete
3. Charts
◦ Scatter chart:
Scatter Chart for the San Francisco Electronics Store

Trendline: provides an approximation of the relationship between the variables.
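The slides do not say how the trendline is fitted. The sketch below assumes a straight line fitted by ordinary least squares, which is the usual meaning of a linear trendline; the function name `fit_trendline` is hypothetical, and the data are the ten weeks from the San Francisco electronics store table.

```python
def fit_trendline(x, y):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]        # x: no. of commercials
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]    # y: sales ($100s)

slope, intercept = fit_trendline(commercials, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * commercials")
```

Under this assumption the fitted line has a slope of about 4.95 and an intercept of about 36.15, so each additional commercial is associated with roughly $495 more in weekly sales.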
3. Charts
◦ Line chart:
Scatter Chart and Line Chart for Monthly Sales Data
3. Charts
◦ Sparkline: Special type of line chart:
- Minimalist type of line chart that can be placed directly into a cell in Excel.
- Contains no axes; displays only the line for the data.
- Takes up very little space and can be effectively used to provide information on overall trends for time series data.
3. Charts
Bar Charts: Use horizontal bars to display the magnitude of the
quantitative variable.
Column Charts: Use vertical bars to display the magnitude of the
quantitative variable.
Bar and column charts are very helpful in making comparisons
between categorical variables.
3. Charts
Pie chart: Common form of chart used to compare categorical data.
Bubble chart: Graphical means of visualizing three variables in a
two-dimensional graph that sometimes is a preferred alternative
to a 3-D graph.
Heat map: A two-dimensional graphical representation of data
that uses different shades of color to indicate magnitude.
3. Charts
PivotCharts in Excel:
PivotChart: To summarize and analyze data with both a
crosstabulation and charting, Excel pairs PivotCharts with
PivotTables.
4. Data Dashboards
◦ Data dashboard: Data-visualization tool that illustrates
multiple metrics and automatically updates these metrics
as new data become available.
◦ Dashboards report key performance indicators (KPIs), for example:
  ◦ Automobile dashboard: current speed, fuel level, and oil pressure.
  ◦ Business dashboard: financial position, inventory on hand, customer service metrics.
4. Data Dashboards
Principles of Effective Data Dashboards
◦ Should provide timely summary information on KPIs that are important to the user.
◦ Should present all KPIs on a single screen that a user can quickly scan to understand the business’s current state of operations.
◦ The KPIs displayed in the data dashboard should convey meaning to its user and be related to the decisions the user makes.
◦ A data dashboard should call attention to unusual measures that may require attention.
◦ Color should be used to call attention to specific values or to differentiate categorical variables, but the use of color should be restrained.
4. Data Dashboards

Figure 2.23: Data Dashboard for the Grogan Oil Information Technology Call Center

• The data dashboard developed to monitor the performance of the call center combines several displays to track the call center’s KPIs.
• The primary purpose of a dashboard is not to inform, and it is not to educate. The primary purpose is to drive action!
