Topic 3 2023 Data Visualization Std
provide summarizing information of the characteristics and
distribution of values
6. Analyzing Distributions
Percentiles:
Illustration:
To determine the 85th percentile for the home sales data:
1. Arrange the data in ascending order:
108,000 138,000 138,000 142,000 186,000 199,500
208,000 254,000 254,000 257,500 298,000 456,250
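2. Compute the location L of the pth percentile for the n = 12 observations:
$L = \frac{p}{100}(n + 1) = \frac{85}{100}(12 + 1) = 11.05$
3. The interpretation: L = 11.05 means the 85th percentile lies 5% of the way between the value in position 11 (298,000) and the value in position 12 (456,250):
$298{,}000 + 0.05(456{,}250 - 298{,}000) = 305{,}912.50$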
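The three steps of the illustration can be sketched in Python. This is a minimal sketch, assuming the location rule L = (p/100)(n + 1) with linear interpolation between neighboring positions; the function name `percentile` and its handling of locations outside the data range are choices made here, not part of the slides.

```python
import math

# Home sales data from the illustration.
home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

def percentile(values, p):
    """Return the p-th percentile using the location L = (p/100)(n + 1)."""
    data = sorted(values)            # step 1: arrange in ascending order
    n = len(data)
    loc = p * (n + 1) / 100          # step 2: location L (1-based position)
    lower = math.floor(loc)          # position just below L
    frac = loc - lower               # how far L sits toward the next position
    if lower < 1:                    # assumption: clamp to the smallest value
        return data[0]
    if lower >= n:                   # assumption: clamp to the largest value
        return data[-1]
    # step 3: interpolate between positions `lower` and `lower + 1`
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

p85 = percentile(home_sales, 85)
print(round(p85, 2))  # 305912.5
```

Other software may use a different location rule, so results for small samples can differ slightly from this one.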
6. Analyzing Distributions
Quartiles:
Quartiles: When the data is divided into four equal parts:
◦ Each part contains approximately 25% of the observations.
◦ Division points are referred to as quartiles.
The difference between the third and first quartiles is often referred to as
the interquartile range, or IQR.
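As a rough illustration, the quartiles and IQR of the home sales data can be computed with Python's standard library. This sketch assumes the quartiles are the 25th, 50th, and 75th percentiles located with the same (n + 1) rule as the percentile illustration; `method="exclusive"` in `statistics.quantiles` follows that rule.

```python
import statistics

# Home sales data from the percentile illustration.
home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

# n=4 asks for the three cut points that split the data into four parts;
# method="exclusive" places the k-th quartile at position k(n + 1)/4.
q1, q2, q3 = statistics.quantiles(home_sales, n=4, method="exclusive")

iqr = q3 - q1  # interquartile range: spread of the middle 50% of the data
print(q1, q2, q3, iqr)  # 139000.0 203750.0 256625.0 117625.0
```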
6. Analyzing Distributions
z-Scores:
The z-score measures the relative location of a value in the data set.
Often called the standardized value.
It is used in statistics to express how many standard deviations an observation lies from the mean.
The z-score indicates whether a data point is typical for a given data set.
6. Analyzing Distributions
z-Scores:
The z-score helps to determine how far a particular value is from the mean relative to the data set's standard deviation.
If $x_1, x_2, \ldots, x_n$ is a sample of n observations, the z-score of $x_i$ is
$z_i = \frac{x_i - \bar{x}}{s}$
where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.
Calculating z-Scores for the Home Sales Data in Excel
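The same calculation can be sketched outside Excel. This is a minimal Python sketch, assuming the sample standard deviation (n − 1 divisor) is the one intended; the variable names are illustrative.

```python
import statistics

# Home sales data from the percentile illustration.
home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

mean = statistics.mean(home_sales)   # sample mean
s = statistics.stdev(home_sales)     # sample standard deviation (n - 1 divisor)

# z-score: number of standard deviations each value lies from the mean
z_scores = [(x - mean) / s for x in home_sales]

for x, z in zip(home_sales, z_scores):
    print(f"{x:>9,} {z:6.2f}")
```

A negative z-score marks a value below the mean and a positive one a value above it; the largest sale (456,250) is the only value more than 2 standard deviations from the mean.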
6. Analyzing Distributions
Empirical Rule:
◦ When the distribution of data exhibits a symmetric bell-shaped distribution, the
empirical rule can be used to determine the percentage of data values that are
within a specified number of standard deviations of the mean.
◦ For data having a bell-shaped distribution:
  ◦ Approximately 68% of the data values will be within 1 standard deviation of the mean.
  ◦ Approximately 95% of the data values will be within 2 standard deviations of the mean.
  ◦ Almost all the data values will be within 3 standard deviations of the mean.
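The percentages in the empirical rule can be checked against the normal distribution, which is the usual model for bell-shaped data. This is a small sketch, assuming an exactly normal distribution; real data will only match approximately. The helper name `share_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of values within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
# within 1 standard deviation(s): 68.3%
# within 2 standard deviation(s): 95.4%
# within 3 standard deviation(s): 99.7%
```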
Data Visualization: Tables, Charts, Data Dashboards
1. Effective Design Techniques
◦ Data-ink ratio: measures the proportion of what Tufte terms
“data-ink” to the total amount of ink used in a table or chart.
The greatest number of restaurants in the sample (64) have a very good
rating and a meal price in the $20–29 range.
Only two restaurants have an excellent rating and a meal price in the $10–19
range.
The right and bottom margins of the crosstabulation give the frequencies of
quality rating and meal price separately.
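How a crosstabulation and its margins are built can be sketched in a few lines of Python. The records below are hypothetical and far smaller than the restaurant sample; they only illustrate the counting, and none of the counts correspond to the real data.

```python
from collections import Counter

# HYPOTHETICAL (quality rating, meal price range) records, for illustration only.
records = [
    ("Good", "$10-19"), ("Very Good", "$20-29"), ("Very Good", "$20-29"),
    ("Excellent", "$30-39"), ("Good", "$10-19"), ("Very Good", "$10-19"),
    ("Excellent", "$20-29"), ("Good", "$20-29"),
]

cells = Counter(records)                      # frequency of each rating/price pair
row_totals = Counter(q for q, _ in records)   # right margin: quality rating alone
col_totals = Counter(p for _, p in records)   # bottom margin: meal price alone

qualities = ["Good", "Very Good", "Excellent"]
prices = ["$10-19", "$20-29", "$30-39"]

# Print the table with the margins in the last column and last row.
print(f"{'':<10}" + "".join(f"{p:>8}" for p in prices) + f"{'Total':>8}")
for q in qualities:
    row = "".join(f"{cells[(q, p)]:>8}" for p in prices)
    print(f"{q:<10}{row}{row_totals[q]:>8}")
print(f"{'Total':<10}" + "".join(f"{col_totals[p]:>8}" for p in prices)
      + f"{len(records):>8}")
```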
Tables (Practice PivotTable)
PivotTable Report for the Restaurant Data with Average Wait Times Added
3. Charts
◦ Charts (or graphs): Visual methods of displaying data
Scatter chart: Graphical presentation of the relationship
between two quantitative variables.
3. Charts
◦ Scatter chart:
Sample Data for the San Francisco Electronics Store
Week   No. of Commercials (x)   Sales in $100s (y)
1      2                        50
2      5                        57
3      1                        41
4      3                        54
5      4                        54
6      1                        38
7      5                        63
8      3                        48
9      4                        59
10     2                        46
3. Charts
◦ Scatter chart:
• Copy the data in the file Electronics to a new Excel worksheet in columns A through
C and rows 1 through 11.
1: Select cells B2:C11
2: Click the Insert tab in the Ribbon
3: Click the Insert Scatter (X,Y) or Bubble Chart button in the Charts group
4: When the list of scatter chart subtypes appears, click the Scatter button
5: Click the Design tab under the Chart Tools Ribbon
6: Click Add Chart Element in the Chart Layouts group
Select Chart Title, and click Above Chart
Click on the text box above the chart, and replace the text with Scatter Chart
for the San Francisco Electronics Store
3. Charts
◦ Scatter chart:
7: Click Add Chart Element in the Chart Layouts group
Select Axis Title, and click Primary Horizontal
Click on the text box under the horizontal axis, and replace “Axis Title” with
Number of Commercials
8: Click Add Chart Element in the Chart Layouts group
Select Axis Title, and click Primary Vertical
Click on the text box next to the vertical axis, and replace “Axis Title” with Sales
($100s)
9: Right-click on one of the horizontal grid lines in the body of the chart, and click
Delete
10: Right-click on one of the vertical grid lines in the body of the chart, and click
Delete
3. Charts
◦ Scatter chart:
Scatter Chart for the San Francisco Electronics Store
A trendline provides an approximation of the relationship between the variables.
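To make the idea of a trendline concrete, the sketch below fits a straight line to the electronics store data. It assumes a linear trendline fitted by ordinary least squares, which is one common choice; the slides do not say which trendline type the chart uses.

```python
# Sample data for the San Francisco electronics store (weeks 1 to 10).
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]       # x: number of commercials
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]   # y: sales in $100s

n = len(commercials)
x_bar = sum(commercials) / n
y_bar = sum(sales) / n

# Least-squares slope: co-variation of x and y divided by variation of x.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(commercials, sales))
s_xx = sum((x - x_bar) ** 2 for x in commercials)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar   # the fitted line passes through (x_bar, y_bar)

print(f"sales = {intercept:.2f} + {slope:.2f} * commercials")
# sales = 36.15 + 4.95 * commercials
```

Under this fit, each additional commercial is associated with about 4.95 more units of sales, that is, roughly $495 since sales are recorded in $100s.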
3. Charts
◦ Line chart:
Scatter Chart and Line Chart for Monthly Sales Data
3. Charts
◦ Sparkline: Special type of line chart:
- Minimalist type of line chart that can be placed directly into a cell in
Excel.
- Contains no axes; it displays only the line for the data.
- Takes up very little space and can be effectively used to provide
information on overall trends for time series data.
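The sparkline idea, a compact trend display with no axes, can be imitated in plain text. This is only a rough sketch using Unicode block characters, not Excel's sparkline feature; the function name `sparkline` is illustrative, and the input is the weekly sales column of the electronics store data.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight heights, lowest to highest

def sparkline(values):
    """Map each value to a block character scaled between min and max."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1   # avoid division by zero when all values are equal
    top = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - lo) / span * top)] for v in values)

# Weekly sales ($100s) for the San Francisco electronics store, weeks 1 to 10.
weekly_sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]
spark = sparkline(weekly_sales)
print(spark)
```

Like a sparkline in a worksheet cell, the output shows only the ups and downs of the series: the lowest block marks week 6 and the highest marks week 7.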
3. Charts
Bar Charts: Use horizontal bars to display the magnitude of the
quantitative variable.
Column Charts: Use vertical bars to display the magnitude of the
quantitative variable.
Bar and column charts are very helpful in making comparisons
between categorical variables.