Unit 3 Descriptive Statistics
Descriptive, Predictive, and Prescriptive Analysis
• Descriptive analysis
– Summary: What happened?
– Function: It uses data mining and data aggregation to summarize historical data.
– Pros: It’s easy to employ in daily operations, and little experience is needed.
• Predictive analysis
– Summary: What’s going to happen?
– Function: It looks at historical data and analyzes past data trends to predict what could happen.
– Pros: It’s a valuable forecasting tool.
• Prescriptive analysis
– Summary: What should happen?
– Function: It takes the conclusions gleaned from descriptive and predictive analysis and recommends the best future course of action.
– Pros: It offers critical insights for making the best, most informed decisions.
Chapter 3 - Key Terms
• Standard deviation
• Interquartile range
• Measures of association
– Coefficient of correlation, r
» Direction of the relationship: direct (r > 0) or inverse (r < 0).
» Strength of the relationship: when r is close to 1 or –1, the linear relationship between x and y is strong; when r is close to 0, it is weak; when r = 0, there is no linear relationship between x and y.
– Coefficient of determination, r²
» The percentage of the total variation in y that is explained by variation in x.
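A minimal sketch of how r and r² can be computed, using the deviation form r = Σ(xi – x̄)(yi – ȳ) / √[Σ(xi – x̄)² · Σ(yi – ȳ)²] and then squaring r. The paired data and the function name `pearson_r` are hypothetical, chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson coefficient of correlation for paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations about each mean
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2  # coefficient of determination
print(f"r = {r:.4f}, r^2 = {r_squared:.2f}")  # r = 0.7746, r^2 = 0.60
```

Here r is positive (a direct relationship) and fairly strong, and r² = 0.60 means about 60% of the variation in y is explained by variation in x. Reversing the sign of every y value flips r to about –0.7746 (an inverse relationship) but leaves r² unchanged.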
The Center: Mean
• Mean
– Arithmetic average = (sum of all values)/(number of values)
» Population: µ = (Σxi)/N
» Sample: x̄ = (Σxi)/n
Be sure you know how to get the value easily from your calculator and computer software.
Problem: Calculate the average number of truck shipments from the
United States to five Canadian cities for the following data given in
thousands of bags:
Montreal, 64.0; Ottawa, 15.0; Toronto, 285.0;
Vancouver, 228.0; Winnipeg, 45.0 (Ans: 127.4)
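The arithmetic mean for this problem can be checked with a short sketch. The helper name `arithmetic_mean` is illustrative; `statistics.mean` is Python's standard-library function, used here only as a cross-check.

```python
import statistics

# Truck shipments from the problem, in thousands of bags
shipments = {
    "Montreal": 64.0,
    "Ottawa": 15.0,
    "Toronto": 285.0,
    "Vancouver": 228.0,
    "Winnipeg": 45.0,
}


def arithmetic_mean(values):
    """Return (sum of all values) / (number of values)."""
    values = list(values)
    if not values:
        raise ValueError("the mean needs at least one value")
    return sum(values) / len(values)


mean_shipments = arithmetic_mean(shipments.values())
print(f"{mean_shipments:.1f}")  # 127.4

# The standard library gives the same result
print(f"{statistics.mean(list(shipments.values())):.1f}")  # 127.4
```

The sum is 637.0 thousand bags over 5 cities, so the mean is 637.0/5 = 127.4, matching the stated answer.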
The Center: Weighted Mean
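• When the values carry different weights (as with grouped data, where each value is weighted by how often it occurs), compute the weighted mean as µ = (Σwixi)/(Σwi), where wi is the weight of value xi.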
Problem: Calculate the average profit from truck shipments, United
States to Canada, for the following data given in thousands of bags
and profits per thousand bags:
City        Thousands of bags   Profit per thousand bags
Montreal    64.0                $15.00
Ottawa      15.0                $13.50
Toronto     285.0               $15.50
Vancouver   228.0               $12.00
Winnipeg    45.0                $14.00
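A minimal sketch of the weighted mean applied to this problem, assuming the weights wi are the shipment volumes (thousands of bags) and the values xi are the profits per thousand bags. The helper name `weighted_mean` and the dictionary layout are illustrative, not part of the original problem.

```python
# Problem data: city -> (thousands of bags shipped, profit per thousand bags in dollars)
shipments = {
    "Montreal": (64.0, 15.00),
    "Ottawa": (15.0, 13.50),
    "Toronto": (285.0, 15.50),
    "Vancouver": (228.0, 12.00),
    "Winnipeg": (45.0, 14.00),
}


def weighted_mean(values, weights):
    """Compute (sum of w_i * x_i) / (sum of w_i)."""
    values, weights = list(values), list(weights)
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


bags = [b for b, _ in shipments.values()]
profits = [p for _, p in shipments.values()]

# Each city's profit rate is weighted by the volume shipped there
avg_profit = weighted_mean(profits, weights=bags)
# Ignoring the weights treats every city as equally important
unweighted = sum(profits) / len(profits)

print(f"weighted mean profit:   ${avg_profit:.2f} per thousand bags")
print(f"unweighted mean profit: ${unweighted:.2f} per thousand bags")
```

With these figures the weighted mean is 8,946/637 ≈ $14.04 per thousand bags, while the simple average of the five profit rates is $14.00. The two differ because the weighted version lets the largest shipments (Toronto and Vancouver) dominate the result.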
[Figure: Non-Normal Distribution. Two panels, Negative Skew and Positive Skew, each marking the mode, median, and mean.]
The Spread: Range
• The range is the distance between the smallest
and the largest data value in the set.
• Range = largest value – smallest value
• Sometimes the range is reported as an interval running from the smallest to the largest data value, rather than as the width of that interval.
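Both ways of reporting the range are easy to compute. This sketch reuses the truck-shipment data from the mean problem purely as an example; the variable names are illustrative.

```python
# Truck shipments in thousands of bags, reused from the mean problem
shipments = [64.0, 15.0, 285.0, 228.0, 45.0]

smallest, largest = min(shipments), max(shipments)

data_range = largest - smallest        # range as a single width
range_interval = (smallest, largest)   # range reported as an interval

print(data_range)       # 270.0
print(range_interval)   # (15.0, 285.0)
```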
The Spread: Variance
• Variance is one of the most frequently used measures of spread:
– for a population, σ² = Σ(xi – µ)²/N
– for a sample, s² = Σ(xi – x̄)²/(n – 1)
– for a sample (computational form), s² = [Σxi² – (Σxi)²/n]/(n – 1)
Be sure you know how to get the values easily from your calculator and computer software.
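The population and sample variance formulas can be sketched directly, reusing the truck-shipment data from the mean problem as a convenient example. The function names are hypothetical; `statistics.pvariance` and `statistics.variance` are Python standard-library functions used only as a cross-check. The computational (shortcut) form of the sample variance is algebraically equivalent to the definitional one.

```python
import math
import statistics

# Truck shipments in thousands of bags, reused from the mean problem
x = [64.0, 15.0, 285.0, 228.0, 45.0]


def population_variance(values):
    """sigma^2 = sum((x_i - mu)^2) / N"""
    n = len(values)
    mu = sum(values) / n
    return sum((v - mu) ** 2 for v in values) / n


def sample_variance(values):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1); needs at least two values."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((v - x_bar) ** 2 for v in values) / (n - 1)


def sample_variance_shortcut(values):
    """Computational form: [sum(x_i^2) - (sum(x_i))^2 / n] / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    return (sum(v * v for v in values) - sum(values) ** 2 / n) / (n - 1)


pop_var = population_variance(x)
samp_var = sample_variance(x)
samp_sd = math.sqrt(samp_var)  # standard deviation = square root of the variance

print(f"population variance: {pop_var:.2f}")
print(f"sample variance:     {samp_var:.2f}")
print(f"sample std. dev.:    {samp_sd:.2f}")

# The two sample formulas agree, and both match the standard library
assert math.isclose(samp_var, sample_variance_shortcut(x))
assert math.isclose(samp_var, statistics.variance(x))
assert math.isclose(pop_var, statistics.pvariance(x))
```

For these five values the sum of squared deviations is 58,401.2, so the population variance is 58,401.2/5 = 11,680.24 and the sample variance is 58,401.2/4 = 14,600.3. The shortcut form is handy by hand, but in floating point it can lose precision when the values are large relative to their spread, so the definitional form is the safer default in code.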
Relative Position - Quartiles
• One of the most frequently used quantiles is the quartile.
• Quartiles divide the values of a data set into four subsets
of equal size, each comprising 25% of the observations.
• To find the first, second, and third quartiles:
– 1. Arrange the N data values in ascending order (an ordered array).
– 2. First quartile, Q1 = data value at position (N + 1)/4
– 3. Second quartile (the median), Q2 = data value at position 2(N + 1)/4
– 4. Third quartile, Q3 = data value at position 3(N + 1)/4
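The four steps can be sketched as a small function. Two conventions here are assumptions not stated in the steps: a fractional position such as 1.5 is resolved by linear interpolation between the neighboring ordered values, and positions outside 1..N are clamped to the ends of the array. The name `quartile` is hypothetical; the truck-shipment data is reused only as an example.

```python
import math
import statistics


def quartile(values, k):
    """Return Q_k (k = 1, 2, 3) using the (N + 1)k/4 position rule.

    Positions are 1-based. A fractional position is resolved by linear
    interpolation (an assumed convention); positions outside 1..N are
    clamped to the ends of the ordered array.
    """
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2, or 3")
    ordered = sorted(values)            # step 1: ordered array
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    pos = k * (n + 1) / 4               # steps 2-4: position of Q_k
    pos = min(max(pos, 1), n)           # keep the position inside the array
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return ordered[lower - 1]
    # Interpolate between the value at position 'lower' and the next one
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


shipments = [64.0, 15.0, 285.0, 228.0, 45.0]
q1, q2, q3 = (quartile(shipments, k) for k in (1, 2, 3))
print(q1, q2, q3, q3 - q1)  # 30.0 64.0 256.5 226.5

# Cross-check with the standard library's default ('exclusive') method
print(statistics.quantiles(shipments, n=4))
```

For this data the positions are 1.5, 3, and 4.5, giving Q1 = 30.0, Q2 = 64.0 (the median), Q3 = 256.5, and an interquartile range of Q3 – Q1 = 226.5. Here the result matches `statistics.quantiles(data, n=4)`, whose default method also uses (N + 1)-based positions; the two may differ for very small data sets, where the standard library extrapolates rather than clamps.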