Notes Chapter 1
Data
1.2 Histograms in Descriptive Statistics, 1.3 Measures of
Location, 1.4 Measures of Variability
Thought Question
1. If you were to read the results of a study showing
that daily use of a certain exercise machine resulted
in an average 10-pound weight loss, what more
would you want to know about the numbers in
addition to the average?
[Figure: histogram. Vertical axis: Frequency (0 to 30). Horizontal axis: 60.00 to 82.00.]
We use a theoretical distribution instead of the sample distribution. This allows us to answer
certain questions using only the mean and standard deviation of the data.
[Figure: histogram with wider bins, width = 20 cm]
A histogram plots the bin counts as the heights of bars (like a bar chart).
It displays the distribution at a glance.
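The bin counting behind a histogram can be sketched in a few lines of Python. This is only an illustration: the `heights` values, the starting point 60, and the bin width 4 are all hypothetical, invented for the example, and `bin_counts` is a helper name chosen here, not something from the notes.

```python
from collections import Counter


def bin_counts(values, start, width):
    """Count how many values fall in each bin [start + k*width, start + (k+1)*width)."""
    counts = Counter((v - start) // width for v in values)
    # Map each bin index back to the left edge of its bin.
    return {start + k * width: counts[k] for k in sorted(counts)}


# Hypothetical measurements, invented purely for illustration.
heights = [61, 63, 64, 66, 66, 67, 68, 68, 69, 70, 71, 73, 74, 78]

counts = bin_counts(heights, start=60, width=4)

# A relative frequency histogram divides each count by the number of values.
relative = {edge: c / len(heights) for edge, c in counts.items()}

# Print a text histogram: one bar of '#' characters per bin.
for left_edge, count in counts.items():
    print(f"{left_edge}-{left_edge + 4}: {'#' * count} ({count})")
```

Choosing a larger `width` merges neighboring bins, which gives fewer and taller bars.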
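Here is a histogram of earthquake magnitudes:
[Figure: histogram of earthquake magnitudes]
Here is a relative frequency histogram of earthquake magnitudes:
[Figure: relative frequency histogram of earthquake magnitudes]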
Use of histogram
Symmetry
An athlete might want to know the typical time for a particular knee
injury to heal.
The data below are the annual salaries of 10 business executives (in thousands of
dollars):
890   1,110   1,460   1,420   2,000   1,430   1,520   1,110   2,400   1,680
The arithmetic mean, usually called the mean or the average, is the sum of all data
values divided by the number of such values.
In this case, the total for all the salaries is about $15 million ($15,020,000 exactly);
dividing by 10 gives a mean executive salary of about $1.5 million.
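As a quick check of the arithmetic, here is a minimal Python sketch that computes the mean from the ten salaries listed above. The variable names are chosen for this example only.

```python
# Annual salaries of the 10 executives, in thousands of dollars (from the notes).
salaries = [890, 1110, 1460, 1420, 2000, 1430, 1520, 1110, 2400, 1680]

total = sum(salaries)          # sum of all data values
mean = total / len(salaries)   # divided by the number of values

print(f"total = {total} thousand dollars")
print(f"mean  = {mean} thousand dollars")
```

The exact total is 15,020 thousand dollars, so the exact mean is 1,502 thousand dollars, which rounds to the $1.5 million quoted in the text.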
Mean - Continued
Cons:
To find the median, first sort the values from smallest to largest.
Then find the middle value (or, as in this case, the average of the
middle two values) to get a median executive salary of $1,445,000
(($1,430,000 + $1,460,000) divided by 2).
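The same steps can be sketched in Python, using the salary data from the notes. The manual calculation is compared against `statistics.median` from the standard library; the variable names are chosen for this example only.

```python
import statistics

# Annual salaries of the 10 executives, in thousands of dollars (from the notes).
salaries = [890, 1110, 1460, 1420, 2000, 1430, 1520, 1110, 2400, 1680]

ordered = sorted(salaries)
n = len(ordered)

# n is even here, so the median is the average of the two middle values.
middle_two = (ordered[n // 2 - 1], ordered[n // 2])
median = sum(middle_two) / 2

print(ordered)
print(f"middle two = {middle_two}, median = {median}")

# The standard library applies the same rule.
assert median == statistics.median(salaries)
```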
Salary Example
Note that in the original data set, the median of
$1,445,000 is only a little less than the arithmetic
mean of $1.5 million.
The Median
Pros
Easily defined
Easy to calculate
Stable, not affected by outliers
Cons
Stability has its downside
The median is not based on all observations
Trimmed Mean
Quartiles
Percentiles (Quantiles)
The first quartile and the third quartile are two particular examples of percentiles:
specifically, the 25th percentile and the 75th percentile.
The lower and upper quartiles are the 25th and 75th
percentiles of the data, so the interquartile range, IQR = Q3 - Q1,
contains the middle 50% of the values of the distribution.
Example
Find the lower quartile (Q1) and the upper quartile (Q3):
Data: 850, 900, 1400, 1200, 1050, 1000, 750, 1250, 1050, 565
Order dataset: 565, 750, 850, 900, 1000, 1050, 1050, 1200, 1250, 1400
Q1: use left part of the data 565, 750, 850, 900, 1000
median of this part = Q1 = 850
Q3: use right part of the data 1050, 1050, 1200, 1250, 1400
median of this part = Q3 = 1200
IQR = 1200 - 850 = 350
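The median-of-halves method used in this example can be sketched in Python as follows. The function name `quartiles_by_halves` is hypothetical, and the handling of an odd number of values (the middle value is left out of both halves) is an assumption of this sketch, since the example only shows the even case. Library routines such as `statistics.quantiles` or `numpy.percentile` interpolate differently and may return slightly different quartiles.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the sorted data."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                   # left part of the data
    upper = ordered[len(ordered) - half:]    # right part of the data
    # Assumption: for an odd number of values the middle value is in neither half.
    return median(lower), median(upper)


data = [850, 900, 1400, 1200, 1050, 1000, 750, 1250, 1050, 565]

q1, q3 = quartiles_by_halves(data)
iqr = q3 - q1

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```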
Box Plot
Five-number summary:
Min      108
Q1       121
Median   132
Q3       138
Max      200
[Figure: box plot of SBP, horizontal axis from 100 to 220. Labels on the figure: Q1; Median; Q3; Q1-1.5*IQR or the min of the data; Q3+1.5*IQR or the max of the data; Q3+3*IQR or the max of the data.]
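The box plot labels can be turned into numbers with a short Python sketch. It assumes that the five-number summary listed here (108, 121, 132, 138, 200) belongs to the SBP box plot, and it reads "Q3+1.5*IQR or the max of the data" as "whichever is smaller" (and the lower label as "whichever is larger"). Both readings are assumptions of this sketch. With the full data set, many plotting tools instead draw each whisker to the most extreme observation inside the fence.

```python
# Five-number summary from the notes (assumed to describe the SBP data).
data_min, q1, med, q3, data_max = 108, 121, 132, 138, 200

iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Assumed reading of the figure labels: a whisker stops at the fence
# or at the data extreme, whichever is closer to the box.
lower_whisker = max(lower_fence, data_min)
upper_whisker = min(upper_fence, data_max)

# Values beyond the 1.5 * IQR fence are candidates for outliers.
has_high_outlier = data_max > upper_fence

print(f"IQR = {iqr}")
print(f"fences: {lower_fence} to {upper_fence}")
print(f"whiskers: {lower_whisker} to {upper_whisker}")
print(f"maximum lies beyond the upper fence: {has_high_outlier}")
```

Under these assumptions the maximum of 200 also lies beyond Q3 + 3*IQR = 189, the second limit labeled on the figure.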
Data: 1792, 1666, 1362, 1614, 1460, 1867, 1439
n = 7, sum = 11,200, mean = 11,200 / 7 = 1600

Value   Deviation               Squared deviation
1792    1792 - 1600 = 192       (192)^2 = 36,864
1666    1666 - 1600 = 66        (66)^2 = 4,356
1362    1362 - 1600 = -238      (-238)^2 = 56,644
1614    1614 - 1600 = 14        (14)^2 = 196
1460    1460 - 1600 = -140      (-140)^2 = 19,600
1867    1867 - 1600 = 267       (267)^2 = 71,289
1439    1439 - 1600 = -161      (-161)^2 = 25,921
        sum = 0                 sum = 214,870

s^2 = 214,870 / (7 - 1) = 35,811.67
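The calculation can be reproduced with a minimal Python sketch: compute the mean, the deviations from the mean, the squared deviations, and then divide their sum by n - 1. The variable names are chosen for this example; the final square root, which gives the sample standard deviation s, is a step added here and is not shown in the notes.

```python
import math

# The seven data values from the notes.
data = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(data)
mean = sum(data) / n

deviations = [x - mean for x in data]       # each value minus the mean
squared = [d ** 2 for d in deviations]      # squared deviations

variance = sum(squared) / (n - 1)           # sample variance s^2
std_dev = math.sqrt(variance)               # sample standard deviation s

print(f"mean = {mean}")
print(f"sum of deviations = {sum(deviations)}")
print(f"sum of squared deviations = {sum(squared)}")
print(f"s^2 = {variance:.2f}, s = {std_dev:.2f}")
```

The deviations always sum to zero, which is why they are squared before being averaged.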