Lecture 3 Summarizing Data Measures of Central Location and Sampling
HSC 205
Dr Rania Al Dweik
Learning objectives
• Define the central location and spread of data
• Determine the mean, mode, median, and midrange
• Determine the standard deviation, standard error, interquartile range, and confidence interval
• Choose the right measure of central location and spread
Measures of Center Location and Spread
Data: 21, 25, 32, 48, 53, 62, 62, 64
mean x̄ = (21 + 25 + 32 + 48 + 53 + 62 + 62 + 64) / 8 = 367 / 8 = 45.875 ≈ 45.9
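The arithmetic for the mean can be checked with a few lines of Python. This is only a sketch of the calculation for the eight data values on this slide; the variable names are chosen here for illustration.

```python
# Data values from the slide
values = [21, 25, 32, 48, 53, 62, 62, 64]

total = sum(values)   # sum of all data values
n = len(values)       # number of data values
mean = total / n      # mean = sum / n

print(f"sum = {total}, n = {n}, mean = {mean:.3f}")
```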
Median
The median is the middle value of a set of data that has been put into rank order (increasing or decreasing order).
Finding the Median
• First sort the values (arrange them in order).
• If the number of data values is odd, the median is the number located in the exact middle of the list.
• If the number of data values is even, the median is found by computing the mean of the two middle numbers.
Example of Median for Odd data
• 7 data values:
5.40 1.10 0.42 0.73 0.48 1.10 0.66
• Sorted data:
0.42 0.48 0.66 0.73 1.10 1.10 5.40
median = 0.73
Example of Median for even data
• 6 data values:
5.40 1.10 0.42 0.73 0.48 1.10
• Sorted data:
0.42 0.48 0.73 1.10 1.10 5.40
median = (0.73 + 1.10) / 2 = 0.915
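The sort-then-pick-the-middle rule can be sketched in Python as follows. The function name `median` is chosen here for illustration (Python's standard library also offers `statistics.median`); the two data sets are the odd and even examples from the slides.

```python
def median(values):
    """Return the median: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single value in the exact middle
        return ordered[mid]
    # Even count: mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]
even_data = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]

print(round(median(odd_data), 3))   # median of the 7-value example
print(round(median(even_data), 3))  # median of the 6-value example
```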
Mode
• It is the value that occurs with the greatest frequency.
• A data set can have one mode, more than one mode, or no mode.
• Bimodal: two data values occur with the same greatest frequency.
• Multimodal: more than two data values occur with the same greatest frequency.
• No mode: no data value is repeated.
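The cases above (one mode, several modes, no mode) can be told apart by counting how often each value occurs. The sketch below is one possible way to do this; the function name `modes` and the three small data sets are made up for illustration and do not come from the lecture.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of modes; an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # No data value is repeated, so there is no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

# Illustrative (hypothetical) data sets
print(modes([1, 2, 2, 3]))      # one mode
print(modes([1, 1, 2, 2, 3]))   # bimodal
print(modes([1, 2, 3]))         # no mode
```

Because the function only counts occurrences, it also works for categorical data such as car models, where a mean or median cannot be computed.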
Uses of Mode
Mode is most useful as a measure of central tendency when examining categorical data, such as models of cars or flavors of soda, for which a mathematical average value based on ordering cannot be calculated.
Mode - Examples
Midrange
midrange = (maximum value + minimum value) / 2
Example: midrange = (5.41 + 0.42) / 2 = 2.915
• Sensitive to extremes (outliers) because it uses only the maximum and minimum values, so rarely used
• Advantage: very easy to compute
Best Measure of Center
Example
• These data values represent weight gain or loss in
kg for a simple random sample (SRS) of 18 college
freshmen (negative data values indicate weight loss)
11 3 0 -2 3 -2 -2 5 -2 7 2 4 1 8 1 0 -5 2
median = (1 + 2) / 2 = 1.5 kg
Example
• Mode is -2
• Midrange
midrange = (-5 + 11) / 2 = 3.0 kg
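The measures of center for the 18 weight-change values can be reproduced with Python's standard `statistics` module. This is a sketch for checking the worked example; the variable names are chosen here for illustration.

```python
import statistics

# Weight gain or loss in kg for the 18 students in the example
weights = [11, 3, 0, -2, 3, -2, -2, 5, -2, 7, 2, 4, 1, 8, 1, 0, -5, 2]

mean = statistics.mean(weights)               # sum of values / 18
median = statistics.median(weights)           # mean of the two middle values
mode = statistics.mode(weights)               # most frequent value
midrange = (min(weights) + max(weights)) / 2  # (minimum + maximum) / 2

print(f"mean = {mean:.2f} kg")
print(f"median = {median} kg")
print(f"mode = {mode} kg")
print(f"midrange = {midrange} kg")
```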
Example
• All of the measures of center are below 6.8 kg
This shows that the range is very sensitive to extreme values; therefore,
not as useful as other measures of variation.
Interquartile Range
• The interquartile range is a measure of spread most commonly used with the median.
• It represents the central portion of the distribution, from the 25th percentile to the 75th percentile.
• The interquartile range thus includes approximately one half of the observations in the set, leaving one quarter of the observations on each side.
Method For Determining The Interquartile Range
• Step 1. Arrange the observations in increasing order.
• Step 2. Find the positions of the 1st and 3rd quartiles with the following formulas, where n is the number of observations.
Position of 1st quartile (Q1) = 25th percentile = (n + 1) / 4
Position of 3rd quartile (Q3) = 75th percentile = 3(n + 1) / 4 = 3 × (position of Q1)
• Step 3. Identify the value of the 1st and 3rd quartiles.
• a. If a quartile lies on an observation (i.e., if its position is a whole number), the value of the quartile is the
value of that observation. For example, if the position of a quartile is 20, its value is the value of the 20th
observation.
• b. If a quartile lies between observations, the value of the quartile is the value of the lower observation plus
the specified fraction of the difference between the observations. For example, if the position of a quartile is
20¼, it lies between the 20th and 21st observations, and its value is the value of the 20th observation, plus ¼
the difference between the value of the 20th and 21st observations.
• Step 4. Calculate the interquartile range as Q3 minus Q1.
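The four steps above can be sketched in Python as follows. The function names are chosen here for illustration, and the data are the 18 weight-change values from the earlier example, not the exercise data. Statistical software may use slightly different quartile conventions, so its results can differ from this (n + 1) / 4 method.

```python
def quartile_value(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    lower_index = int(position)        # whole-number part of the position
    fraction = position - lower_index  # fractional part (0 if on an observation)
    lower = ordered[lower_index - 1]   # convert 1-based position to 0-based index
    if fraction == 0:
        return lower                   # Step 3a: quartile lies on an observation
    upper = ordered[lower_index]
    # Step 3b: lower observation plus the fraction of the difference
    return lower + fraction * (upper - lower)

def interquartile_range(values):
    """Return (Q1, Q3, IQR) using positions (n + 1) / 4 and 3(n + 1) / 4."""
    ordered = sorted(values)           # Step 1
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 observations for this method")
    q1 = quartile_value(ordered, (n + 1) / 4)      # Steps 2 and 3
    q3 = quartile_value(ordered, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1             # Step 4

weights = [11, 3, 0, -2, 3, -2, -2, 5, -2, 7, 2, 4, 1, 8, 1, 0, -5, 2]
q1, q3, iqr = interquartile_range(weights)
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

For these 18 values the quartile positions are 4.75 and 14.25, so both quartiles fall between observations and Step 3b applies.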
Dispersion: Variance
σ² = Σ (Yi − μ)² / N
where
σ² = variance of the distribution of data in the population (sigma squared)
Yi = a data value
μ = mean of the distribution of data in the population
N = number of data values in the population
Example
• Example 1: Given the data set {4.5, 9.8, 2.3, 5.3, 8.9}, find the variance.
• Solution: n = 5
Mean = (4.5 + 9.8 + 2.3 + 5.3 + 8.9) / 5 = 6.16
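The remaining steps of this example (squared deviations from the mean, then their average) can be sketched in Python. The slide does not state whether the intended divisor is n (population variance) or n − 1 (sample variance), so the sketch below computes both; treat the choice as an assumption to confirm against the lecture.

```python
data = [4.5, 9.8, 2.3, 5.3, 8.9]
n = len(data)

mean = sum(data) / n

# Squared deviation of each data value from the mean
squared_deviations = [(y - mean) ** 2 for y in data]

# Population variance divides by n; sample variance divides by n - 1
population_variance = sum(squared_deviations) / n
sample_variance = sum(squared_deviations) / (n - 1)

print(f"mean = {mean:.2f}")
print(f"population variance = {population_variance:.4f}")
print(f"sample variance = {sample_variance:.4f}")
```

The standard deviation is the square root of whichever variance is used.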
Standard error
SE = SD / √n, where SD is the standard deviation and n is the number of observations
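As a sketch of the standard error formula, the snippet below applies it to the 18 weight-change values from the earlier example. It assumes SD means the sample standard deviation (divisor n − 1), which is what `statistics.stdev` computes; the slide does not specify this.

```python
import math
import statistics

# Weight gain or loss in kg for the 18 students in the earlier example
weights = [11, 3, 0, -2, 3, -2, -2, 5, -2, 7, 2, 4, 1, 8, 1, 0, -5, 2]

n = len(weights)
sd = statistics.stdev(weights)  # sample standard deviation (assumed)
se = sd / math.sqrt(n)          # SE = SD / sqrt(n)

print(f"n = {n}, SD = {sd:.3f} kg, SE = {se:.3f} kg")
```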
Confidence interval
• Often, biostatisticians conduct studies not only to measure characteristics in the subjects studied, but also to make generalizations about the larger population from which these subjects came. This process is called inference. For example, political pollsters use samples of perhaps 1,000 or so people from across the country to make inferences about which presidential candidate is likely to win on Election Day. Usually, the inference includes some consideration about the precision of the measurement. (The results of a political poll may be reported to have a
• Step 3.
Lower limit of the 95% confidence interval = mean minus 1.96 × standard error.
Upper limit of the 95% confidence interval = mean plus 1.96 × standard error.
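The two limits in Step 3 can be sketched in Python. The function name `confidence_interval_95` is chosen here for illustration. The data are the 18 weight-change values from the earlier example, and the standard error again assumes the sample standard deviation.

```python
import math
import statistics

def confidence_interval_95(mean, standard_error):
    """Return (lower, upper) limits: mean -/+ 1.96 x standard error."""
    margin = 1.96 * standard_error
    return mean - margin, mean + margin

# Weight gain or loss in kg for the 18 students in the earlier example
weights = [11, 3, 0, -2, 3, -2, -2, 5, -2, 7, 2, 4, 1, 8, 1, 0, -5, 2]

mean = statistics.mean(weights)
se = statistics.stdev(weights) / math.sqrt(len(weights))  # SE = SD / sqrt(n)

lower, upper = confidence_interval_95(mean, se)
print(f"mean = {mean:.2f} kg, 95% CI = ({lower:.2f}, {upper:.2f}) kg")
```

The multiplier 1.96 comes from the normal distribution; with a sample as small as 18, a t-based multiplier is often preferred, so this example only illustrates the arithmetic of the step.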
Choosing the Right Measure of Central Location and Spread
Exercise : Standard deviation
• Calculate the standard deviation for the same set of vaccination data.
• 2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1
Exercise : Interquartile range
• Determine the first and third quartiles and interquartile range for the same vaccination
data as in the previous exercises.
2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1
Exercise: Standard error
• When the serum cholesterol levels of 4,462 men were measured, the mean cholesterol
level was 213, with a standard deviation of 42. Calculate the standard error of the mean
for the serum cholesterol level of the men studied.
Exercise: Confidence interval
• When the serum cholesterol levels of 4,462 men were measured, the mean cholesterol
level was 213, with a standard deviation of 42. Calculate the standard error of the mean
for the serum cholesterol level of the men studied, then find the 95% confidence interval.