Lecture 3 Summarizing Data Measures of Central Location and Sampling

Uploaded by

RAnia
BIOSTATISTICS

HSC 205

Measures of Central Location and Spread

Dr Rania Al Dweik
Learning objectives
• Define the central location and spread of data
• Determine the mean, mode, median and midrange
• Determine the standard deviation, standard error, interquartile range, and confidence interval
• Choose the right measure of central location and spread
Measures of Central Location and Spread

A measure of central location provides a single value that summarizes an entire distribution of data.

Measures of central location include the mean, median, mode, and midrange.
Mean (Average)
• The measure of center obtained by adding the data values and then dividing the total by the number of values

Uses of Mean
The mean is the best descriptive measure for continuous data
Example of Sample Mean

21, 25, 32, 48, 53, 62, 62, 64

$$\bar{x} = \frac{21 + 25 + 32 + 48 + 53 + 62 + 62 + 64}{8} = \frac{367}{8} = 45.875 \approx 45.9$$
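The same calculation can be sketched in Python. This is a minimal illustration; the function name `sample_mean` is an assumption made for the example and does not come from the lecture.

```python
# Data values from the sample mean example.
values = [21, 25, 32, 48, 53, 62, 62, 64]


def sample_mean(data):
    """Add the data values, then divide the total by the number of values."""
    return sum(data) / len(data)


mean = sample_mean(values)
print(sum(values))     # 367
print(mean)            # 45.875
print(round(mean, 1))  # 45.9
```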
Median
The median is the middle value of a set of data that has been put into rank order (increasing or decreasing order)

Finding the Median
• First sort the values (arrange them in order)
• If the number of data values is odd, the median is the number located in the exact middle of the list.
• If the number of data values is even, the median is found by computing the mean of the two middle numbers.
Example of Median for odd data
• 7 data values:
5.40 1.10 0.42 0.73 0.48 1.10 0.66

• Sorted data:
0.42 0.48 0.66 0.73 1.10 1.10 5.40

median = 0.73
Example of Median for even data
• 6 data values:
5.40 1.10 0.42 0.73 0.48 1.10

• Sorted data:
0.42 0.48 0.73 1.10 1.10 5.40

(even number of values – no exact middle)

$$\text{median} = \frac{0.73 + 1.10}{2} = 0.915$$
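The odd and even cases can be sketched in a few lines of Python. This is an illustrative sketch only; the function and variable names are assumptions chosen for the example.

```python
def median(data):
    """Sort the values, then take the middle one (odd count)
    or the mean of the two middle ones (even count)."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_data = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]
even_data = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]

print(median(odd_data))             # 0.73
print(round(median(even_data), 3))  # 0.915
```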
Mode
• It is the value that occurs with the greatest frequency
• A data set can have one, more than one, or no mode
• Bimodal: two data values occur with the same greatest frequency
• Multimodal: more than two data values occur with the same greatest frequency
• No Mode: no data value is repeated
Uses of Mode
Mode is most useful as a measure of central tendency when examining categorical data, such as models of cars or flavors of soda, for which a mathematical average value based on ordering cannot be calculated.
Mode - Examples

a) 5.40 1.10 0.42 0.73 0.48 1.10
Mode is 1.10
b) 27 27 27 55 55 55 88 88 99
Bimodal - 27 & 55
c) 1 2 3 6 7 8 9 10
No Mode
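The three cases can be checked with a short Python sketch built on `collections.Counter`. The helper name `modes` is an assumption for illustration; it returns an empty list when no value is repeated, matching the "No Mode" convention used here.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs with the greatest frequency.
    An empty list means no value is repeated (no mode)."""
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


example_a = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
example_b = [27, 27, 27, 55, 55, 55, 88, 88, 99]
example_c = [1, 2, 3, 6, 7, 8, 9, 10]

print(modes(example_a))  # [1.1]
print(modes(example_b))  # [27, 55]
print(modes(example_c))  # []
```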
Midrange
Midrange is the value midway between the maximum and minimum data values in the original data set (average of highest and lowest data values)

Method for identifying the midrange
• Step 1. Identify the smallest (minimum) observation and the largest (maximum) observation.
• Step 2. Add the minimum plus the maximum, then divide by two.
Example of Midrange

5.41 1.13 0.42 0.49 0.65 1.86 1.69

$$\text{midrange} = \frac{5.41 + 0.42}{2} = 2.915$$
Midrange
• Sensitive to extremes (outliers) because it uses only the maximum and minimum values, so rarely used
• Advantage: very easy to compute
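A brief Python sketch of the two-step method, using the midrange example data. Dropping the largest value (5.41) is an added illustration of the sensitivity to extremes and is not part of the original example.

```python
def midrange(data):
    """Add the minimum and the maximum, then divide by two."""
    return (min(data) + max(data)) / 2


values = [5.41, 1.13, 0.42, 0.49, 0.65, 1.86, 1.69]
print(round(midrange(values), 3))  # 2.915

# Removing the single largest value moves the midrange a long way.
without_largest = [v for v in values if v != max(values)]
print(round(midrange(without_largest), 3))  # 1.14
```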
Best Measure of Center
Example
• These data values represent weight gain or loss in kg for a simple random sample (SRS) of 18 college freshmen (negative data values indicate weight loss)

11 3 0 -2 3 -2 -2 5 -2 7 2 4 1 8 1 0 -5 2

• Do these values support the legend that college students gain 6.8 kg during their freshman year? Explain.
Example
• Sample Mean
$$\bar{x} = \frac{\sum x}{n} = \frac{34}{18} \approx 1.9 \text{ kg}$$
• Median
-5 -2 -2 -2 -2 0 0 1 1 2 2 3 3 4 5 7 8 11

$$\text{median} = \frac{1 + 2}{2} = 1.5 \text{ kg}$$
Example
• Mode is -2

• Midrange
$$\text{midrange} = \frac{-5 + 11}{2} = 3.0 \text{ kg}$$
Example
• All of the measures of center are below 6.8 kg

• Based on measures of central location, these data values do not support the idea that college students gain 6.8 kg during their freshman year
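The four measures for the weight data can be reproduced with Python's standard `statistics` module. This is a sketch for checking the arithmetic; the variable names are assumptions chosen for the example.

```python
import statistics

# Weight change in kg for the 18 freshmen (negative = weight loss).
weight_change_kg = [11, 3, 0, -2, 3, -2, -2, 5, -2, 7, 2, 4, 1, 8, 1, 0, -5, 2]
claimed_gain_kg = 6.8

measures = {
    "mean": statistics.mean(weight_change_kg),
    "median": statistics.median(weight_change_kg),
    "mode": statistics.mode(weight_change_kg),
    "midrange": (min(weight_change_kg) + max(weight_change_kg)) / 2,
}

for name, value in measures.items():
    print(f"{name}: {value:.1f} kg")

# True when every measure of center falls below the claimed 6.8 kg gain.
all_below_claim = all(value < claimed_gain_kg for value in measures.values())
print(all_below_claim)  # True
```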
Selecting the appropriate measure
• Measures of central location are single values that summarize the observed values of a distribution. The mode provides the most common value, the median provides the central value, the mean provides the average value, and the midrange provides the midpoint value.
• The mode and median are useful as descriptive measures. However, they are not often used for further statistical manipulations.
• In contrast, the mean is a good descriptive measure and has good statistical properties. The mean is used most often in additional statistical manipulations.
Selecting the appropriate measure
• While the arithmetic mean is the measure of choice when data are normally distributed, the median is the measure of choice for data that are not normally distributed. Because epidemiologic data tend not to be normally distributed (incubation periods, doses, ages of patients), the median is often preferred.
• The selection of the most appropriate measure requires judgment based on
1. the characteristics of the data (e.g., normally distributed or skewed, with or without outliers)
2. the reason for calculating the measure (e.g., for descriptive or analytic purposes).
Measure of Spread/ Variation
Why is it important to understand variation?
• A measure of the center by itself can be misleading
• Example: Two nations with the same median family income are very different if one has extremes of wealth and poverty and the other has little variation among families (see the following table).
Example of variation
            Data Set A    Data Set B
            50,000        10,000
            60,000        20,000
            70,000        70,000
            80,000        120,000
            90,000        130,000
MEAN        70,000        70,000
MEDIAN      70,000        70,000

Data set B has more variation about the mean
Histograms: example of variation

Data set B has more variation about the mean (Target).
Measures of Spread/ Variation
• Spread, or dispersion, is the second important feature of frequency distributions. Just as measures of central location describe where the peak is located, measures of spread describe the dispersion (or variation) of values from that peak in the distribution.

How do we quantify Spread/ Variation?
Measures of Spread/ Variation
• Measures of spread include the range, interquartile range, variance, and standard deviation.
Range
• The range of a set of data is the difference between the maximum value and the minimum value.

Range = (maximum value) – (minimum value)


Example of range
27 28 25 6 27 30 26
Range = 30 - 6 = 24
If we ignore the outlier (6) in the previous data set, the range will be
Range = 30 - 25 = 5

This shows that the range is very sensitive to extreme values; therefore, it is not as useful as other measures of variation.
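A short Python sketch of the range example, with and without the outlier. The name `value_range` is an assumption chosen to avoid shadowing Python's built-in `range`.

```python
def value_range(data):
    """Range = (maximum value) - (minimum value)."""
    return max(data) - min(data)


values = [27, 28, 25, 6, 27, 30, 26]
without_outlier = [v for v in values if v != 6]

print(value_range(values))           # 24
print(value_range(without_outlier))  # 5
```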
Interquartile Range
• The interquartile range is a measure of spread most commonly used with the median.
• It represents the central portion of the distribution, from the 25th percentile to the 75th percentile.
• The interquartile range thus includes approximately one half of the observations in the set, leaving one quarter of the observations on each side.
Method For Determining The Interquartile Range
• Step 1. Arrange the observations in increasing order.
• Step 2. Find the position of the 1st and 3rd quartiles with the following formulas, where n is the number of observations.
Position of 1st quartile (Q1) = 25th percentile = (n + 1) / 4
Position of 3rd quartile (Q3) = 75th percentile = 3(n + 1) / 4 = 3 × (position of Q1)
• Step 3. Identify the value of the 1st and 3rd quartiles.
• a. If a quartile lies on an observation (i.e., if its position is a whole number), the value of the quartile is the value of that observation. For example, if the position of a quartile is 20, its value is the value of the 20th observation.
• b. If a quartile lies between observations, the value of the quartile is the value of the lower observation plus the specified fraction of the difference between the observations. For example, if the position of a quartile is 20¼, it lies between the 20th and 21st observations, and its value is the value of the 20th observation, plus ¼ the difference between the value of the 20th and 21st observations.
• Step 4. Calculate the interquartile range as Q3 minus Q1.
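The steps above can be sketched in Python as follows. This is an illustrative implementation of the (n + 1) / 4 position rule only; other software may use different quartile conventions and give slightly different values. The function names and the three-observation guard are assumptions made for the example, which reuses data from the range and weight-gain examples.

```python
def quartile_value(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    lower_index = int(position)        # whole-number part, e.g. 20 for 20.25
    fraction = position - lower_index  # fractional part, e.g. 0.25
    lower_value = ordered[lower_index - 1]
    if fraction == 0:
        return lower_value             # quartile lies on an observation
    upper_value = ordered[lower_index]
    return lower_value + fraction * (upper_value - lower_value)


def interquartile_range(data):
    """Return (Q1, Q3, IQR) using positions (n + 1) / 4 and 3(n + 1) / 4."""
    ordered = sorted(data)             # Step 1
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three observations")
    q1 = quartile_value(ordered, (n + 1) / 4)      # Steps 2 and 3
    q3 = quartile_value(ordered, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1             # Step 4


# n = 7: positions 2 and 6 are whole numbers.
print(interquartile_range([27, 28, 25, 6, 27, 30, 26]))  # (25, 28, 3)

# n = 18: positions 4.75 and 14.25 fall between observations.
weights = [11, 3, 0, -2, 3, -2, -2, 5, -2, 7, 2, 4, 1, 8, 1, 0, -5, 2]
print(interquartile_range(weights))  # (-2.0, 4.25, 6.25)
```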
Dispersion: Variance

• Dispersion for the Gaussian distribution = variance of the distribution
• Average of the squared distances of the values from the mean
• The unsquared deviations from the mean always sum to zero, which is why they are squared before averaging

$$\sigma^2 = \frac{\sum_{i=1}^{N} (Y_i - \mu)^2}{N}$$

where
σ² = variance of the distribution of data in the population (sigma squared)
Yi = a data value
μ = mean of the distribution of data in the population
N = number of data values in the population
Example
• Example 1: Given the data set {4.5, 9.8, 2.3, 5.3, 8.9}, find the sample variance.
• Solution: n = 5
Mean = (4.5 + 9.8 + 2.3 + 5.3 + 8.9) / 5 = 6.16

Sample variance = ∑ (xi − x̄)² / (n − 1), where x̄ is the sample mean (a sample variance divides by n − 1 rather than N)

= [(4.5 − 6.16)² + (9.8 − 6.16)² + (2.3 − 6.16)² + (5.3 − 6.16)² + (8.9 − 6.16)²] / 4
= (2.7556 + 13.2496 + 14.8996 + 0.7396 + 7.5076) / 4
= 39.152 / 4
Answer: Sample variance ≈ 9.79
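For readers who want to check the arithmetic, here is a minimal Python sketch that computes the variance of this data set with both divisors: N for a population and n − 1 for a sample. The variable names are assumptions for illustration; `statistics.pvariance` and `statistics.variance` are the standard-library equivalents.

```python
import statistics

data = [4.5, 9.8, 2.3, 5.3, 8.9]

mean = sum(data) / len(data)
deviations = [x - mean for x in data]
squared_deviations = [d ** 2 for d in deviations]

# Unsquared deviations cancel out, so they cannot measure spread.
print(abs(round(sum(deviations), 10)))  # 0.0

population_variance = sum(squared_deviations) / len(data)    # divide by N
sample_variance = sum(squared_deviations) / (len(data) - 1)  # divide by n - 1

print(round(population_variance, 2))  # 7.83
print(round(sample_variance, 2))      # 9.79
```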
Standard Deviation
• The standard deviation is the measure of spread used most commonly with the mean
• The standard deviation is a measure of variation of all values from the mean.
• The value of the standard deviation s is never negative; it is zero only when all of the data values are equal.
• The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others).
• The units of the standard deviation s are the same as the units of the original data values
Method For Calculating The Standard Deviation

• Step 1. Calculate the mean.
• Step 2. Subtract the mean from each observation. Square the difference.
• Step 3. Sum the squared differences.
• Step 4. Divide the sum of the squared differences by n – 1.
• Step 5. Take the square root of the value obtained in Step 4. The result is the standard deviation for the sample.
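The five steps can be sketched in Python as below. As an added illustration, the sketch reuses the data from the range example to show how a single outlier inflates the standard deviation; the function name is an assumption for the example.

```python
import math
import statistics


def sample_standard_deviation(data):
    n = len(data)
    mean = sum(data) / n                                  # Step 1
    squared_differences = [(x - mean) ** 2 for x in data] # Step 2
    sum_of_squares = sum(squared_differences)             # Step 3
    variance = sum_of_squares / (n - 1)                   # Step 4
    return math.sqrt(variance)                            # Step 5


values = [27, 28, 25, 6, 27, 30, 26]
without_outlier = [v for v in values if v != 6]

sd_with = sample_standard_deviation(values)
sd_without = sample_standard_deviation(without_outlier)

print(round(sd_with, 2))     # 8.15
print(round(sd_without, 2))  # 1.72
```

The standard-library function `statistics.stdev` applies the same n – 1 divisor and can be used as a cross-check.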
Standard error of the mean
• The standard deviation is sometimes confused with another measure with a similar name: the standard error of the mean. However, the two are not the same.
• The standard deviation describes variability in a set of data.
• The standard error of the mean refers to variability we might expect in the means of repeated samples taken from the same population.
Method For Calculating The Standard Error Of The Mean

• Step 1. Calculate the standard deviation.
• Step 2. Divide the standard deviation by the square root of the number of observations (n).

SE = SD / √n
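A minimal Python sketch of SE = SD / √n, applied to the freshman weight data from the earlier example. The function name `standard_error` is an assumption chosen for illustration.

```python
import math
import statistics


def standard_error(standard_deviation, n):
    """Divide the standard deviation by the square root of n."""
    return standard_deviation / math.sqrt(n)


weight_change_kg = [11, 3, 0, -2, 3, -2, -2, 5, -2, 7, 2, 4, 1, 8, 1, 0, -5, 2]

sd = statistics.stdev(weight_change_kg)  # sample SD (divides by n - 1)
se = standard_error(sd, len(weight_change_kg))

print(round(sd, 2))  # 4.06
print(round(se, 2))  # 0.96
```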
Confidence interval (Confidence limit)
• Often, biostatisticians conduct studies not only to measure characteristics in the subjects studied, but also to make generalizations about the larger population from which these subjects came. This process is called inference. For example, political pollsters use samples of perhaps 1,000 or so people from across the country to make inferences about which presidential candidate is likely to win on Election Day. Usually, the inference includes some consideration about the precision of the measurement. (The results of a political poll may be reported to have a margin of error of, say, plus or minus three points.) In biostatistics, a common way to indicate a measurement's precision is by providing a confidence interval. A narrow confidence interval indicates high precision; a wide confidence interval indicates low precision.
• In biostatistics, investigators generally want a high level of confidence and usually set the confidence level at 95%, or at another level suited to the type of study. Biostatisticians commonly interpret a 95% confidence interval as the range of values consistent with the data from their study.
Method For Calculating A 95% Confidence Interval For A Mean
• Step 1. Calculate the mean and its standard error.
• Step 2. Multiply the standard error by 1.96.
• Step 3.
Lower limit of the 95% confidence interval = mean minus 1.96 × standard error.
Upper limit of the 95% confidence interval = mean plus 1.96 × standard error.
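The three steps can be sketched in Python as below, again using the freshman weight data as an illustration. The function name is an assumption, and the sketch takes the 1.96 multiplier directly from the method above. With a sample this small, a slightly larger multiplier from the t distribution is often used instead.

```python
import math
import statistics


def confidence_interval_95(data):
    """Return (lower, upper) limits as mean -/+ 1.96 x standard error."""
    n = len(data)
    mean = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)  # Step 1
    margin = 1.96 * standard_error                          # Step 2
    return mean - margin, mean + margin                     # Step 3


weight_change_kg = [11, 3, 0, -2, 3, -2, -2, 5, -2, 7, 2, 4, 1, 8, 1, 0, -5, 2]

lower, upper = confidence_interval_95(weight_change_kg)
print(f"{lower:.2f} to {upper:.2f} kg")  # 0.01 to 3.76 kg
```

Under these assumptions the whole interval lies below 6.8 kg, which agrees with the earlier conclusion drawn from the measures of center.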
Choosing the Right Measure of Central Location and Spread
Exercise : Standard deviation
• Calculate the standard deviation for the same set of vaccination data.

• 2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1
Exercise : Interquartile range
• Determine the first and third quartiles and interquartile range for the same vaccination
data as in the previous exercises.
2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1
Exercise: Standard error
• When the serum cholesterol levels of 4,462 men were measured, the mean cholesterol
level was 213, with a standard deviation of 42. Calculate the standard error of the mean
for the serum cholesterol level of the men studied.
Exercise: Confidence interval
• When the serum cholesterol levels of 4,462 men were measured, the mean cholesterol
level was 213, with a standard deviation of 42. Calculate the standard error of the mean
for the serum cholesterol level of the men studied, then find the 95% confidence interval.
