Lecture_2 Measures of Central Tendency and Variation

The document provides an overview of statistics and probability, focusing on measures of central tendency (mean, median, mode) and measures of dispersion (range, quartile deviation, standard deviation). It explains how to calculate these measures for both ungrouped and grouped data, as well as introduces concepts of skewness and kurtosis. Additionally, it includes examples and formulas for practical application of these statistical concepts.

Uploaded by

Hussein Kingazi

IS 141: Statistics and Probability

Measures of Central Tendency and Variation

Dr Emmanuel

March 27, 2024


Table of contents

Measures of Central Tendency

Measures of Dispersion

Skewness and Kurtosis


Measures of Central Tendency

The measures of central tendency are the mean, median and mode.

These are sometimes called measures of average.

These measures can be determined for ungrouped and grouped data.
Ungrouped Data
(a) Mean: There are various mean measures in statistics. These
include arithmetic mean, geometric mean and harmonic mean.

The arithmetic mean: This is the most commonly used measure of average. It is denoted by X̄.

The arithmetic mean is regarded as a suitable measure of central tendency when the data are symmetric, or when the data have no extreme values.
Given a set of n observations X_1, X_2, ..., X_n, then
\[
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
\]

(b) Median: This is another measure of central tendency. It is defined as the middle value of the data set when the data are arranged in order.

The median is suitable for data with extreme values, and it can be used to give a general overview of a large mass of data for which computing the mean might be tedious.

The median takes the ((n + 1)/2)-th position for an odd number of observations. For an even number of observations, take the average of the (n/2)-th and (n/2 + 1)-th observations.
(c) Mode: This is the value(s) with the highest frequency from
the data set.

It can be used to determine the most favourable output of a certain experiment and help decide on what measures may be taken from that output.
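The three ungrouped measures can be sketched in a few lines of Python. This is only an illustration: the data list is made up for the purpose (it is not the lecture's example), and the median rule assumes 1-based positions as in the definition above.

```python
from statistics import mean, median, multimode

# Illustrative data only (not taken from the lecture).
data = [2, 7, 4, 9, 4, 6, 3]

ordered = sorted(data)
n = len(ordered)

# Arithmetic mean: sum of the observations divided by n.
x_bar = sum(ordered) / n

# Median: middle value for odd n, average of the two middle values for even n.
if n % 2 == 1:
    med = ordered[(n + 1) // 2 - 1]      # (n+1)/2-th value, converted to 0-based
else:
    med = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Mode: every value that attains the highest frequency.
modes = multimode(ordered)

# Cross-check the hand-written rules against the standard library.
assert x_bar == mean(data) and med == median(data)

print(x_bar, med, modes)
```

`multimode` returns a list because a data set may have more than one mode, which matches the "value(s)" wording in the definition.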

Example: Compute the mean, median and mode for the following data: 3, 4, 5, 8, , 4, 9, 11.
Grouped Data
(a) Arithmetic Mean:
Given a grouped frequency distribution, we need to create a column for class midpoints or class marks (X_i).

Then the arithmetic mean is obtained from the general formula
\[
\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}
\]
where f_i are the class frequencies and k is the number of classes or groups.

(b) Median: To compute the median, first identify the class that contains the median; this is called the median class.

Then the median can be computed using the formula
\[
\text{Median} = L + \left( \frac{\frac{N}{2} - C_b}{f_m} \right) h
\]
where
L is the lower boundary of the median class,
N is the total number of observations,
C_b is the cumulative frequency before the median class,
h is the class size and
f_m is the frequency of the median class.
(c) Mode: The mode for grouped data is computed using the formula
\[
\text{Mode} = L + \left( \frac{f_m - f_b}{2 f_m - f_b - f_a} \right) h
\]
where
L is the lower class boundary of the modal class,
f_m is the frequency of the modal class,
f_b is the frequency of the class before the modal class,
f_a is the frequency of the class after the modal class and
h is the class size.
Example:
The heights (in inches) of 100 students at ARU were recorded as follows:

Height      60 - 62   63 - 65   66 - 68   69 - 71   72 - 74
Frequency      5        18        42        27         8

Compute (a) the arithmetic mean, (b) the median height and (c) the mode.
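Parts (a) and (b) follow directly from the grouped-data formulas for the mean and the median. The working below is a sketch under two assumptions: the class midpoints are 61, 64, 67, 70 and 73, and the class boundaries lie 0.5 outside the stated limits (59.5, 62.5, ..., 74.5), so that h = 3.

```latex
\begin{align*}
\text{(a)}\quad \bar{X} &= \frac{\sum f_i X_i}{\sum f_i}
  = \frac{5(61) + 18(64) + 42(67) + 27(70) + 8(73)}{100}
  = \frac{6745}{100} = 67.45 \\[4pt]
\text{(b)}\quad \frac{N}{2} &= 50, \qquad
  \text{cumulative frequencies: } 5,\ 23,\ 65,\ 92,\ 100
  \ \Rightarrow\ \text{median class } 66\text{--}68 \\[4pt]
\text{Median} &= L + \left( \frac{\frac{N}{2} - C_b}{f_m} \right) h
  = 65.5 + \left( \frac{50 - 23}{42} \right)(3) \approx 67.43
\end{align*}
```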


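(c) The class with the highest frequency is 66 - 68, and thus it is the modal class. From this class we find that

\[
L = 65.5, \quad h = 3, \quad f_m = 42, \quad f_a = 27, \quad f_b = 18.
\]

Therefore
\[
\text{Mode} = L + \left( \frac{f_m - f_b}{2 f_m - f_b - f_a} \right) h
= 65.5 + \left( \frac{42 - 18}{2(42) - 18 - 27} \right)(3) \approx 67.35
\]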
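The whole height example can be checked numerically. The sketch below applies the three grouped-data formulas to the table; it assumes that class boundaries lie 0.5 outside the integer class limits, and (as a convention chosen here, not stated in the lecture) that a missing neighbour of the modal class counts as frequency 0.

```python
# Height example: class limits and frequencies as given in the table.
classes = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]
freqs = [5, 18, 42, 27, 8]

# Assumption: integer class limits, so boundaries lie 0.5 outside the limits.
lower_bounds = [lo - 0.5 for lo, _ in classes]
h = classes[1][0] - classes[0][0]            # class size = 3
midpoints = [(lo + hi) / 2 for lo, hi in classes]
N = sum(freqs)

# (a) Arithmetic mean from the class midpoints.
grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / N

# (b) Median: find the first class whose cumulative frequency reaches N/2.
cum = 0                                      # cumulative frequency before the class
for i, f in enumerate(freqs):
    if cum + f >= N / 2:
        grouped_median = lower_bounds[i] + (N / 2 - cum) / f * h
        break
    cum += f

# (c) Mode: interpolate inside the class with the highest frequency.
m = freqs.index(max(freqs))
f_m = freqs[m]
f_b = freqs[m - 1] if m > 0 else 0           # assumed 0 if there is no class before
f_a = freqs[m + 1] if m < len(freqs) - 1 else 0
grouped_mode = lower_bounds[m] + (f_m - f_b) / (2 * f_m - f_b - f_a) * h

print(round(grouped_mean, 2), round(grouped_median, 2), round(grouped_mode, 2))
# prints 67.45 67.43 67.35
```

The mode agrees with the value 67.35 obtained in part (c).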
Measures of Dispersion
Measures of dispersion show how the data deviate from a given measure of average (arithmetic mean or median).

These measures include the range, quartile deviation, mean absolute deviation and the sample standard deviation.

The most commonly used measure of variation is the sample standard deviation. However, this measure is not suitable for data with extreme values.
If the data contain extreme values, the appropriate measure would be the quartile deviation.

The mean absolute deviation is rarely used to compare the variation between two data sets.

The range is a very crude measure of dispersion. It only gives a general overview of the dispersion of the data.
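The four measures named above can be sketched for ungrouped data as follows. The data are made up for illustration, and two conventions are assumptions rather than the lecture's stated definitions: the quartile deviation is taken as (Q3 - Q1)/2 with quartiles from the default method of `statistics.quantiles`, and the mean absolute deviation is taken about the arithmetic mean.

```python
from statistics import mean, quantiles, stdev

# Illustrative data only (not taken from the lecture).
data = [4, 7, 8, 10, 12, 15, 21]

x_bar = mean(data)

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartile deviation (semi-interquartile range): (Q3 - Q1) / 2.
# Assumption: quartiles use the default "exclusive" method of
# statistics.quantiles; other conventions can give slightly different values.
q1, _, q3 = quantiles(data, n=4)
quartile_deviation = (q3 - q1) / 2

# Mean absolute deviation, taken here about the arithmetic mean.
mad = sum(abs(x - x_bar) for x in data) / len(data)

# Sample standard deviation (divisor n - 1).
s = stdev(data)

print(data_range, quartile_deviation, round(mad, 3), round(s, 3))
```

Replacing 21 with a much larger value changes the range and the standard deviation considerably while leaving the quartile deviation unchanged, which is the reason the quartile deviation is preferred when extreme values are present.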

Ungrouped Data
Example
Grouped Data
Example
Z-scores
The mean and standard deviation of a data set can be used to calculate a z-score, which measures the distance between a particular observation x and the mean x̄, in units of the standard deviation (s).

\[
\text{Z-score} = \frac{x - \bar{x}}{s}
\]
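A minimal sketch of the z-score calculation, using made-up data and the sample standard deviation (divisor n - 1) for s:

```python
from statistics import mean, stdev

# Illustrative data only (not taken from the lecture).
data = [4, 7, 8, 10, 12, 15, 21]

x_bar = mean(data)
s = stdev(data)          # sample standard deviation (divisor n - 1)

# z-score of each observation: distance from the mean in units of s.
z_scores = [(x - x_bar) / s for x in data]

for x, z in zip(data, z_scores):
    print(x, round(z, 2))
```

Observations below the mean get negative z-scores and those above get positive ones; the z-scores themselves always have mean 0 and sample standard deviation 1.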
Skewness and Kurtosis

Characterization of the data includes skewness and kurtosis.

Skewness is a measure of symmetry, or more precisely, the lack of symmetry.

A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.

For univariate data X_1, X_2, ..., X_n, the sample skewness is defined as
\[
\text{Skewness} = \frac{\sum_{i=1}^{n} \left( X_i - \bar{X} \right)^3}{(n - 1)\, s^3}
\]
The above formula for skewness is referred to as the Fisher-Pearson coefficient of skewness.
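The skewness formula can be sketched as a small function. The function name and the two data sets are illustrative; the divisor (n - 1) s^3 follows the formula as written above, so the values may differ slightly from software that uses other divisors.

```python
from statistics import mean, stdev

def sample_skewness(data):
    """Skewness with the (n - 1) * s**3 divisor used in the formula above."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                      # sample standard deviation
    return sum((x - x_bar) ** 3 for x in data) / ((n - 1) * s ** 3)

# Illustrative data only (not taken from the lecture).
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 3, 4, 15]

print(sample_skewness(symmetric))        # zero for a symmetric data set
print(sample_skewness(right_tailed))     # positive: long tail to the right
```

A symmetric data set gives a skewness of zero because the cubed deviations on either side of the mean cancel; a long right tail gives a positive value and a long left tail a negative one.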
Kurtosis
Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.

That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or a lack of outliers.

For univariate data X_1, X_2, ..., X_n, the sample kurtosis is defined as
\[
\text{Kurtosis} = \frac{\sum_{i=1}^{n} \left( X_i - \bar{X} \right)^4}{(n - 1)\, s^4}
\]
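A matching sketch for kurtosis, again with an illustrative function name and made-up data, and using the (n - 1) s^4 divisor exactly as in the formula above:

```python
from statistics import mean, stdev

def sample_kurtosis(data):
    """Kurtosis with the (n - 1) * s**4 divisor used in the formula above."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                      # sample standard deviation
    return sum((x - x_bar) ** 4 for x in data) / ((n - 1) * s ** 4)

# Illustrative data only (not taken from the lecture).
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]             # evenly spread, no outliers
with_outlier = [4, 5, 5, 5, 5, 5, 5, 6, 20]    # one extreme value

print(round(sample_kurtosis(flat), 3))
print(round(sample_kurtosis(with_outlier), 3))
```

The data set with the outlier gives the larger value, in line with the heavy-tail interpretation. For comparison, a normal distribution has a kurtosis of 3.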
Example
Calculate the sample skewness and sample kurtosis from the following grouped data:

Class        2 - 4   4 - 6   6 - 8   8 - 10
Frequency      3       4       2       1
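One possible way to set up this calculation is sketched below. The lecture gives the skewness and kurtosis formulas for ungrouped data only, so the grouped version here rests on assumptions: each class is represented by its midpoint, every deviation is weighted by the class frequency, n is the total frequency, and s uses the divisor n - 1.

```python
# Grouped data from the example.
classes = [(2, 4), (4, 6), (6, 8), (8, 10)]
freqs = [3, 4, 2, 1]

# Assumption: each class is represented by its midpoint.
midpoints = [(lo + hi) / 2 for lo, hi in classes]
n = sum(freqs)
x_bar = sum(f * x for f, x in zip(freqs, midpoints)) / n

def weighted_power_sum(p):
    """Sum of f_i * (X_i - mean)**p over all classes."""
    return sum(f * (x - x_bar) ** p for f, x in zip(freqs, midpoints))

# Sample standard deviation from the grouped data (divisor n - 1).
s = (weighted_power_sum(2) / (n - 1)) ** 0.5

skewness = weighted_power_sum(3) / ((n - 1) * s ** 3)
kurtosis = weighted_power_sum(4) / ((n - 1) * s ** 4)

print(x_bar, s, skewness, kurtosis)
```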
Tutorial Questions No. 1
Example (try it in your own time)

Salaries paid to supervisors had a mean of $25,000 with a standard deviation of $2000. If all salaries are increased by $2500, show that the new mean and standard deviation are $(25,000 + 2500) and $2000 respectively.
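A numerical check is not a proof, but it can make the claim concrete. The sketch below uses five hypothetical salaries chosen so that the mean is 25,000 and the sample standard deviation is 2000. Adding a constant c to every value moves the mean by c, while each deviation (x_i + c) - (x̄ + c) = x_i - x̄ stays the same, so the standard deviation does not change.

```python
from statistics import mean, stdev

# Hypothetical salaries, chosen only so that mean = 25000 and s = 2000.
salaries = [23000, 23000, 25000, 27000, 27000]
raised = [x + 2500 for x in salaries]        # every salary increased by 2500

print(mean(salaries), stdev(salaries))       # original mean and standard deviation
print(mean(raised), stdev(raised))           # mean shifts by 2500, s is unchanged
```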
