What Are The Measures of Central Tendency?: L04: Basic Statistical Descriptions of Data
There are three main measures of central tendency: the mode, the median and the mean. Each of these
measures describes a different indication of the typical or central value in the distribution.
Mean: The average of all data points
Median: The data point where half of the data lies above and half below it
Mode: The most common value in the data
In short: the mean is the arithmetic average, the mode is the most frequent value, and the median
is the middle value when the values are listed from smallest to largest.
# Mean
The most common and effective numeric measure of the “center” of a set of data is the
(arithmetic) mean. Let $x_1, x_2, \ldots, x_N$ be a set of $N$ values or observations, such as for
some numeric attribute $X$, like salary. The mean of this set of values is calculated using the
following equation:
$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N} \tag{2.1}$$
Example: The mean of 4, 1, and 7 is (4 + 1 + 7)/3 = 12/3 = 4.
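Example Mean. Suppose we have the following values for salary (in thousands of
dollars), shown in increasing order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110.
Using Eq. (2.1), we have
$$\bar{x} = \frac{30 + 36 + 47 + 50 + 52 + 52 + 56 + 60 + 63 + 70 + 70 + 110}{12} = \frac{696}{12} = 58$$
Thus, the mean salary is 58 thousand dollars.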
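The salary example can also be checked with a few lines of Python. This is only a minimal sketch: the function name `arithmetic_mean` and the variable names are illustrative choices, not part of the original notes.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Salary values from the example, in thousands of dollars.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]
mean_salary = arithmetic_mean(salaries)
print(mean_salary)  # 58.0
```

The same function reproduces the smaller example: `arithmetic_mean([4, 1, 7])` gives 4.0.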
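The mean is the sum of the values of all observations in a dataset divided by the number of
observations. This is also known as the arithmetic average.
Looking at the retirement ages of 11 people (in whole years):
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The mean is calculated by adding together all the values (which gives 623) and dividing by the
number of observations (11), which gives approximately 56.6 years.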
# Median
The median is the middle value in a distribution when the values are arranged in ascending or
descending order.
The median divides the distribution in half (there are 50% of observations on either side of
the median value). In a distribution with an odd number of observations, the median value
is the middle value.
Example 1
Find the median of this data:
1, 4, 2, 5, 0
Put the data in order first:
0, 1, 2, 4, 5
There is an odd number of data points, so the median is the middle data point.
The median is 2.
Example 2
Find the median of this data:
10, 40, 20, 50
Put the data in order first:
10, 20, 40, 50
There is an even number of data points, so the median is the average of the middle two data
points.
Median = (20 + 40)/2 = 60/2 = 30
The median is 30.
Looking at the retirement age distribution (which has 11 observations), the median is the middle
value, which is 57 years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
When the distribution has an even number of observations, the median value is the mean of the
two middle values. In the following distribution, the two middle values are 56 and 57, therefore
the median equals 56.5 years:
52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The median cannot be identified for categorical nominal data, as it cannot be logically ordered.
*If two elements are in the middle of a set of data, find the median by adding the two
numbers together and dividing by two.*
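The odd and even cases described above can be sketched in Python as follows. The function name `median` is an illustrative choice; the standard library's `statistics.median` follows the same rule.

```python
def median(values):
    """Return the middle value of the sorted data.

    With an odd number of values this is the single middle value;
    with an even number it is the mean of the two middle values.
    """
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)  # put the data in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 4, 2, 5, 0]))    # 2
print(median([10, 40, 20, 50]))   # 30.0
```

Applied to the two retirement age distributions, the function returns 57 for the 11 observations and 56.5 for the 12 observations.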
# Mode
Example (midrange): The midrange is the average of the highest and lowest values in a set.
Set of data: 1, 1, 1, 1, 4, 4, 4, 4, 6, 8, 10, 12, 15, 21, 21.
The highest value in the set is 21 and the lowest value is 1. Adding them gives 21 + 1 = 22, and
dividing by 2 gives 22/2 = 11, so the midrange is 11.
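As a quick check of the midrange calculation, here is a minimal Python sketch; the function name `midrange` is an illustrative choice.

```python
def midrange(values):
    """Return the average of the lowest and highest values."""
    if not values:
        raise ValueError("the midrange of an empty data set is undefined")
    return (min(values) + max(values)) / 2


data = [1, 1, 1, 1, 4, 4, 4, 4, 6, 8, 10, 12, 15, 21, 21]
print(midrange(data))  # 11.0
```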
The mode is the most commonly occurring value in a distribution.
Consider this dataset showing the retirement age of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
This table shows a simple frequency distribution of the retirement age data.

| Age (years) | Frequency |
|-------------|-----------|
| 54          | 3         |
| 55          | 1         |
| 56          | 1         |
| 57          | 2         |
| 58          | 2         |
| 60          | 2         |
The most commonly occurring value is 54, therefore the mode of this distribution is 54 years.
Advantage of the mode:
The mode has an advantage over the median and the mean as it can be found for both numerical
and categorical (non-numerical) data.
There are some limitations to using the mode. In some distributions, the mode may not reflect the
centre of the distribution very well. When the distribution of retirement age is ordered from
lowest to highest value, it is easy to see that the centre of the distribution is 57 years, but the
mode is lower, at 54 years.
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
It is also possible for there to be more than one mode for the same distribution of data
(bimodal or multimodal). The presence of more than one mode can limit the ability of the mode to
describe the centre or typical value of the distribution, because a single value to describe the
centre cannot be identified.
In some cases, particularly where the data are continuous, the distribution may have no mode at
all (i.e. if all values are different).
In cases such as these, it may be better to consider using the median or mean, or to group the
data into appropriate intervals and find the modal class.
The mode is another measure of central tendency. The mode for a set of data is the
value that occurs most frequently in the set. Therefore, it can be determined for qualitative
and quantitative attributes. It is possible for the greatest frequency to correspond to
several different values, which results in more than one mode. Data sets with one, two,
or three modes are respectively called unimodal, bimodal, and trimodal. In general, a
data set with two or more modes is multimodal. At the other extreme, if each data
value occurs only once, then there is no mode.
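The cases described above (one mode, several modes, no mode) can be illustrated with a short Python sketch built on `collections.Counter`. The function name `modes` and the convention of returning an empty list when every value occurs only once are assumptions made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return a list of the most frequently occurring values.

    The list has one entry for unimodal data, several for multimodal
    data, and is empty when every value occurs only once (no mode).
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # each value occurs only once, so there is no mode
    return [value for value, count in counts.items() if count == highest]


retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
print(modes(retirement_ages))                 # [54]
print(modes(["red", "blue", "red", "green"]))  # ['red']
```

Because the function only counts occurrences, it works for categorical (non-numerical) data as well as numerical data.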
# Variance
Example: the heights (at the shoulders) of five dogs are 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.
Find the mean, the variance, and the standard deviation.
Your first step is to find the mean:
Answer:
Mean = (600 + 470 + 170 + 430 + 300) / 5
= 1970 / 5
= 394
So the mean (average) height is 394 mm.
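Now we calculate each dog's difference from the mean:
206, 76, −224, 36, −94
To calculate the variance, take each difference, square it, and then average the result:
Variance
= (206² + 76² + (−224)² + 36² + (−94)²) / 5
= (42436 + 5776 + 50176 + 1296 + 8836) / 5
= 108520 / 5
= 21704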
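The variance steps for the dog heights can be followed in Python. This is a minimal sketch of the population variance (dividing by the number of values, as in the worked example); the variable names are illustrative.

```python
# Dog heights at the shoulders, in millimetres.
heights_mm = [600, 470, 170, 430, 300]

# Step 1: the mean.
mean_height = sum(heights_mm) / len(heights_mm)

# Step 2: each height's difference from the mean.
differences = [h - mean_height for h in heights_mm]

# Step 3: square each difference and average the results.
squared_differences = [d ** 2 for d in differences]
variance = sum(squared_differences) / len(heights_mm)

print(mean_height)  # 394.0
print(differences)  # [206.0, 76.0, -224.0, 36.0, -94.0]
print(variance)     # 21704.0
```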
# Standard Deviation:
The standard deviation is a measure of how spread out numbers are: it quantifies the amount of
variation or dispersion of a set of values.
Its symbol is σ (the Greek letter sigma).
* A low standard deviation indicates that the values tend to be close to the mean (also called
the expected value) of the set, while a high standard deviation indicates that the values are
spread out over a wider range.
The formula is easy: the standard deviation is the square root of the variance. For the dog
heights, the variance found above is 21704, so:
Standard Deviation
σ = √21704
= 147.32...
= 147 (to the nearest mm)
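The same result can be reproduced in Python. The sketch below takes the square root of the population variance and, as a cross-check, compares it with `statistics.pstdev` from the standard library; the variable names are illustrative.

```python
import math
import statistics

# Dog heights at the shoulders, in millimetres.
heights_mm = [600, 470, 170, 430, 300]

# Population variance: the average squared difference from the mean.
mean_height = sum(heights_mm) / len(heights_mm)
variance = sum((h - mean_height) ** 2 for h in heights_mm) / len(heights_mm)

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(round(std_dev))  # 147
# Cross-check against the standard library's population standard deviation.
print(math.isclose(std_dev, statistics.pstdev(heights_mm)))  # True
```

Note that `statistics.pstdev` divides by the number of values (population standard deviation), which matches the worked example; `statistics.stdev` divides by one less than that and would give a different answer.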
The standard deviation tells you how spread out the examples in a set are from the mean.
Why is this useful? Here's an example: If you are comparing test scores for different schools, the
standard deviation will tell you how diverse the test scores are for each school.
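The population standard deviation is given by
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$$
where,
σ = population standard deviation
Σ = sum of...
μ = population mean (μ is the Greek letter mu)
n = number of scores in sample.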
Find the standard deviation of 4, 9, 11, 12, 17, 5, 8, 12, 14
First work out the mean: 10.222
Now, subtract the mean from each of the numbers given and square the result. This
is equivalent to the (x − μ)² step, where x refers to the values given in the question.
x 4 9 11 12 17 5 8 12 14