
Chapter 2 Measure of Central Tendency Dhiraj [Becon 2025]

Chapter 2 discusses measures of central tendency, which include the mean, median, and mode, as well as their definitions, merits, and demerits. The mean is the average of a set of numbers, while the median is the middle value in an ordered dataset, and the mode is the most frequently occurring value. The chapter also covers the weighted mean and provides methods for calculating these measures in both ungrouped and grouped data.

CHAPTER 2: MEASURE OF CENTRAL TENDENCY

DHIRAJ GIRI
KATHMANDU UNIVERSITY
2025
Summary Definitions
• The central tendency is the extent to which the values of a
numerical variable group around a typical or central value.

• The variation is the amount of dispersion or scattering
away from a central value that the values of a numerical
variable show.

• The shape is the pattern of the distribution of values from
the lowest value to the highest value.
Measures of Central Tendency
For a given set of numbers, it may be desirable to have a
single number to serve as a representative value
around which all the numbers in the set tend to cluster, a kind
of “middle” number or measure of central tendency. Five
such measures are discussed in this chapter.
1. Mean
2. Median
3. Mode
4. Geometric mean (GM)
5. Harmonic mean (HM)
Measures of Central Tendency: The Mean
• The arithmetic mean of a set of n observations x1, x2, ..., xn is
defined as the sum of all the observations divided by the
total number of observations.
• The arithmetic mean (often just called the “mean”) is the
most common measure of central tendency.
• For a sample of size n: if x1, x2, ..., xn is a set of n observations
of a variable, then the arithmetic mean, denoted by $\bar{X}$
(pronounced "x-bar"), is

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

where $X_i$ is the ith observed value and n is the sample size.

• In a frequency distribution, if x1 occurs f1 times, x2 occurs f2
times, ..., xn occurs fn times, then the arithmetic mean is given by

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum f x}{\sum f} = \frac{\sum f x}{N}$$

where $N = \sum f$ is the total frequency.
• In a grouped frequency distribution, the data are grouped into
classes. To find the arithmetic mean of grouped data, we
first calculate the midpoint of each class and treat it as the
value of the variable.
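The three cases above (raw observations, a frequency distribution, and grouped data) can be sketched in Python as follows. The function names and the sample numbers are illustrative assumptions, not part of the original slides.

```python
def mean_ungrouped(xs):
    """Arithmetic mean of raw observations: sum of values / number of values."""
    return sum(xs) / len(xs)


def mean_frequency(values, freqs):
    """Mean of a frequency distribution: sum(f * x) / N, where N = sum(f)."""
    total = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / total


def mean_grouped(classes, freqs):
    """Mean of grouped data: each class (lower, upper) is replaced by its midpoint."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    return mean_frequency(midpoints, freqs)


# Illustrative data (assumed for this sketch)
print(mean_ungrouped([11, 12, 13, 14, 15]))
print(mean_frequency([1, 2, 3], [2, 3, 5]))
print(mean_grouped([(0, 10), (10, 20), (20, 30)], [2, 5, 3]))
```

For grouped data the result is only an approximation of the true mean, because every observation in a class is treated as if it sat at the class midpoint.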
Merits
• It is easy to understand and easy to calculate.
• It is based upon all observations.
• It is rigidly defined.
• It is capable of further algebraic treatment.
Demerits
• It cannot be determined by inspection, nor can it be located
graphically.
• It cannot be used for qualitative characteristics.
• It cannot be obtained if even a single observation is missing.
• It is strongly affected by extreme values (outliers).
• It cannot be calculated if the distribution has an open-end class.

Effect of an extreme value: replacing 15 by 20 raises the mean from 13 to 14.

$$\frac{11 + 12 + 13 + 14 + 15}{5} = \frac{65}{5} = 13 \qquad \frac{11 + 12 + 13 + 14 + 20}{5} = \frac{70}{5} = 14$$
AM is affected by a change of origin and/or scale
The computation is simplified considerably by taking the deviations of
the given values from an arbitrary point 'A', as explained below.

Let di = xi - A; then fi di = fi (xi - A) = fi xi - A fi.

Summing both sides over i from 1 to n, we get

$$\sum_{i=1}^{n} f_i d_i = \sum_{i=1}^{n} f_i x_i - A \sum_{i=1}^{n} f_i = \sum_{i=1}^{n} f_i x_i - A N \quad\Rightarrow\quad \bar{x} = A + \frac{1}{N} \sum_{i=1}^{n} f_i d_i$$

where $\bar{x}$ is the arithmetic mean of the distribution and $N = \sum f_i$.

• Any number can serve as the arbitrary point 'A' but, usually,
the value of x corresponding to the middle part of the distribution
is the most convenient.

• In the case of a grouped or continuous frequency distribution, the
computation is simplified still further by taking

$$d_i = \frac{x_i - A}{h}$$

where A is an arbitrary point and h is the common width of the class
intervals.

• In this case, we have h di = xi - A, and proceeding exactly as
before, we get

$$\bar{x} = A + \frac{h}{N} \sum_{i=1}^{n} f_i d_i$$
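A minimal sketch of the step-deviation shortcut, assuming class midpoints are already available. The function name and the data are hypothetical. The check at the end confirms that the shortcut agrees with the direct formula sum(f * x) / N.

```python
def mean_step_deviation(midpoints, freqs, A, h):
    """Mean via step deviations d_i = (x_i - A) / h, i.e. A + h * sum(f*d) / N."""
    N = sum(freqs)
    deviations = [(x - A) / h for x in midpoints]
    return A + h * sum(f * d for f, d in zip(freqs, deviations)) / N


# Illustrative grouped data (assumed): classes 0-10, 10-20, ..., 40-50
midpoints = [5, 15, 25, 35, 45]
freqs = [3, 7, 10, 6, 4]

shortcut = mean_step_deviation(midpoints, freqs, A=25, h=10)
direct = sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)
print(shortcut, direct)  # both routes give the same mean
```

Any choice of A gives the same answer; picking a midpoint near the centre simply keeps the deviations small.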
Calculate the mean for the following frequency distribution.
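Properties of the Arithmetic Mean
The algebraic sum of the deviations of a set of values from their arithmetic
mean is zero. If xi | fi, i = 1, 2, ..., n is the frequency distribution, then

$$\sum_{i=1}^{n} f_i (x_i - \bar{x}) = \sum_{i=1}^{n} f_i x_i - \bar{x} \sum_{i=1}^{n} f_i = N\bar{x} - N\bar{x} = 0$$

The sum of the squares of the deviations of a set of values is minimum
when taken about the mean.
Proof. For the frequency distribution xi | fi, i = 1, 2, ..., n, let

$$Z = \sum_{i=1}^{n} f_i (x_i - A)^2$$

be the sum of the squares of the deviations of the given values from an
arbitrary point 'A'. We have to prove that Z is minimum when $A = \bar{x}$.
Applying the principle of maxima and minima from differential calculus,
Z will be minimum for variations in A if

$$\frac{\partial Z}{\partial A} = -2 \sum_{i=1}^{n} f_i (x_i - A) = 0 \quad \text{and} \quad \frac{\partial^2 Z}{\partial A^2} = 2 \sum_{i=1}^{n} f_i = 2N > 0$$

The first condition gives $\sum f_i x_i - A N = 0$, that is, $A = \bar{x}$; the second
holds for every A. Hence Z is minimum at the point $A = \bar{x}$.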


Combined Arithmetic Mean
If the arithmetic means of two series and their respective
numbers of observations are known, then the combined
arithmetic mean of the two component series is given by

$$\bar{x}_{12} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
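As a quick check of the combined-mean formula, the sketch below compares it with the mean of the pooled observations. The function name and the two small series are assumptions made for illustration.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Combined mean of two series from their sizes and means."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# Illustrative series (assumed)
series_a = [2, 4, 6]   # mean 4
series_b = [10, 20]    # mean 15

mean_a = sum(series_a) / len(series_a)
mean_b = sum(series_b) / len(series_b)
pooled = series_a + series_b

print(combined_mean(len(series_a), mean_a, len(series_b), mean_b))
print(sum(pooled) / len(pooled))  # same value from the pooled data
```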
Characteristics of the Mean
• The mean is the balancing point or fulcrum for the data.

• Regardless of the shape of the distribution, the signed
deviations of the data points from the mean always sum to
zero:

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

• Consider the following asymmetric distribution of quiz
scores (42, 60, 70, 75, 78), whose mean is 65:

$$\sum_{i=1}^{n} (x_i - \bar{x}) = (42 - 65) + (60 - 65) + (70 - 65) + (75 - 65) + (78 - 65) = (-23) + (-5) + 5 + 10 + 13 = -28 + 28 = 0$$
Weighted Mean
• In calculating the arithmetic mean we suppose that all the items in the
distribution have equal importance. In practice this may not be
so. If some items in a distribution are more important than others,
this must be borne in mind so that the average
computed is representative of the distribution.

• In such cases, proper weightage is to be given to the various items, the
weights attached to each item being proportional to the importance
of the item in the distribution.

• For example, if we want to have an idea of the change in the cost of
living of a certain group of people, then the simple mean of the
prices of the commodities consumed by them will not do, since all
the commodities are not equally important, e.g., wheat, rice and
pulses are more important than cigarettes, tea, confectionery, etc.
Weighted Mean
• Let wi be the weight attached to the item xi, i = 1, 2, ..., n.

• Then, the weighted mean of a set of data is

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{\sum_{i=1}^{n} w_i}$$

• where wi is the weight of the ith observation.
Example
Sam wants to buy a new camera, and decides on the following rating
system: Image Quality 50%, Battery Life 30% and Zoom Range 20%

The Sonu camera gets 8 (out of 10) for Image Quality, 6 for Battery Life
and 7 for Zoom Range.

The Conan camera gets 9 for Image Quality, 4 for Battery Life and 6 for
Zoom Range.

Which camera is best?

Solution:
Sonu: 0.5 × 8 + 0.3 × 6 + 0.2 × 7 = 4 + 1.8 + 1.4 = 7.2
Conan: 0.5 × 9 + 0.3 × 4 + 0.2 × 6 = 4.5 + 1.2 + 1.2 = 6.9
Sam decides to buy the Sonu.
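The camera comparison can be reproduced with a small weighted-mean helper. This is a sketch; the function name `weighted_mean` is an assumption, while the weights and ratings are those given in the example.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Weights: Image Quality 50%, Battery Life 30%, Zoom Range 20%
weights = [0.5, 0.3, 0.2]
sonu = weighted_mean([8, 6, 7], weights)
conan = weighted_mean([9, 4, 6], weights)

print(round(sonu, 2), round(conan, 2))
print("Sonu" if sonu > conan else "Conan")
```

Dividing by the sum of the weights means the same function also works when the weights do not add up to 1, for example credit hours in a GPA calculation.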
Example
In each course a student shall be evaluated on a four point scale by
giving letter grades representing grade values as follows:

The combined total marks obtained by the students in both the in-
semester assessment and the end-semester examination shall be
converted into letter grades as follows:
Grade Point Average (GPA) is a quotient determined by dividing the number
of grade points earned by the number of credit hours attempted
Measures of Central Tendency: The Median
• It is the middle value of an ordered data set, which divides the
distribution into two equal parts (the number of observations
below the median equals the number of observations above it).
• In an ordered array, the median is the “middle” number
(50% above, 50% below)
• It is also called positional average.
• To compute the median, data must be arranged either in
ascending order or in descending order
• Less sensitive than the mean to extreme values
Determination of Median- in Ungrouped Data:
• The location of the median when the values are in
numerical order (smallest to largest):

$$\text{Median position} = \frac{n + 1}{2}\ \text{position in the ordered data}$$

• If the number of values is odd, the median is the middle
number.

• If the number of values is even, the median is the
average of the two middle numbers.

Note that $\frac{n + 1}{2}$ is not the value of the median, only the
position of the median in the ranked data.
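The position rule above translates directly into code. The sketch below is one possible implementation; the function name is hypothetical, and the first data set is the ordered array used later in the quartile example.

```python
def median_ungrouped(xs):
    """Median of raw data using the (n + 1) / 2 position rule."""
    ordered = sorted(xs)          # data must be arranged first
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: (n + 1) / 2 is a whole position (1-based)
        return ordered[(n + 1) // 2 - 1]
    # Even n: position falls halfway between the two middle values
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median_ungrouped([11, 12, 13, 16, 16, 17, 18, 21, 22]))
print(median_ungrouped([3, 1, 4, 2]))
```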
Determination of Median- in Grouped Data:
• The location of the median when the values are in numerical
order (smallest to largest) in a class:

$$\text{Median position} = \frac{N}{2}\text{th item}$$

• Locate the median class with the help of the cumulative frequency:
the class whose cumulative frequency is just greater than or equal to
the median position is the median class.
• The median class is the class that lies in the middle of the
distribution.
• The median value is determined by using an interpolation formula.

$$\text{Median} = L + \frac{\frac{N}{2} - cf}{f} \times h$$

L = lower limit of the median class, h = size of the median class,
f = frequency of the median class, cf = cumulative frequency of the
class preceding the median class, N = total frequency.
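A minimal sketch of the interpolation formula, assuming continuous classes given as (lower limit, upper limit) pairs. The function name and the frequency table are illustrative assumptions.

```python
def median_grouped(classes, freqs):
    """Median = L + ((N/2 - cf) / f) * h for the median class."""
    N = sum(freqs)
    if N == 0:
        raise ValueError("total frequency must be positive")
    position = N / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= position:
            # First class whose cumulative frequency reaches N/2
            return lower + (position - cf) / f * (upper - lower)
        cf += f
    raise ValueError("median class not found")


# Illustrative data (assumed)
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 10, 5]
print(median_grouped(classes, freqs))
```

Here N = 40, so the median position is 20; the cumulative frequencies 5, 13, 25 put the median in the class 20 to 30 with cf = 13 and f = 12.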
Merits
• It is rigidly defined.
• It is easy to understand and easy to calculate.
• It is not affected by extreme values.
• It can be calculated for distributions with open-end classes.
• It can be located graphically.
• It can be found even for qualitative data, provided the data can be ranked.
Demerits
• With an even number of observations the median is not an observed
value; it is estimated as the average of the two middle values.
• It is not based on all the observations.
• It is not amenable to further algebraic treatment.
• Data must be arranged in order before calculation.
Measures of Central Tendency: The Mode
• Value that occurs most often in data set
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes
• Complicated measure of Central Tendency
The mode of a distribution is the value around
which the items tend to be most heavily concentrated.
There are four different methods for estimating the mode of a
series:
1. Locating the most frequently repeated value in the array:
In an individual series, the value repeated the maximum number of
times is the mode.
2. Estimating the mode by interpolation:
In a continuous series, the class corresponding to the
maximum frequency is called the modal class and the
mode is given by the following formula.

$$\text{Mode} = L + \frac{f_m - f_1}{2 f_m - f_1 - f_2} \times h$$

L = lower limit of the modal class, fm = frequency of the modal class,
f1 = frequency of the class preceding the modal class, f2 = frequency of
the class succeeding the modal class, h = width of the modal class.
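The interpolation formula for the mode can be sketched as below. The function name and data are assumptions; in particular, treating a missing neighbouring class as having frequency 0 is a choice made for this sketch and is not stated in the slides.

```python
def mode_grouped(classes, freqs):
    """Mode = L + (fm - f1) / (2*fm - f1 - f2) * h for the modal class."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])  # index of modal class
    lower, upper = classes[m]
    fm = freqs[m]
    # Assumption: a missing neighbour (first or last class) counts as 0
    f1 = freqs[m - 1] if m > 0 else 0
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0
    denominator = 2 * fm - f1 - f2
    if denominator == 0:
        raise ValueError("formula undefined: neighbouring frequencies equal fm")
    return lower + (fm - f1) / denominator * (upper - lower)


# Illustrative data (assumed)
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 10, 5]
print(mode_grouped(classes, freqs))
```

If several classes share the maximum frequency, this sketch picks the first one; the grouping method described later in the chapter is the usual way to resolve such cases.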
3. Locating the mode graphically:
In a continuous series the mode can easily be located from a graph. The
steps are as follows.
Step 1. Draw a histogram of the given data.
Step 2. Draw two lines diagonally inside the modal class rectangle,
starting from each upper corner of the rectangle to the upper
corner of the adjacent rectangle.
Step 3. Draw a perpendicular line from the intersection of the two diagonal
lines to the X-axis.
The abscissa of the point at which the perpendicular line meets the X-axis
is the value of the mode.
4. Estimating the mode from the mean and the median
• If the distribution is symmetric, the mean, median and
mode have identical values.
• If the distribution is skewed, the mean, median and mode
pull apart.
• In a skewed distribution the median lies between the
mode and the arithmetic mean.
• The median lies approximately two-thirds of the distance from the
mode and one-third from the mean; that is, the distance between the
mean and the median is about one-third of the distance between the
mean and the mode.
• Empirical studies have shown that, in a moderately skewed
frequency distribution, the following relations hold approximately:
Mean – Mode = 3 (Mean – Median)
Median – Mode = 2 (Mean – Median)
Mode = Mean – 3 (Mean – Median)
= 3 Median – 2 Mean
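The empirical relation can be wrapped in a one-line helper. The function name and the sample mean and median are assumed for illustration, and the result is only an approximation that is reasonable for moderately skewed data.

```python
def empirical_mode(mean, median):
    """Approximate mode for a moderately skewed distribution: 3*Median - 2*Mean."""
    return 3 * median - 2 * mean


# Illustrative values (assumed)
mean, median = 50.0, 48.0
mode = empirical_mode(mean, median)

print(mode)
print(mean - mode, 3 * (mean - median))    # Mean - Mode = 3 (Mean - Median)
print(median - mode, 2 * (mean - median))  # Median - Mode = 2 (Mean - Median)
```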
Measures of Central Tendency: Review Example

House prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
Sum: $3,000,000

• Mean: $3,000,000 / 5 = $600,000
• Median: middle value of ranked data = $300,000
• Mode: most frequent value = $100,000
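The review example can be checked with Python's standard `statistics` module, which provides `mean`, `median` and `mode`. This is only a verification sketch; the variable names are assumptions.

```python
import statistics

# House prices from the review example
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)
median_price = statistics.median(house_prices)
mode_price = statistics.mode(house_prices)

print(mean_price, median_price, mode_price)
```

The single $2,000,000 house pulls the mean well above the median, which is why the median is often preferred for reporting home prices.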
For grouped data represented on a histogram, there
are no individual values to check for a modal value. Instead,
we take the modal class of size h and find
the mode from it.
Consider the graph given below.
Let the frequency of the modal class be f1 (written fm in the formula
given earlier), the frequency of the class preceding the modal class be f0,
the frequency of the class succeeding the modal class be f2, and the lower
limit of the modal class be l0. Here, BC = h.
The mode is then given by l0 + x, where x is found as follows.

From the figure, triangle AEB is similar to triangle DEC, so the
corresponding sides are proportional.
⇒ AB/CD = BE/DE = (f1 − f0) / (f1 − f2)

Since BD = BE + ED, it follows that
⇒ BE/BD = BE/(BE + ED) = (f1 − f0) / [(f1 − f0) + (f1 − f2)]
⇒ BE/BD = (f1 − f0) / (2f1 − f0 − f2)

Again, from the figure, ΔBEF ∼ ΔBDC.
⇒ FE/BC = BE/BD = (f1 − f0) / (2f1 − f0 − f2)
⇒ FE = [(f1 − f0) / (2f1 − f0 − f2)] × BC

We know that BC = h, so, writing x for FE,
⇒ x = [(f1 − f0) / (2f1 − f0 − f2)] × h

Therefore, the mode is obtained by adding this value of x to l0.
⇒ Mode = l0 + [(f1 − f0) / (2f1 − f0 − f2)] × h
Measures of Central Tendency: Which Measure to Choose?
• The mean is generally used, unless extreme values (outliers)
exist.
• The median is often used, since the median is not sensitive to
extreme values. For example, median home prices may be
reported for a region; it is less sensitive to outliers.
• In many situations it makes sense to report both the mean
and the median.
Geometric Mean
• It is suitable to compute average rate of change over a
period of time.
• The geometric mean of a set of n observations x1, x2, ..., xn
is defined as the nth root of their product.

$$GM = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$$

Taking logarithms,

$$\log GM = \frac{1}{n} \log (x_1 x_2 \cdots x_n) = \frac{1}{n} (\log x_1 + \log x_2 + \cdots + \log x_n) = \frac{1}{n} \sum \log x$$

$$\therefore\ GM = \text{antilog}\left( \frac{1}{n} \sum \log x \right)$$

In case of frequency distribution
• If x1, x2, x3, ..., xn are positive (non-zero and non-negative)
variate values with the corresponding
frequencies f1, f2, f3, ..., fn, then

$$GM = \text{antilog}\left( \frac{1}{N} \sum f \log x \right), \qquad N = \sum f$$
Merits
• It is rigidly defined.
• It is based upon all the observations.
• It is amenable to further algebraic treatment.
• It is useful for measuring population growth, constructing index
numbers and averaging rates of interest.
Demerits
• It is not easy to understand or to calculate for students without a
mathematics background.
• If any one of the observations is zero, the geometric mean becomes zero.
• It cannot be computed when the series contains negative values
(for example, a mix of positive and negative values).
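The logarithmic form of the geometric mean is convenient in code because it avoids overflowing the product. A minimal sketch, assuming natural logarithms (any base gives the same result); the function name and data are illustrative.

```python
import math


def geometric_mean(values, freqs=None):
    """GM = antilog((1/N) * sum(f * log x)); requires all values > 0."""
    if freqs is None:
        freqs = [1] * len(values)   # ungrouped data: every frequency is 1
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive values")
    N = sum(freqs)
    log_sum = sum(f * math.log(x) for x, f in zip(values, freqs))
    return math.exp(log_sum / N)


print(geometric_mean([2, 8]))          # square root of 16
print(geometric_mean([1, 10, 100]))    # cube root of 1000
print(geometric_mean([2, 8], [3, 1]))  # same as the GM of 2, 2, 2, 8
```

A typical use is averaging growth factors: yearly growth of 10% and 20% corresponds to factors 1.10 and 1.20, and their geometric mean is the constant yearly factor giving the same overall growth.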
Harmonic Mean
• It is the reciprocal of the arithmetic mean of the
reciprocals of the set of non-zero observations.
• If x1, x2, ..., xn is a set of n non-zero observations, then the
harmonic mean HM is

$$HM = \frac{n}{\sum \frac{1}{x}}$$

In case of frequency distribution
• If x1, x2, ..., xn is a set of n non-zero observations with
corresponding frequencies f1, f2, ..., fn, then the harmonic
mean is

$$HM = \frac{N}{\sum \frac{f}{x}}, \qquad N = \sum f$$
Merits
• It is based on all observations of the series.
• It is especially useful for solving problems relating to time,
work, rate, speed etc.
• It is not much affected by fluctuations of sampling.
Demerits
• It is difficult to understand and difficult to compute, as
compared with AM.
• If any of the observations is zero, HM cannot be computed, because
the reciprocal of zero is undefined.
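A short sketch of the harmonic mean for both ungrouped data and a frequency distribution. The function name and the speed example are assumptions chosen to illustrate the "time, work, rate, speed" use mentioned in the merits.

```python
def harmonic_mean(values, freqs=None):
    """HM = N / sum(f / x); undefined if any value is zero."""
    if freqs is None:
        freqs = [1] * len(values)   # ungrouped data: every frequency is 1
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    N = sum(freqs)
    return N / sum(f / x for x, f in zip(values, freqs))


# Illustrative case (assumed): equal distances at 40 km/h and 60 km/h
print(harmonic_mean([40, 60]))           # average speed over the whole trip
print(harmonic_mean([40, 60], [2, 1]))   # same as the HM of 40, 40, 60
```

For the two-speed trip the arithmetic mean would give 50 km/h, but the actual average speed is the harmonic mean, because more time is spent at the slower speed.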
Relationship between AM, GM and HM
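For positive observations, the arithmetic mean, geometric mean and
harmonic mean satisfy the following relationship, with equality only
when all the observations are equal.

$$AM \ge GM \ge HM$$

For two positive observations,

$$GM = \sqrt{AM \times HM}$$

that is, the square of the geometric mean is equal to
the product of the arithmetic mean and the
harmonic mean.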
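Both relations can be checked numerically for a pair of positive numbers. The values 4 and 9 are an arbitrary illustration, and the variable names are assumptions.

```python
import math

# Two positive observations (assumed for illustration)
a, b = 4.0, 9.0

am = (a + b) / 2               # arithmetic mean
gm = math.sqrt(a * b)          # geometric mean of two values
hm = 2 / (1 / a + 1 / b)       # harmonic mean of two values

print(am, gm, hm)
print(am >= gm >= hm)                  # ordering AM >= GM >= HM
print(math.isclose(gm ** 2, am * hm))  # GM^2 = AM * HM for two values
```

With more than two observations the ordering still holds, but the identity GM² = AM × HM generally does not.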
Measures of Central Tendency: Summary

Central Tendency
• Arithmetic Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
• Median: middle value in the ordered array
• Mode: most frequently observed value
• Geometric Mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$, the rate of change of a variable over time
Quartile Measures
• Quartiles split the ranked data into 4 segments with an
equal number of values per segment

• The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
• Q2 is the same as the median (50% of the observations are
smaller and 50% are larger)
• Only 25% of the observations are greater than the third
quartile.
Quartile Measures: Locating Quartiles
Find a quartile by determining the value in the appropriate
position in the ranked data, where

First quartile position: $Q_1 = \frac{n + 1}{4}$ ranked value
Second quartile position: $Q_2 = \frac{n + 1}{2}$ ranked value
Third quartile position: $Q_3 = \frac{3(n + 1)}{4}$ ranked value

where n is the number of observed values.


Quartile Measures: Calculation Rules
• When calculating the ranked position use the following rules
• If the result is a whole number then it is the ranked
position to use
• If the result is a fractional half (e.g. 2.5, 7.5, 8.5, etc.)
then average the two corresponding data values.
• If the result is not a whole number or a fractional half
then round the result to the nearest integer to find the
ranked position.
Quartile Measures: Locating Quartiles

Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22 (n = 9)

Q1 is in the (9 + 1)/4 = 2.5 position of the ranked data,
so use the value halfway between the 2nd and 3rd values:
Q1 = 12.5

Q1 and Q3 are measures of non-central location
Q2 = median, is a measure of central tendency
Calculating The Quartiles: Example

Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22 (n = 9)

Q1 is in the (9 + 1)/4 = 2.5 position of the ranked data,
so Q1 = (12 + 13)/2 = 12.5
Q2 is in the (9 + 1)/2 = 5th position of the ranked data,
so Q2 = median = 16
Q3 is in the 3(9 + 1)/4 = 7.5 position of the ranked data,
so Q3 = (18 + 21)/2 = 19.5
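The position formulas and the three calculation rules (whole number, fractional half, otherwise round to the nearest position) can be sketched as follows. The function name is hypothetical, and clamping positions to the range 1..n for very small samples is an assumption of this sketch, not a rule from the slides.

```python
def quartile(ordered, k):
    """k-th quartile (k = 1, 2, 3) using the k*(n + 1)/4 ranked-position rule."""
    n = len(ordered)
    whole, remainder = divmod(k * (n + 1), 4)  # position = whole + remainder/4

    def value_at(position):
        # 1-based lookup; clamping to 1..n is an assumption for tiny samples
        position = min(max(position, 1), n)
        return ordered[position - 1]

    if remainder == 0:                 # whole-number position
        return value_at(whole)
    if remainder == 2:                 # fractional half: average two values
        return (value_at(whole) + value_at(whole + 1)) / 2
    # x.25 rounds down to the nearest position, x.75 rounds up
    return value_at(whole) if remainder == 1 else value_at(whole + 1)


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # ordered array from the example
print(quartile(data, 1), quartile(data, 2), quartile(data, 3))
```

Other quartile conventions exist (statistical packages often interpolate between neighbouring values), so results from software may differ slightly from this rule.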
Relationships among the Mean, Median, and Mode
• For a symmetric histogram and
frequency distribution curve
mean = median = mode

• For a right-skewed histogram and
frequency distribution curve
mode < median < mean

• For a left-skewed histogram and
frequency distribution curve
mean < median < mode
Conceptualizing the mean
As the center of As the representative
the distribution score in the distribution

1 2 3 4 5 6 7 8 9 10

With observations at 1 and 10, the balancing point is the mean:
1+10 = 11
Mean = 11/2 = 5.5

What happens if we add an observation to our distribution?
Adding an observation at 7 moves the balancing point:
1+10+7 = 18
Mean = 18/3 = 6.0 (new balancing point)
Conceptualizing the mean
As the representative score in the distribution
Girl Scout bake sale for camping trip

Amounts raised: $12, $25, $30, $6, $18, $15, $13
12+25+30+6+18+15+13=119 119/7 = 17

To be fair, let’s give everybody the same amount: $17 each.
17+17+17+17+17+17+17=119 119/7 = 17
So everybody is represented by the same score; the mean is the “standard”.
Conceptualizing the Mode
Grouping Method Technique
1. Prepare a table consisting of 6 columns in addition to a column for various
values of X.
2. In the first column, write the frequencies against various values of X as
given in the question.
3. In second column, the sum of frequencies, starting from the top and
grouped in twos, are written.
4. In third column, the sum of frequencies, starting from the second and
grouped in twos, is written.
5. In fourth column, the sum of frequencies, starting from the top and
grouped in threes is written.
6. In fifth column, the sum of frequencies, starting from the second and
grouped in threes is written.
7. In the sixth column, the sum of frequencies, starting from the third and
grouped in threes is written.
8. The highest frequency total in each of the six columns is identified and
analyzed to determine mode. We apply this method for determining mode
of the above example.
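The eight steps above can be sketched in Python as follows. The function name, the tally-based analysis table, and the frequency data are assumptions made for illustration; they are not the example referred to in the slides.

```python
from collections import Counter


def grouping_method_mode(values, freqs):
    """Return (modes, tally) using the six-column grouping method."""
    if not freqs:
        raise ValueError("no data")
    n = len(freqs)
    tally = Counter()  # analysis table: times each value is in a top group
    # (group size, starting index) for columns 1 to 6
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for size, start in columns:
        groups = [range(i, i + size) for i in range(start, n - size + 1, size)]
        if not groups:
            continue
        totals = [sum(freqs[j] for j in group) for group in groups]
        highest = max(totals)
        for group, total in zip(groups, totals):
            if total == highest:
                # Every value in a highest-total group gets one tally mark
                tally.update(values[j] for j in group)
    top = max(tally.values())
    return [v for v in values if tally[v] == top], tally


# Illustrative frequency distribution (assumed)
values = [1, 2, 3, 4, 5, 6, 7]
freqs = [2, 5, 9, 10, 9, 4, 1]
modes, tally = grouping_method_mode(values, freqs)
print(modes, dict(tally))
```

In this made-up data the value 4 appears in the highest group of all six columns, so it is taken as the mode.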
Example

From the analysis table, the value 58 is repeated 6 times, so
the mode is 58.
Conceptualizing the Mode
Situation | Most Representative | Least Representative | Can’t Use
Nominal | Mode | | Median/Mean
Ordinal | Median | Mode | Mean
Skewed interval or ratio | Median | Mode |
Open ended interval or ratio | Median | Mode | Mean
Interval or Ratio | Mean | Mode |
