2_Final_Introduction to Data_Measure_Central_Tendency_DPPM_II_PG

Uploaded by

kunturandal
Copyright
© All Rights Reserved

2024 QUANTITATIVE METHODS

IN DECISION MAKING

POST-GRADUATE DIPLOMA IN PROJECT PLANNING & MANAGEMENT,


DPPM II- KLA WKD
MODULE: Quantitative Methods in Decision Making

Key concepts, application & Types of Data

Dr. PG, Mphil_Epidemiology, PhD


Dr Philip Govule 2
Some useful definitions:
• Variable: a quantity or a quality which varies between one individual
and another.
• Frequency: the number of individuals with a specific value of a
variable.
• Probability: the proportion of times an event would occur in
repetition of given circumstances.
• Population: a collection of individuals of interest.
• Sample: a group of individuals taken from a larger population used to
find out about the population.
DESCRIPTIVE STATISTICS
INTRODUCTION TO DATA MANAGEMENT

The bigger picture
• Study design (observational and experimental)
• Data collection
• Data management: ensuring data quality (data entry, editing and reconciliation)
• Statistical analysis: measures of location and dispersion, plus inferential statistics (univariate, bivariate and multivariate; correlation, chi‐square, t‐test, ANOVA, MANCOVA)
DATA
• Data are numbers obtained by measuring or counting properties of objects.
• Data are obtained from:
– Analysis of existing records
– Cross‐sectional surveys, primary data (e.g. UDHS, student research)
– Census (last Population and Housing Census in Uganda: 2024)
– Experiments (clinical trials, field trials)
– Reports
– etc.
MEASUREMENT
• Measurement is the assignment of numbers to objects or events in a systematic fashion.
• A variable is any measured characteristic or attribute that varies from subject to subject:
– Weight
– Age
– Height
– etc.
• A random variable is one whose value cannot be predicted in advance because it arises by chance.
Qualities of Variables
• Exhaustive: the attributes should include all possible responses.
• Mutually exclusive: no respondent should be able to have two attributes simultaneously. For example:
– Male vs. female
– Employed vs. unemployed
– Note: loosely defined categories can overlap (for example, a respondent may be looking for a second job while employed), so each attribute must be defined precisely.
Definitions and examples
Definition    Example
Variable      Gender
Attributes    Female, Male
VARIABLES
• Observations or measurements are used to obtain the value of a random variable.
• There are two types of variable:
– Quantitative (numerical) variables
– Qualitative/categorical (non‐numerical) variables, also called attributes

Levels of Measurement
VARIABLE & MEASUREMENT SCALE
Data
• Attribute or qualitative: nominal, ordinal
• Numeric or quantitative: discrete (count) or continuous, measured on an interval or ratio scale
The Levels of Measurement

 Nominal

 Ordinal

 Interval

 Ratio
Why Is Level of Measurement Important?

• Helps you decide what statistical analysis is appropriate for the values that were assigned
• Helps you decide how to interpret the data from that variable
NOMINAL SCALE (sounds like “names” or labels)
• Nominal variables allow only classification or categorization based on some distinctively different characteristic; the categories cannot be rank‐ordered.
• A categorical variable, also called a nominal variable, has mutually exclusive (non‐overlapping) categories that are not ordered and have no numerical significance.
• “Nominal” scales could simply be called “labels”.
– They are mere codes assigned to objects as labels; they are not measurements.
– They are not a measure of quantity.
Nominal Measurement

Examples: sex, religion, blood group, symptoms of disease, cause of death
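Measurement?
The level of measurement describes the relationship among the values that are assigned to the attributes of a variable.

Variable:      Party affiliation
Attributes:    Movement    Independent    FDC
Values:        1           2              3

Here the values 1, 2 and 3 are only labels, so the relationship among them is nominal.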
ORDINAL SCALE
Ordinal Measurement
When attributes can be rank‐ordered…

With ordinal scales, the order of the values is what is important and significant, but the differences between them are not really known.
Ordinal Measurement
o Attributes can be rank‐ordered…
o Distances between attributes do not have any meaning.
o All we know is that #4 is better than #3 or #2,
o but we don’t know, and cannot quantify, how much better it is.
o E.g. is the difference between “OK” and “Unhappy” the same as the difference between “Very Happy” and “Happy”? We can’t say.
o Typically measures non‐numeric concepts like satisfaction, happiness, discomfort, etc.
o “Ordinal” sounds like “order” (order matters, but that’s all you really get from these).
Ordinal Measurement
Educational Attainment (in rank ORDER)
0 = No education
1 = Primary
2 = Secondary
3 = Tertiary
4 = University (graduate)
5 = Postgraduate, e.g. PG‐DJCM, DHSMA
Ordinal Measurement
Ordinal variables
• Also known as ordered categorical variables
• Consist of ordered categories, where the differences between
categories cannot be considered to be equal
• Example: a student evaluation rating made up of:
– Excellent,
– Satisfactory,
– Unsatisfactory
Continuous variable
• A continuous variable is a number on a continuous scale
and so it can take on an unlimited number of values. Eg:
weight, height, income
– 150.4, 150.8
Discrete variable
• A discrete variable is a numerical variable that can take on
only a limited number of values.
• These values are usually whole numbers

• An example of this is age in years at last birthday
• Another example is the number of episodes of diarrhea experienced by a child
Interval Measurement
Interval scale:
• Values have identity, magnitude, and equal intervals. E.g. temperature: every degree Fahrenheit/Celsius is the same interval.
• Hence the distance between attributes has meaning; for example, for temperature (in Fahrenheit), the distance from 30 to 40 is the same as the distance from 70 to 80.
• The zero point is arbitrary, not absolute: e.g. 0 degrees does not mean there is no temperature.
• Other examples: IQ scores or performance scores
Ratio Measurement
• Has an absolute zero that is meaningful
• Can construct a meaningful ratio (fraction), for example, number of clients in the past six months
• It is meaningful to say that “...we had twice as many clients in this period as we did in the previous six months.”
• Examples are weight and height
Important
• Quantitative variables are often converted to categorical ones using “cut‐points”.
• Instead of presenting the mean fasting glucose level of male and female subjects, one may prefer to present the proportion of diabetics in the male and female populations, using a fasting glucose level of 110 mg/dL as the cut‐point to categorize the subjects as diabetic/non‐diabetic:
– Glucose level between 85 and 110 mg/dL: non‐diabetic
– Glucose level between 111 and 150 mg/dL: diabetic
• However, categorizing a continuous variable leads to loss of information.
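The cut‐point idea can be sketched in a few lines of Python. This is only an illustration: the readings are invented, and the function name `classify` and the rule "above 110 mg/dL counts as diabetic" are assumptions based on the cut‐point quoted on the slide.

```python
from collections import Counter

CUT_POINT_MG_DL = 110  # cut-point quoted on the slide


def classify(glucose_mg_dl, cut_point=CUT_POINT_MG_DL):
    """Label a fasting glucose value using a single cut-point (assumed rule)."""
    return "diabetic" if glucose_mg_dl > cut_point else "non-diabetic"


# Hypothetical fasting glucose readings in mg/dL, invented for illustration.
readings = [85, 92, 101, 110, 111, 126, 150]

labels = [classify(value) for value in readings]
counts = Counter(labels)
proportion_diabetic = counts["diabetic"] / len(readings)

print(counts)
print(f"Proportion diabetic: {proportion_diabetic:.2f}")
```

Once the readings are reduced to two labels, the difference between 111 and 150 mg/dL is no longer visible, which is the loss of information the slide warns about.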
The Hierarchy of Levels
Types of Data
DESCRIPTIVE STATISTICS

ORGANIZATION OF THE DATA
• Data are usually presented in matrix form.
• The columns of the matrix represent variables.
• The rows of the matrix represent individuals or units.
The “Normal” distribution of biological continuous variables

• Most biological continuous variables (such as blood pressure, height, weight, etc.) take different values among individuals (some have higher values and others lower values).

• The measurements of biological continuous variables present a characteristic frequency distribution that is called the Normal distribution (or Gaussian distribution).
The Normal frequency distribution of biological variables
[Figure: frequency distribution of the body height of a hypothetical population. Vertical axis: frequency (no. of observations), 0 to 80; horizontal axis: height (cm).]
Parameters of Frequency Distribution
• Frequency distributions of continuous data are described by two types of measures or parameters:
• Measures of Central Tendency
– They summarise the whole set of observations in a single value.
– We calculate a measure of central location when we need
a single value to summarize a set of epidemiological data.
• Measures of Dispersion
– They suggest how widely the observations are spread out.
Measures of Central Tendency
• There are three fundamental measures of central tendency:
• The Mode
• The Median
• The Mean (Arithmetic mean)
• Others: Midrange, geometric mean

• Which measure is best for our use in a particular instance depends on the characteristics of the distribution, such as its shape, and on how we intend to use the measure.
Measures of Central Tendency
DESCRIPTIVE METHODS FOR CONTINUOUS DATA

NUMERICAL METHODS

Measures of Central Tendency


Measures of Location
MEASURES OF LOCATION
• The most common measures of location are the:
– Mode
– Median
– Mean
• The most common measures of dispersion are the:
– Range
– Variance
– Standard Deviation
– Interquartile Range
MEASURES OF LOCATION - MODE
• The mode is the most frequently occurring value in a set of measurements (set of data). Example: in 2, 3, 4, 4, 5, 5, 5, 6, 6, 7 the mode is 5.
• For grouped data, the modal interval is defined as the class interval with the highest frequency.
– The modal value is the midpoint of the modal interval.
• The mode is not used much in statistical analysis because of the ambiguity in its definition.
The Mode

• The following parity data have a mode of 1, because it occurs 4 times, which is more than any other value:
– 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 4, 6
• We find the mode by creating a frequency distribution in
which we tally how often each value occurs. If every value
occurs only once, the distribution has no mode.
• If two or more values are tied as the most common, the
distribution has more than one mode. Eg. bimodal
The Mode
Find the mode of the given data set: 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 4, 6

Value    Tally (No.)
0        2
1        4
2        3
3        1
4        1
6        1
Total    12

What is the Mode?
ANS: The mode is 1, since it appears more times (4 times) in the set than any other number.
The Mode
Find the mode of the given data set: 3, 3, 6, 9, 15, 15, 15, 27, 27, 37, 48.

Value    Tally (No.)
3        2
6        1
9        1
15       3
27       2
37       1
48       1
Total    11

What is the Mode?
ANS: The mode is 15, since it appears more times (3 times) in the set than any other number.
The Mode

Consider the following set of numbers: 1, 2, 3, 4, 6, 9

What is the Mode?
Every value occurs only once, so the distribution has no mode.

Note
If two or more values are tied as the most common, the distribution has more than one mode, e.g. bimodal.
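The tallying procedure used in the three mode examples can be sketched in Python as follows. The helper name `modes` is hypothetical, and the last data set is invented to show a bimodal case; the "no mode when every value occurs once" rule follows the convention stated on the slide.

```python
from collections import Counter


def modes(values):
    """Return all modes in ascending order; an empty list means 'no mode'."""
    if not values:
        return []
    counts = Counter(values)          # frequency distribution (the tally)
    highest = max(counts.values())
    if highest == 1:
        return []                     # every value occurs only once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 4, 6]))      # parity data
print(modes([3, 3, 6, 9, 15, 15, 15, 27, 27, 37, 48]))
print(modes([1, 2, 3, 4, 6, 9]))                        # no repeated value
print(modes([1, 1, 2, 3, 3]))                           # hypothetical bimodal set
```

Returning a list rather than a single number keeps the bimodal and "no mode" cases explicit instead of silently picking one value.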
MEASURES OF LOCATION ‐ MODE
• Some distributions may be bimodal, trimodal, etc.
Use and limits of the Mode
• The mode may have some communication interest but
generally is rarely of statistical value.
• The mode is probably most useful when describing qualitative
data.
• It is not uncommon to have a frequency distribution with more than one mode, simply as a consequence of chance.
• Sometimes, however, a bimodal frequency distribution is
extremely meaningful because the population contains two
sub-groups, each of which has a different distribution that
peaks at a different point.
MEASURES OF LOCATION - MEDIAN
• The median is the middle observation of the ordered data.
• It divides the data set into equal halves.
• If n (the number of observations) is odd, the median is unique and is the ((n + 1)/2)th observation.
MEASURES OF LOCATION - MEDIAN
• If n is even, the median is obtained by averaging the two middle observations.
Method 1: even n
Using the formula, take the ((n + 1)/2)th observation. The result is a position with a decimal part, which gives the distance from the whole‐number position: e.g. 4.5 shows that the median lies halfway between the 4th observation and the next one.
Method 2: even n
The median is the simple average of the (n/2)th and (n/2 + 1)th terms:
Median = [(n/2)th term + (n/2 + 1)th term] / 2
The Median: How do we get the Median

• Median: the median is the middle of a set of data that has been arranged into rank order.
• Divides a set of data into two halves [One half of the
observations being larger than the median value], and [one half
smaller than the median].
• Example: suppose we had the following set of height measures
(in cm): 110, 120, 120, 130, 140, 165, 180
• 3 observations are larger than 130 and 3 observations are
smaller; thus the median is 130 cm.
Median 1 for Even Numbers
• Find the median of the following set of data with n = 10:
15, 7, 13, 9, 10, 11, 16, 12, 5, 11.

1. Arrange the observations in increasing or decreasing order.
5, 7, 9, 10, 11, 11, 12, 13, 15, 16

2. Find the middle rank.
– Middle rank = (n + 1)/2
– (10 + 1)/2 = 5.5
– Therefore, the median lies halfway between the values of the 5th and 6th observations.

3. Identify the value of the median. Since the median is equal to the average of the values of the 5th and 6th observations, the median is 11. Median = (11 + 11)/2 = 11
Median 2
• Find the median of the following set of data with n = 10:
15, 7, 13, 9, 10, 11, 16, 12, 5, 11.

1. Arrange the observations in increasing or decreasing order.
5, 7, 9, 10, 11, 11, 12, 13, 15, 16

2. Find the middle rank.
– $\text{Median} = \dfrac{\left(\frac{n}{2}\right)\text{th term} + \left(\frac{n}{2} + 1\right)\text{th term}}{2} = \dfrac{5\text{th term} + 6\text{th term}}{2}$
– Therefore, the median lies halfway between the values of the 5th and 6th observations.

3. Median = (11 + 11)/2 = 11
Method 2 for simplifying Even number Median
• Find the median of the following set of data with n = 10:
15, 7, 13, 9, 10, 11, 16, 12, 5, 11.

1. Arrange the observations in increasing or decreasing order.
5, 7, 9, 10, 11, 11, 12, 13, 15, 16

2. Find the middle rank.
– Average observation = [(n/2)th + {(n/2) + 1}th] / 2
– = [(10/2)th observation + {(10/2) + 1}th observation] / 2
– = [5th observation + 6th observation] / 2
– = (11 + 11)/2 = 11
Since the median is equal to the average of the values of the 5th and 6th observations, the median is 11.
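The odd and even cases worked through above can be sketched as one small Python function. The function name `median_of` is hypothetical; the three data sets are the ones used on the slides.

```python
def median_of(values):
    """Median of raw (ungrouped) data, following the slide's two cases."""
    ordered = sorted(values)              # step 1: arrange in rank order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1)/2)th observation (1-based position)
        return ordered[middle]
    # even n: average of the (n/2)th and (n/2 + 1)th observations
    return (ordered[middle - 1] + ordered[middle]) / 2


scores = [15, 7, 13, 9, 10, 11, 16, 12, 5, 11]     # n = 10 (even)
heights = [110, 120, 120, 130, 140, 165, 180]      # n = 7 (odd)

print(median_of(scores))
print(median_of(heights))
print(median_of([3, 4, 6, 7, 10]))
```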
Use and limitations of the Median

• The main advantage of using a median is that it is “robust” to outliers.
• Example:
– Data set: 24, 25, 29, 29, 30, 31. Median = 29
– Data set: 24, 25, 29, 29, 30, 131. Median = 29

• Therefore in a series of data that have some outliers that may shift the
mean too much, the use of the median may be more meaningful.
• The median is also used in defining the LD50 in experimental animals
(lethal dose that kills 50% of the animals).
• It does not allow complex inferences from medical data as it can not be
used for advanced statistics.
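The robustness claim can be checked with Python's standard `statistics` module. This is a small sketch using the two data sets from the example; the variable names are illustrative only.

```python
from statistics import mean, median

without_outlier = [24, 25, 29, 29, 30, 31]
with_outlier = [24, 25, 29, 29, 30, 131]   # 31 replaced by the outlier 131

summary = {
    "without outlier": (mean(without_outlier), median(without_outlier)),
    "with outlier": (mean(with_outlier), median(with_outlier)),
}

for label, (avg, med) in summary.items():
    print(f"{label}: mean = {avg:.1f}, median = {med:.1f}")
```

The median stays at 29 in both cases, while the single extreme value shifts the mean substantially.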
MEASURES OF LOCATION - MEDIAN
 The median is less affected by extreme values.
 However, it has some notable disadvantages compared to the mean:
It ignores the precise magnitude of most of the observations.
This makes it less efficient than the mean
In large data sets, the median requires more work to calculate than the mean.
There is no easy way to combine the medians of two groups of measurements.
It is not of much use in elaborate statistical analysis.


The Median
• Is the point above which 50% of the
distribution lies and below which lies 50%
of the distribution

• The median should be used when the distribution is skewed.
Egessa Simon
Determining the median
• The median should be used when the
distribution is skewed.
• This is the middle figure of the distribution.
Given 3,4,6,7,10 the median is 6

Determining the median of grouped data

Median = Lm + c(N/2 – CFb) /fm


Where:
Lm is the lower boundary of the median class
CFb is the cumulative frequency before the median class
fm is the frequency of the median class
c is the width of the median class
N is the total number of observations
MEDIAN FOR GROUPED DATA – Using class limits
Class    Frequency (F)    Cumulative frequency    Relative frequency (F/n)    Cumulative relative frequency
10‐14    8                8                       0.08                        0.08
15‐19    8                16                      0.08                        0.16
20‐24    29               45                      0.29                        0.45
25‐29    12               57                      0.12                        0.57
30‐34    10               67                      0.10                        0.67
35‐39    11               78                      0.11                        0.78
40‐44    8                86                      0.08                        0.86
45‐49    8                94                      0.08                        0.94
50‐54    6                100                     0.06                        1.00
Total    ∑f = 100                                 1.00
MCT - MEDIAN FOR GROUPED DATA
• To simplify, measurements are assumed to be spread evenly over the interval.
• The first interval for which the cumulative relative frequency exceeds 0.50 contains the median.
• The computation of the median value for grouped data is carried out as follows:
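MCT ‐ MEDIAN FOR GROUPED DATA

$$\text{Median} = L_m + \left(\frac{\frac{N}{2} - CF_b}{f_m}\right) c$$

(The same formula is sometimes written with n in place of N and w in place of c.)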
MCT ‐ MEDIAN FOR GROUPED DATA
Class width is the difference between the lower (or upper) limits of two consecutive classes.
• Equivalently: class width = upper class boundary − lower class boundary
• Example: for a class interval 163–175 whose end points are class boundaries, the class width is 12 because 175 − 163 = 12.
• Example: for the two groups 20‐24 and 25‐29, the class width is 25 − 20 = 5.
MCT ‐ MEDIAN FOR GROUPED DATA
Median group = 25‐29
Lm = lower class boundary of the median class = 25 − 0.5 = 24.5
CFb = cumulative frequency of the class before the median class = 45
fm = frequency of the median class = 12
c = class interval width = the difference between the lower endpoint of an interval and the lower endpoint of the next interval. For the two groups 20‐24 and 25‐29, the class interval width is 25 − 20 = 5.
N = total number of observations = 100

$$\text{Median} = L_m + \left(\frac{\frac{N}{2} - CF_b}{f_m}\right) c = 24.5 + \left(\frac{50 - 45}{12}\right) \times 5 \approx 24.5 + 0.42 \times 5 = 24.5 + 2.1 = 26.6$$
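The grouped-median calculation can be sketched in Python as below. The function name `grouped_median` and its parameters are hypothetical, and the sketch assumes integer class limits, so that the lower class boundary is the lower limit minus 0.5, as in the worked example.

```python
def grouped_median(lower_limits, frequencies, width):
    """Median = Lm + c * (N/2 - CFb) / fm for equal-width integer classes."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cumulative = 0                        # CFb: cumulative frequency so far
    for lower, freq in zip(lower_limits, frequencies):
        if cumulative + freq >= half:     # this class contains the median
            boundary = lower - 0.5        # Lm, assuming integer class limits
            return boundary + width * (half - cumulative) / freq
        cumulative += freq
    raise ValueError("median class not found")


# Frequency table from the slide: classes 10-14, 15-19, ..., 50-54.
lower_limits = [10, 15, 20, 25, 30, 35, 40, 45, 50]
frequencies = [8, 8, 29, 12, 10, 11, 8, 8, 6]

result = grouped_median(lower_limits, frequencies, width=5)
print(round(result, 1))
```

Without intermediate rounding the value is about 26.58, which agrees with the 26.6 obtained by hand.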
GROUPED MEAN FOR GROUPS CLASSIFIED BY CLASS
BOUNDARIES

What are class boundaries?
• A class boundary refers to the dividing line between two adjacent
classes or categories in a dataset.
• It helps in determining the range of values that fall within each
class and allows for better analysis and interpretation of data.
• Understanding class boundaries is crucial for effective data
segmentation and can provide valuable insights for business
planning and strategy development.
• Note: Some data are often classified by class boundaries hence
may still be useful for estimation of Mean, Median etc
MCT - MEDIAN FOR GROUPED DATA
• The median of grouped data is cumbersome to compute exactly because the actual measurement values are unknown.
• To simplify, measurements are assumed to be spread evenly over the interval.
• The first interval for which the cumulative relative frequency exceeds 0.50 contains the median.
• The computation of the median value for grouped data is carried out as follows:
The Mean
The arithmetic Mean

• The arithmetic mean is the most familiar measure of central tendency.
• It is the arithmetic average and is commonly called simply “mean” or “average”.
• In formulas, the arithmetic mean is usually represented as x̄, read as “x-bar”.
Mean
• The population mean
µ = (sum of all data values in the population) / (population size)
• The sample mean
x̄ = (sum of all data values in the sample) / (sample size)
Mean
• The formula for calculating the mean from individual data is:
Mean = x̄ = ∑ xᵢ / n
• This formula is read as “x-bar equals the sum of the x’s divided by n”.
• Where n is the number of observations.
MEASURES OF LOCATION - MEAN
• Given a data set of n values xᵢ, i = 1, 2, 3, …, n, the mean of the x’s (denoted by x̄) is given by:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

• For example, if our data are:
Determining the mean of grouped data
• Mean = (∑fx)/∑f
• Where:
• ∑f is the total frequency of the distribution (the number of observations, n).
• ∑fx is the sum of the products of the frequency and class mark (midpoint) of each class interval.
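MEAN FOR GROUPED DATA
• The formula for the mean for grouped data is given by:

$$\bar{x} = \frac{\sum f\,x_m}{\sum f}$$

where f is the class frequency and x_m is the class midpoint.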
Mean: calculation example – grouped data
• Frequency distribution of the weights of patients attending a health facility:
A              B                C                D
Weight (kg)    Frequency (f)    Midpoint (xm)    f·xm
54-57          5                55.5             277.5
58-61          7                59.5             416.5
62-65          10               63.5             635
66-69          12               67.5             810
70-73          6                71.5             429
74-77          5                75.5             377.5
78-81          4                79.5             318
82-85          1                83.5             83.5
Total          ∑f = n = 50                       ∑f·xm = 3347
Mean: calculation example – grouped data
• Solution:
To calculate the mean:
Step 1: Find the midpoint of each class and enter it in column C.
xm = (54 + 57)/2 = 55.5; (58 + 61)/2 = 59.5; etc.
Step 2: For each class, multiply the frequency by the midpoint and enter the product in column D (f·xm).
5 × 55.5 = 277.5; etc.
Step 3: Find the sum of column D, as shown in the table: ∑f·xm = 3347.

Step 4: Divide the sum by n to get the mean:

x̄ = ∑f·xm / n = 3347/50 = 66.94 ≈ 66.9 kilograms
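The four steps can be sketched in Python as follows. The function name `grouped_mean` is hypothetical. One assumption is made about the data: the seventh class is taken to be 78-81, because its stated midpoint is 79.5 and it sits between 74-77 and 82-85.

```python
# (lower limit, upper limit, frequency) for each weight class in kg.
# The seventh class is assumed to be 78-81 (midpoint 79.5).
classes = [
    (54, 57, 5), (58, 61, 7), (62, 65, 10), (66, 69, 12),
    (70, 73, 6), (74, 77, 5), (78, 81, 4), (82, 85, 1),
]


def grouped_mean(rows):
    """Mean = sum(f * xm) / sum(f), with xm the class midpoint."""
    total_f = sum(f for _, _, f in rows)                     # n
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in rows)  # steps 1 to 3
    return total_fx / total_f                                # step 4


mean_weight = grouped_mean(classes)
print(f"{mean_weight:.2f} kg")
```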
CALCULATE THE MEAN DEVIATION, VARIANCE AND
STANDARD DEVIATION
• Solution
• The variance of a population for grouped data is: σ² = ∑ f (x − x̄)² / n.
Mean deviation and variance: calculation example – grouped data
• Frequency distribution of the weights of patients attending a health facility (x̄ = 3347/50 = 66.94)

A              B           C                D         Mean deviation
Weight (kg)    Freq (f)    Midpoint (xm)    f·xm      |xm − x̄|    f·|xm − x̄|    f·(xm − x̄)²
54-57          5           55.5             277.5     11.44       57.20         654.368
58-61          7           59.5             416.5     7.44        52.08         387.4752
62-65          10          63.5             635       3.44        34.40         118.336
66-69          12          67.5             810       0.56        6.72          3.7632
70-73          6           71.5             429       4.56        27.36         124.7616
74-77          5           75.5             377.5     8.56        42.80         366.368
78-81          4           79.5             318       12.56       50.24         631.0144
82-85          1           83.5             83.5      16.56       16.56         274.2336
Total          ∑f = n = 50                  3347      65.12       287.36        2560.32
CALCULATE THE MEAN DEVIATION, VARIANCE AND
STANDARD DEVIATION
• Solution
• Mean deviation = ∑ f |x − x̄| / n = 287.36/50 ≈ 5.75
• The variance of a population for grouped data is: σ² = ∑ f (x − x̄)² / n
• σ² = 2560.32/50 = 51.2064
• Standard deviation: σ = √variance = √51.2064 ≈ 7.156
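The column totals and the three dispersion measures can be re-derived with a short Python sketch. The variable names are illustrative; the midpoints are the class midpoints of the weight table (67.5 for 66-69 and 79.5 for the class assumed to be 78-81), and the population formulas (division by n) are used, as on the slide.

```python
from math import sqrt

midpoints = [55.5, 59.5, 63.5, 67.5, 71.5, 75.5, 79.5, 83.5]
freqs = [5, 7, 10, 12, 6, 5, 4, 1]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Mean deviation: frequency-weighted average absolute distance from the mean.
mean_deviation = sum(f * abs(x - mean) for f, x in zip(freqs, midpoints)) / n

# Population variance and standard deviation for grouped data.
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n
std_dev = sqrt(variance)

print(f"mean = {mean:.2f}")
print(f"mean deviation = {mean_deviation:.4f}")
print(f"variance = {variance:.4f}")
print(f"standard deviation = {std_dev:.3f}")
```

Dividing by n − 1 instead of n would give the sample variance; which one applies depends on whether the 50 patients are treated as a population or a sample.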
Mean: calculation example – grouped data
• The frequency of distances in kilometres travelled by patients to a health facility :
A B C D

Class Frequency (f) Midpoint (xm) f. xm


5-9 1 7.0 7
10-14 2 12 24
15-19 3 17 51
20-24 5 22 110
25-29 4 27 108
30-34 3 32 96
35-39 2 37 74
∑ f =n= 20 ∑ f. xm= 470
MEASURES OF LOCATION - MEAN
Advantages of the grouped formula:
The process saves some computational labor
The difference between the x̅ ’s from the two approaches
is very small if:
the data set is large and
the interval width is small
Applications and characteristics
• The arithmetic mean is useful when performing analytic manipulation.
With the exception of a situation where extreme scores occur in the
distribution, the mean is generally the best measure of central tendency.
 The values of mean tend to fluctuate least from sample to sample.
 It is amenable to algebraic treatment and it possesses known
mathematical relationships with other statistics.
 Hence, it is used in further statistical calculations. Thus, in most
situations the mean is more likely to be used than either the mode or the
median.
Advantages of the Mean
Simple to calculate.
Has mathematical properties that enable the
development of advanced statistics standard
distribution
Most descriptive analyses of continuous variables and
advanced statistical analyses use the mean as the
measure of central tendency.
Mean advantages
• Advantages
– It summarizes the entire distribution
– It is unbiased, meaning that, on average over repeated samples, the sample mean equals the population mean μ

Mean Disadvantages
• It is affected by extreme values

• Sometimes the value obtained does not occur anywhere in the distribution
Limitations of the Mean - 1
• The mean is quite sensitive
to extreme values that skew
a distribution.
• Because the mean is so
sensitive to extreme values
(OUTLIERS), it is a poor
summary measure for data
that are severely skewed in
either direction.
Limitations of the Mean - 2
• In summary, the mean is very sensitive to extreme values.
• Example:
– Data set: 29, 31, 24, 29, 30, 25. Mean = 28.0
– Data set: 29, 31, 124, 29, 30, 25. Mean = 44.7


Jokes in statistics
• The Italian poet Trilussa said:
if you eat two chickens a day and I eat none, on average you and I each eat a chicken a day.
• The statistician who drowned in a lake that had an average depth (the mean of the depth) of 40 cm.
Conclusion
The mean may be the same, but the observations may have a very different dispersion.
Limitations of the Mean - 3
• Moreover the same mean may be found in two series of
observations that are very different:
• For instance let us consider the average height of 2 groups of
people (in cm):
• Group 1: 170, 168, 167, 165, 165, 165, 164, 163, 163, 163,
162, 161, 160, 160.
Mean : 164 cm
• Group 2: 205, 195, 189, 186, 170, 173, 160, 157, 150, 145,
143, 142, 141, 140.
Mean : 164 cm
Mean is the centre of gravity of the distribution

Measures of central tendency & Normal Distribution
• In reality very often the distribution does not look perfectly
symmetrical but may be particularly extended in one
direction (Skewness).
• When the curve is stretched on the left side it is said to be skewed to the left.
• When a distribution is skewed to one side, the mean moves in the same direction. The median also tends to move, but to a lesser extent.
Measures of central tendency & Normal Distribution

[Figure: a frequency distribution skewed to the right (positive skewness), with the mode, median and mean marked.]
MEASURES OF LOCATION - SKEWNESS
• Skewness is a measure of the lack of symmetry.
• The skewness value can be positive or negative, or even undefined.
• In a symmetrical, single‐peaked distribution, the mode, median, and mean will all be the same.
MEASURES OF LOCATION - SKEWNESS
• In a skewed distribution:
– the mean is pulled in the direction of the tail
– the median falls between the mode and the mean
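A quick way to see this ordering is to compute all three measures for a small right-skewed data set. The sketch below reuses the parity data from the earlier mode example (its long tail is towards the larger values) and relies only on Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median, mode

# Parity data from the mode example; the tail is on the right (values 3, 4, 6).
parity = [0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 4, 6]

parity_mode = mode(parity)
parity_median = median(parity)
parity_mean = mean(parity)

print(f"mode = {parity_mode}, median = {parity_median}, mean = {parity_mean:.2f}")
```

The mean is pulled furthest towards the tail, and the median lies between the mode and the mean, as the slide states for a skewed distribution.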
MEASURES OF LOCATION - GEOMETRIC MEAN
• Geometric means offer useful summaries for highly skewed data:
– Survival data
– Income distribution
– Ratios
• The geometric mean is obtained by averaging the logarithms of the observations and transforming the result back. The effect of this process is to minimize the influence of extreme observations (very large numbers in the data set).
• Don't use a geometric mean, though, if you have any negative or zero values in your data.
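The log-transform route to the geometric mean can be sketched as follows. The data set is hypothetical (it is not the one from the slides) and was chosen so that one large value dominates the arithmetic mean; `statistics.geometric_mean` requires Python 3.8 or later.

```python
from math import exp, log
from statistics import geometric_mean, mean

# Hypothetical, strictly positive data with one unusually large value.
data = [1, 2, 4, 8, 64]

arithmetic = mean(data)

# Average the logs, then transform back with exp().
mean_of_logs = mean([log(x) for x in data])
geometric_via_logs = exp(mean_of_logs)

# The standard library gives the same result directly.
geometric_direct = geometric_mean(data)

print(f"arithmetic mean = {arithmetic:.2f}")
print(f"geometric mean  = {geometric_via_logs:.2f}")
```

The logarithm of zero or of a negative number is undefined, which is why the geometric mean should not be used when such values are present.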
MEASURES OF LOCATION - GEOMETRIC MEAN
• For example, consider the data set
– The observation 28 is an unusually large measurement, causing right skewing and influencing the mean.
– Compare the means for the raw and transformed data:
MEASURES OF LOCATION - GEOMETRIC MEAN
• A clinical trial assessed the effect of the drug 6‐mercaptopurine (Purinethol) (6‐MP) in maintaining remission.
• Patients were randomized to receive 6‐MP or placebo. Below are the times to relapse (weeks) for the 21 patients in the placebo group:
GROUPED MODE

Determining the mode of grouped data

Mode = Lm + [D1/(D1 + D2)] × c
Where:
◦ D1 is the frequency of the modal class minus the frequency of the class before it
◦ D2 is the frequency of the modal class minus the frequency of the class after it
◦ Lm is the lower class limit of the modal class
◦ c is the width of the modal class
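As an illustration, the grouped-mode formula can be applied to the frequency table used earlier for the grouped median, where the modal class is 20‐24 (frequency 29, with 8 in the class before and 12 in the class after). The function name `grouped_mode` is hypothetical. The example passes the lower class boundary 19.5 as Lm, which is an assumption made for consistency with the grouped-median example; the slide itself says "lower class limit".

```python
def grouped_mode(lower, width, f_before, f_modal, f_after):
    """Mode = Lm + D1 / (D1 + D2) * c for the modal class."""
    d1 = f_modal - f_before     # D1: modal frequency minus the one before
    d2 = f_modal - f_after      # D2: modal frequency minus the one after
    if d1 + d2 == 0:
        raise ValueError("neighbouring classes have the same frequency")
    return lower + d1 / (d1 + d2) * width


# Modal class 20-24 from the grouped-median table.
# lower=19.5 is the class boundary (an assumption, see the note above).
estimate = grouped_mode(lower=19.5, width=5, f_before=8, f_modal=29, f_after=12)
print(round(estimate, 2))
```

Using the lower limit 20 instead of the boundary 19.5 would shift the estimate up by 0.5.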
Mode
Advantages:
• It is simple
• Unique
• Useful for qualitative data say the most
handsome man;

Mode
• Disadvantages:
– Cannot be called unbiased
– Cannot be used to reconstruct the distribution
– Can not be further processed
– Some distributions are bimodal

APPLICATION OF MEASURES
Measure: formula/example; used for
• Arithmetic mean [average]: $\frac{\text{Sum}}{\text{Size}} = \frac{a + b + c}{3}$; used for most situations (“average item”)
• Median [middle value]: middle of the sorted list (2 middles? average them); used for widely varying samples (houses, incomes)
• Mode [most popular]: most popular value; used when there are no compromises (winner takes all)
• Geometric mean [average factor]: $\sqrt[3]{abc}$; used for investments, growth, area, volume
• Harmonic mean [average rate]: $\dfrac{3}{\frac{1}{a} + \frac{1}{b} + \frac{1}{c}}$; used for speed, production, cost
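The harmonic mean is the least familiar entry in the table, so a small sketch may help. The speeds are hypothetical: a trip made at 40 km/h one way and 60 km/h back over the same distance.

```python
from statistics import harmonic_mean, mean

# Hypothetical speeds (km/h) over two legs of equal distance.
speeds = [40, 60]

# n divided by the sum of reciprocals, as in the table's formula.
manual = len(speeds) / sum(1 / s for s in speeds)
library = harmonic_mean(speeds)

print(f"harmonic mean   = {library:.1f} km/h")
print(f"arithmetic mean = {mean(speeds):.1f} km/h")
```

For equal distances the true average speed is the harmonic mean, because more time is spent on the slower leg; the arithmetic mean overstates it.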
What is the best measure of central tendency?
• The “best” measure of central tendency depends on the data you are analyzing:
– Whether to use the median, mean or mode will depend on the type of data you have (categorical or continuous data);
– Whether your data has outliers and/or is skewed;
– What you are trying to show from your data.
In a strongly skewed distribution, what is the best
indicator of central tendency?

• It is usually inappropriate to use the mean in situations where your data is skewed.
• You would normally choose the median or mode, with the median usually preferred.
When is the mean the best measure of central
tendency?

• The mean is usually the best measure of central tendency to use when the data distribution is continuous and symmetrical (e.g. the data is normally distributed).
• The choice also depends on what you are trying to show from your data.


When is the mode the best measure of central tendency?
• The mode is the least used of the measures of central tendency.
• It is used when dealing with nominal data.
• For this reason, the mode will be the best measure of central tendency (as it is the only one appropriate to use) when dealing with nominal data.
• The mean and/or median are usually preferred when dealing with all other types of data, but this does not mean the mode is never used with these data types.
When is the median the best measure of central tendency?
• The median is usually preferred to other measures of central tendency when your data set is skewed (i.e., forms a skewed distribution).
• It is also preferred when dealing with ordinal data.
• However, the mode can also be appropriate in these situations, but is not as commonly used as the median.
What is the most appropriate measure of central tendency when the data has outliers?
• The median is usually preferred in these situations because the value of the mean can be distorted by the outliers.
• However, it will depend on how influential the outliers are.
• If the outliers do not significantly distort the mean, using the mean as the measure of central tendency will usually be preferred.
In a normally distributed data set, which is greatest: mode, median or mean?
• If the data set is perfectly normal, the mean, median and mode are equal to each other (i.e., the same value).
For any data set, which measures of central tendency have only one value?
• The median and mean can only have one value for a given data set.
• The mode can have more than one value.

Class Exercise 1
• The performance of UMI participants in Quantitative Methods for the last five years has not been good, despite the fact that it is a compulsory module.

• In 2019/2020 UMI rolled out a QM performance improvement program that included a new QM curriculum, a change in teaching methods and the provision of the latest reading materials on the subject.

• The data below show the scores out of 50 in Quantitative Methods of 50 UMI DHSMA participants selected at random in 2022.
Class Exercise cont.
29 22 8 34 24 33 28 37 30 35
17 30 11 23 34 15 18 30 38 38
33 26 31 35 46 21 27 19 29 36
43 36 13 25 32 26 33 32 45 40
44 42 34 38 25 13 20 8 39 31
Class exercise cont'd
Required
a) Starting with the classes 5–10, 11–16, etc. and maintaining equal class width, construct a frequency distribution table for the above data.
b) Using the information in the frequency table above, calculate the mean, median, mode and standard deviation of the participants' scores.
c) Assuming that the mean score of participants in QM in 2020 was 27 out of 50 and that scores are normally distributed, test the hypothesis at the 5% level of significance that the QM performance improvement program has worked and that scores in QM have increased significantly.
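As a starting point for part (a), the tallying step can be sketched in Python. This is one possible approach, not a model answer: the function name `frequency_table` is hypothetical, and the class width of 6 is inferred from the stated classes 5–10 and 11–16.

```python
scores = [
    29, 22, 8, 34, 24, 33, 28, 37, 30, 35,
    17, 30, 11, 23, 34, 15, 18, 30, 38, 38,
    33, 26, 31, 35, 46, 21, 27, 19, 29, 36,
    43, 36, 13, 25, 32, 26, 33, 32, 45, 40,
    44, 42, 34, 38, 25, 13, 20, 8, 39, 31,
]


def frequency_table(values, start, width):
    """Return (lower limit, upper limit, frequency) for equal-width classes."""
    table = []
    lower = start
    while lower <= max(values):
        upper = lower + width - 1          # inclusive upper class limit
        count = sum(lower <= v <= upper for v in values)
        table.append((lower, upper, count))
        lower += width
    return table


table = frequency_table(scores, start=5, width=6)
for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")
```

The resulting table can then be fed into the grouped mean, median, mode and standard deviation formulas from the earlier slides.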
Class Exercise 2
• The following data in the table below presents the
monthly income (Shs‘000) of 50 randomly selected
villagers after a poverty reduction exercise had been
carried out in Apac District. Before the exercise those
in charge of the poverty reduction program had
carried out a baseline survey and established that, on
average their monthly income was Shs 49,000 per
month.
Class Exercise contd
70 41 34 55 45 66 73 77 80 30
50 45 72 50 27 70 55 70 85 70
30 50 60 53 40 45 35 55 20 81
25 51 35 62 60 30 45 35 50 89
53 23 28 65 68 50 65 34 35 76
Class exercise. Cont…
Required
a) Starting with the class 20–29 and using class intervals of
equal width, construct a frequency distribution table for the
data above.
b) Using the information in the frequency table above,
calculate the mean, median, mode and standard deviation
of the villagers' income.
c) Assuming that the monthly income of the villagers is
normally distributed, test the hypothesis at 5% level of
significance that the poverty reduction initiative has worked
and income has increased significantly
