2_Final_Introduction to Data_Measure_Central_Tendency_DPPM_II_PG
IN DECISION MAKING
Data Management: ensuring data quality (data entering, editing and reconciliation)
Statistical Analysis: measures of location and dispersion, plus inferential statistics (univariate, bivariate and multivariate; correlation, chi-square, t‐test, ANOVA, MANCOVA)
DATA
Data are numbers obtained by measuring or counting properties of objects.
Data are obtained from
Analysis of existing records
Cross‐sectional surveys: primary data (e.g. UDHS, student research)
Census (last Population and Housing census in Uganda: 2024)
Experiments (Clinical trials, field trials)
Reports
etc.
MEASUREMENT
Measurement is the assignment of numbers to objects or events in a systematic fashion.
A variable is any measured characteristic or attribute that varies from subject to subject.
Weight
Age
Height
etc.
Data
ATTRIBUTE OR QUALITATIVE: NOMINAL, ORDINAL
NUMERIC OR QUANTITATIVE: DISCRETE (COUNT), CONTINUOUS; measured on a RATIO or INTERVAL scale
The Levels of Measurement
Nominal
Ordinal
Interval
Ratio
Why Is Level of Measurement Important?
Examples of nominal variables: sex, religion, blood group, symptoms of disease, cause of death
Measurement?
The relationship of the values that are assigned to the attributes for a variable
Values 1 2 3
Relationship
ORDINAL SCALE
Ordinal Measurement
When attributes can be rank‐ordered…
ORGANIZATION OF THE DATA
Data are usually presented in matrix form.
Each column of the matrix represents a variable.
Each row of the matrix represents an individual or unit.
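A minimal Python sketch of this layout, assuming a small made-up data set (the column names and values are hypothetical, not taken from the slides). Each inner list is one individual; each position in the list is one variable.

```python
# Hypothetical records: each row is one individual, each column one variable.
columns = ["id", "sex", "age_years", "height_cm"]
rows = [
    [1, "F", 23, 158.0],
    [2, "M", 31, 171.5],
    [3, "F", 27, 163.2],
]

def column(name):
    """Return all values of one variable (one column of the matrix)."""
    idx = columns.index(name)
    return [row[idx] for row in rows]

ages = column("age_years")
print(ages)      # values of the variable "age_years" across all individuals
print(rows[0])   # all variables recorded for the first individual
```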
The “Normal” distribution of biological continuous variables
[Figure: histogram. x-axis: Height (cm); y-axis: Frequency (No. of observations)]
Parameters of Frequency Distribution
• Frequency distributions of continuous data are described by two
types of measures or parameters:
• Measures of Central Tendency
– They summarise the whole set of observations in a single value.
– We calculate a measure of central location when we need
a single value to summarise a set of epidemiological data.
• Measures of Dispersion
– They indicate how widely the observations are spread out.
Measures of Central Tendency
• There are three fundamental measures of central tendency:
• The Mode
• The Median
• The Mean (Arithmetic mean)
• Others: Midrange, geometric mean
NUMERICAL METHODS
3. Median = (11+11)/2 = 11
Method 2 for simplifying Even number Median
• Find the median of the following set of data with n = 10:
15, 7, 13, 9, 10, 11, 16, 12, 5, 11.
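A minimal Python sketch of the median calculation for this data set: sort the values and, because n is even, average the two middle values. The variable names are illustrative; `statistics.median` from the standard library is used only as a cross-check.

```python
import statistics

data = [15, 7, 13, 9, 10, 11, 16, 12, 5, 11]  # data set from the text, n = 10

ordered = sorted(data)
n = len(ordered)
if n % 2 == 1:
    # Odd n: the median is the single middle value.
    median = ordered[n // 2]
else:
    # Even n: the median is the average of the two middle values.
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(ordered)
print(median)
print(statistics.median(data))  # cross-check with the standard library
```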
• Therefore, in a data series with outliers that may shift the mean too
much, the median may be more meaningful.
• The median is also used in defining the LD50 in experimental animals
(the lethal dose that kills 50% of the animals).
• It does not allow complex inferences from medical data, as it cannot be
used for advanced statistics.
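The effect of an outlier can be sketched in Python. The second list is a hypothetical variant of the data set 15, 7, 13, 9, 10, 11, 16, 12, 5, 11 in which the value 16 is assumed to have been mistyped as 160; it is not part of the original slides.

```python
import statistics

values = [15, 7, 13, 9, 10, 11, 16, 12, 5, 11]
# Hypothetical data-entry error: 16 recorded as 160.
with_outlier = [15, 7, 13, 9, 10, 11, 160, 12, 5, 11]

mean_clean = statistics.mean(values)
mean_outlier = statistics.mean(with_outlier)
median_clean = statistics.median(values)
median_outlier = statistics.median(with_outlier)

# The mean moves a lot; the median does not move at all in this case.
print(mean_clean, mean_outlier)
print(median_clean, median_outlier)
```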
MEASURES OF LOCATION - MEDIAN
The median is less affected by extreme values.
However, it has some notable disadvantages compared to the mean:
It ignores the precise magnitude of most of the observations.
This makes it less efficient than the mean.
In large data sets, the median requires more work to calculate than the mean.
There is no easy way to combine the medians of two groups of measurements.
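A short Python sketch of the last point, using two hypothetical groups of measurements. The group means and sizes are enough to recover the mean of the pooled data, but the group medians are not enough to recover the pooled median.

```python
import statistics

group_a = [4, 6, 8]          # hypothetical measurements
group_b = [10, 20, 30, 40]   # hypothetical measurements

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
n_a, n_b = len(group_a), len(group_b)

# Means combine as a weighted average of the group means.
combined_mean = (n_a * mean_a + n_b * mean_b) / (n_a + n_b)
pooled_mean = statistics.mean(group_a + group_b)

# The pooled median must be recomputed from the raw values.
pooled_median = statistics.median(group_a + group_b)
weighted_medians = (n_a * statistics.median(group_a)
                    + n_b * statistics.median(group_b)) / (n_a + n_b)

print(combined_mean, pooled_mean)       # these agree
print(pooled_median, weighted_medians)  # these do not
```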
Egessa Simon
Determining the median of grouped data
$$\text{Median} = L_m + \left(\frac{\frac{N}{2} - Cf_b}{F_m}\right) \times C$$
MCT ‐ MEDIAN FOR GROUPED DATA
Class width is the difference between the lower (or the upper) limits of two consecutive classes.
•Formula: Class width = Upper class boundary ‐ Lower class boundary
•Example: For a class with boundaries 163–175, the class width is 12 because 175 – 163 = 12
Median group = 25‐29
Lm= Lower class boundary of the Median class = 25‐0.5 = 24.5
Cfb = Cumulative frequency of the class which is before the Median class= 45
Fm = frequency of the median class = 12
C = Class interval width: the difference between the lower endpoint of an interval and the lower endpoint of the next interval. For the two groups 20‐24 and 25‐29, the class interval width is 25 ‐ 20 = 5.
$$\text{Median} = L_m + \left(\frac{\frac{N}{2} - Cf_b}{F_m}\right) \times C = 24.5 + 0.42 \times 5 = 24.5 + 2.1 = 26.6$$
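The grouped-median calculation can be sketched in Python. The values Lm = 24.5, Cfb = 45, Fm = 12 and C = 5 come from the slide; the total number of observations N is not stated in the surviving text, so N = 100 is an assumption, chosen because it reproduces the factor of about 0.42 shown in the worked example. The function name `grouped_median` is illustrative.

```python
def grouped_median(lower_boundary, n, cf_before, f_median, width):
    """Median of grouped data: Lm + ((N/2 - Cfb) / Fm) * C."""
    return lower_boundary + ((n / 2 - cf_before) / f_median) * width

median = grouped_median(
    lower_boundary=24.5,  # Lm, lower class boundary of the median class
    n=100,                # N, ASSUMED: not given in the source
    cf_before=45,         # Cfb, cumulative frequency before the median class
    f_median=12,          # Fm, frequency of the median class
    width=5,              # C, class interval width
)
print(round(median, 1))
```

Without intermediate rounding the result is about 26.58; the slide rounds the fraction to 0.42 first, which gives 24.5 + 2.1 = 26.6.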
GROUPED MEAN FOR GROUPS CLASSIFIED BY CLASS BOUNDARIES
Dr Philip Govule 65
What are class boundaries?
• A class boundary refers to the dividing line between two adjacent
classes or categories in a dataset.
• It helps in determining the range of values that fall within each
class and allows for better analysis and interpretation of data.
• Understanding class boundaries is crucial for effective data
segmentation and can provide valuable insights for business
planning and strategy development.
• Note: Data are often classified by class boundaries; such data can
still be used to estimate the mean, median, etc.
MCT - MEAN FOR GROUPED DATA
• Solution:
Step 1: Find the midpoint of each class and enter it in column C.
Step 2: For each class, multiply the frequency by the midpoint and enter
the product in column D (f · xm).
Determining the mean of grouped data
$$\bar{x} = \frac{\sum f x}{\sum f}$$
Where:
∑f is the total frequency of the distribution (the sum of all class
frequencies).
∑fx is the sum of the products of the frequency and the class mark
(midpoint) of each class interval.
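A minimal Python sketch of the grouped mean, assuming a small made-up frequency table (the classes and frequencies below are hypothetical, since the original table is not available). For each class it computes the midpoint, multiplies by the frequency, and divides the total of the products by the total frequency.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency)
classes = [(20, 24, 5), (25, 29, 12), (30, 34, 8)]

sum_f = 0      # total frequency, sum of f
sum_fx = 0.0   # sum of f * midpoint
for lower, upper, f in classes:
    midpoint = (lower + upper) / 2   # class mark x
    sum_f += f
    sum_fx += f * midpoint

mean = sum_fx / sum_f
print(sum_f, sum_fx, mean)
```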
Mean Disadvantages
• It is affected by extreme values
This frequency distribution is skewed to the right (+ve skewness).
[Figure: right-skewed frequency curve with the mode, median and mean marked]
MEASURES OF LOCATION - SKEWNESS
$$\text{Mode} = L_m + \left(\frac{D_1}{D_1 + D_2}\right) \times C$$
Where:
◦ D1 is the frequency of the modal class minus the frequency
of the class before it
◦ D2 is the frequency of the modal class minus the frequency
of the class after it
◦ Lm is the lower class limit of the modal class
◦ C is the width of the modal class
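The grouped-mode formula can be sketched in Python. Note that D1 is divided by the sum (D1 + D2). The input values below are hypothetical and only illustrate the arithmetic; the function name `grouped_mode` is also illustrative.

```python
def grouped_mode(lower, f_modal, f_before, f_after, width):
    """Mode of grouped data: Lm + (D1 / (D1 + D2)) * C."""
    d1 = f_modal - f_before   # D1: modal frequency minus frequency before
    d2 = f_modal - f_after    # D2: modal frequency minus frequency after
    # If d1 + d2 == 0 the neighbouring classes tie with the modal class
    # and the formula is undefined (this sketch does not handle that case).
    return lower + (d1 / (d1 + d2)) * width

# Hypothetical modal class starting at 24.5 with width 5.
mode = grouped_mode(lower=24.5, f_modal=12, f_before=5, f_after=8, width=5)
print(round(mode, 2))
```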
Mode
Advantages:
• It is simple
• Unique
• Useful for qualitative data, for example the most
handsome man
Mode
• Disadvantages:
– Cannot be called unbiased
– Cannot be used to reconstruct the distribution
– Cannot be processed further
– Some distributions are bimodal
APPLICATION OF MEASURES
Measure                          | Formula/Example              | Used for
Arithmetic Mean [Average]        | Sum / Size = (a + b + c) / 3 | Most situations (“Average Item”)
Median [Middle value]            | Middle of sorted list (2 middles? | Widely varying samples (houses, incomes)
Mode [Most popular]              | Most popular value           | No compromises (Winner takes all)
Geometric Mean [Average factor]  | (abc)^(1/3)                  | Investments, growth, area, volume
Harmonic Mean [Average rate]     | 3 / (1/a + 1/b + 1/c)        | Speed, production, cost
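A minimal Python sketch of the arithmetic, geometric and harmonic means for three values a, b and c. The values 2, 4 and 8 are hypothetical; the standard-library functions `statistics.geometric_mean` and `statistics.harmonic_mean` (Python 3.8+) are used only as cross-checks.

```python
import math
import statistics

a, b, c = 2, 4, 8  # hypothetical values

arithmetic = (a + b + c) / 3             # sum / size
geometric = (a * b * c) ** (1 / 3)       # cube root of the product
harmonic = 3 / (1 / a + 1 / b + 1 / c)   # size / sum of reciprocals

print(arithmetic, geometric, harmonic)

# Cross-check against the standard library.
print(math.isclose(geometric, statistics.geometric_mean([a, b, c])))
print(math.isclose(harmonic, statistics.harmonic_mean([a, b, c])))
```

For positive values that are not all equal, the harmonic mean is smaller than the geometric mean, which is smaller than the arithmetic mean.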
What is the best measure of central tendency?
The median and mean can only have one value for
a given data set.