
Lecture-3&4- Measure of Centeral

Probability

Uploaded by

Abebe
Copyright
© © All Rights Reserved

Chapter-Three

Measure of Central Tendency


Definition
Measures of central tendency
o Central tendency is the tendency of statistical data to concentrate around a central value.
o A measure of central tendency is a single number that summarizes a distribution or data set by giving a typical or representative score.
o The term also covers the methods used to determine the value around which the data tend to concentrate.
Why measure central tendency:
o To describe (locate) the center of the distribution
o To facilitate comparison
o To support further statistical analysis

o When two or more groups are measured, central tendency provides the basis for comparing them.
A good average should:
o Be based on all observations
o Not be unduly affected by extreme values
o Be as close as possible to the largest number of values
o Be rigidly defined (have a definite value)

The most common measures of central tendency include:
1. Mean (Arithmetic, Weighted, Geometric, and Harmonic)

2. Median

3. Mode

4. Quantiles (Quartiles, deciles and percentiles)


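Arithmetic mean of ungrouped data:

$\mu = \dfrac{X_1 + X_2 + \dots + X_N}{N} = \dfrac{\sum_{i=1}^{N} X_i}{N}$ (population of size N)

$\bar{x} = \dfrac{x_1 + x_2 + \dots + x_n}{n} = \dfrac{\sum_{i=1}^{n} x_i}{n}$ (sample of size n)

Arithmetic mean of a frequency distribution with k values $X_i$ and frequencies $f_i$:

$\bar{X} = \dfrac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$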
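Example

Grouped Data

Special properties of A.M

Combined mean of two groups with sizes $n_1 = 40$, $n_2 = 60$ and means $\bar{X}_1 = 350$, $\bar{X}_2 = 380$:

$\bar{X}_c = \dfrac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2} = \dfrac{\sum_{i=1}^{2} n_i \bar{X}_i}{\sum_{i=1}^{2} n_i} = \dfrac{40(350) + 60(380)}{40 + 60} = 368 \text{ Birr}$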
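The combined-mean calculation can be checked with a few lines of Python. This is only a minimal sketch: the function name `combined_mean` is illustrative and not part of the lecture, and the group sizes (40, 60) and group means (350, 380 Birr) are taken from the worked figures above.

```python
def combined_mean(means, sizes):
    """Combined mean of several groups: sum(n_i * mean_i) / sum(n_i)."""
    if len(means) != len(sizes) or not sizes:
        raise ValueError("means and sizes must be non-empty and of equal length")
    total = sum(n * m for m, n in zip(means, sizes))
    return total / sum(sizes)


# Two groups: 40 observations with mean 350 and 60 observations with mean 380.
print(combined_mean([350, 380], [40, 60]))  # 368.0
```

The same function works for more than two groups, since the formula is just a size-weighted average of the group means.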
Merits and demerits of arithmetic mean

o The mean can be used as a summary measure for quantitative data, but it is not appropriate for nominal or ordinal data.
o For a given set of data, there is only one arithmetic mean (uniqueness).
o It is easy to calculate and understand (simple).
o It is greatly affected by extreme values.
o For grouped data, if any class interval is open-ended, the arithmetic mean cannot be calculated.
2. Weighted mean
o In the simple arithmetic mean, all items are assumed to be of equal importance (each value in the data set has equal weight).
o When the observations have different weights, we use the weighted mean. A weight is assigned to each item in proportion to its relative importance.
Solution: We use a weighted mean, taking the number of credits of each course as its weight.

Grade point (x_i)    4     3     1     2     Total
Credits (w_i)        4     4     3     2     13
w_i x_i              16    12    3     4     35
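As a rough check of the table, the weighted mean is the sum of the products divided by the sum of the weights, 35 / 13. The sketch below assumes the first row of the table holds the grade points and the second row the credits; the names `weighted_mean`, `grade_points` and `credit_hours` are illustrative only.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights) or sum(weights) == 0:
        raise ValueError("need equal-length inputs and a non-zero total weight")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


grade_points = [4, 3, 1, 2]   # assumed to be the first row of the table
credit_hours = [4, 4, 3, 2]   # weights: credits of each course

print(round(weighted_mean(grade_points, credit_hours), 2))  # 2.69
```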
Merits and Demerits of Arithmetic Mean
Merits:
o It is based on all observations
o It is suitable for further statistical analysis
o It is easy to calculate and simple to understand
Demerits:
o It is affected by extreme observations
o It cannot be used in the case of open-ended classes
o It cannot be used when dealing with qualitative characteristics, such as intelligence, honesty, and beauty

By habtamu.A. 11/26/2024
3. Geometric mean



Values 3 4 5 6
Freq. 2 3 1 2

4. Harmonic Mean


Activity: Discuss the advantages and disadvantages of the A.M, G.M and H.M.
Exercise: The number of diarrhea episodes for 25 children is summarized in the following table.

Diarrhea episodes    Number of children
1                    3
2                    3
3                    f3
5                    2
6                    10
8                    f6

If the arithmetic mean is 4.8, what are the values of f3 and f6?
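One way to approach the exercise is to use the two conditions it implies: the frequencies must add up to 25, and the frequency-weighted mean must equal 4.8. The sketch below simply tries every non-negative integer split of the remaining frequency; the variable names are illustrative, and a hand solution with two simultaneous equations gives the same result.

```python
# Known frequencies from the table, keyed by number of episodes.
known = {1: 3, 2: 3, 5: 2, 6: 10}
n, target_mean = 25, 4.8

remaining = n - sum(known.values())                 # f3 + f6
known_total = sum(v * f for v, f in known.items())  # sum of f_i * X_i for known rows

solutions = []
for f3 in range(remaining + 1):
    f6 = remaining - f3
    total = known_total + 3 * f3 + 8 * f6
    if abs(total / n - target_mean) < 1e-9:
        solutions.append((f3, f6))

print(solutions)  # [(3, 4)]
```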
o The median, as its name indicates, is the middle value of the ordered data; it divides the data into two equal parts.

If n is odd: $\tilde{X} = X_{\left(\frac{n+1}{2}\right)}$

If n is even: $\tilde{X} = \dfrac{1}{2}\left[X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}\right]$

where $X_{(k)}$ is the k-th ordered observation, i.e.
When n = 11, the median is the 6th ordered observation.
When n = 12, the median is the 6.5th observation, which is the value halfway between the 6th and 7th ordered observations.
Example: For the same random sample, the ordered observations are:
23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Since n = 10, the median is the 5.5th observation, i.e. (32 + 34)/2 = 33.
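The odd and even cases of the median rule can be written out directly. The function below is a minimal sketch that mirrors the two formulas (the name `median` and the variable `ages` are illustrative); Python's standard `statistics.median` gives the same answer and is used here only as a cross-check.

```python
import statistics


def median(data):
    """Median of ungrouped data using the odd/even position rule."""
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return x[mid]                      # the ((n + 1) / 2)-th ordered value
    return (x[mid - 1] + x[mid]) / 2       # mean of the (n/2)-th and (n/2 + 1)-th values


ages = [23, 28, 28, 31, 32, 34, 37, 42, 50, 61]
print(median(ages))             # 33.0
print(statistics.median(ages))  # 33.0
```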
Median of Grouped Data

$\tilde{X} = L_{me} + \dfrac{w}{f_m}\left(\dfrac{n}{2} - lcf_{bm}\right)$

where
L_me = lower class boundary of the median class
w = width of the median class
f_m = frequency of the median class
n = total number of observations
lcf_bm = less-than cumulative frequency of the class before the median class

To determine the median class, take the class that contains the $\left(\frac{n}{2}\right)^{th}$ item, i.e. the first class for which $\frac{n}{2} \le lcf$.
Example: Find the median.

Age in years    Number of people    Cumulative number of people
14.5-19.5       677                 677
19.5-24.5       1908                2585
24.5-29.5       1737                4322
29.5-34.5       1040                5362
34.5-39.5       294                 5656
39.5-44.5       91                  5747
44.5-49.5       16                  5763
Total           5763                -

Solution: To determine the median class, find the class that contains the $\left(\frac{n}{2}\right)^{th} = \left(\frac{5763}{2}\right)^{th} = 2881.5^{th}$ item.

The first lcf that is greater than or equal to 2881.5 is 4322. Hence, the median class is 24.5-29.5. Then,
L_me = 24.5, w = 5, n = 5763, f_m = 1737, lcf_bm = 2585

$\tilde{X} = L_{me} + \dfrac{w}{f_m}\left(\dfrac{n}{2} - lcf_{bm}\right) = 24.5 + \dfrac{5}{1737}\left(\dfrac{5763}{2} - 2585\right) = 24.5 + 0.85 = 25.35$
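The grouped-median steps (accumulate frequencies, locate the first class whose cumulative frequency reaches n/2, then interpolate) can be sketched as follows. The function name `grouped_median` and the list layout are assumptions made for illustration; equal class widths are assumed, as in the example.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Median of grouped data: L_me + (w / f_m) * (n/2 - lcf_bm)."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    half = n / 2
    cum = 0  # less-than cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if cum + f >= half:          # first class whose lcf reaches n/2
            return lower + (width / f) * (half - cum)
        cum += f
    raise ValueError("median class not found")


lowers = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5]
freqs = [677, 1908, 1737, 1040, 294, 91, 16]

print(round(grouped_median(lowers, 5, freqs), 2))  # 25.35
```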
THE MODE ( X̂ )
The mode or modal value is the value with the highest frequency in the data set. A set of data or a distribution can be:
No mode: all values appear an equal number of times
Unimodal: the distribution has only one mode
Bimodal: the distribution has two modes
Multi-modal: the distribution has more than two modes

Example: The mode of the ages of males at the time of marriage,
23, 28, 28, 31, 32, 33, 34, 37, 41, 43, and 45, is 28,
since it occurs twice while the other values occur only once.
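The cases listed above (no mode, one mode, several modes) can be illustrated with a small counting routine. This is a sketch under the convention used in these notes, where data in which every value occurs equally often are treated as having no mode; the function name `modes` is illustrative.

```python
from collections import Counter


def modes(data):
    """Return the list of modal values; an empty list means 'no mode'."""
    counts = Counter(data)
    top = max(counts.values())  # raises ValueError for empty data
    # If several distinct values all occur equally often, there is no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


ages = [23, 28, 28, 31, 32, 33, 34, 37, 41, 43, 45]
print(modes(ages))             # [28]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (bimodal)
print(modes([1, 2, 3]))        # []      (no mode)
```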
Mode of Grouped Data

$\hat{X} = L_{mo} + w\left(\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right)$

where
L_mo = lower class boundary of the modal class
Δ1 = f_mo - f_1 = difference between the frequency of the modal class and that of the class before it
Δ2 = f_mo - f_2 = difference between the frequency of the modal class and that of the class after it
w = class width
f_mo = frequency of the modal class
f_1 = frequency of the class preceding the modal class
f_2 = frequency of the class succeeding the modal class
Modal class: the class with the highest frequency
Example: The following are the sizes (in millimeters) of incidental intracranial aneurysms (IIAs) of 30 patients.

IIAs     Frequency (f)
0-4      6
5-9      12
10-14    7
15-19    5
20-24    0
Total    n = 30

Since the maximum frequency is 12, the modal class is 5-9. Then,
L_mo = 4.5, w = 5, f_mo = 12, f_1 = 6, f_2 = 7

Δ1 = f_mo - f_1 = 12 - 6 = 6
Δ2 = f_mo - f_2 = 12 - 7 = 5

$\hat{X} = L_{mo} + w\left(\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right) = 4.5 + 5\left(\dfrac{6}{6 + 5}\right) = 4.5 + 2.73 = 7.23$
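The grouped-mode formula can be sketched in code as below. The function name `grouped_mode` is illustrative. Two points are assumptions of this sketch rather than statements from the lecture: if the highest frequency occurs in more than one class, the first such class is used, and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0.

```python
def grouped_mode(lower_boundaries, width, freqs):
    """Mode of grouped data: L_mo + w * d1 / (d1 + d2)."""
    i = max(range(len(freqs)), key=freqs.__getitem__)    # index of the modal class
    f_mo = freqs[i]
    f_1 = freqs[i - 1] if i > 0 else 0                   # class before (assumed 0 if none)
    f_2 = freqs[i + 1] if i + 1 < len(freqs) else 0      # class after (assumed 0 if none)
    d1, d2 = f_mo - f_1, f_mo - f_2
    if d1 + d2 == 0:
        raise ValueError("formula is undefined when d1 + d2 = 0")
    return lower_boundaries[i] + width * d1 / (d1 + d2)


lowers = [-0.5, 4.5, 9.5, 14.5, 19.5]   # lower class boundaries of 0-4, 5-9, ...
freqs = [6, 12, 7, 5, 0]

print(round(grouped_mode(lowers, 5, freqs), 2))  # 7.23
```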
Measure of position (quantiles)
o Quantiles are measures of position that divide an ordered dataset into equal parts, each containing the same proportion of the data.
o They help describe the distribution of a dataset by identifying the values at which these divisions occur.
o The most commonly used quantiles are quartiles, deciles and percentiles.
Example: The following data show the ages of 30 sampled patients in JUSH: 6, 9, 11, 14, 16, 17, 18, 21, 22, 22, 22, 22, 23, 25, 25, 26, 27, 28, 28, 32, 33, 34, 34, 36, 39, 39, 41, 45, 46, 49

Find the lower, middle and upper quartiles for the above data.
Solution: n = 30

$Q_1 = \dfrac{1}{4}(n + 1)^{th} = \dfrac{1}{4}(30 + 1)^{th}$ value
= 7.75th value = 7th value + 0.75(8th value - 7th value)
= 18 + 0.75(21 - 18) = 18 + 2.25 = 20.25
This implies that one fourth (25%) of the patients are younger than 20.25 years.
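The same position rule extends to the middle and upper quartiles. The sketch below assumes, by analogy with the Q1 calculation above, that the k-th quartile sits at the k(n + 1)/4-th position and that fractional positions are resolved by linear interpolation; the function name `value_at_position` is illustrative.

```python
def value_at_position(sorted_data, p):
    """Value at the p * (n + 1)-th ordered position, with linear interpolation."""
    n = len(sorted_data)
    pos = p * (n + 1)        # 1-based position, possibly fractional
    k = int(pos)             # whole part of the position
    frac = pos - k           # fractional part
    if k < 1:
        return sorted_data[0]
    if k >= n:
        return sorted_data[-1]
    lower, upper = sorted_data[k - 1], sorted_data[k]
    return lower + frac * (upper - lower)


ages = sorted([6, 9, 11, 14, 16, 17, 18, 21, 22, 22, 22, 22, 23, 25, 25,
               26, 27, 28, 28, 32, 33, 34, 34, 36, 39, 39, 41, 45, 46, 49])

q1, q2, q3 = (value_at_position(ages, p) for p in (0.25, 0.50, 0.75))
print(q1, q2, q3)  # 20.25 25.5 34.5
```

Replacing the fractions by i/10 or i/100 gives deciles and percentiles under the same assumed position rule.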
Quartile for grouped data


Deciles


Percentile


Income       No. of persons
100 - 200    15
100 - 300    33
100 - 400    63
100 - 500    83
100 - 600    100
Chapter Three

Measures of Variation
Introduction
o Measures of central tendency locate the center of the distribution.
However, they do not tell how individual observations are
scattered on either side of the center. The spread of observations
around the center is known as dispersion or variability.
o In other words, the degree to which numerical data tends to
spread about an average value is called dispersion or variation of
the data.

o Measures of dispersion are statistical measures that quantify the extent to which data are dispersed or spread out.
Significance of measure of dispersion

o To determine the reliability of an average: If the variation is small, the average closely represents the individual values and is highly representative. On the other hand, if the dispersion or variation is large, the average will be quite unreliable.
o To compare the variability of two or more groups: It is also useful to determine the uniformity or consistency of two or more groups. A high degree of variation means less consistency or less uniformity compared with data having less variation.
o To facilitate the use of other statistical measures: Measures of dispersion serve as the basis of many other statistical methods, such as correlation, regression, and hypothesis testing.
Types of measures of dispersion
1. Absolute measures of variation: An absolute measure is expressed in the same unit as the original data, such as kilograms, tonnes, etc. These measures are suitable for comparing the variability of two distributions whose variables are expressed in the same units and whose averages are of about the same size.
2. Relative measures of variation: When two sets of data are expressed in different units of measurement, the absolute measures of variation are not comparable. In such cases, measures of relative variation should be used.

Absolute                  Relative
o Range                   Relative range
o Inter-quartile range    Coefficient of quartile deviation
o Variance                Coefficient of variation
o Standard deviation      Standard score
Range
o It is the difference between the largest and smallest observations in the data.
o Example: Consider the weights (in kg) of 10 newborn children at Jimma Hospital within a month: 2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43
Solution: To compute the range, first arrange the observations in ascending order: 1.98, 2.02, 2.33, 2.33, 2.43, 2.51, 2.88, 2.98, 3.01, 3.25

Range = Maximum - Minimum = 3.25 - 1.98 = 1.27 kg
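In code the range needs no sorting, since only the extreme values matter. A minimal sketch with the newborn weights from the example (the variable names are illustrative):

```python
weights_kg = [2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43]

# Range = largest observation - smallest observation.
data_range = max(weights_kg) - min(weights_kg)

print(round(data_range, 2))  # 1.27
```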


Quartile deviation and Coefficient of quartile deviation



The variance and standard deviation

Some important properties of variance and standard
deviation

Coefficient of variation (CV)
o When two data sets have different units of measurement, or their means differ considerably in size, the CV should be used as the measure of dispersion. It is used to assess the relative variability of data.
o The coefficient of variation is defined as the ratio of the standard deviation to the mean, usually expressed as a percent: $CV = \dfrac{s}{\bar{X}} \times 100\%$.
o A lower CV indicates less variability (greater consistency), meaning the data are more tightly clustered around the mean.
o A higher CV indicates more variability relative to the mean, meaning the data are more spread out.
o The CV measures variation relative to the mean and is expressed as a percentage (%).
Example: Last semester, the students of the nursing and anesthesia departments took the Stat273 course. At the end of the semester, the following information was recorded.

Department            Nursing    Anesthesia
Mean score            79         64
Standard deviation    23         11

Compare the relative dispersion of the two departments' scores.

Solution: Since the means of the two sets of data are quite different, we use the coefficient of variation to compare variability.
o The coefficients of variation are calculated as
Nursing: CV = (23 / 79) × 100% ≈ 29.1%
Anesthesia: CV = (11 / 64) × 100% ≈ 17.2%

o Interpretation: The CV for Nursing students is greater than that for Anesthesia students, so there is more variation relative to the mean in the Nursing students' scores than in the Anesthesia students' scores.
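The comparison can be reproduced from the definition of the CV, the ratio of the standard deviation to the mean expressed as a percent. The function name `coefficient_of_variation` is illustrative; the figures are those in the table for the two departments.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (standard deviation / mean) * 100, in percent."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100


cv_nursing = coefficient_of_variation(23, 79)
cv_anesthesia = coefficient_of_variation(11, 64)

print(round(cv_nursing, 1), round(cv_anesthesia, 1))  # 29.1 17.2
print(cv_nursing > cv_anesthesia)                     # True
```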
Standard score (Z-scores)
o A statistical measure that quantifies the number of standard deviations a data point lies away from the mean of a dataset.
o It tells us how many standard deviations a specific value is above or below the mean of the data set.
o It is obtained by subtracting the mean of the data set from the value and dividing the result by the standard deviation of the data set: $Z = \dfrac{X - \bar{X}}{s}$.
o A Z-score of 0 means the data point is exactly at the mean.
o A positive Z-score indicates the data point is above the mean (greater than average).
o A negative Z-score indicates the data point is below the mean (less than average).
Course                Average score    Std. dev
Int. to Statistics    51               12
Int. to Economics     72               16

Solution:
Z (Int. to Statistics) = (66 - 51) / 12 = 1.25
Z (Int. to Economics) = (80 - 72) / 16 = 0.50

Even though the student scored 66 in Int. to Statistics and 80 in Int. to Economics, the Z-scores tell us that the student performed better in Int. to Statistics than in Int. to Economics relative to the class.
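The two standard scores follow directly from the definition (value minus mean, divided by the standard deviation). A minimal sketch, with `z_score` as an illustrative name and the scores, class averages and standard deviations taken from the example:

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev


z_statistics = z_score(66, 51, 12)
z_economics = z_score(80, 72, 16)

print(z_statistics, z_economics)  # 1.25 0.5
```

The raw score is higher in Economics, but the score in Statistics lies further above its class average when measured in standard deviations.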
Measures of shape
Statistical tools used to describe the shape of a distribution of data, particularly its symmetry and peakedness (skewness and kurtosis). These measures help provide insight into how the data are spread or clustered around the central values.
There are two types of measures of shape:
I. Skewness
II. Kurtosis
o Skewness: the skewness of a distribution is defined as its lack of symmetry. It tells us the direction (shape) of variability from the center, not the size.
o In a symmetric distribution, the three measures of central tendency are approximately equal and the frequency distribution is divided into two equal parts at the mean.
o If extremely low or extremely high observations are present in a distribution, the mean tends to shift towards those scores.
Based on the type of skewness, distributions can be:
a) Negatively skewed distribution: occurs when the majority of scores are at the right end of the curve and a few small scores are scattered at the left end.
b) Positively skewed distribution: occurs when the majority of scores are at the left end of the curve and a few extremely large scores are scattered at the right end.
c) Symmetrical distribution: it is neither positively nor negatively skewed.

Graphical representation
Generally,
o If the distribution of data is skewed to the left, the mean is less than the median, which is often less than the mode.
   - The median is closer to the third quartile.
o If the distribution of data is symmetric, the mode, the median, and the mean are equal.
   - The median is at the center.
o If the distribution of data is skewed to the right, the mode is often less than the median, which is less than the mean.
   - The median is closer to the first quartile.
Measure of skewness


Kurtosis
o Kurtosis is the degree of peakedness or flatness of a distribution (it tells us how tall and sharp the central peak is, relative to a standard bell curve). It tells us the degree of data concentration around the mean.
o When the curve of a distribution is flatter than normal, it is known as platykurtic. When the distribution is more peaked than normal, it is called leptokurtic. The normal distribution, which is neither very peaked nor flat-topped, is called mesokurtic.
Measure of kurtosis
o Interpretation of the value of 𝛼4
   - If 𝛼4 > 3, the curve is leptokurtic
   - If 𝛼4 = 3, the curve is mesokurtic
   - If 𝛼4 < 3, the curve is platykurtic
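The interpretation rule can be sketched in code. Since the formula for 𝛼4 does not survive in these notes, the sketch assumes the usual moment coefficient of kurtosis, 𝛼4 = m4 / m2², with moments taken about the mean and divided by n; the function names, the tolerance and the sample data are illustrative only.

```python
def moment_kurtosis(data):
    """Assumed definition: alpha_4 = m4 / m2**2, using moments about the mean (divisor n)."""
    n = len(data)
    if n == 0:
        raise ValueError("data is empty")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2


def classify_kurtosis(alpha4, tol=1e-9):
    """Apply the rule: > 3 leptokurtic, = 3 mesokurtic, < 3 platykurtic."""
    if abs(alpha4 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if alpha4 > 3 else "platykurtic"


sample = [1, 2, 3, 4, 5]  # made-up data, for illustration only
print(classify_kurtosis(moment_kurtosis(sample)))  # platykurtic
```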
