MSS 112 Probability & Statistics
10/28/2024 1
Topic-1 & 2
1.0 INTRODUCTION
Numbers play an essential role in statistics because they are its
raw material.
Numbers must be processed to be useful, just as crude oil must be
refined into petrol before it can be consumed by an automobile
engine.
Numbers can represent the following:
i. Quantities and values of commodities produced and sold
ii. Prices of products
iii. Income and expenses
iv. Records of birth and death rates
Statistics is usually not studied for its own sake; rather, it is
widely employed as a highly valuable tool in the analysis of
problems in many disciplines.
1.1 Examples
Agriculture Census 2021
1.2 Basic Statistics Terminologies
1.2.1 Statistics
Refers to the discipline that is responsible for the collection,
organization, analysis, interpretation and presentation of data
so as to draw useful conclusions.
It can also be defined as the collection, presentation, analysis
and utilization of numerical data to make inferences and reach
decisions in the face of uncertainty in economics, business and
other social and physical sciences.
1.2.2 Descriptive statistics
Descriptive statistics aim at providing a picture of, or outlining
the properties of, the data collected and summarizing them into
manageable forms. The data collected can be summarized into
tables, graphs, and measures of central tendency, dispersion and
shape.
Descriptive statistics can summarize a body of data with one or
two pieces of information that characterize the whole data set.
1.2.4 Population
Refers to the set of existing or hypothetical objects or items of
the same nature from which data are gathered. Examples include
the population of farmers in a country, bacteria in a colony and
firms in a given sector.
1.2.5 Sample
Refers to the part of the population drawn with the aim of studying
the characteristics of the entire population.
1.2.6 Parameter
Is a measure of the population, for example the population mean
or the population variance.
It is any value describing a characteristic of a population.
1.2.7 Statistic
Is a measure of the sample, for example the sample mean or the
sample variance.
It is any value describing a characteristic of a sample.
1.2.8 Variable
1.2.8 Types of Variables
Depending on the values it takes, a variable may be either
quantitative or qualitative.
Quantitative variables are the ones whose values have countable
or measurable characteristics, for example number of eggs, length
of a road, number of goats in a district, etc.
Qualitative variables are the ones whose values have non-measurable
characteristics, for example blood types, education
levels, names of regions in Tanzania, etc.
1.2.8.1 Quantitative variables
Quantitative variables are divided into discrete and
continuous quantitative variables.
Discrete quantitative variables are the ones whose values are
countable and are typically expressed as whole numbers,
for example number of children, number of classrooms, etc.
Continuous quantitative variables are the ones which can take any
value in a given interval, for example price of forest
products, height of a person, etc.
1.2.8.2 Qualitative variables
Qualitative variables are divided into ordinal and nominal
qualitative variables.
Ordinal qualitative variables are the ones whose values can be
arranged in a specific order, for example reactions to a vaccine
(nil, slight, causing ulceration, causing death), levels of
satisfaction, etc.
Nominal qualitative variables are the ones whose values cannot
be arranged in a predetermined order, for example blood type,
sex, profession categories, etc.
Exercise-2
Classify each of the variables in the image of the Excel
worksheet as quantitative or qualitative.
Within each category, classify them further as discrete,
continuous, ordinal or nominal.
Description of the variables
1.3 Data
Data (plural; the singular is datum) are a collection of facts, such
as values or measurements.
They can be numbers, words, measurements, observations or even
just descriptions of things.
Technically, data are observations of variables.
Classification of Data
All the data collected in a particular study are referred to
as the data set for the study.
On the basis of the values that the data can carry, data can be
classified as either qualitative or quantitative.
Data classification: Quantitative Data
Data classification: Qualitative Data
Qualitative data are nonnumeric.
They cannot be measured on a natural numerical scale; they can
only be classified into one of a group of categories.
Data classification: Primary Data
Source: www.mwananchi.co.tz
Data classification: Secondary Data
1.4 Scales of Measurement
Scales of measurement are rules that describe the properties
of numbers.
The rules imply that a number is not just a number: it can carry
different properties depending on how it was used or
measured.
Types of measurement scales
In the 1940s, Harvard psychologist S. S. Stevens divided
scales of measurement into four types, namely:
i. Nominal
ii. Ordinal
iii. Interval
iv. Ratio
Nominal scale
When the data for a variable consist of labels or names that
identify an attribute of the variable, the scale of measurement is
considered to be a nominal scale.
A nominal scale can bear numeric codes as well as nonnumeric
labels.
To simplify data collection and entry into analysis software, we
may use numeric codes by assigning a number to each label.
Examples of nominal scale
Political party affiliation
Ownership of house
Sex
Ordinal Scale
The scale of measurement for a variable is called an ordinal
scale if the data exhibit the properties of nominal data and
the order or rank of the data is meaningful.
Examples of ordinal scale
Rank of academic members of staff;
Tutorial Assistant, Assistant Lecturer, Lecturer, Senior Lecturer,
Associate Professor, Professor.
Interval Scale
Refers to a scale that represents quantity such that each
point (unit of a quantity) is placed at an equal distance (interval)
from the next.
Moreover, zero (0) does not represent the absolute lowest
value. Rather, it is a point on the scale with numbers above
and below it.
Examples of Interval scale
Temperature scales such as the Celsius and Fahrenheit scales.
The unit markings on the thermometer are equidistant and
they can be below and above zero, e.g. -20°C, -10°C, 0°C, 10°C,
20°C.
Altitude measured relative to sea level is another
example of an interval scale.
Ratio scale
The ratio scale is similar to the interval scale in the sense that
scores (quantities) are distributed at equal distances from
one another.
Yet, unlike the interval scale, a distribution of scores under the
ratio scale has a true (absolute) zero. That is, there are no
numbers below zero.
Common examples of ratio scales include measures of
length, height, time and weight.
Fundamental difference of the scales
Scale      Indication of   Direction of   Amount of    Absolute
           difference      difference     difference   zero
Nominal    X
Ordinal    X               X
Interval   X               X              X
Ratio      X               X              X            X
1.5 Data collection
The data collection procedure can be divided into three
major stages, including:
Determination of the method of data collection.
Types of data collected
Methods of collecting data depend on the nature of the data
source and the objective of the study.
There are primary and secondary sources. Based on these
sources, two types of data can be collected:
i. Primary data
ii. Secondary data
Collection of primary data
Primary data can be collected through either a census
or a sample survey.
Whether through a sample survey or a census, we can obtain
primary data mainly through methods such as:
i. Direct personal observation and measurement
ii. Personal interview (e.g. face-to-face interview)
iii. Questionnaire (e.g. mail survey)
iv. Experimentation
Direct personal observation and measurement
In this method, information is sought through the
investigator's own direct observation, without asking the
respondents.
Observation as a method relies on the sense organs, for
example through both 'seeing' and 'hearing'. It can also be
accompanied by perception.
Advantages of observation method
i. Subjective bias is eliminated if observation is done
accurately.
ii. It is independent of the respondents' willingness to
respond and as such demands less active
cooperation from the respondents than is the case with
interviews.
iii. The information obtained relates to what is currently
happening.
Disadvantages of observation method
i. The information provided by this method is very
limited. This is because some of the information is
case specific and hence it is difficult to generalize the
results.
ii. It tells what happened but not why. It does not go into
motives, attitudes or opinions.
iii. It is expensive in terms of resources (time and money).
Application of observation method
Observation is particularly suitable in studies where
respondents are not capable of giving, or not willing to give,
verbal responses for various reasons. For example:
Personal Interview
It is defined as a two-way systematic conversation between
an investigator and an informant, initiated to obtain
information relevant to a specific study.
It involves not only conversation but also learning from the
respondent's gestures, facial expressions, pauses and
environment.
Advantages of Personal Interview
More in-depth information can be obtained.
The interviewer, through his/her own skill, can overcome the
resistance of the respondent.
There is greater flexibility under this method, as the
opportunity to restructure questions is always there.
Personal information can be obtained easily under
this method.
Disadvantages of Personal Interview
Questionnaire
A questionnaire consists of a number of questions printed or
typed in a definite order on a form or set of forms.
In this method a questionnaire is sent (by post, online, etc.) to
the persons concerned with a request to answer the
questions and return the questionnaire.
Questionnaires allow quantification of items and therefore
they can be used to describe or explain the various phenomena
being examined.
Questionnaires are prepared with open-ended (non-directive)
and closed-ended (directive) questions.
Advantages of Questionnaire
It is relatively cheaper in terms of operational cost than
other methods, as it may not involve travelling as
interviews do.
It is suitable for widely scattered respondents.
If carefully structured, questionnaires may
give more accurate and adequate results than the
other methods.
The method can incorporate large samples,
which can make the results more reliable than those of
other methods.
Disadvantages of Questionnaire
It may result in irrelevant, inaccurate and biased
information if filled in by the wrong person.
The method is useful only when the questionnaires are
fairly simple and therefore it is not a suitable method
for complex surveys.
The non-response rate is high (missing data). Sometimes it is
important to follow up on the non-respondents by
telephone calls, letters, etc.
Timely responses may also be affected by the efficiency of
the postal system.
Experiment
An experiment is a test or series of runs in which purposeful
changes are made to the input variables of a process or
system so that we may observe and identify the reasons for
changes that may be observed in the output response.
Simply put, an experiment is a device or means of getting an answer
to the problem under investigation.
Based on the purpose of doing an experiment, experiments
can be classified as:
i. Absolute experiment
ii. Comparative experiment
Classification of experiments
Absolute experiments are concerned with establishing
the absolute value of some characteristic, for example
establishing the average maize yield per acre of a
particular maize variety in Mpimbwe District Council.
Advantages of Experimentation Method
It is reliable, as the researcher gets first-hand information.
It establishes cause-and-effect relationships.
Question.
Explain the important factors to consider when selecting a method
for data collection
1.6.1 Organization of Data
Data organization is an intermediary stage of work between data
collection and data analysis.
The completed instruments of data collection, such as interview
schedules, questionnaires and observation schedules, contain a vast
mass of data. They cannot straightaway provide answers to
research questions; they need to be classified and summarized in
order to make them amenable to analysis.
Data organization consists of a number of closely related
operations such as editing, classification, coding and
tabulation.
Editing
Editing is the process of checking the returns from a survey to
detect and correct or adjust errors, omissions, inconsistencies,
irrelevant answers and wrong computations.
Why editing?
As a result of stress when interviewing, the interviewer
cannot always record responses completely and legibly.
Therefore, after each interview is over, he/she should review
the schedule to complete abbreviated responses, rewrite
illegible responses and correct omissions.
The returns (schedules or questionnaires) received from the
respondents have to be scrutinized patiently and carefully
to detect errors caused by careless recording by the field
workers, or inconsistent or factually wrong information
given by the respondents.
Classification
The edited data are arranged according to characteristics
possessed by the items that make up the data, and then coded.
Coding
Coding means assigning numerals or other symbols to the
categories or responses. For each question a coding scheme
is designed on the basis of the concerned categories.
Tabulation
Tabulation is the process of summarizing raw data and
displaying them on compact statistical tables for further
analysis. It involves counting of the number of cases falling
into each of several categories.
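The coding and tabulation steps above can be sketched in a few lines of Python. This is only an illustration: the responses and the coding scheme (1 = Male, 2 = Female) are made-up assumptions, not data from the course.

```python
from collections import Counter

# Hypothetical responses to a single survey question (illustrative data only).
responses = ["Male", "Female", "Female", "Male", "Female"]

# Coding: assign a numeral to each category (an assumed coding scheme).
coding_scheme = {"Male": 1, "Female": 2}
coded = [coding_scheme[r] for r in responses]

# Tabulation: count the number of cases falling into each category.
frequency = Counter(coded)
total = sum(frequency.values())
relative_frequency = {code: count / total for code, count in frequency.items()}

print(coded)               # [1, 2, 2, 1, 2]
print(dict(frequency))     # {1: 2, 2: 3}
print(relative_frequency)  # {1: 0.4, 2: 0.6}
```

The same counts are what a frequency and relative frequency table would display.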
1.6.2 Presentation of Data
Collected data have to be presented in such a way that the reader
can easily grasp the information presented.
For Categorical Data
Tables
  Frequency and Relative Frequency Tables
  Cross tabulation
Graphs
  Bar graph
    Simple
    Grouped
    Compound
  Pie chart
Three Laws of Effective Visual Communication
1. Have a Clear purpose
2. Show the data Clearly
3. Make the message Obvious
Source: STRATOS visualisation panel
https://graphicsprinciples.github.io/
Cross tabulation
Bar Graph
Used to visualize a categorical variable with rectangular bars.
Grouped & Compound Bar Graphs
Used to visualize data on two or more categorical variables.
Pie chart
Displays information from a frequency summary table of
categorical data, e.g. general health status.
Quantitative Data
Tables
  Frequency Tables
    Grouped & Ungrouped
Graphs
  Histogram
  Line graph
    Simple
    Grouped & compound
  Frequency polygon
  Cumulative frequency
  Box plot
  Scatter plot
Histogram
Summarizes data that are measured on an interval or ratio scale
(either discrete or continuous).
Line graph
It is used to show the trend of a variable over time.
Time is displayed on the horizontal axis (x-axis) and the
variable is displayed on the vertical axis (y-axis).
Box plot
Visualizes numerical data using the basic five-number summary:
i. Minimum
ii. First quartile
iii. Second quartile (Median)
iv. Third quartile
v. Maximum
Scatter Plot
Visualizes the trend or association between two continuous
variables.
Task
With examples, explain the types of variables, including their
subcategories, and suggest an appropriate chart or graph that can
be used to present each category. Give the reasons behind your
suggestions.
1.7 Summary statistics
Refers to the information that provides a quick and simple
description of the data. Summary statistics include:
Measures of central tendency
Arithmetic mean
The case of ungrouped data
The case of grouped data
$\bar{X} = \frac{\sum f_i X_i}{\sum f_i}$
Where by;
X_i – class mark of the i-th class
f_i – frequency of the i-th class
Example
Consider the following frequency table for 100 test scores.
Compute the mean value
Scores     Frequencies
5 – 6      10
7 – 8      6
9 – 10     11
11 – 12    10
13 – 14    25
15 – 16    16
17 – 18    8
19 – 20    14
Solution
Scores     Class boundaries   Class mark (X)   Frequency (F)   FX
5 – 6      4.5 – 6.5          5.5              10              55
7 – 8      6.5 – 8.5          7.5              6               45
9 – 10     8.5 – 10.5         9.5              11              104.5
11 – 12    10.5 – 12.5        11.5             10              115
13 – 14    12.5 – 14.5        13.5             25              337.5
15 – 16    14.5 – 16.5        15.5             16              248
17 – 18    16.5 – 18.5        17.5             8               140
19 – 20    18.5 – 20.5        19.5             14              273
Total                                          ΣF = 100        ΣFX = 1318
The mean is the sum of the FX column divided by the total frequency:
$\bar{X} = \frac{\sum FX}{\sum F} = \frac{1318}{100} = 13.18$
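The grouped-mean calculation in the table can be checked with a short Python sketch. The class marks and frequencies are copied from the table; the function name `grouped_mean` is only an illustrative choice.

```python
# Class marks (X) and frequencies (F) taken from the test-score table.
class_marks = [5.5, 7.5, 9.5, 11.5, 13.5, 15.5, 17.5, 19.5]
frequencies = [10, 6, 11, 10, 25, 16, 8, 14]

def grouped_mean(marks, freqs):
    """Return sum(F * X) / sum(F) for a frequency table."""
    total_fx = sum(f * x for f, x in zip(freqs, marks))
    return total_fx / sum(freqs)

mean = grouped_mean(class_marks, frequencies)
print(mean)  # 13.18
```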
Task
Explain the advantages and disadvantages of the mean as one of
the measures of central tendency.
Median
The median is the middle value when the measurements are
arranged in ascending or descending order.
Median for ungrouped data
Example-1
Consider the following tuition fees (in millions) charged by
six different universities in Tanzania to complete a three-year
degree programme.
10.3, 4.9, 8.9, 11.7, 6.3, 7.7.
Compute the median
Solution-1
Arranged in ascending order: 4.9, 6.3, 7.7, 8.9, 10.3, 11.7
Since n = 6 is even, the median is the average of the 3rd and 4th
values:
Median = (7.7 + 8.9) / 2 = 8.3 million
Example-2
Consider the following data set
24.1, 22.6, 27.0, 19.8, 21.5, 23.7, 22.6.
Compute the median and interpret
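Solution-2
Arranged in ascending order: 19.8, 21.5, 22.6, 22.6, 23.7, 24.1, 27.0
Since n = 7 is odd, the median is the 4th value:
Median = 22.6
That is, half of the observations are less than or equal to 22.6.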
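As a quick check of the two ungrouped examples, Python's standard `statistics.median` function sorts the data and averages the two middle values when n is even. The variable names below are illustrative only.

```python
import statistics

fees = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]            # Example-1, n = 6 (even)
data = [24.1, 22.6, 27.0, 19.8, 21.5, 23.7, 22.6]  # Example-2, n = 7 (odd)

# Even n: average of the two middle values of the sorted data.
median_fees = statistics.median(fees)
# Odd n: the single middle value of the sorted data.
median_data = statistics.median(data)

print(round(median_fees, 2))  # 8.3
print(median_data)            # 22.6
```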
Median for grouped data
Note: The median class is the first class interval whose cumulative
frequency is greater than or equal to half of the total
observations.
Example
Consider the following frequency table for 100 test scores.
Compute the median
Scores     Frequencies
5 – 6      10
7 – 8      6
9 – 10     11
11 – 12    10
13 – 14    25
15 – 16    16
17 – 18    8
19 – 20    14
Solution
Scores     Class boundaries   Class mark (X)   Frequencies   Cumulative frequencies
5 – 6      4.5 – 6.5          5.5              10            10
7 – 8      6.5 – 8.5          7.5              6             16
9 – 10     8.5 – 10.5         9.5              11            27
11 – 12    10.5 – 12.5        11.5             10            37
13 – 14    12.5 – 14.5        13.5             25            62
15 – 16    14.5 – 16.5        15.5             16            78
17 – 18    16.5 – 18.5        17.5             8             86
19 – 20    18.5 – 20.5        19.5             14            100
Half of the total observations is 100/2 = 50. The median class is
therefore 13 – 14, the first class whose cumulative frequency (62) is
greater than or equal to 50.
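The grouped median can be sketched in Python as follows. The formula itself did not survive in these notes, so the sketch assumes the usual interpolation formula, Median = L + ((n/2 − c.f) / f) × h, where L is the lower boundary of the median class, c.f is the cumulative frequency before it, f is its frequency and h is the class width. The function name is illustrative.

```python
from itertools import accumulate

# Lower class boundaries and frequencies from the test-score table.
lower_boundaries = [4.5, 6.5, 8.5, 10.5, 12.5, 14.5, 16.5, 18.5]
frequencies = [10, 6, 11, 10, 25, 16, 8, 14]
class_width = 2.0

def grouped_median(lowers, freqs, width):
    """Interpolated median of grouped data (assumed standard formula)."""
    n = sum(freqs)
    cumulative = list(accumulate(freqs))
    # Median class: first class whose cumulative frequency is >= n / 2.
    k = next(i for i, cf in enumerate(cumulative) if cf >= n / 2)
    cf_before = cumulative[k - 1] if k > 0 else 0
    return lowers[k] + (n / 2 - cf_before) / freqs[k] * width

median = grouped_median(lower_boundaries, frequencies, class_width)
print(round(median, 2))  # 13.54
```

Under that assumption the median class is 13 – 14 and the median is 12.5 + (50 − 37)/25 × 2 = 13.54.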
Task
Explain the advantages and disadvantages of the median as one of
the measures of central tendency.
Mode
The mode of a set of numbers is the value that occurs
with the highest frequency, that is, the most common
or most frequently occurring value.
Mode for ungrouped data
Example-1
Consider the following tuition fees (in millions) charged by six
different universities in Tanzania to complete a three-year
degree programme.
10.3, 4.9, 8.9, 11.7, 6.3, 7.7
Compute the mode
Solution-1
Each value occurs exactly once, so this data set has no mode.
Example-2
Consider the following data set
24.1, 22.6, 27.0, 19.8, 21.5, 23.7, 22.6.
Compute the mode and interpret
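Solution-2
The value 22.6 occurs twice and every other value occurs once, so
the mode is 22.6. That is, 22.6 is the most frequently occurring
value in the data set.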
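The two ungrouped examples can be reproduced with a small Python helper. The helper name `modes` and the convention of returning an empty list when no value repeats are illustrative choices, not part of the course material.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

fees = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]            # Example-1
data = [24.1, 22.6, 27.0, 19.8, 21.5, 23.7, 22.6]  # Example-2

print(modes(fees))  # []
print(modes(data))  # [22.6]
```

A data set with two or more values tied for the highest frequency would return all of them, which corresponds to a bimodal or multimodal data set.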
Question
Find mode(s) for each of the following set of observations
i. 2,2,5,7,9,9,9,10,10,11,12,18,9
ii. 3,5,8,10,12,15,16
iii. 2,3,4,4,4,5,7,7,7,9
Mode for grouped data
Mode is given by;
Class interval   Class boundaries
5 – 6            4.5 – 6.5
7 – 8            6.5 – 8.5
9 – 10           8.5 – 10.5
11 – 12          10.5 – 12.5
13 – 14          12.5 – 14.5
15 – 16          14.5 – 16.5
17 – 18          16.5 – 18.5
19 – 20          18.5 – 20.5
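A possible way to compute the grouped mode is sketched below. The formula is missing from these notes, so the sketch assumes the commonly used interpolation formula, Mode = L + ((f1 − f0) / ((f1 − f0) + (f1 − f2))) × h, where f1 is the frequency of the modal class, f0 and f2 are the frequencies of the classes before and after it, L is the lower boundary of the modal class and h is the class width. The frequencies are those of the 100 test scores used in the earlier examples.

```python
# Lower class boundaries and frequencies of the 100 test scores.
lower_boundaries = [4.5, 6.5, 8.5, 10.5, 12.5, 14.5, 16.5, 18.5]
frequencies = [10, 6, 11, 10, 25, 16, 8, 14]
class_width = 2.0

def grouped_mode(lowers, freqs, width):
    """Interpolated mode of grouped data (assumed standard formula)."""
    k = freqs.index(max(freqs))                      # modal class index
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                # class before
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0   # class after
    return lowers[k] + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * width

mode = grouped_mode(lower_boundaries, frequencies, class_width)
print(mode)  # 13.75
```

Under that assumption the modal class is 13 – 14 and the mode is 12.5 + (15 / 24) × 2 = 13.75.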
Task
Explain the advantages and disadvantages of the mode as one of
the measures of central tendency.
Measures of dispersion or variability
Measures of central tendency do not provide a complete mental
picture of the frequency distribution of the values in a data set.
In addition to determining the center of the distribution, we must
have some measure of the variability of the data, to explain the
spread of the data values about the center of the distribution.
Data sets may have the same central value (say, mean) but different
variability.
Examine the histogram below. How are class A and class B
scores dispersed about the center?
Measures of dispersion are numbers that measure the
degree of spread about a central value such as the mean or
median.
Common measures are:
i. Range
ii. Inter-quartile range
iii. Mean deviation (reading assignment)
iv. Variance (σ²)
v. Standard deviation (σ)
vi. Coefficient of variation
Range
Example
Consider the following tuition fees (in millions) charged by
six different universities in Tanzania to complete a three-year
degree programme.
10.3, 4.9, 8.9, 11.7, 6.3, 7.7.
Solution
Range = Maximum value – Minimum value
= 11.7 – 4.9
= 6.8 million
Range for grouped data
Inter-quartile range
IQR = Q3 – Q1
Q3 – Upper Quartile
Q1 – Lower Quartile
What is an advantage of IQR over range?
Quartile
There are three quartiles (Q1, Q2 and Q3) that divide an ordered
series of data into four parts of the same size.
Q2 is the second quartile. By being the second quartile it
means 50% of the observations are less than Q2. In other
words 50% of the observations are less than Median value.
Inter-quartile range for ungrouped data
Example
Compute Inter-quartile range for the following numbers
10.3, 7.7, 11.7, 4.9, 8.9 and 6.3
Solution
Inter-quartile range (IQR) is given by;
IQR = Q3 – Q1
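The range and inter-quartile range for this example can be sketched with Python's standard library. Note the assumption: there are several conventions for locating quartiles in ungrouped data, and the worked solution from the lecture is not available here. The sketch uses the default "exclusive" method of `statistics.quantiles`, which places the k-th quartile at position k(n + 1)/4 in the sorted data; another convention would give slightly different values.

```python
import statistics

fees = [10.3, 7.7, 11.7, 4.9, 8.9, 6.3]

# Range: largest value minus smallest value.
data_range = max(fees) - min(fees)

# quantiles(..., n=4) returns [Q1, Q2, Q3] using the "exclusive" method.
q1, q2, q3 = statistics.quantiles(fees, n=4)
iqr = q3 - q1

print(round(data_range, 2))                      # 6.8
print(round(q1, 2), round(q2, 2), round(q3, 2))  # 5.95 8.3 10.65
print(round(iqr, 2))                             # 4.7
```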
Inter-quartile range for grouped data
Similar to the case of ungrouped data, the IQR is given by the
difference between the third and first quartiles, that is;
$IQR = Q_3 - Q_1$
The first quartile is calculated as;
$Q_1 = L_{q1} + \left(\frac{\frac{n}{4} - c.f}{f_{q1}}\right) h$
Where by;
Lq1 – lower class boundary of the 1st quartile class
h – size of the 1st quartile class interval
n – total number of observations
fq1 – frequency of the 1st quartile class
c.f – sum of the frequencies of all classes below the 1st quartile class
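The third quartile is calculated as;
$Q_3 = L_{q3} + \left(\frac{\frac{3n}{4} - c.f}{f_{q3}}\right) h$
Where by;
Lq3 – lower class boundary of the 3rd quartile class
h – size of the 3rd quartile class interval
n – total number of observations
fq3 – frequency of the 3rd quartile class
c.f – sum of the frequencies of all classes below the 3rd quartile class
Note: The 3rd quartile class is the first class interval whose
cumulative frequency is greater than or equal to ¾ of the total
observations.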
Example
Consider the following frequency table for 100 test scores and compute:
first quartile, third quartile and IQR.
Class interval   Frequencies
5 – 6            10
7 – 8            6
9 – 10           11
11 – 12          10
13 – 14          25
15 – 16          16
17 – 18          8
19 – 20          14
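The grouped quartiles for this example can be sketched in Python. The sketch assumes the interpolation formula implied by the symbol definitions above (lower class boundary, class size, class frequency and cumulative frequency below the quartile class); the function name `grouped_quantile` is illustrative.

```python
from itertools import accumulate

# Lower class boundaries and frequencies from the test-score table.
lower_boundaries = [4.5, 6.5, 8.5, 10.5, 12.5, 14.5, 16.5, 18.5]
frequencies = [10, 6, 11, 10, 25, 16, 8, 14]
class_width = 2.0

def grouped_quantile(lowers, freqs, width, fraction):
    """Interpolated quantile of grouped data, e.g. fraction=0.25 for Q1."""
    n = sum(freqs)
    target = fraction * n
    cumulative = list(accumulate(freqs))
    # Quantile class: first class whose cumulative frequency is >= target.
    k = next(i for i, cf in enumerate(cumulative) if cf >= target)
    cf_before = cumulative[k - 1] if k > 0 else 0
    return lowers[k] + (target - cf_before) / freqs[k] * width

q1 = grouped_quantile(lower_boundaries, frequencies, class_width, 0.25)
q3 = grouped_quantile(lower_boundaries, frequencies, class_width, 0.75)
iqr = q3 - q1

print(round(q1, 3), round(q3, 3), round(iqr, 3))  # 10.136 16.125 5.989
```

Here the 1st quartile class is 9 – 10 (cumulative frequency 27 ≥ 25) and the 3rd quartile class is 15 – 16 (cumulative frequency 78 ≥ 75).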
Box plot, Range, IQR and Quartiles
Inter-quartile range and Outliers
Task
With examples show how to compute each of the
following measures for the case of ungrouped and
grouped data.
i. Inter-decile range
ii. Inter-percentile range
Mean deviation (MD)
For ungrouped data
Mean deviation for grouped data
$MD = \frac{\sum_{i=1}^{n} f_i \times |X_i - \bar{X}|}{\sum_{i=1}^{n} f_i}$
Example
Consider the following frequency table for 100 test scores and
compute mean deviation
Class interval   Frequencies
5 – 6            10
7 – 8            6
9 – 10           11
11 – 12          10
13 – 14          25
15 – 16          16
17 – 18          8
19 – 20          14
Class interval   Boundaries     Class mark (X)   Deviation d = |X − 13.18|   Frequencies (f)   f×d
5 – 6            4.5 – 6.5      5.5              7.68                        10                76.80
7 – 8            6.5 – 8.5      7.5              5.68                        6                 34.08
9 – 10           8.5 – 10.5     9.5              3.68                        11                40.48
11 – 12          10.5 – 12.5    11.5             1.68                        10                16.80
13 – 14          12.5 – 14.5    13.5             0.32                        25                8.00
15 – 16          14.5 – 16.5    15.5             2.32                        16                37.12
17 – 18          16.5 – 18.5    17.5             4.32                        8                 34.56
19 – 20          18.5 – 20.5    19.5             6.32                        14                88.48
Total                                                                        Σf = 100          Σfd = 336.32
$MD = \frac{\sum_{i=1}^{n} f_i \times |X_i - \bar{X}|}{\sum_{i=1}^{n} f_i} = \frac{336.32}{100} = 3.3632$
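The mean deviation table can be verified with a short Python sketch that applies the MD formula directly to the class marks and frequencies. The function name is illustrative.

```python
# Class marks (X) and frequencies (f) from the test-score table.
class_marks = [5.5, 7.5, 9.5, 11.5, 13.5, 15.5, 17.5, 19.5]
frequencies = [10, 6, 11, 10, 25, 16, 8, 14]

def grouped_mean_deviation(marks, freqs):
    """Return sum(f * |X - mean|) / sum(f) for a frequency table."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    return sum(f * abs(x - mean) for f, x in zip(freqs, marks)) / n

md = grouped_mean_deviation(class_marks, frequencies)
print(round(md, 4))  # 3.3632
```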
Sample variance
For the case of ungrouped data
The sample variance (denoted by s²) for a set of n
measurements is equal to the sum of the squared distances
from the mean divided by n − 1. In symbols it is as follows;
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Alternatively the sample variance can be calculated as;
$s^2 = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n - 1}$
Example
Consider the tuition fee data set 10.3, 4.9, 8.9, 11.7, 6.3 and
7.7
Calculate sample variance.
Solution
The sample variance is calculated as;
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
The mean is $\bar{X} = 49.8 / 6 = 8.3$.
Consider the following table
X        X − 8.3    (X − 8.3)²
10.3     2.0        4.00
4.9      −3.4       11.56
8.9      0.6        0.36
11.7     3.4        11.56
6.3      −2.0       4.00
7.7      −0.6       0.36
Total    0          31.84
From the table
$s^2 = \frac{31.84}{6 - 1} = 6.368$
Therefore the sample variance of the given data set is
6.368 (in squared millions).
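The result can be checked with Python's standard `statistics` module, whose `variance` function uses the same n − 1 divisor as the sample variance defined above. The variable names are illustrative.

```python
import statistics

fees = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

# statistics.variance divides the sum of squared deviations by n - 1.
sample_variance = statistics.variance(fees)
# The sample standard deviation is the square root of the sample variance.
sample_sd = statistics.stdev(fees)

print(round(sample_variance, 3))  # 6.368
print(round(sample_sd, 3))        # 2.523
```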
For the case of grouped data
If X1, X2,…,Xm represent the mid points (class marks) of a
distribution table with m classes and corresponding
frequencies f1, f2,…,fm, the sample variance (s²) of the set of
n = Σf grouped measurements having x̅ as the mean is defined
as;
$s^2 = \frac{\sum_{i=1}^{m} f_i (X_i - \bar{X})^2}{n - 1}$
Alternatively sample variance can also be calculated as;
$s^2 = \frac{\sum_{i=1}^{m} f_i X_i^2 - \frac{\left(\sum_{i=1}^{m} f_i X_i\right)^2}{n}}{n - 1}$
Example
Consider the frequency table of 100 test scores from the
previous example and compute sample variance.
Solution
Class interval   Class mark (X)   Frequencies (f)   fX        fX²
5 – 6            5.5              10                55        302.50
7 – 8            7.5              6                 45        337.50
9 – 10           9.5              11                104.5     992.75
11 – 12          11.5             10                115       1322.50
13 – 14          13.5             25                337.5     4556.25
15 – 16          15.5             16                248       3844.00
17 – 18          17.5             8                 140       2450.00
19 – 20          19.5             14                273       5323.50
Total                             Σf = 100          1318      19129
From the Table
$s^2 = \frac{19129 - \frac{1318^2}{100}}{100 - 1} = \frac{1757.76}{99} = 17.755$
Hence the sample variance of the test scores is 17.755
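The grouped sample variance can be recomputed with a short Python sketch using the computational (alternative) form of the formula. The function name is illustrative.

```python
import math

# Class marks (X) and frequencies (f) from the test-score table.
class_marks = [5.5, 7.5, 9.5, 11.5, 13.5, 15.5, 17.5, 19.5]
frequencies = [10, 6, 11, 10, 25, 16, 8, 14]

def grouped_sample_variance(marks, freqs):
    """Return (sum(f X^2) - (sum(f X))^2 / n) / (n - 1)."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, marks))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

variance = grouped_sample_variance(class_marks, frequencies)
std_dev = math.sqrt(variance)

print(round(variance, 3))  # 17.755
print(round(std_dev, 3))   # 4.214
```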
Population variance
The sample standard deviation
For both ungrouped and grouped data, the sample standard
deviation (s) is the positive square root of the corresponding
sample variance;
$s = \sqrt{s^2}$
Population standard deviation
Standard deviation of the population can be obtained by
taking the square root of the population variance as follows;
$\sigma = \sqrt{\sigma^2}$
Coefficient of variation (C.V)
C.V measures the variability of the values in a distribution
relative to the magnitude of the distribution mean.
It is the ratio of the standard deviation to the
magnitude of the arithmetic mean, expressed as a percentage.
$CV = \frac{S}{\bar{X}} \times 100\%$
C.V is unitless, hence it is mostly used to compare
two or more distributions.
When two data sets are compared, the data set with the greater
C.V is said to be more variable (heterogeneous) or less
consistent. In other words, its observations are more dispersed
from the mean.
Example
Two workers on the same job were assessed to determine the
time spent to accomplish the tasks. The following table shows the
results over a long period of time.
                                 Worker A   Worker B
Mean time (Minutes)              36         25
Standard Deviation (Minutes)     6          5
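The comparison can be sketched in Python by applying the C.V formula to each worker. The function name `coefficient_of_variation` is illustrative; the means and standard deviations are those in the table.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return std_dev / mean * 100

cv_a = coefficient_of_variation(6, 36)  # Worker A
cv_b = coefficient_of_variation(5, 25)  # Worker B

print(round(cv_a, 2))  # 16.67
print(round(cv_b, 2))  # 20.0

# The worker with the smaller C.V is the more consistent one.
more_consistent = "Worker A" if cv_a < cv_b else "Worker B"
print(more_consistent)  # Worker A
```

Although Worker B has the smaller standard deviation, Worker B's times are more variable relative to the mean, so Worker A is the more consistent of the two.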
Population coefficient of variation
Population Coefficient of variation (CV) is given by;
$CV = \frac{\sigma}{\mu} \times 100\%$
where σ is the population standard deviation and μ is the population mean.
Measures of shape
Skewness
Skewness refers to lack of symmetry (i.e., asymmetry). Specifically,
it describes the amount and direction of the departure from
horizontal symmetry.
Skewness is also the tendency for values to be more frequent
around the high or low end of the x-axis.
For a symmetric, single-peaked distribution (such as the normal or
bell-shaped distribution),
Mean = Median = Mode
By observation, it may be possible to assess symmetry or normality
in a data set by drawing a histogram or a distribution curve of
the data.
Types of skewness
Skewness can be categorized into two types, that is, positive
skewness and negative skewness.
Positive skewness
Occurs when the distribution curve or histogram has a
longer tail to the right. The median and mode values
are predominantly less than the mean value. Example: income
distribution.
Mean > Median > Mode
Example: Household Income Data
Source of figure: Income distribution (standardised income) | CBS
[Accessed: 2024/03/13]
Negative skewness
Occurs when the distribution curve or histogram has a
longer tail to the left. The median and mode values
are predominantly greater than the mean value. Example: birth
weight distribution.
Mean < Median < Mode
Measures of skewness
Skewness can be measured through approaches such as Karl Pearson's
coefficient of skewness and Bowley's coefficient of skewness.
Karl Pearson’s coefficient of skewness
It is based on the measures of central tendency and standard
deviation. It is given by;
$Sk_p = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$
Properties
In practice, Skp usually lies between −1 and 1.
When:
Skp = 0 :- Symmetrical distribution
Skp > 0 :- Positively skewed distribution
Skp < 0 :- Negatively skewed distribution
Example
Calculate Karl Pearson’s Coefficient of Skewness from the
following data set
Bowley’s coefficient of skewness
It is based on quartiles.
For a symmetrical distribution, Q1 and Q3 are equidistant
from Q2 (Median).
Thus (Q3 − Q2) − (Q2 − Q1) can be taken as an absolute measure of
skewness. Dividing it by the inter-quartile range gives the
relative measure, that is
$S_b = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$
Properties
−1 ≤ Sb ≤ 1
When:
Sb = 0 :- Symmetrical distribution
Sb > 0 :- Positively skewed distribution
Sb < 0 :- Negatively skewed distribution
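Both coefficients of skewness can be sketched as small Python functions. The Pearson function follows the (Mean − Mode) / Standard Deviation formula given earlier; the Bowley function assumes the usual quartile-based form, (Q3 + Q1 − 2·Q2) / (Q3 − Q1). The function names and the input values are illustrative only.

```python
def pearson_skewness(mean, mode, std_dev):
    """Karl Pearson's coefficient: (mean - mode) / standard deviation."""
    return (mean - mode) / std_dev

def bowley_skewness(q1, q2, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)

# Illustrative inputs (not taken from the lecture examples).
print(pearson_skewness(12, 10, 4))  # 0.5  -> positively skewed
print(bowley_skewness(1, 2, 3))     # 0.0  -> symmetrical
print(bowley_skewness(1, 2, 5))     # 0.5  -> positively skewed
print(bowley_skewness(1, 4, 5))     # -0.5 -> negatively skewed
```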
Kurtosis
Kurtosis measures the degree of peakedness (the height of the
peak) of the curve describing a data distribution.
Kurtosis tells us whether the distribution, if plotted on a graph,
would give us a normal curve, a curve that is flatter than a
normal curve or a curve that is more peaked than a normal curve.
There are three broad patterns of peakedness of a distribution,
namely;
i. Leptokurtic
ii. Mesokurtic
iii. Platykurtic
A peaked curve is termed leptokurtic and is said to possess excess
kurtosis, or to have positive kurtosis.
Measuring Kurtosis
Kurtosis can be estimated as follows;
$\text{Kurtosis} = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{s}\right)^4$
Properties
When:
Kurtosis > 3 :- Leptokurtic (Positive kurtosis)
Kurtosis = 3 :- Mesokurtic
Kurtosis < 3 :- Platykurtic (Negative kurtosis)
Example
Given the numbers 2, 3, 2, 8, and 10. Find the kurtosis and
state whether it is leptokurtic, mesokurtic or platykurtic.
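The example can be worked through with a short Python sketch that applies the kurtosis estimate given earlier, with s taken as the sample standard deviation (n − 1 divisor). The function names are illustrative, and the comparison against 3 follows the classification stated in these notes.

```python
import statistics

def kurtosis(values):
    """Return (1 / (n - 1)) * sum(((x - mean) / s) ** 4), s = sample SD."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return sum(((x - mean) / s) ** 4 for x in values) / (n - 1)

def classify(k):
    """Compare the kurtosis with the reference value 3."""
    if k > 3:
        return "leptokurtic"
    if k < 3:
        return "platykurtic"
    return "mesokurtic"

numbers = [2, 3, 2, 8, 10]
k = kurtosis(numbers)

print(round(k, 4))  # 1.1276
print(classify(k))  # platykurtic
```

Here the mean is 5, s² = 56/4 = 14 and the sum of fourth powers of the deviations is 884, so the kurtosis is (1/4) × 884/196 ≈ 1.13, which is less than 3.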
The End