Application of Statistical Methods Topic 1 4m 5
APPLICATION OF STATISTICAL
METHODS
Statistics refers to a scientific and
systematic (organized) method of collecting,
recording, summarizing, analyzing and
presenting numerical data in a precise
manner.
Or
• The study of methods of collecting, recording,
summarizing, analyzing and presenting
data in a precise manner by using numbers.
Or
• A science of observing, collecting, recording,
summarizing, analyzing and presenting
data in a precise manner by using numbers.
NATURE OF DATA
20 – 29 17,987 13,098
30 – 39 16,876 17,654
According to scale of measurement
• Nominal data
Data whose values are given
according to the names of the items in a given
sample, e.g. 10 apples, 5 oranges, 7 mangoes,
5 bananas and 2 cherries.
• Ordinal data
DATA
The information which is presented
in numerical form is known as
statistical data.
• Primary data
• Secondary data
Primary data
These are numerical facts collected from the
field or handled for the first time, i.e. they are
first-hand or original information. Such data
are not available in existing sources like
books. Primary statistical data are collected
through interviews, questionnaires,
observation, counting, measurement
and other methods.
Secondary data
These are numerical facts derived from
stored sources. The data were compiled by
other people who carried out research. The
sources of this type of data include
textbooks, reference books, magazines, maps,
video tapes, audio tapes and other similar
sources.
SOURCES OF STATISTICAL DATA
Secondary sources
This involves seeking data that have already been gathered
from the field.
• The collection involves reference to past
publications and official records.
Interview method
Calculated
Categories of statistical measures
Measures of central tendency
Measures of variability
$\bar{X} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
• Where by:
N=9
• Where by;
• X = Class mark
• f = Frequency Real values
• Example:
Find the arithmetic mean for the following
scores of marks.
Class interval    f     X     fx
91 – 95           0     93    0
86 – 90           1     88    88
81 – 85           6     83    498
76 – 80           10    78    780
71 – 75           15    73    1095
66 – 70           34    68    2312
61 – 65           22    63    1386
56 – 60           10    58    580
51 – 55           2     53    106
Solution:-
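The worked solution does not survive in these notes. A minimal sketch of the calculation, assuming the usual grouped-data formula (sum of fx divided by sum of f); the function name is only illustrative:

```python
# Class marks (X) and frequencies (f) copied from the table above.
class_marks = [93, 88, 83, 78, 73, 68, 63, 58, 53]
frequencies = [0, 1, 6, 10, 15, 34, 22, 10, 2]


def grouped_mean(marks, freqs):
    """Arithmetic mean of grouped data: sum(f * X) / sum(f)."""
    total_fx = sum(f * x for f, x in zip(freqs, marks))
    total_f = sum(freqs)
    return total_fx / total_f


print(sum(frequencies))                        # 100
print(grouped_mean(class_marks, frequencies))  # 6845 / 100 = 68.45
```

On these figures the mean score comes out at 68.45.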
2 3
Unimodal
4 2
5 1
6 1
bimodal
(2) 4, 9, 8, 5, 6, 7
The given data set has no mode.
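The idea of unimodal, bimodal and "no mode" data sets can be checked with a short sketch. The helper name `modes` and the first two sample lists are made up for illustration; only the last list comes from the notes.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs only once, so no value predominates.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 2, 2, 4, 4, 5, 6]))  # [2]     one mode (unimodal)
print(modes([1, 3, 3, 5, 7, 7, 9]))  # [3, 7]  two modes (bimodal)
print(modes([4, 9, 8, 5, 6, 7]))     # []      no mode
```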
40 – 44 7
45 – 49 8
50 – 54 11
55 – 59 10
60 – 64 4
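• Solution
• The mode for the given data set above is
calculated as follows:-
• $\text{Mode} = L + \left(\dfrac{t_1}{t_1 + t_2}\right) \times i$
• According to the given data set;
• L = 49.5 (lower boundary of the modal class 50 – 54)
• t1 = 11 – 8 = 3 (modal class frequency minus the frequency of the class before it)
• t2 = 11 – 10 = 1 (modal class frequency minus the frequency of the class after it)
• i = 5
• Then;
• Mode = 49.5 + (3/4 x 5)
• = 49.5 + (0.75 x 5)
• = 49.5 + 3.75 = 53.25
• Thus; the mode = 53.25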
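The same working can be sketched in code. This assumes whole-number class limits (so the lower boundary is the lower limit minus 0.5) and a single modal class; the function name is illustrative.

```python
# Class limits and frequencies from the example above.
intervals = [(40, 44), (45, 49), (50, 54), (55, 59), (60, 64)]
frequencies = [7, 8, 11, 10, 4]


def grouped_mode(intervals, freqs):
    """Mode of grouped data: L + (t1 / (t1 + t2)) * i."""
    k = freqs.index(max(freqs))          # position of the modal class
    lower, upper = intervals[k]
    L = lower - 0.5                      # lower boundary of the modal class
    i = upper - lower + 1                # class interval (width)
    before = freqs[k - 1] if k > 0 else 0
    after = freqs[k + 1] if k < len(freqs) - 1 else 0
    t1 = freqs[k] - before               # excess over the class before
    t2 = freqs[k] - after                # excess over the class after
    return L + (t1 / (t1 + t2)) * i


print(grouped_mode(intervals, frequencies))  # 53.25
```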
• Advantages of a mode
• It helps to determine the
predominance of a certain geographical
feature in a place.
• It helps to know the number of occurrences of the
values in a data set.
• Disadvantages of a mode
• It needs considerable mathematical knowledge to
calculate the mode for a grouped data set.
• It is an unreliable measure of central tendency,
as a data set may have more than one mode.
• MEDIAN
• Median refers to the point value that
divides the values in a
distribution into two equal parts after
they have been arranged in ascending
or descending order.
• Computation of the median
• The computation of the median
chiefly depends on whether the data
set given is ungrouped or grouped.
• For an ungrouped data set,
the calculation of the median should
further take into account whether the
number of values is odd or
even.
• If the ungrouped data set has an odd number
of values, the median is just the middle value,
obtained after the values
have been arranged in
ascending or descending order.
• E.g.
• 1, 2, 1, 4, 6, 5, 3
• Solution
• The ascending order of the values
is as follows:-
• 1, 1, 2, 3, 4, 5, 6
• Thus; the median = 3.
• If the data set has an even number of values,
the median is the average of the two middle
values, obtained after the values have
been arranged in ascending or descending
order.
• E.g.
• 1, 4, 5, 2, 7, 8, 3, 2
• The ascending order for the values is as
follows:-
• 1, 2, 2, 3, 4, 5, 7, 8
• The two middle values are 3 and 4, so (3 + 4) / 2 = 3.5
• Thus; the median = 3.5
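Both cases (odd and even number of values) can be handled by one small routine. This is a sketch of the procedure described above, with an illustrative function name:

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average of two


print(median([1, 2, 1, 4, 6, 5, 3]))     # 3
print(median([1, 4, 5, 2, 7, 8, 3, 2]))  # 3.5
```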
Median determination for the
grouped data
For grouped data, the median is
determined by applying the following
formula:-
$\text{Median} = L + \left(\dfrac{\frac{N}{2} - nb}{nw}\right) \times i$
Where by:-
L = the lower boundary of the median
class
N = total number of observations
nb = the number of elements in the
classes below the median class
nw = the number of elements in the
median class
i = class interval
Example:-
The table below shows the scores
(marks) in geography for Form IV
students.
Class interval Frequency
40 – 44 7
45 – 49 8
50 – 54 11
55 – 59 10
60 – 64 4
Solution:-
According to the given data
L = 49.5
N = 40
nb = 15
nw = 11
i = 5
Median = 49.5 + ((40/2 – 15) / 11 x 5)
= 49.5 + (5/11 x 5)
= 49.5 + (0.45 x 5)
= 49.5 + 2.25 = 51.75
Thus the median = 51.75 (here 5/11 is rounded to 0.45; without that rounding the result is about 51.77)
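A sketch of the grouped-median working in code, assuming whole-number class limits (lower boundary = lower limit minus 0.5); the function name is illustrative. The notes round 5/11 to 0.45 before multiplying, which gives 51.75; the code keeps full precision, so it returns about 51.77.

```python
# Class limits and frequencies from the example above.
intervals = [(40, 44), (45, 49), (50, 54), (55, 59), (60, 64)]
frequencies = [7, 8, 11, 10, 4]


def grouped_median(intervals, freqs):
    """Median of grouped data: L + ((N/2 - nb) / nw) * i."""
    N = sum(freqs)
    if N == 0:
        raise ValueError("frequency table is empty")
    half = N / 2
    nb = 0                                   # elements below the current class
    for (lower, upper), nw in zip(intervals, freqs):
        if nb + nw >= half:                  # this is the median class
            L = lower - 0.5                  # lower boundary
            i = upper - lower + 1            # class interval (width)
            return L + ((half - nb) / nw) * i
        nb += nw
    raise ValueError("median class not found")


print(round(grouped_median(intervals, frequencies), 2))  # 51.77
```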
Advantages of median
• It helps to identify the middle value among
the numerous values in a data set.
• It is easy to determine, particularly
for a simple data set.
Disadvantages of the median
• If the values are numerous, it becomes
cumbersome to arrange them in ascending or
descending order to get the median.
• It needs considerable skill to determine the median for a
grouped data set.
MEASURES OF VARIABILITY
These are measures which assess the
variation / difference among the values in a data
set. The common measures of variability
include the following:-
o Range
o Standard deviation
o Variance
o Mean deviation
RANGE
Range is the difference between the
highest and lowest values in a
given distribution.
It is used to assess the
variation / difference between the
highest score and the lowest score.
Calculation of the range
Calculation of a range also depends on
whether the data set given
is ungrouped or grouped.
For an ungrouped data set, the range is
calculated by subtracting the lowest
value from the highest value in the
data set.
Example:-
Determine the range for the
following data set: 4, 2, 3, 5, 6, 4, 8
Solution
The range for the data set given
is computed as follows:-
Range = Highest value – lowest
value
According to the given data set:-
· Highest value = 8
· Lowest value = 2
· 8 – 2 = 6
· Thus; the range = 6
A high range implies greater
variation among the values. A small
range implies that there is small
variation.
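For an ungrouped data set the calculation is a single subtraction, sketched here with an illustrative function name:

```python
def data_range(values):
    """Range of ungrouped data: highest value minus lowest value."""
    return max(values) - min(values)


print(data_range([4, 2, 3, 5, 6, 4, 8]))  # 8 - 2 = 6
```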
For grouped data, the range is the difference between
the highest and the lowest class marks.
Class interval    Class mark
10 – 14           12
15 – 19           17
20 – 24           22
25 – 29           27
30 – 34           32
35 – 39           37
According to the computed class marks
· Highest class mark = 37
· Lowest class mark = 12
37 – 12 = 25,
Thus, the range = 25
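The class marks and the grouped range can be reproduced as follows. This is a sketch which assumes that the class mark is the midpoint of the class limits, which matches the figures above; the names are illustrative.

```python
# Class limits from the grouped example above.
intervals = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]


def class_mark(lower, upper):
    """Midpoint of a class interval."""
    return (lower + upper) / 2


marks = [class_mark(lo, hi) for lo, hi in intervals]
grouped_range = max(marks) - min(marks)

print(marks)          # [12.0, 17.0, 22.0, 27.0, 32.0, 37.0]
print(grouped_range)  # 25.0
```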
• Advantages of a range
• Range gives a quick rough estimate of
variability.
• It is simple to calculate and most people are
familiar with it.
• Disadvantages of a range
• It considers only two values, the highest and
the lowest, and thus is not sensitive to the whole
distribution.
• It is affected by extreme values.
Standard deviation
• DEVIATION
Deviation is the difference between a value
and the mean. It is computed by subtracting the
mean from the value:
$d = X - \bar{X}$
• Whereby:-
• $X$ = value given in a distribution
• $\bar{X}$ = average (mean) of all values
Standard deviation
Refers to the typical difference of the
values from the mean. It is the root mean
square deviation from the mean. It is the
measure which determines how far
the values are scattered from the mean.
Standard deviation is represented by
the sigma symbol, σ.
• Computation of a standard deviation
• Calculation of a standard deviation also
depends on whether the data set
given is ungrouped or grouped.
• For ungrouped data, standard
deviation is calculated by the following
formula:
$\sigma = \sqrt{\dfrac{\sum (X - \bar{X})^2}{N}}$
Where by:-
$X$ = value in a distribution
$\bar{X}$ = mean of the values
$N$ = total number of values
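A minimal sketch of the ungrouped calculation, taking the "root mean square deviation from the mean" literally, i.e. dividing by the number of values N (the sample form, which divides by N – 1, is not used here). The scores are made up for illustration and do not come from the notes.

```python
import math


def std_dev(values):
    """Root mean square deviation of the values from their mean."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = [(x - mean) ** 2 for x in values]   # (X - mean)^2 for each value
    return math.sqrt(sum(squared_devs) / n)


scores = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data, mean = 5
print(std_dev(scores))              # sqrt(32 / 8) = 2.0
```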
X 42 47 52 57 62
= 40
2 + 1 + 2 +4 + 3 + 0 = 12
Example:-
Class interval Frequency
40 – 44 7
45 – 49 8
50 – 54 11
55 – 59 10
60 – 64 4
Determination of the mean
Class interval    F     X     FX
40 – 44           7     42    294
45 – 49           8     47    376
50 – 54           11    52    572
55 – 59           10    57    570
60 – 64           4     62    248
Total             40          2060
Hence; the mean = 2060 / 40 = 51.5
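Determination of the
deviations.
• Where by:
• X = Class mark
• D = size of the deviation of X from the mean (sign ignored)
X     X – mean       D       F     FD
42    42 – 51.5      9.5     7     66.5
47    47 – 51.5      4.5     8     36
52    52 – 51.5      0.5     11    5.5
57    57 – 51.5      5.5     10    55
62    62 – 51.5      10.5    4     42
· The sum of (FD) determination
66.5 + 36 + 5.5 + 55 + 42 = 205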
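The deviation table can be rebuilt from the class marks and the frequencies in the example's frequency table (7, 8, 11, 10, 4). The last two steps are an assumption about where the working leads: the notes stop at the sum, so the mean deviation and the grouped standard deviation below use the standard textbook formulas rather than anything shown here.

```python
import math

# Class marks and frequencies from the example above.
class_marks = [42, 47, 52, 57, 62]
frequencies = [7, 8, 11, 10, 4]

N = sum(frequencies)                                              # 40
mean = sum(f * x for f, x in zip(frequencies, class_marks)) / N   # 51.5

# D: size of each deviation from the mean, sign ignored.
abs_devs = [abs(x - mean) for x in class_marks]
fd = [f * d for f, d in zip(frequencies, abs_devs)]
sum_fd = sum(fd)

print(abs_devs)  # [9.5, 4.5, 0.5, 5.5, 10.5]
print(fd)        # [66.5, 36.0, 5.5, 55.0, 42.0]
print(sum_fd)    # 205.0

# Assumed continuation (standard formulas, not shown in the notes):
mean_deviation = sum_fd / N                                       # 5.125
variance = sum(f * (x - mean) ** 2
               for f, x in zip(frequencies, class_marks)) / N     # 38.5
std_deviation = math.sqrt(variance)                               # about 6.2
print(mean_deviation, variance, round(std_deviation, 2))
```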
Group Line Graph showing Maize production by three villages between 2000 and 2002
Advantages of Group Line Graph
1. The quantity of each component is shown clearly by different line
shadings.
2. Time and space are saved since all the line graphs are drawn at
once as a group.
Construction: