
STA 111 (INTRODUCTION OF STATISTICS)

By

Akeyede, I. and Nja, M.E.

Department of Mathematics

Federal University Lafia

1.0 Introduction

1.1 What is Statistics?

Statistics is the branch of science concerned with making effective use of numerical data
relating to groups of individuals or experiments. It deals with all aspects of such data,
including not only their collection, observation, recording, analysis and interpretation,
but also the planning of data collection through the design of surveys and experiments.

Statistics is a word derived from 'status' meaning 'state', 'condition'. The study of statistics
was popularized during the Roman Empire. In the days of the Roman Empire, statistics
was regarded as the science that studied the state population, economic resources, social
problems, politics etc. The theory of probability in statistics began in the 17th century and
provided the mathematical foundation for statistics.

The purpose of statistics is to make decisions concerning an entire population of
measurements on the basis of information available from a relatively small sample.
Sample and population are therefore basic to all statistical work.

1.1.1 Population: The word population is used to mean the entire set of individuals or
items of interest.

1.1.2 Sample: A subset of the population is called a sample. It is a selected part
of the population.

In practice, the population of interest is often too large or too scattered to allow
measurements to be made on all the individuals. When a complete census is not feasible,
the next option is to select part of the population, called a sample. The investigation is
carried out thoroughly on the sample of individuals, recording all relevant measurements
and information. Finally, the information gained from the sample, after all necessary
calculations, is used to make inferences concerning the population. Because the sample
information is not perfect, there is a risk of making incorrect inferences about the
population. Part of the method of statistical inference is to minimize such risks.
1.1.3 Example: Suppose that a medical researcher has developed a new vaccine for
HIV/AIDS. The population of interest may include not only those individuals who are
infected but also those who are at risk of contracting the disease in future. It is clearly
impossible for the researcher to investigate the effect of the vaccine on every individual
in the population. The statistical approach to this problem is to try the vaccine on a
randomly selected sample of individuals and use the results to make inferences regarding
its overall effectiveness.

Numerical measures are used to describe both populations and samples. The measures
used to describe populations are called parameters, while those that describe samples are
called statistics. Parameter values are usually unknown; however, values of statistics can
be calculated from the sample measurements. Using the calculated statistical values,
parameter values can be estimated. The use of calculated statistical values to estimate
unknown parameter values is an important part of statistical inference.

Note that the calculated value of a statistic serves as an estimate of the parameter, and
may not be equal to the parameter.

1.1.4 Example: The sample mean ( x̄ ) is a statistic that estimates the population mean
( µ ), a parameter, and the sample proportion ( p̂ ) is a statistic that estimates the
population proportion (P), a parameter.

1.1.5 Example: The campaign manager for a candidate say 'D' in a student union election
is interested in unknown parameter P, the proportion of students familiar with candidate
'D'. The school is large and the campaign manager does not have available the necessary
time or resources for a complete polling of the students' population. What might he do?
Solution:
The campaign manager might take a sample of randomly selected students and
determine how many are familiar with candidate D. If, for instance, a sample of 200
students includes exactly 80 who are familiar with the candidate, then the sample
proportion p̂ = 80/200 = 0.4 is a statistic that can serve as an estimate of the
unknown population parameter P.

1.2 Descriptive and inferential statistics:


When presented with a set of measurements, there is a need to organize and summarize the
information. The branch of statistics that presents techniques for describing sets of
measurements is called descriptive statistics. Descriptive summaries may be presented in
many forms, such as bar charts, pie charts or line charts.
1.2.1 Definition: Descriptive statistics consists of procedures used to summarize and
describe the important characteristics of a set of measurements. Because it may be too
expensive or time consuming to enumerate the entire population, conclusions about the
population are often drawn from the descriptive statistics of a sample.

1.2.2 Definition: Inferential statistics consists of procedures used to make inferences
about population characteristics from information contained in a sample drawn from
the population.

The objective of inferential statistics is to make inferences (i.e. draw conclusions, make
predictions, make decisions) about the characteristics of a population from information
contained in a sample.

In this case, inferences about the data are made after the data are summarized and
analysed. These inferences may take the form of answering Yes/No questions about the data
(Hypothesis Testing), estimating numerical characteristics of the data (Estimation),
describing relationships within the data (Correlation), modelling relationships within the
data (Regression), and extrapolation, interpolation or other modelling techniques (like
ANOVA and Time Series).
1.2.3 Data: Data are facts that come up as a result of statistical inquiries or surveys.
Information such as age, weight, prices of commodities, and the number of people and
houses in a community are statistical data. Data are the essential information needed in
statistics.
1.3 Sources of Statistical Data:
The data we collect for any statistical study may be classified into qualitative or
quantitative data. Qualitative data are those used for describing characteristics which
cannot be defined in numerical terms. For example colour of hair, colour of eyes,
defective or non-defective, performance graded as excellent, good, average, poor. These
characteristics are called attributes. Qualitative data may be divided into nominal or
ordinal data. Ordinal data have an inherent ordering while nominal data do not have any
ordering.
Quantitative data are data that are capable of numerical description. Examples include
data on heights of students in metres, wages of workers in naira, scores of students in
percentages etc. Quantitative data are divided into discrete and continuous data. Discrete
data take on only a finite or countable (equivalent to the set of integers) number of
values. Examples are the number of children in a household, the number of visits to a
doctor in a year, etc. Discrete data are integers or fractions. Continuous data are data
that can take on any real value in an interval or over the whole real number line.
Examples are age, height, heart rate, blood pressure, etc.
Data collected by the investigator himself for the purpose of investigation are called
primary data, while data collected from already published sources are called secondary
data.

1.4 Method of Primary data collection:


(i) Interview method
This involves collection of data by trained personnel called enumerators. These agents
visit their informants in their houses or offices, in the markets or on the street and ask the
necessary questions. This method is used for statistical inquiries where personal contact
with the respondent is necessary.

Disadvantages of this method include the high cost of conducting the survey, its
time-consuming nature, the possibility that informants refuse to respond to questions,
and the absence of respondents from their homes.

(ii) Mailing of questionnaire


In the mailing of questionnaire method, questions are mailed to the informant, who fills
in the form unaided and returns the completed form to the office of origin. This method is
cheap, gives enough time to the respondent, and avoids interviewer bias.

Disadvantages include non-response by informants and the absence of enumerators to
explain intricate questions to the informants. Sometimes the wrong person may complete
the form, which leads to unreliable information.

(iii) Method of registration


In the registration method data are collected by keeping records of events immediately
they occur. Examples of this method include registration of births, deaths, marriages,
divorces, immigration and emigration, motor accidents, etc. This method is more efficient
in developed countries than in developing countries.

1.5 Methods of collection of secondary data


Any data collected from published sources are called secondary data. Such data may
include indicators like the population and age distribution of a country, published by
the country in its statistical books. The major advantage of this method of data
collection is that it is less expensive.

1.5.1 Problems associated with secondary data


(i) Secondary data may be out of date or may not relate to the required period.
(ii) Sample size of secondary data may be too small to make any reliable inference.
(iii) Secondary data may not include when and where it was collected.
1.6 Presentation of Data
After the data have been collected, they are consolidated and summarized to show the
values that have been measured and how often each value occurs. A statistical table or
chart is then constructed to display the data as a data distribution.

1.6.1 Example: The data below shows the distribution of students in 11 Departments at
Federal University Lafia in the 2011/2012 academic session.

Table 1.1: Maiden Matriculation of Students at Federal University Lafia in 11
Departments in 2011/2012 Academic Session

Department                  No of Students
Biology                     38
Chemistry                   15
Computer Science            24
Economics                   19
English                     22
History                     10
Mathematics                 11
Physics                     14
Political Science           41
Sociology                   38
Visual and Creative Arts    15
Total                       247

1.6.2 Definition: A pie chart consists of a circle divided into sectors, where the angle of
each sector is proportional to the size of the data it represents.
1.6.3 Example: To draw the pie chart of Example 1.6.1, the following computations are
carried out:

Biology = (38/247) × 360° ≈ 55°, Chemistry = (15/247) × 360° ≈ 22°

The remaining computations are carried out in the same way, and the results are shown
in the table below and in the pie chart.

Department                  Angle
Biology                     55°
Chemistry                   22°
Computer Science            35°
Economics                   28°
English                     32°
History                     15°
Mathematics                 16°
Physics                     20°
Political Science           60°
Sociology                   55°
Visual and Creative Arts    22°

Fig. 1.1 Pie Chart of the enrolment of students at various Departments in Federal
University Lafia in 2011/2012 academic session (in angles).
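
The angle computations for Table 1.1 can be checked with a short script. The following is a minimal sketch in Python; the function name `pie_angles` and the choice to round to whole degrees are assumptions made for illustration, not part of the course notes.

```python
# Enrolment counts copied from Table 1.1.
enrolment = {
    "Biology": 38, "Chemistry": 15, "Computer Science": 24, "Economics": 19,
    "English": 22, "History": 10, "Mathematics": 11, "Physics": 14,
    "Political Science": 41, "Sociology": 38, "Visual and Creative Arts": 15,
}


def pie_angles(counts):
    """Return the sector angle, in whole degrees, for each category.

    Hypothetical helper: angle = (count / total) * 360, rounded.
    """
    total = sum(counts.values())
    return {name: round(count / total * 360) for name, count in counts.items()}


angles = pie_angles(enrolment)
for name, angle in angles.items():
    print(f"{name}: {angle} degrees")
```

Rounded angles do not always add up to exactly 360 for every data set, so it is worth checking the total before drawing the chart.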

1.6.3 Definition: A bar chart consists of rectangular bars which may be vertical or
horizontal and are proportional to the magnitude under consideration.
1.6.4 Example: To draw the bar chart of Example 1.6.1 each bar is proportional to the
magnitude considered.

[Figure: bar chart of number of students (vertical axis, 0 to 45) for each department]

Fig. 1.2 Bar Chart of the enrolment of student in Departments at Federal University Lafia
in 2011/2012 academic session.

Sometimes a pie chart can be drawn using angles as shown in the example below.

1.6.5 Example: A family shared out available money of N400 for the month as follows:
Food N180
School fees N100
Health care N50
Rent N60
Incidentals N10
(i) Represent the above information in angles on a pie chart.
(ii) What percentage of the income is given to school fees?
Solution: Total income = N400

(i) Food = (180/400) × 360° = 162°
School fees = (100/400) × 360° = 90°
Health care = (50/400) × 360° = 45°
Rent = (60/400) × 360° = 54°
Incidentals = (10/400) × 360° = 9°

[Figure: pie chart with sectors Food (162°), School fees (90°), Rent (54°), Health care (45°), Incidentals (9°)]

(ii) Percentage given to school fees = (90/360) × 100 = 25%

1.6.6 Raw Data: Data that have been collected but not organized numerically are called raw data.

1.6.7 Example: The scores of 20 students in a statistics examination are shown below: 52,
14, 23, 53, 21, 13, 27, 17, 44, 74, 91, 92, 19, 48, 63, 80, 70, 64, 50, 57.

1.6.8 Array: An array is an arrangement of raw data numerically in ascending or
descending order of magnitude.

1.6.9 Example: Consider the scores of 20 students in a statistics examination in the
example above and arrange the data in an array.

Solution: 13, 14, 17, 19, 21, 23, 27, 44, 48, 50, 52, 53, 57, 63, 64, 70, 74, 80,
91, 92.

1.6.6 Frequency Distribution


In summarizing large masses of data, it is often convenient to distribute the data into
classes or categories and find the number of observations falling into each class. This
number is called the class frequency. A table of classes together with their class
frequencies is called a frequency distribution.
1.6.7 Example: A frequency distribution of the scores of 20 students in statistics in the
example above is given below.
Scores (%)    Tally      Frequency
1 - 20        ||||       4
21 - 40       |||        3
41 - 60       ||||| |    6
61 - 80       |||||      5
81 - 100      ||         2
Data organized and summarized as in the above table is called grouped data.
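
The tallying above can be reproduced programmatically. This is a minimal Python sketch; the function name `frequency_distribution` and the representation of classes as (lower limit, upper limit) pairs are assumptions for illustration.

```python
# Raw scores of the 20 students from the example above.
scores = [52, 14, 23, 53, 21, 13, 27, 17, 44, 74,
          91, 92, 19, 48, 63, 80, 70, 64, 50, 57]

# Class limits (inclusive) used in the frequency table.
class_limits = [(1, 20), (21, 40), (41, 60), (61, 80), (81, 100)]


def frequency_distribution(data, limits):
    """Count how many observations fall inside each class (limits inclusive)."""
    return [sum(lower <= x <= upper for x in data) for lower, upper in limits]


frequencies = frequency_distribution(scores, class_limits)
for (lower, upper), f in zip(class_limits, frequencies):
    print(f"{lower} - {upper}: {f}")
```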

1.6.8 Class interval, class limit and class boundary


In the above example 1 - 20 denotes a class interval; 1 is the lower class limit and 20
is the upper class limit. When the scores are treated as rounded values, the class 1 - 20
actually covers the range from 0.5 to 20.5. The numbers 0.5 and 20.5 are called class
boundaries: 0.5 is the lower class boundary and 20.5 the upper class boundary.

1.6.9 Class Mark: The arithmetic mean of the upper and lower class limits or the upper
and lower class boundaries is called class mark.

1.6.10 Example: The class mark of the class 21 – 40 is 30.5.

1.6.11 Histogram
A histogram is made up of rectangles or bars of equal width with no space or gaps
between bars. The heights of rectangles or bars correspond to the class frequencies.
The class boundaries are marked on the horizontal axis while the frequencies are
marked on the vertical axis.

1.6.12 Example: Construct a histogram using the example on the scores of 20 students
in an examination.

[Figure: histogram with frequency on the vertical axis and class boundaries (0.5, 20.5, 40.5, 60.5, 80.5, 100.5) on the horizontal axis]

1.6.13 Frequency Polygon: A line graph of frequency against class mark. It is obtained
by joining the midpoints of the tops of the rectangles of a histogram.

1.6.14 Cumulative Frequency Curve


The total frequency of all values less than the upper class boundary of the given class
interval is called the cumulative frequency.

1.6.15 Example: The following table gives the cumulative frequencies of the scores of 20
students in an examination.

Scores      Frequency   Cumulative frequency   Class Boundary
1 - 20      4           4                      0.5 - 20.5
21 - 40     3           7                      20.5 - 40.5
41 - 60     6           13                     40.5 - 60.5
61 - 80     5           18                     60.5 - 80.5
81 - 100    2           20                     80.5 - 100.5

[Figure: cumulative frequency (vertical axis, 0 to 20) plotted against upper class boundaries (horizontal axis: 20.5, 40.5, 60.5, 80.5, 100.5)]

The graph of cumulative frequency against upper class boundaries is called an ogive.
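
As a quick check, the cumulative frequencies and the points to plot on the ogive can be generated from the class frequencies. This is a sketch in Python; the variable names are illustrative, and the upper class boundary is taken as the upper class limit plus 0.5, as in section 1.6.8.

```python
from itertools import accumulate

# Class frequencies and upper class limits from the table above.
frequencies = [4, 3, 6, 5, 2]
upper_limits = [20, 40, 60, 80, 100]

# Running total of the frequencies gives the cumulative frequency column.
cumulative = list(accumulate(frequencies))

# Upper class boundary = upper class limit + 0.5 for these whole-number scores.
upper_boundaries = [limit + 0.5 for limit in upper_limits]

# Points (upper class boundary, cumulative frequency) to plot on the ogive.
ogive_points = list(zip(upper_boundaries, cumulative))
print(ogive_points)
```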

EXERCISES
1. Consider the budget allocations of a Local Government Council

Ministry or Division     Revenue Allocated
Education                900,000
Health                   720,000
Utilities                280,000
Roads                    420,000
Agriculture              480,000
Capital Development      1,400,000
Housing                  600,000
Represent the above information on a pie chart.

2. In an examination, the marks scored by 40 candidates were as follows:


5 27 33 15 45 67 72 56
70 48 42 30 38 24 63 40
41 50 46 34 26 48 36 65
16 8 48 32 35 43 12 55
54 57 25 35 18 60 58 41
(i) Prepare a frequency distribution table, using intervals 1-10, 11-20, etc;
(ii) Represent the data on a histogram.

3. The age distribution of 20 victims of an accident is given in the table below.

Age            1-8   9-16   17-24   25-32   33-40   41-48   49-56
No of people   3     4      3       5       3       1       1
(i) Draw a histogram using the information.
(ii) Make a cumulative frequency table for the distribution.
(iii) Draw a cumulative frequency curve (ogive).

4. The table below shows the distribution of the weight in kg of 50 residents
of an estate.

Weight      21-30   31-40   41-50   51-60   61-70   71-80
Frequency   6       8       12      15      5       4
(i) Construct a cumulative frequency table for the distribution.
(ii) Draw a cumulative frequency curve of the distribution.

2.0 MEASURES OF CENTRAL TENDENCY

2.1 ARITHMETIC MEAN

The Arithmetic Mean is the most popular measure of central tendency. It is
popularly known as the average or mean. It is represented by $\mu$ for data from a
population and by $\bar{x}$ for data from a sample. The Arithmetic Mean is calculated by
taking the sum of all values of the items divided by the number of items.

The notation for the Arithmetic Mean of unordered (raw) data and of frequency
distribution data is as follows:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$ for samples from raw data, and

$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{\sum f_i}$ for samples from discrete (ungrouped) and continuous
(grouped) distributions, where $x_i$ is the ith observation, $f_i$ is the corresponding ith
frequency, and $n$ and $\sum f_i$ are the total number of observations for raw data and
frequency distribution data respectively.

Example 2.1:

Find the arithmetic mean of the following ages (in months) of children brought for an
immunization:
6, 12, 4, 16 and 2

Solution:

$\bar{x} = \frac{6 + 12 + 4 + 16 + 2}{5}$ = 40/5 = 8.

The Arithmetic Mean of the ages of the five children is 8 months.

When a set of data is large, it is often necessary to form a frequency
table. The format of frequency tables for grouped and ungrouped data was
discussed in section 1.6. The method of obtaining their arithmetic mean is shown
in sections 2.1.1 and 2.1.2 of this chapter.

2.1.1 ARITHMETIC MEAN OF DISCRETE FREQUENCY DISTRIBUTION

Obtain the arithmetic mean for the data in Table 2.1 below.

Table 2.1: The Frequency Distribution of Ages of Children Who Died in a Motor Accident
Involving a Luxury Bus.

Age (in years)   1   2   3   4   5   6   7
Freq             2   7   5   4   9   7   6

Solution

Age (x)   Freq. (f)   fx
1         2           2
2         7           14
3         5           15
4         4           16
5         9           45
6         7           42
7         6           42
Total     40          176

$\bar{x} = \frac{\sum f x}{\sum f}$ = 176/40 = 4.4.

The arithmetic mean of their ages is therefore 4.4 years, or approximately 4 years.

2.1.2 ARITHMETIC MEAN FROM CONTINUOUS FREQUENCY DISTRIBUTION (GROUPED DATA)

Calculate the Arithmetic Mean of the data in Table 2.2.

Table 2.2: The Weekly Profit in Naira from a Super Market

Weekly profit   1-10   11-20   21-30   31-40   41-50   51-60
Frequency       6      6       12      11      10      5

Solution

Class interval   Freq (f)   Class mark (x)   fx
1-10             6          5.5              33
11-20            6          15.5             93
21-30            12         25.5             306
31-40            11         35.5             390.5
41-50            10         45.5             455
51-60            5          55.5             277.5
Total            50                          1555

$\bar{x} = \frac{\sum f x}{\sum f}$ = 1555/50 = 31.1

The arithmetic mean of weekly profit is therefore ₦31.1
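
Both frequency-table means above follow the same recipe, which can be sketched in Python as below. The function name `frequency_mean` is an assumption for illustration. For the ages in Table 2.1 the products fx sum to 176, so the mean works out to 176/40 = 4.4, which agrees with the assumed mean result of Example 2.3(b).

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of a frequency distribution: sum(f*x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


# Table 2.1: discrete (ungrouped) distribution of ages.
ages = [1, 2, 3, 4, 5, 6, 7]
age_freqs = [2, 7, 5, 4, 9, 7, 6]
mean_age = frequency_mean(ages, age_freqs)

# Table 2.2: grouped distribution; each class is represented by its class mark.
profit_limits = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50), (51, 60)]
profit_freqs = [6, 6, 12, 11, 10, 5]
class_marks = [(lower + upper) / 2 for lower, upper in profit_limits]
mean_profit = frequency_mean(class_marks, profit_freqs)

print(round(mean_age, 2), round(mean_profit, 2))
```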

2.1.3 ARITHMETIC MEAN BY ASSUMED MEAN METHOD

The assumed mean method is adopted when the data under consideration consist
of large values. The number chosen arbitrarily from the list of observations being
considered for the purpose of calculating the arithmetic mean is the Assumed Mean. It is
generally suggested that the assumed mean should be a number very close to the
middle score. If A denotes our Assumed Mean, then the Arithmetic Mean using an
Assumed Mean is calculated by

$\bar{x} = A + \frac{\sum (x_i - A)}{n}$ if the observations are raw data, and

$\bar{x} = A + \frac{\sum f_i (x_i - A)}{\sum f_i}$ if the observations are from a
frequency distribution.

Example 2.3:

Using an assumed mean of 6 for the data in Example 2.1, find the mean of the distribution.

Solution

Age (x)   x - A
6         0
12        6
4         -2
16        10
2         -4
Total     10

$\bar{x} = 6 + \frac{10}{5} = 8$

The mean age of the children is 8 months.

Example 2.3(b)

Using the data in Table 2.1, obtain the arithmetic mean by taking the assumed mean to
be 5 years.

Solution

Age (x)   Freq (f)   x - A   f(x - A)
1         2          -4      -8
2         7          -3      -21
3         5          -2      -10
4         4          -1      -4
5         9          0       0
6         7          1       7
7         6          2       12
Total     40                 -24

$\bar{x} = 5 + \frac{(-24)}{40} = 4.4$

The arithmetic mean of their ages is 4.4 years, or approximately 4 years.

2.1.4: WEIGHTED ARITHMETIC MEAN

The weighted arithmetic mean is used for observations from a continuous
distribution or grouped data. The weighted arithmetic mean is otherwise referred to as
the coded factor method. The coded factor is obtained by taking the deviation of each
class mark from the assumed mean and then dividing the result by the class width. If the
coded factor is denoted by u and the class size (class width) by c, then the relationship
between u and c is

$u = \frac{x - A}{c}$

and the arithmetic mean is recovered as $\bar{x} = A + c\,\frac{\sum f u}{\sum f}$.

Example 2.4:

Calculate the arithmetic mean of the data in Table 2.2 using an assumed mean of 45.5.

Solution

Class mark (x)   f    x - A   u    fu
5.5              6    -40     -4   -24
15.5             6    -30     -3   -18
25.5             12   -20     -2   -24
35.5             11   -10     -1   -11
45.5             10   0       0    0
55.5             5    10      1    5
TOTAL            50                -72

c = 15.5 – 5.5 = 10

$\bar{x} = 45.5 + 10\left(\frac{-72}{50}\right)$

= 45.5 – 14.4

= 31.1

The arithmetic mean of the distribution is 31.1
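
The coded computation can be verified with a few lines of Python. This is a sketch under the assumption that the mean is recovered as A + c(∑fu/∑f), which is the step used in the solution above; the function name `coded_mean` is illustrative. The tabulated deviations x - A correspond to an assumed mean of 45.5.

```python
def coded_mean(class_marks, freqs, assumed_mean, class_width):
    """Mean by the coding method: u = (x - A) / c, mean = A + c * sum(f*u) / sum(f)."""
    u = [(x - assumed_mean) / class_width for x in class_marks]
    sum_fu = sum(f * ui for f, ui in zip(freqs, u))
    return assumed_mean + class_width * sum_fu / sum(freqs)


# Class marks and frequencies from Table 2.2.
class_marks = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5]
freqs = [6, 6, 12, 11, 10, 5]

mean_profit = coded_mean(class_marks, freqs, assumed_mean=45.5, class_width=10)
print(round(mean_profit, 2))
```

Any choice of assumed mean gives the same answer; a value near the middle of the data only keeps the intermediate numbers small.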

2.1.5 ADVANTAGE OF ARITHMETIC MEAN

• It is easily understood and easy to compute.
• It is the measure of central tendency most amenable to further calculation.
• The Arithmetic Mean is the most popular measure in fields such as business,
engineering, science etc.
• It is unique, as it provides only one answer.
• It is useful when comparing two or more sets of data, particularly in parametric
techniques.
• It makes use of all available information in a data set.

2.1.6 DISADVANTAGES OF ARITHMETIC MEAN

• The arithmetic mean is affected by extreme values (outliers) in a distribution.
This is because the mean depends on every value in the set of data.
• Unlike other measures of central tendency, the arithmetic mean cannot be
obtained graphically.
• It may be difficult to obtain without calculation.
• The arithmetic mean can only be obtained from quantitative data, whereas
other measures of central tendency like the median and mode can be extended
to qualitative data.

2.2 MEDIAN

The median is another measure of central tendency. It is the value in
the middle position of a set of data presented in order of magnitude. The steps involved
in calculating the median of unordered data are as follows.
• Arrange the data in ascending or descending order of magnitude.
• Count the data from the left and right, then pick out the middle value. However, if
the number of observations is even, pick the two items that fall in the middle
positions and take their average as the median.

Example 2.5:

Consider the following heights of trees in metres: 2.0, 3.5, 4.2, 3.7, 2.6 and
5.1. Obtain the median of the distribution.

Solution

Arranging the data in ascending order, we have 2.0, 2.6, 3.5, 3.7, 4.2 and 5.1.

There are six observations altogether, an even number, so the median is the average of
the two middle values (the 3rd and 4th positions): (3.5 + 3.7)/2 = 3.6.
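
The two steps for finding the median of raw data can be sketched in Python as follows. The function name `median` is illustrative (the standard library's `statistics.median` does the same job). Applied to the six heights listed in Example 2.5, the two middle values after sorting are 3.5 and 3.7, so the result is 3.6.

```python
def median(data):
    """Median of raw data: middle value, or mean of the two middle values."""
    ordered = sorted(data)          # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # odd count: single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


heights = [2.0, 3.5, 4.2, 3.7, 2.6, 5.1]
print(round(median(heights), 2))
```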

2.2.1: MEDIAN FROM DISCRETE FREQUENCY DISTRIBUTION

When the number of items is large, it may be necessary to use a method other than
counting from the left and right of a set of data in an array.
The median may be obtained by taking the observation that falls in the
$\left(\frac{N+1}{2}\right)$th position if N is odd, or the average of the two
observations that fall in the $\left(\frac{N}{2}\right)$th and
$\left(\frac{N}{2}+1\right)$th positions if N is even, where
N is the terminal cumulative frequency.

Example 2.6:

Table 2.3: Height and Number of Trees in a Garden

Height (m)             2.0   2.5   3.0   3.5   4.0   4.5
Number of trees        4     3     5     1     3     2
Cumulative frequency   4     7     12    13    16    18

Since N = 18 is even, the median is the average of the observations in the
(18/2)th = 9th and (18/2 + 1)th = 10th positions. From the cumulative frequency row, both
positions fall at a height of 3.0 m, so the median is 3.0 m.
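
Locating a position through the cumulative frequencies can be sketched in Python as below. The helper names `value_at_position` and `frequency_median` are assumptions for illustration; the values are assumed to be listed in ascending order, as in Table 2.3.

```python
def value_at_position(values, freqs, position):
    """Return the observation at a 1-based position, using running totals."""
    running = 0
    for value, f in zip(values, freqs):
        running += f                 # cumulative frequency up to this value
        if position <= running:
            return value
    raise ValueError("position exceeds the total frequency")


def frequency_median(values, freqs):
    """Median of a discrete frequency distribution (values in ascending order)."""
    n = sum(freqs)
    if n % 2 == 1:
        return value_at_position(values, freqs, (n + 1) // 2)
    lower = value_at_position(values, freqs, n // 2)
    upper = value_at_position(values, freqs, n // 2 + 1)
    return (lower + upper) / 2


heights = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
trees = [4, 3, 5, 1, 3, 2]
print(frequency_median(heights, trees))
```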

2.2.2: MEDIAN FROM CONTINUOUS DISTRIBUTION (GROUPED DATA)

The median of a grouped distribution is obtained by the method of interpolation
using the formula

$\text{Median} = L_m + \left(\frac{N/2 - \sum f_b}{f_m}\right)c$

Where;

$L_m$ denotes the lower class boundary of the median class

N denotes the total number of observations (total frequency)

$\sum f_b$ denotes the cumulative frequency before the median class

$f_m$ denotes the frequency of the median class

c denotes the class size

Example 2.7:

Use the data in Table 2.4 below to obtain the median of the distribution.

Table 2.4: Matches per Box and Corresponding Frequency of Randomly Selected
Number of Matches

Matches per Box   39 – 41   42 – 44   45 – 47   48 – 50   51 – 53   54 – 56
Frequency         3         13        26        38        15        5

Solution

Class Boundary   Frequency   Cumulative Frequency
38.5 – 41.5      3           3
41.5 – 44.5      13          16
44.5 – 47.5      26          42
47.5 – 50.5      38          80
50.5 – 53.5      15          95
53.5 – 56.5      5           100

From the distribution;

N/2 = 100/2 = 50th position

The 50th member is found within the class with cumulative frequency of up to 80, whose
class boundary is 47.5 – 50.5. Therefore, $L_m$ = 47.5, N = 100, $\sum f_b$ = 42,
$f_m$ = 38, c = 3 (50.5 – 47.5).

$\text{Median} = 47.5 + \left(\frac{50 - 42}{38}\right)3$ = 47.5 + 0.63 = 48.13

The median number of match sticks per box is approximately 48.
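
The interpolation formula can be checked with the short Python sketch below. The function name `grouped_median` and the representation of classes as (lower boundary, upper boundary) pairs are assumptions for illustration. Note that N is the total frequency (100 for Table 2.4), so N/2 = 50.

```python
def grouped_median(boundaries, freqs):
    """Median by interpolation: L_m + ((N/2 - sum_fb) / f_m) * c."""
    n = sum(freqs)
    half = n / 2
    cum_before = 0                        # cumulative frequency before the class
    for (lower, upper), f in zip(boundaries, freqs):
        if cum_before + f >= half:        # this is the median class
            return lower + (half - cum_before) / f * (upper - lower)
        cum_before += f
    raise ValueError("empty distribution")


# Class boundaries and frequencies from Table 2.4.
boundaries = [(38.5, 41.5), (41.5, 44.5), (44.5, 47.5),
              (47.5, 50.5), (50.5, 53.5), (53.5, 56.5)]
freqs = [3, 13, 26, 38, 15, 5]

print(round(grouped_median(boundaries, freqs), 2))
```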

2.2.3 MEDIAN FROM CUMULATIVE FREQUENCY CURVE (O-GIVE)


The median of grouped data may also be obtained graphically from the cumulative
frequency curve (ogive). The curve is obtained by plotting cumulative frequency against
class boundary.
To obtain the median from the ogive, we locate the $\left(\frac{N+1}{2}\right)$th
position if N is odd, or the $\left(\frac{N}{2}\right)$th position if N is even, on the
cumulative frequency axis and trace it across to the curve and down to the class
boundary axis, as illustrated in figure 2.1.

[Figure: cumulative frequency (vertical axis, 0 to 120) plotted against class boundary (horizontal axis: 41.5, 44.5, 47.5, 50.5, 53.5, 56.5)]
Figure 2.1: Demonstration of Median from Cumulative Curve

2.2.3.1 ADVANTAGES OF MEDIAN

• Unlike the Arithmetic Mean, the median is not affected by extreme values (outliers).
• It is useful when comparing two or more sets of data, especially in
non-parametric tests.
• The median is easy to compute and easy to understand, as it does not
involve serious calculation.
• It can be obtained from graphs, as discussed in section 2.2.3.
• It is unique, as it gives only one figure.

2.2.3.2 DISADVANTAGES

• Re-arranging the observations in order can be a difficult task, especially when
a large number of values is involved.
• It is rarely used in further statistical calculation.
• The median is not as popular as the mean.
• It tends to ignore the extreme values.

2.3 MODE
The mode is the most easily computed and simplest to interpret of the measures of central
tendency. The mode of a given set of data is the item which occurs most often in the
distribution. For data presented as a frequency distribution, the mode is the value that
has the highest frequency. A data set with one mode is referred to as unimodal, one with
two modes as bimodal, and one with more than two modes as multimodal. However, if all
items are different there is no mode.

Example 2.8:

From the following data 50, 45, 50, 25, 45, 45 and 30, the mode is 45, which occurs three times (unimodal).
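
Counting occurrences makes the mode easy to verify. The following Python sketch uses `collections.Counter`; the function name `modes` and the convention of returning an empty list when no value repeats are assumptions for illustration. For the seven values in Example 2.8, 45 occurs three times and 50 twice.

```python
from collections import Counter


def modes(data):
    """Return all values sharing the highest count; empty list if nothing repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:                 # every item is different: no mode
        return []
    return sorted(value for value, c in counts.items() if c == highest)


values = [50, 45, 50, 25, 45, 45, 30]
print(modes(values))
```

A result with one entry is unimodal, two entries bimodal, and more than two multimodal.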

2.3.1 MODE FROM CONTINUOUS DATA (GROUPED DATA)

The mode of grouped data may be obtained simply by taking the class mark of the modal
class, that is, the average of its class limits or class boundaries. On the
other hand, a more precise value of the mode is obtained by interpolation or by a
graphical method.

2.3.1 MODE BY INTERPOLATION METHOD

The mode by interpolation method provides a single value that represents the
whole data set. This method is carried out by using the formula;

$\text{Mode} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)c$ where;

$L_{mo}$ is the lower class boundary of the modal class

$\Delta_1$ is the frequency of the modal class minus the frequency of the class before it.

$\Delta_2$ is the frequency of the modal class minus the frequency of the class after it.

c is the class size.

Example 2.9:

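Obtain the mode of the data in Table 2.4.

$L_{mo}$ = 47.5 (because the highest frequency, 38, belongs to the class 47.5 – 50.5)

$\Delta_1$ = 38 – 26 = 12

$\Delta_2$ = 38 – 15 = 23

c = 3

$\text{Mode} = 47.5 + \left(\frac{12}{12 + 23}\right)3$ = 48.5

The mode of the matches is therefore approximately 49.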
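
The interpolation formula for the mode can be sketched in Python as below. The function name `grouped_mode` is illustrative, and treating a missing neighbouring class as having frequency 0 is an assumption that the notes do not discuss.

```python
def grouped_mode(boundaries, freqs):
    """Mode by interpolation: L_mo + (d1 / (d1 + d2)) * c."""
    k = freqs.index(max(freqs))                 # index of the modal class
    lower, upper = boundaries[k]
    # Assumption: a missing neighbour class counts as frequency 0.
    before = freqs[k - 1] if k > 0 else 0
    after = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = freqs[k] - before
    d2 = freqs[k] - after
    return lower + d1 / (d1 + d2) * (upper - lower)


# Class boundaries and frequencies from Table 2.4.
boundaries = [(38.5, 41.5), (41.5, 44.5), (44.5, 47.5),
              (47.5, 50.5), (50.5, 53.5), (53.5, 56.5)]
freqs = [3, 13, 26, 38, 15, 5]

print(round(grouped_mode(boundaries, freqs), 2))
```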

2.3.2 MODE BY GRAPHICAL METHOD


The mode or modal class can also be obtained from a bar chart or histogram. The
modal class is the peak of the diagram. The class widths should be equal,
otherwise it may not be possible to obtain the mode from the graph or chart.

The mode can be read off from the histogram by the following steps:

• Identify the modal class from the tallest rectangular bar.
• Draw a straight line from the top right corner of the tallest bar to the top right
corner of the bar immediately to its left, and another from the top left corner of
the tallest bar to the top left corner of the bar immediately to its right.
• Identify the point of intersection of the two lines drawn, and draw a straight line
from the point of intersection perpendicular to the horizontal axis.
• Read off the mode of the distribution from the point where this line meets the
horizontal axis. The illustration is shown in figure 6.2.

2.3.3 ADVANTAGES OF MODE

• Extreme values (outliers) do not affect the mode.
• It is easy to determine, compute and understand.
• It can be obtained from graphs, as it describes the shape of a frequency
distribution.

2.3.4 DISADVANTAGES OF MODE

• It is not necessarily unique; there may be more than one mode.
• When there is more than one mode, it is difficult to interpret and/or compute.
• When no value repeats in the data set, there is no mode, so the measure is
uninformative.
• It is not as popular as the mean and median.
• It is not useful in further statistical calculation.

2.4 GEOMETRIC MEAN

The geometric mean of a set of N positive observations is the Nth root of
their product. Suppose x1, x2, ..., xN is a set of positive numbers; then

$\text{Geometric Mean (GM)} = \sqrt[N]{x_1 x_2 x_3 \cdots x_N} = \left(\prod_{i=1}^{N} x_i\right)^{1/N}$, where

$\prod$ is the symbol for a product (repeated multiplication).

The geometric mean can be obtained by taking the logarithms of the observed
values rather than the observed values themselves and computing their arithmetic mean.
The result is the logarithm of the geometric mean of the observed values. The proof
is shown below.

$GM = \sqrt[N]{x_1 x_2 x_3 \cdots x_N} = (x_1 x_2 \cdots x_N)^{1/N}$

$\log GM = \frac{1}{N}(\log x_1 + \log x_2 + \cdots + \log x_N)$

$GM = \text{antilog}\left[\frac{1}{N}(\log x_1 + \log x_2 + \cdots + \log x_N)\right]$

Example 2.10:

A sample of five batteries is tested and the batteries are found to last 2, 4, 3, 8 and
6 hours. Find the geometric mean of the distribution.

Solution

$GM = \sqrt[5]{2 \times 4 \times 3 \times 8 \times 6} = \sqrt[5]{1152} \approx 4.1$

Therefore, the geometric mean of the distribution is approximately 4.1 hours.
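
The logarithm route to the geometric mean can be sketched in Python as follows. The function name `geometric_mean` is illustrative, and natural logarithms are used here in place of base-10 logarithms and antilogs, which is an assumption of convenience since any base gives the same result.

```python
import math


def geometric_mean(data):
    """Geometric mean via logs: exp of the arithmetic mean of the logarithms."""
    return math.exp(sum(math.log(x) for x in data) / len(data))


hours = [2, 4, 3, 8, 6]
gm = geometric_mean(hours)

# Cross-check against the direct definition: Nth root of the product.
direct = math.prod(hours) ** (1 / len(hours))
print(round(gm, 2), round(direct, 2))
```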

2.4.1 GEOMETRIC MEAN FROM FREQUENCY DISTRIBUTION

Each observed value in the frequency distribution table is raised to the power of its
corresponding frequency, and the Nth root of the overall product is taken as the
geometric mean, where $N = \sum f_i$. The geometric mean for
ungrouped and grouped frequency distributions is given as follows;

$\text{Geometric Mean (GM)} = \sqrt[N]{x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}} = \left(\prod_{i=1}^{n} x_i^{f_i}\right)^{1/N}$

Obtain the geometric mean for the data in Table 5 below.

Table 5: Number of Goals by a Football Team in a Series of Games

Score       1    2   3   4
Frequency   10   2   3   2

Solution

$GM = \sqrt[17]{1^{10} \times 2^{2} \times 3^{3} \times 4^{2}} = \sqrt[17]{1728} \approx 1.55$
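
For Table 5 the product is 1^10 × 2^2 × 3^3 × 4^2 = 1728 and N = 17, so the geometric mean is the 17th root of 1728. A minimal Python sketch, with the illustrative function name `frequency_geometric_mean` and natural logarithms chosen for convenience, confirms the value:

```python
import math


def frequency_geometric_mean(values, freqs):
    """GM of a frequency distribution: exp(sum(f * log x) / sum(f))."""
    n = sum(freqs)
    return math.exp(sum(f * math.log(x) for x, f in zip(values, freqs)) / n)


scores = [1, 2, 3, 4]
score_freqs = [10, 2, 3, 2]
gm_goals = frequency_geometric_mean(scores, score_freqs)
print(round(gm_goals, 2))
```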

2.4.2 PROPERTIES OF GEOMETRIC MEAN

• The geometric mean is defined only when the data contain positive values.
• It is useful in calculating relative values such as index numbers.
• It is suitable for skewed distributions, since taking the logarithm of the
observations makes the distribution more symmetrical and the mean then becomes a good
measure of centre.

2.5 HARMONIC MEAN

The harmonic mean (HM) of a set of N positive observations is the reciprocal of the
arithmetic mean of the reciprocals of the observations.

Suppose x1, x2, ..., xN is a set of N positive observations; then

$\text{Harmonic Mean (HM)} = \frac{1}{\frac{1}{N}\sum_{i=1}^{N} \frac{1}{x_i}} = \frac{N}{\sum_{i=1}^{N} 1/x_i}$ for ungrouped data.

Example 2.12:

Find the harmonic mean of the following scores of students in a test: 5, 3, 10, 12 and 2.

Solution

$HM = \frac{5}{1/5 + 1/3 + 1/10 + 1/12 + 1/2} = \frac{5}{73/60} = \frac{300}{73} \approx 4.11$

2.5.1 HARMONIC MEAN FROM FREQUENCY DISTRIBUTION

The formula for the Harmonic Mean (HM) of a frequency distribution is given as follows

$\text{Harmonic Mean (HM)} = \frac{1}{\frac{\sum_i f_i / x_i}{\sum_i f_i}} = \frac{\sum_i f_i}{\sum_i f_i / x_i}$

Example 2.13:

Find the harmonic mean of the data in Table 5.
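
Both harmonic mean formulas can be tried out with the Python sketch below. The function name `harmonic_mean` is illustrative, and exact fractions are used (an assumption of convenience) so that the arithmetic can be followed by hand. For the goal data of Table 5, ∑f = 17 and ∑f/x = 10/1 + 2/2 + 3/3 + 2/4 = 12.5.

```python
from fractions import Fraction


def harmonic_mean(values, freqs=None):
    """HM = sum(f) / sum(f / x); with no frequencies, every f is taken as 1."""
    if freqs is None:
        freqs = [1] * len(values)
    return Fraction(sum(freqs)) / sum(Fraction(f, x) for x, f in zip(values, freqs))


# Example 2.12: raw test scores.
hm_scores = harmonic_mean([5, 3, 10, 12, 2])

# Example 2.13: goals and frequencies from Table 5.
hm_goals = harmonic_mean([1, 2, 3, 4], [10, 2, 3, 2])

print(hm_scores, float(hm_scores))
print(hm_goals, float(hm_goals))
```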

MEASURES OF PARTITION

3.0 INTRODUCTION

When presenting or analysing measurements of a variable, it is sometimes helpful to
group subjects into several groups. Measures of partition are measures which
divide a distribution into equal segments or parts. For example, to create four equal
groups, we need values that split the data such that 25% of the observations are in each
group. The cut-off points are called quartiles.

Other values to be considered in this chapter are deciles and percentiles, which
split the observations into ten and one hundred parts respectively.

3.1 QUARTILE

In descriptive statistics, a quartile is any of the three values which divide the sorted
data into four equal parts, so that each part represents one-fourth of the sample data.
The middle one is also called the median.

First quartile, designated Q1: cuts off the lowest 25% of the data. It is the middle value
of the lower half and is equivalent to the 25th percentile.
Second quartile, designated Q2: the median of the distribution. It cuts the data set into
two equal halves and is the 50th percentile.
Third quartile, designated Q3: cuts off the highest 25% of the data set (or the lowest 75%)
and is equivalent to the 75th percentile. It is the middle value of the upper half.

Note that the first and third quartiles are also referred to as the lower and upper
quartiles respectively. The difference between the two is known as the inter-quartile
range. Our major concern in this section is therefore Q1 and Q3. The formula for
calculating the quartiles for ungrouped data is similar to that of the median given in
section 2.12.

3.1.1 QUARTILE FROM UNGROUPED DATA

For N ungrouped observations arranged in ascending order, the first and third
quartiles are given as follows:

First quartile (Q1) = the value at the ((N + 1)/4)th position, and

Third quartile (Q3) = the value at the (3(N + 1)/4)th position.

If there is an even number of data items, we take the average of the values at the
(N/4)th and (N/4 + 1)th positions for Q1, and the average of the values at the
(3N/4)th and (3N/4 + 1)th positions for Q3.

Example 3.1: Find the lower and upper quartiles of the following set of data:
12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25, (65)

Solution

Arrangement of data (without the value in parentheses, N = 11):
5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53

Lower quartile Q1 = 3rd value = 12

Median Q2 = 6th value = 22

Upper quartile Q3 = 9th value = 36

If the value in parentheses is included, the number of data items is even (N = 12):

Arrangement: 5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53, 65

Q1 = (12 + 14)/2 = 13, Q2 = (22 + 25)/2 = 23.5, Q3 = (36 + 42)/2 = 39.
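
The position rules for ungrouped quartiles can be sketched in Python as follows. The helper names `value_at` and `quartiles` are illustrative assumptions. The text only covers whole-number positions and the halfway case for an even number of items; linear interpolation for any other fractional position is an added assumption of this sketch.

```python
def value_at(sorted_data, position):
    """Value at a 1-based position; fractional positions are interpolated."""
    lower = int(position)          # whole part of the position
    fraction = position - lower    # 0.5 means halfway between two neighbours
    if fraction == 0:
        return sorted_data[lower - 1]
    left, right = sorted_data[lower - 1], sorted_data[lower]
    return left + fraction * (right - left)

def quartiles(data):
    """Return (Q1, Q2, Q3) using the position rules of section 3.1.1."""
    xs = sorted(data)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    if n % 2 == 1:                 # odd N: (N+1)/4 and 3(N+1)/4
        p1, p3 = (n + 1) / 4, 3 * (n + 1) / 4
    else:                          # even N: average of two adjacent positions
        p1, p3 = n / 4 + 0.5, 3 * n / 4 + 0.5
    p2 = (n + 1) / 2               # median position
    return value_at(xs, p1), value_at(xs, p2), value_at(xs, p3)

data = [12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25]
print(quartiles(data))
print(quartiles(data + [65]))
```

The two calls correspond to the eleven-value and twelve-value cases of Example 3.1.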

3.1.2 QUARTILE FROM GROUPED DATA

The lower and upper quartiles from a grouped frequency distribution are given as follows:

$$Q_1 = L_{Q_1} + \left( \frac{N/4 - \sum f_{Q_1}}{f_{Q_1}} \right) c$$

$$Q_3 = L_{Q_3} + \left( \frac{3N/4 - \sum f_{Q_3}}{f_{Q_3}} \right) c$$

where L_{Q1} and L_{Q3} are the respective lower class boundaries of the lower and upper
quartile classes, and f_{Q1} and f_{Q3} are the respective frequencies of the lower and
upper quartile classes.

∑f_{Q1} (or ∑f_{Q3}) is the cumulative frequency before the quartile class, and c is the
class size (upper boundary – lower boundary).

Inter-quartile range = Q3 - Q1.

Example 3.2:

Find the first and third quartiles of the data in Table 3.1 and hence obtain the
inter-quartile range for the distribution.

Table 3.1 Age Distribution of Members of a Society

AGE          10 – 19   20 – 29   30 – 39   40 – 49   50 – 59

FREQUENCY       8         12        13        32        35

Solution

Class Boundary    Frequency    Cumulative Frequency

 9.5 – 19.5           8                 8

19.5 – 29.5          12                20

29.5 – 39.5          13                33

39.5 – 49.5          32                65

49.5 – 59.5          35               100

Total               100

For the first quartile, N/4 = 100/4 = 25, so Q1 is the 25th value, which lies in the
class 29.5 – 39.5. Hence L_{Q1} = 29.5, ∑f_{Q1} = 20, f_{Q1} = 13, c = 39.5 – 29.5 = 10.

$$Q_1 = 29.5 + \left( \frac{25 - 20}{13} \right) \times 10 = 29.5 + 3.85 = 33.35$$

For the third quartile, 3N/4 = 3(100)/4 = 75, so Q3 is the 75th value, which lies in the
class 49.5 – 59.5. Hence L_{Q3} = 49.5, ∑f_{Q3} = 65, f_{Q3} = 35, c = 10.

$$Q_3 = 49.5 + \left( \frac{75 - 65}{35} \right) \times 10 = 49.5 + 2.86 = 52.36$$

Inter-quartile range = 52.36 – 33.35 = 19.01
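
The grouped-data interpolation can be written as one general routine. The sketch below is a minimal illustration, assuming the class boundaries are supplied as a list one element longer than the list of frequencies; the name `grouped_quantile` and its `fraction` parameter are assumptions made for this example.

```python
def grouped_quantile(boundaries, freqs, fraction):
    """Interpolate the value below which `fraction` of the data lies.

    boundaries: class boundaries, e.g. [9.5, 19.5, ...] (len(freqs) + 1 items)
    fraction:   0.25 for Q1, 0.75 for Q3, i/10 for deciles, i/100 for percentiles
    """
    total = sum(freqs)
    target = fraction * total      # position of the required value, e.g. N/4
    cumulative = 0                 # cumulative frequency before the current class
    for i, f in enumerate(freqs):
        if f > 0 and cumulative + f >= target:
            lower = boundaries[i]
            width = boundaries[i + 1] - boundaries[i]
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")

boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
freqs = [8, 12, 13, 32, 35]
q1 = grouped_quantile(boundaries, freqs, 0.25)
q3 = grouped_quantile(boundaries, freqs, 0.75)
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))
```

For Table 3.1 this gives values of about 33.35 and 52.36, so an inter-quartile range of about 19.01.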

3.2 DECILES

A measure which divides the data in a distribution into ten equal parts is called a
decile. Deciles are the percentiles that are multiples of 10. The first decile is the point
with 10% of the data below it and 90% of the data above it, while the ninth decile is the
point with 90% below it and 10% above it.

The first decile (D1) is the (1/10)th, D2 is the (1/5)th, D3 is the (3/10)th, ..., and D9 is
the (9/10)th point of the distribution.

These are the main points which divide a distribution into ten equal parts. The formula
for the deciles is generally given by

$$D_i = L_{D_i} + \left( \frac{iN/10 - \sum f_{D_i}}{f_{D_i}} \right) c, \qquad i = 1, 2, \ldots, 9$$

The symbols are as explained for quartiles.

3.3 PERCENTILES

Percentiles are measures of partition that divide the whole distribution into 100
equal parts. The procedure for determining percentiles is similar to that
of quartiles and deciles:

$$P_i = L_{P_i} + \left( \frac{iN/100 - \sum f_{P_i}}{f_{P_i}} \right) c, \qquad i = 1, 2, \ldots, 99$$

Example 3.3:

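Using the data in Table 3.1, obtain the following:

a. First decile and ninth decile

b. 10th percentile and 75th percentile.

Solution

(a) (i) First decile: N/10 = 100/10 = 10, so D1 is the 10th value, which lies in the class
19.5 – 29.5. Hence L_{D1} = 19.5, ∑f_{D1} = 8, f_{D1} = 12, c = 10.

$$D_1 = 19.5 + \left( \frac{10 - 8}{12} \right) \times 10 = 19.5 + 1.67 = 21.17$$

(ii) Ninth decile: 9N/10 = 9(100)/10 = 90, so D9 is the 90th value, which lies in the class
49.5 – 59.5. Hence L_{D9} = 49.5, ∑f_{D9} = 65, f_{D9} = 35.

$$D_9 = 49.5 + \left( \frac{90 - 65}{35} \right) \times 10 = 49.5 + 7.14 = 56.64$$

(b) (i) 10th percentile: 10N/100 = 10(100)/100 = 10, so P10 is the 10th value, the same
position as the first decile. Hence L_{P10} = 19.5, ∑f_{P10} = 8, f_{P10} = 12.

$$P_{10} = 19.5 + \left( \frac{10 - 8}{12} \right) \times 10 = 19.5 + 1.67 = 21.17$$

(ii) 75th percentile: 75N/100 = 75(100)/100 = 75, so P75 is the 75th value, which lies in
the class 49.5 – 59.5. Hence L_{P75} = 49.5, ∑f_{P75} = 65, f_{P75} = 35.

$$P_{75} = 49.5 + \left( \frac{75 - 65}{35} \right) \times 10 = 49.5 + 2.86 = 52.36$$

This is the same value as the third quartile Q3.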

MEASURES OF DISPERSION

4.0 INTRODUCTION

Another feature of a set of data is its spread about an average. While measures of
central tendency are used to estimate the typical value of a data set, measures of
dispersion describe the spread of the data, or its variation around a
central value. For instance, two distinct samples may have the same mean or median but
completely different levels of variability, or vice versa.
A proper description of a set of data should include both of these characteristics.
There are various methods that can be used to measure the dispersion of a data set, each
with its own advantages and disadvantages.

4.1 RANGE

Range is the difference between the largest and smallest sample values, that is,
the distance from the highest to the lowest value in the set of numbers. For a grouped
frequency distribution, the range is the difference between the upper boundary of the
highest class and the lower boundary of the lowest class.

Range = max(xi) – min(xi)

4.1.1 PROPERTIES OF RANGE

• It is one of the simplest measures of variability to calculate.
• It is not a reliable measure of dispersion.
• It depends only on the extreme values and provides no information about how the
remaining data are distributed.

Example 4.1:

Find the range of the prices of a bag of rice in a selected market, given below:

₦7500, ₦7300, ₦6800, ₦8000 and ₦8100

Range = 8100 - 6800 = ₦1300

4.2 INTER – QUARTILE RANGE

This measure gives the length of the interval containing the middle 50% of the
data. It is the difference between the upper and lower quartiles as discussed in
section 3.1.

Inter-quartile range = Q3 - Q1

From Example 3.2, I.Q.R. = 52.36 – 33.35 = 19.01

4.3 SEMI INTER – QUARTILE RANGE (QUARTILE DEVIATION)

The semi inter-quartile range is a measure of spread or dispersion, otherwise
known as the quartile deviation. It is computed as one half of the difference between
the 75th percentile (Q3) and the 25th percentile (Q1), i.e. the upper and lower quartiles
respectively. The formula for the semi inter-quartile range is therefore given by

$$\text{Semi inter-quartile range} = \frac{Q_3 - Q_1}{2}$$

4.3.1 PROPERTIES OF SEMI INTER-QUARTILE RANGE

• Since half of the scores in a distribution lie between Q1 and Q3, the semi
inter-quartile range is half the distance needed to cover half of the scores.
• In a symmetric distribution, an interval stretching from one semi inter-quartile
range below the median to one semi inter-quartile range above the median will
contain half of the scores. This may not be true for a skewed distribution.
• It is little affected by extreme scores, so it is a good measure of spread for skewed
distributions.

Example 4.2:

Obtain the semi inter-quartile range from Example 3.2.

$$S.I.R. = \frac{52.36 - 33.35}{2} = \frac{19.01}{2} = 9.51$$

4.4 MEAN DEVIATION

The mean deviation of a set of observations is the arithmetic mean of all absolute
deviations from the mean. It is a measure of how the data are spread about the mean. It
is found by summing the absolute values of the deviations from the mean (changing all
negative values to positive) and then dividing by the number of values.

4.4.1 MEAN DEVIATION FROM AN UNGROUPED DATA

Given a set of data x_1, x_2, ..., x_N with arithmetic mean x̄, the mean deviation
for ungrouped data is

$$M.D. = \frac{\sum |x_i - \bar{x}|}{N}$$

where |x_i - x̄| is the absolute value (taking all signs to be positive) of each
deviation from the mean.

Example 4.3:

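Find the mean deviation of the following heights of plants in an agricultural science
laboratory: 2, 5, 6, 7, 7 and 9 cm

Solution

$$\bar{x} = \frac{2 + 5 + 6 + 7 + 7 + 9}{6} = \frac{36}{6} = 6$$

$$\text{Mean deviation} = \frac{|2 - 6| + |5 - 6| + |6 - 6| + |7 - 6| + |7 - 6| + |9 - 6|}{6} = \frac{10}{6} = 1.7$$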

4.4.2 MEAN DEVIATION FROM UNGROUPED AND GROUPED FREQUENCY DISTRIBUTION

Suppose we are given observations x_1, x_2, ..., x_n with respective frequencies
f_1, f_2, ..., f_n. The mean deviation for such data is given as:

$$\text{Mean Deviation} = \frac{\sum f_i |x_i - \bar{x}|}{\sum f_i}$$

Example 4.4.3:

Find the mean deviation of the data in Table 4.4 below.

Table 4.4: Number of Spoiled Match Sticks in Some Selected Packets.

Packet of matches       1 – 5   6 – 10   11 – 15   16 – 20   21 – 25   26 – 30

No of spoiled sticks      6       3         4         8         2         7

Solution

Class mark (x)     f      fx     |x − x̄|    f|x − x̄|

      3            6      18       13          78

      8            3      24        8          24

     13            4      52        3          12

     18            8     144        2          16

     23            2      46        7          14

     28            7     196       12          84

   Total          30     480                  228

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{480}{30} = 16$$

From the table, ∑f|x − x̄| = 228, so

$$M.D. = \frac{228}{30} = 7.6$$

4.4.3 PROPERTIES OF MEAN DEVIATION

• It is relatively easy to calculate, especially when the mean is a whole number as in
Example 4.4.3
• All observations are used in the calculation
• It is not well suited to further mathematical treatment or statistical theory.
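
A short sketch of the mean deviation for both raw data and a frequency table is given below. The function name `mean_deviation` and its optional `freqs` argument are assumptions made for illustration; the data are the plant heights of Example 4.3 and the class marks and frequencies of Table 4.4.

```python
def mean_deviation(values, freqs=None):
    """Mean of absolute deviations from the mean; freqs defaults to all ones."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n

heights = [2, 5, 6, 7, 7, 9]
class_marks = [3, 8, 13, 18, 23, 28]
frequencies = [6, 3, 4, 8, 2, 7]
print(round(mean_deviation(heights), 2))
print(round(mean_deviation(class_marks, frequencies), 2))
```

For the Table 4.4 data the sum of f|x − x̄| is 228 and the total frequency is 30, so the second value is 228/30 = 7.6.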
4.5: VARIANCE AND STANDARD DEVIATION

The variance may be viewed as an average of the squared distances of all observed
values from the mean (but not quite, since we divide by n - 1 rather than n).

If the variance is small, most of the sample values lie quite close to the sample
mean. If the variance is large, the sample values lie rather far from the
sample mean.

The standard deviation measures the degree to which a set of values is
spread about its mean. If it is large, the values in the distribution are well
spread out about their mean; if it is small, they are clustered close to it.

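4.5.1 THE VARIANCE AND STANDARD DEVIATION FROM UNGROUPED DATA

The variance is essentially the arithmetic mean of the squares of the deviations of the
observations from the mean, with N - 1 rather than N as the divisor for sample data. The
standard deviation is the square root of the variance.

Given a set of observations x_1, x_2, ..., x_N with mean x̄, the variance and standard
deviation of ungrouped (raw) data are given as follows:

$$\text{Variance} = \frac{\sum (x_i - \bar{x})^2}{N - 1}$$

This formula expands to

$$\text{Variance} = \frac{1}{N - 1} \left( \sum x_i^2 - \frac{(\sum x_i)^2}{N} \right)$$

$$S.D. = \sqrt{\text{Variance}} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}}$$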

Example 4.4:

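Obtain the variance and standard deviation of the data in Example 4.3 (2, 5, 6, 7, 7 and 9 cm).

Solution

x̄ = 6 (as calculated in Example 4.3) and N = 6.

$$\text{Variance} = \frac{(2-6)^2 + (5-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (9-6)^2}{6 - 1} = \frac{16 + 1 + 0 + 1 + 1 + 9}{5} = \frac{28}{5} = 5.6$$

Alternatively, using the second formula (which may be used without first obtaining the mean):

$$\text{Variance} = \frac{1}{5} \left( 2^2 + 5^2 + 6^2 + 7^2 + 7^2 + 9^2 - \frac{36^2}{6} \right) = \frac{1}{5}(244 - 216) = \frac{1}{5}(28) = 5.6$$

Standard deviation = √5.6 = 2.37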
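
The two equivalent forms of the sample variance can be checked against each other with a few lines of Python. The function names below are illustrative assumptions; `statistics.variance` from the Python standard library also uses the N - 1 divisor and is printed only for comparison.

```python
import math
import statistics

def variance_definition(xs):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_shortcut(xs):
    """Expanded form: (sum of x^2 - (sum of x)^2 / N) / (N - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

heights = [2, 5, 6, 7, 7, 9]
print(variance_definition(heights), variance_shortcut(heights))
print(statistics.variance(heights))
print(round(math.sqrt(variance_definition(heights)), 2))
```

The shortcut form needs only the running totals of x and x², which is why it can be used without first computing the mean.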

4.5 VARIANCE AND STANDARD DEVIATION FROM DISCRETE AND CONTINUOUS (GROUPED AND UNGROUPED) FREQUENCY DISTRIBUTION.

When dealing with a grouped frequency distribution, the value x_i of an observation
contained in a group is the class mark of the ith class interval of the distribution.
In an ungrouped frequency distribution, the value x_i directly represents the
ith observation.

The procedure used in computing the variance and standard deviation is the same for
ungrouped and grouped frequency distributions.

The formula and process are demonstrated as follows:

$$\text{Variance} = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i - 1}$$

By expansion,

$$\text{Variance} = \frac{1}{\sum f_i - 1} \left( \sum f_i x_i^2 - \frac{(\sum f_i x_i)^2}{\sum f_i} \right)$$

and the standard deviation, being the square root of the variance, is then given as

$$S.D. = \sqrt{\text{Variance}} = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i - 1}}$$

or

$$S.D. = \sqrt{\frac{1}{\sum f_i - 1} \left( \sum f_i x_i^2 - \frac{(\sum f_i x_i)^2}{\sum f_i} \right)}$$

Example 4.5:

Obtain the variance and standard deviation for the data in Table 4.4.

Solution

Class mark (x)     f      fx     (x − x̄)²    f(x − x̄)²

      3            6      18       169         1014

      8            3      24        64          192

     13            4      52         9           36

     18            8     144         4           32

     23            2      46        49           98

     28            7     196       144         1008

   Total          30     480                   2380

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{480}{30} = 16$$

From the column 5 total in the table, ∑f(x − x̄)² = 2380. Hence,

$$\text{Variance} = \frac{2380}{30 - 1} = \frac{2380}{29} = 82.07$$

Standard deviation = √82.07 = 9.06

4.5.1 PROPERTIES OF VARIANCE

• Since the variance is measured in the square of the units of the observations, it is
difficult to use it to compare the variation (spread) of two sets of data.
• It is relatively difficult to interpret the variance
• The variance of constant observations (all observations having the same value) is zero
• If the variance is small, most sample values lie quite close to the sample mean.
Conversely, if the variance is large, most of the sample values lie rather far
from the sample mean.

4.5.2 PROPERTIES OF STANDARD DEVIATION

• The standard deviation, being the square root of the variance, solves
the problem of squared units. Hence the standard deviation may be used
to compare the spread of two sets of data.
• It is used in further statistical tests or analysis, such as testing differences of
location using the t or z test.
• It makes use of all the observations
• As for the variance and mean deviation, a small standard deviation means the
sample values lie close around the mean. The standard deviation is
however affected by the magnitude of, or a change in the unit of, the observations.
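
For a frequency distribution, the variance and standard deviation can be sketched as below. The function name `grouped_variance` is an illustrative assumption; the class marks and frequencies are those of Table 4.4, and the divisor is the total frequency minus one, as in the formula of this section.

```python
import math

def grouped_variance(marks, freqs):
    """Variance of a frequency distribution with divisor (sum of f) - 1."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(marks, freqs)) / n
    squared = sum(f * (x - mean) ** 2 for x, f in zip(marks, freqs))
    return squared / (n - 1)

class_marks = [3, 8, 13, 18, 23, 28]
frequencies = [6, 3, 4, 8, 2, 7]
variance = grouped_variance(class_marks, frequencies)
print(round(variance, 2), round(math.sqrt(variance), 2))
```

Here the weighted sum of squared deviations is 2380 and the divisor is 29, giving a variance of about 82.07 and a standard deviation of about 9.06.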

4.6 COEFFICIENT OF VARIATION

The coefficient of variation (C.V.) expresses the variation in a set of sample values
relative to their magnitude. It corrects for differences in spread that are due only to
the magnitude of the observations. For example, consider the following sets of data on
the prices of two commodities in four markets:

Commodity 1 (₦): 5.1, 2.8, 4.0, 3.8; x̄ = 3.9, S.D = 0.9

Commodity 2 (₦): 51.0, 28.0, 40.0, 38.0; x̄ = 39, S.D = 9

It can be observed that both the mean and the standard deviation differ between the two
data sets: (3.9, 0.9) for commodity 1 and (39, 9) for commodity 2. Judged by the standard
deviation alone, commodity 2 appears to have a greater spread than commodity 1. The
coefficient of variation is used to correct for the effect of magnitude on the
variability of the two commodities.

The coefficient of variation (CV) is the ratio of the standard deviation (SD) to the mean
(x̄), i.e.

$$C.V. = \frac{SD}{\bar{x}}$$

Example 4.6:

Using the two data sets in section 4.6, for commodity 1:

$$CV = \frac{0.9}{3.9} = 0.23$$

For commodity 2:

$$CV = \frac{9}{39} = 0.23$$

This indicates that the two data sets have equal variability.
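
The scale-free nature of the coefficient of variation can be checked numerically. The sketch below is illustrative only: the function name is an assumption, and `statistics.stdev` (which uses the n - 1 divisor) is used for the standard deviation.

```python
import statistics

def coefficient_of_variation(xs):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(xs) / statistics.mean(xs)

commodity1 = [5.1, 2.8, 4.0, 3.8]
commodity2 = [51.0, 28.0, 40.0, 38.0]
print(round(coefficient_of_variation(commodity1), 2))
print(round(coefficient_of_variation(commodity2), 2))
```

Both calls give the same value because the second data set is the first multiplied by 10. With unrounded mean and standard deviation the value is about 0.24; the figure 0.23 in the worked example comes from using the rounded values 0.9 and 3.9.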

4.6.1 PROPERTIES OF COEFFICIENT OF VARIATION

• The coefficient of variation is sometimes expressed as a percentage. The lower the
C.V., the smaller the spread
• Like the mean deviation and standard deviation, it is used in comparing two or
more sets of data.

MOMENT, SKEWNESS AND KURTOSIS

5.0 INTRODUCTION

A fundamental task in much statistical analysis is to characterise the location and
variability of a data set. A further characterisation of the data includes moments,
skewness and kurtosis. Both skewness and kurtosis are obtained directly from the
moments. Each is described in sections 5.1 to 5.4.

5.1 MOMENT

A moment of a set of values is the average of a given power of those values. In
other words, moments are the expectations of the powers of the set of values. They can be
classified into two types, namely raw and central moments.

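5.1.1 RAW MOMENT (MOMENT ABOUT THE ORIGIN)

Given a set of observations x_1, x_2, ..., x_N, the rth moment about the origin (zero) is
given by

$$\mu'_r = \frac{\sum x_i^r}{N}$$ for ungrouped (raw) data, and

$$\mu'_r = \frac{\sum f_i x_i^r}{\sum f_i}$$ for ungrouped or grouped frequency distributions.

The first moment about the origin is the mean, i.e.

$$\mu'_1 = \frac{\sum x_i}{N} \quad \text{or} \quad \mu'_1 = \frac{\sum f_i x_i}{\sum f_i}$$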
Example 5.1:

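Find the first, second and third moments about the origin for the following values: 2, 4,
5, 3 and 1

Solution

The first raw moment (r = 1) is

$$\mu'_1 = \frac{\sum x}{N} = \frac{2 + 4 + 5 + 3 + 1}{5} = \frac{15}{5} = 3 \ \text{(the mean)}$$

The second raw moment (r = 2) is

$$\mu'_2 = \frac{\sum x^2}{N} = \frac{2^2 + 4^2 + 5^2 + 3^2 + 1^2}{5} = \frac{55}{5} = 11$$

The third raw moment (r = 3) is

$$\mu'_3 = \frac{\sum x^3}{N} = \frac{2^3 + 4^3 + 5^3 + 3^3 + 1^3}{5} = \frac{225}{5} = 45$$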

Example 5.2

Find the first, second and third raw moments for the data in Table 9.2 below.

TABLE 9.2: Time Spent By a Number of Users of an ATM Machine.

Time (min)          4     6     8     10     12

Number of users     2     5     1      3      4

Solution

Time (x)     f     fx     x²     fx²      x³       fx³

   4         2      8     16      32      64       128

   6         5     30     36     180     216     1,080

   8         1      8     64      64     512       512

  10         3     30    100     300   1,000     3,000

  12         4     48    144     576   1,728     6,912

Total       15    124          1,152            11,632

$$\text{First raw moment: } \mu'_1 = \frac{\sum fx}{\sum f} = \frac{124}{15} = 8.27$$

$$\text{Second raw moment: } \mu'_2 = \frac{\sum fx^2}{\sum f} = \frac{1152}{15} = 76.8$$

$$\text{Third raw moment: } \mu'_3 = \frac{\sum fx^3}{\sum f} = \frac{11632}{15} = 775.47$$
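
The raw moment formulas for raw data and for a frequency table differ only in the weights, so a single function can cover both. The sketch below is illustrative; the name `raw_moment` and its optional `freqs` argument are assumptions.

```python
def raw_moment(values, r, freqs=None):
    """rth moment about the origin; freqs defaults to one per value."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(f * x ** r for x, f in zip(values, freqs)) / sum(freqs)

# Raw data of Example 5.1
print([raw_moment([2, 4, 5, 3, 1], r) for r in (1, 2, 3)])

# Frequency table of Example 5.2 (time in minutes, number of users)
times = [4, 6, 8, 10, 12]
users = [2, 5, 1, 3, 4]
print([round(raw_moment(times, r, users), 2) for r in (1, 2, 3)])
```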

5.2 CENTRAL MOMENT (MOMENT ABOUT THE MEAN)

Given a set of observations x1, x2, ..., xn, the rth moment about the mean, or
rth central moment, is given by

$$M_r = \frac{\sum (x_i - \bar{x})^r}{N}$$ for ungrouped data, and

$$M_r = \frac{\sum f_i (x_i - \bar{x})^r}{\sum f_i}$$ for ungrouped and grouped frequency distribution data,

where x̄ is the sample mean, which equals the first raw moment μ'_1.

5.2.1 PROPERTIES OF CENTRAL MOMENTS

• The first central moment is zero


• The second central moment gives variance of the distribution.
• The third and fourth of the central moment are used to obtain skewness and
kurtosis respectively.

Example 5.3: Obtain the 1st, 2nd and 3rd central moments of the data in Example 5.1.
Solution
x̄ = 3 from the first raw moment. From the table below:

  X      (x − x̄)    (x − x̄)²    (x − x̄)³

  2        -1          1           -1

  4         1          1            1

  5         2          4            8

  3         0          0            0

  1        -2          4           -8

Total       0         10            0

From the column totals of the table, we have:

∑(x − x̄) = 0

First central moment M1 = 0/5 = 0

∑(x − x̄)² = 10

Second central moment M2 = ∑(x − x̄)²/N = 10/5 = 2

∑(x − x̄)³ = 0

Third central moment M3 = ∑(x − x̄)³/N = 0/5 = 0

Example 5.4: Obtain the first, second, third and fourth central moments of the data in Table 9.2.

Solution

  X     f    fx    x − x̄   f(x − x̄)  (x − x̄)²  f(x − x̄)²  (x − x̄)³  f(x − x̄)³  f(x − x̄)⁴

  4     2     8    -4.27    -8.54     18.23     36.47     -77.85    -155.70     664.67

  6     5    30    -2.27   -11.35      5.15     25.76     -11.70     -58.50     132.61

  8     1     8    -0.27    -0.27      0.073     0.07      -0.02      -0.02       0.005

 10     3    30     1.73     5.19      2.99      8.98       5.17      15.51      26.82

 12     4    48     3.73    14.92     13.91     55.65      51.90     207.58     773.95

Total  15   124             -0.05               126.93                 8.87    1,598.0

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{124}{15} = 8.27$$

Using the formula, from columns 5, 7, 9 and 10 of the table:

∑f(x − x̄) = -0.05

M1 = -0.05/15 = -0.003 ≈ 0

∑f(x − x̄)² = 126.93

M2 = 126.93/15 = 8.462

∑f(x − x̄)³ = 8.87

M3 = 8.87/15 = 0.591

∑f(x − x̄)⁴ = 1,598.0

∴ M4 = 1598.0/15 = 106.53
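
The central moments can be computed in the same way as the raw moments, after subtracting the mean. The following sketch is an illustration only; the name `central_moment` and its optional `freqs` argument are assumptions.

```python
def central_moment(values, r, freqs=None):
    """rth moment about the mean; freqs defaults to one per value."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n

# Raw data used in Example 5.3
print([central_moment([2, 4, 5, 3, 1], r) for r in (1, 2, 3)])

# Frequency table of Table 9.2
times = [4, 6, 8, 10, 12]
users = [2, 5, 1, 3, 4]
print(round(central_moment(times, 2, users), 3))
```

Because the code keeps the mean at full precision (124/15) instead of rounding it to 8.27, its higher moments differ slightly from a hand-worked table: the third central moment comes out at about 0.68 rather than 0.59.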

5.3 SKEWNESS

Skewness is a measure of symmetry or, more precisely, of the lack of symmetry. A data set
or distribution is symmetric if it looks the same to the left and to the right of the
centre point. The mean, median and mode are equal in a symmetric distribution. See
figure 9.2 below for an illustration of a symmetric curve. For a set of observations
x1, x2, ..., xn, the formula for skewness is given as follows:

$$\text{Skewness} = \frac{M_3}{S^3}$$

where M3 is the third central moment and S is the standard deviation.

5.3.1 PROPERTIES OF SKEWNESS

• The skewness of a normal distribution is zero, and any symmetric data
should have a skewness near zero.
• Negative values of the skewness indicate data that are skewed left and
positive values of skewness indicate data that are skewed right.
• By skewed left, we mean that the left tail is longer than the right. Similarly,
skewed right means that the right tail is longer than the left. Measurements
that have a lower bound are often skewed right.

5.4 KURTOSIS

Kurtosis is a measure of whether the data are peaked or flat relative to a normal
distribution. That is, data sets with high kurtosis tend to have a distinct peak
near the mean, decline rather rapidly and have heavy tails. Data sets with low
kurtosis tend to have a flat top near the mean rather than a sharp peak. A
uniform distribution would be the extreme case.

The histogram is an effective graphical technique for showing the skewness
and kurtosis of a data set.
Given a set of observations x1, x2, ..., xn, the formula for kurtosis is stated as
follows:

$$\text{Kurtosis} = \frac{M_4}{S^4}$$

where M4 is the fourth central moment and S is the standard deviation.

5.4.1 PROPERTIES OF KURTOSIS

• Kurtosis is often reported as excess kurtosis, M4/S⁴ − 3. Positive excess kurtosis
indicates a peaked distribution while negative excess kurtosis
indicates a flat distribution.
• The normal distribution has M4/S⁴ = 3, that is, an excess kurtosis of zero.

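Example 5.5

Find the skewness and kurtosis for the data in Example 5.4 and interpret your
answers.

Solution

$$S = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f - 1}}$$

From the table: ∑f(x − x̄)² = 126.93, M3 = 0.59, M4 = 106.53, ∑f = 15 and

$$S = \sqrt{\frac{126.93}{14}} = 3.01$$

Therefore,

$$\text{Skewness} = \frac{M_3}{S^3} = \frac{0.59}{3.01^3} = \frac{0.59}{27.27} = 0.02$$

Since the value of the skewness is positive and close to zero, the data are very slightly
skewed to the right and can be taken as approximately symmetric.

$$\text{Kurtosis} = \frac{M_4}{S^4} = \frac{106.53}{3.01^4} = \frac{106.53}{82.09} = 1.30$$

This is below 3, the value of M4/S⁴ for a normal distribution, so the distribution is
flatter than the normal distribution.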
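
The moment-based skewness and kurtosis can be sketched as follows for a frequency table. This is a hedged illustration: the function name is an assumption, and it follows the convention of this chapter, in which the moments use the divisor ∑f while the standard deviation S uses ∑f − 1.

```python
import math

def skewness_kurtosis(values, freqs):
    """Return (M3 / S^3, M4 / S^4) for a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n

    def weighted_power_sum(r):
        return sum(f * (x - mean) ** r for x, f in zip(values, freqs))

    m3 = weighted_power_sum(3) / n                  # third central moment
    m4 = weighted_power_sum(4) / n                  # fourth central moment
    s = math.sqrt(weighted_power_sum(2) / (n - 1))  # standard deviation
    return m3 / s ** 3, m4 / s ** 4

times = [4, 6, 8, 10, 12]
users = [2, 5, 1, 3, 4]
skew, kurt = skewness_kurtosis(times, users)
print(round(skew, 2), round(kurt, 2))
```

For the Table 9.2 data the skewness is close to zero and M4/S⁴ is about 1.3, which is below the value 3 of a normal distribution.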

6.0 Correlation and Regression


6.1 Correlation: This is the degree of relationship between two or more
variables. Many times the occurrence of an outcome depends upon or affects
some other thing. For example, the litres of fuel consumed by a car affect
the number of kilometres travelled by the car, a person's age may affect
their blood pressure, per capita income affects the standard of living, and so on.
When it has been established that one thing is affecting another, we say a
functional relationship exists between them.
A functional relationship can be shown on a graph known as a scatter
diagram.
Types of Scatter diagram
1. Perfect positive correlation (r = 1)
2. Perfect negative correlation (r = -1)
3. Partially positive correlation (0 < r < 1)
4. Partially negative correlation (-1 < r < 0)
5. Zero correlation (r = 0)
6.1.1 Pearson Product Moment Correlation
This is given as follows:
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[ n \sum x^2 - (\sum x)^2 \right] \left[ n \sum y^2 - (\sum y)^2 \right]}}, \qquad -1 \le r \le 1$$
Example 6.1
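The following scores were obtained by ten students in two subjects:
English:     7 4 5 5 7 4 6 4 6 3
Mathematics: 8 5 5 6 7 4 7 5 6 4
a. Draw the scatter diagram
b. Compute the Pearson product moment correlation coefficient and
interpret the result
Solution
Let x be the English score and y the Mathematics score.
n = 10, ∑x = 51, ∑y = 57, ∑xy = 306, ∑x² = 277, ∑y² = 341
$$r = \frac{10 \times 306 - 51 \times 57}{\sqrt{(10 \times 277 - 51^2)(10 \times 341 - 57^2)}} = \frac{153}{\sqrt{169 \times 161}} = 0.93$$
Since r is close to 1, there is a strong positive correlation between the scores in the two subjects.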
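
The computational formula for the Pearson coefficient translates directly into code. The sketch below is illustrative; the function name `pearson_r` is an assumption, and the data are the English and Mathematics scores of Example 6.1.

```python
import math

def pearson_r(xs, ys):
    """Pearson product moment correlation via the sums formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

english = [7, 4, 5, 5, 7, 4, 6, 4, 6, 3]
mathematics = [8, 5, 5, 6, 7, 4, 7, 5, 6, 4]
print(round(pearson_r(english, mathematics), 2))
```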
6.1.2 Spearman’s Rank Correlation
In rank correlation, the data are ranked in ascending or descending order of
magnitude. Rank correlation is usually used when the data are qualitative
(ordinal), to measure the strength of the relationship between two
variables.
This is given as follows:
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
where d is the difference between ranks, i.e.
d = Rx - Ry
Example 6.2
In the table below are the grades of nine students in two examinations
I 85 87 97 50 96 85 91 93
II 79 46 86 93 83 27 84 95
Calculate the rank correlation coefficient to determine the strength of the
relationship between the grades.
Solution
Let x be the grade in the first examination and y the grade in the second examination,
with ranks Rx and Ry. The relevant computation is given as follows:

 Rx      Ry      d       d²
 2.5     4      -1.5     2.25
 4       2       2       4
 9       7       2       4
 1       3      -2       4
 8       8       0       0
 7       5       2       4
 2.5     1       1.5     2.25
 5       6      -1       1
 6       9      -3       9
Total                   30.5

n = 9, ∑d² = 30.5, therefore,
$$r_s = 1 - \frac{6(30.5)}{9(81 - 1)} = 1 - \frac{183}{720} = 0.746$$
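
The rank correlation computation can be sketched as below. The helper `average_ranks`, which gives tied values the average of the positions they occupy (as with the two ranks of 2.5 in the table), and the function `spearman_from_ranks` are illustrative assumptions. The ranks used are those listed in the solution table.

```python
def average_ranks(values):
    """Ascending ranks; tied values share the average of their positions."""
    order = sorted(values)
    ranks = []
    for v in values:
        first = order.index(v) + 1            # first position held by v
        last = order.index(v) + order.count(v)  # last position held by v
        ranks.append((first + last) / 2)
    return ranks

def spearman_from_ranks(rx, ry):
    """1 - 6 * sum(d^2) / (n (n^2 - 1)), with d the difference in ranks."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

rx = [2.5, 4, 9, 1, 8, 7, 2.5, 5, 6]
ry = [4, 2, 7, 3, 8, 5, 1, 6, 9]
print(round(spearman_from_ranks(rx, ry), 3))
print(average_ranks([85, 87, 50, 85]))
```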

6.2 Regression
Regression is a measure of the relationship between two or more variables, one
being the dependent variable or response and the others the independent variables or
predictors. The regression equation is generally given as follows:
y = α + βx + e
where
y is the dependent variable or response
x is the independent variable or predictor
α is a constant value (the intercept)
β is the coefficient of the independent variable (the slope)
e is the error term, which is independently and identically normally distributed with
mean 0 and variance σ². This type of model is referred to as the simple
linear regression model. For the purpose of this course we shall restrict
ourselves to models of this form.
Note that the parameters of the model, α and β, can be estimated using the least
squares method as follows.
From the model,
y = α + βx + e
e = y - α – βx
To minimise the error, we square the error terms and sum over all observations:
e² = (y - α – βx)²
$$S = \sum e^2 = \sum (y - \alpha - \beta x)^2$$
Differentiating S with respect to β and setting the result to zero:
$$\frac{\partial S}{\partial \beta} = -2 \sum x (y - \alpha - \beta x) = 0$$
$$\sum xy = \alpha \sum x + \beta \sum x^2 \qquad (1)$$
Differentiating S with respect to α and setting the result to zero:
$$\frac{\partial S}{\partial \alpha} = -2 \sum (y - \alpha - \beta x) = 0$$
$$\sum y = n\alpha + \beta \sum x \qquad (2)$$
Multiplying equation 1 by n and equation 2 by ∑x gives
$$n \sum xy = n\alpha \sum x + n\beta \sum x^2 \qquad (3)$$
$$\sum x \sum y = n\alpha \sum x + \beta (\sum x)^2 \qquad (4)$$
Subtracting 4 from 3, we have
$$n \sum xy - \sum x \sum y = n\beta \sum x^2 - \beta (\sum x)^2$$
therefore
$$\beta = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$$
and, from equation 2,
$$\alpha = \frac{\sum y - \beta \sum x}{n} = \bar{y} - \beta \bar{x}$$


Example 6.3
Eight undergraduate students were surveyed in a study involving time spent
on the Internet and their grade point average (GPA). The results are shown
in the table below. x is the amount of time spent on the Internet weekly (in hours) and
y is the GPA of the student.

Hours (x)    GPA (y)
   11         2.84
    5         3.20
   22         2.18
   23         2.12
   10         2.90
   19         2.36
   15         2.60
   18         2.42

a. Fit a straight line regression to the data and give the values of α and β
b. What will be the GPA of a student who spends 40 hours on the
Internet weekly?
Solution

n = 8, ∑x = 123, ∑x² = 2169, ∑y = 20.62, ∑xy = 300.36

$$\beta = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \frac{8 \times 300.36 - 123 \times 20.62}{8 \times 2169 - 123^2} = \frac{-133.38}{2223} = -0.06$$

$$\alpha = \frac{\sum y - \beta \sum x}{n} = \frac{20.62 - (-0.06)(123)}{8} = \frac{28}{8} = 3.5$$

a. The fitted model is therefore
$$\hat{y} = 3.5 - 0.06x$$
b. For x = 40 hours, the predicted GPA is
$$\hat{y} = 3.5 - 0.06(40) = 1.1$$
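
The least squares estimates can be computed directly from the sums in the formulas for α and β. The following sketch is illustrative; the function name `fit_line` is an assumption, and the data are the hours and GPA values of Example 6.3.

```python
def fit_line(xs, ys):
    """Least squares intercept (alpha) and slope (beta) for y = alpha + beta * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    beta = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    alpha = (sum_y - beta * sum_x) / n
    return alpha, beta

hours = [11, 5, 22, 23, 10, 19, 15, 18]
gpa = [2.84, 3.20, 2.18, 2.12, 2.90, 2.36, 2.60, 2.42]
alpha, beta = fit_line(hours, gpa)
print(round(alpha, 2), round(beta, 2))
print(round(alpha + beta * 40, 2))  # predicted GPA for 40 hours per week
```

Note that 40 hours lies outside the observed range of 5 to 23 hours, so the prediction is an extrapolation and should be read with caution.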

7.0 INDEX NUMBER

This is a ratio or an average of ratios, expressed as a percentage, that shows changes
in prices, quantities and values with respect to time or geographic location. The
objective of measuring these changes is comparison.

7.1 Uses of Index Numbers

(1) It is used to show changes in the price of a single commodity over time
(2) It is used to know how the price level of a group of related commodities changes
with time
(3) Index series serve as a basis of comparison
(4) It aids decision making

7.2 Types of Index Number

1. Simple Index Number


2. Weighted Index Number
3. Chain Index Number

1. Simple Index Numbers

This involves only one item in the comparison, which may be a price or a quantity.
The period against which the comparison is made is called the base or reference
period. The simple price relative index is given by

$$I_n = \frac{p_n}{p_0} \times 100$$

where p_n is the price in the current year n and p_0 is the price in the base year.

Example 7.1

Consider the prices of a fish for three years: 1990, 2000, 2010

Year 1990 2000 2010

Price 20 30 80

Using 1990 as the base period, calculate the price relative index for 1990, 2000 and 2010.

Price Relative Index = (pn/p0) x 100

I90 = 20/20 x 100 = 100%

I00 = 30/20 x 100 = 150%

I10 = 80/20 x 100 = 400%

This is to say that the prices of fish in 1990, 2000 and 2010 were 100%, 150% and 400%
respectively of what the price was in 1990. There was an increase of 50% in 2000 and
300% in 2010 relative to 1990.

Example 7.2

Compute the arithmetic mean of price relatives index and the simple aggregate price
index for the following items, using 2011 as the base period.

Item       2011    2012

Maize        85     110

Beans        60     125

G. Con      100     150

Solution

a. The mean of relatives index is

$$I = \frac{\sum (p_n / p_0)}{N} \times 100$$

$$I_{12} = \frac{110/85 + 125/60 + 150/100}{3} \times 100 = 162.58\%$$

This shows that prices have risen by 62.58%.

b. The simple aggregate price index is

$$I = \frac{\sum p_n}{\sum p_0} \times 100 = \frac{385}{245} \times 100 = 157.14\%$$

7.3 Weighted Index Number

Under the weighted aggregate method, the aggregate is found for each period by
multiplying the prices by their respective weights, and the new total is
calculated as a percentage of the base total. Hence, in calculating the price index for a
group of commodities using the weighted aggregate method, the quantities of the
respective items serve as the weights. A weighted aggregate price index can be
constructed in two ways, namely Laspeyres' and Paasche's methods.

Laspeyres' Method: in this case we use base-period quantities as weights. The
Laspeyres formula for the weighted aggregate price index is given as follows:

$$I_L = \frac{\sum p_n q_0}{\sum p_0 q_0}$$

where

I_L is the price index (multiplied by 100 to express it as a percentage)

pn is the given (current) year unit price

p0 is the base year unit price

q0 is the base year quantity

7.4 Paasche's Method: the difference between this and the Laspeyres formula is that
Paasche's formula uses current year quantities qn as weights instead of the base
year quantities q0. Paasche's formula for the weighted aggregate price index is given as
follows:

$$I_P = \frac{\sum p_n q_n}{\sum p_0 q_n}$$

The quantity index can be calculated as follows:

a. Laspeyres' quantity index is

$$I_L = \frac{\sum q_n p_0}{\sum q_0 p_0}$$

b. Paasche's quantity index is

$$I_P = \frac{\sum q_n p_n}{\sum q_0 p_n}$$
Compute the Laspeyres' and Paasche's weighted aggregate price indices for the table
below.

                        Price per Unit (₦)      Quantity (Units)
Commodity    Unit      1980 (p0)   1982 (p1)   1980 (q0)   1982 (q1)
Potatoes     Kg          40k         50k          20          30
Bread        Loaf        50k         60k          50          70
Peak Milk    Tin         20k         40k          80          90
Eggs         Dozen     ₦1.20       ₦2.00          20          20

Solution

Commodity    Unit     1980     1982     1980    1982    p0q0     p1q0     p1q1     p0q1
                      (p0)     (p1)     (q0)    (q1)    (₦)      (₦)      (₦)      (₦)
Potatoes     Kg       40k      50k       20      30     8.00    10.00    15.00    12.00
Bread        Loaf     50k      60k       50      70    25.00    30.00    42.00    35.00
Peak Milk    Tin      20k      40k       80      90    16.00    32.00    36.00    18.00
Eggs         Dozen   ₦1.20    ₦2.00      20      20    24.00    40.00    40.00    24.00
Total                                                  73.00   112.00   133.00    89.00

a. Laspeyres' weighted aggregate price index is
$$I_L = \frac{\sum p_1 q_0}{\sum p_0 q_0} = \frac{112}{73} = 1.534 = 153.4\%$$
b. Paasche's weighted aggregate price index is
$$I_P = \frac{\sum p_1 q_1}{\sum p_0 q_1} = \frac{133}{89} = 1.494 = 149.4\%$$

The computed indices can be interpreted as follows:

The Laspeyres' index of 153.4% shows that there has been an increase of 53.4% in the
price of the group of items in 1982 relative to the 1980 price levels, while by Paasche's
index of 149.4% this increase is 49.4%.

The quantity indices can be calculated as follows:

c. Laspeyres' quantity index is
$$I_L = \frac{\sum q_1 p_0}{\sum q_0 p_0} = \frac{89}{73} = 1.219 = 121.9\%$$
d. Paasche's quantity index is
$$I_P = \frac{\sum q_1 p_1}{\sum q_0 p_1} = \frac{133}{112} = 1.188 = 118.8\%$$

The Laspeyres' quantity index indicates a 21.9% increase in the quantity of
items purchased in 1982 over the 1980 level, while the Paasche's quantity index indicates
an 18.8% increase in the quantity of items purchased in 1982 over the 1980 level.
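
The four weighted aggregate indices differ only in which prices and quantities are paired, which the sketch below makes explicit. The function names are illustrative assumptions; prices are entered in naira (for example 40k as 0.40), following the product columns of the solution table.

```python
def weighted_sum(prices, quantities):
    """Sum of price times quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))

def laspeyres_price(p0, p1, q0):
    return weighted_sum(p1, q0) / weighted_sum(p0, q0) * 100

def paasche_price(p0, p1, q1):
    return weighted_sum(p1, q1) / weighted_sum(p0, q1) * 100

def laspeyres_quantity(p0, q0, q1):
    return weighted_sum(p0, q1) / weighted_sum(p0, q0) * 100

def paasche_quantity(p1, q0, q1):
    return weighted_sum(p1, q1) / weighted_sum(p1, q0) * 100

# Potatoes, bread, peak milk, eggs
p0 = [0.40, 0.50, 0.20, 1.20]   # 1980 prices
p1 = [0.50, 0.60, 0.40, 2.00]   # 1982 prices
q0 = [20, 50, 80, 20]           # 1980 quantities
q1 = [30, 70, 90, 20]           # 1982 quantities

print(round(laspeyres_price(p0, p1, q0), 1), round(paasche_price(p0, p1, q1), 1))
print(round(laspeyres_quantity(p0, q0, q1), 1), round(paasche_quantity(p1, q0, q1), 1))
```

The Laspeyres indices hold the base-year weights fixed, while the Paasche indices use current-year weights, which is why the two price indices (and the two quantity indices) generally differ.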
