Statistics and Basic Distribution - Mabe
JULY, 2018
Copyright
All rights reserved. No part of this book may be reproduced, photocopied or stored in a retrieval
system by anybody or transmitted to any person without the permission of the Author, Dr.
Franklin N. Mabe and the publisher, Institute of Distance and Continuing Education (IDCE),
University for Development Studies.
Publisher
Institute of Distance and Continuing Education (IDCE), University for Development Studies,
Tamale.
July, 2018
Author:
Dr. Franklin N. Mabe
Department of Agricultural and Resource Economics
University for Development Studies
Tamale
[email protected]
+233-242760053
Editorial Team
Dr. Hamdiya Alhassan
Dr. Abdul-Basit Tampuli
Typesetting
Bright K. D. Tetteh
Dominic Konja Tasila
Teaching Assistants
Scholastica Atara
Makafui Adzo Dikro
Acknowledgements
The author, Franklin N. Mabe, expresses his sincere appreciation to the Director of the Institute
of Distance and Continuing Education, Dr. Ebenezer Owusu-Sakyere, for providing the
opportunity to write this book. The work of the editorial team (Dr. Hamdiya Alhassan and
Dr. Abdul-Basit Tampuli), the typesetters (Bright K. D. Tetteh and Dominic Konja Tasila) and
the teaching assistants (Scholastica Atara, Richard Sulemana Danaa and Makafui Adzo Dikro)
is much appreciated.
Preface
The book, Statistics and Basic Distribution, is the official and recommended manual for
distance and continuing education students at the University for Development Studies. It is the
textbook and manual for the introductory statistics course for distance education students, but it
can also be used by regular undergraduate students. Postgraduate students can use it as revision
material. The book provides easy-to-read notes and worked examples for students who are not
enrolled in a full-time programme. It is also suitable reading material for students who are not
studying quantitative programmes such as mathematics, economics, statistics, physical sciences
and engineering.
The motivation for writing this book is to provide a quality, easy-to-read textbook and manual
for students. Many statistics books are difficult for students to comprehend. For easy
comprehension, this book focuses on the calculation and interpretation of statistical results,
especially in real-world settings, taking cognizance of the fact that the students taking the
course are not quantitatively inclined. The solved examples in the book are based on everyday
problems. The examples are also localised and fit well in the Ghanaian context. In writing the
manual, the students' level of maturity was carefully taken into consideration.
Table of Contents
Acknowledgements
Preface
Table of Contents
LIST OF TABLES
LIST OF FIGURES
CHAPTER ONE: INTRODUCTION TO STATISTICS
Unit 1: Origin and Definition of Statistics
1-1.1 Origin of Statistics
1-1.2 Meaning of the Word “Statistics”
1-1.3 Definition of Statistics as a Subject
1-1.4 Definition of Statistics as a Body of Numbers or Information
1-1.5 Statistics in Education, Medicine, Business and Agriculture
1-1.5.1 Educational Statistics
1-1.5.2 Medical Statistics
1-1.5.3 Business Statistics
1-1.5.4 Agricultural Statistics
Self-Assessment Test 1.1
Unit 2: Importance of Study of Statistics
1-2.1 Importance of the Study of Statistics
1-2.2 Importance of the Study of Educational Statistics
1-2.3 Four Main Reasons for the Study of Educational Statistics
1-2.3.1 Monitor Student's Performance and Progress
1-2.3.2 Help Teachers Evaluate their Own Performance
1-2.3.3 Evaluate Performance of Subjects
1-2.3.4 Importance of Statistics in Educational Management
1-2.4 Everyday Uses of Statistics
Self-Assessment Test 1.2
Unit 3: Descriptive Versus Inferential Statistics
1-3.1 Overview of Descriptive Versus Inferential Statistics
1-3.2 Definition of Descriptive Statistics
1-3.3 Definition of Inferential Statistics
1-3.4 Key Differences between Descriptive and Inferential Statistics
Self-Assessment Test 1.3
Unit 4: Terms in Statistics and Scales of Statistical Measurements
1-4.1 Definition of Terms
1-4.2 Measurement Scales
1-4.2.1 Nominal Versus Ordinal Scales
1-4.2.2 Interval Versus Ratio Scales
1-4.3 Types of Variables
1-4.3.1 Discrete/Categorical Versus Continuous Variables
1-4.3.2 Dependent Versus Independent/Explanatory Variables
1-4.3.3 Ordered and Unordered Variables
1-4.3.4 Quantitative Versus Qualitative Variables
Self-Assessment Test 1.4
CHAPTER TWO: RELATIONSHIPS AND REPRESENTATION OF DATA
Unit 1: Meaning and Sources of Data
2-1.1 Definition of Data
2-1.2 Sources of Data
2-1.3 Kinds of Data
2-1.3.1 Numerical Versus Categorical Data
2-1.3.2 Univariate, Bivariate and Multivariate Data
2-1.3.3 Qualitative Versus Quantitative Data
2-1.3.4 Cross-Sectional, Time Series and Panel Data
Self-Assessment Test 2.1
Unit 2: Concepts of Relationships and Data Representation
2-2.1 Concepts of Relationship
2-2.2 Meaning of Data Representation
2-2.3 Importance of Pictorial Data Representation
2-2.4 Effective Pictorial Representation of Data
Self-Assessment Test 2.2
Unit 3: Symbol Chart and Graphical Representation of Data
2.3.1 Symbol Charts or Diagrams
2-3.2 Bar Charts/Graphs
2-3.3 Single Column Bar Chart/Graph
2-3.4 Single Horizontal Bar Chart/Graph
2-3.5 Multiple or Compound Column Bar Chart/Graph
2-3.6 Multiple or Compound Horizontal Bar Chart/Graph
2-3.7 Component or Stacked Column Bar Chart/Graph
2-3.8 Component or Stacked Horizontal Bar Chart/Graph
2-3.9 Line Charts or Graphs
2-3.10 Single or Simple Line Graph
2-3.11 Multiple or Compound Line Graph
2-3.12 Component Line Chart or Graph
2-3.13 Histogram
2-3.14 Pie Chart
Self-Assessment Test 2.3
CHAPTER THREE: SUMMARISING AND DESCRIBING DATA: STATISTICAL MEASURES
Unit 1: Introduction to Statistical Measures
3-1.1 Definition of Statistical Measures
3-1.2 Types of Statistical Measures
3-1.3 Measure of Central Tendency
3-1.4 Measure of Variation/Dispersion
3-1.5 Measure of Positions
3-1.6 Measure of Shape
Self-Assessment Test 3.1
Unit 2: Measure of Central Tendency
3-2.1 Introduction to Measure of Central Tendency
3-2.2 Mean
3-2.2.1 Arithmetic Means of Ungrouped Data
3-2.2.2 Arithmetic Mean of Grouped Data
3-2.2.3 Arithmetic Mean When Probability is Given
3-2.2.4 Weighted Arithmetic Mean
3-2.3 Median
3-2.3.1 Median for Ungrouped Data
3-2.3.2 Median for Grouped Data
3-2.4 Mode
3-2.4.1 Mode of Ungrouped Data
3-2.3.2 Mode of Grouped Data
3-2.4 Midrange
Self-Assessment Test 9
Unit 3: Measures of Dispersion/Variation
3-3.1 Introduction
3-3.2 Range
3-3.3 Interquartile Range (IQR)
3-3.3 Semi-Interquartile Range
3-3.4 Variance
3-3.4.1 Variance for Ungrouped Data
3-3.4.2 Variance for Frequency Distribution Table
3-3.5 Standard Deviation
3-3.6 Mean Deviation MD
Self-Assessment Test 3.3
Unit 4: Measures of Position
3-4.1 Introduction to the Measure of Position
3-4.2 Quartiles
3-4.2.1 Quartiles of Ungrouped Data
3-4.2.2 Quartiles for Grouped Data of Frequency Distribution
3-4.3 Deciles and Percentiles
3-4.4 Steps in Finding a Value Corresponding to a Given Percentile
Self-Assessment Test 11
UNIT 5: MEASURES OF SHAPE
3-5.1 Meaning of Measure of Shape
3-5.2 Skewness
3-5.2.1 Normal or Symmetric Distribution
3-5.2.2 Positive Skewness or Right Skewed
3-5.2.3 Negative Skewness or Left Skewed
3-5.3 Test of Skewness
3-5.4 Measure of Skewness
3-5.5 Kurtosis
3-5.6 Moments
Self-Assessment Test 3.5
CHAPTER FOUR: SAMPLING
Unit 1: Probability Sampling
4-1.1 Sample, Sampling and Population
4-1.2 Purpose of Sampling
4-1.3 Types of Sampling
4-1.4 Probability Sampling
4-1.4.1 Simple Random Sampling
4-1.4.2 Stratified Random Sampling
4-1.4.3 Cluster Sampling
4-1.4.4 Systematic Sampling
Unit 2: Non-Probability Sampling
4-2.1 Non-Probability Sampling
4-2.2 Purposive Sampling
4-2.3 Quota Sampling
4-2.4 Convenient Sampling
4-2.5 Snowball Sampling
4-2.6 Judgmental Sampling
4-2.7 Consecutive Sampling
CHAPTER FIVE: PROBABILITY
Unit 1: Introduction to Probability
5-1.1 Definition of Probability
5-1.2 Basic Probability Concept
Unit 2: Simple and Mutually Exclusive Events
5-2.1 Types of Events
5-2.2 Simple or Single Events
5-2.3 Addition Rule
5-2.4 General Multiplication Rule
5-2.5 Joint Probability
5-1.7 Probability Rule of Multiplication for Independent Event
Self-Assessment Test 15
References
LIST OF TABLES
LIST OF FIGURES
CHAPTER ONE: INTRODUCTION TO STATISTICS
Introductory Remarks
The study of statistics has become more popular in recent years due to the upsurge in the
use of computers and statistical software packages. Almost every profession requires some
knowledge of statistics. In light of this, students in almost all disciplines are required to take at
least one course in statistics. The use of graphs and statistical tests is common in the analysis of
data for research today.
Hello, you are welcome to chapter one of this course, Basic Statistics. I believe you enjoyed
the Basic Mathematics that you were taught in the first trimester. This chapter introduces you to
the concept of statistics and its importance. Note that you will write your undergraduate
dissertation in your final year, and hence you will be working with data related to your field of
study. Therefore, it is important that you study this course, Basic Statistics for
Undergraduates. By the end of this chapter you will understand why this course is cardinal in
all fields.
In this chapter, you will be introduced to the concepts and definitions of terms in Basic
Statistics. Your duty as a student is to develop a strong interest in the course, and this cannot
be done without taking pains to understand this first chapter of the book. The chapter is
divided into four units, namely:
Unit 1: Origin and Definition of Statistics
Unit 2: Importance of Study of Statistics
Unit 3: Descriptive Versus Inferential Statistics
Unit 4: Terms in Statistics and Scales of Statistical Measurements
Unit 1: Origin and Definition of Statistics
Welcome to unit one of this chapter. It is my fervent hope that you will develop a strong interest
in Basic Statistics in this unit. Students often complain about how statistics is taught.
Most of these complaints stem from a phobia of Mathematics developed since
elementary education. You are probably asking yourself the question, "When and where will I
use statistics?". I can assure you that you will find answers after studying this unit.
Unit Objectives
By the end of this unit, you should be able to:
Define the term statistics
Name and explain the branches of statistics
Know the differences between ancient statistics and modern statistics
Develop the desire to study statistics
1-1.1 Origin of Statistics
Statistics was used unofficially before the 18th century. It was used in the era of the biblical
Moses and also at the time of the birth of Jesus Christ (the Roman census). The use of statistics
was not common. Statistics was used when rulers and kings needed information about lands,
agriculture, commerce, the population of their states etc. to assess their military potential, their
wealth, taxation and other aspects of government. The early use of statistics did not involve
much analysis or interpretation.
The first official use of statistics came at the turn of the 19th century. It occurred in 1801
in England, during a population census which included the collection, analysis and
interpretation of data. During the 20th century, many statisticians actively developed
new methods, theories and applications of statistics. The availability of statistical computer
software such as SPSS, Stata, EViews etc. for modern data analysis is a clear indication of
the modern development of statistics.
In Ghana, chiefs used to count their subjects unofficially and used that information to determine
how much toll each individual was to pay. The application of statistics in Ghana dates back to
British colonial rule. In the then Gold Coast, the British Colonial Government conducted
the first population census in 1891, and this was done mainly along the coast. After the Northern
and the Ashanti Territories were put under British rule, another census was conducted in 1901.
Since Ghana gained independence, more censuses have been conducted, and the latest was the
2010 Population and Housing Census.
There is a big difference between old statistics and modern statistics. It is important to
note that old statistics is part of present-day statistics; that is, ancient statistics is a
subset of modern statistics. Ancient statistics involved little analysis or interpretation and had
no statistical software for rigorous analysis.
Now, mention two differences between the old and the modern statistics.
1. ………………………………………………………………………………………
2. ………………………………………………………………………………………
1-1.2 Meaning of the Word “Statistics”
Have you heard the word statistics before? If yes, have you looked up its meaning? If you do not
know the meaning, do not worry. Here is the literal meaning of the word “statistics”.
The word statistics is derived from the Latin word “Status” or the Italian word “Statista”.
Status in Latin, or Statista in Italian, means “Political State” or “Government”. In other
words, it refers to a “Statesman”.
There are numerous definitions of statistics. For the purpose of this course, we shall look at the
definition of statistics as Numerical Data and Statistics as a Subject. A number of specialties
have evolved to apply statistical theory and methods to various disciplines.
1-1.3 Definition of Statistics as a Subject
There are different subjects. Since your elementary school, you have studied many subjects.
Can you remember some? Obviously, you can remember Mathematics, Science, English,
Geography etc. Each of the subjects has its own definition.
For the purpose of this course, Statistics as a subject is defined as the science of collecting,
organizing, presenting, analyzing and interpreting data for the purpose of assisting in
making a more effective decision (Mason, 1990). As a subject, statistics is a procedure: a
science that follows certain steps. Therefore, the key words in the definition must not be
rearranged. They should appear in the order:
collection of data → organization of data → presentation of data → analysis of data →
interpretation of analyzed data → decision making.
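The order of the steps can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the test scores, the pass mark of 50 and the decision rule are hypothetical and are not taken from this book, and the function names are invented for the example.

```python
from statistics import mean

def collect():
    # Step 1: collect the data (hypothetical test scores out of 100).
    return [72, 45, 88, 60, 51, 93, 67]

def organize(scores):
    # Step 2: organize the raw scores in ascending order.
    return sorted(scores)

def present(scores):
    # Step 3: present the organized data as a simple listing.
    return "Scores: " + ", ".join(str(s) for s in scores)

def analyze(scores):
    # Step 4: analyze the data, here by computing the arithmetic mean.
    return mean(scores)

def interpret(average, pass_mark=50):
    # Step 5: interpret the result against an assumed pass mark.
    return "above pass mark" if average >= pass_mark else "below pass mark"

def decide(interpretation):
    # Step 6: make a decision (an invented rule, for illustration only).
    if interpretation == "above pass mark":
        return "proceed to next topic"
    return "revise the topic"

raw = collect()
ordered = organize(raw)
listing = present(ordered)
average = analyze(ordered)
meaning = interpret(average)
decision = decide(meaning)

print(listing)
print("Mean score:", average)
print("Interpretation:", meaning)
print("Decision:", decision)
```

Each function takes the output of the step before it, which is why the steps cannot be rearranged: the data cannot be analyzed before they are collected, and no decision can be made before the results are interpreted.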
1-1.4 Definition of Statistics as a Body of Numbers or Information
As a body of numbers, Statistics can be said to be the raw numerical facts or data collected
for analysis. It can also be defined as the information generated from numerical facts or data
after the analysis. With this, one can talk of demographic statistics, agricultural statistics, health
statistics, educational statistics etc.
1-1.5 Statistics in Education, Medicine, Business and Agriculture
1-1.5.1 Educational Statistics
Educational statistics falls under both statistics as a subject and statistics as a body of numbers
or information.
As a subject of study, educational statistics is the study of the methods and procedures used in
collecting, organizing, analyzing and interpreting data or information relating to education
for effective decision making (Etsey, 2014). It also entails making meaning out of raw data
available in the field of education. At the end of this module, you will understand educational
statistics very well and be able to apply it in an office or classroom.
As a body of numbers or information, educational statistics is defined as a body of numbers or
data or information on education. Information such as school enrolment, number of
teaching staff, literacy rate, rate of school dropout, pupils’ attendance and male-female student
ratio are examples of educational statistics.
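As a small illustration of how such figures are obtained from raw counts, the Python sketch below computes a male-female student ratio and a dropout rate. The enrolment and dropout numbers are hypothetical and are not data from this book, and the formulas are the usual simple ones assumed for this example.

```python
# Hypothetical figures for one school (not data from this book).
boys_enrolled = 240
girls_enrolled = 200
dropouts = 22

# Total enrolment is the sum of the two groups.
total_enrolled = boys_enrolled + girls_enrolled

# Male-female student ratio: number of boys per girl.
male_female_ratio = boys_enrolled / girls_enrolled

# Dropout rate: dropouts as a percentage of total enrolment.
dropout_rate = dropouts * 100 / total_enrolled

print("Total enrolment:", total_enrolled)
print("Male-female ratio:", male_female_ratio)
print("Dropout rate (%):", dropout_rate)
```

The raw counts are statistics in the sense of numerical facts, while the ratio and the rate are statistics in the sense of information generated from those facts.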
1-1.5.2 Medical Statistics
Similarly, medical statistics can be defined as a subject and as a body of numbers or
information.
Medical statistics is the study of the methods and procedures used in collecting, organizing,
analyzing and interpreting data or information relating to medicine and health sciences for
effective decision making (Armitage, Berry and Matthews, 2002).
As a body of numbers or information, medical statistics is defined as a body of numbers or
data or information on medicine and health sciences. For instance, information on births,
deaths, temperature, weight and blood pressure is called vital statistics in the medical field.
1-1.5.3 Business Statistics
Business statistics can also be defined as a subject and as a body of numbers or information.
Business statistics is the study of the methods and procedures used in collecting, organizing,
analyzing and interpreting data or information for effective decision making in the face of
uncertainty (Morien, 2007).
As a body of numbers or information, business statistics is defined as a body of numbers or
data or information on business. Examples include financial information and marketing information.
1-1.5.4 Agricultural Statistics
As noted above, agricultural statistics can be defined as a subject and as a body of numbers or
information.
Agricultural statistics is the study of the methods and procedures used in collecting, organizing,
analyzing and interpreting data or information on agricultural production, agro-processing,
agricultural marketing and consumption for effective decision making in the face
of uncertainty.
As a body of numbers or information, agricultural statistics is defined as a body of numbers
or data or information on acreages, quantity of fertilizer, number of bags of maize etc.
Unit 2: Importance of Study of Statistics
Hello, you are welcome to unit 2 of this introductory chapter. From unit 1, we noted that the
word statistics originated from a Latin or Italian word. We also learnt the ancient and modern
applications of statistics. The various definitions of statistics have also been explained.
In this unit, we shall focus on the importance of the study of statistics.
Unit Objectives
By the end of this unit, you should be able to:
State the importance of the study of statistics
Mention and explain the application of statistics in everyday life
Explain why teachers need to assess the performance of the students
Give examples of statistics encountered in everyday life
1-2.1 Importance of the Study of Statistics
A number of statistical concepts have an impact on a wide range of fields. Broadly, the following
are the reasons why one needs to study statistics irrespective of one’s field of study.
Helps managers to make sound judgments, as decisions are based on data and not
on assumptions
Helps businesses to plan better and make predictions about the future
Monitor students’ performance and progress.
Monitor the teacher's productivity, performance and progress.
Check the effectiveness of a subject being taught.
Design of experiments for study.
Statistics can be helpful in any number of situations as they make it possible to
analyze sets of data so as to make informed conclusions about that data.
For teachers, these reasons for the study of statistics can be grouped into four, which are
explained in the section below.
1-2.3 Four Main Reasons for the Study of Educational Statistics
There are four main reasons why teachers are required to study educational statistics. With
the study of educational statistics, teachers are able to monitor students’ performance,
evaluate their own performance, evaluate the performance of subjects and manage
educational institutions at the management level. These four reasons are explained below.
1-2.3.1 Monitor Student's Performance and Progress
It is important for teachers to study statistics so as to be able to monitor students' progress
throughout the term or academic year. Teachers often give students intermittent tests or quizzes
and an end-of-term examination. These tests or exams are aimed at keeping track of
the performance of students to see if they are doing well. The statistical
information from test or examination scores is useful to students, as it helps the teacher
to offer extra help to students who are weak or to advise their parents accordingly.
1-2.3.2 Help Teachers Evaluate their Own Performance
Statistics can also be used by educationists or teachers to evaluate their own performance in
terms of the methodology they used in teaching students. With information on the performance
of students, teachers can assess themselves to know whether or not they are teaching well. It
also affords teachers the opportunity to know whether or not the time allocated for tests or
examinations is adequate.
1-2.3.3 Evaluate Performance of Subjects
Statistics can also be used by educational institutions in general to assess the performance of
students in a particular subject. It can also show where there is room for improvement;
by analysing this data, the improvements can be implemented as soon as possible.
1-2.3.4 Importance of Statistics in Educational Management
It is important for the school manager or administrator to plan, implement plans and evaluate
their success. Statistical data is used to do these effectively. The day-to-day decision making of
educational managers is based on data. For instance, to purchase stationery or
furniture, the school manager uses statistical data on class sizes and the ages of students so as
to arrive at the quantities to be purchased. Data on enrolment, class size and number of teachers
will enable a school manager to requisition the materials required for the day-to-day running
of the school.
1-2.4 Everyday Uses of Statistics
We live in the information age and the world is now a global village. Information abounds
on the internet, and we need to use it effectively to enhance our stay
on this planet. Statistics shape your life in the following areas:
Everyday weather forecasts help you to plan your daily movements
Political campaigns and indicators of economic performance help one to make
voting decisions
Advertisements on TV, radio, the internet or print media help you choose which
commodity to buy
Taking stock or inventory of foodstuffs in the house and deciding on the quantity to
purchase
Deciding on the quantity of food to eat to be satisfied
Self-Assessment Test 1.2
Q1. Explain the four main reasons for the study of statistics
Q2. Why do educational managers need to learn statistics?
Q3. Give five examples of how statistics is used in our everyday activities
Q4. Discuss five ways in which the study of statistics is important in your field of study.
Unit 3: Descriptive Versus Inferential Statistics
Introductory Remarks
Hello, I hope you now have a brief understanding of why you should study Educational Statistics
as you prepare to work in your field of study after graduation. In this unit you are going
to build on the knowledge you gained from units 1 and 2. We shall look at the differences
between descriptive and inferential statistics.
Unit Objectives
By the end of this unit, you should be able to:
Define descriptive statistics
Define inferential statistics
Explain measures of descriptive statistics
Explain measures of inferential statistics
State differences between descriptive and inferential statistics
1-3.1 Overview of Descriptive Versus Inferential Statistics
In recent times, the role statistics plays in research cannot be overemphasised. Through the study
and application of statistics, researchers are able to collect, organise and analyse data and make
inferences for prediction and decision making. Researchers analyse
data by employing descriptive or inferential statistics. From the different definitions of
statistics, can you try to define descriptive and inferential statistics? Try to define them yourself.
Descriptive statistics ……………………………………………………………………………
Inferential statistics…………………………………………………………………………….
Now, here are the correct definitions of descriptive and inferential statistics.
1-3.2 Definition of Descriptive statistics
Descriptive statistics: It is the branch of statistics which describes the important features,
characteristics and properties of a data set. It describes the features of the sample or population
data. Components of descriptive statistics are:
Measure of central tendency: A measure of central tendency describes the point about which
the various observed values group or cluster. Examples of measures of central tendency are
the mean, mode and median.
Measure of dispersion or variation: A measure of dispersion shows the spread of the data
about the central point. Examples include the range, interquartile range, variance, standard
deviation, mean deviation and coefficient of variation.
Measure of position: A measure of position identifies the point or location of a value within
a data set. Quartiles, quintiles, deciles and percentiles are all measures of position which
describe particular locations in a data set.
Measure of shape: It is a measure of how a data set is distributed. Skewness, kurtosis and
moments are used to measure the distribution of data.
Diagrammatic or Tabular Representations: Lastly, graphical tools such as charts, tables
and graphs summarize and describe features of data sets.
These components of descriptive statistics will be dealt with in detail in subsequent chapters.
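The components listed above can be illustrated with a short Python sketch. The scores below are hypothetical and are not taken from this book; the sketch uses Python's standard `statistics` module and is only meant to show one measure from each group.

```python
import statistics

# Hypothetical test scores (%) for ten pupils; invented for illustration only.
scores = [45, 55, 55, 60, 65, 70, 78, 80, 85, 88]

# Measures of central tendency
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)

# Measures of dispersion
score_range = max(scores) - min(scores)
sample_variance = statistics.variance(scores)  # sample variance (divides by n - 1)
sample_std_dev = statistics.stdev(scores)      # square root of the sample variance

# Measures of position: the three quartiles Q1, Q2 and Q3
q1, q2, q3 = statistics.quantiles(scores, n=4)

print("Mean:", mean_score, "Median:", median_score, "Mode:", mode_score)
print("Range:", score_range, "Standard deviation:", round(sample_std_dev, 2))
print("Quartiles:", q1, q2, q3)
```

Note that the second quartile is the same value as the median, which is why the median is both a measure of central tendency and a measure of position.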
1-3.3 Definition of Inferential statistics
Inferential statistics: It is the type of statistics which estimates the characteristics of a sample
data set with the main aim of making generalisations about the population. It is the branch of
statistics that involves drawing conclusions about a population based on information contained
in a sample taken from that population (http://www.saylor.org/books).
With inferential statistics, a head teacher can take the ages of a certain percentage of pupils in
JHS 2 of schools A and B, estimate a statistic and generalise that there are
differences between the ages of pupils in the two schools. In a nutshell, it is a statistical analysis
that can be used to draw conclusions about the population when it is not possible to collect data
from each and every member of the population. It is important to note that inferential statistics
involves hypothesis testing. Do not worry if you do not understand hypothesis testing yet. The
methods of inferential statistics include analysis of variance, the chi-square test, Student’s t-test
and regression analysis.
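The head teacher example can be sketched in Python. The ages below are hypothetical samples, and the sketch assumes the unequal-variance (Welch) form of the two-sample t statistic; it stops at the test statistic, because the decision step requires a t-table or a statistics package.

```python
import math
import statistics

# Hypothetical ages (years) of sampled JHS 2 pupils; invented for illustration only.
ages_school_a = [13, 14, 14, 15, 13, 14, 15, 14]
ages_school_b = [15, 16, 15, 17, 16, 15, 16, 16]

mean_a = statistics.mean(ages_school_a)
mean_b = statistics.mean(ages_school_b)
var_a = statistics.variance(ages_school_a)  # sample variance
var_b = statistics.variance(ages_school_b)

# Standard error of the difference between the two sample means
standard_error = math.sqrt(var_a / len(ages_school_a) + var_b / len(ages_school_b))

# Two-sample t statistic (Welch form, assumed here)
t_statistic = (mean_a - mean_b) / standard_error

print("Mean age in school A:", mean_a)
print("Mean age in school B:", mean_b)
print("t statistic:", round(t_statistic, 2))
```

A t statistic far from zero suggests that the difference between the sample means is unlikely to be due to chance alone; how far is "far" is decided by comparing it with a critical value.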
1-3.4 Key Differences between Descriptive and Inferential Statistics
From the above definitions of descriptive and inferential statistics, try and write four
differences between the two. Discuss it with your course mates.
The table below illustrates the key differences between descriptive and inferential statistics.
Table 1: Differences between descriptive and inferential statistics
Basis of Comparison | Descriptive statistics | Inferential statistics
Meaning | Describes the features of the data set | Draws conclusions about the population based on the sample data
What it does | Organises, analyses and presents data in a meaningful way | Compares data, tests hypotheses and makes predictions
Form of final results | Numerical values, charts, graphs and tables | Probability
Usage | To describe a situation | To explain the chances of occurrence of an event
Function | It explains the data which is already known with the aim of summarising the data | It makes conclusions about the population which extend beyond the available data
Q3. Mention and explain measures of descriptive statistics
Q4. Explain measures of inferential statistics
Q5. State four differences between descriptive and inferential statistics
Q6. Differentiate between measure of central tendency and measure of dispersion
Q7. Descriptive statistics is less important for policy implications in research than inferential statistics.
Briefly explain.
Unit 4: Terms in Statistics and Scales of Statistical Measurements
Welcome to unit 4 of the introductory chapter of this book. Well done for reaching this far. I
believe it has been so fantastic and you have enjoyed the previous units. In this unit, you are
going to build on the knowledge you gained from units 1, 2 and 3 by learning
terminologies which are commonly used in statistics. You are also going to study scales of
statistical measurement. Are you set for us to start unit 4? Ready, go!!!
Unit Objectives
By the end of this unit, you should be able to:
Define certain terms used in statistics
Mention and explain different scales of statistical measurements
Differentiate between nominal and ordinal scales
Differentiate between interval scale and ratio scale
Mention and explain types of variables
Differentiate between discrete and continuous variables
Differentiate between categorical and continuous variables
Differentiate between discrete and categorical variables
Differentiate between ordered and unordered variables
Differentiate between dependent and independent variables
Differentiate between quantitative and qualitative variables
1-4.1 Definition of terms
Population: It is the totality of items or things under consideration. A population is
the entire collection of persons, things or objects under study. Population
is the collection of all individuals or items under consideration in a statistical study
(Weiss, 1999). Because resources are limited and members of the population have similar
characteristics, a sample can be selected for a study.
Sample: The idea of sampling is to select a portion (or subset) of the larger population
and study that portion (the sample) to gain information about the population. A sample is
that part of the population from which information is collected (Weiss, 1999).
Sample Survey: The technique of collecting information from a portion of the
population.
Random Sample: A random sample is a sample which is selected in such a way that
each member of the population has an equal chance of being selected.
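As a minimal sketch of drawing a random sample, the Python code below selects 10 pupils from a hypothetical population of 100 using the standard `random` module. The pupil names and the seed value are assumptions made only for this illustration.

```python
import random

# Hypothetical population of 100 pupils; invented for illustration only.
population = [f"Pupil {number}" for number in range(1, 101)]

# A seeded generator makes the draw repeatable; the seed value is arbitrary.
generator = random.Random(2018)

# Draw 10 pupils without replacement; every pupil has an equal chance of selection.
sample = generator.sample(population, k=10)

print(sample)
```

Because the draw is made without replacement, no pupil can appear in the sample more than once.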
1-4.2.2 Interval Versus Ratio Scales
Quantitative variables, whether discrete or continuous, are defined either on an interval scale
or on a ratio scale.
Interval scales are numerical scales in which intervals have the same interpretation throughout.
An interval scale is an ordered scale in which the difference between the measurements can be
compared meaningfully.
If, on the other hand, one can meaningfully compare both the differences between measurements
of the variable and the ratios of the measurements, then the quantitative variable is
defined on a ratio scale. A ratio scale is an interval scale with a meaningful absolute zero point,
and it is the highest level of measurement. It has all the properties of the interval scale and,
in addition, an absolute zero point, which the interval scale lacks. For
example, temperature measured on the Centigrade (Celsius) scale is an interval variable, and the
height of a person is a ratio variable.
Table 2: Properties of Measurement Scales
Properties/Scales | Nominal scale | Ordinal scale | Interval scale | Ratio scale
Order | No | Yes | Yes | Yes
Difference | No | No | Yes | Yes
Ratio | No | No | No | Yes
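The difference between the interval and ratio rows of the table can be checked with a small Python sketch. The temperatures and heights are hypothetical values chosen for illustration. The idea is that a meaningful comparison should not depend on the unit of measurement.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def centimetres_to_inches(centimetres):
    """Convert a length from centimetres to inches."""
    return centimetres / 2.54


# Interval scale: equal differences stay equal after changing units ...
gap_low_c = 20 - 10
gap_high_c = 30 - 20
gap_low_f = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)
gap_high_f = celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20)

# ... but ratios do not, because 0 degrees is not an absolute zero.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: height has an absolute zero, so the ratio survives a change of units.
ratio_cm = 180 / 90
ratio_inches = centimetres_to_inches(180) / centimetres_to_inches(90)

print("Temperature ratios:", ratio_celsius, ratio_fahrenheit)
print("Height ratios:", ratio_cm, round(ratio_inches, 6))
```

In this sketch 20 degrees looks "twice as hot" as 10 degrees only on the Celsius scale, whereas a person of 180 cm is twice as tall as one of 90 cm in any unit.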
therefore called independent or explanatory variables because they all vary independently, and
they explain the variation in food expenditure among different households. In other words,
these variables explain why different households spend different amounts of money on food.
Food expenditure is called the dependent variable because it depends on independent variables
(IVs) such as taste and special dietary needs. Simply put, a dependent variable is a condition
or piece of data in an experiment that is controlled or influenced by at least one outside factor,
usually the independent variable. An independent variable is a condition or piece of data in an
experiment that can be controlled or changed.
1-4.3.3 Ordered and Unordered Variables
An ordered variable is a categorical variable for which the possible values are ordered. In
ordered variables, the ordering of the values matters but not the differences between
them. The values simply express an order. Examples include: 1-first class, 2-second class
upper, 3-second class lower, 4-third class, 5-pass; 1-excellent, 2-very good, 3-good, 4-average,
5-pass.
Unordered variables are categorical variables with two or more categories
that have no natural order. If the categories are placed in some order, the ordering makes very
little sense or none at all. These variables have no numeric values, and arithmetic operations
such as addition and subtraction cannot be performed on them. Examples
include the following: political affiliation, sex, eye colour, genotypes, blood groups, etc.
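A short Python sketch may help to separate the two ideas. The list of graduates and the list of blood groups are hypothetical. Ordered categories can be ranked, while unordered categories can only be counted.

```python
from collections import Counter

# Ordered variable: each class of degree has a rank, as in the example above.
degree_rank = {
    "first class": 1,
    "second class upper": 2,
    "second class lower": 3,
    "third class": 4,
    "pass": 5,
}

# Hypothetical classes of degree obtained by five graduates.
graduates = ["pass", "first class", "second class lower", "second class upper", "first class"]

# Sorting by rank is meaningful because the categories have a natural order.
ranked = sorted(graduates, key=degree_rank.get)

# Unordered variable: blood groups have no natural order, so we only count them.
blood_groups = ["O", "A", "B", "O", "AB", "O", "A"]
blood_group_counts = Counter(blood_groups)

print(ranked)
print(blood_group_counts)
```

The ranks 1 to 5 only express order; subtracting one rank from another does not give a meaningful difference.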
1-4.3.4 Quantitative Versus Qualitative Variables
A quantitative variable is a variable that can be measured numerically. Quantitative data can
be analysed more easily with mathematical tools than qualitative data. For example, it does not
make sense to find an average hair colour or blood type. Amount of money, pulse rate, weight,
number of teachers in the school, and the number of students who take statistics are examples
of quantitative data. Quantitative data may be either discrete or continuous.
A qualitative variable is a variable which is not numeric. It categorizes or describes the attributes
of a population. Qualitative data are generally described by words or letters. For instance, hair
colour might be black, dark brown, light brown, blonde, gray, or red. Blood type might be AB+,
O-, or B+.
Self-Assessment Test 1.4
1. Define the following terms: sample, population, data, variable, census, statistic,
parameter, finite population
2. Mention four scales of measurement and briefly explain them.
3. Differentiate between
(i) discrete and continuous variables
(ii) qualitative and quantitative variables
(iii) ordered and unordered variables
(iv) dependent and independent variables
CHAPTER TWO: RELATIONSHIPS AND REPRESENTATION OF DATA
Introductory Remarks
Hello, you are welcome to chapter two of this course, Basic Statistics. I believe you enjoyed
all units under chapter one. This chapter is a continuation of the concepts in chapter one.
In this chapter, you will learn how to establish relationships in data. You will also learn how to
represent data on graphs and charts so that it is easy to understand.
This chapter will be treated under the following units:
Unit 1: Meaning and Sources of Data
Unit 2: Concepts of Relationships and Data Representation
Unit 3: Symbol Chart and Graphical Representation of Data
You are now in unit one of chapter two. It is important to note that this unit introduces you to
data. In this unit, you will first be introduced to definition of statistical data and its sources.
You will also learn the types of statistical data and their meanings to help you appreciate the
data that you will need for a particular analysis.
Unit Objectives
By the end of this unit, you should be able to:
Define data
Mention and explain sources of data
Mention and explain types of data
2-1.3 Kinds of Data
There are different kinds of data just like variables. These kinds of data are explained below.
2-1.3.1 Numerical Versus Categorical Data
Numerical data is information that can be measured and is always collected
in number form. An example of numerical data is the number of students
who attended the basic maths class each week last semester. One way to identify
numerical data is to check whether the values can meaningfully be added together. Numerical
data can also be put in ascending or descending order.
Categorical data is data that may be divided into groups or categories.
Examples of categorical data are sex, age group and educational level. Age group and
educational level may also be treated as numerical by using the exact values for age and
the highest class completed. However, it is often more informative to categorize such variables
into a relatively small number of groups for qualitative descriptions.
2-1.3.2 Univariate, Bivariate and Multivariate Data
Univariate data is a collection of information characterised by or depending on only one
random variable. This is a type of data where the focus is on observing only one aspect of the
item of interest at a time. Univariate data does not answer the question of what relationship
exists between variables; rather, it is used to describe one characteristic or attribute that
varies from observation to observation. Examples of univariate data include the heights of
students, the net salaries of lecturers and the ages of workers, among many others.
Bivariate data deals with two variables that can change and be compared to find relationships.
Bivariate data is used when one wants to study whether one variable influences
another, that is, how a dependent variable depends on an independent variable. For
example, one may study the relationship that exists between academic performance (test scores)
and study time (hours spent studying). Study time is the independent variable, which could
influence academic performance, the dependent variable.
Multivariate data refers to data in which description and analysis are based on more than two
variables per observation. Multivariate data is usually used for explanatory purposes, just as
bivariate data is, but here more than two variables are under consideration at
the same time. An example of multivariate data is the data a lecturer collects to study the
relationship that exists among academic performance, study time, monthly allowance of students,
residency status and so on.
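To make the bivariate example concrete, the sketch below computes the Pearson correlation coefficient between study time and test scores. The five pairs of values are hypothetical, and the choice of the Pearson coefficient as the measure of relationship is an assumption made for this illustration.

```python
import math

# Hypothetical bivariate data: one (hours, score) pair per student.
study_hours = [1, 2, 3, 4, 5]       # independent variable
test_scores = [50, 55, 65, 70, 80]  # dependent variable

n = len(study_hours)
mean_hours = sum(study_hours) / n
mean_scores = sum(test_scores) / n

# Sum of cross-products and sums of squares of deviations from the means
sum_cross = sum((h - mean_hours) * (s - mean_scores)
                for h, s in zip(study_hours, test_scores))
sum_sq_hours = sum((h - mean_hours) ** 2 for h in study_hours)
sum_sq_scores = sum((s - mean_scores) ** 2 for s in test_scores)

# Pearson correlation coefficient: always between -1 and +1
correlation = sum_cross / math.sqrt(sum_sq_hours * sum_sq_scores)

print("Correlation between study time and test scores:", round(correlation, 3))
```

A value close to +1 indicates that, in this made-up data set, students who study longer tend to score higher. Looking at test scores alone would be a univariate analysis and could not show this.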
2-1.3.3 Qualitative Versus Quantitative Data
Qualitative observations or data are typically categories, groups or characteristics. Examples
include levels of education (primary, JHS, SHS, VOC/TECH, tertiary, etc.), hair colour, eye
colour and favourite food. Quantitative observations or data on the other hand are numerical
values. Weight of students, shoe sizes, ages of students and workers are examples of
quantitative observations. These two types of data are generally treated differently. This is
because qualitative observations cannot be sorted into a numerical order. For example, suppose
you are analysing the hair colour of a group of ladies. You might take each lady and categorize
her into one of a few groups: black, blonde, brown, gray, etc. The colour black is not superior
or inferior to blonde; it is just different. Quantitative observations, on the other hand, are
meaningful numerical values and so they can be sorted. If you are recording the ages of
students, a student who is 30 years old is older than one who is 25 years old, and that student is
also older than one who is 20 years old, and so on. This mathematical ordering allows us to use the
full resources of arithmetic, algebra, and even calculus to summarize quantitative data.
2-1.3.4 Cross-Sectional, Time Series and Panel Data
Cross-sectional data: A cross-sectional data set consists of a sample of individuals,
households, firms, countries, regions or any other type of unit observed at a specific point in time.
In other words, cross-sectional data are data collected on different elements at the same point in
time or over the same period of time. An example is the expenditure of first year students in the
last semester.
Table 3: Cross-Sectional Data
Schools | School fees for second term in 2017 (Gh¢)
JHS A | 67
JHS B | 80
Millennium Academy JHS | 450
Time series data: A time series data set consists of observations on one or several variables
over time. Time-series data are data collected on the same element for the same variable at
different points in time or for different periods of time. The time period could be hourly,
weekly, monthly, quarterly, yearly and so on. Examples of time series data include fuel prices
collected monthly over a 10-year period and Ghana’s gross domestic product (GDP) over a
period of time.
Table 4: Time-Series Data
Year | Performance level in BECE (%)
2012 | 68
2013 | 80.8
2014 | 79.5
2015 | 90
2016 | 96
Panel data: It is the type of data set that follows a given sample of individuals over time and
thus provides multiple observations of each individual in the sample (Hsiao, 2003, page 2). It
consists of observations on several variables obtained over multiple periods of time for the same
firms or individuals. Panel data is also called longitudinal data. It gives a large data set and hence
it is difficult to analyse (Diggle et al., 2002).
Table 5a: Panel Data for One Individual (John Nyabeni)
Year | Number of goats | Farm Size (acres) | Annual Revenue (Gh¢) | Annual Savings (Gh¢)
2013 | 10 | 5 | 5500 | 1500
2014 | 17 | 5 | 6000 | 1600
2015 | 23 | 6.5 | 7200 | 2200
2016 | 23 | 8 | 7000 | 1900
2017 | 26 | 7.2 | 7600 | 2250
Table 6b: Panel Data for Several Individuals (Baba and Belinda)
Persons | Year | Annual Salary (Gh¢) | Educational Level
Baba | 2015 | 6000 | SHS
Baba | 2016 | 6500 | SHS
Baba | 2017 | 7800 | Teacher Training College
Belinda | 2015 | 8600 | Teacher Training College
Belinda | 2016 | 8950 | Teacher Training College
Belinda | 2017 | 14000 | University
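The three kinds of data differ mainly in what varies: the unit, the time period, or both. The Python sketch below stores the values from Tables 3, 4 and 6b in simple structures to show this. The choice of dictionaries and tuples is only one possible layout and is an assumption of this illustration.

```python
# Cross-sectional data (Table 3): many units, one period (second term, 2017).
cross_section = {"JHS A": 67, "JHS B": 80, "Millennium Academy JHS": 450}

# Time series data (Table 4): one unit, many periods.
time_series = {2012: 68, 2013: 80.8, 2014: 79.5, 2015: 90, 2016: 96}

# Panel data (Table 6b): many units, each observed over many periods.
panel = [
    ("Baba", 2015, 6000),
    ("Baba", 2016, 6500),
    ("Baba", 2017, 7800),
    ("Belinda", 2015, 8600),
    ("Belinda", 2016, 8950),
    ("Belinda", 2017, 14000),
]

# The distinct units and periods show the two dimensions of the panel.
units = sorted({person for person, _, _ in panel})
periods = sorted({year for _, year, _ in panel})

print("Units in the cross-section:", len(cross_section))
print("Periods in the time series:", len(time_series))
print("Panel units:", units, "Panel periods:", periods)
```

In this panel every person is observed in every year, so the number of rows equals the number of units multiplied by the number of periods.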
Unit 2: Concepts of Relationships and Data Representation
Welcome to unit 2 of chapter two. Well done for reaching this far. How has it been so far? I
hope you have understood and appreciated data, sources of data and types of data treated in
unit one of this chapter. This is the second unit of chapter two and you are going to be learning
the concept of data representation and its importance in statistics. Are you set for us to start
unit 2? Ready, go!!!
Unit Objectives
By the end of this unit, you should be able to:
1. Explain relationship in statistics
2. Understand graphical data representation
3. State and explain the importance of pictorial data representation
4. State the principles of effective pictorial data representation
Easy to keep pictures or charts in mind: Pictures or graphs or charts appeal to the
mind faster and help the target audience to easily understand.
Pictorial representation of data helps us identify relationships. One can easily picture
the information presented and be able to identify the relationship between variables.
With pictorial illustration of data, the target audience is able to make comparisons of
variables easily.
Another reason why pictorial data representation is important is that it helps one to
easily understand the information. It speeds up the processing of
information by the mind and hence increases the rate of understanding.
Pictorial representation of data helps one to display a large amount of information in a small
space.
Unit 3: Symbol Chart and Graphical Representation of Data
Splendid, you have done well by sustaining the interest and studying this book up to this unit.
Welcome to unit 3 of this chapter. In unit 2, we learnt the importance of representing data on
graphs, tables or charts. This unit will help you understand how to represent data by using
symbol charts, graphs and tables, among others. It is my fervent hope that you will enjoy the unit.
Unit Objectives
By the end of this unit, you should be able to:
Know the differences between symbol charts and graphical charts.
Know the different types of bar charts and when they are used
Manually draw or construct all forms of bar charts, pie charts, histogram and line graphs
Know the difference between histogram and bar charts
Discuss the uses, strengths and limitations of bar charts, pie charts, histogram and line
graphs
Figure 2: Eggs
2-3.3 Single Column Bar Chart/Graph
It is a bar chart or graph with columns or vertical bars, each of which represents a single
category or group. In figure 3, each subject represents a single category or group.
[Column chart. Vertical axis: Average marks obtained (%), 0 to 100; horizontal axis: Subjects.]
Figure 3: Single Column Bar Chart: Average Marks of Students in Exams
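The idea behind a single bar chart, one bar per category with its length proportional to the value, can be sketched in plain Python. The marks used are the subject averages shown on the bar charts in this section; the text-based drawing and the function name `text_bar_chart` are assumptions made for this illustration, not a method described in this book.

```python
# Average marks (%) per subject, as labelled on the bar charts in this section.
marks = {"English": 78, "Maths": 45, "Science": 55, "Geography": 88, "Economics": 85}


def text_bar_chart(data, scale=2):
    """Return one text line per category; each '#' stands for `scale` marks."""
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * (value // scale)  # bar length is proportional to the value
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines


chart = text_bar_chart(marks)
print("\n".join(chart))
```

Each printed line is one bar, so the subject with the longest bar (Geography) has the highest average mark.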
[Horizontal bar chart. Vertical axis: Subjects; horizontal axis: Average marks obtained (%), 0 to 100. Bar labels: Economics 85, Geography 88, Science 55, Maths 45, English 78.]
[Column chart with two series, Males and Females. Vertical axis: Average marks obtained (%), 0 to 100; horizontal axis: Subjects.]
Figure 5: Multiple or Compound Column Bar Chart of Average Marks of Students in Exams
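[Horizontal bar chart with two bars per subject. Vertical axis: Subjects; horizontal axis: 0 to 100. Bar labels: Economics 85 and 85; Geography 95 and 88; Science 65 and 55; Maths 40 and 45; English 80 and 78.]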
[Stacked column chart with two series, Males and Females. Vertical axis: Average marks obtained (%), 0 to 200; horizontal axis: Subjects (English, Maths, Science, Geography, Economics).]
Figure 7: Component or Stacked Column Bar Chart of Average Marks of Students in Exams
[Horizontal stacked bar chart by subject. Segment labels: Economics 85 and 85; Geography 88 and 95; Science 55 and 65; Maths 45 and 40; English 78 and 80.]
[Line chart. Vertical axis: Enrolment, 0 to 300; horizontal axis: Years, 2011 to 2017.]
Figure 9: Single or Simple Line Chart of School Enrolment from 2011 to 2017
Table 8: School Enrolment for SHS A and SHS B from 2011 to 2017
Year | Enrolment in SHS A | Enrolment in SHS B
2011 | 80 | 85
2012 | 178 | 150
2013 | 160 | 170
2014 | 200 | 140
2015 | 225 | 160
2016 | 210 | 250
2017 | 240 | 275
[Line chart with two series, Enrolment in SHS A and Enrolment in SHS B. Vertical axis: School Enrolment, 0 to 300; horizontal axis: Years.]
Figure 10: Multiple or Compound Line Graph of School Enrolment for SHS A and SHS B from 2011 to 2017
[Line chart with two series, Sales of Company A and Sales of Company B. Vertical axis: Sales (Gh¢), 0 to 600000; horizontal axis: Years, 1 to 7.]
Figure 11: Component Line Chart of Annual Sales of Companies from 2011 to 2017
2-3.13 Histogram
It is a graphical representation of information made up of rectangles whose heights
indicate the frequency of the variable and whose widths indicate the class interval. The histogram
was first introduced by Karl Pearson. It is used for continuous data but not discrete data.
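Before a histogram can be drawn, the continuous data must be grouped into class intervals and the frequency of each class counted. The Python sketch below shows this step for hypothetical heights of twelve students; the class width of 10 cm and the lower limit of 150 cm are assumptions chosen for this illustration.

```python
# Hypothetical heights (cm) of twelve students; invented for illustration only.
heights = [151.2, 154.8, 158.5, 160.1, 162.3, 163.7,
           165.0, 167.4, 168.9, 171.6, 174.2, 178.8]

class_width = 10   # width of each rectangle (class interval)
lower_limit = 150  # lower boundary of the first class

# Count how many observations fall in each class: the height of each rectangle.
frequencies = {}
for height in heights:
    class_index = int((height - lower_limit) // class_width)
    class_start = lower_limit + class_index * class_width
    class_label = f"{class_start}-{class_start + class_width}"
    frequencies[class_label] = frequencies.get(class_label, 0) + 1

for class_label, frequency in frequencies.items():
    print(class_label, "#" * frequency, frequency)
```

Each class here includes its lower boundary but not its upper boundary, so a height of exactly 160 cm would be counted in the 160-170 class.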
Table 10: Population of Tribes of Students in Level 100
Tribes | Population
Fante | 300
Dagomba | 180
Dagati | 250
Ewe | 370
Akyem | 320
Others | 150
[Pie chart. Slices: Fante 19%, Dagomba 11%, Dagati 16%, Ewe 24%, Akyem 20%, Others 10%.]
Figure 13: Pie Chart Showing Percentage of Tribes of Students in Level 100
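The percentages on the pie chart follow directly from Table 10: each tribe's population is divided by the total and multiplied by 100, and the angle of each slice is the same share of 360 degrees. A minimal Python sketch of this arithmetic, using the figures in Table 10, is given below; the variable names are chosen only for this illustration.

```python
# Population of each tribe, taken from Table 10.
population = {"Fante": 300, "Dagomba": 180, "Dagati": 250,
              "Ewe": 370, "Akyem": 320, "Others": 150}

total = sum(population.values())

# Percentage share of each tribe, rounded to the nearest whole number.
percentages = {tribe: round(100 * count / total)
               for tribe, count in population.items()}

# Angle of each slice in degrees: the same share applied to a full circle.
angles = {tribe: 360 * count / total for tribe, count in population.items()}

for tribe in population:
    print(tribe, f"{percentages[tribe]}%", f"{angles[tribe]:.1f} degrees")
```

The slice angles always add up to 360 degrees, which is a useful check when constructing a pie chart by hand.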
Pillars of Planting for Food and Jobs | Farmers | AEAs | Researchers
Seeds | 84 | 71 | 67
Fertilizer | 100 | 79 | 17
Agricultural extension services | 51 | 100 | 0
Establishment of markets | 32 | 8 | 17
E-agriculture | 24 | 33 | 33
4. Using the table below, draw the following graphs using excel software:
(a) component or stacked column bar chart
(b) multiple line chart
(c) single line chart for only SHS B
(d) histogram for only SHS A
CHAPTER THREE: SUMMARISING AND DESCRIBING DATA: STATISTICAL
MEASURES
Introductory Remarks
It is important to note that presenting data is an important aspect of descriptive statistics. The
limitation is that it does not tell the whole story about the data. As a good data analyst,
you need to compute and summarise key features of statistical data. The main objective of this
chapter is to help you understand how to summarise, describe and interpret key features of
statistical data.
This chapter will be treated in five units, namely:
Unit 1: Introduction to Statistical Measures
Unit 2: Measure of Central Tendency
Unit 3: Measure of Dispersion/Variation
Unit 4: Measure of Position
Unit 5: Measure of Shape
Unit 1: Introduction to Statistical Measures
You are moving gradually. In the preceding chapters, you learnt the meaning of statistics and
the various ways of presenting statistical data. Congratulations for achieving this milestone in
the study of statistics.
Unit Objectives
By the end of this unit, you should be able to:
1. Mention the various statistical measures
2. Define the various types of statistical measures
3. Know when each statistical measure is used
Figure 15: Measures of central tendency
Source: Business Statistics, A First Course (4e) © 2006 Prentice-Hall, Inc.
about the central point. They include range, interquartile range, variance, standard deviation,
mean deviation, coefficient of variation etc.
Quartiles divide a rank-ordered data set into four equal parts. The values that divide each part
are called the first, second and third quartiles; and they are denoted by Q1, Q2 and Q3
respectively. Note:
Standard scores (z-scores) indicate how many standard deviations an element is from the
mean. The standard score can be calculated from the following formula:
$$z = \frac{X - \mu}{\sigma}$$
where $z$ is the z-score, $X$ is the value of the element, $\mu$ is the mean of the population and
$\sigma$ is the standard deviation. A z-score of 1 represents an element that is 1 standard deviation
greater than the mean, and so on.
3-1.6 Measure of Shape
Measures of shape describe the distribution (or pattern) of the data within a data set. This is
used to summarize data from continuous measurement scales with statistics used to describe
how the distribution rises or drops. This can be described as a logical order for quantitative
data but cannot be described as such for qualitative data. The distributions can either be
symmetrical or asymmetrical. Normal distributions represent symmetrical and skewed
distributions represent asymmetrical.
Symmetric distribution: refers to a distribution that has the same shape on both sides of the
centre. The normal distribution is the best-known symmetric distribution with only one peak.
Skewness: refers to the degree of asymmetry in a distribution. Asymmetry often reflects
extreme scores in a distribution. Positively skewed distributions are distributions with the
mean greater than the median. This means that the mean is sensitive to each score in the
distribution and is subject to large shifts when the sample is small and contains extreme scores.
Negatively skewed distributions have their mean smaller than median. This has an extended
tail pointing to the left and reflects clustering of the numbers in the upper part of the distribution
with fewer scores at the lower end of the measurement scale.
Kurtosis: refers to how scores are concentrated in the centre of a distribution, the upper and
lower tails (ends), and between the centre and tails (shoulders) of the distribution. Kurtosis
takes three forms: mesokurtic, platykurtic and leptokurtic. A mesokurtic distribution has the
same kurtosis as a normal distribution. A platykurtic distribution looks like a mesokurtic one
but is flatter, with scores moved from both the centre and the tails into the shoulders. A
leptokurtic distribution moves scores from the shoulders of a mesokurtic distribution into the
centre and tails, which results in a peaked distribution with thick tails.
Unit 2: Measure of Central Tendency
You are welcome to unit two of this chapter. In unit one, we were introduced to statistical
measures and their importance. In this unit, we shall look at one of the statistical measures,
namely the measure of central tendency or location. We are going to learn how to calculate a
single value that describes the point about which the various observed values group or cluster.
Try as much as possible to study the unit and solve the questions.
Unit Objectives
By the end of this unit, you should be able to:
Define arithmetic mean, weighted mean, mode and median.
Calculate arithmetic mean, weighted mean, mode and median from grouped and ungrouped
data.
Describe the properties or features of the arithmetic mean, mode and median
State the advantages and disadvantages of arithmetic mean, mode and median
3-2.1 Introduction to Measure of Central Tendency
As noted in the preceding unit, data tend to group or cluster about certain central points. In
order to describe a data set, one can find a single value which is representative of the whole
data set. This single value, which describes the point about which the various observed values
group or cluster, is called a measure of central tendency or location.
3-2.2 Mean
It is the average of the observations or data. Note that we have different types of means
namely arithmetic mean, weighted mean, geometric mean, harmonic mean.
3-2.2.1 Arithmetic Means of Ungrouped Data
The arithmetic mean of ungrouped data is the sum of a set of observations (either positive,
negative or zero), divided by the number of observations. The arithmetic mean is the most
commonly used measure of central tendency.
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Note that $\frac{\sum_{i=1}^{n} X_i}{n}$ can simply be written as $\frac{\sum X}{n}$, since this course is
fundamental and not meant for students studying statistics or economics.
In statistics,
$\bar{X}$ = sample arithmetic mean
$n$ = sample size
$X_i$ = ith observation of the random variable X
Example 1
Given that the ages of six children in a hospital are 1, 3, 5, 7, 9 and 5; what is the
mean age?
Solution
$$\bar{x} = \frac{1 + 3 + 5 + 7 + 9 + 5}{6} = \frac{30}{6} = 5 \text{ years}$$
Example 2
Find the mean of the following set of data:
(a) Ages in years: 15, 21, 17, 26, 18, 29
(b) Daily sales of ice water vendors: Gh¢42, Gh¢52, Gh¢57, Gh¢63, Gh¢51
(c) Weight of patients in hospital: 90, 60, 40, 70, 50, 20, 100, 100, 50, 50, 50 and 80
in Kg
Solution
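The formula for calculating the mean of ungrouped data is
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum X}{n}$$
(a) Mean of ages: $\bar{X} = \frac{15 + 21 + 17 + 26 + 18 + 29}{6} = \frac{126}{6} = 21$ years
(b) Mean of daily sales: $\bar{X} = \frac{42 + 52 + 57 + 63 + 51}{5} = \frac{265}{5}$ = Gh¢53
(c) Mean weight of patients: $\bar{X} = \frac{90 + 60 + 40 + 70 + 50 + 20 + 100 + 100 + 50 + 50 + 50 + 80}{12} = \frac{760}{12} \approx 63.33$ kg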
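For grouped data, the arithmetic mean is calculated as
$$\bar{X} = \frac{\sum f_i X_i}{\sum f_i}$$
where $f_i$ is the frequency of class i and $X_i$ is the midpoint of class i.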
Example 3
Find the average marks obtained by 43 students in Mathematics Examination
Class Frequency (f)
20-29 3
30-39 5
40-49 20
50-59 10
60-69 5
Solution
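Class    Frequency (f)   Class midpoint (X)   fX
20-29    3               24.5                 73.5
30-39    5               34.5                 172.5
40-49    20              44.5                 890
50-59    10              54.5                 545
60-69    5               64.5                 322.5
Total    ∑f = 43                              ∑fX = 2003.5
Average mark:
$$\bar{X} = \frac{\sum f_i X_i}{\sum f_i} = \frac{2003.5}{43} = 46.59$$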
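The grouped-mean calculation in Example 3 can be checked with a short Python sketch. The function name `grouped_mean` and the representation of each class as a (lower limit, upper limit) pair are assumptions made for illustration; the midpoint of each class is taken as the average of its two limits, as in the solution table.

```python
def grouped_mean(class_limits, frequencies):
    """Mean of grouped data: sum(f * midpoint) / sum(f)."""
    # Midpoint of each class = (lower limit + upper limit) / 2
    midpoints = [(low + high) / 2 for low, high in class_limits]
    total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    return total_fx / sum(frequencies)


# Data from Example 3 (marks of 43 students)
classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
freqs = [3, 5, 20, 10, 5]

print(round(grouped_mean(classes, freqs), 2))  # 46.59
```

Because the individual marks inside each class are not known, the result is an estimate of the mean rather than its exact value.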
The formula for calculating the arithmetic mean when a probability distribution is given is:
$$\bar{X} = X_1 P(X_1) + X_2 P(X_2) + X_3 P(X_3) + \dots + X_n P(X_n) = \sum_{i=1}^{n} X_i P(X_i)$$
Example 4
Given the probability distribution of the number of bags of maize obtained by farmers per
acre in Nkoranza, find the mean number of bags of maize per acre.
X 4 5 6 9 12
P(X) 0.1 0.2 0.2 0.4 0.1
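Solution
$$\bar{X} = X_1 P(X_1) + X_2 P(X_2) + X_3 P(X_3) + \dots + X_n P(X_n) = \sum_{i=1}^{n} X_i P(X_i)$$
$$\bar{X} = 4(0.1) + 5(0.2) + 6(0.2) + 9(0.4) + 12(0.1) = 0.4 + 1.0 + 1.2 + 3.6 + 1.2 = 7.4 \text{ bags per acre}$$
The weighted arithmetic mean, in which each observation $X_i$ carries a weight $w_i$, is given by:
$$\bar{X} = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$$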
Example 5
The table below shows the marks and grades obtained by a student in an examination in the
university. Use the information to calculate the grade point average (GPA) or the weighted
arithmetic mean.
Courses             Marks   Credit Hours   Grades   Grade Point
Human Anatomy       82      3              A+       5
Biochemistry        46      2              D+       1.5
Physics I           70      3              A        4.5
Basic Mathematics   80      2              A+       5
Botany I            62      1              B        3
Zoology I           53      3              C        2
Solution
$$\text{GPA} = \bar{X} = \frac{\sum w_i X_i}{\sum w_i}$$
Note that the credit hours are the weights and the grade points are the observations.
Courses             Credit Hours (wi)   Grade Point (Xi)   wiXi
Human Anatomy       3                   5                  3×5 = 15
Biochemistry        2                   1.5                2×1.5 = 3
Physics I           3                   4.5                3×4.5 = 13.5
Basic Mathematics   2                   5                  2×5 = 10
Botany I            1                   3                  1×3 = 3
Zoology I           3                   2                  3×2 = 6
Total               ∑wi = 14                               ∑wiXi = 50.5
$$\text{GPA} = \bar{X} = \frac{50.5}{14} \approx 3.61$$
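As a cross-check of the weighted mean, here is a minimal Python sketch using the credit hours and grade points from the table (the credit hours add up to 3 + 2 + 3 + 2 + 1 + 3 = 14). The function name `weighted_mean` is an assumption made for illustration.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Grade points (observations) and credit hours (weights) from the table
grade_points = [5, 1.5, 4.5, 5, 3, 2]
credit_hours = [3, 2, 3, 2, 1, 3]

gpa = weighted_mean(grade_points, credit_hours)
print(sum(credit_hours))  # 14
print(round(gpa, 2))      # 3.61
```

Courses with more credit hours pull the GPA more strongly towards their grade point, which is the whole purpose of weighting.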
3-2.3 Median
It is the middle value of observations which are arranged in either ascending or descending
order. It is the midpoint of an arranged (ordered from smallest to largest or largest to the
smallest) data. It is the same as the 50th percentile or the second quartile.
3-2.3.1 Median for Ungrouped Data
For an odd number of observations, the median is obtained by using the formula:
$$M_e = X_{\frac{1}{2}(n+1)}$$
For an even number of observations, the median is obtained by using the formula:
$$M_e = \frac{1}{2}\left(X_{\frac{n}{2}} + X_{\frac{n}{2}+1}\right)$$
Example 6
Find the median of
(a) 12, 23, 14, 5, 7, 80, 39, 10, 7
(b) 42, 52, 57, 63, 51, 75, 45, 20, 60, 15
Solution
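(a) Arrange the observations in ascending order: 5, 7, 7, 10, 12, 14, 23, 39, 80
Number of observations, n, is 9 and since it is an odd number, the median is calculated
as:
$$M_e = X_{\frac{1}{2}(9+1)} = X_5$$
From the arranged data, $X_5$, which is the 5th observation, is 12. Therefore $M_e = 12$.
(b) Arrange the observations in ascending order: 15, 20, 42, 45, 51, 52, 57, 60, 63, 75
Number of observations, n, is 10 and since it is an even number, the median is calculated
as:
$$M_e = \frac{1}{2}\left(X_{\frac{n}{2}} + X_{\frac{n}{2}+1}\right) = \frac{1}{2}\left(X_{\frac{10}{2}} + X_{\frac{10}{2}+1}\right) = \frac{1}{2}\left(X_5 + X_6\right)$$
From the arranged data, $X_5$, which is the 5th observation, is 51 and $X_6$, which is the 6th
observation, is 52.
$$M_e = \frac{1}{2}\left(X_5 + X_6\right) = \frac{1}{2}(51 + 52) = 51.5$$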
3-2.3.2 Median for Grouped Data
For grouped data, the formula used in finding the median is given as:
$$M_e = L_0 + \frac{h}{f_0}\left(\frac{n}{2} - F\right)$$
Example 7
The table below shows the distribution of ages of 60 students in a diploma class. Find the
median age.
Age in years Number of students
15-19 6
20-24 5
25-29 12
30-34 22
35-39 7
40-44 8
Solution
Class boundaries   Number of students   Cumulative number of students
14.5-19.5          6                    6
19.5-24.5          5                    11
24.5-29.5          12                   23
29.5-34.5          22                   45
34.5-39.5          7                    52
39.5-44.5          8                    60
Median position $= \left(\frac{n}{2}\right)^{th} = \left(\frac{60}{2}\right)^{th} = 30^{th}$ position
Median class is 30-34 and median class boundary is 29.5-34.5
Lower class boundary of the median class = $L_0$ = 29.5
Frequency of the median class = $f_0$ = 22
Cumulative frequency of the pre-median class = $F$ = 23
Width of the median class = $h$ = 34.5 – 29.5 = 5
$$M_e = L_0 + \frac{h}{f_0}\left(\frac{n}{2} - F\right)$$
$$M_e = 29.5 + \frac{5}{22}\left(\frac{60}{2} - 23\right) = 31.09 \text{ years}$$
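The steps of Example 7 (accumulate frequencies, locate the median class, then interpolate inside it) can be sketched in Python as follows. The function name `grouped_median` and the representation of classes as (lower boundary, upper boundary) pairs are assumptions made for illustration.

```python
def grouped_median(boundaries, frequencies):
    """Median of grouped data: L0 + (h / f0) * (n/2 - F)."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequency table is empty")
    half = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, frequencies):
        if f > 0 and cumulative + f >= half:
            # This is the median class; interpolate within it.
            return lower + (upper - lower) / f * (half - cumulative)
        cumulative += f
    raise ValueError("median class not found")


# Class boundaries and frequencies from Example 7
bounds = [(14.5, 19.5), (19.5, 24.5), (24.5, 29.5),
          (29.5, 34.5), (34.5, 39.5), (39.5, 44.5)]
freqs = [6, 5, 12, 22, 7, 8]

print(round(grouped_median(bounds, freqs), 2))  # 31.09
```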
3-2.4 Mode
It is the observation with the highest frequency. It can also be defined as the most occurring
observation or value.
Value that occurs most often
Not affected by extreme values
There may not be a mode
There may be several modes
Used for either numerical or categorical data
3-2.4.1 Mode of Ungrouped Data
Example 8
The marks obtained in a class test by class three pupils are 3, 10, 4, 6, 1, 6, 2, 5, 8, 6, 6, 8.
Find the mode.
Solution
The mode is the most occurring mark = 6
Example 9
The ages of students in SHS 3 are shown in the table below. Find the mode.
Ages Frequency
15 2
16 3
17 1
18 12
19 3
20 5
Solution
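From the table, the age with the highest frequency is 18 years, and hence the mode is 18 years.
3-2.4.2 Mode of Grouped Data
For grouped data, the modal value cannot be read off directly. It is estimated from the modal
class (the class with the highest frequency) using the formula:
$$M_0 = L_1 + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) h$$
where $L_1$ is the lower class boundary of the modal class, $\Delta_1$ is the difference between the
frequency of the modal class and that of the class before it, $\Delta_2$ is the difference between the
frequency of the modal class and that of the class after it, and $h$ is the width of the modal class.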
The total marks for a class test for pupils in Basic Six are 25. If the marks obtained by the
pupils are shown in the frequency table below, find the modal mark.
Solution
Marks    Midpoint (x)   Frequency (f)   fx
0-4      2              6               12
5-9      7              12              84
10-14    12             7               84
15-19    17             5               85
20-24    22             0               0
$$M_0 = L_1 + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) h$$
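A minimal Python sketch of the grouped-mode formula applied to this table is given below. It assumes the usual class boundaries for whole-number marks (0-4 becomes −0.5 to 4.5, and so on), takes Δ1 and Δ2 as the differences between the modal class frequency and the frequencies of the classes before and after it, and uses the hypothetical function name `grouped_mode`.

```python
def grouped_mode(boundaries, frequencies):
    """Mode of grouped data: L1 + d1 / (d1 + d2) * h."""
    # Index of the modal class (highest frequency; first one if tied)
    k = max(range(len(frequencies)), key=lambda i: frequencies[i])
    lower, upper = boundaries[k]
    before = frequencies[k - 1] if k > 0 else 0
    after = frequencies[k + 1] if k < len(frequencies) - 1 else 0
    d1 = frequencies[k] - before  # difference with the preceding class
    d2 = frequencies[k] - after   # difference with the following class
    return lower + d1 / (d1 + d2) * (upper - lower)


# Assumed class boundaries for the marks table, with its frequencies
bounds = [(-0.5, 4.5), (4.5, 9.5), (9.5, 14.5), (14.5, 19.5), (19.5, 24.5)]
freqs = [6, 12, 7, 5, 0]

print(round(grouped_mode(bounds, freqs), 2))  # 7.23
```

Here the modal class is 5-9, so the sketch evaluates 4.5 + (6 / (6 + 5)) × 5, which is about 7.23 marks.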
$$\text{Midrange} = \frac{150 + 78}{2} = 114$$
Self-Assessment Test 9
1. The following is the probability distribution of the number of phone calls received by
an office between 8 am and 9 am on a day. Find the mean number of phone calls.
X 1 2 3 4 5
2. The table below shows the distribution of test scores. Find the median score.
Test score Number of students
31-40 6
41-50 7
51-60 20
61-70 14
71-80 10
81-90 8
91-100 5
6. Find the median of 60, 62, 55, 75, 90, 60, 70
7. The table below shows the daily expenditure of households. Use the information to
calculate
(a) Arithmetic mean
(b) Median
(c) Mode
(d) Midrange
11. The average of five numbers is 26. If four of the numbers are -12, 90, -26, 10, what is
the fifth number? (Stewart, 2009). Note that the answer is 68.
12. If the average of six consecutive multiples of 4 is 22, what is the greatest of these
integers? (Stewart, 2009). Note that the answer is 32.
13. The marks obtained by Eunice Damba in the various courses are
Subjects Grade Credit hours Grade points
(w) (x)
Basic Mathematics A 3 4.00
Introduction to Rural Sociology A– 3 3.75
Communication Skills C+ 3 2.00
Development Communication Theory B 1 3.00
International Communication D 2 1.00
Human Development and its Evolution B– 3 2.50
(i) Calculate the weighted arithmetic mean (grade point average) of the student
(ii) Using the table below, what is the class of the student?
GPA            Class
0 – 1.49       Fail
1.50 – 1.99    Pass
2.00 – 2.49    Third class
2.50 – 3.24    2nd class lower
3.25 – 3.59    2nd class upper
3.60 – 4.00    First class
Unit 3: Measures of Dispersion/Variation
Bravo for your tenacity. This unit is a continuation of unit 2 in the chapter. Like unit 2, this
unit looks at a statistical measure, this time the measure of dispersion. If you understood
unit 2 very well, then this unit will be easy for you.
Unit Objectives
By the end of this unit, you should be able to:
Explain range, variance, standard deviation, mean deviation, inter-quartile range and semi
interquartile range each.
Do calculations on range, variance, standard deviation, mean deviation, inter-quartile and
semi interquartile range
Describe the properties or features of range, variance, standard deviation, mean deviation,
inter-quartile range and semi interquartile range
3-3.1 Introduction
In addition to measures of central tendency or location, it is desirable to consider measures of
variability or dispersion. The measure of variation is used to describe the distribution of data.
For instance, the mean age of students in agribusiness level hundred might be 20, but this does
not say whether they are all around the same age or their ages range from 16 – 30. For this, the
level of dispersion/variation is often important. As noted earlier, dispersion or variation in
statistics is the measure of the level or degree of spread of data about an average value
(Spiegel and Stephens, 2008). The measures of dispersion are range, mean deviation,
interquartile range, semi-interquartile range, 10–90 percentile range, standard deviation and
variance.
3-3.2 Range
It is the simplest measure of dispersion, which is the difference between the largest and smallest
values in a set of data.
Range (R)
Range = largest value – smallest value
Although the range is the easiest measure of variation to compute, it is seldom used as the
only measure. This is because the range is based on only two of the observations, and thus is
highly influenced by extreme values or outliers. In the example above, one worker receives as
much as GHȼ3000 monthly while another receives as little as GHȼ500. In this case the range
is GHȼ2500, which is not a good description of the variation in the data. Thus, one or two
extreme values can affect the range, making it abnormally wide. In another instance, in a shop,
one might ask (about watches): ‘What price range do you have?’ The answer might be: ‘They
range from GHȼ2 to GHȼ200’. This simple statement gives the dispersion or spread for this
commodity. A follow-up question might be: ‘What is the average price?’ After all, the fact that
the watches range from GHȼ2 to GHȼ200 does not give any indication of whether most of
them are priced close to GHȼ2 or close to GHȼ200.
An obvious way of reducing the influence of extreme values is to ignore values that are far
from the centre. This can be done using quartiles.
3-3.3 Interquartile Range (IQR)
To understand the concept of interquartile range, we need to know some basic things about
quartiles. When data is sorted into ascending order, quartiles are defined as the values that
divide a set of values into four equal sizes. Interquartile range is therefore a measure of
dispersion that overcomes the dependency on extreme values. In other words, the IQR is the
difference between the third quartile (Q3), and the first quartile (Q1).
Example 2:
Suppose the monthly starting wages of university graduates are as shown in Table 11. Find the
interquartile range for the data set.
Table 11: Monthly starting wages
Graduate Monthly wage Graduate Monthly wage
1 40 11 89
2 42 12 93
3 43 13 97
4 48 14 100
5 50 15 110
6 60 16 140
7 62 17 200
8 65 18 210
9 71 19 212
10 80 20 220
Solution
The wages of graduates 5, 10 and 15 on this list divide the distribution into four equal parts
(1-5, 6-10, 11-15, and 16-20). The position and value of the quartiles are:
$$Q_1 \text{ position} = \left(\frac{n}{4}\right)^{th} = \left(\frac{20}{4}\right)^{th} = 5^{th} \text{ position}$$
This is the position of the lower quartile, and the wage of the fifth graduate is the lower quartile
wage, i.e. GHȼ50.
$$Q_3 \text{ position} = \left(\frac{3n}{4}\right)^{th} = \left(\frac{60}{4}\right)^{th} = 15^{th} \text{ position}$$
This is the position of the upper quartile, and the wage of the fifteenth graduate is the upper
quartile wage, i.e. GHȼ110.
Therefore:
$$IQR = Q_3 - Q_1$$
$$IQR = 110 - 50$$
$$IQR = \text{GHȼ}60$$
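The range and the interquartile range for the wage data can be compared with a small Python sketch. It follows the position rule used in this example (Q1 at the n/4-th value and Q3 at the 3n/4-th value of the sorted data) and therefore assumes that n is a multiple of 4; the function name `iqr_by_position` is hypothetical.

```python
def iqr_by_position(values):
    """Return (Q1, Q3, IQR) using the n/4-th and 3n/4-th sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0 or n % 4 != 0:
        raise ValueError("this simple rule assumes n is a positive multiple of 4")
    q1 = ordered[n // 4 - 1]      # positions are 1-based, list indexes 0-based
    q3 = ordered[3 * n // 4 - 1]
    return q1, q3, q3 - q1


# Monthly starting wages of the 20 graduates
wages = [40, 42, 43, 48, 50, 60, 62, 65, 71, 80,
         89, 93, 97, 100, 110, 140, 200, 210, 212, 220]

data_range = max(wages) - min(wages)
q1, q3, iqr = iqr_by_position(wages)
print(data_range)   # 180
print(q1, q3, iqr)  # 50 110 60
```

The range (180) is driven by the few very high wages, while the interquartile range (60) describes the spread of the middle half of the graduates.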
3-3.4 Variance
The variance is a measure of variability that utilizes all the data. The variance is based on the
difference between the value of each observation (Xi) and the mean ( X ). The difference
between each observation (Xi) and the mean is called a deviation about the mean.
For a sample, a deviation about the mean is written (Xi - x ); for a population, it is written
(Xi - μ). In the computation of the variance, the deviations about the mean are squared and
divided by the number of observations. The variance is defined as the average of the square of
the deviations. It can also be defined as the square of the standard deviation.
If the data are for a population, the average of the squared deviations is called the population
variance. The population variance is denoted by the Greek symbol $\sigma^2$. For a population of N
observations and with $\mu$ denoting the population mean, the formula for the
population variance of ungrouped data is as follows.
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
In most statistical applications, the data being analysed are for a sample. When we compute a
sample variance, we are often interested in using it to estimate the population variance $\sigma^2$. It
can be shown that if the sum of the squared deviations about the sample mean is divided by
$(n - 1)$ rather than by $n$, the resulting sample variance provides an unbiased estimate of the
population variance. The choice of divisor matters most for small samples and becomes
negligible as n grows. The proof is beyond the scope of this course.
3-3.4.1 Variance for Ungrouped Data
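For ungrouped data, the sample variance, denoted by $s^2$, is expressed as:
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$
This form is used when the data are a sample and the aim is to estimate the population variance.
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n}$$
This form applies the population formula to the data at hand. It is appropriate when the data
make up the whole population; for large samples the two forms give almost the same value.
Why do we subtract one from the sample size when calculating the sample variance? Because
the deviations are measured from the sample mean rather than from the unknown population
mean, they are on average slightly too small, and dividing by n − 1 instead of n compensates
for this. Variance cannot be a negative number. Variance will be zero only if all the observations
are the same.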
Example 3:
Find the sample variance and the population variance for the following set of data;
5, 6, 10, 14, 20
Solution
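First find the arithmetic mean for the data, thus
$$\bar{X} = \frac{5 + 6 + 10 + 14 + 20}{5} = \frac{55}{5} = 11$$
Next find the various deviations from the arithmetic mean, that is $X_i - \bar{X}$, and square them:
$$\sum (X_i - \bar{X})^2 = (-6)^2 + (-5)^2 + (-1)^2 + 3^2 + 9^2 = 36 + 25 + 1 + 9 + 81 = 152$$
Sample variance:
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{152}{5 - 1} = \frac{152}{4} = 38$$
Population variance:
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{152}{5} = 30.4$$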
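The two variances in Example 3 differ only in the divisor, which the following Python sketch makes explicit. The function name `variance` and its `sample` flag are assumptions made for illustration.

```python
def variance(values, sample=True):
    """Variance of ungrouped data.

    sample=True divides by n - 1 (sample variance);
    sample=False divides by n (population variance).
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


data = [5, 6, 10, 14, 20]
print(variance(data, sample=True))             # 38.0
print(round(variance(data, sample=False), 1))  # 30.4
```

Python's standard library offers the same two calculations as `statistics.variance` and `statistics.pvariance`.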
Example 4
Find the variance of the data set given. 2, 3, 7, 8, 10.
Solution
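Again, calculation starts by finding the mean of the observations, which is
$$\text{Mean } (\bar{X}) = \frac{2 + 3 + 7 + 8 + 10}{5} = \frac{30}{5} = 6$$
The variance is the sum of the squared deviations from the mean divided by the number of
observations, which is given as:
$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n} = \frac{(2-6)^2 + (3-6)^2 + (7-6)^2 + (8-6)^2 + (10-6)^2}{5} = \frac{46}{5} = 9.2$$
For grouped data, the variance is given by:
$$S^2 = \frac{\sum f (X - \bar{X})^2}{\sum f} = \frac{\sum f (X - \bar{X})^2}{n}$$
Where
$X$ = midpoint of a class,
$f$ = number of values in the class,
$\bar{X}$ = mean value,
$n = \sum f$ = total number of observations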
Short Methods for Calculating Variance
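For grouped data:
$$S^2 = \frac{\sum_{i} f_i X_i^2}{n} - \left(\frac{\sum_{i} f_i X_i}{n}\right)^2 = \frac{\sum fX^2}{n} - \left(\frac{\sum fX}{n}\right)^2$$
For ungrouped data:
$$S^2 = \frac{\sum_{i=1}^{n} X_i^2}{n} - \left(\frac{\sum_{i=1}^{n} X_i}{n}\right)^2 = \frac{\sum X^2}{n} - \bar{X}^2$$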
Example 5
Consider the table showing monthly visits to the school clinic by 32 workers in a college. The
data are grouped into class intervals.
Class interval   Midpoint (X)   Frequency (f)   Product (fX)   (X − X̄)²   f(X − X̄)²
0 – 4.9          2.5            3               7.4            169         507
Total                           ∑f = 32         ∑fX = 494.9                ∑f(X − X̄)² = 2154.25
$\bar{X}$ = 15.5
$$\text{Variance} = \frac{\sum f (X - \bar{X})^2}{n} = \frac{2154.25}{32} = 67.3$$
The standard deviation for grouped data is obtained by taking the square root of the variance.
Thus, we have
$$S = \sqrt{\frac{\sum f (X - \bar{X})^2}{n}} = \sqrt{\frac{2154.25}{32}} = \sqrt{67.3} \approx 8.20$$
3-3.5 Standard Deviation
The use of variance as a measure of variability or dispersion has drawbacks. A small variance
implies a small variation, and variance is good for comparing two or more data sets. It is,
however, not a good measure of dispersion for a single set of data, because the squaring
operation changes the unit of measurement. In order to return to the original unit, the standard
deviation is preferred. It is obtained by taking the square root of the variance.
The standard deviation is a measure of dispersion which uses all the values in a distribution in
the sense that every value contributes to the final result in the same way that every value
contributes to the calculation of the arithmetic mean. It is the standard measure of dispersion
for two reasons; standardization of all values of n as well as its usefulness both practically and
mathematically. Since the standard deviation is used as a measurement, only the positive roots
of the variance are taken as a measure of variability.
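The standard deviation is the square root of the variance. Thus,
$$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}}$$
when the data are treated as the whole population, and
$$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$
when the data are a sample used to estimate the population standard deviation.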
Short Methods for Computing Standard Deviation
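For ungrouped data:
$$S = \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n} - \left(\frac{\sum_{i=1}^{n} X_i}{n}\right)^2} = \sqrt{\frac{\sum X^2}{n} - \bar{X}^2}$$
For grouped data:
$$S = \sqrt{\frac{\sum_{i} f_i X_i^2}{n} - \left(\frac{\sum_{i} f_i X_i}{n}\right)^2} = \sqrt{\frac{\sum fX^2}{n} - \left(\frac{\sum fX}{n}\right)^2}$$
so that
$$S^2 = \frac{\sum_{i} f_i X_i^2}{n} - \left(\frac{\sum_{i} f_i X_i}{n}\right)^2 = \frac{\sum fX^2}{n} - \left(\frac{\sum fX}{n}\right)^2$$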
3-3.6 Mean deviation MD
The deviation is the difference between a value and the mean, that is, the distance that each
value is away from the mean.
Deviation = Value – mean value
$$d = X_i - \bar{X}$$
Each value has a deviation, so the mean of these deviations might be expected to give a measure
of spread. Unfortunately, the mean deviation has the major disadvantage of allowing positive
and negative deviations to cancel. If we have the three values 3, 4 and 8, the mean is 5 and the
mean deviation is given by the formula:
$$MD = \frac{\sum (X_i - \bar{X})}{n}$$
Then
$$MD = \frac{(3 - 5) + (4 - 5) + (8 - 5)}{3} = 0$$
Every data set has a mean deviation of zero, because the deviations about the mean always sum
to zero, which is why this measure is never used. A more useful alternative is the mean absolute
deviation (MAD). MAD simply takes the absolute values of deviations. In other words, it
ignores negative signs and adds all deviations as if they are positive. The result is a measure of
the mean distance of observations from the mean, so the larger the mean absolute deviation,
the more dispersed the data.
Mean Absolute Deviation (MAD)
$$MAD = \frac{\sum ABS(X_i - \bar{X})}{n} = \frac{\sum |X_i - \bar{X}|}{n}$$
$\bar{X}$ = mean value
$n$ = number of observations
$ABS(X - \bar{X})$ = the absolute value of $X - \bar{X}$ (that is, ignoring the sign), which is also written
as $|X - \bar{X}|$.
Example 6
What is the mean absolute deviation of 4, 7, 6, 10 and 8?
Solution
The calculation of the MAD starts by finding the mean of the numbers, which is:
$$\bar{X} = \frac{4 + 7 + 6 + 10 + 8}{5} = 7$$
Then the mean absolute deviation is:
$$MAD = \frac{|4 - 7| + |7 - 7| + |6 - 7| + |10 - 7| + |8 - 7|}{5}$$
$$MAD = \frac{|-3| + |0| + |-1| + |3| + |1|}{5}$$
$$MAD = \frac{3 + 0 + 1 + 3 + 1}{5} = 1.6$$
This shows that on average the values are 1.6 units away from the mean.
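The contrast between the signed mean deviation (always zero) and the mean absolute deviation can be illustrated with a short Python sketch. The function names are assumptions chosen for this illustration.

```python
def mean_deviation(values):
    """Mean of the signed deviations about the mean (zero up to rounding)."""
    mean = sum(values) / len(values)
    return sum(x - mean for x in values) / len(values)


def mean_absolute_deviation(values):
    """Mean of the absolute deviations about the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


print(mean_deviation([3, 4, 8]))                   # 0.0
print(mean_absolute_deviation([4, 7, 6, 10, 8]))   # 1.6
```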
Unit 4: Measures of Position
Unit Objectives
By the end of this unit, you should be able to:
Explain measure of position.
Mention some examples of measures of position
Do calculations on quartiles, deciles, quintile and percentiles and interpret the values
3-4.1 Introduction to the Measure of Position
In addition to measures of central tendency and measures of variation, there exist measures of
position or location. Statisticians often talk about the position of one value relative to the other
values in a data set. In the same way, teachers usually talk about the position of a student in an
examination relative to the positions of others in the class. In this section, we shall discuss
measures of position.
A measure of position is defined as the position of a single value in relation to other values in
a sample or a population data set. The common measures of position are quartiles and
percentiles. Quintiles, deciles and standard scores (z-scores) are measures of position which
are less often used.
3-4.2 Quartiles
Quartiles are the summary measures that divide a ranked data set into four equal parts. Three
measures will divide any data set into four equal parts. These three measures are:
Q1 – first quartile
Q2 – second quartile (also known as the median)
Q3 – third quartile
The second quartile is the same as the median of a data set. The first quartile is the value of the
middle term among the observations that are less than the median, and the third quartile is the
value of the middle term among the observations that are greater than the median.
3-4.2.1 Quartiles of Ungrouped Data
The first quartile Q1 is at position $\frac{n + 1}{4}$
The second quartile Q2 (the median) is at position $\frac{n + 1}{2}$
The third quartile Q3 is at position $\frac{3(n + 1)}{4}$
Note that “n” represents the number of observations in a data set.
Example 1
The following are the ages (in years) of nine teachers in a Junior High School.
47 28 39 51 33 37 59 24 33
Find the values of the three quartiles (first, second and third).
Solution
Using the Median Method
1. First, we rank the data from smallest value to the largest value
24 28 33 33 37 39 47 51 59
$Q_1$ position $= \left(\frac{n + 1}{4}\right)^{th} = \left(\frac{9 + 1}{4}\right)^{th} = 2.5^{th}$ position
The 2.5th position occurs between the observations 28 and 33.
Therefore, $Q_1 = \frac{28 + 33}{2} = 30.5$
$Q_2$ position $= \left(\frac{n + 1}{2}\right)^{th} = \left(\frac{9 + 1}{2}\right)^{th} = 5^{th}$ position
$Q_2 = 37$
$Q_3$ position $= \left(\frac{3(n + 1)}{4}\right)^{th} = \left(\frac{3(9 + 1)}{4}\right)^{th} = 7.5^{th}$ position
The 7.5th position occurs between the observations 47 and 51.
Therefore, $Q_3 = \frac{47 + 51}{2} = 49$
Example 2
Find the values of the three quartiles for this data set.
15 13 6 5 12 59 22 18
Solution
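Using the quartile formula
Rearranging the data set from the smallest to the largest
5 6 12 13 15 18 22 59
$Q_1$ position $= \left(\frac{n + 1}{4}\right)^{th} = \left(\frac{8 + 1}{4}\right)^{th} = 2.25^{th}$ position
The 2.25th position occurs between the observations 6 and 12.
Therefore, $Q_1 = \frac{6 + 12}{2} = 9$
$Q_2$ position $= \left(\frac{n + 1}{2}\right)^{th} = \left(\frac{8 + 1}{2}\right)^{th} = 4.5^{th}$ position
The 4.5th position occurs between the observations 13 and 15.
$Q_2 = \frac{13 + 15}{2} = 14$
$Q_3$ position $= \left(\frac{3(n + 1)}{4}\right)^{th} = \left(\frac{3(8 + 1)}{4}\right)^{th} = 6.75^{th}$ position
The 6.75th position occurs between the observations 18 and 22.
Therefore, $Q_3 = \frac{18 + 22}{2} = 20$
Using the median method
The median of the lower half of the data (5, 6, 12, 13) is $\frac{6 + 12}{2} = 9$, so Q1 corresponds to 9.
The median of the upper half of the data (15, 18, 22, 59) is $\frac{18 + 22}{2} = 20$, so Q3 corresponds to 20.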
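The position rule used in Examples 1 and 2 can be sketched in Python as follows. When the position i(n + 1)/4 is not a whole number, the sketch takes the value halfway between the two neighbouring observations, which is the convention followed in these worked examples; statistical software often interpolates in proportion to the fractional part instead, so its answers can differ slightly. The function name `quartile` is an assumption, and the data set is assumed to be large enough for the neighbouring positions to exist.

```python
import math


def quartile(values, i):
    """i-th quartile (i = 1, 2, 3) at position i * (n + 1) / 4.

    Fractional positions are resolved by averaging the two neighbouring
    observations, as in the worked examples.
    """
    ordered = sorted(values)
    position = i * (len(ordered) + 1) / 4  # 1-based position
    lower = math.floor(position)
    if position == lower:
        return ordered[lower - 1]
    return (ordered[lower - 1] + ordered[lower]) / 2


ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]    # Example 1
print([quartile(ages, i) for i in (1, 2, 3)])  # [30.5, 37, 49.0]

data = [15, 13, 6, 5, 12, 59, 22, 18]          # Example 2
print([quartile(data, i) for i in (1, 2, 3)])  # [9.0, 14.0, 20.0]
```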
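For grouped data, the ith quartile is given by
$$Q_i = L_{Q_i} + \frac{h_{Q_i}}{f_{Q_i}}\left(\frac{iN}{4} - C\right), \quad i = 1, 2, 3$$
Where;
$L_{Q_i}$ = lower class boundary of the quartile group
$h_{Q_i}$ = width of the quartile group
$f_{Q_i}$ = frequency of the quartile group
$N$ = total number of observations
$C$ = cumulative frequency preceding quartile group
The position of the ith quartile is
$$Q_i = \frac{i(N)}{4}\text{th value}$$
If i = 2, and N = 20
$$Q_2 = \frac{2(20)}{4}\text{th value} = 10\text{th value}$$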
Example 3
The table below shows marks obtained from a class test. From the table below, find
(a) the first quartile
(b) second quartile
(c) third quartile
Marks Frequency
8 -12 2
13 – 17 3
18 – 22 5
23 – 27 2
28 – 32 6
33 – 37 2
Solution
Class boundary   Frequency   Cumulative frequency
7.5 – 12.5       2           2
12.5 – 17.5      3           5
17.5 – 22.5      5           10
22.5 – 27.5      2           12
27.5 – 32.5      6           18
32.5 – 37.5      2           20
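$$Q_i = L_{Q_i} + \frac{h_{Q_i}}{f_{Q_i}}\left(\frac{iN}{4} - C\right)$$
$$Q_i \text{ position} = \frac{i(N)}{4}\text{th position}$$
N = total number of observations i.e. sum of frequencies = 20
(a) For $Q_1$: $\frac{1(20)}{4}$th position = 5th position, which falls in the class 12.5 – 17.5
C = cumulative frequency preceding quartile group = 2
$$Q_1 = 12.5 + \frac{5}{3}\left(\frac{1 \times 20}{4} - 2\right) = 17.5$$
(b) For $Q_2$: $\frac{2(20)}{4}$th position = 10th position, which falls in the class 17.5 – 22.5
C = cumulative frequency preceding quartile group = 5
$$Q_2 = 17.5 + \frac{5}{5}\left(\frac{2 \times 20}{4} - 5\right) = 22.5$$
(c) For $Q_3$: $\frac{3(20)}{4}$th position = 15th position, which falls in the class 27.5 – 32.5
C = cumulative frequency preceding quartile group = 12
$$Q_3 = 27.5 + \frac{5}{6}\left(\frac{3 \times 20}{4} - 12\right) = 30.0$$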
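The grouped-quartile formula can be checked with the Python sketch below, in which C is always read from the cumulative frequency of the classes before the quartile group (C = 2 for the first quartile, C = 5 for the second and C = 12 for the third). The function name `grouped_quartile` and the (lower boundary, upper boundary) representation are assumptions made for illustration.

```python
def grouped_quartile(boundaries, frequencies, i):
    """i-th quartile of grouped data: L + (h / f) * (i * N / 4 - C)."""
    n = sum(frequencies)
    target = i * n / 4   # position of the i-th quartile
    cumulative = 0       # C: cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (upper - lower) / f * (target - cumulative)
        cumulative += f
    raise ValueError("quartile group not found")


# Class boundaries and frequencies from Example 3
bounds = [(7.5, 12.5), (12.5, 17.5), (17.5, 22.5),
          (22.5, 27.5), (27.5, 32.5), (32.5, 37.5)]
freqs = [2, 3, 5, 2, 6, 2]

for i in (1, 2, 3):
    print(i, round(grouped_quartile(bounds, freqs, i), 2))
# 1 17.5
# 2 22.5
# 3 30.0
```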
Percentile Formula
The percentile corresponding to a given value X is computed by using this formula:
$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \times 100$$
Example 4
A teacher gives a 20point test to 10 students. The scores are shown here. Find the percentile
rank of a score of 12.
18 15 12 6 8 2 3 5 20 10
Solution
1. Rearrange the data from lowest to highest.
2 3 5 6 8 10 12 15 18 20
2. Then substitute into the formula
$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \times 100$$
X = 12
Number of values below X = 6
Total number of values = 10
Therefore, percentile $= \frac{6 + 0.5}{10} \times 100$ = 65th percentile
Thus, a student whose score was 12 did better than 65% of the class.
Example 5
Using the data below, find the percentile rank for a score of 6.
18 15 12 6 8 2 3 5 20 10
Solution
Rearranging the data
2 3 5 6 8 10 12 15 18 20
X = 6
Number of values below X = 3
Percentile $= \frac{3 + 0.5}{10} \times 100$ = 35th percentile
Thus, a student whose score was 6 did better than 35% of the class.
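The percentile-rank formula used in Examples 4 and 5 translates directly into Python. The function name `percentile_rank` is an assumption made for illustration.

```python
def percentile_rank(values, x):
    """Percentile rank of x: (number of values below x + 0.5) / n * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) / len(values) * 100


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(round(percentile_rank(scores, 12)))  # 65
print(round(percentile_rank(scores, 6)))   # 35
```

The data need not be sorted first, because the function only counts how many values lie below X.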
3. If C is not an integer or whole number, round up to the nearest whole number and
then count from the lowest value till you reach the rounded up value (‘C’ value).
Hence that value corresponds to the specific percentile.
4. If C is an integer or whole number, use the value halfway between the C and (C+1)
values when counting from the lowest value.
Example 6
18 15 12 6 8 2 3 5 20 10
With the above data, find the value corresponding to the 25th and 75th percentile
respectively.
Solution
Rearrange the data from lowest to highest.
2 3 5 6 8 10 12 15 18 20
Then substitute into the formula
$$C = \frac{n \times p}{100} = \frac{10 \times 25}{100} = 2.5$$
Because C is not a whole number, we round up to 3. We then count over to the third value,
which is 5.
Therefore, the value 5 corresponds to the 25th percentile.
$$C = \frac{n \times p}{100} = \frac{10 \times 75}{100} = 7.5$$
Because C is not a whole number, we round up to 8. We then count over to the eighth value,
which is 15.
Therefore, the value 15 corresponds to the 75th percentile.
Example 7
Find the value that corresponds to the 60th percentile
18 15 12 6 8 2 3 5 20 10
Solution
Rearranging the data
2 3 5 6 8 10 12 15 18 20
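The counting rule for finding the value at a given percentile (compute C = n × p / 100, round up when C is not a whole number, otherwise take the value halfway between the C-th and (C + 1)-th values) can be sketched in Python as follows. The function name `value_at_percentile` is an assumption, and the sketch only accepts percentiles strictly between 0 and 100.

```python
import math


def value_at_percentile(values, p):
    """Value at the p-th percentile using the rule C = n * p / 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    c = len(ordered) * p / 100
    if c != int(c):
        # C is not a whole number: round up and take that value
        return ordered[math.ceil(c) - 1]
    c = int(c)
    # C is a whole number: halfway between the C-th and (C + 1)-th values
    return (ordered[c - 1] + ordered[c]) / 2


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(value_at_percentile(scores, 25))  # 5
print(value_at_percentile(scores, 75))  # 15
print(value_at_percentile(scores, 60))  # 11.0
```

For the 60th percentile, C = 10 × 60 / 100 = 6 is a whole number, so the sketch returns the value halfway between the sixth value (10) and the seventh value (12).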
Self-Assessment Test 11
1. Find the percentile rank for each test score in the following data set.
20 12 15 5 26 30
What test score corresponds to the 33rd percentile?
2. Find the percentile rank for each test score in the following data set.
12 42 35 50 13 49 48 24 30
What test score corresponds to the 60th percentile?
3. Find the percentile ranks of each weight in the data set. The weights are in pounds
92 98 82 78 86 97
53 55 60 58 66 72 80 69 35 86 89
Unit 5: Measures of Shape
Unit Objectives
By the end of this unit, the students should be able to:
Explain measure of shape.
Mention and explain some examples of measures of shape
Determine the skewness of data, moment of data and kurtosis of data
Graphically sketch three different types of skewness.
3-5.1 Meaning of Measure of Shape
As defined earlier, a measure of shape describes the distribution or pattern of the data within a
data set. It can also be defined as the manner in which the data are distributed. The distribution
of a data set may be symmetrical or asymmetrical. The pattern and distribution of data are
measured by skewness, kurtosis and moments.
3-5.2 Skewness
Skewness of a distribution is a measure of symmetry or the lack of symmetry. A distribution
of data is skewed if the scores with the highest frequency are found not at the middle but near
one end. It refers to a distribution in which one tail is stretched out longer than the other. In
other words, the distribution looks different to the right and to the left of the centre point. Such
distributions are asymmetric. A distribution of data is not skewed if it looks the same to the
right and to the left of the centre point, and such distributions are symmetric. With respect to
skewness, a distribution takes one of three forms: normal or symmetric, negatively skewed, or
positively skewed.
3-5.2.1 Normal or Symmetric Distribution
A distribution of data is normal or symmetric if the scores of the highest frequency are found
at the middle and the distribution looks the same to the right and to the left of the center point
such that the left half and the right half are mirror images. For such distributions, there are no
extreme values in a particular direction and hence low and high values balance each other out.
The graph of such data has a normal bell-shape. For a normal or symmetric distribution, the
mean, median and mode are equal to each other.
Mean = Mode = Median
Figure 16: Normal or Symmetric Distribution
Figure 18: Negative or Left Skewed Distribution
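Karl Pearson's coefficient of skewness ($S_k$) is given by:
$$S_k = \frac{\text{mean} - \text{mode}}{S.D}$$
This is used when the mode is well defined.
$$S_k = \frac{3(\text{mean} - \text{median})}{S.D}$$
This is used when the mode is not well defined.
If $S_k > 0$, the distribution is positively skewed, and if $S_k < 0$, the distribution is negatively
skewed.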
Example 1
Compute the Karl Pearson's coefficient of skewness from the following data
Solution
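Height (X)   Number of persons (f)   X²            fX            fX²
58           10                      3364          580           33640
59           18                      3481          1062          62658
60           30                      3600          1800          108000
61           42                      3721          2562          156282
62           35                      3844          2170          134540
63           28                      3969          1764          111132
64           16                      4096          1024          65536
65           8                       4225          520           33800
Total        ∑f = 187                ∑X² = 30300   ∑fX = 11482   ∑fX² = 705588
Mode = 61
$$\text{Mean } (\bar{X}) = \frac{\sum fX}{\sum f} = \frac{11482}{187} = 61.4$$
$$S.D = \sqrt{\frac{\sum fX^2}{\sum f} - (\bar{X})^2} = \sqrt{\frac{705588}{187} - (61.4)^2} = \sqrt{3773.1978 - 3769.96} = \sqrt{3.2378} = 1.799$$
Since the mode is well defined,
$$S_k = \frac{\text{mean} - \text{mode}}{S.D} = \frac{61.4 - 61}{1.799} \approx 0.22$$
Since $S_k > 0$, the distribution is positively skewed.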
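The skewness calculation for the height data can be reproduced with the Python sketch below. It uses the unrounded mean throughout, so its standard deviation (about 1.76) is slightly smaller than the figure obtained by hand with the mean rounded to 61.4, and the resulting coefficient comes out at about 0.23. The variable names are assumptions made for illustration.

```python
import math

# Heights (X) and number of persons (f) from the table
heights = [58, 59, 60, 61, 62, 63, 64, 65]
freqs = [10, 18, 30, 42, 35, 28, 16, 8]

n = sum(freqs)
mean = sum(f * x for x, f in zip(heights, freqs)) / n
# Short method: S.D = sqrt(sum(f * X^2) / n - mean^2)
sd = math.sqrt(sum(f * x * x for x, f in zip(heights, freqs)) / n - mean ** 2)
mode = heights[freqs.index(max(freqs))]  # value with the highest frequency

sk = (mean - mode) / sd  # Pearson's coefficient using the mode
print(n, mode)           # 187 61
print(round(mean, 1))    # 61.4
print(round(sd, 2))      # about 1.76
print(round(sk, 2))      # about 0.23, positive, so positively skewed
```

Either way the coefficient is positive, so the conclusion that the distribution is positively skewed does not depend on the rounding.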
3-5.5 Kurtosis
It is a measure that is used to draw a distinction between two data sets with the same mean and
standard deviation. It is a measure of the peakedness or flatness of data relative to a normal
distribution. There are three different forms of kurtosis. These are mesokurtic distribution,
leptokurtic distribution and platykurtic distribution.
Again, a leptokurtic distribution is one with a higher peak than the normal distribution and
with heavier tails. It is a distribution with a coefficient of kurtosis greater than three.
Also, a platykurtic distribution is a distribution that has a lower peak than a normal distribution
and lighter tails. It is a distribution with a coefficient of kurtosis less than three.
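A coefficient of kurtosis based on quartiles and percentiles is:
$$K = \frac{Q}{P_{90} - P_{10}}$$
Where $Q$ = semi-interquartile range, that is $\frac{1}{2}(Q_3 - Q_1)$
$P_{90}$ = 90th percentile
$P_{10}$ = 10th percentile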
3-5.6 Moments
Moment is a mechanical term which is defined as the force with respect to its tendency to
provide rotation. It is the product of the force and its corresponding distance.
$$\overline{X^r} = \frac{x_1^r + x_2^r + \dots + x_N^r}{N} = \frac{\sum_{j=1}^{N} x_j^r}{N}$$
Where $\overline{X^r}$ = $r$th moment
First moment, r = 1
Moment about the mean for ungrouped data is given as
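$$m_r = \frac{\sum_{j=1}^{N} (x_j - \bar{x})^r}{N}$$
$$m_1 = \frac{\sum (x - \bar{x})}{N} = \bar{x} - \bar{x} = 0$$
$$m_2 = \frac{\sum (x - \bar{x})^2}{N} = S^2$$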
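The $r$th moment about any origin $A$ is
$$m'_r = \frac{\sum_{j=1}^{N} (x_j - A)^r}{N} = \frac{\sum d^r}{N}$$
where $d = x_j - A$.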
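For grouped data, where the value $x_j$ occurs with frequency $f_j$, there are $K$ distinct values and $N = \sum f_j$:
$$\overline{X^r} = \frac{f_1 x_1^r + f_2 x_2^r + \dots + f_K x_K^r}{N} = \frac{\sum_{j=1}^{K} f_j x_j^r}{N}$$
$$m_r = \frac{\sum_{j=1}^{K} f_j (x_j - \bar{x})^r}{N}$$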
Example 2
1. Find the first, second and third moments for the data 2, 4, 6, 9, 10
Solution
a. First moment
$$\bar{x} = \frac{2 + 4 + 6 + 9 + 10}{5} = 6.2$$
b. Second moment
$$\overline{x^2} = \frac{\sum x_i^2}{n} = \frac{2^2 + 4^2 + 6^2 + 9^2 + 10^2}{5} = 47.4$$
c. Third moment
$$\overline{x^3} = \frac{\sum x^3}{N} = \frac{2^3 + 4^3 + 6^3 + 9^3 + 10^3}{5} = \frac{8 + 64 + 216 + 729 + 1000}{5} = \frac{2017}{5} = 403.4$$
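The three moments worked out by hand for the data 2, 4, 6, 9, 10 can be reproduced with a small helper. The function name `raw_moment` is illustrative only; it simply applies the definition, the sum of the r-th powers divided by N.

```python
def raw_moment(values, r):
    """r-th moment: sum of x**r divided by the number of values N."""
    return sum(x ** r for x in values) / len(values)

data = [2, 4, 6, 9, 10]
for r in (1, 2, 3):
    # r = 1 gives the mean; r = 2 and r = 3 give the second and third moments.
    print(r, raw_moment(data, r))
```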
2. Find the first, second and third moments about the mean for the data 2, 4, 6, 9, 10
Solution
$$m_1 = \frac{\sum (x - \bar{x})}{N}$$
$$m_2 = \frac{\sum (x - \bar{x})^2}{N}$$
$$m_3 = \frac{\sum (x - \bar{x})^3}{N}$$
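As a check on the formulas for moments about the mean, the sketch below evaluates them for the data 2, 4, 6, 9, 10. The helper name `central_moment` is hypothetical. With a mean of 6.2, the moments come to approximately 0, 8.96 and -1.584 (the first may print as a tiny rounding error instead of an exact zero).

```python
def central_moment(values, r):
    """r-th moment about the mean: sum of (x - mean)**r divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n

data = [2, 4, 6, 9, 10]
m1 = central_moment(data, 1)   # always zero, up to rounding
m2 = central_moment(data, 2)   # the variance S^2
m3 = central_moment(data, 3)
print(round(m1, 4), round(m2, 4), round(m3, 4))
```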
2. The ages of students are distributed as follows: mean age was 28 years, the median
age 25 years and modal age 23 years. The standard deviation was computed to be 4.2
years. Find the coefficient of skewness and interpret your value.
CHAPTER FOUR: SAMPLING
Introductory Remarks
Sampling is vital in research and estimation. It is usually difficult to conduct research by
using the entire population, so researchers, more often than not, take a portion of the population. The
portion taken (the sample) needs to have certain features in order to represent the population. Sampling
techniques differ based on the characteristics of the population. This chapter is very important
since students will use the knowledge gained here to conduct their research.
This chapter will be treated in two units, namely:
Unit 1: Probability Sampling
Unit 2: Non-Probability Sampling
Unit 1: Probability Sampling
Unit Objectives
By the end of this unit, you should be able to:
Define the term probability sampling
Mention and explain the types of probability sampling
State the advantages and disadvantages of probability sampling
Collect data using different random sampling techniques
Identify the appropriate probability sampling technique for different populations
4-1.1 Sample, Sampling and Population
Sample: The idea of sampling is to select a portion (or subset) of the larger population
and study that portion (the sample) to gain information about the population. A sample is
that part of the population from which information is collected (Weiss, 1999).
Sampling: It is the process of selecting a portion (or subset) of the larger population
with the objective of estimating the characteristics of the whole population. In research,
sampling is the process of selecting units (e.g., people, organizations) from a
population of interest so that by studying the sample we may fairly generalize our
results back to the population from which they were chosen.
reduce cost
save time
prevent homogeneity
improve the accuracy and quality of the data.
If accessing the whole population is impossible, sampling is the only option.
utilizes randomization. This ensures that the sample selected is free of bias and is a true
representation of the entire population.
Examples of probability sampling
4-1.4.1 Simple random sampling
Simple random sampling is defined as a technique where the sample or subset is
taken from the known elements of the population such that each element has an equal
chance of being selected. Simple random sampling is only possible if the sampling frame
(the number and identity of every item in the population) is known. It is used when the population is
homogeneous (its members have the same characteristics).
Number each frame unit from 1 to N, shuffle them in a container and pick randomly
Use a random number table or a random number generator to select n distinct
numbers between 1 and N, inclusively.
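The random number generator approach can be sketched with Python's standard `random` module. The function name, the frame size of 100, the sample size of 10 and the seed are all hypothetical choices for illustration; `random.sample` draws without replacement, so the n numbers are distinct.

```python
import random

def simple_random_sample(frame_size, n, seed=None):
    """Pick n distinct unit numbers between 1 and frame_size, inclusive."""
    rng = random.Random(seed)   # a seed only makes the draw repeatable
    return sorted(rng.sample(range(1, frame_size + 1), n))

# Hypothetical frame of N = 100 numbered units, sample of n = 10.
chosen = simple_random_sample(100, 10, seed=2018)
print(chosen)
```

The printed numbers identify which units in the numbered sampling frame to include in the sample.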
Easier to perform for small populations
Cumbersome for large populations
It is time consuming since one needs to gather the full list of a specific population
The capital necessary to retrieve and gather the list is high and hence it is expensive to
do simple random sampling
If homogeneity of the population is not well checked, biases could occur
Stratified random sampling is often a better method than simple random sampling when the
population is made up of distinct subgroups. Stratified
random sampling divides a population into subgroups or strata, and random samples are
taken, in proportion to the population, from each of the strata created. The members of each
stratum have similar attributes and characteristics. This method of sampling is
widely used and very useful when the target population is heterogeneous.
In stratified random sampling or stratification, the strata are formed based on members'
shared attributes or characteristics. Stratified random sampling is also called proportional
random sampling or quota random sampling.
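A proportional allocation of this kind might be sketched as follows. Everything named here is an assumption made for illustration: the function, the three strata and their sizes are hypothetical, and a simple random sample is used inside each stratum.

```python
import random

def proportional_stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from each stratum, sized in
    proportion to the stratum's share of the population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding can make the sizes add up to slightly more or less than n.
        size = round(n * len(members) / total)
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical population of 200 students in three strata.
strata = {
    "level_100": [f"L100-{i}" for i in range(100)],
    "level_200": [f"L200-{i}" for i in range(60)],
    "level_300": [f"L300-{i}" for i in range(40)],
}
picked = proportional_stratified_sample(strata, n=20, seed=1)
print({name: len(units) for name, units in picked.items()})
```

With these sizes, a sample of 20 is split 10, 6 and 4 across the three strata, mirroring their 50%, 30% and 20% shares of the population.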
Example
Advantages of stratified sampling
It ensures each subgroup within the population receives proper representation within
the sample. As a result, stratified random sampling provides better coverage of the
population since the researchers have control over the subgroups to ensure all of them
are represented in the sample.
Potential for reducing sampling error
In this type of probability sampling, the researcher divides the target population into
non-overlapping, already existing clusters or areas and uses a random sampling technique to
select the subsets or elements from each of the clusters. Each cluster is a miniature, or
microcosm, of the population. If the number of elements in the subset of clusters is larger
than the desired value of n, these clusters may be subdivided to form a new set of clusters and
subjected to a random selection process.
For example, a researcher wants to survey the academic performance of students of UDS. He
can divide the entire student population of UDS into four campuses, with each campus being a cluster.
Then the researcher selects a number of students from each campus (cluster) using simple or
systematic random sampling.
Cluster sampling is similar to stratified sampling, but the two are different: the members of a
stratum share similar characteristics, whereas each cluster is meant to be a miniature of the
whole population.
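The campus example can be sketched in code. This is a hypothetical illustration only: the function, the campus labels and the student lists are made up. The sketch lets the researcher first choose some of the clusters at random, which is a common variant and an assumption here; asking for all four campuses mirrors the UDS example in the text.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster, seed=None):
    """Randomly choose n_clusters clusters, then draw per_cluster
    members from each chosen cluster by simple random sampling."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], per_cluster) for name in chosen}

# Hypothetical student lists for four campuses (clusters).
campuses = {
    f"campus_{c}": [f"{c}-{i:03d}" for i in range(50)]
    for c in "ABCD"
}
# Using all four campuses mirrors the UDS example in the text.
survey = cluster_sample(campuses, n_clusters=4, per_cluster=5, seed=7)
print(sorted(survey))
```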
It is less expensive and quicker as it reduces travel costs to contact sample elements
It allows us to obtain information from one or more areas
It can also combine the advantages of both simple and systematic random sampling
More convenient for geographically dispersed populations and hence it permits easy
accumulation of large samples
Simplified administration of the survey
May be more precise than simple random sample.
Systematic sampling is a type of probability sampling method in which sample members from a larger
population are selected according to a random starting point and a fixed, periodic interval. This interval is
called the sampling interval and it is calculated by dividing the population size by the desired
sample size.
• Population elements are an ordered sequence
• The first sample element is selected randomly from the first k population elements.
• Thereafter, sample elements are selected at a constant interval, k, from the ordered
sequence frame.
Population size, N = 100
Desired sample size, n = 20
Sampling interval, k = N/n = 100/20 = 5
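The N = 100, n = 20 case can be turned into a short sketch. The function name and seed are hypothetical, and the sketch assumes that N is an exact multiple of n so that the interval k is a whole number.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return the 1-based positions picked by systematic sampling."""
    k = population_size // sample_size   # sampling interval, k = N / n
    rng = random.Random(seed)
    start = rng.randint(1, k)            # random start among the first k units
    return [start + i * k for i in range(sample_size)]

positions = systematic_sample(100, 20, seed=42)
print(positions)
```

Whatever the random start, the selected positions are exactly 5 apart and the last one never exceeds 100.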
It is relatively easy to construct, execute, compare and understand
It also provides researchers and statisticians with a degree of control and sense of
process.
It is less risky
The method assumes the size of the population is available or can reasonably be
approximated.
individuals are chosen out of a specific subgroup. This method is based on the researcher’s
judgment.
Advantages of quota sampling
It is relatively cheaper
It can be performed quickly
It accounts for population properties
It is a useful method when probability sampling techniques are not possible
Disadvantages of quota sampling
Sample selection is not random
There is potential for bias, which can make the sample unrepresentative of the population
4-2.4 Convenience sampling
It is a non-probability sampling technique where subjects are selected because of their
convenient accessibility and proximity to the researcher. Usually data are collected from
people who are close to the researcher and are easily accessible. For example, using student
volunteers as subjects for a research study.
Advantages of convenience sampling
Ease of availability
It saves time
It also saves money since time is money
It is also very useful in a pilot study
Disadvantages of convenience sampling
It is possible to introduce bias
There is a high possibility of a sampling error
Results cannot be generalized
4-2.5 Snowball sampling
This type of non-probability sampling is used where the characteristics to be possessed by the
samples are rare and difficult to find, so the selection of elements is based on referral from
the earlier selected element. With this sampling, the informant nominates the next element or
individual to be selected. It is mostly used in sociology and statistics research and is often
referred to as referral sampling or chain-referral sampling. For example, prostitutes are difficult
to locate for a survey; with snowballing, the researcher needs to find the first prostitute, who will
then nominate the next prostitute for inclusion in the survey.
Advantages of snowballing
It allows for studies to take place where otherwise it might be impossible to conduct
because of lack of known participants
It can also help a researcher to discover characteristics about a population that he/she
was not aware existed.
Disadvantages of snowballing
It is usually impossible to determine the sampling error or make inferences about
populations based on the obtained sample.
4-2.6 Judgmental sampling
It is a non-probability sampling technique where the researcher selects units to be sampled
based on his/her knowledge and professional judgement. It is also called expert sampling. For
example, a researcher may decide to choose the members of a population he/she thinks are more
qualified and are willing to give more detailed information about something.
Advantages of judgmental sampling
The approach is well understood and has been refined by experience over several
years.
No special knowledge of statistics is required
It is not expensive
It saves time
Disadvantages of judgmental sampling
It is not scientific
It is wasteful and usually too large samples are selected
The conclusions reached are usually vague
There is no logic to the selection of a sample or its size
Personal bias is unavoidable in selecting the sample
4-2.7 Consecutive sampling
Consecutive sampling, also known as enumerative sampling, is a sampling technique in which every subject
meeting the criteria of inclusion is selected until the required sample size is achieved.
Advantages of consecutive sampling
It is relatively easy to employ
There is less opportunity for any manipulation
Disadvantages of consecutive sampling
It is more difficult to do it correctly
It requires more attention
It may take longer to fill the sample size
CHAPTER FIVE: PROBABILITY
Introductory Remarks
From the previous chapter, we learnt how to establish relationships between or among variables.
You now know the common descriptive statistics used in behavioral research. It is important
to note that in real life, decisions are made under both certainty and uncertainty.
Under uncertainty, the decision maker cannot tell the actual outcome of his or her
decision. Probability is a quantitative tool that can help us understand and make decisions under
uncertainty. Probabilistic (also called stochastic) models can be used for
prediction, inventory control, quality control, etc. This chapter sets the foundation for you to
learn basic probability. We shall limit our study to simple probability. So, do not worry so much
about equations.
Unit 1: Introduction to Probability
The theory of probability can be traced to indoor games of chance such as raffles, tossing of
coins, casting of dice, playing of cards, etc.
An event that has no chance of occurring is called null event and has a probability of zero. An
event which is sure to occur is called certain event and it has a probability of one.
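$$\text{Probability of occurrence } (P) = \frac{X}{T}$$
where X is the number of ways in which the event can occur and T is the total number of possible outcomes.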
Unit 2: Simple and Mutually Exclusive Events
Mutually exclusive events: Two events are mutually exclusive if both events cannot
occur at the same time. For example, sex is mutually exclusive since one person cannot be
both male and female. By contrast, whether it rains or not does not depend on the washing of cars;
such events are independent rather than mutually exclusive.
Collectively exhaustive events: Two events are collectively exhaustive if one of the
events must occur. The probabilities of events that are both mutually exclusive and
collectively exhaustive must add up to one. E.g. male and female are mutually exclusive and
collectively exhaustive. No one is both (mutually exclusive) but everyone is one or the other
(collectively exhaustive).
$$P(A) = \frac{A}{A + B}$$
$$P(\text{not } A) = 1 - \frac{A}{A + B}$$
$$P(A / B) = \frac{P(A \text{ and } B)}{P(B)}$$
P(A and B) = joint probability of A and B
P(B) = marginal probability of B
P(A / B) is read as 'conditional probability of A given B'.
Question 1.
A standard deck of 52 playing cards contains four suits (spades, hearts, clubs and diamonds),
each of which has 13 different cards (ace, king, queen, jack, 10, 9, 8, 7, 6, 5, 4, 3, 2). There
are 26 red cards and 26 black cards. The black cards consist of 2 aces and 24 non-aces. The red cards
also consist of 2 aces and 24 non-aces. Find the probability that a card is an ace given that it is black.
Solution
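$$P(\text{ace} / \text{black}) = \frac{P(\text{ace and black})}{P(\text{black})}$$
$$P(\text{ace} / \text{black}) = \frac{2/52}{26/52} = \frac{2}{52} \times \frac{52}{26}$$
$$P(\text{ace} / \text{black}) = \frac{2}{26} = \frac{1}{13}$$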
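The same conditional probability can be obtained by listing all 52 cards and counting. The sketch below is only an illustration: the helper names are made up, and `Fraction` is used so that the answers stay as exact fractions.

```python
from fractions import Fraction
from itertools import product

ranks = ["ace", "king", "queen", "jack", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
suits = {"spades": "black", "clubs": "black", "hearts": "red", "diamonds": "red"}
deck = list(product(ranks, suits))   # 52 (rank, suit) pairs

def is_ace(card):
    return card[0] == "ace"

def is_black(card):
    return suits[card[1]] == "black"

def prob(event):
    """Probability of an event as favourable cards / all cards."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

p_black = prob(is_black)
p_ace_and_black = prob(lambda card: is_ace(card) and is_black(card))
p_ace_given_black = p_ace_and_black / p_black   # P(ace / black)
print(p_ace_and_black, p_black, p_ace_given_black)
```

Counting gives 2/52 for "ace and black" and 26/52 for "black", so the ratio reduces to 1/13.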
Question 2
20 marking pens are displayed in a store. Six are red and 14 are blue. In selecting two
markers from the 20, what is the probability that both markers are red?
Solution
$$P(A_R \text{ and } B_R) = P(A_R / B_R) \, P(B_R)$$
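The multiplication rule for the marker question can be evaluated as a sketch, assuming the two markers are selected one after the other without replacement, so that one red marker fewer remains for the second pick. The variable names are illustrative.

```python
from fractions import Fraction

red, blue = 6, 14
total = red + blue

# Probability that the first marker picked is red: 6/20.
p_first_red = Fraction(red, total)
# Probability that the second is red, given the first was red: 5/19.
p_second_red_given_first = Fraction(red - 1, total - 1)

p_both_red = p_second_red_given_first * p_first_red
print(p_both_red)
```

Under that assumption the product is 6/20 × 5/19 = 30/380, which reduces to 3/38.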
Solution
Probability of selecting black card = $\frac{26}{52} = \frac{1}{2}$
Prob (ace) = $\frac{4}{52} = \frac{1}{13}$
= $\frac{2}{52} + \frac{2}{52}$
= $\frac{4}{52} = \frac{1}{13}$
Prob (ace or black) = prob (ace) + prob (black) - prob (ace and black)
= $\frac{4}{52} + \frac{26}{52} - \frac{2}{52}$
= $\frac{28}{52} = \frac{7}{13}$
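The addition rule for "ace or black" can be cross-checked by counting cards directly. This is a minimal sketch with illustrative names; sets are used so that the two black aces are not counted twice.

```python
from fractions import Fraction

ranks = ["ace"] + [str(v) for v in range(2, 11)] + ["jack", "queen", "king"]
colours = {"spades": "black", "clubs": "black", "hearts": "red", "diamonds": "red"}
deck = [(rank, suit) for rank in ranks for suit in colours]

aces = {card for card in deck if card[0] == "ace"}
blacks = {card for card in deck if colours[card[1]] == "black"}

def prob(cards):
    """Probability of drawing one of the given cards from the deck."""
    return Fraction(len(cards), len(deck))

# Addition rule: subtract the overlap (black aces) once.
by_formula = prob(aces) + prob(blacks) - prob(aces & blacks)
# Direct count of cards that are an ace, black, or both.
by_counting = prob(aces | blacks)
print(by_formula, by_counting)
```

Both routes give 28/52, that is 7/13.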
= $\frac{13}{52} + \frac{13}{52} = \frac{26}{52} = \frac{1}{2}$
Question 3
The probability that Kofi will graduate from UDS is 0.7 and the probability that Ama will
graduate is 0.5. Find the probability that both will complete school.
Question 4
A box contains 2 black balls and 3 white balls. If the balls are drawn without replacement,
find the probability that
Solution
a. $\Pr(B) = \frac{2}{3 + 2} = \frac{2}{5}$
b. $\Pr(W / A) = \frac{1}{3 + 1} = \frac{1}{4}$
Question 5
A fair die is rolled once and a coin is tossed once. Find the probability of getting two on the
die and a head on the coin
$$\Pr(2 \text{ and } H) = \Pr(2) \times \Pr(H) = \frac{1}{6} \times \frac{1}{2} = \frac{1}{12}$$
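The product of the two probabilities can be confirmed by listing every die and coin combination. The sketch assumes a fair six-sided die and a fair coin, so that all 12 combinations are equally likely; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All (die face, coin side) pairs: 6 x 2 = 12 equally likely outcomes.
outcomes = list(product(range(1, 7), ["H", "T"]))
favourable = [o for o in outcomes if o == (2, "H")]

p_counting = Fraction(len(favourable), len(outcomes))
p_product = Fraction(1, 6) * Fraction(1, 2)   # Pr(2) x Pr(H)
print(p_counting, p_product)
```

Only one of the 12 pairs is a two with a head, so counting and the multiplication rule agree on 1/12.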
Self-Assessment Test 15
For Practice (Source: Heiman, 2011).
1. The probability of any event equals its ………………..in the ……………………
2. As the probability of an event decreases, the event’s relative frequency in this situation
...................
3. As the probability of an event increases, our confidence that the event will occur
………………………
4. Tossing a coin (heads or tails) is sampling ……………..replacement.
Answers
1. relative frequency; population.
2. decreases
3. increases
4. with
References
Heiman, G. W. (2011). Basic Statistics for the Behavioral Sciences, Sixth Edition.
Wadsworth, Cengage Learning