Statistics and Basic Distribution - Mabe

Statistics and basic distribution

Uploaded by

Osman Achulo


UNIVERSITY FOR DEVELOPMENT STUDIES

STATISTICS AND BASIC DISTRIBUTION

INSTITUTE OF DISTANCE AND CONTINUING EDUCATION

DR. FRANKLIN N. MABE

DEPARTMENT OF AGRICULTURAL AND RESOURCE ECONOMICS
UNIVERSITY FOR DEVELOPMENT STUDIES
TAMALE

JULY, 2018
Copyright
All rights reserved. No part of this book may be reproduced, photocopied or stored in a retrieval
system by anybody or transmitted to any person without the permission of the Author, Dr.
Franklin N. Mabe and the publisher, Institute of Distance and Continuing Education (IDCE),
University for Development Studies.

Publisher
Institute of Distance and Continuing Education (IDCE), University for Development Studies,
Tamale.

July, 2018

Author:
Dr. Franklin N. Mabe
Department of Agricultural and Resource Economics
University for Development Studies
Tamale
[email protected]
+233-242760053

Editorial Team
Dr. Hamdiya Alhassan
Dr. Abdul-Basit Tampuli

Typesetting
Bright K. D. Tetteh
Dominic Konja Tasila

Teaching Assistants
Scholastica Atara
Makafui Adzo Dikro

Acknowledgements

The author, Franklin N. Mabe, expresses his sincere appreciation to the Director of the Institute
of Distance and Continuing Education, Dr. Ebenezer Owusu-Sakyere, for providing the
opportunity to write this book. The work of the editorial team, Dr. Hamdiya Alhassan and
Dr. Abdul-Basit Tampuli; the typesetters, Bright K. D. Tetteh and Dominic Konja Tasila; and
the teaching assistants, Scholastica Atara, Richard Sulemana Danaa and Makafui Adzo Dikro,
is much appreciated.

Preface

The book, Statistics and Basic Distribution, is the official and recommended manual for
distance and continuing education students at the University for Development Studies. It is the
textbook and manual for the introductory statistics course for distance education students, but
it can also be used by regular undergraduate students. Postgraduate students can use it as
revision material. The book provides easy-to-read notes and worked example questions for
students who are not enrolled in full-time programmes. It is also suitable reading material for
students who are not studying quantitative programmes such as mathematics, economics,
statistics, physical sciences and engineering.
The motivation for writing this book is to provide a quality, easy-to-read and easy-to-understand
textbook and manual for students. Many statistics books are difficult for students to
comprehend. For easy comprehension, this book focuses on the calculation and interpretation
of statistical results, especially in real-world settings, taking cognisance of the fact that the
students taking the course are not quantitatively inclined. The solved examples in the book are
based on everyday problems. The examples are also localised and fit well in the Ghanaian
context. In writing the manual, the students' level of maturity was carefully taken into
consideration.

Table of Contents

Acknowledgements
Preface
Table of Contents
LIST OF TABLES
LIST OF FIGURES
CHAPTER ONE: INTRODUCTION TO STATISTICS
Unit 1: Origin and Definition of Statistics
1-1.1 Origin of Statistics
1-1.2 Meaning of the Word “Statistics”
1-1.3 Definition of Statistics as a Subject
1-1.4 Definition of Statistics as a Body of Numbers or Information
1-1.5 Statistics in Education, Medicine, Business and Agriculture
1-1.5.1 Educational Statistics
1-1.5.2 Medical Statistics
1-1.5.3 Business Statistics
1-1.5.4 Agricultural Statistics
Self-Assessment Test 1.1
Unit 2: Importance of Study of Statistics
1-2.1 Importance of the Study of Statistics
1-2.2 Importance of the Study of Educational Statistics
1-2.3 Four Main Reasons for the Study of Educational Statistics
1-2.3.1 Monitor Student's Performance and Progress
1-2.3.2 Help Teachers Evaluate their Own Performance
1-2.3.3 Evaluate Performance of Subjects
1-2.3.4 Importance of Statistics in Educational Management
1-2.4 Everyday Uses of Statistics
Self-Assessment Test 1.2
Unit 3: Descriptive Versus Inferential Statistics
1-3.1 Overview of Descriptive Versus Inferential Statistics
1-3.2 Definition of Descriptive statistics
1-3.3 Definition of Inferential statistics
1-3.4 Key Differences between Descriptive and Inferential Statistics
Self-Assessment Test 1.3

Unit 4: Terms in Statistics and Scales of Statistical Measurements
1-4.1 Definition of terms
1-4.2 Measurement Scales
1-4.2.1 Nominal Versus Ordinal Scales
1-4.2.2 Interval Versus Ratio Scales
1-4.3 Types of Variables
1-4.3.1 Discrete/Categorical Versus Continuous Variables
1-4.3.2 Dependent Versus Independent/Explanatory Variables
1-4.3.3 Ordered and Unordered Variables
1-4.3.4 Quantitative Versus Qualitative Variables
Self-Assessment Test 1.4
CHAPTER TWO: RELATIONSHIPS AND REPRESENTATION OF DATA
Unit 1: Meaning and Sources of Data
2-1.1 Definition of Data
2-1.2 Sources of Data
2-1.3 Kinds of Data
2-1.3.1 Numerical Versus Categorical Data
2-1.3.2 Univariate, Bivariate and Multivariate Data
2-1.3.3 Qualitative Versus Quantitative Data
2-1.3.4 Cross-Sectional, Time Series and Panel Data
Self-Assessment Test 2.1
Unit 2: Concepts of Relationships and Data Representation
2-2.1 Concepts of Relationship
2-2.2 Meaning of Data Representation
2-2.3 Importance of Pictorial Data Representation
2-2.4 Effective Pictorial Representation of Data
Self-Assessment Test 2.2
Unit 3: Symbol Chart and Graphical Representation of Data
2.3.1 Symbol Charts or Diagrams
2-3.2 Bar Charts/Graphs
2-3.3 Single Column Bar Chart/Graph
2-3.4 Single Horizontal Bar Chart/Graph
2-3.5 Multiple or Compound Column Bar Chart/Graph
2-3.6 Multiple or Compound Horizontal Bar Chart/Graph
2-3.7 Component or Stacked Column Bar Chart/Graph
2-3.8 Component or Stacked Horizontal Bar Chart/Graph

2-3.9 Line Charts or Graphs
2-3.10 Single or simple line graph
2-3.11 Multiple or compound line graph
2-3.12 Component Line Chart or Graph
2-3.13 Histogram
2-3.14 Pie Chart
Self-Assessment Test 2.3
CHAPTER THREE: SUMMARISING AND DESCRIBING DATA: STATISTICAL MEASURES
Unit 1: Introduction to Statistical Measures
3-1.1 Definition of Statistical Measures
3-1.2 Types of Statistical Measures
3-1.3 Measure of central tendency
3-1.4 Measure of Variation/Dispersion
3-1.5 Measure of Positions
3-1.6 Measure of Shape
Self-Assessment Test 3.1
Unit 2: Measure of Central Tendency
3-2.1 Introduction to Measure of Central Tendency
3-2.2 Mean
3-2.2.1 Arithmetic Means of Ungrouped Data
3-2.2.2 Arithmetic Mean of Grouped Data
3-2.2.3 Arithmetic Mean When Probability is Given
3-2.2.4 Weighted Arithmetic Mean
3-2.3 Median
3-2.3.1 Median for Ungrouped Data
3-2.3.2 Median for Grouped Data
3-2.4 Mode
3-2.4.1 Mode of Ungrouped Data
3-2.3.2 Mode of Grouped Data
3-2.4 Midrange
Self-Assessment Test 9
Unit 3: Measures of Dispersion/Variation
3-3.1 Introduction
3-3.2 Range
3-3.3 Interquartile Range (IQR)

3-3.3 Semi-Interquartile Range
3-3.4 Variance
3-3.4.1 Variance for Ungrouped Data
3-3.4.2 Variance for Frequency Distribution Table
3-3.5 Standard Deviation
3-3.6 Mean deviation MD
Self-Assessment Test 3.3
Unit 4: Measures of Position
3-4.1 Introduction to the Measure of Position
3-4.2 Quartiles
3-4.2.1 Quartiles of Ungrouped Data
3-4.2.2 Quartiles for Grouped Data of Frequency Distribution
3-4.3 Deciles and Percentiles
3-4.4 Steps in Finding a Value Corresponding to a Given Percentile
Self-Assessment Test 11
UNIT 5: MEASURES OF SHAPE
3-5.1 Meaning of Measure of Shape
3-5.2 Skewness
3-5.2.1 Normal or Symmetric Distribution
3-5.2.2 Positive Skewness or Right Skewed
3-5.2.3 Negative Skewness or Left Skewed
3-5.3 Test of skewness
3-5.4 Measure of skewness
3-5.5 Kurtosis
3-5.6 Moments
Self-Assessment Test 3.5
CHAPTER FOUR: SAMPLING
Unit 1: Probability Sampling
4-1.1 Sample, Sampling and Population
4-1.2 Purpose of Sampling
4-1.3 Types of Sampling
4-1.4 Probability Sampling
4-1.4.1 Simple random sampling
4-1.4.2 Stratified random sampling
4-1.4.3 Cluster sampling
4-1.4.4 Systematic sampling

Unit 2: Non-Probability Sampling
4-2.1 Non probability-sampling
4-2.2 Purposive sampling
4-2.3 Quota sampling
4-2.4 Convenient sampling
4-2.5 Snowball sampling
4-2.6 Judgmental sampling
4-2.7 Consecutive sampling
CHAPTER FIVE: PROBABILITY
Unit 1: Introduction to Probability
5-1.1 Definition of Probability
5-1.2 Basic Probability Concept
Unit 2: Simple and Mutually Exclusive Events
5-2.1 Types of Events
5-2.2 Simple or Single Events
5-2.3 Addition rule
5-2.4 General Multiplication Rule
5-2.5 Joint probability
5-1.7 Probability Rule of Multiplication for Independent Event
Self-Assessment Test 15
References

LIST OF TABLES

Table 1: Differences between descriptive and inferential statistics
Table 2: 1 Properties of Measurement Scales
Table 3: Cross-Sectional Data
Table 4: Time-Series Data
Table 5a: Panel Data for One Individual (John Nyabeni)
Table 6b: Panel Data for Several Individuals (Baba and Belinda)
Table 7: School Enrolment from 2011 to 2017
Table 8: School Enrolment for SHS A and SHS B from 2011 to 2017
Table 9: Annual Sales of Companies from 2011 to 2017
Table 10: Population of Tribes of Students in Level 100
Table 11: Monthly starting wages

LIST OF FIGURES

Figure 1: Population of people
Figure 2: Eggs
Figure 3: Single Column Bar Chart: Average Marks of Students in Exams
Figure 4: Single Horizontal Bar Chart of Average Marks of Students in Exams
Figure 5: Multiple or Compound Column Bar Chart of Average Marks of Students in Exams
Figure 6: Multiple or Compound Horizontal Bar Chart of Average Marks of Students in Exams
Figure 7: Component or Stacked Column Bar Chart of Average Marks of Students in Exams
Figure 8: Component or Stacked Horizontal Bar Chart of Average Marks of Students in Exams
Figure 9: Single or Simple Line Chart of School Enrolment from 2011 to 2017
Figure 10: Multiple or Compound Line Graph of School Enrolment for SHS A and SHS B from 2011 to 2017
Figure 11: Component Line Chart of Annual Sales of Companies from 2011 to 2017
Figure 12: Histogram of Frequency of Student Marks
Figure 13: Pie Chart Showing Percentage of Tribes of Students in Level 100
Figure 14: Statistical Measures
Figure 15: Measures of central tendency
Figure 16: Normal or Symmetric Distribution
Figure 17: Positive or Right Skewed Distribution
Figure 18: Negative or Left Skewed Distribution

CHAPTER ONE: INTRODUCTION TO STATISTICS

Introductory Remarks
The study of statistics has become more popular in recent years owing to the upsurge in the
use of computers and statistical software packages. Almost every profession requires some
knowledge of statistics. In light of this, students in almost all disciplines are required to take at
least one course in statistics. The use of graphs and statistical tests is common in the analysis
of research data today.
Hello, you are welcome to chapter one of this course, Basic Statistics. I believe you enjoyed
the Basic Mathematics that you were taught in the first trimester. This chapter introduces you
to the concept of statistics and its importance. Note that you will write your undergraduate
dissertation in your final year, and hence you will be working with data related to your field of
study. It is therefore important that you study this course, Basic Statistics for Undergraduates.
By the end of this chapter, you will understand why this course is cardinal in all fields.
In this chapter, you will be introduced to the concepts and definitions of terms in Basic
Statistics. Your duty as a student is to develop a strong interest in the course, and this cannot
be done without taking pains to understand this first chapter of the book. The chapter is
divided into four units, namely:
Unit 1: Origin and Definition of Statistics
Unit 2: Importance of Study of Statistics
Unit 3: Descriptive Versus Inferential Statistics
Unit 4: Terms in Statistics and Scales of Statistical Measurements

Unit 1: Origin and Definition of Statistics

Welcome to unit one of this chapter. It is my fervent hope that you will develop a strong interest
in Basic Statistics in this unit. Students often complain about how statistics is taught. Most of
these complaints are driven by a phobia of Mathematics developed since elementary
education. You are probably asking yourself the question, "When and where will I use
statistics?". I can assure you that you will find answers after studying this unit.
Unit Objectives
By the end of this unit, you should be able to:
• Define the term statistics
• Name and explain the branches of statistics
• Know the differences between ancient statistics and modern statistics
• Develop the desire to study statistics
1-1.1 Origin of Statistics
Statistics was used unofficially long before the 19th century. It was used during the biblical era
of Moses and also at the time of the birth of Jesus Christ (the Roman census). The use of
statistics was not common. Statistics was used when rulers and kings needed information about
lands, agriculture, commerce, the population of their states etc. to assess their military
potential, their wealth, taxation and other aspects of government. The early use of statistics did
not involve much analysis and interpretation.
The first official use of statistics came at the turn of the 19th century. It occurred in 1801 in
England during a population census which included the collection, analysis and interpretation
of data. During the 20th century, statisticians actively developed new methods, theories and
applications of statistics. The availability of statistical computer software such as SPSS, Stata
and EViews for modern data analysis is a clear indication of the modern development of
statistics.
In Ghana, chiefs used to count their subjects unofficially and used that information to decide
how much toll each individual was to pay. The official application of statistics in Ghana dates
back to British colonial rule. In the then Gold Coast, the British Colonial Government conducted
the first population census in 1891, and this was done mainly along the coast. After the Northern
and the Ashanti Territories were put under British rule, another census was conducted in 1901.
Since Ghana gained independence, more censuses have been conducted and the latest was the
2010 population and housing census.
There is a big difference between the old statistics and the modern statistics. It is important to
note that the old statistics is part of the present statistics, implying that ancient statistics is a
subset of modern statistics. Ancient statistics did not have statistical software to do rigorous
analysis.
Now, mention two differences between the old and the modern statistics.
1. ………………………………………………………………………………………

2. ………………………………………………………………………………………

1-1.2 Meaning of the Word “Statistics”
Have you heard the word statistics before? If yes, have you looked up its meaning? If you do not
know the meaning, do not worry. Here is the literal meaning of the word “statistics”.
The word statistics is derived from the Latin word “Status” or the Italian word “Statista”.
Status in Latin means a “Political State” or a “Government”, while Statista in Italian means a
“Statesman”.
There are numerous definitions of statistics. For the purpose of this course, we shall look at the
definition of statistics as Numerical Data and Statistics as a Subject. A number of specialties
have evolved to apply statistical theory and methods to various disciplines.
1-1.3 Definition of Statistics as a Subject
There are different subjects. Since your elementary school, you have studied many subjects.
Can you remember some? Obviously, you can remember Mathematics, Science, English,
Geography etc. Each of the subjects has its own definition.
For the purpose of this course, Statistics as a subject is defined as the science of collecting,
organizing, presenting, analyzing and interpreting data for the purpose of assisting in
making more effective decisions (Mason, 1990). As a subject, statistics is a procedure: a
science which follows certain steps. Therefore, the order of the key words in the definition
must not be mixed up. It should be in the
order: collection of data → organization of data → presentation of data → analysis of data →
interpretation of analyzed data → decision making.
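To make the order of these steps concrete, here is a minimal sketch in Python. The scores, the score bands and the pass mark of 50 are all hypothetical values chosen for illustration; they are not taken from any real class.

```python
from statistics import mean

# 1. Collection: hypothetical class-test scores for ten students.
collected = [62, 48, 75, 55, 90, 48, 67, 71, 58, 66]

# 2. Organization: arrange the raw scores in ascending order.
organised = sorted(collected)

# 3. Presentation: a simple frequency table by score band (bands are assumed).
bands = {"below 50": 0, "50-69": 0, "70 and above": 0}
for score in organised:
    if score < 50:
        bands["below 50"] += 1
    elif score < 70:
        bands["50-69"] += 1
    else:
        bands["70 and above"] += 1

# 4. Analysis: compute a summary figure.
average = mean(organised)

# 5. Interpretation: who scored below the assumed pass mark?
pass_mark = 50
needs_help = [s for s in organised if s < pass_mark]

# 6. Decision making: act on the interpretation.
decision = "offer extra tuition" if needs_help else "no action needed"

print(bands)
print("Average score:", average)
print("Decision:", decision)
```

Notice that each step uses the result of the step before it, which is why the order in the definition matters.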
1-1.4 Definition of Statistics as a Body of Numbers or Information
As a body of numbers, Statistics can be said to be the raw numerical facts or data collected
for analysis. It can also be defined as the information generated from numerical facts or data
after the analysis. With this, one can talk of demographic statistics, agricultural statistics, health
statistics, educational statistics etc.
1-1.5 Statistics in Education, Medicine, Business and Agriculture
1-1.5.1 Educational Statistics
Educational statistics falls under both statistics as a subject and statistics as a body of numbers
or information.
As a subject of study, educational statistics is the study of the methods and procedures used in
collecting, organizing, analyzing and interpreting data or information relating to education
for effective decision making (Etsey, 2014). It also entails making meaning out of raw data
available in the field of education. By the end of this module, you should understand educational
statistics well enough to apply it in an office or classroom.
As a body of numbers or information, educational statistics is defined as a body of numbers or
data or information on education. School enrolment, number of teaching staff, literacy rate,
rate of school dropout, pupils’ attendance, male-female student ratio, etc. are all examples of
educational statistics.

1-1.5.2 Medical Statistics
Similarly, medical statistics can be defined as a subject and as a body of numbers or
information.
Medical statistics is the study of the methods and procedures used in collecting, organizing,
analyzing and interpreting data or information relating to medicine and health sciences for
effective decision making (Armitage, Berry and Matthews, 2002).
As a body of numbers or information, medical statistics is defined as a body of numbers or
data or information on medicine and health sciences. For instance, records of births and
deaths (commonly called vital statistics) and measurements such as temperature, weight and
blood pressure are all medical statistics.
1-1.5.3 Business Statistics
Business statistics can also be defined as a subject and as a body of numbers or information.
Business statistics is the study of the methods and procedures used in collecting, organizing,
analyzing and interpreting data or information for effective decision making in the face of
uncertainty (Morien, 2007).
As a body of numbers or information, business statistics is defined as a body of numbers or
data or information on business. Examples include financial information, marketing information, etc.
1-1.5.4 Agricultural Statistics
As with the fields above, agricultural statistics can be defined as a subject and as a body of numbers or
information.
Agricultural statistics is the study of the methods and procedures used in collecting, organizing,
analyzing and interpreting data or information on the agricultural production, agro-
processing, agricultural marketing and consumption for effective decision making in the face
of uncertainty.
As a body of numbers or information, agricultural statistics is defined as a body of numbers
or data or information on acreages, quantity of fertilizer, number of bags of maize etc.

Self-Assessment Test 1.1

Q1. Briefly explain the meaning of the word “statistics” as derived from Latin or Italian.
Q2. Briefly explain the origin of statistics
Q3. Define statistics as
(a) a subject
(b) a body of numbers
Q4. Give two examples each of statistics as a subject and as a body of numbers
Q5. Define agricultural statistics, business statistics, medical statistics, economics statistics and
educational statistics
Q6. Briefly explain the “ancient statistics” and “modern statistics”.

Unit 2: Importance of Study of Statistics

Hello, you are welcome to unit 2 of this introductory chapter. From unit 1, we noted that
the word statistics originated from a Latin or Italian word. We also learnt about the ancient and
modern applications of statistics. The various definitions of statistics have also been explained.
In this unit, we shall focus on the importance of the study of statistics.
Unit Objectives
By the end of this unit, you should be able to:
 State the importance of the study of statistics
 Mention and explain the application of statistics in everyday life
 Explain why teachers need to assess the performance of the students
 Give examples of statistics encountered in everyday life
1-2.1 Importance of the Study of Statistics
Statistical concepts have an impact on a wide range of fields. Broadly, the following
are the reasons why one needs to study statistics irrespective of one’s field of study.

 Effective conduct of research

With the study of statistics, one can effectively design data collection instruments,
collect the data, analyse, present, discuss, interpret and make necessary predictions. The
study of statistics also helps one to know the type of data that is needed and the
methodology required for the analysis.

 To be able to read and understand research findings

Many methods are used in analysing data and discussing research findings. It is
important for graduates to be able to read and understand basic research findings.

 Develop critical and analytical thinking

Many students struggle to understand basic relationships in data. Statistics is sometimes
rigorous. If a student is able to go through this rigorous study and understand it, it helps the
student to develop critical and analytical skills for life.

 Helps managers to make sound judgments, as decisions are then based on data rather than
on assumptions

 Helps businesses to plan better and make predictions about the future

1-2.2 Importance of the Study of Educational Statistics

For instance, the study of statistics is important to teachers for several reasons. These reasons
include:
 Ensure that the quality of education is kept high.

 Monitor students’ performance and progress.
 Monitor the teacher's productivity, performance and progress.
 Check the effectiveness of a subject being taught.
 Design experiments for study.
 Analyze sets of data in any number of situations so as to draw informed conclusions
about that data.
All these reasons for the study of statistics for teachers can be grouped into four; these are
explained in the section below.
1-2.3 Four Main Reasons for the Study of Educational Statistics
There are four main reasons why teachers are mandated to study educational statistics. With
the study of educational statistics, teachers are able to monitor students’ performance,
evaluate their own performance, evaluate the performance of subjects and manage
educational institutions at the management level. These four reasons are explained below.
1-2.3.1 Monitor Students' Performance and Progress
It is important for teachers to study statistics so as to be able to monitor students' progress
throughout the term or academic year. Teachers often give students tests or quizzes
intermittently, as well as end of term examinations. These tests or exams are aimed at keeping
track of and monitoring the performance of students to see if they are doing well. The statistical
information from test or examination scores is useful to the students, as it helps the teacher
to offer extra help to students who are weak or to advise the parents accordingly.
1-2.3.2 Help Teachers Evaluate their Own Performance
Statistics can also be used by educationists or teachers to evaluate their own performance in
terms of the methodology they use in teaching students. With information on the performance
of students, teachers can assess themselves to know whether or not they are teaching well. It
also affords teachers the opportunity to know whether or not the time allocated for tests or
examinations is adequate.
1-2.3.3 Evaluate Performance of Subjects
Statistics can also be used by educational institutions in general to assess the performance of
students in a particular subject. Analysing this data can show where there is room for
improvement, so that improvements can be implemented as soon as possible.
1-2.3.4 Importance of Statistics in Educational Management
It is important for the school manager or administrator to plan, implement plans and evaluate
their success. Statistical data is used to do these effectively. The day-to-day decision making of
educational managers is based on data. For instance, to make purchases of stationery or
furniture, the school manager uses statistical data on class sizes and the ages of students so as
to arrive at the quantities to be purchased. Data on enrolment, class size and number of teachers
will enable a school manager to requisition the materials required for the day-to-day running
of the school.

1-2.4 Everyday Uses of Statistics
As you are aware, we live in an information age and the world is now a global village.
Information abounds on the internet, and we need to use it effectively to enhance our stay
on this planet. Statistics shapes your life in the following areas:
 Everyday weather forecasts help you to plan your daily movements
 Political campaigns and indicators of economic performance help one to make
voting decisions
 Advertisements on TV, radio, the internet or in print media help you make choices as to which
commodity to buy
 Taking stock or inventory of foodstuffs in the house and deciding on the quantity to
purchase
 Deciding on the quantity of food to eat to be satisfied
Self-Assessment Test 1.2
Q1. Explain the four main reasons for the study of educational statistics
Q2. Why do educational managers need to learn statistics?
Q3. Give five examples of how statistics is used in our everyday activities
Q4. Discuss five ways in which the study of statistics is important in your field of study.

Unit 3: Descriptive Versus Inferential Statistics

Introductory Remarks
Hello, I hope you now have a brief understanding of why you should study Educational Statistics
as you prepare to work in your field of study after graduation. In this unit you are going
to build on the knowledge you gained from units 1 and 2. We shall look at the differences
between descriptive and inferential statistics.
Unit Objectives
By the end of this unit, you should be able to:
 Define descriptive statistics
 Define inferential statistics
 Explain measures of descriptive statistics
 Explain measures of inferential statistics
 State differences between descriptive and inferential statistics
1-3.1 Overview of Descriptive Versus Inferential Statistics
In recent times, the role statistics plays in research should not be underestimated. Through the
study and application of statistics, researchers are able to collect data, organise and analyse it,
and draw inferences for predictions and decision making. Researchers analyse
their data by employing descriptive or inferential statistics. From the different definitions of
statistics, can you try to define descriptive and inferential statistics? Try to define them yourself.
Descriptive statistics ……………………………………………………………………………
Inferential statistics…………………………………………………………………………….
Now, here are the definitions of descriptive and inferential statistics.
1-3.2 Definition of Descriptive statistics
Descriptive statistics: It is the branch of statistics which describes important features,
characteristics and properties of a data set. It describes the features of the sample or population
data. Components of descriptive statistics are:
Measure of central tendency: A measure of central tendency describes the point about which
the various observed values group or cluster. Examples of measures of central tendency are the
mean, mode, and median.
Measure of dispersion or variation: Another statistical measure which belongs to descriptive
statistics is the measure of dispersion. Measures of dispersion show the spread of data about a
central point. Examples are the range, interquartile range, variance, standard deviation, mean
deviation and coefficient of variation.
Measure of position: There is also the measure of position, which is also part of descriptive
statistics. It is a measure of the point or location at which a value falls within a data set.
Quartiles, quintiles, deciles and percentiles are all measures of position which describe
particular locations within a data set.

Measure of shape: It is a measure of how a data set is distributed. Skewness, kurtosis and
moments are used to measure the distribution of data.
Diagrammatic or Tabular Representations: Lastly, graphical tools such as charts, tables,
and graphs summarize and describe features of data sets.
These components of descriptive statistics will be dealt with in detail in subsequent chapters.
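As a quick illustration of these measures, the sketch below computes one or more examples of each on a small, made-up data set using Python's built-in statistics module. The seven scores are hypothetical, and the skewness is computed from moments by hand because the standard library has no ready-made function for it.

```python
import statistics as st

scores = [4, 7, 7, 8, 9, 10, 11]  # hypothetical data, for illustration only

# Measures of central tendency
mean = st.mean(scores)
median = st.median(scores)
mode = st.mode(scores)

# Measures of dispersion
data_range = max(scores) - min(scores)
variance = st.variance(scores)   # sample variance (divides by n - 1)
std_dev = st.stdev(scores)       # sample standard deviation

# Measure of position: the three quartile cut points (needs Python 3.8+)
quartiles = st.quantiles(scores, n=4)

# Measure of shape: moment-based skewness (third moment / second moment ** 1.5)
n = len(scores)
m2 = sum((x - mean) ** 2 for x in scores) / n
m3 = sum((x - mean) ** 3 for x in scores) / n
skewness = m3 / m2 ** 1.5

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", data_range, "variance:", variance, "std dev:", std_dev)
print("quartiles:", quartiles)
print("skewness:", skewness)
```

A negative skewness here would suggest that the data have a longer tail towards the lower scores.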
1-3.3 Definition of Inferential statistics
Inferential statistics: It is the type of statistics which estimates the characteristics of a sample
data set with the main aim of making generalisations about the population. It is the branch of
statistics that involves drawing conclusions about a population based on information contained
in a sample taken from that population (http://www.saylor.org/books).
With inferential statistics, a head teacher can take the ages of a certain percentage of pupils in
JHS 2 of schools A and B, compute a sample statistic for each school and generalise about whether
there are differences between the ages of pupils in the two schools. In a nutshell, it is a statistical
analysis that can be used to draw conclusions about the population when it is not possible to
collect data from each and every member of the population. It is important to note that inferential
statistics involves hypothesis testing. Do not worry if you do not understand hypothesis testing yet.
The methods of inferential statistics include analysis of variance, the chi-square test, Student’s
t-test, regression analysis, etc.
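The head teacher example can be sketched in Python as follows. This is only an illustration under stated assumptions: the ages for schools A and B are invented, the test used is Welch's version of the two-sample t-test (one of several possible choices), and the critical value of about 2.16 is the tabulated two-sided 5% value for 13 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical ages of sampled JHS 2 pupils in two schools.
ages_a = [13, 14, 14, 15, 13, 14, 15, 14]
ages_b = [15, 16, 15, 17, 16, 15, 16, 16]


def welch_t(x, y):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    vx = variance(x) / len(x)   # squared standard error of the mean of x
    vy = variance(y) / len(y)   # squared standard error of the mean of y
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


t_stat, df = welch_t(ages_a, ages_b)

# Assumed critical value: two-sided test at the 5% level, 13 degrees of freedom.
CRITICAL_T = 2.16
ages_differ = abs(t_stat) > CRITICAL_T

print("t statistic:", round(t_stat, 2), "degrees of freedom:", round(df, 1))
print("Evidence that mean ages differ:", ages_differ)
```

The conclusion is about all JHS 2 pupils in the two schools, even though only a sample from each school was measured. That step from sample to population is what makes the analysis inferential.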
1-3.4 Key Differences between Descriptive and Inferential Statistics
From the above definitions of descriptive and inferential statistics, try and write four
differences between the two. Discuss it with your course mates.

The table below illustrates the key differences between descriptive and inferential statistics.
Table 1: Differences between descriptive and inferential statistics
Basis of Comparison | Descriptive statistics | Inferential statistics
Meaning | Describes the features of the data set | Draw conclusions about the population based on the sample data
What it does? | Organise, analyse and present data in a meaningful way | Compares data, tests hypothesis and make predictions
Form of final results | Numerical values, charts, graphs and tables | Probability
Usage | To describe a situation | To explain the chances of occurrence of an event
Function | It explains the data which is already known with the aim of summarising the data | It makes conclusions about the population which extend beyond the available data

Self-Assessment Test 1.3

Q1. Define descriptive statistics
Q2. Define inferential statistics

Q3. Mention and explain measures of descriptive statistics
Q4. Explain measures of inferential statistics
Q5. State four differences between descriptive and inferential statistics
Q6. Differentiate between measure of central tendency and measure of dispersion
Q7. Descriptive statistics is less important for policy implications in research than inferential statistics.
Briefly explain.

Q8. Give two examples each of:

(a) measure of central tendency
(b) measure of position
(c) measure of dispersion
(d) measure of shape

Unit 4: Terms in Statistics and Scales of Statistical Measurements

Welcome to unit 4 of the introductory chapter of this book. Well done for reaching this far. I
believe it has been so fantastic and you have enjoyed the previous units. In this unit, you are
going to build on the knowledge you gained from units 1, 2 and 3 by learning
terminologies which are commonly used in statistics. You are also going to study scales of
statistical measurement. Are you set for us to start unit 4? Ready, go!!!
Unit Objectives
By the end of this unit, you should be able to:
 Define certain terms used in statistics
 Mention and explain different scales of statistical measurements
 Differentiate between nominal and ordinal scales
 Differentiate between interval scale and ratio scale
 Mention and explain types of variables
 Differentiate between discrete and continuous variables
 Differentiate between categorical and continuous variables
 Differentiate between discrete and categorical variables
 Differentiate between ordered and unordered variables
 Differentiate between dependent and independent variables
 Differentiate between quantitative and qualitative variables
1-4.1 Definition of terms
 Population: It is the totality of items or things under consideration, that is, the
entire collection of persons, things, or objects under study. Population
is the collection of all individuals or items under consideration in a statistical study
(Weiss, 1999). Because resources are limited and members of a population share similar
characteristics, a sample can be selected for a study.

 Sample: The idea of sampling is to select a portion (or subset) of the larger population
and study that portion (the sample) to gain information about the population. Sample is
that part of the population from which information is collected (Weiss, 1999).

 Finite population: A population under consideration which can be physically listed.
For example, the students of the University for Development Studies, the lecturers in
the University for Development Studies.

 Parameter: it is a summary measure that is computed to describe a characteristic of an
entire population. A parameter is an unknown numerical summary of the population.

 Statistic: it is a summary measure that is computed to describe a characteristic from a
sample of a population. A statistic is a known numerical summary of the sample which
can be used to make inference about parameters (Agresti and Finlay, 1997).

 Census: A survey that includes every member of the population.

 Sample Survey: The technique of collecting information from a portion of the
population.

 Representative Sample: It is a sample whose characteristics closely resemble
the characteristics of the population.

 Random Sample: A random sample is a sample which is selected in such a way that
each member of the population has an equal chance of being selected.

 Data: Raw numerical or non-numerical information collected for analysis to make
a more informed decision.

 Variable: It is a characteristic under study or investigation that assumes different
values for different elements. A variable is anything that, when measured, can produce
two or more different scores. A few of the variables found in behavioural research
include age, race, gender, intelligence, personality type, political affiliation,
anxiousness, anger, aggressiveness, attractiveness, hard work, memory recall, etc.
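The relationship between a population, a random sample, a parameter and a statistic can be sketched in Python as below. The population of 500 ages is simulated rather than real, and the seed, the age range and the sample size of 50 are all arbitrary assumptions made for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so that the same sample is drawn on every run

# Finite population: hypothetical ages of 500 distance-learning students.
population = [random.randint(18, 40) for _ in range(500)]

# Parameter: a summary of the whole population (usually unknown in practice).
parameter = mean(population)

# Random sample: every member has an equal chance of being selected.
sample = random.sample(population, 50)

# Statistic: the same summary computed from the sample only.
statistic = mean(sample)

print("Population mean (parameter):", round(parameter, 2))
print("Sample mean (statistic):", round(statistic, 2))
```

In a real study only the sample would be observed, and the statistic would be used to make an inference about the parameter. A census would correspond to using the whole population list.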
1-4.2 Measurement Scales
Data are usually collected using survey instruments which are sets of questions. The variables
that are measured can be classified into several characteristics or scales. In terms of
measurements, there are four levels of scales. In the order of weakest to strongest, they are
nominal, ordinal, interval and ratio scales.
1-4.2.1 Nominal Versus Ordinal Scales
If the categories of a qualitative variable are unordered, then the qualitative variable is said to
be on a nominal scale. The word nominal refers to the fact that the categories are merely names.
When measuring using a nominal scale, one simply names or categorizes responses. Gender,
marital status, favourite colour, and religion are examples of variables measured on a nominal
scale. The essential point about nominal scales is that they do not imply any ordering among
the responses. For example, when classifying people according to their favourite colour, there
is no sense in which green is placed “ahead of” blue. Responses are merely categorized.
Nominal scales embody the lowest level of measurement.
If the categories can be put in order, the scale is called an ordinal scale. For example, a
researcher wishing to measure students’ satisfaction with the banku they consume may ask each
student to specify his/her feelings as either “very dissatisfied,” “somewhat dissatisfied,” “somewhat
satisfied,” or “very satisfied.” The items in this scale are ordered, ranging from least to most
satisfied. Unlike nominal scales, ordinal scales allow comparisons of the degree to which two
subjects possess the attribute being measured. For example, our satisfaction ordering makes it
meaningful to assert that one person is more satisfied than another. Other examples of ordinal
variables are education (classified, e.g., as low or high), "strength of opinion" on some proposal
(classified according to whether the individual favors the proposal, is indifferent towards it, or
opposes it), and position at the end of a race (first, second, etc.).

1-4.2.2 Interval Versus Ratio Scales
Quantitative variables, whether discrete or continuous, are defined either on an interval scale
or on a ratio scale.
Interval scales are numerical scales in which intervals have the same interpretation throughout.
An interval scale is an ordered scale in which the differences between measurements can be
compared meaningfully.
If, on the other hand, one can meaningfully compare both the differences between measurements
of the variable and the ratios of the measurements, then the quantitative variable is
defined on a ratio scale. A ratio scale is an interval scale with a meaningful absolute zero point,
and it is the highest level of measurement. It has all the properties of the interval scale and,
additionally, an absolute zero point, which the interval scale lacks. For
example, temperature measured on the Centigrade (Celsius) scale is an interval variable and the
height of a person is a ratio variable.
Table 2: Properties of Measurement Scales
Properties/Scales | Nominal scale | Ordinal scale | Interval scale | Ratio scale
Order | No | Yes | Yes | Yes
Difference | No | No | Yes | Yes
Ratio | No | No | No | Yes
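The last two rows of the table can be checked with a small Python sketch. The temperatures and heights are made-up values, and the conversion to Kelvin (adding 273.15) is brought in here only to show what happens when the zero point of an interval scale is moved; it is not part of the discussion above.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin, which only shifts the zero point."""
    return celsius + 273.15


# Interval scale: two hypothetical temperatures in degrees Celsius.
t1_c, t2_c = 10.0, 20.0

# Differences keep their meaning when the zero point is moved.
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios do not: 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius.
ratio_c = t2_c / t1_c
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

# Ratio scale: two hypothetical heights; the ratio survives a change of unit.
h1_cm, h2_cm = 90.0, 180.0
ratio_cm = h2_cm / h1_cm
ratio_m = (h2_cm / 100) / (h1_cm / 100)

print("Temperature differences:", diff_c, round(diff_k, 2))
print("Temperature ratios:", ratio_c, round(ratio_k, 3))
print("Height ratios:", ratio_cm, round(ratio_m, 3))
```

The temperature ratio changes when the zero point changes, so it carries no real meaning, whereas a height of 180 cm is twice a height of 90 cm in any unit because zero height is an absolute zero.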

1-4.3 Types of Variables

There are different types of variables.
1-4.3.1 Discrete/Categorical Versus Continuous Variables
A variable which has only a countable number of distinct possible values is called a discrete
variable. A discrete variable is a variable whose possible values are some or all of the ordinary
counting numbers like 0, 1, 2, 3, .... That is, a variable is discrete if it can assume only a finite
number of values or as many values as there are integers, e.g. number of houses, cars or
accidents. A scale such as 1, 2 and 3 representing low, high and highest is an example of a
discrete or categorical variable.
On the other hand, numerical responses that arise from a measuring process and can take
intermediate values are called continuous variables, e.g. length, age, height, weight, time, marks
of a class test, etc. For instance, there are intermediate ages between 12 and 13 years. Note that a
discrete variable can assume only certain values with no intermediate values, but a continuous
variable can take any value including intermediate values.
1-4.3.2 Dependent Versus Independent/Explanatory Variables
A dependent variable is a variable whose value depends on some other variable(s). It reflects
either a pre-existing condition or the result of manipulating some independent variable(s). An
independent variable is one that does not depend on any other variable or factor under study. It
is the variable that can be manipulated in order to see whether changes in it affect the behaviour
of another variable. Independent variables are variables that cause effects in other variables. For
example, tastes and any special dietary needs of household members are some of
the variables that influence a household’s decision about food expenditure. These variables are
therefore called independent or explanatory variables because they all vary independently, and
they explain the variation in food expenditure among different households. In other words,
these variables explain why different households spend different amounts of money on food.
Food expenditure is called the dependent variable because it depends on taste and special
dietary needs (the independent variables, IVs). Simply put, a dependent variable is a condition or
piece of data in an experiment that is controlled or influenced by at least one outside factor,
usually the independent variable. An independent variable is a condition or piece of data in an
experiment that can be controlled or changed.
1-4.3.3 Ordered and Unordered Variables
An ordered variable is a categorical variable for which the possible values are ordered. In
ordered variables, the ordering of the variables matters but not the differences between/among
values. The values simply express an order. Examples include; 1-first class, 2-second class
upper, 3-second class lower, 4-third class, 5-pass; 1-excellent, 2-very good, 3-good, 4-average,
5-pass.
Unordered variables are categorical variables with two or more categories
that have no natural order. If such categories are placed in an order, the ordering makes
little or no sense. These variables have no numeric values, and arithmetic operations
such as addition and subtraction cannot be performed on them. Examples
include the following: political affiliation, sex, eye colour, genotypes, blood groups, etc.
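The difference between ordered and unordered variables can be seen in what one can sensibly do with them in a program. In the Python sketch below, the lists of graduates and blood groups are hypothetical. Degree classes are sorted by their natural order, while blood groups are only counted because they have no order to sort by.

```python
from collections import Counter

# Ordered variable: degree classes listed from best to lowest.
DEGREE_CLASS_ORDER = [
    "first class",
    "second class upper",
    "second class lower",
    "third class",
    "pass",
]
# Map each label to its position (1 = first class, 2 = second class upper, ...).
rank = {label: position for position, label in enumerate(DEGREE_CLASS_ORDER, start=1)}

graduates = ["third class", "first class", "pass", "second class upper"]
ranked = sorted(graduates, key=lambda label: rank[label])

# Unordered variable: blood groups can be counted but not ranked.
blood_groups = ["O", "A", "O", "B", "AB", "O"]
counts = Counter(blood_groups)
most_common_group = counts.most_common(1)[0][0]

print("Graduates from best to lowest class:", ranked)
print("Blood group counts:", dict(counts))
print("Most common blood group:", most_common_group)
```

Note that the numbers 1 to 5 attached to the degree classes only express order. The gap between 1 and 2 need not equal the gap between 4 and 5, so averaging these codes would not be meaningful.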
1-4.3.4 Quantitative Versus Qualitative Variables
A quantitative variable is a variable that can be measured numerically. Quantitative data can
be analysed more easily with mathematical tools than qualitative data. For example, it does not
make sense to find an average hair color or blood type. Amount of money, pulse rate, weight,
number of teachers in the school, and the number of students who take statistics are examples
of quantitative data. Quantitative data may be either discrete or continuous.
A qualitative variable is a variable which is not numeric. It categorizes or describes the attributes
of a population. Qualitative data are generally described by words or letters. For instance, hair
color might be black, dark brown, light brown, blonde, gray, or red. Blood type might be AB+,
O-, or B+.
Self-Assessment Test 1.4
1. Define the following terms: sample, population, data, variable, census, statistic,
parameter, finite population
2. Mention four scales of measurement and briefly explain them.
3. Differentiate between
(i) discrete and continuous variables
(ii) qualitative and quantitative variables
(iii) ordered and unordered variables
(iv) dependent and independent variables

CHAPTER TWO: RELATIONSHIPS AND REPRESENTATION OF DATA

Introductory Remarks
Hello, you are welcome to chapter two of this course, Basic Statistics. I believe you enjoyed
all units under chapter one. This chapter is a continuation of the concepts in chapter one.
In this chapter, you will learn how to establish relationships in data. You will also learn how to
represent data on graphs and charts so that they are easy to appreciate.
This chapter will be treated under the following units:
Unit 1: Meaning and Sources of Data
Unit 2: Concepts of Relationships and Data Representation
Unit 3: Symbol Chart and Graphical Representation of Data

Unit 1: Meaning and Sources of Data

You are now in unit one of chapter two. It is important to note that this unit introduces you to
data. In this unit, you will first be introduced to definition of statistical data and its sources.
You will also learn the types of statistical data and their meanings to help you appreciate the
data that you will need for a particular analysis.

Unit Objectives
By the end of this unit, you should be able to:
 Define data
 Mention and explain sources of data
 Mention and explain types of data

2-1.1 Definition of Data

Data is raw numerical or non-numerical information which can be analysed to make an informed
decision. It is a set of raw quantitative or qualitative information that needs to be analysed for
better understanding and decision making. Some data are semi-processed or already analysed and
can easily be understood. Others are raw and require analysis before they can be understood.
2-1.2 Sources of Data
Data are collected from organisations, individuals, government, etc. There are two main sources
of data. Data can be obtained from secondary sources such as governmental institutions or
organisations, non-governmental organisations, or private institutions or organisations. We can
also design an experiment or conduct a survey to obtain data from primary sources.
 Primary Data: It is the type of data collected from the field or through an experiment
designed by the researcher or investigator for first-hand experience. Its advantage is
that it is specifically tailored to meet one’s research need. However, it is expensive to
collect (www.managementstudyguide.com/secondary-data.htm).
 Secondary Data: It is the type of data that have originally been collected by someone
other than the user (www.managementstudyguide.com/secondary-data.htm).

2-1.3 Kinds of Data
There are different kinds of data just like variables. These kinds of data are explained below.
2-1.3.1 Numerical Versus Categorical Data
Numerical data is information that can be measured. This information is always collected
in numbers or in number form. An example of numerical data would be the number of students
who attended the basic maths class each week last semester. One way to identify
numerical data is to check whether the values can be added together. Numerical data can also be
put in ascending or descending order.
Categorical data represents a type of data that may be divided into groups or categories.
Examples of categorical data are sex, age group and educational level. Age group and
educational level may also be considered numerical if the exact values for age and
the highest class completed are used. However, it is often more informative to categorize such
variables into a relatively small number of groups for qualitative descriptions.
2-1.3.2 Univariate, Bivariate and Multivariate Data
Univariate data is a collection of information characterised by or depending on only one
random variable. The focus is on observing only one aspect of the item of interest at a time.
Univariate data does not answer questions about relationships between variables; rather, it is
used to describe one characteristic or attribute that varies from observation to observation.
Examples of univariate data include the heights of students, the net salaries of lecturers and
the ages of workers.
Bivariate data deals with two variables that can change and that are compared to find
relationships. It is typically used when one variable is thought to influence the other, so
that a dependent variable is studied against an independent variable. For example, one may
study the relationship between academic performance (test scores) and study time (hours spent
studying). Study time is the independent variable, which could influence academic
performance, the dependent variable.
Multivariate data refers to data in which description and analysis are based on more than two
variables per observation. Multivariate data is usually used for explanatory purposes, just as
bivariate data is, but here more than two variables are under consideration at the same time.
An example is data collected by a lecturer studying the relationship among academic
performance, study time, monthly allowance of students, residency status and so on.
2-1.3.3 Qualitative Versus Quantitative Data
Qualitative observations or data are typically categories, groups or characteristics. Examples
include levels of education (primary, JHS, SHS, VOC/TECH, tertiary, etc.), hair colour, eye
colour and favourite food. Quantitative observations or data, on the other hand, are numerical
values. Weights of students, shoe sizes and ages of students and workers are examples of
quantitative observations. These two types of data are generally treated differently because
qualitative observations cannot be sorted into a numerical order. For example, suppose
you are analysing the hair colour of a group of ladies. You might take each lady and categorize
her into one of a few groups: black, blonde, brown, gray, etc. The colour black is not superior
or inferior to blonde; it is just different. Quantitative observations, on the other hand, are
meaningful numerical values and so they can be sorted. If you are recording the ages of
students, a student who is 30 years is older than one who is 25 years, and that student is
also older than one who is 20 years, and so on. This mathematical ordering allows us to use the
full resources of arithmetic, algebra, and even calculus to summarize quantitative data.
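The practical consequence of this distinction can be sketched in a few lines of Python. The hair colours and ages below are made-up values for illustration only; the sketch counts categories for the qualitative data, and sorts and averages the quantitative data.

```python
from collections import Counter

# Hypothetical observations, invented for illustration only.
hair_colours = ["black", "brown", "black", "blonde", "black", "gray"]
ages = [30, 25, 20, 25, 22]

# Qualitative data: we can count how often each category occurs,
# but the categories themselves have no numerical order.
colour_counts = Counter(hair_colours)
most_common_colour = colour_counts.most_common(1)[0][0]

# Quantitative data: values can be ordered and used in arithmetic.
sorted_ages = sorted(ages)
mean_age = sum(ages) / len(ages)

print(colour_counts)
print(most_common_colour)
print(sorted_ages)
print(mean_age)
```

Note that taking an average of the hair colours would be meaningless, whereas counting them is perfectly sensible.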
2-1.3.4 Cross-Sectional, Time Series and Panel Data
Cross-sectional data: A cross-sectional data set consists of a sample of individuals,
households, firms, countries, regions or any other type of unit observed at a specific point in
time or over the same period of time. In other words, the data are collected on different
elements at the same time. An example is the expenditure of first year students in the last
semester.
Table 3: Cross-Sectional Data
Schools                   School fees for second term in 2017 (Gh¢)
JHS A                     67
JHS B                     80
Millennium Academy JHS    450

Time series data: A time series data set consists of observations on one or several variables
over time. The data are collected on the same element for the same variable at different points
in time or for different periods of time. The time period could be hourly, weekly, monthly,
quarterly, yearly and so on. Examples of time series data include fuel prices collected monthly
over a 10-year period, Ghana’s gross domestic product (GDP) over a period of time, etc.
Table 4: Time-Series Data
Year    Performance level in BECE (%)
2012    68
2013    80.8
2014    79.5
2015    90
2016    96

Panel data: It is the type of data set that follows a given sample of individuals over time and
thus provides multiple observations of each individual in the sample (Hsiao, 2003, page 2). It
consists of observations on several variables obtained over multiple periods of time for the same
firms or individuals. Panel data is also called longitudinal data. It produces large data sets
and hence can be difficult to analyse (Diggle et al., 2002).
Table 5a: Panel Data for One Individual (John Nyabeni)
Year    Number of goats    Farm Size (acres)    Annual Revenue (Gh¢)    Annual Savings (Gh¢)
2013    10                 5                    5500                    1500
2014    17                 5                    6000                    1600
2015    23                 6.5                  7200                    2200
2016    23                 8                    7000                    1900
2017    26                 7.2                  7600                    2250

Table 6b: Panel Data for Several Individuals (Baba and Belinda)
Persons    Year    Annual Salary (Gh¢)    Educational Level
Baba       2015    6000                   SHS
Baba       2016    6500                   SHS
Baba       2017    7800                   Teacher Training College
Belinda    2015    8600                   Teacher Training College
Belinda    2016    8950                   Teacher Training College
Belinda    2017    14000                  University
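
One way to see how panel data combines the other two kinds is to slice it. The short Python sketch below uses the salary figures for Baba and Belinda from the table above; the variable names are illustrative assumptions. Fixing the year gives a cross-section, while fixing the person gives a time series.

```python
# Records copied from the panel table for Baba and Belinda:
# (person, year, annual salary in Gh¢).
panel = [
    ("Baba", 2015, 6000), ("Baba", 2016, 6500), ("Baba", 2017, 7800),
    ("Belinda", 2015, 8600), ("Belinda", 2016, 8950), ("Belinda", 2017, 14000),
]

# Cross-section: every person observed in one year.
cross_section_2016 = {person: salary
                      for person, year, salary in panel if year == 2016}

# Time series: one person observed across the years.
baba_series = {year: salary
               for person, year, salary in panel if person == "Baba"}

print(cross_section_2016)
print(baba_series)
```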

Self-Assessment Test 2.1

1. Name two sources of data and explain them
2. Distinguish between
(a) Numerical and categorical data
(b) Quantitative and qualitative data
3. Briefly explain cross-section data, time series data and panel data and give one
example each
4. Briefly explain univariate data, bivariate data and multivariate data

Unit 2: Concepts of Relationships and Data Representation

Welcome to unit 2 of chapter two. Well done for reaching this far. How has it been so far? I
hope you have understood and appreciated data, sources of data and types of data treated in
unit one of this chapter. This is the second unit of chapter two and you are going to be learning
the concept of data representation and its importance in statistics. Are you set for us to start
unit 2? Ready, go!!!
Unit Objectives
By the end of this unit, you should be able to:
1. Explain relationship in statistics
2. Understand graphical data representation
3. State and explain the importance of pictorial data representation
4. State the principles of effective pictorial data representation

2-2.1 Concepts of Relationship

The world is full of natural and artificial things which relate to each other in one way or
another. In the real world, animals have relationships with plants, soil, humans, etc. There is
a relationship between you and the lecturer, you and your father, or you and the president of
Ghana.
A relationship is a pattern that exists between two or more variables in such a way that, as one
variable changes, the corresponding one also changes in a consistent manner. For example, one
can collect data to establish the relationship between the number of hours studied and the test
scores of students. It is also possible to establish the strength of relationships. A
relationship can be represented on graphs, tables or charts for easy understanding.
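As a rough illustration of measuring the strength of a relationship, the Python sketch below computes the Pearson correlation coefficient, one common measure that is not defined in this unit. The hours and scores are made-up values, and the function name `pearson_r` is an assumption chosen for this example. A value close to +1 suggests that scores rise consistently as study hours rise.

```python
import math

# Hypothetical data, invented for illustration:
# hours studied and test scores of six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 68, 70, 78]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

r = pearson_r(hours, scores)
print(round(r, 3))
```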
2-2.2 Meaning of Data Representation
Data representation is the diagrammatical or tabular representation of raw or processed data
for easy understanding. The diagrams that represent the data are able to show the pictorial
information for easy assimilation.
Graphical representation of data was introduced in statistics by William Playfair in 1786. He
introduced the line chart and bar chart into his works on economics, using them to display the
evolution of England's imports and exports. This was followed by his invention of the pie chart
and circle chart in 1801.
2-2.3 Importance of Pictorial Data Representation
Etsey (2014) classifies the purposes of pictorial data representation into four. In this book,
they are termed the importance of pictorial representation of data and are grouped into five.

• Easy to keep pictures or charts in mind: Pictures, graphs or charts appeal to the
mind faster and help the target audience to understand easily.
• Pictorial representation of data helps us identify relationships. One can easily picture
the information presented and identify the relationship between variables.
• With pictorial illustration of data, the target audience is able to compare variables
easily.
• Pictorial data representation helps one to understand the information easily. It speeds
up the processing of information by the mind and hence increases the rate of
understanding.
• Pictorial representation of data helps one to display a large amount of information in a
small space.

2-2.4 Effective Pictorial Representation of Data

Data can be represented pictorially to convey information effectively if certain principles are
adhered to. These principles must be followed to the letter.
• Check the data to ensure that it can be pictorially represented
• Label the axes of your graph or chart appropriately and correctly, so that the dependent
and independent variables are plotted on the correct axes
• Include the units of the variables
• Give your graphs or charts descriptive titles, e.g. a graph showing the relationship
between age and height of students
• Indicate the key on the graph or chart for easy understanding
• The graph should be bold, clear and large enough for the audience to see

Self-Assessment Test 2.2

1. Define relationship in statistics
2. What do you understand by data representation?
3. State and explain the importance of data representation
4. What are the principles for effective data representation?

Unit 3: Symbol Chart and Graphical Representation of Data

Splendid, you have done well by sustaining the interest and studying this book up to this unit.
Welcome to unit 3 of this chapter. In unit 2, we learnt the importance of representing data on
graphs, tables or chart. This unit will help you understand how to represent data by using
symbol charts, graphs, tables among others. It is my fervent hope that you will enjoy the unit.
Unit Objectives
By the end of this unit, you should be able to:
• Know the differences between symbol charts and graphical charts
• Know the different types of bar charts and when they are used
• Manually draw or construct all forms of bar charts, pie charts, histograms and line graphs
• Know the difference between histograms and bar charts
• Discuss the uses, strengths and limitations of bar charts, pie charts, histograms and line
graphs

2-3.1 Symbol Charts or Diagrams

With symbol charts or diagrams, data or information are pictorially represented by pictures
of the items they represent. Such charts are also known as pictograms, pictographs, ideographs
or isotypes. This type of data representation is particularly suitable for audiences who have
little or no education.
For instance, the population of adults, youth and children can be represented as shown below:

Figure 1: Population of people

Similarly, the eggs produced by a farmer can be depicted as shown below:

Figure 2: Eggs

2-3.2 Bar Charts/Graphs

As the name suggests, they are charts or graphs which are in the form of bars. The bars are
either in horizontal or vertical (column) form. The bars can be multiple or single. There are
also floating bar charts.

2-3.3 Single Column Bar Chart/Graph
It is a bar chart or graph with columns or vertical bars, each of which represents a single
category or group. In figure 3, each subject represents a single category or group.

[Column chart. Horizontal axis: Subjects; vertical axis: Average marks obtained (%), from 0 to 100.]

Figure 3: Single Column Bar Chart: Average Marks of Students in Exams

2-3.4 Single Horizontal Bar Chart/Graph

It is a bar chart or graph with horizontal bars, each of which represents a single category or
group. In figure 4, each subject represents a single category or group.

[Horizontal bar chart. Vertical axis: Subjects; horizontal axis: Average marks obtained (%),
from 0 to 100. Bars: English 78, Maths 45, Science 55, Geography 88, Economics 85.]

Figure 4: Single Horizontal Bar Chart of Average Marks of Students in Exams

2-3.5 Multiple or Compound Column Bar Chart/Graph

With a multiple or compound column bar chart, there are two or more columns or vertical bars
which represent subgroups within a major group. In figure 5, males and females are
subgroups within each major group (subject).

[Grouped column chart with a Males column and a Females column for each subject. Horizontal
axis: Subjects; vertical axis: Average marks obtained (%), from 0 to 100.]

Figure 5: Multiple or Compound Column Bar Chart of Average Marks of Students in Exams

2-3.6 Multiple or Compound Horizontal Bar Chart/Graph

The multiple or compound horizontal bar chart has two or more horizontal bars which
represent subgroups within a major group. In figure 6, males and females are subgroups
within each major group (subject), with the bars being horizontal.

[Grouped horizontal bar chart. Vertical axis: Subjects; horizontal axis: Average marks
obtained (%), from 0 to 100. Bars (Males, Females): English 78, 80; Maths 45, 40;
Science 55, 65; Geography 88, 95; Economics 85, 85.]

Figure 6: Multiple or Compound Horizontal Bar Chart of Average Marks of Students in Exams

2-3.7 Component or Stacked Column Bar Chart/Graph

It is a column bar chart in which the subgroups are stacked on top of one another to form the
total for the major group (see figure 7).

[Stacked column chart. Horizontal axis: Subjects (English, Maths, Science, Geography,
Economics); vertical axis: Average marks obtained (%), from 0 to 200. Each column stacks the
Females segment (80, 40, 65, 95, 85) on top of the Males segment (78, 45, 55, 88, 85).]

Figure 7: Component or Stacked Column Bar Chart of Average Marks of Students in Exams

2-3.8 Component or Stacked Horizontal Bar Chart/Graph

It is a horizontal bar chart in which the subgroups are stacked end to end to form the total
for the major group (see figure 8).

[Stacked horizontal bar chart. Vertical axis: Subjects; horizontal axis: Average marks
obtained (%), from 0 to 200. Segments (Males, Females): English 78, 80; Maths 45, 40;
Science 55, 65; Geography 88, 95; Economics 85, 85.]

Figure 8: Component or Stacked Horizontal Bar Chart of Average Marks of Students in Exams
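
The arithmetic behind a stacked bar can be sketched in Python. The marks below are the values read from the bar charts in this unit; the text rendering (one character per 5 marks) is only a crude stand-in for a drawn chart and is an assumption made for this example.

```python
# Average marks (%) read from the bar charts in this unit.
subjects = ["English", "Maths", "Science", "Geography", "Economics"]
males = [78, 45, 55, 88, 85]
females = [80, 40, 65, 95, 85]

# In a stacked (component) bar, the Females segment is added to the Males
# segment, so the full bar length is the sum of the two.
stacked_totals = {s: m + f for s, m, f in zip(subjects, males, females)}

# A crude text rendering of the stacked horizontal bars:
# one character per 5 marks, 'M' for males and 'F' for females.
for s, m, f in zip(subjects, males, females):
    print(f"{s:<10} {'M' * (m // 5)}{'F' * (f // 5)} {m + f}")
```

The longest bar (Geography) totals 183, which is why the axis of a stacked chart for these data has to run to about 200 rather than 100.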

2-3.9 Line Charts or Graphs

A line chart or graph displays information as a series of points which are connected
by straight line segments (https://en.m.wikipedia.org/wiki/Line_chart). It is similar to a
scatter plot. The difference is that the points of a scatter plot are not joined together,
whereas in a line chart the points are joined together in an orderly manner. There are many
line graphs, but notable among them are the single or simple line graph, the multiple or
compound line graph and the component line graph. Line charts are used to show trends or
changes in data over a period of time.
2-3.10 Single or simple line graph
A single or simple line chart or graph is a type of line chart which illustrates or displays
information as a series of points which are connected by straight line segments for one group
or category. It has one independent variable and one dependent variable (see figure 9).

Table 7: School Enrolment from 2011 to 2017

Year Enrolment
2011 80
2012 178
2013 160
2014 200
2015 225
2016 210
2017 240

[Line chart of the data in Table 7. Horizontal axis: Years (2011 to 2017); vertical axis:
Enrolment, from 0 to 300.]

Figure 9: Single or Simple Line Chart of School Enrolment from 2011 to 2017

2-3.11 Multiple or compound line graph

A multiple or compound line graph is a type of line chart which illustrates or displays
information as a series of points which are connected by straight line segments for two or more
groups or categories. It has one independent variable with two or more dependent variables
(see figure 10).

Table 8: School Enrolment for SHS A and SHS B from 2011 to 2017
Year Enrolment in SHS A Enrolment in SHS B
2011 80 85
2012 178 150
2013 160 170
2014 200 140
2015 225 160
2016 210 250
2017 240 275

[Line chart of the data in Table 8 with two lines: Enrolment in SHS A and Enrolment in SHS B.
Horizontal axis: Years; vertical axis: School Enrolment, from 0 to 300.]

Figure 10: Multiple or Compound Line Graph of School Enrolment for SHS A and SHS B
from 2011 to 2017

2-3.12 Component Line Chart or Graph

It shows two or more grouped items stacked on the same graph, so that each line is drawn on
top of the one below it. It is also called a sectional, strata or band chart. The sales of two
companies from 2011 to 2017 shown in table 9 are used to draw the component line chart or
graph shown in figure 11.
Table 9: Annual Sales of Companies from 2011 to 2017
Year Sales of Company A (GH¢) Sales of Company B (GH¢)
2011 80000 100000
2012 178000 220000
2013 160000 180000
2014 182000 200000
2015 200000 240000
2016 260000 300000
2017 190000 285000

[Component line chart of the data in Table 9 with two bands: Sales of Company A and Sales of
Company B. Horizontal axis: Years (1 to 7, i.e. 2011 to 2017); vertical axis: Sales (Gh¢),
from 0 to 600000.]

Figure 11: Component Line Chart of Annual Sales of Companies from 2011 to 2017
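
The values plotted in a component line chart can be worked out as running totals. The Python sketch below uses the annual sales of the two companies and assumes that Company B is stacked on top of Company A, which is consistent with a vertical axis running to 600000; the variable names are illustrative.

```python
# Annual sales (Gh¢) of the two companies from 2011 to 2017.
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017]
sales_a = [80000, 178000, 160000, 182000, 200000, 260000, 190000]
sales_b = [100000, 220000, 180000, 200000, 240000, 300000, 285000]

# Lower line: Company A on its own.
lower_line = sales_a
# Upper line: Company A plus Company B, so the band between the two
# lines has a height equal to Company B's sales.
upper_line = [a + b for a, b in zip(sales_a, sales_b)]

for year, low, up in zip(years, lower_line, upper_line):
    print(year, low, up)
```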

2-3.13 Histogram
It is a graphical representation of information made up of rectangles whose heights
indicate the frequency of the variable and whose widths indicate the class intervals. The
histogram was first introduced by Karl Pearson. It is used for continuous data but not for
discrete data.

Figure 12: Histogram of Frequency of Student Marks
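
Before a histogram can be drawn, the raw values have to be counted into class intervals. The Python sketch below shows that step using made-up marks and class limits (both are assumptions for illustration, not the data behind figure 12), and prints a simple text version of the bars.

```python
# Hypothetical marks, invented for illustration only.
marks = [23, 35, 38, 41, 44, 45, 47, 52, 55, 58, 61, 67]

# Class intervals as (lower limit, upper limit), both inclusive.
classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]

# Frequency of each class = number of marks falling inside it.
frequencies = {c: sum(1 for m in marks if c[0] <= m <= c[1]) for c in classes}

# Text histogram: bar length equals the class frequency.
for (low, high), freq in frequencies.items():
    print(f"{low}-{high}: {'#' * freq} ({freq})")
```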

2-3.14 Pie Chart

It is a circular chart which is divided into sectors, with each sector corresponding to a
proportion of the whole. It shows the relative sizes of the parts of the data. Using the data
in table 10, the pie chart in figure 13 illustrates the percentage of each tribe among students
in level hundred in a teacher training college.

Table 10: Population of Tribes of Students in Level 100
Tribes Population
Fante 300
Dagomba 180
Dagati 250
Ewe 370
Akyem 320
Others 150

[Pie chart with sectors: Fante 19%, Dagomba 11%, Dagati 16%, Ewe 24%, Akyem 20%, Others 10%.]

Figure 13: Pie Chart Showing Percentage of Tribes of Students in Level 100
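
The percentages in a pie chart, and the sector angles needed to draw it by hand, follow directly from the table. The Python sketch below reproduces them from the populations in Table 10; each share is the group's count divided by the total, multiplied by 100 for the percentage or by 360 for the angle in degrees.

```python
# Population of each tribe, taken from Table 10.
population = {"Fante": 300, "Dagomba": 180, "Dagati": 250,
              "Ewe": 370, "Akyem": 320, "Others": 150}
total = sum(population.values())

# Share of the whole as a percentage, and the matching sector angle in degrees.
percentages = {tribe: 100 * count / total for tribe, count in population.items()}
angles = {tribe: 360 * count / total for tribe, count in population.items()}

for tribe in population:
    print(f"{tribe:<8} {percentages[tribe]:5.1f}%  {angles[tribe]:6.1f} degrees")
```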

Self-Assessment Test 2.3

1. With the aid of diagram or graph, explain the following
(a) symbol chart
(b) bar chart
(c) single column bar chart
(d) single horizontal bar chart
(e) multiple column bar chart
(f) multiple horizontal bar chart
2. Differentiate between stacked column bar chart and component line graph
3. Using the data in the table below, draw the following graphs using Excel software:
(a) single column bar chart showing the distribution of the farmers’ participation in
the pillars of Planting for Food and Jobs
(b) pie chart showing the distribution of the AEAs’ participation in the pillars of
Planting for Food and Jobs
(c) single horizontal bar chart showing the distribution of the researchers’
participation in the pillars of Planting for Food and Jobs
(d) multiple column bar chart showing the distribution of the participation of farmers,
AEAs and researchers in the pillars of Planting for Food and Jobs
(e) multiple horizontal bar chart showing the distribution of the participation of
farmers, AEAs and researchers in the pillars of Planting for Food and Jobs

Pillars of Planting for Food and Jobs    Farmers    AEAs    Researchers
Seeds                                    84         71      67
Fertilizer                               100        79      17
Agricultural extension services          51         100     0
Establishment of markets                 32         8       17
E-agriculture                            24         33      33

4. Using the table below, draw the following graphs using Excel software:
(a) component or stacked column bar chart
(b) multiple line chart
(c) single line chart for only SHS B
(d) histogram for only SHS A

Year    Number of candidates in SHS A    Number of candidates in SHS B
2011    150                              220
2012    175                              200
2013    200                              240
2014    180                              250
2015    140                              300
2016    220                              360
2017    250                              320

CHAPTER THREE: SUMMARISING AND DESCRIBING DATA: STATISTICAL
MEASURES

Introductory Remarks
Presenting data is an important aspect of descriptive statistics. Its limitation is that it
does not tell the whole story about the data. As a good data analyst, you also need to compute
and summarise key features of statistical data. The main objective of this
chapter is to help you understand how to summarise, describe and interpret key features of
statistical data.
This chapter will be treated in five units, namely:
Unit 1: Introduction to Statistical Measures
Unit 2: Measure of Central Tendency
Unit 3: Measure of Dispersion/Variation
Unit 4: Measure of Position
Unit 5: Measure of Shape

Unit 1: Introduction to Statistical Measures

You are moving gradually. In the preceding chapters, you learnt the meaning of statistics and
the various ways of presenting statistical data. Congratulations for achieving this milestone in
the study of statistics.
Unit Objectives
By the end of this unit, you should be able to:
1. Mention the various statistical measures
2. Define the various types of statistical measures
3. Know when each statistical measure is used

3-1.1 Definition of Statistical Measures

A statistical measure is a summary of an individual quantitative variable, computed for the
purpose of summarising, describing and interpreting key features of statistical data for
decision making.

3-1.2 Types of Statistical Measures

There are four statistical measures, and in this study we shall look at all four. They are the
measure of central tendency (measure of location), measure of variation (measure of dispersion),
measure of position and measure of shape. All these statistical measures and their examples are
shown in figure 14.

Figure 14: Statistical Measures

3-1.3 Measure of central tendency

It describes the central point about which the various observed values group or cluster. It is
a single value that representatively describes the entire data. It can also be called a
measure of location or measure of averages. There are many types of averages which are
measures of central tendency. Notable among them are the arithmetic mean, median, mode,
midrange and midhinge. There are also the harmonic mean, weighted mean and geometric mean.
Each of these has unique features.

Figure 15: Measures of central tendency
Source: Business Statistics, A First Course (4e) © 2006 Prentice-Hall, Inc.

Importance of Measure of Central Tendency

• It helps us to determine a representative value: with a measure of central tendency, a
single value is estimated and this value is representative of the entire data. Hence, the
entire data is reduced to a single value which has almost the same features as the
whole data.
• A measure of central tendency is able to reduce and condense the data.
• Comparison of data: a measure of central tendency helps us to estimate and compare
single values for two or more distributions for decision making.
• It can be used to do further statistical analysis. Measures of variation, position
and shape are based on measures of central tendency.

When to Use Which Measures of Central Tendency

The choice of measure depends on the type of data. For qualitative data, the mode is
appropriate. For quantitative data with a symmetric distribution, the mean is the better
measure of central location. The median is the better measure of central location for a
quantitative variable with a skewed distribution. This is because the mean is highly
influenced by outliers.
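
The effect of an outlier on the mean and the median can be checked with Python's standard `statistics` module. The incomes and education levels below are made-up values for illustration only.

```python
import statistics

# Hypothetical monthly incomes (Gh¢), invented for illustration.
incomes = [800, 850, 900, 950, 1000]
incomes_with_outlier = incomes + [20000]

# Without the outlier, the mean and the median agree.
mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

# One extreme value pulls the mean far upwards but barely moves the median.
mean_after = statistics.mean(incomes_with_outlier)
median_after = statistics.median(incomes_with_outlier)

# Hypothetical qualitative data: only the mode is meaningful here.
levels = ["SHS", "JHS", "SHS", "Tertiary", "SHS"]
modal_level = statistics.mode(levels)

print(mean_before, median_before)
print(round(mean_after, 2), median_after)
print(modal_level)
```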

3-1.4 Measure of Variation/Dispersion

A measure of how the observations are spread about the centre is called a measure of dispersion
or variation. The observations may be close to or far away from the central point. A measure of
variation or dispersion gives the degree to which the observed values in the data tend to spread
about the central point. Measures of dispersion include the range, interquartile range,
variance, standard deviation, mean deviation, coefficient of variation, etc.
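
Several of these measures can be computed with Python's standard `statistics` module, as in the minimal sketch below. The ages are a small illustrative sample, and the sketch assumes the sample versions of the variance and standard deviation (dividing by n - 1); the interquartile range is left out for brevity.

```python
import statistics

# A small illustrative sample of ages in years.
ages = [15, 21, 17, 26, 18, 29]

data_range = max(ages) - min(ages)
mean = statistics.mean(ages)
variance = statistics.variance(ages)     # sample variance (divides by n - 1)
std_dev = statistics.stdev(ages)         # sample standard deviation
mean_deviation = sum(abs(x - mean) for x in ages) / len(ages)
coeff_variation = 100 * std_dev / mean   # expressed as a percentage

print(data_range, variance)
print(round(std_dev, 2), round(mean_deviation, 2), round(coeff_variation, 1))
```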

Importance of measure of dispersion/variation

• It determines the reliability of the average: it tells the degree of representativeness of
the average for the entire data set. The smaller the dispersion, the better the average
represents the entire distribution, and vice versa.
• It helps one to establish the nature and the cause of variation. With that, the researcher
is able to design control measures. For example, the deviation of body temperature from the
normal helps the medical officer to establish the cause and make the necessary diagnosis.
• Like measures of central tendency, measures of variation or dispersion can also be used to
compare two or more distributions.
• Many further statistical analyses such as regression analysis, correlation analysis,
hypothesis testing, quality control, etc. involve the use of measures of variation or
dispersion.

3-1.5 Measure of Positions

A measure of position is a method by which the position of a particular value within a data set
can be identified. It also allows for the comparison of values from different distributions.
For example, percentiles or z-scores of an individual’s height and weight can be presented; the
two measures together would provide a better picture of how the individual fits in the overall
population than either of them alone. The most common measures of position are percentiles,
quartiles, quintiles, deciles and standard scores (z-scores).
Percentiles assume that the elements in the data set are ranked from the smallest to the largest.
The values that divide the rank-ordered set of elements into 100 equal parts are called
percentiles.

Quartiles divide a rank-ordered data set into four equal parts. The values that divide each part
are called the first, second and third quartiles; and they are denoted by Q1, Q2 and Q3
respectively. Note:

First quartile corresponds to the 25th percentile;

Second quartile corresponds to the 50th percentile or the median;
Third quartile corresponds to the 75th percentile.
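
As a quick check of the link between quartiles and the median, the Python sketch below uses `statistics.quantiles` from the standard library on made-up marks. Different textbooks and software use slightly different conventions for locating quartiles, so the first and third quartiles reported here may differ a little from values obtained by hand with another rule.

```python
import statistics

# Hypothetical exam marks, invented for illustration.
marks = [45, 52, 55, 58, 61, 64, 67, 70, 73, 78, 85]

# n=4 splits the ranked data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(marks, n=4)

print(q1, q2, q3)
# The second quartile coincides with the median (50th percentile).
print(q2 == statistics.median(marks))
```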

Standard scores (z-scores) indicate how many standard deviations an element is from the
mean. The standard score can be calculated from the following formula:
$$z = \frac{X - \mu}{\sigma}$$
where z is the z-score, X is the value of the element, μ is the mean of the population and σ is
the standard deviation. A z-score of 1 represents an element that is 1 standard deviation greater
than the mean, and so on.
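
The z-score formula translates directly into code. In the Python sketch below, the function name `z_score` and the population figures (mean height 165 cm, standard deviation 10 cm) are assumptions made up for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical population: mean height 165 cm, standard deviation 10 cm.
z_tall = z_score(175, 165, 10)    # one standard deviation above the mean
z_short = z_score(150, 165, 10)   # one and a half standard deviations below

print(z_tall, z_short)
```

A positive z-score places the element above the mean and a negative z-score places it below.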
3-1.6 Measure of Shape
Measures of shape describe the distribution (or pattern) of the data within a data set. They are
used to summarize data from continuous measurement scales, with statistics that describe
how the distribution rises or drops. Shape is meaningful for quantitative data, which have a
logical order, but not for qualitative data. A distribution can be either
symmetrical or asymmetrical. The normal distribution is symmetrical, while skewed
distributions are asymmetrical.
Symmetric distribution: refers to a distribution that has the same shape on both sides of the
centre. The normal distribution is the best-known symmetric distribution with only one peak.
Skewness: refers to the degree of asymmetry in a distribution. Asymmetry often reflects
extreme scores in a distribution. Positively skewed distributions are distributions with the
mean greater than the median. This arises because the mean is sensitive to each score in the
distribution and is subject to large shifts when the sample is small and contains extreme scores.
Negatively skewed distributions have their mean smaller than the median. Such a distribution
has an extended tail pointing to the left and reflects clustering of the numbers in the upper
part of the distribution, with fewer scores at the lower end of the measurement scale.
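The mean-versus-median comparison described above can be turned into a small Python check. The scores and the helper name `skew_direction` are made up for illustration, and comparing the mean with the median is only a rule of thumb for the direction of skew, not a full skewness statistic.

```python
import statistics

# Hypothetical scores, invented for illustration.
right_skewed = [2, 3, 3, 4, 4, 5, 20]      # one extreme high score
left_skewed = [1, 16, 17, 17, 18, 18, 19]  # one extreme low score

def skew_direction(data):
    """Classify skew by comparing the mean with the median."""
    mean, median = statistics.mean(data), statistics.median(data)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "symmetric (by this rule)"

print(skew_direction(right_skewed))
print(skew_direction(left_skewed))
```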
Kurtosis: refers to how scores are concentrated in the centre of a distribution, the upper and
lower tails (ends), and between the centre and tails (shoulders) of the distribution. Kurtosis
takes three forms: mesokurtic, platykurtic and leptokurtic. A mesokurtic distribution has
the kurtosis of a normal distribution. A platykurtic distribution resembles a mesokurtic one but
is flatter, with scores moved from both the centre and the tails into the shoulders. A
leptokurtic distribution moves scores from the shoulders of a mesokurtic distribution into the
centre and tails, which results in a peaked distribution with thick tails.

Self-Assessment Test 3.1

1. What is a statistical measure?
2. Mention and explain four types of statistical measure you know
3. What is the importance of measure of central tendency and measure of variation?

Unit 2: Measure of Central Tendency

You are welcome to unit two of this chapter. In unit one, we were introduced to statistical
measures and their importance. In this unit, we shall look at one of the statistical measures,
namely the measure of central tendency or location. We are going to learn how to calculate a
single value that describes the central point about which the various observed values group or
cluster. Try as much as possible to study the unit and solve the questions.
Unit Objectives
By the end of this unit, you should be able to:
• Define arithmetic mean, weighted mean, mode and median
• Calculate arithmetic mean, weighted mean, mode and median from grouped and ungrouped
data
• Describe the properties or features of the arithmetic mean, mode and median
• State the advantages and disadvantages of arithmetic mean, mode and median
3-2.1 Introduction to Measure of Central Tendency
As noted in the preceding unit, data tend to group or cluster about certain central points. In
order to describe a data set, one can find a single value which is representative of the whole
data set. This single value, which describes the central point about which the various observed
values group or cluster, is called a measure of central tendency or location.
3-2.2 Mean
It is the average of the observations or data. Note that we have different types of means
namely arithmetic mean, weighted mean, geometric mean, harmonic mean.
3-2.2.1 Arithmetic Means of Ungrouped Data
The arithmetic mean of ungrouped data is the sum of a set of observations (either positive,
negative or zero), divided by the number of observations. The arithmetic mean is the most
commonly used measure of central tendency.

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Note that $\frac{\sum_{i=1}^{n} X_i}{n}$ can simply be written as $\frac{\sum X}{n}$, since this
course is fundamental and not meant for students studying statistics or economics.

In statistics,
$\bar{X}$ = sample arithmetic mean
$n$ = sample size
$X_i$ = ith observation of the random variable X

Example 1
Given that the ages of six children in a hospital are 1, 3, 5, 7, 9 and 5; what is the
mean age?
Solution

$$\bar{X} = \frac{1 + 3 + 5 + 7 + 9 + 5}{6} = \frac{30}{6} = 5 \text{ years}$$
Example 2
Find the mean of the following set of data:
(a) Ages in years: 15, 21, 17, 26, 18, 29
(b) Daily sales of ice water vendors: Gh¢42, Gh¢52, Gh¢57, Gh¢63, Gh¢51
(c) Weight of patients in hospital: 90, 60, 40, 70, 50, 20, 100, 100, 50, 50, 50 and 80
in Kg

Solution
The formula for calculating the mean of ungrouped data is
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum X}{n}$$

(a) Mean of ages:
$$\bar{X} = \frac{15 + 21 + 17 + 26 + 18 + 29}{6} = \frac{126}{6} = 21 \text{ years}$$

(b) Mean of daily sales:
$$\bar{X} = \frac{42 + 52 + 57 + 63 + 51}{5} = \frac{265}{5} = \text{Gh¢}53$$

(c) Mean of weights of patients:
$$\bar{X} = \frac{90 + 60 + 40 + 70 + 50 + 20 + 100 + 100 + 50 + 50 + 50 + 80}{12} = \frac{760}{12} = 63.33 \text{ kg}$$
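
The three answers can be checked with a few lines of Python. The helper name `arithmetic_mean` is an assumption for this sketch; it simply divides the sum of the observations by their number, as in the formula.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

# Data from Example 2.
ages = [15, 21, 17, 26, 18, 29]                                      # years
daily_sales = [42, 52, 57, 63, 51]                                   # Gh¢
weights = [90, 60, 40, 70, 50, 20, 100, 100, 50, 50, 50, 80]         # kg

print(arithmetic_mean(ages))
print(arithmetic_mean(daily_sales))
print(round(arithmetic_mean(weights), 2))
```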

3-2.2.2 Arithmetic Mean of Grouped Data

The arithmetic mean of grouped data is the sum of the products of the mid-values and their
respective frequencies, divided by the sum of the frequencies.

$$\bar{X} = \frac{\sum f_i X_i}{\sum f_i}$$

where $f_i$ is the frequency of class i and $X_i$ is its mid-value.

Example 3
Find the average marks obtained by 43 students in Mathematics Examination
Class Frequency (f)
20-29 3
30-39 5
40-49 20
50-59 10
60-69 5

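Solution

The mid-value X of each class is the average of its class limits, e.g. (20 + 29)/2 = 24.5.

Class    Frequency (f)    X       fX
20-29    3                24.5    73.5
30-39    5                34.5    172.5
40-49    20               44.5    890
50-59    10               54.5    545
60-69    5                64.5    322.5
         ∑f = 43                  ∑fX = 2003.5

Average mark:
$$\bar{X} = \frac{\sum f_i X_i}{\sum f_i} = \frac{2003.5}{43} = 46.59$$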
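
The grouped-data calculation can be reproduced in Python as a check on the table. The sketch below uses the classes and frequencies from Example 3 and takes the mid-value of each class as the average of its two class limits; the variable names are illustrative.

```python
# Classes and frequencies from Example 3.
classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
frequencies = [3, 5, 20, 10, 5]

# Mid-value of each class: the average of its lower and upper limits.
midpoints = [(low + high) / 2 for low, high in classes]

sum_f = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
grouped_mean = sum_fx / sum_f

print(midpoints)
print(sum_f, sum_fx, round(grouped_mean, 2))
```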

3-2.2.3 Arithmetic Mean When Probability is Given

Suppose that X is a random variable whose probability distribution is given as shown below:
Value of X X1 X2 X3 … Xn
Probability of X P(X1) P(X2) P(X3) … P(Xn)

The formula for calculating arithmetic mean when probability distributions are given is:
$$\bar{X} = X_1 P(X_1) + X_2 P(X_2) + X_3 P(X_3) + \cdots + X_n P(X_n) = \sum_{i=1}^{n} X_i P(X_i)$$

Example 4
Given the probability distribution of the number of bags of maize obtained by farmers per
acre in Nkoranza, find the mean number of bags of maize per acre.

X 4 5 6 9 12
P(X) 0.1 0.2 0.2 0.4 0.1

Solution
\[ \bar{X} = X_1 P(X_1) + X_2 P(X_2) + X_3 P(X_3) + \cdots + X_n P(X_n) = \sum_{i=1}^{n} X_i P(X_i) \]

\[ \bar{X} = (4 \times 0.1) + (5 \times 0.2) + (6 \times 0.2) + (9 \times 0.4) + (12 \times 0.1) \]

\[ \bar{X} = 7.4 \text{ bags} \]
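
As a quick check of Example 4, the sketch below multiplies each value by its probability and adds the products. The function name and the check that the probabilities sum to 1 are additions made here for illustration.

```python
def mean_from_distribution(values, probabilities):
    """Mean of a discrete random variable: sum of value x probability."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probabilities))


bags = [4, 5, 6, 9, 12]
probs = [0.1, 0.2, 0.2, 0.4, 0.1]

print(round(mean_from_distribution(bags, probs), 2))  # 7.4
```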

3-2.2.4 Weighted Arithmetic Mean

It is defined as the sum of the products of weights and the respective observed values divided
by the sum of weights. It is used when certain importance is attached to each observed value.
It is applied in calculating the average grade points of students. Given observations X1, X2,
…, Xn with respective weights w1, w2, …, wn, the weighted arithmetic mean is given as:
\[ \bar{X} = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i} \]

It can simply be written as: $\bar{X} = \frac{\sum w_i X_i}{\sum w_i}$

Example 5
The table below shows the marks and grades obtained by a student in an examination in the
university. Use the information to calculate the grade point average (GPA) or the weighted
arithmetic mean.
Courses             Marks   Credit Hours   Grades   Grade Point
Human Anatomy       82      3              A+       5
Biochemistry        46      2              D+       1.5
Physics I           70      3              A        4.5
Basic Mathematics   80      2              A+       5
Botany I            62      1              B        3
Zoology I           53      3              C        2

Solution

\[ \text{GPA} = \bar{X} = \frac{\sum w_i X_i}{\sum w_i} \]

Note that the credit hours are the weights and the grade points are the observations.
Courses             Credit Hours (wi)   Grade Point (Xi)   wiXi
Human Anatomy       3                   5                  3×5=15
Biochemistry        2                   1.5                2×1.5=3
Physics I           3                   4.5                3×4.5=13.5
Basic Mathematics   2                   5                  2×5=10
Botany I            1                   3                  1×3=3
Zoology I           3                   2                  3×2=6
Total               ∑wi = 14                               ∑wiXi = 50.5

\[ \text{GPA} = \bar{X} = \frac{50.5}{14} = 3.61 \]
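
The GPA calculation can be verified with a small Python sketch that takes the credit hours and grade points straight from the table. The function name `weighted_mean` is illustrative; note that the six credit hours add up to 14.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum of (weight x value) over sum of weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


grade_points = [5, 1.5, 4.5, 5, 3, 2]
credit_hours = [3, 2, 3, 2, 1, 3]

print(sum(credit_hours))                                    # 14
print(round(weighted_mean(grade_points, credit_hours), 2))  # 3.61
```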
3-2.3 Median
It is the middle value of observations which are arranged in either ascending or descending
order. It is the midpoint of an arranged (ordered from smallest to largest or largest to the
smallest) data. It is the same as the 50th percentile or the second quartile.
3-2.3.1 Median for Ungrouped Data
For an odd number of observations, the median is obtained by using the formula:

\[ M_e = X_{\frac{1}{2}(n+1)} \]

For an even number of observations, the median is obtained by using the formula:

\[ M_e = \frac{1}{2}\left( X_{\frac{n}{2}} + X_{\frac{n}{2}+1} \right) \]

Example 6
Find the median of
(a) 12, 23, 14, 5, 7, 80, 39, 10, 7
(b) 42, 52, 57, 63, 51, 75, 45, 20, 60, 15

Solution
(a) Arrange the observations in ascending order: 5, 7, 7, 10, 12, 14, 23, 39, 80
Number of observations, n, is 9 and since it is an odd number, the median is calculated
as:

\[ M_e = X_{\frac{1}{2}(9+1)} = X_5 \]

X5 is the observation in the 5th position and hence $M_e = 12$

(b) Arrange the observations in ascending order: 15, 20, 42, 45, 51, 52, 57, 60, 63, 75
Number of observations, n, is 10 and since it is an even number, the median is calculated
as:

\[ M_e = \frac{1}{2}\left( X_{\frac{n}{2}} + X_{\frac{n}{2}+1} \right) = \frac{1}{2}\left( X_{\frac{10}{2}} + X_{\frac{10}{2}+1} \right) = \frac{1}{2}\left( X_5 + X_6 \right) \]

From the arranged data, X5 which is the 5th observation is 51 and X6 which is the 6th
observation is 52.

\[ M_e = \frac{1}{2}\left( X_5 + X_6 \right) = \frac{1}{2}(51 + 52) = 51.5 \]
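
Both cases of the median formula can be sketched in Python as follows. The function is written out by hand here, rather than taken from a library, so that the odd and even cases remain visible.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # position (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2  # positions n/2 and n/2 + 1


print(median([12, 23, 14, 5, 7, 80, 39, 10, 7]))         # 12
print(median([42, 52, 57, 63, 51, 75, 45, 20, 60, 15]))  # 51.5
```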
3-2.3.2 Median for Grouped Data
For grouped data, the formula used in finding the median is given as:

\[ M_e = L_0 + \frac{h}{f_0}\left( \frac{n}{2} - F \right) \]

where
• L0 = Lower class boundary of the median class
• h = Width of the median class
• f0 = Frequency of the median class
• F = Cumulative frequency of the pre-median class
• n = Total number of observations

Steps to find Median for grouped data

1. Compute the less-than-type cumulative frequencies.
2. Determine n/2, one-half of the total number of cases.
3. Locate the median class, which is the first class whose cumulative frequency reaches or exceeds n/2.
4. Determine the lower class boundary of the median class. This is L0.
5. Sum the frequencies of all classes prior to the median class. This is F.
6. Determine the frequency of the median class. This is f0.
7. Determine the class width of the median class. This is h.

Example 7
The table below shows the distribution of ages of 60 students in a diploma class. Find the
median age.
Age in years Number of students
15-19 6
20-24 5
25-29 12
30-34 22
35-39 7
40-44 8

Solution
Class boundaries   Number of students   Cumulative number of students
14.5-19.5 6 6
19.5-24.5 5 11
24.5-29.5 12 23
29.5-34.5 22 45
34.5-39.5 7 52
39.5-44.5 8 60

\[ \text{Median location} = \left( \frac{n}{2} \right)^{th} \text{position} = \left( \frac{60}{2} \right)^{th} \text{position} = 30^{th} \text{position} \]
Median class is 30-34 and median class boundary is 29.5-34.5
Lower class boundary of the median class = L0 = 29.5
Frequency of the median class = f0 = 22
Cumulative frequency of the pre-median class = F = 23
Width of the median class = h = 34.5 – 29.5 = 5

\[ M_e = L_0 + \frac{h}{f_0}\left( \frac{n}{2} - F \right) \]

\[ M_e = 29.5 + \frac{5}{22}\left( \frac{60}{2} - 23 \right) = 31.09 \text{ years} \]
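
The steps for the grouped median can be sketched in Python. The function below is an illustrative helper, not part of the manual; it assumes the classes are given by their class boundaries and takes the median class to be the first class whose cumulative frequency reaches n/2.

```python
def grouped_median(boundaries, frequencies):
    """Median of grouped data: L0 + (h / f0) * (n / 2 - F)."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequency table is empty")
    cumulative = 0  # F: cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, frequencies):
        if f > 0 and cumulative + f >= n / 2:
            return lower + (upper - lower) / f * (n / 2 - cumulative)
        cumulative += f


boundaries = [(14.5, 19.5), (19.5, 24.5), (24.5, 29.5),
              (29.5, 34.5), (34.5, 39.5), (39.5, 44.5)]
students = [6, 5, 12, 22, 7, 8]

print(round(grouped_median(boundaries, students), 2))  # 31.09
```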

3-2.4 Mode
It is the observation with the highest frequency. It can also be defined as the most occurring
observation or value.
• Value that occurs most often
• Not affected by extreme values
• There may not be a mode
• There may be several modes
• Used for either numerical or categorical data

3-2.4.1 Mode of Ungrouped Data

Example 8
The marks obtained in a class test by class three pupils are 3, 10, 4, 6, 1, 6, 2, 5, 8, 6, 6, 8.
Find the mode.
Solution
The mode is the most occurring mark = 6
Example 9
The ages of students in SHS 3 are shown in the table below. Find the mode.
Ages Frequency
15 2
16 3
17 1
18 12
19 3
20 5
Solution

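From the table, the age with the highest frequency is 18 years, and hence the mode is 18 years.
3-2.4.2 Mode of Grouped Data
For grouped data, the modal value cannot be read off directly. It is estimated from the modal
class (the class with the highest frequency) using the formula:
\[ M_0 = L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} h \]
where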

• L1 = Lower boundary of modal class

• Δ1 = difference of frequency between modal class and class before it
• Δ2 = difference of frequency between modal class and class after
• h = class interval of the modal class
Example 10
The total marks for a class test for pupils in Basic Six are 25. If the marks obtained by the
pupils are shown in the frequency table below, find the modal mark.

Marks           0-4   5-9   10-14   15-19   20-24   Total
Frequency (f)   6     12    7       5       0       n = 30

Solution

Marks Midpoint (x) Frequency (f) fx
0-4 2 6 12
5-9 7 12 84
10-14 12 7 84
15-19 17 5 85
20-24 22 0 0

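Total n = 30 ∑(fx) = 265

The modal class is 5-9, since it has the highest frequency (12). Its class boundaries are 4.5-9.5.

\[ M_0 = L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} h \]

• L1 = Lower boundary of modal class = 4.5
• Δ1 = difference of frequency between modal class and class before it = 12 – 6 = 6
• Δ2 = difference of frequency between modal class and class after = 12 – 7 = 5
• h = class interval of the modal class = 9.5 – 4.5 = 5
\[ M_0 = 4.5 + \frac{6}{6 + 5}(5) = 7.23 \]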
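
The grouped mode formula can be sketched in Python as shown below. This is an illustrative helper only. It assumes the classes are described by their class boundaries (for example 4.5-9.5 for the class 5-9, in the same way boundaries are used for the grouped median), so the class width is 5.

```python
def grouped_mode(boundaries, frequencies):
    """Mode of grouped data: L1 + d1 / (d1 + d2) * h, using the modal class."""
    k = max(range(len(frequencies)), key=lambda i: frequencies[i])  # modal class index
    lower, upper = boundaries[k]
    before = frequencies[k - 1] if k > 0 else 0
    after = frequencies[k + 1] if k + 1 < len(frequencies) else 0
    d1 = frequencies[k] - before   # difference with the class before
    d2 = frequencies[k] - after    # difference with the class after
    if d1 + d2 == 0:               # flat neighbourhood: fall back to the class midpoint
        return (lower + upper) / 2
    return lower + d1 / (d1 + d2) * (upper - lower)


boundaries = [(-0.5, 4.5), (4.5, 9.5), (9.5, 14.5), (14.5, 19.5), (19.5, 24.5)]
freqs = [6, 12, 7, 5, 0]

print(round(grouped_mode(boundaries, freqs), 2))  # 7.23
```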
3-2.5 Midrange
It is the average of the smallest and largest observations or values.

\[ \text{Midrange} = \frac{X_{largest} + X_{smallest}}{2} \]
Example 11
The heartbeats of a hypertensive patient were monitored in a hospital. If the average heartbeats
per day for 10 days are 150, 142, 120, 145, 110, 100, 80, 85, 78, 80, find the midrange.
Solution

\[ \text{Midrange} = \frac{150 + 78}{2} = 114 \]

Self-Assessment Test 9
1. The following is the probability distribution of the number of phone calls received by
an office between 8 am and 9 am on a day. Find the mean number of phone calls.
X 1 2 3 4 5

P(X) 0.1 0.2 0.2 0.4 0.1

2. The table below shows the distribution of test scores. Find the median score.
Score Number of students
31-40 6
41-50 7
51-60 20
61-70 14
71-80 10
81-90 8
91-100 5

3. Explain the following

(a) Arithmetic mean
(b) Median
(c) Mode
(d) Midrange

4. State the properties of

(a) Arithmetic mean
(b) Median
(c) Mode
(d) Midrange
5. The number of bags of maize obtained by 15 farmers in Chinderi is given as
8, 15, 8, 16, 14, 6, 4, 12, 3, 8, 15, 13, 10, 10, 6. Find the:
(a) Arithmetic mean
(b) Median
(c) Mode
(d) Midrange

6. Find the median of 60, 62, 55, 75, 90, 60, 70
7. The table below shows the daily expenditure of households. Use the information to
calculate
(a) Arithmetic mean
(b) Median
(c) Mode
(d) Midrange

Daily Expenditure (Gh¢) Number of people

40-49 6
50-59 7
60-69 4
70-79 15
80-89 12
90-99 4

8. Out of 20 students in a class, 12 receive Gh¢50.00 as their weekly pocket money

whereas the others receive Gh¢80.00 as their weekly pocket money. Find the average
weekly pocket money of the whole class.
9. In a church, 80% of members are women and 20% are men. If the average tithes paid
by women is Gh¢30.00 per week and that of men is Gh¢40.00, what is the average
tithes paid by all the congregants?
10. The probabilities of students who study for a certain number of hours and are able to
pass examination are shown in the table below. Use the information to calculate the
average number of hours studied by all the students.
Hours 1 2 3 4 5 6 7
Probabilities 0.2 0.1 0.3 0.05 0.2 0.1 0.05

11. The average of five numbers is 26. If four of the numbers are -12, 90, -26, 10, what is
the fifth number? (Stewart, 2009). Note that the answer is 68.
12. If the average of six consecutive multiples of 4 is 22, what is the greatest of these
integers? (Stewart, 2009). Note that the answer is 32.
13. The marks obtained by Eunice Damba in the various courses are
Subjects Grade Credit hours Grade points
(w) (x)
Basic Mathematics A 3 4.00
Introduction to Rural Sociology A– 3 3.75
Communication Skills C+ 3 2.00
Development Communication Theory B 1 3.00
International Communication D 2 1.00
Human Development and its Evolution B– 3 2.50

(i) Calculate the weighted arithmetic mean (grade point average) of the student
(ii) Using the table below, what is the class of the student?
GPA     0 – 1.49   1.50 – 1.99   2.00 – 2.49   2.50 – 3.24       3.25 – 3.59       3.60 – 4.00
Class   Fail       Pass          Third class   2nd class lower   2nd class upper   First class

Unit 3: Measures of Dispersion/Variation

Bravo for your tenacity. This unit is a continuation of unit 2 in the chapter. Just as unit 2
treated measures of central tendency, this unit looks at another statistical measure called the
measure of dispersion. If you understood unit 2 very well, then this unit will be easy for you.
Unit Objectives
By the end of this unit, you should be able to:
• Explain range, variance, standard deviation, mean deviation, interquartile range and
semi-interquartile range.
• Do calculations on range, variance, standard deviation, mean deviation, interquartile range
and semi-interquartile range.
• Describe the properties or features of range, variance, standard deviation, mean deviation,
interquartile range and semi-interquartile range.
3-3.1 Introduction
In addition to measures of central tendency or location, it is desirable to consider measures of
variability or dispersion. The measure of variation is used to describe the distribution of data.
For instance, the mean age of students in agribusiness level hundred might be 20, but this does
not say whether they are all around the same age or their ages range from 16 – 30. For this, the
level of dispersion/variation is often important. As noted earlier, dispersion or variation in
statistics is the measure of the level or degree of spread of data about an average value
(Spiegel and Stephens, 2008). The measures of dispersion are range, mean deviation,
interquartile range, semi-interquartile range, 10-90 percentile range, standard deviation and
variance.
3-3.2 Range
It is the simplest measure of dispersion, which is the difference between the largest and smallest
values in a set of data.

Range (R)
Range = largest value – smallest value

Consider the following example:

Example 1
Below are ungrouped data showing monthly salary of 27 workers in the Tamale Metropolitan
Assembly in Ghana Cedi. Find the range of their salaries.
1200 1200 2000 1500 1250 3000 500 1550 1000 800 1000 1250 1400 1250
1700 1320 1850 1560 1300 1500 2000 2250 1800 2500 1850 1050 1500
Solution
Range = largest value – smallest value = 3000 – 500 = GHS2500

Although the range is the easiest measure of variation to compute, it is seldom used as the
only measure. This is because the range is based on only two of the observations, and thus is
highly influenced by extreme values or outliers. In the example above, one worker receives as
much as GHȼ3000 monthly while another receives as little as GHȼ500. The resulting range of
GHȼ2500 is not a good description of the variation in the data, because one or two extreme
values can make the range abnormally wide. As another instance, a customer in a shop may ask
(about watches): ‘what price range do you have?’ The answer might be: ‘they range from GHȼ2
to GHȼ200’. This one simple statement gives the dispersion or spread of prices for this
commodity. A follow-up question might be: ‘what is the average price?’ After all, the fact that
the watches range from GHȼ2 to GHȼ200 does not indicate whether most of them are priced
close to GHȼ2 or close to GHȼ200. An obvious way of avoiding this problem is to ignore
extreme values that are far from the centre. This can be done using quartiles.
3-3.3 Interquartile Range (IQR)
To understand the concept of interquartile range, we need to know some basic things about
quartiles. When data are sorted in ascending order, quartiles are the values that divide the
data set into four parts of equal size. The interquartile range is a measure of dispersion that
overcomes the dependency on extreme values, because it covers only the middle half of the
data. In other words, the IQR is the difference between the third quartile (Q3) and the first
quartile (Q1).

Interquartile range (IQR)

IQR = Q3 – Q1

Example 2:
Suppose the monthly starting wages of university graduates are shown in Table 11. Find the
interquartile range for the data set.
Table 11: Monthly starting wages
Graduate Monthly wage Graduate Monthly wage
1 40 11 89
2 42 12 93
3 43 13 97
4 48 14 100
5 50 15 110
6 60 16 140
7 62 17 200
8 65 18 210
9 71 19 212
10 80 20 220

Solution
The wages of graduates 5, 10 and 15 on this list divide the distribution into four equal parts
(1-5, 6-10, 11-15, and 16-20). The position and value of the quartiles are:
\[ Q_1 = \left( \frac{n}{4} \right)^{th} \text{Position} = \left( \frac{20}{4} \right)^{th} \text{Position} = 5^{th} \text{Position} \]
This is the position of the lower quartile, and the wage of the fifth graduate is the lower
quartile wage, i.e. GHS 50.

\[ Q_3 = \left( \frac{3n}{4} \right)^{th} \text{Position} = \left( \frac{60}{4} \right)^{th} \text{Position} = 15^{th} \text{Position} \]

This is the position of the upper quartile, and the wage of the fifteenth graduate is the upper
quartile wage, i.e. GHS 110.
Therefore:
\[ IQR = Q_3 - Q_1 \]
\[ IQR = 110 - 50 \]
\[ IQR = \text{GHS } 60 \]

3-3.3 Semi-Interquartile Range

The semi-interquartile range is defined as the interquartile range divided by two. It is also called
quartile deviation and it is denoted Q.
\[ Q = \frac{Q_3 - Q_1}{2} \]
From example 2 above, the semi-interquartile range wage can be calculated as:
\[ Q = \frac{110 - 50}{2} = \text{GHS } 30 \]
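
The interquartile and semi-interquartile ranges of Example 2 can be reproduced with the short sketch below. It follows the position rule used in that example (the n/4 and 3n/4 positions) and therefore assumes that n is a multiple of 4; other quartile conventions can give slightly different values.

```python
wages = [40, 42, 43, 48, 50, 60, 62, 65, 71, 80,
         89, 93, 97, 100, 110, 140, 200, 210, 212, 220]

ordered = sorted(wages)
n = len(ordered)

q1 = ordered[n // 4 - 1]       # value at the (n/4)th position, i.e. the 5th
q3 = ordered[3 * n // 4 - 1]   # value at the (3n/4)th position, i.e. the 15th
iqr = q3 - q1
semi_iqr = iqr / 2

print(q1, q3, iqr, semi_iqr)   # 50 110 60 30.0
```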

3-3.4 Variance
The variance is a measure of variability that utilizes all the data. The variance is based on the
difference between the value of each observation (Xi) and the mean ( X ). The difference
between each observation (Xi) and the mean is called a deviation about the mean.
For a sample, a deviation about the mean is written (Xi - x ); for a population, it is written
(Xi - μ). In the computation of the variance, the deviations about the mean are squared and
divided by the number of observations. The variance is defined as the average of the square of
the deviations. It can also be defined as the square of the standard deviation.
If the data are for a population, the average of the squared deviations is called the population
variance. The population variance is denoted by the Greek symbol $\sigma^2$. For a population of N
observations and with μ denoting the population mean, the formula for the
population variance of ungrouped data is as follows.

\[ \sigma^2 = \frac{\sum (X_i - \mu)^2}{N} \]
In most statistical applications, the data being analysed are for a sample. When we compute a
sample variance, we are often interested in using it to estimate the population variance σ2. It
can be shown that if the sum of the squared deviations about the sample mean is divided by
(n − 1) rather than n, the resulting sample variance provides an unbiased estimate of the
population variance. This holds for any sample size, although the difference between dividing
by n and by (n − 1) becomes negligible when the sample is large (say, greater than 30). This
proof is beyond the scope of this course.
3-3.4.1 Variance for Ungrouped Data
For ungrouped data, the sample variance, denoted by s2, is expressed as:

\[ s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} \]
This is used when the data are a sample and the variance is meant to estimate the population
variance.

\[ s^2 = \frac{\sum (X_i - \bar{X})^2}{n} \]
This is used when the data make up the whole population, or as an approximation when the
sample is large.
Why do we subtract one from the sample size when calculating the sample variance? This is
because the deviations are measured from the sample mean rather than from the unknown
population mean, which makes the sum of squared deviations slightly too small on average.
Dividing by (n − 1) instead of n compensates for this. Variance cannot be a negative number.
Variance will only be zero if all the observations are the same.
Example 3:
Find the sample variance and the population variance for the following set of data;
5, 6, 10, 14, 20
Solution
First find the arithmetic mean for the data, thus
\[ \bar{X} = \frac{5 + 6 + 10 + 14 + 20}{5} = \frac{55}{5} = 11 \]

Next find the various deviations from the arithmetic mean, that is $X_i - \bar{X}$.

Values (Xi)   Deviation from the mean (11), $X_i - \bar{X}$   Deviation squared, $(X_i - \bar{X})^2$
5             -6                                              36
6             -5                                              25
10            -1                                              1
14            +3                                              9
20            +9                                              81
Total                                                         $\sum (X_i - \bar{X})^2 = 152$

Sample variance:
\[ s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{152}{5 - 1} = \frac{152}{4} = 38 \]

Population variance:
\[ \sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{152}{5} = 30.4 \]
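
The two variances in Example 3 can be checked in Python. The sketch below computes them from the definition and then compares the results with `variance` and `pvariance` from Python's standard `statistics` module, which divide by n − 1 and by n respectively.

```python
import math
import statistics

data = [5, 6, 10, 14, 20]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations about the mean
squared_deviations = sum((x - mean) ** 2 for x in data)

sample_variance = squared_deviations / (n - 1)   # divides by n - 1
population_variance = squared_deviations / n     # divides by n

print(squared_deviations)    # 152.0
print(sample_variance)       # 38.0
print(population_variance)   # 30.4

# Cross-check against the standard library
print(math.isclose(sample_variance, statistics.variance(data)))       # True
print(math.isclose(population_variance, statistics.pvariance(data)))  # True
```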
Example 4
Find the variance of the data set given. 2, 3, 7, 8, 10.
Solution
Again, the calculation starts by finding the mean of the observations, which is
\[ \text{Mean } (\bar{X}) = \frac{2 + 3 + 7 + 8 + 10}{5} = \frac{30}{5} = 6 \]
The variance is the sum of the squared deviations about the mean divided by the number of
observations, which is given as:

\[ \text{Variance } (S^2) = \frac{\sum (X_i - \bar{X})^2}{n} = \frac{(2 - 6)^2 + (3 - 6)^2 + (7 - 6)^2 + (8 - 6)^2 + (10 - 6)^2}{5} \]

\[ = \frac{(-4)^2 + (-3)^2 + (1)^2 + (2)^2 + (4)^2}{5} \]

\[ = \frac{16 + 9 + 1 + 4 + 16}{5} = \frac{46}{5} = 9.2 \]
3-3.4.2 Variance for Frequency Distribution Table
We can extend the calculation of variance to grouped data by approximating the values in each
class by the class midpoint.
Consider the following formula for calculating variance for a frequency distribution table:

\[ \text{Variance } (S^2) = \frac{\sum f (X - \bar{X})^2}{\sum f} = \frac{\sum f (X - \bar{X})^2}{n} \]

Where
X = midpoint of a class,
f = number of values in the class
$\bar{X}$ = mean value
n = total number of observations
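Short Methods for Calculating Variance
For a frequency distribution:
\[ S^2 = \frac{\sum f_i X_i^2}{n} - \left( \frac{\sum f_i X_i}{n} \right)^2 = \frac{\sum f X^2}{n} - \left( \frac{\sum f X}{n} \right)^2 \]

For ungrouped data:
\[ S^2 = \frac{\sum X_i^2}{n} - \left( \frac{\sum X_i}{n} \right)^2 = \frac{\sum X^2}{n} - \bar{X}^2 \]
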

Example 5
Consider the table showing monthly visits to the school clinic by 32 workers in a college. The
data are displayed in class intervals.
Class interval   Midpoint (X)   Frequency (f)   Product (fX)   $(X - \bar{X})^2$   $f(X - \bar{X})^2$
0 – 4.9          2.5            3               7.5            169                 507
5 – 9.9          7.5            5               37.5           64                  320
10 – 14.9        12.5           9               112.5          9                   81
15 – 19.9        17.5           7               122.5          4                   28
20 – 24.9        22.5           4               90             49                  196
25 – 29.9        27.5           2               55             144                 288
30 – 34.9        32.5           1               32.5           289                 289
35 – 39.9        37.5           1               37.5           484                 484
Total                           ∑f = 32         ∑fX = 495                          $\sum f(X - \bar{X})^2 = 2193$

\[ \bar{X} = \frac{\sum fX}{\sum f} = \frac{495}{32} = 15.47 \approx 15.5 \]
The deviations in the table are computed with the mean rounded to 15.5.

\[ \text{Variance} = \frac{\sum f (X - \bar{X})^2}{n} = \frac{2193}{32} = 68.53 \]
The standard deviation for grouped data is obtained in the same way, by taking the square root
of the variance. Thus, we have

\[ S = \sqrt{\frac{\sum f (X - \bar{X})^2}{n}} = \sqrt{\frac{2193}{32}} = \sqrt{68.53} = 8.28 \]

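The grouped variance and standard deviation of Example 5 can be computed directly from the class midpoints and frequencies, as in the sketch below. The function name is illustrative. The code uses the unrounded mean (495/32 = 15.46875), which leads to the same two-decimal answers as a mean rounded to 15.5.

```python
from math import sqrt


def grouped_variance(midpoints, frequencies):
    """Variance of grouped data: sum of f * (X - mean)^2 divided by n."""
    n = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    return sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)) / n


midpoints = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5, 37.5]
freqs = [3, 5, 9, 7, 4, 2, 1, 1]

total_fx = sum(f * x for f, x in zip(freqs, midpoints))
variance = grouped_variance(midpoints, freqs)

print(total_fx)                  # 495.0
print(round(variance, 2))        # 68.53
print(round(sqrt(variance), 2))  # 8.28
```
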
3-3.5 Standard Deviation
The use of variance as a measure of variability or dispersion has drawbacks. A small variance
implies a small variation, and variance is good for comparing two or more data sets. However,
it is not a good measure of dispersion for a single set of data, because the squaring operation
changes the unit of measurement (for example, ages in years give a variance in squared years).
In order to return to the original unit, the standard deviation is preferred. This is obtained by
taking the square root of the variance.
The standard deviation is a measure of dispersion which uses all the values in a distribution in
the sense that every value contributes to the final result in the same way that every value
contributes to the calculation of the arithmetic mean. It is the standard measure of dispersion
for two reasons: it is expressed in the same unit as the data, and it is useful both practically
and mathematically. Since the standard deviation is used as a measurement, only the positive
root of the variance is taken as a measure of variability.
The formula for calculating the standard deviation is the square root of the variance. Thus,

Standard Deviation = √variance

3-3.5.1 Standard Deviation of Ungrouped Data

When the data make up the whole population:
\[ S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}} \]

When the data are a sample used to estimate the population standard deviation:
\[ S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} \]
Short Methods for Computing Standard Deviation
For ungrouped data:
\[ S = \sqrt{\frac{\sum X_i^2}{n} - \left( \frac{\sum X_i}{n} \right)^2} = \sqrt{\frac{\sum X^2}{n} - \bar{X}^2} \]

For a frequency distribution:
\[ S = \sqrt{\frac{\sum f_i X_i^2}{n} - \left( \frac{\sum f_i X_i}{n} \right)^2} = \sqrt{\frac{\sum f X^2}{n} - \left( \frac{\sum f X}{n} \right)^2} \]

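The short method gives the same answer as the definition when both divide by n. The sketch below illustrates this with the data of Example 4 (2, 3, 7, 8, 10), whose variance was found to be 9.2. The function names are illustrative.

```python
from math import isclose, sqrt


def sd_definition(values):
    """Square root of the mean squared deviation about the mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_shortcut(values):
    """Short method: sqrt(sum of X^2 / n - mean^2)."""
    n = len(values)
    return sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


data = [2, 3, 7, 8, 10]

print(round(sd_definition(data), 3))                    # 3.033
print(isclose(sd_definition(data), sd_shortcut(data)))  # True
```
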
3-3.6 Mean deviation MD
The deviation is the difference between a value and the mean, that is, the distance of each
value from the mean.
Deviation = Value – mean value

\[ d = X_i - \bar{X} \]

Each value has a deviation, so the mean of these deviations might be expected to give a measure
of spread. Unfortunately, the mean deviation has the major disadvantage of allowing positive
and negative deviations to cancel. If we have the three values 3, 4 and 8, the mean is 5 and the
mean deviation is given by the formula:

\[ MD = \frac{\sum (X_i - \bar{X})}{n} \]
Then
\[ MD = \frac{(3 - 5) + (4 - 5) + (8 - 5)}{3} = 0 \]
Because the deviations about the mean always sum to zero, every data set has a mean deviation
of zero, which is why this measure is never used. A more useful alternative is the mean
absolute deviation (MAD). MAD simply takes the absolute values of deviations. In other words,
it ignores negative signs and adds all deviations as if they are positive. The result is a measure
of the mean distance of observations from the mean, so the larger the mean absolute deviation,
the more dispersed the data.

Mean Absolute Deviation (MAD)
\[ MAD = \frac{\sum |X_i - \bar{X}|}{n} \]

Where $X_i$ = the values

$\bar{X}$ = Mean value
n = number of observations

$|X_i - \bar{X}|$ = the absolute value of $X_i - \bar{X}$ (that is, ignoring the sign), which is also written
as $ABS(X_i - \bar{X})$.

Example 6
What is the mean absolute deviation of 4, 7, 6, 10 and 8?
Solution

The calculation of the MAD starts by finding the mean of the numbers, which is:
\[ \bar{X} = \frac{4 + 7 + 6 + 10 + 8}{5} = 7 \]
Then the mean absolute deviation is:
\[ MAD = \frac{|4 - 7| + |7 - 7| + |6 - 7| + |10 - 7| + |8 - 7|}{5} \]
\[ MAD = \frac{|-3| + |0| + |-1| + |3| + |1|}{5} \]
\[ MAD = \frac{3 + 0 + 1 + 3 + 1}{5} = 1.6 \]
This shows that on average the values are 1.6 units away from the mean.
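
The contrast between the mean deviation and the mean absolute deviation can be sketched in Python. The function names below are illustrative.

```python
def mean_deviation(values):
    """Mean of the signed deviations about the mean (always zero, up to rounding)."""
    mean = sum(values) / len(values)
    return sum(x - mean for x in values) / len(values)


def mean_absolute_deviation(values):
    """Mean of the absolute deviations about the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


print(mean_deviation([3, 4, 8]))                  # 0.0
print(mean_absolute_deviation([4, 7, 6, 10, 8]))  # 1.6
```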

Self-Assessment Test 3.3

1. Briefly explain variance
2. Differentiate between standard deviation and mean deviation
3. The daily sales of market women are illustrated in the table below. Use the information
to calculate the following measures of dispersion and interpret the values
(a) Variance
(b) Standard deviation
(c) Mean deviation
(d) Interquartile range
(e) Semi interquartile range
Daily Sales (Gh¢) Number of market women
100-149 4
150-199 5
200-249 2
250-299 12
300-349 8
350-399 4

Unit 4: Measures of Position

Unit Objectives
By the end of this unit, you should be able to:
• Explain measure of position.
• Mention some examples of measures of position.
• Do calculations on quartiles, deciles, quintiles and percentiles and interpret the values.
3-4.1 Introduction to the Measure of Position
In addition to measures of central tendency and measures of variation, there exist measures of
position or location. Statisticians often talk about the position of one value relative to the
other values in a data set. Likewise, teachers usually talk about the position of a student in an
examination relative to the positions of the others in the class. In this section, we shall
discuss measures of position.
A measure of position is defined as the position of a single value in relation to other values in
a sample or a population data set. The common measures of position are quartiles and
percentiles. Quintiles, deciles and standard scores (z-scores) are measures of position which
are less often used.

3-4.2 Quartiles
Quartiles are the summary measures that divide a ranked data set into four equal parts. Three
measures will divide any data set into four equal parts. These three measures are:
Q1 – first quartile
Q2 – second quartile (also known as the median)
Q3 – third quartile
The second quartile is the same as the median of a data set. The first quartile is the value of the
middle term among the observations that are less than the median, and the third quartile is the
value of the middle term among the observations that are greater than the median.
3-4.2.1 Quartiles of Ungrouped Data
The first quartile Q1 is at position $\frac{n + 1}{4}$
The second quartile Q2 (the median) is at position $\frac{n + 1}{2}$
The third quartile Q3 is at position $\frac{3(n + 1)}{4}$
Note that “n” represents the number of observations in a data set.

Steps in Calculating Quartiles

1. Order the data from the smallest to the largest value.
2. Find the median of the ordered data set. This is known as the second quartile (Q2).
3. Find the median of the data falling below Q2. This is known as the first quartile (Q1).
4. Find the median of the data falling above Q2. This is known as the third quartile (Q3).

Example 1
The following are the ages (in years) of nine teachers in a Junior High School.
47 28 39 51 33 37 59 24 33
Find the values of the three quartiles (first, second and third).

Solution
Using the Median Method
1. First, we rank the data from smallest value to the largest value
24 28 33 33 37 39 47 51 59

2. We then find the median.

From this data set the median is 37. Q2 = 37

3. The values less than the median are 24 28 33 33

The median of these values is the first quartile:
\[ Q_1 = \frac{28 + 33}{2} = 30.5 \]

4. The values greater than the median are 39 47 51 59

The median of these values is the third quartile:
\[ Q_3 = \frac{47 + 51}{2} = 49 \]
Thus, the values of the three quartiles are Q1 = 30.5, Q2 = 37 and Q3 = 49.

Using the quartile formula

First, we rank the data from smallest value to the largest value
24 28 33 33 37 39 47 51 59

\[ Q_1 = \left( \frac{n + 1}{4} \right)^{th} \text{position} = \left( \frac{9 + 1}{4} \right)^{th} \text{position} = 2.5^{th} \text{position} \]
The 2.5th position lies between the 2nd and 3rd observations, 28 and 33.
Therefore, $Q_1 = \frac{28 + 33}{2} = 30.5$

\[ Q_2 = \left( \frac{n + 1}{2} \right)^{th} \text{position} = \left( \frac{9 + 1}{2} \right)^{th} \text{position} = 5^{th} \text{position} \]
Q2 = 37

\[ Q_3 = \left( \frac{3(n + 1)}{4} \right)^{th} \text{position} = \left( \frac{3(9 + 1)}{4} \right)^{th} \text{position} = 7.5^{th} \text{position} \]
The 7.5th position lies between the 7th and 8th observations, 47 and 51.
Therefore, $Q_3 = \frac{47 + 51}{2} = 49$
Example 2
Find the values of the three quartiles for this data set.
15 13 6 5 12 59 22 18

Solution
Using the quartile formula
Rearranging the data set from the smallest to the largest

5 6 12 13 15 18 22 59

\[ Q_1 = \left( \frac{n + 1}{4} \right)^{th} \text{position} = \left( \frac{8 + 1}{4} \right)^{th} \text{position} = 2.25^{th} \text{position} \]
The 2.25th position lies between the 2nd and 3rd observations, 6 and 12.

Therefore, $Q_1 = \frac{6 + 12}{2} = 9$

\[ Q_2 = \left( \frac{n + 1}{2} \right)^{th} \text{position} = \left( \frac{8 + 1}{2} \right)^{th} \text{position} = 4.5^{th} \text{position} \]
The 4.5th position lies between the 4th and 5th observations, 13 and 15.

\[ Q_2 = \frac{13 + 15}{2} = 14 \]

\[ Q_3 = \left( \frac{3(n + 1)}{4} \right)^{th} \text{position} = \left( \frac{3(8 + 1)}{4} \right)^{th} \text{position} = 6.75^{th} \text{position} \]
The 6.75th position lies between the 6th and 7th observations, 18 and 22.
Therefore, $Q_3 = \frac{18 + 22}{2} = 20$

Using the Median Method

1. Finding the median of the ordered data set
\[ \text{Median} = \frac{13 + 15}{2} = 14 \]
Therefore Q2 = 14
2. Finding the median of values below the median
5 6 12 13

\[ \text{Median} = \frac{6 + 12}{2} = 9 \]
Q1 corresponds to 9

3. Finding the median of values greater than the median

15 18 22 59

\[ \text{Median} = \frac{18 + 22}{2} = 20 \]
Q3 corresponds to 20.
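
The median method can be sketched in Python as below. The helper names are illustrative. The sketch follows the steps listed earlier: the data are split at the median, the median itself is left out of both halves when n is odd, and the median of each half is taken. Statistical software often uses other quartile conventions, so its results may differ slightly.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                      # values below the median
    upper = ordered[half + len(ordered) % 2:]   # values above the median
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


print(quartiles([47, 28, 39, 51, 33, 37, 59, 24, 33]))  # (30.5, 37, 49.0)
print(quartiles([15, 13, 6, 5, 12, 59, 22, 18]))        # (9.0, 14.0, 20.0)
```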

3-4.2.2 Quartiles for Grouped Data of Frequency Distribution

For a frequency distribution of grouped data, quartiles can be calculated by using the formula:
\[ Q_i = L_{Q_i} + \frac{h_{Q_i}}{f_{Q_i}}\left( \frac{iN}{4} - C \right), \quad i = 1, 2, 3 \]
Where;

$L_{Q_i}$ = lower class boundary of quartile group

$h_{Q_i}$ = width of quartile group

$f_{Q_i}$ = frequency of quartile group

N = total number of observations i.e. sum of frequencies

C = cumulative frequency preceding quartile group

The quartile group is the class containing the $\left( \frac{iN}{4} \right)^{th}$ value.

If i = 2, and N = 20
\[ Q_2 = \left( \frac{2(20)}{4} \right)^{th} \text{value} = 10^{th} \text{value} \]

Example 3
The table below shows marks obtained from a class test. From the table below, find
(a) the first quartile
(b) second quartile
(c) third quartile
Marks Frequency
8 -12 2
13 – 17 3
18 – 22 5
23 – 27 2
28 – 32 6
33 – 37 2

Solution
Class boundary   Frequency   Cumulative frequency
7.5 – 12.5       2           2
12.5 – 17.5      3           5
17.5 – 22.5      5           10
22.5 – 27.5      2           12
27.5 – 32.5      6           18
32.5 – 37.5      2           20

\[ Q_i = L_{Q_i} + \frac{h_{Q_i}}{f_{Q_i}}\left( \frac{iN}{4} - C \right) \]

\[ Q_i = \left( \frac{iN}{4} \right)^{th} \text{position} \]
N = 20
(a) For Q1, $Q_1 = \left( \frac{1(20)}{4} \right)^{th} \text{position} = 5^{th} \text{position}$, which falls in the class 12.5 – 17.5.

$L_{Q_1}$ = lower class boundary of quartile group = 12.5

$h_{Q_1}$ = width of quartile group = 5

$f_{Q_1}$ = frequency of quartile group = 3

N = total number of observations i.e. sum of frequencies = 20

C = cumulative frequency preceding quartile group = 2

\[ Q_1 = 12.5 + \frac{5}{3}\left( \frac{1 \times 20}{4} - 2 \right) = 17.5 \]

(b) For Q2, $Q_2 = \left( \frac{2(20)}{4} \right)^{th} \text{position} = 10^{th} \text{position}$, which falls in the class 17.5 – 22.5.

$L_{Q_2}$ = lower class boundary of quartile group = 17.5

$h_{Q_2}$ = width of quartile group = 5

$f_{Q_2}$ = frequency of quartile group = 5

N = total number of observations i.e. sum of frequencies = 20

C = cumulative frequency preceding quartile group = 5

\[ Q_2 = 17.5 + \frac{5}{5}\left( \frac{2 \times 20}{4} - 5 \right) = 22.5 \]
(c) For Q3, $Q_3 = \left( \frac{3(20)}{4} \right)^{th} \text{position} = 15^{th} \text{position}$, which falls in the class 27.5 – 32.5.

$L_{Q_3}$ = lower class boundary of quartile group = 27.5

$h_{Q_3}$ = width of quartile group = 5

$f_{Q_3}$ = frequency of quartile group = 6

N = total number of observations i.e. sum of frequencies = 20
C = cumulative frequency preceding quartile group = 12

\[ Q_3 = 27.5 + \frac{5}{6}\left( \frac{3 \times 20}{4} - 12 \right) = 30.0 \]
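
The grouped quartile formula can be sketched in Python as follows. The helper is illustrative only. It assumes the quartile group is the first class whose cumulative frequency reaches iN/4, and that C is the cumulative frequency of all classes before that group.

```python
def grouped_quartile(boundaries, frequencies, i):
    """Quartile i (1, 2 or 3) of grouped data: L + (h / f) * (i * N / 4 - C)."""
    total = sum(frequencies)
    target = i * total / 4
    cumulative = 0  # C: cumulative frequency preceding the current class
    for (lower, upper), f in zip(boundaries, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (upper - lower) / f * (target - cumulative)
        cumulative += f
    raise ValueError("frequency table is empty")


boundaries = [(7.5, 12.5), (12.5, 17.5), (17.5, 22.5),
              (22.5, 27.5), (27.5, 32.5), (32.5, 37.5)]
freqs = [2, 3, 5, 2, 6, 2]

results = [round(grouped_quartile(boundaries, freqs, i), 2) for i in (1, 2, 3)]
print(results)  # [17.5, 22.5, 30.0]
```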

3-4.3 Deciles and Percentiles

Deciles are the nine values of a variable that divide an ordered data set into 10 equal parts.
Deciles mark the values below which 10%, 20%, …, 90% of the data fall.
A percentile provides information about how data are spread over the interval from the smallest
value to the largest value.
Let’s assume that all the elements in a data set are rank ordered from the smallest to the largest
value. The values dividing the rank-ordered data set into 100 equal parts are known as
percentiles. Admission test scores for colleges and universities are frequently
reported in terms of percentiles.
The pth percentile of a data set is a value such that at least p percent of the items take on this
value or less and at least (100 - p) percent of the items take on this value or more.
An element having a percentile rank of Pi would have a greater value than the i percent of all
the elements in the set. Thus, the observation at the 50th percentile would be denoted P50, and
it would be greater than 50 percent of the observations in the set. An observation at the 50th
percentile would correspond to the median value in the set.
Note that there is a relationship between quartiles and percentiles. Q1 corresponds to P25, Q2
corresponds to P50, and Q3 corresponds to P75. Q2 is the median value in the set.

Percentile Formula
The percentile corresponding to a given value X is computed by using this formula:
\[ \text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \times 100 \]

Example 4
A teacher gives a 20-point test to 10 students. The scores are shown here. Find the percentile
rank of a score of 12.
18 15 12 6 8 2 3 5 20 10
Solution
1. Rearrange the data from lowest to highest.

2 3 5 6 8 10 12 15 18 20

2. Then substitute into the formula
\[ \text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \times 100 \]

X = 12
Number of values below X = 6
Total number of values = 10
\[ \text{Therefore percentile} = \frac{6 + 0.5}{10} \times 100 = 65^{th} \text{ percentile} \]
Thus, a student whose score was 12 did better than 65% of the class.

Example 5

Using the data below, find the percentile rank for a score of 6.

18 15 12 6 8 2 3 5 20 10

Solution
Rearranging the data
2 3 5 6 8 10 12 15 18 20
X = 6
Number of values below X = 3
Total number of values = 10
\[ \text{Percentile} = \frac{3 + 0.5}{10} \times 100 = 35^{th} \text{ percentile} \]
Thus, a student whose score was 6 did better than 35% of the class
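
The percentile rank formula used in Examples 4 and 5 can be sketched in Python. The function name is illustrative.

```python
def percentile_rank(values, x):
    """Percentile rank of x: (number of values below x + 0.5) / total * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) * 100 / len(values)


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]

print(percentile_rank(scores, 12))  # 65.0
print(percentile_rank(scores, 6))   # 35.0
```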

3-4.4 Steps in Finding a Value Corresponding to a Given Percentile

1. Arrange the data from lowest to the highest value.
2. Substitute into the formula
\[ C = \frac{n \times p}{100} \]
Where
n = total number of values
p = percentile

3. If C is not a whole number, round it up to the next whole number and then count from the
lowest value until you reach the value in that position. That value corresponds to the
given percentile.
4. If C is a whole number, use the value halfway between the Cth and (C+1)th values,
counting from the lowest value.

Example 6
18 15 12 6 8 2 3 5 20 10
With the above data, find the value corresponding to the 25th and 75th percentile
respectively.

Solution
Rearrange the data from lowest to highest.
2 3 5 6 8 10 12 15 18 20
Then substitute into the formula

$$C = \frac{n \times p}{100} = \frac{10 \times 25}{100} = 2.5$$
Because C is not a whole number, we round up to 3. We then count over to the third value
which is 5.
Therefore, the value 5 corresponds to the 25th percentile.
$$C = \frac{n \times p}{100} = \frac{10 \times 75}{100} = 7.5$$

Because C is not a whole number, we round up to 8. We then count over to the eighth value
which is 15.
Therefore, the value 15 corresponds to the 75th percentile.

Example 7
Find the value that corresponds to the 60th percentile
18 15 12 6 8 2 3 5 20 10

Solution
Rearranging the data

2 3 5 6 8 10 12 15 18 20

Substitute in the formula

$$C = \frac{n \times p}{100} = \frac{10 \times 60}{100} = 6$$
From the results, C is a whole number and hence we use the value halfway between the Cth and
(C+1)th values. Thus C = 6th value and C+1 = 7th value. Here the 6th value is 10 and the 7th value is
12.
We then add the two values and divide by 2
$$\frac{10 + 12}{2} = 11$$
Hence 11 corresponds to the 60th percentile. This means that anyone scoring 11 would have
done better than 60% of the class.
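
The steps for finding the value at a given percentile can be sketched in Python as follows. This is a minimal illustration of the rule used in Examples 6 and 7; the function name `value_at_percentile` is an assumption made for this sketch, and the function is intended for percentiles strictly between 0 and 100.

```python
import math


def value_at_percentile(values, p):
    """Value at the p-th percentile using C = n * p / 100 (for 0 < p < 100)."""
    data = sorted(values)            # step 1: arrange from lowest to highest
    n = len(data)
    c = n * p / 100                  # step 2: compute C
    if c != int(c):
        position = math.ceil(c)      # step 3: C is not whole, so round up
        return data[position - 1]    # positions are counted from 1
    position = int(c)                # step 4: C is whole, so average the
    return (data[position - 1] + data[position]) / 2  # Cth and (C+1)th values


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
p25 = value_at_percentile(scores, 25)
p75 = value_at_percentile(scores, 75)
p60 = value_at_percentile(scores, 60)
print(p25, p75, p60)
```

Statistical software packages often use slightly different interpolation rules, so their percentile values may differ from the ones obtained with this hand method.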

Self-Assessment Test 11
1. Find the percentile rank for each test score in the following data set.
20 12 15 5 26 30
What test score corresponds to the 33rd percentile?

2. Find the percentile rank for each test score in the following data set.
12 42 35 50 13 49 48 24 30
What test score corresponds to the 60th percentile?

3. Find the percentile ranks of each weight in the data set. The weights are in pounds

92 98 82 78 86 97

What value corresponds to the 30th and 45th percentile?

4. Consider a sample with data values of

53 55 60 58 66 72 80 69 35 86 89

Compute the 20th, 25th, 65th, and 75th percentiles

5. Find the quartiles; Q1, Q2 and Q3 for the following data set.
51 45 24 25 30 33 34 33 32 41
16 17 16 22 18 22 23 12 15
6. Find the values of the three quartiles for this data set.
150 120 60 75 120 550 200 108 140
7. The table below shows the time taken by students in a class to complete a class
exercise. Find the
(a) first quartile
(b) second quartile
(c) third quartile
Time (minutes)   8-10   11-13   14-16   17-19   20-22   23-25   26-28   29-30
Frequency        5      6       8       4       3       2       7       12

UNIT 5: MEASURES OF SHAPE

Unit Objectives
By the end of this unit, the students should be able to:
• Explain measure of shape.
• Mention and explain some examples of measures of shape.
• Determine the skewness, moments and kurtosis of data.
• Graphically sketch three different types of skewness.
3-5.1 Meaning of Measure of Shape
As defined earlier, a measure of shape describes the distribution or pattern of the data within a
data set. It can also be defined as the manner in which the data are distributed. The distribution
of a data set may be symmetrical or asymmetrical. The pattern and distribution of data are
measured by skewness, kurtosis and moments.
3-5.2 Skewness
Skewness of a distribution is a measure of symmetry or the lack of symmetry. A distribution
of data is skewed if the scores with the highest frequency are found not at the middle but near one
end, so that one tail is stretched out longer than the other. In other
words, the distribution does not look the same to the right and to the left of the center point. Such
distributions are asymmetric (non-symmetrical). A distribution of data is not skewed if it looks the same to the
right and the left of the center point; such distributions are symmetric. With respect to
skewness, a distribution takes one of three forms: normal or symmetric, negatively
skewed, or positively skewed.
3-5.2.1 Normal or Symmetric Distribution
A distribution of data is normal or symmetric if the scores of the highest frequency are found
at the middle and the distribution looks the same to the right and to the left of the center point
such that the left half and the right half are mirror images. For such distributions, there are no
extreme values in a particular direction and hence low and high values balance each other out.
The graph of such data has a normal bell-shape. For a normal or symmetric distribution, the
mean, median and mode are equal to each other.

Mean = Mode = Median
Figure 16: Normal or Symmetric Distribution

3-5.2.2 Positive Skewness or Right Skewed

A distribution of data is positively or right skewed if most of the data are concentrated
at the left (the lower values) while a few large values stretch the distribution to the right. The tail of the plot of such data is
longer to the right than to the left. For such a distribution, the mean is greater than the median,
which in turn is greater than the mode.

Mean > Median > Mode

Figure 17: Positive or Right Skewed Distribution

3-5.2.3 Negative Skewness or Left Skewed

A distribution of data is negatively or left skewed if most of the data are concentrated
at the right (the higher values) while a few small values stretch the distribution to the left. The tail of the plot of such data is
longer to the left than to the right. For such a distribution, the mean is less than the median,
which in turn is less than the mode.

Mean < Median < Mode

Figure 18: Negative or Left Skewed Distribution

3-5.3 Test of skewness

Skewness is present if
• The mean, median and mode do not coincide.
• The sum of positive deviations from the median is not equal to the sum of
negative deviations.
• The quartiles are not equidistant from the median.
• The plot of the values does not give a normal bell-shaped form.

3-5.4 Measure of skewness

• Eye view
• Karl Pearson’s measure of skewness
• Bowley’s coefficient of skewness
• Coefficients of skewness based on moments
• Skewness indices (use of quartiles)

Karl Pearson’s measure of skewness

This measure of skewness is based upon the divergence of the mean from the mode in a skewed
distribution. As mean = mode in a symmetrical distribution, (mean – mode) can be taken
as an absolute measure of skewness. The absolute measure of skewness depends on the unit
of measurement. For example, if the mean = 2.45 meters and mode = 2.14 meters, then the
absolute measure of skewness will be 2.45 – 2.14 = 0.31 meters. For the same distribution,
if we change the unit of measurement to centimeters, the absolute measure of skewness will
be 245 centimeters – 214 centimeters = 31 centimeters. To avoid this problem, Karl
Pearson takes a relative measure of skewness, obtained by dividing by the standard deviation.

Karl Pearson's measure of skewness, $S_k$, is given by

$$S_k = \frac{\text{mean} - \text{mode}}{\text{S.D}}$$ (used when the mode is well defined)

$$S_k = \frac{3(\text{mean} - \text{median})}{\text{S.D}}$$ (used when the mode is not well defined)

where S.D = standard deviation.

If $S_k > 0$, the distribution is positively skewed; if $S_k < 0$, the distribution is negatively
skewed; and if $S_k = 0$, this measure shows no skewness.

Example 1
Compute the Karl Pearson's coefficient of skewness from the following data

Height (in inches) Number of persons

58 10
59 18
60 30
61 42
62 35
63 28
64 16
65 8

Solution

Height (X)   Number of persons (f)   X^2            fX            fX^2
58           10                      3364           580           33640
59           18                      3481           1062          62658
60           30                      3600           1800          108000
61           42                      3721           2562          156282
62           35                      3844           2170          134540
63           28                      3969           1764          111132
64           16                      4096           1024          65536
65           8                       4225           520           33800
Total        Σf = 187                ΣX^2 = 30300   ΣfX = 11482   ΣfX^2 = 705588

Mode = 61

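$$\text{Mean} = \bar{X} = \frac{\sum fX}{\sum f} = \frac{11482}{187} \approx 61.4$$

$$\text{S.D} = \sqrt{\frac{\sum fX^2}{\sum f} - (\bar{X})^2} = \sqrt{\frac{705588}{187} - (61.4)^2} = \sqrt{3773.1978 - 3769.96} = \sqrt{3.2378} = 1.799$$

Therefore S.D ≈ 1.8

$$S_k = \frac{\text{mean} - \text{mode}}{\text{S.D}} = \frac{61.4 - 61}{1.8} = 0.222$$

This implies that the distribution is positively skewed.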
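
As a cross-check on Example 1, the same quantities can be computed from the frequency table with a few lines of Python. This is only a sketch; the function name `pearson_skewness` is an assumption made for illustration. Because the code does not round the mean to 61.4 before squaring it, it gives a standard deviation of about 1.76 and a coefficient of about 0.23, slightly different from the hand-calculated values but leading to the same conclusion (positive skewness).

```python
import math

heights = [58, 59, 60, 61, 62, 63, 64, 65]
freqs = [10, 18, 30, 42, 35, 28, 16, 8]


def pearson_skewness(xs, fs):
    """Return (mean, S.D, mode, S_k) for a frequency table, with S_k = (mean - mode) / S.D."""
    n = sum(fs)                                            # sum of f
    mean = sum(f * x for x, f in zip(xs, fs)) / n          # sum of fX / sum of f
    mean_sq = sum(f * x * x for x, f in zip(xs, fs)) / n   # sum of fX^2 / sum of f
    sd = math.sqrt(mean_sq - mean ** 2)
    mode = xs[fs.index(max(fs))]                           # value with the highest frequency
    return mean, sd, mode, (mean - mode) / sd


mean, sd, mode, sk = pearson_skewness(heights, freqs)
print(round(mean, 1), round(sd, 2), mode, round(sk, 2))
```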

Bowley’s coefficient of skewness

This measure is based on quartiles. In a symmetrical distribution,
$Q_1$ and $Q_3$ are equidistant from the median. Thus, an absolute measure of
skewness is given by
$$S_{KB} = Q_3 + Q_1 - 2\,\text{median}$$
Also, a relative measure of skewness is given by
$$S_{KB} = \frac{Q_3 + Q_1 - 2\,\text{median}}{Q_3 - Q_1}$$

3-5.5 Kurtosis
It is a measure that is used to draw a distinction between two data sets with the same mean and
standard deviation. It is a measure of the peakedness or flatness of data relative to a normal
distribution. There are three different forms of kurtosis. These are mesokurtic distribution,
leptokurtic distribution and platykurtic distribution.

A mesokurtic distribution is a normal distribution. With a mesokurtic distribution, there are no
extreme values in a particular direction and hence the low and high values balance each other.
It is a distribution with 3 as its (moment-based) coefficient of kurtosis.

A leptokurtic distribution is one with a higher peak than the normal distribution and with
heavier tails. It is a distribution with a coefficient of kurtosis greater than three.

A platykurtic distribution is one with a lower peak than the normal distribution
and with lighter tails. It is a distribution with a coefficient of kurtosis less than three.

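Another measure of kurtosis (the percentile coefficient of kurtosis) is based on both quartiles and percentiles.
It is given as

$$K = \frac{Q}{P_{90} - P_{10}}$$

Where Q = semi-interquartile range, that is $\frac{1}{2}(Q_3 - Q_1)$
$P_{90}$ = 90th percentile
$P_{10}$ = 10th percentile

Note that the benchmark of 3 applies to the moment-based coefficient of kurtosis. For the
percentile coefficient K, the normal distribution gives a value of about 0.263.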
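
The percentile coefficient of kurtosis can be illustrated with a short Python sketch. The function name `percentile_kurtosis` is hypothetical. As an assumption for the illustration, the quartiles and percentiles are taken from the standard normal distribution using `statistics.NormalDist` (available in Python 3.8 and later), which shows the value of K that a normal (mesokurtic) distribution produces under this formula.

```python
from statistics import NormalDist


def percentile_kurtosis(q1, q3, p10, p90):
    """K = Q / (P90 - P10), where Q is the semi-interquartile range (Q3 - Q1) / 2."""
    semi_iqr = (q3 - q1) / 2
    return semi_iqr / (p90 - p10)


z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1
k_normal = percentile_kurtosis(
    z.inv_cdf(0.25), z.inv_cdf(0.75), z.inv_cdf(0.10), z.inv_cdf(0.90)
)
print(round(k_normal, 3))
```

For a real data set, the four inputs would be the quartiles and percentiles computed from the data.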

3-5.6 Moments
Moment is a term borrowed from mechanics, where it describes the tendency of a force to
produce rotation. It is the product of the force and its corresponding distance. In statistics,
moments are used in a similar way to describe the shape of a distribution.

The r-th moment for ungrouped data $x_1, x_2, \dots, x_N$ is given as

$$\overline{X^r} = \frac{x_1^r + x_2^r + \dots + x_N^r}{N} = \frac{\sum_{j=1}^{N} x_j^r}{N}$$

Where $\overline{X^r}$ = r-th moment
For the first moment, r = 1

Moment about the mean for ungrouped data is given as

$$\mu_r = \frac{\sum_{j=1}^{N} (x_j - \bar{x})^r}{N}$$

For the first moment, r = 1

$$\mu_1 = \frac{\sum (x - \bar{x})}{N} = \frac{\sum x - N\bar{x}}{N} = \bar{x} - \bar{x} = 0$$

For the second moment, r = 2

$$\mu_2 = \frac{\sum (x - \bar{x})^2}{N} = S^2$$

Moment about any point A for ungrouped data is given by

$$\mu'_r = \frac{\sum_{j=1}^{N} (x_j - A)^r}{N} = \frac{\sum d^r}{N}$$

Where $d = x_j - A$ is the deviation of $x_j$ from A

Moment for grouped data is given as follows.

For $x_1, x_2, \dots, x_K$ with frequencies $f_1, f_2, \dots, f_K$ respectively, and $N = \sum f_j$,

$$\overline{X^r} = \frac{f_1 x_1^r + f_2 x_2^r + \dots + f_K x_K^r}{N} = \frac{\sum_{j=1}^{K} f_j x_j^r}{N}$$

Moment about the mean for grouped data is given as

$$\mu_r = \frac{\sum_{j=1}^{K} f_j (x_j - \bar{x})^r}{N}$$

Example 2

1. Given the data 2, 4, 6, 9, and 10, find

a. First moment
b. Second moment
c. Third moment

Solution

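a. First moment is the same as the arithmetic mean

$$\bar{x} = \frac{2 + 4 + 6 + 9 + 10}{5} = \frac{31}{5} = 6.2$$

b. Second moment

$$\overline{x^2} = \frac{\sum x_i^2}{N} = \frac{2^2 + 4^2 + 6^2 + 9^2 + 10^2}{5} = \frac{237}{5} = 47.4$$

c. Third moment

$$\overline{x^3} = \frac{\sum x_i^3}{N} = \frac{2^3 + 4^3 + 6^3 + 9^3 + 10^3}{5} = \frac{8 + 64 + 216 + 729 + 1000}{5} = \frac{2017}{5} = 403.4$$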

2. Find the first, second and third moments about the mean for the data 2, 4, 6, 9, 10

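Solution

First moment about the mean:

$$\mu_1 = \frac{\sum (x - \bar{x})}{N} = \frac{(2 - 6.2) + (4 - 6.2) + (6 - 6.2) + (9 - 6.2) + (10 - 6.2)}{5} = 0$$

Second moment about the mean:

$$\mu_2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{(2 - 6.2)^2 + (4 - 6.2)^2 + (6 - 6.2)^2 + (9 - 6.2)^2 + (10 - 6.2)^2}{5} = \frac{44.8}{5} = 8.96$$

Third moment about the mean (the cubes keep the sign of each deviation):

$$\mu_3 = \frac{\sum (x - \bar{x})^3}{N} = \frac{(2 - 6.2)^3 + (4 - 6.2)^3 + (6 - 6.2)^3 + (9 - 6.2)^3 + (10 - 6.2)^3}{5} = \frac{-7.92}{5} = -1.584$$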
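
The moment formulas for ungrouped data translate directly into code. The sketch below uses two illustrative function names (`raw_moment` and `central_moment`, both assumptions for this example) and applies them to the data 2, 4, 6, 9, 10. Note that the third moment about the mean comes out negative, because cubing keeps the sign of each deviation.

```python
def raw_moment(data, r):
    """r-th moment: the sum of x^r divided by N."""
    return sum(x ** r for x in data) / len(data)


def central_moment(data, r):
    """r-th moment about the mean: the sum of (x - mean)^r divided by N."""
    mean = raw_moment(data, 1)  # the first moment is the arithmetic mean
    return sum((x - mean) ** r for x in data) / len(data)


data = [2, 4, 6, 9, 10]
raw = [raw_moment(data, r) for r in (1, 2, 3)]
central = [central_moment(data, r) for r in (1, 2, 3)]
print([round(m, 3) for m in raw])
print([round(m, 3) for m in central])
```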

Self-Assessment Test 3.5

1. Compute the Karl Pearson’s coefficient of skewness from the data below

Daily expenditure (Gh¢)   Number of families
0 – 20                    13
20 – 40                   25
40 – 60                   27
60 – 80                   19
80 – 100                  16

2. The ages of students are distributed as follows: mean age was 28 years, the median
age 25 years and modal age 23 years. The standard deviation was computed to be 4.2
years. Find the coefficient of skewness and interpret your value.

CHAPTER FOUR: SAMPLING

Introductory Remarks
Sampling is vital in research and estimation. It is always difficult to conduct research by
using the entire population. More often than not, researchers take a portion of the population. The
portion taken (the sample) needs to have certain features to represent the population. The sampling
techniques differ based on the characteristics of the population. This chapter is very important
since students will use the knowledge gained here to conduct their research.
This chapter will be treated in two units, namely:
Unit 1: Probability Sampling
Unit 2: Non-Probability Sampling

Unit 1: Probability Sampling

Unit Objectives
By the end of this unit, you should be able to:
• Define the term probability sampling
• Mention and explain the types of probability sampling
• State the advantages and disadvantages of probability sampling
• Collect data using different random sampling techniques
• Identify the appropriate probability sampling technique for different populations
4-1.1 Sample, Sampling and Population

• Sample: It is the portion (or subset) taken from the population for study.

• Sampling: It is the process of selecting a portion (or subset) of the larger population
with the objective of estimating the characteristics of the whole population. In research,
sampling is the process of selecting units (e.g., people, organizations) from a
population of interest so that by studying the sample we may fairly generalize our
results back to the population from which they were chosen.

• Population: It is the totality of items or things under consideration, that is, the entire
collection of persons, things, or objects under study. Population
is the collection of all individuals or items under consideration in a statistical study
(Weiss, 1999). To study the larger population, we select a sample.
4-1.2 Purpose of Sampling

Sampling is usually done by researchers to:

• reduce cost
• save time
• prevent homogeneity
• improve the accuracy and quality of the data
• make a study possible when accessing the whole population is impossible, in which case sampling is the only option.

4-1.3 Types of Sampling

There are two main types of sampling: probability (random) sampling and
non-probability sampling.
4-1.4 Probability Sampling
It is the type of sampling where the sample elements are selected in such a
way that every individual in the population has a known chance (in the simplest case, an equal chance) of being selected. In this
sampling technique, the researcher or whoever is doing the sampling must ensure that every
individual has an opportunity for selection, and this can be achieved if the researcher
utilizes randomization. This helps to ensure that the sample selected is free of bias and is a true
representation of the entire population.
Examples of probability sampling
4-1.4.1 Simple random sampling
Simple random sampling is defined as a technique where the sample or subset is
taken from the known elements of the population such that each element has an equal
chance of being selected. Simple random sampling is only possible if the sampling frame
(the number and identity of every item in the population) is known. It is used when the population is
homogeneous (its members have the same characteristics). It can be carried out in either of two ways:
• Number each frame unit from 1 to N, shuffle them in a container and pick randomly.
• Use a random number table or a random number generator to select n distinct
numbers between 1 and N, inclusive.
Simple random sampling is easier to perform for small populations and cumbersome for large populations.
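
The random number generator approach can be sketched in Python with the standard `random` module. This is a minimal illustration, assuming the frame units are simply numbered 1 to N; the function name `simple_random_sample` and the `seed` parameter (used only to make a run repeatable) are assumptions made for this sketch.

```python
import random


def simple_random_sample(population_size, n, seed=None):
    """Select n distinct unit numbers between 1 and N (inclusive), each equally likely."""
    rng = random.Random(seed)  # a seed makes the selection reproducible
    return rng.sample(range(1, population_size + 1), n)


chosen = simple_random_sample(population_size=100, n=20, seed=2018)
print(sorted(chosen))
```

The selected numbers are then matched to the corresponding units in the sampling frame.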

Advantages of simple random sampling

Simple random sampling has the following advantages:
• It reduces the potential for human bias in the selection of cases to be included in the
sample. As a result, the simple random sample provides us with a sample that
is highly representative of the population being studied, assuming that there is limited
missing data.
• Since the units selected for inclusion in the sample are chosen using probabilistic
methods, simple random sampling allows us to make generalizations (i.e., make
statistical inferences) from the sample to the population. Such generalizations are
more likely to be valid.
• It is easy to perform.

Disadvantages of simple random sampling

The disadvantages include:

• It is time consuming, since one needs to gather the full list of a specific population.
• The capital necessary to retrieve and gather the list is high, and hence it is expensive to
do simple random sampling.
• If the homogeneity of the population is not well checked, biases could occur.

4-1.4.2 Stratified random sampling

Stratified random sampling is a method of sampling that involves the division of a
population into smaller groups (homogeneous segments) known as strata, based on members'
shared attributes or characteristics, and the use of simple random sampling to select from each
stratum.

When the target population is heterogeneous, stratified random sampling usually represents
the population better than simple random sampling. Random samples are
taken, in proportion to the population, from each of the strata created. The members of each
stratum have similar attributes and characteristics. This method of sampling is
widely used.

Stratified random sampling is also called proportional
random sampling or quota random sampling.

For stratified sampling,

• Population is divided into nonoverlapping subpopulations called strata

• A random sample is selected from each stratum
• Proportionate: If sampling fraction is equal for each stratum
• Disproportionate: Unequal sampling fraction in each stratum
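
Proportionate stratified sampling can be sketched in Python as follows. The strata names, their sizes and the 10% sampling fraction are hypothetical values chosen only for illustration, and the function name `proportionate_stratified_sample` is likewise an assumption. The same sampling fraction is applied to every stratum, and a simple random sample is drawn within each one.

```python
import random


def proportionate_stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)     # equal sampling fraction per stratum
        sample[name] = rng.sample(members, k)  # simple random sample within the stratum
    return sample


# Hypothetical population of 100 students split into two nonoverlapping strata.
strata = {
    "level_100": list(range(1, 61)),    # 60 students
    "level_200": list(range(61, 101)),  # 40 students
}
picked = proportionate_stratified_sample(strata, fraction=0.10, seed=1)
print({name: len(units) for name, units in picked.items()})
```

For disproportionate sampling, a different fraction would be supplied for each stratum.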

Advantages of stratified sampling

• It ensures each subgroup within the population receives proper representation within
the sample. As a result, stratified random sampling provides better coverage of the
population since the researchers have control over the subgroups to ensure all of them
are represented in the sampling.
• It has the potential to reduce sampling error.

Disadvantages of stratified sampling

• Stratified sampling cannot be used in every study. Several conditions must be met
for it to be used properly.
Researchers must identify every member of the population being studied and classify
each of them into one, and only one, subpopulation. As a result, stratified random
sampling is disadvantageous when researchers cannot confidently classify every
member of the population into a subgroup.
• Finding an exhaustive and definitive list of an entire population can be
challenging.
• Overlapping can be an issue if there are subjects that fall into multiple
subgroups. When the random sampling is performed, those that are in multiple
subgroups are more likely to be chosen, and the result would be a misrepresentation or
inaccurate reflection of the population.

4-1.4.3 Cluster sampling:

In this type of probability sampling, the researcher divides the target population into
nonoverlapping, already existing clusters or areas and uses a random sampling technique to
select some of the clusters. The units in the selected clusters then form the sample. Each cluster is a miniature, or
microcosm, of the population. If the number of elements in the selected clusters is larger
than the desired value of n, these clusters may be subdivided to form a new set of clusters and
subjected to a random selection process.

For example, a researcher wants to survey the academic performance of students of UDS. He
can divide the entire population of UDS into four campuses, with each campus being a cluster.
The researcher then randomly selects one or more campuses and, if a selected campus has more
students than needed, selects a number of students from it using simple or
systematic random sampling.

Cluster sampling is similar to stratified sampling, but the two differ in the following ways.

• Unlike stratified sampling, cluster sampling is appropriate for populations spread over
distinguishable geographical areas.
• In cluster sampling, all the units of the randomly selected clusters form the sample,
while in stratified sampling, the sample is taken from all the strata.
• In stratified sampling, there is homogeneity within groups but heterogeneity
between groups; in cluster sampling, there is heterogeneity within groups but homogeneity between groups.
• Population elements are selected in aggregates in cluster sampling, but in the case of
stratified sampling, population elements are selected individually from each stratum.
• Stratified sampling is done to improve precision, but cluster sampling is done to
reduce cost and improve efficiency.
• Cluster sampling is used when the groups already exist, but for stratified
sampling, the groups may not exist beforehand.

Advantages of cluster sampling

• It is less expensive and quicker, as it reduces the travel costs of contacting sample elements.
• It allows us to obtain information from one or more areas.
• It can combine the advantages of both simple and systematic random sampling.
• It is more convenient for geographically dispersed populations and hence permits the
easy accumulation of large samples.
• It simplifies the administration of the survey.
• For a given budget, it may be more precise than a simple random sample.


Disadvantages of cluster sampling

• Its findings may not be applicable to other areas.
• When the selected clusters are of unequal size, an element of sampling bias
may arise.


4-1.4.4 Systematic sampling:

Systematic sampling is a type of probability sampling in which sample members from a larger population are
selected according to a random starting point and a fixed, periodic interval. This interval is
called the sampling interval, and it is calculated by dividing the population size by the desired
sample size.

• Population elements are an ordered sequence.
• The first sample element is selected randomly from the first k population elements.
• Thereafter, sample elements are selected at a constant interval, k, from the ordered
sequence frame.

For example, suppose the population size is N = 100 and the desired sample size is n = 20.

k = N/n = 100/20 = 5

If the random start among the first 5 units is #4, start with #4 and take every 5th unit: 4, 9, 14, ..., 99.
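
The N = 100, n = 20 illustration can be reproduced with a short Python sketch. The function name `systematic_sample` and its parameters are assumptions made for this example; the sketch also assumes that N is an exact multiple of n and that the units are numbered 1 to N in their frame order.

```python
import random


def systematic_sample(population_size, n, start=None, seed=None):
    """Pick a random start among the first k units, then take every k-th unit."""
    k = population_size // n                         # sampling interval k = N / n
    if start is None:
        start = random.Random(seed).randint(1, k)    # random start between 1 and k
    return [start + i * k for i in range(n)]


units = systematic_sample(population_size=100, n=20, start=4)
print(units)
```

Passing `start=4` fixes the starting unit as in the illustration; leaving it out lets the function choose the start at random.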

Advantages of systematic sampling

• It is relatively easy to construct, execute, compare and understand.
• It also provides researchers and statisticians with a degree of control and sense of
process.
• It is less risky.

Disadvantages of systematic sampling

The method assumes the size of the population is available or can reasonably be
approximated.

Unit 2: Non-Probability Sampling

4-2.1 Non probability-sampling

It is a type of sampling where the selection of elements from the population is done such that
the individuals in the population do not have equal chances of being selected. This is the opposite of
probability sampling.
There are six types of non-probability sampling, which are discussed as follows:
4-2.2 Purposive sampling:
It is a non-probability sampling technique where the elements are selected based on
characteristics of a population and the objective of the study. It is also known as subjective or
selective sampling. This type of sampling is mostly used in qualitative research.
Advantages of purposive sampling
• It is simple and reaches the target sample easily
• It is cheaper to use
• It ensures that the population chosen is evenly sampled and so reliable conclusions
can be drawn
Disadvantages of purposive sampling
• It is prone to researcher bias
• It can be difficult to defend
• There can be a problem of proportionality
4-2.3 Quota sampling:
It is a non-probability sampling technique wherein the assembled sample has the same
proportions of individuals as the entire population with respect to known characteristics, traits
or focused phenomenon. In other words, it is a method of gathering representative data
from a group. As opposed to random sampling, quota sampling requires that representative
individuals are chosen out of a specific subgroup. This method is based on the researcher’s
judgment.
Advantages of quota sampling
• It is relatively cheaper
• It can be performed quickly
• It accounts for population properties
• It is a useful method when probability sampling techniques are not possible
Disadvantages of quota sampling
• Sample selection is not random
• There is potential for bias, which can make the sample unrepresentative of the population
4-2.4 Convenience sampling:
It is a non-probability sampling technique where subjects are selected because of their
convenient accessibility and proximity to the researcher. Usually data is collected from
people who are close to the researcher and are easily accessible. For example, using student
volunteers as subjects for a research study.
Advantages of convenience sampling
• Subjects are readily available
• It saves time
• It also saves money
• It is also very useful in a pilot study
Disadvantages of convenience sampling
• It is possible to introduce bias
• There is a high possibility of sampling error
• Results cannot be generalized
4-2.5 Snowball sampling
This type of non-probability sampling is used where the characteristics to be possessed by the
sample members are rare and difficult to find, and the selection of elements is based on referral from
the earlier selected element. With this sampling, the informant nominates the next element or
individual to be selected. It is mostly used in sociology and statistics research and is often
referred to as referral sampling or chain-referral sampling. For example, prostitutes are difficult
to identify and reach; with snowballing, the researcher needs to find the first prostitute, who will then nominate
the next prostitute for inclusion in the survey.
Advantages of snowballing
• It allows studies to take place where it might otherwise be impossible to conduct them
because of a lack of known participants
• It can also help a researcher to discover characteristics of a population that he/she
was not aware existed.
Disadvantages of snowballing
• It is usually impossible to determine the sampling error or make inferences about
populations based on the obtained sample.
4-2.6 Judgmental sampling
It is a non-probability sampling technique where the researcher selects units to be sampled
based on his or her knowledge and professional judgement. It is also called expert sampling. For
example, a researcher may decide to choose people he thinks are more qualified and are
willing to give more detailed information about something.
Advantages of judgmental sampling
• The approach is well understood and has been refined by experience over several
years.
• No special knowledge of statistics is required
• It is not expensive
• It saves time
Disadvantages of judgmental sampling
• It is not scientific
• It is wasteful, and samples that are too large are usually selected
• The conclusions reached are usually vague
• There is no logic to the selection of a sample or its size
• Personal bias is unavoidable in selecting the sample
4-2.7 Consecutive sampling
Consecutive sampling, also known as enumerative sampling, is a sampling technique in which every subject
meeting the criteria of inclusion is selected until the required sample size is achieved.
Advantages of consecutive sampling
• It is relatively easy to employ
• There is less opportunity for any manipulation
Disadvantages of consecutive sampling
• It is more difficult to do it correctly
• It requires more attention
• It may take longer to fill the sample size

CHAPTER FIVE: PROBABILITY

Introductory Remarks
From the previous chapter, we learnt how to establish relationships between or among variables.
You now know the common descriptive statistics used in behavioral research. It is important
to note that in real life, decisions are made under both certainty and uncertainty.
Under uncertainty, the decision maker cannot tell the actual outcome of his or her
decision. Probability is a quantitative tool that can help us understand and make decisions under
uncertainty. Probabilistic (also called stochastic) models can be used for
prediction, inventory control, quality control, etc. This chapter sets the foundation for you to
learn basic probability. We shall limit our study to simple probability, so do not worry too much
about equations.

Unit 1: Introduction to Probability

Unit 2: Simple and Mutually Exclusive Events

Unit 1: Introduction to Probability

5-1.1 Definition of Probability

It is the likelihood or chance that a particular event will occur. It is the proportion of times that
a specific outcome would happen if an experiment were repeated a very large number of times. This implies that the
probability of an event occurring is the ratio of the number of outcomes in which the event occurs to the total
number of possible outcomes. The value of a probability ranges from 0 to 1.

The theory of probability can be traced to indoor games of chance such as raffles, tossing of
coins, casting of dice, playing of cards, etc.

An event that has no chance of occurring is called a null event and has a probability of zero. An
event which is sure to occur is called a certain event and has a probability of one.

$$\text{Probability of occurrence } (P) = \frac{X}{T}$$

X = number of outcomes in which the event occurs.

T = total number of possible outcomes (the sample space)

5-1.2 Basic Probability Concept

• Sample space versus event: the sample space is the complete set of all possible outcomes
of an experiment, whilst an event is a subset of the sample
space. In tossing a coin twice, the sample space is {HH, HT, TH, TT}, and HH or TH
is an event.
• Experiment: It is an investigation or an operation whose outcome cannot be predicted
with certainty. A game of chance is an example of an experiment.
• Trial: It is a single performance of an experiment.

Unit 2: Simple and Mutually Exclusive Events

5-2.1 Types of Events

• Complement of an event A: it consists of all outcomes that are not part of event A. Symbolically, it
is represented by A'.

• Joint event: it is an event that has two or more characteristics.

• Simple event: it is an event with a single characteristic.

• Mutually exclusive events: Two events are mutually exclusive if both events cannot
occur at the same time. For example, sex is mutually exclusive since one person cannot be
both male and female. (By contrast, whether it rains or not does not depend on the washing of cars;
such events are independent, not mutually exclusive.)

• Collectively exhaustive events: Events are collectively exhaustive if one of the
events must occur. The probabilities of events that are both mutually exclusive and collectively
exhaustive must add up to one. E.g., male and female are mutually exclusive and collectively exhaustive. No one
is both (mutually exclusive) but everyone is one or the other (collectively exhaustive).

5-2.2 Simple or Single Events

For a simple or single event, if the event can happen in A ways and fail to happen in B ways,
then the probability of the event happening is:

$$P(A) = \frac{A}{A + B}$$

The probability of the event not happening is:

$$P(\text{not } A) = 1 - \frac{A}{A + B} = \frac{B}{A + B}$$

5-2.3 Addition rule

P(A or B) = P(A) + P(B) – P(A and B)
For mutually exclusive events, P(A or B) = P(A) + P(B) because P(A and B) = 0

Conditional probability: It is when the probability of a particular event is computed using
the information about the occurrence of another event.

The probability of A given B is equal to the probability of A and B divided by the probability of
B. Thus

$$P(A/B) = \frac{P(A \text{ and } B)}{P(B)}$$

Where P(A and B) = joint probability of A and B
P(B) = marginal probability of B
P(A/B) is read as ‘conditional probability of A given B’.

Question 1.
A standard deck of 52 playing cards contains four suits (spades, hearts, clubs and diamonds),
each of which has 13 different cards (ace, king, queen, jack, 10, 9, 8, 7, 6, 5, 4, 3, 2). There
are 26 red cards and 26 black cards. The black cards comprise 2 aces and 24 non-aces. The red cards
also comprise 2 aces and 24 non-aces. Find the probability that a card drawn is an ace, given that it is black.

Solution

$$P(\text{ace}/\text{black}) = \frac{P(\text{ace and black})}{P(\text{black})} = \frac{2/52}{26/52} = \frac{2}{52} \times \frac{52}{26} = \frac{2}{26} = \frac{1}{13}$$
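
The card probabilities can be verified by listing the whole deck and counting, as in the Python sketch below. It is only an illustration: the helper names (`prob`, `is_ace`, `is_black`) are assumptions, and `fractions.Fraction` is used so that the answers appear as exact fractions rather than decimals.

```python
from fractions import Fraction
from itertools import product

ranks = ["ace", "king", "queen", "jack", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
suits = {"spades": "black", "clubs": "black", "hearts": "red", "diamonds": "red"}
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs


def prob(event):
    """Probability = number of cards satisfying the event / total number of cards."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_ace(card):
    return card[0] == "ace"


def is_black(card):
    return suits[card[1]] == "black"


p_black = prob(is_black)
p_ace_and_black = prob(lambda card: is_ace(card) and is_black(card))
p_ace_given_black = p_ace_and_black / p_black  # P(A/B) = P(A and B) / P(B)
print(p_black, p_ace_and_black, p_ace_given_black)
```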

5-2.4 General Multiplication Rule

The probability of A and B is equal to the probability of A given B times the probability of B.
P(A and B) = P(A/B) * P(B)

Question 2
20 marking pens are displayed in a store. Six are red and 14 are blue. In selecting two
markers from the 20, what is the probability that both markers are red?

Solution

$A_R$ = Second marker selected is red

$B_R$ = First marker selected is red

$$P(A_R \text{ and } B_R) = P(A_R/B_R) \times P(B_R) = \frac{5}{19} \times \frac{6}{20} = 0.079$$
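
The general multiplication rule for the marker question can be checked in Python. The sketch below multiplies the two probabilities as exact fractions and, as an independent check, counts the number of ways of choosing 2 red markers out of all possible pairs. The variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

p_first_red = Fraction(6, 20)                   # 6 red markers out of 20
p_second_red_given_first_red = Fraction(5, 19)  # 5 red markers left out of 19

# General multiplication rule: P(A and B) = P(A/B) * P(B)
p_both_red = p_second_red_given_first_red * p_first_red

# Independent check by counting pairs: C(6, 2) / C(20, 2)
p_both_red_by_counting = Fraction(comb(6, 2), comb(20, 2))

print(p_both_red, round(float(p_both_red), 3))
```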

From question 1, find the probability of selecting a black card.

Solution

$$\text{Probability of selecting a black card} = \frac{26}{52} = \frac{1}{2}$$

$$P(\text{ace}) = \frac{4}{52} = \frac{1}{13}$$

5-2.5 Joint probability

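$$P(\text{black and ace}) = \frac{\text{number of black aces}}{52} = \frac{2}{52} = \frac{1}{26}$$

$$P(\text{ace}) = P(\text{ace and red}) + P(\text{ace and black}) = \frac{2}{52} + \frac{2}{52} = \frac{4}{52} = \frac{1}{13}$$

$$P(\text{ace or black}) = P(\text{ace}) + P(\text{black}) - P(\text{ace and black}) = \frac{4}{52} + \frac{26}{52} - \frac{2}{52} = \frac{28}{52} = \frac{7}{13}$$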

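5-2.6 Mutually exclusive events

A card cannot be both a heart and a spade, so P(heart and spade) = 0 and

$$P(\text{heart or spade}) = P(\text{heart}) + P(\text{spade}) = \frac{13}{52} + \frac{13}{52} = \frac{26}{52} = \frac{1}{2}$$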
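Both uses of the addition rule on the card deck (ace or black, and heart or spade) can be verified with set operations. This is a small sketch under the assumption that each of the 52 cards is equally likely; the helper `p` and the set names are illustrative.

```python
from fractions import Fraction

RANKS = ["ace", "king", "queen", "jack", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
SUITS = ["spades", "hearts", "clubs", "diamonds"]
BLACK_SUITS = {"spades", "clubs"}

# The sample space: 52 (rank, suit) pairs.
deck = {(rank, suit) for rank in RANKS for suit in SUITS}


def p(event):
    """Probability of an event given as a set of cards."""
    return Fraction(len(event), len(deck))


aces = {card for card in deck if card[0] == "ace"}
blacks = {card for card in deck if card[1] in BLACK_SUITS}
hearts = {card for card in deck if card[1] == "hearts"}
spades = {card for card in deck if card[1] == "spades"}

# Overlapping events: subtract the joint probability once.
p_ace_or_black_direct = p(aces | blacks)
p_ace_or_black_rule = p(aces) + p(blacks) - p(aces & blacks)

# Mutually exclusive events: the intersection is empty, so nothing is subtracted.
p_heart_or_spade = p(hearts) + p(spades)

print(p_ace_or_black_direct, p_ace_or_black_rule)  # prints: 7/13 7/13
print(len(hearts & spades), p_heart_or_spade)      # prints: 0 1/2
```

The union `aces | blacks` counts the two black aces only once, which is exactly what subtracting P(ace and black) achieves in the formula.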

Question 3
The probability that Kofi will graduate from UDS is 0.7 and the probability that Ama will
graduate is 0.5. Find the probability that both will complete school.

Solution
Assuming that the two graduations are independent events,

$$P(K \text{ and } A) = P(K) \times P(A) = (0.7)(0.5) = 0.35$$

Question 4
A box contains 2 black balls and 3 white balls. If the balls are drawn without replacement,
find the probability that

a. The first ball drawn is black

b. The second ball drawn is black given that the first ball drawn was black

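Solution
a. $P(\text{first black}) = \frac{2}{3 + 2} = \frac{2}{5}$

b. After one black ball has been removed, 1 black ball and 3 white balls remain, so
$P(\text{second black} / \text{first black}) = \frac{1}{3 + 1} = \frac{1}{4}$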

5-2.7 Probability Rule of Multiplication for Independent Events

$$P(A \text{ and } B) = P(A) \times P(B)$$

Question 5
A fair die is rolled once and a coin is tossed once. Find the probability of getting a two on the
die and a head on the coin.

$$P(2 \text{ and } H) = P(2) \times P(H) = \frac{1}{6} \times \frac{1}{2} = \frac{1}{12}$$
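The independence rule can be illustrated by enumerating the joint sample space of the die and the coin. This is a sketch that assumes a fair die and a fair coin, so all 12 combined outcomes are equally likely; the names used are illustrative.

```python
from fractions import Fraction
from itertools import product

die = [1, 2, 3, 4, 5, 6]
coin = ["H", "T"]

# Joint sample space: every (die face, coin side) combination.
outcomes = list(product(die, coin))
n = len(outcomes)

p_two = Fraction(sum(1 for d, c in outcomes if d == 2), n)
p_head = Fraction(sum(1 for d, c in outcomes if c == "H"), n)
p_two_and_head = Fraction(sum(1 for d, c in outcomes if d == 2 and c == "H"), n)

print(p_two, p_head, p_two_and_head)     # prints: 1/6 1/2 1/12
print(p_two_and_head == p_two * p_head)  # prints: True
```

The final comparison shows that the joint probability equals the product of the two marginal probabilities, which is what independence means.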

Self-Assessment Test 15
For Practice (Source: Heiman, 2011).
1. The probability of any event equals its ………………..in the ……………………
2. As the probability of an event decreases, the event’s relative frequency in this situation
...................
3. As the probability of an event increases, our confidence that the event will occur
………………………
4. Tossing a coin (heads or tails) is sampling ……………..replacement.

Answers
1. relative frequency; population.
2. decreases
3. increases
4. with

References

Weiss, N. A. (1999). Introductory Statistics. Addison Wesley.

Agresti, A. and Finlay, B. (1997). Statistical Methods for the Social Sciences, 3rd Edition.
Prentice Hall.

David et al. (2011). Statistics for Business and Economics.

Waters, D. (2011). Quantitative Methods for Business.

Hannagan (1986). Mastering Statistics. Macmillan Education, 1982.

Armitage, P., Berry, G. and Matthews, J. N. S. (2002). Statistical Methods in Medical
Research. Blackwell. ISBN 978-0-632-05257-8.

Morien, D. (2007). Business Statistics. Thomson Learning Nelson. ISBN 978-0-17-013147-6.

Heiman, G. W. (2011). Basic Statistics for the Behavioral Sciences, Sixth Edition.
Wadsworth, Cengage Learning.
