
Statistical Methods in Business & Social Sciences 1

(A Preliminary Guide)

A Lecture Manual Prepared By:

Dr. E. O. Eleje

Department of Banking & Finance, Federal University Wukari, Taraba State

December, 2015
Chapter One
Introduction to Statistics

1.0 Objectives
By the end of this chapter, students should have refreshed their knowledge on the
following:

 Nature of Business and Social Statistics


 Relevance of Business and Social Statistics
 Distinction between Data and Information
 Nature/Types of Data
 Sources and Instruments of Data Collection

1.1 Meaning and Nature of Business and Social Statistics

Generally, statistics is a branch of mathematics that deals with the analysis and
interpretation of numerical data in terms of samples and populations. As an academic
field of study, statistics is an integral part of the management curriculum and a capstone
integrative course offered to students who have previously been through a set of core
functional area courses. In this sense, statistics is defined as the branch of
mathematics that deals with the analysis and interpretation of numerical data to aid an
enterprise in decisions that determine the direction of the organization and shape its
future.

In modern times, business and social statistics involves the use of scientific methods to
generate and analyze relevant numerical data to produce information necessary for
effective planning and decision making in both private and public organizations.
Business statistics encompasses decision making in the face of uncertainty and is
used in many disciplines, including finance, accounting, economics, production and
operations, service improvement, and marketing research, among others.

1.2 Relevance of Business and Social Statistics

Business and social statistics is useful to any economy or system in diverse ways. A
few of these are documented here for class discussion:

 Effective developmental planning by government
 Aid government budgeting and control
 Enhance information generation about the state of the rural areas
 Help private businesses to plan
 Enhance effective decision making by businesses
 Aid auditing and financial crime detection
 Effective tool for researchers and analysts
 Aid the management of uncertain business scenarios
 Source of employment (e.g., business analyst)
 Guide government in international transactions
 Very relevant in politics (e.g., elections)
 Indispensable at the time of a census.

1.3 Nature of Statistical Data and Information

Data in statistics simply means unprocessed facts while information is a processed
fact. Data are like the ingredients required for the preparation of effective and reliable
information. They are indispensable in statistical analysis and business decision
making. In fact, the main business of statistics is working with and manipulating data
to solve the human problem of indecision. From here onwards, all that we will be
doing in this lecture will focus on data generation, processing, interpretation and
decision making. These are the hallmarks of statistics.

Data could be primary or secondary in nature. Primary data are raw and uncollated.
They are original data which exist in their natural form. Secondary data on the other
hand are historical data already collated, processed and stored in retrievable form.

N/B: A variable or fact does not by itself always qualify as data or information. What
makes it so is its usage. For this reason, a particular variable may be data to some
persons and at the same time information to others, just as cement is a finished
product to the cement manufacturer and a raw material to an estate developer.

1.4 Sources and Instruments of Data Collection

The source of primary data is the field survey. The instruments are self-designed
questionnaires, interviews, or direct observations administered by the researcher. On
the other hand, the sources of secondary data could be companies' publications,
journals, magazines, dailies, seminar and workshop papers, as well as unpublished
materials in the form of handouts and project works earlier done in the area. The main
instruments for secondary data are the library and the internet.

Chapter Two
Sample and Sampling Procedures

2.0 Objectives

By the end of this chapter, students should:

 Know the meaning of statistical population otherwise termed “Universe”


 Know the meaning of a statistical sample
 Understand how to determine a population sample
 Know the meaning and basic purposes of sampling
 Explain the various techniques or methods of choosing a sample (i.e.,
sampling techniques)

2.1 Concept of Statistical Population/Universe


A statistical population, otherwise called the universe, can be defined as the totality of
items or events from which a sample can be selected for statistical analysis,
description and decision making. According to Osuala (1993), it could be described
as a group of things with similar characteristics. Onwumere (2009) defined it as
comprising all elements, subjects, and perhaps observations relating to a particular
phenomenon.

Case 1: Suppose Ganaja village is an electoral ward that consists of 6000 voters,
amongst whom 60% are Christians, 30% Muslims, and 10% scheduled tribe. As an
investigator, represent in a lucid tabular format the population of Ganaja village
according to these categories:

Solution:
Total number of voters = 6000
Population of Christians = 60% (6000) = 3600 voters
Population of Muslims = 30% (6000) = 1800 voters
Population of Scheduled tribe = 10% (6000) = 600 voters

Table 2.1: Population of Voters in Ganaja Village


Category of Population Percentage of Population (%) Number
Christians 60 3600
Muslims 30 1800

Scheduled Tribe 10 600
Total 100 6000

2.2 Statistical Sample and Sample Size Determination


A sample is a representation of a population. It is necessary in research when a
population of study is so large that it would be too difficult to manage without bias. A
sample size should not be assumed unscientifically; otherwise, any decision taken
from it will be highly biased and prone to a high degree of error. Some known
methods exist to determine a sample size to represent a statistical population, such as
the rule of thumb and the one-tenth model (1/10th of the population). However, the
most commonly used is the Taro Yamane model, defined thus:

n = N / (1 + N(e)²)

Where n = Sample size
N = Population size
1 = Constant
e = Tolerable error = 0.05 or 5%

Case 2: Use the above Taro Yamane sample size determination model to draw a
sample from the 6000 population of voters in Ganaja village as contained in Case 1:

Solution:
Total number of voters (N) = 6000
Sample size (n) = ?
Tolerable error (e) = 5% or 0.05

Therefore, n = 6000 / (1 + 6000(0.05)²) = 6000/16

= 375 voters
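As an illustrative aid (not part of the original manual), the Taro Yamane computation above can be checked with a short Python sketch; the function name yamane_sample_size is my own choice.

Python sketch:
def yamane_sample_size(N, e=0.05):
    """Taro Yamane formula: n = N / (1 + N * e**2)."""
    return N / (1 + N * e ** 2)

print(round(yamane_sample_size(6000, 0.05)))   # 6000 / 16 = 375 voters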
2.3 Sampling Process

Sampling may be defined as the process of selecting some part of an aggregate or
totality on the basis of which a judgment or inference about the aggregate or totality is
made. In other words, it is the process of obtaining information about an entire
population by examining only a part of it. In most research work and surveys, the
usual approach is to make generalizations or draw inferences about the parameters of
the population from the samples taken from it.
So we can now define a sample as any number of persons, units or objects selected to
represent the population according to some rule or plan.

The sampling process is different from a census. The census method is the enumeration of
all the members or units of a population to get an idea of the entire population,
whereas sampling is the method of selecting a fraction of the population in such a way
that it represents the entire population.

Sampling is used in practice for a variety of reasons:

 Sampling is cheaper and more economical than the census method, which requires
studying the entire population.
 As the magnitude of operations is small in the case of sampling, data collection
and analysis can be carried out accurately and efficiently.
 Sampling is the best way to make inferences about a population when the
population is as large as that of a country.
 Sampling enables the researcher to make a precise estimate of the standard
error, which helps in obtaining information concerning some characteristic of
the population.

Sampling Methods/Techniques
In the process of choosing a good sample, two basic methods or techniques are
employed: Probability and Non probability sampling methods.

1. Probability Sampling Methods: In probability sampling methods, the universe
from which the sample is drawn should be known to the investigator. Under this
sampling method, every item of the universe has an equal chance of inclusion in the
sample. A lottery method of selecting a student blindfolded from a box containing the
complete list of students' names is the best example of random sampling. It is an
unbiased technique and the best process of selecting a representative sample. The
major disadvantage is that this technique needs the complete sampling frame, i.e. the
list of all the items or the whole population, which is not always available. Probability
sampling methods are of three types:

a.) Simple random sampling: This is a method where each element has an equal
probability or chance of being selected as a sample. It is bias free. Here, the
population is a single group with common characteristics rather than different groups.
Selection is drawn without replacement; for this reason, each element of the group has
only one chance of being selected and cannot appear twice in the sample.

b.) Stratified random sampling: In stratified random sampling the population is first
divided into different homogeneous groups or strata, which may be based upon a single
criterion such as male or female, or upon a combination of criteria like sex, caste,
level of education and so on. This method is generally applied when different categories
of individuals constitute the population. To get an accurate picture of, say, the standard
of living of a particular population, it is advisable to categorize the population on the
basis of caste, religion or land holding; otherwise some sections may be under-
represented or not represented at all.

Stratified random sampling may be of two types: proportionate stratified random
sampling and disproportionate stratified random sampling.

Proportionate stratified random sampling: In the proportionate stratified random sampling
method, the researcher stratifies the population according to known characteristics and
subsequently draws the sample at random from each stratum in proportion to that
stratum's share of the population. That is, the population is divided into several
sub-populations depending upon some known characteristics; these sub-populations
are called strata and they are homogeneous.

Case 3: From table 2.1, proportionately assign the sample size of 375 voters in Case
2 to the various categories of voters in Ganaja village. Represent the sample
proportions in a lucid tabular format:

Solution:
Total number of voters (N) = 6000
Sample size of all voters (n) = 375

Hence;
Sample size for Christians:
3600/6000 x 375 = 225 voters
Sample size for Muslims:
1800/6000 x 375 ≈ 112 voters
Sample size for Scheduled Tribe:
600/6000 x 375 ≈ 38 voters

Table 2.2: Sample size of Voters in Ganaja Village


Category of Sample Percentage of Sample (%) Number
Christians 60 225
Muslims 30 112
Scheduled Tribe 10 38
Total 100 375
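As a hedged illustration (not from the manual), the proportional allocation in Case 3 can be reproduced with a short Python sketch; the dictionary and variable names are my own.

Python sketch:
population = {"Christians": 3600, "Muslims": 1800, "Scheduled Tribe": 600}
N = sum(population.values())      # 6000 voters in all
n = 375                           # overall sample size from Case 2

# Each stratum receives a share of the sample equal to its share of the population.
allocation = {group: round(size / N * n) for group, size in population.items()}
print(allocation)                 # 225, 112 and 38 voters, matching Table 2.2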

Disproportionate stratified random sampling: In disproportionate stratified random
sampling, the number of sampling units drawn from each stratum is not necessarily in
proportion to the stratum's population. Suppose an investigator wants to know the voting
pattern of male and female voters of Ganaja Village, Lokongoma, and Natako; in that
case he must take an equal number of male and female voters from each category. Here
the investigator has to give equal weight to each stratum. This is a biased type of
sampling, and in this case some strata could be over-represented while others are
under-represented.

c) Cluster sampling: This is another type of probability sampling method, in which
the sampling units are not individual elements of the population; instead, groups of
elements or groups of individuals are selected as the sample. In cluster sampling the total
population is divided into a number of relatively small sub-divisions or groups, which
are themselves clusters, and then some of these clusters are randomly selected for
inclusion in the sample. Suppose an investigator wants to study the functioning of the
mid-day meal service in a district; in that case he can use some schools clustered in a
block or two without selecting schools scattered all over the district. Cluster sampling
reduces the cost and labour of collecting the data but is less precise than simple
random sampling.

2. Non Probability Sampling Methods: In this type of sampling, items for the
sample are selected deliberately by the researcher instead of using the techniques of
random sampling. It is also known as purposive or judgment sampling. Some
important techniques of non-probability sampling methods include:

a) Quota sampling: This method of sampling is almost the same as the stratified
random sampling stated above; the only difference is that here, in selecting the
elements, randomization is not done and instead a quota is taken into consideration.

b) Purposive sampling: This is also a non-random sampling method; here the
investigator arbitrarily selects the sample he considers important for the research and
believes to be typical and representative of the population. Say an investigator wants
to forecast the chances of a political party coming into power in a general election;
for that purpose he selects some reporters, some teachers and some elite people of the
territory and collects their opinions. He considers these chosen people as leading
persons on the issue whose views are relevant to the party's chances of coming into
power. As it is a purposive method, it carries large sampling errors and can lead to
misleading conclusions.

c) Systematic sampling: In this method every nth element is selected from a list of the
population having serial numbers. For a large population of 1000 from which a sample
of 100 is taken, the investigator selects every nth name, where n = 10 (1000/100). This
means that the starting name may be any one within the first interval, after which every
10th name is selected, and so on in that order. However, selecting every 10th name may
not represent the different strata or groups that may exist in that large population.
Moreover, once the starting number is decided and data collection has begun, it cannot
be changed or switched to another category, as the definition of the method (systematic)
implies. The list may also repeat the same category of element while passing over
others. It can be biased and misleading, but it is useful for a homogeneous population.
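The following Python sketch (my own illustration, assuming a sampling frame numbered 1 to 1000) shows the every-nth-element selection described above.

Python sketch:
import random

def systematic_sample(frame, sample_size):
    """Select every k-th element after a random start within the first interval."""
    k = len(frame) // sample_size              # sampling interval, here 1000 // 100 = 10
    start = random.randrange(k)                # random starting position, 0..k-1
    return frame[start::k][:sample_size]

frame = list(range(1, 1001))                   # serial numbers 1..1000
sample = systematic_sample(frame, 100)
print(len(sample), sample[:5])                 # 100 names, every 10th after the start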

d) Snowball sampling: This is a sociometric sampling technique generally used to
study a small group. All the persons in a group identify their friends, who in turn know
their friends and colleagues, until the informal relationships converge into some type
of definite social pattern. It is just like a snowball that goes on increasing in size as it
rolls through an ice-field. For drug addicts, for example, it is difficult to find out who
the drug users are, but when one person is identified he can give the names of his
partners, and each of his partners can name another two or three whom he knows use
the drug. In this way the required number of persons is identified and data are
collected from the group. This method is suitable for studies of the diffusion of
innovation, network analysis, and decision making.

e) Double sampling: In this method the sample is drawn twice. First, a large sample is
selected and a mailed questionnaire is sent to the respondents (say 500). After
receiving the answered questionnaires (say 300, as not all mailed questionnaires come
back), the investigator again randomly draws the required number of respondents (say
100) and sends them the modified questionnaire. This method is time consuming and
expensive.

Figure 2.1: Diagrammatical Summary of the Sampling Methods/Techniques

(Tree diagram: Sampling techniques split into probability/random sampling, comprising simple, stratified (proportionate or non-proportionate) and cluster sampling; and non-probability/non-random/judgmental sampling, comprising quota, purposive, systematic, snowball and double sampling.)

Source: Author

Chapter Three

Methods of Data Presentation


3.0 Objectives

By the end of this chapter, students should be able to:


 Know the various ways data can be presented for analysis
 Explain the constituents of the numerical form of data presentation
 Explain the different methods in which data can graphically be presented
 Know the meaning and makeup of Audio-visual method of data presentation

3.1 Purpose of Data Presentation


Data presentation is the arrangement of data in a form in which it can be effectively
managed. Good presentation can make statistical data easy to read, understand and
interpret. It is therefore very important for any investigator to present data clearly.
For the purpose of clarity and effective analysis, statistical data can be presented in
three major ways. We discuss these methods in the following subsections.

3.2 Numerical Form of Data Presentation


This is a method of data presentation which uses numbers to communicate
information. The process may include placing numerical values within sentences, at
the beginning, middle, or end of a sentence. For instance, one can write: 25 students
came to class early today; 8 were late while 4 were absent. We will be requiring 4
items for the assignment tomorrow, etc.

Numerical data presentation sometimes could also involve the use of table to
summarize data in a distribution. This is necessary when multiple number values
occur in a sentence or paragraph. For example, consider this statement:

Case 4: 20 female students and 15 male students passed mathematics, 50 female


students and 70 male students passed English, 40 male students and 20 female
students passed physics, while 30 male students and 15 female students passed
statistics.
This kind of distribution or lengthy statement is better summarized in a table thus:

Subjects Male Students Female Students Total


Mathematics 15 20 35
English 70 50 120
Physics 40 20 60
Statistics 30 15 45
Grand Total 155 105 260

3.3 Graphical Method of Data Presentation


This form of data presentation uses graphs to create a quick picture of a situation,
scenario, behavioural pattern, or other phenomenon of a distribution to aid effective
planning and decision making. Graphical forms of data presentation include the
pictogram, histogram, bar chart, pie chart, and line graph, otherwise called the
frequency polygon.

A. Pictogram: As the name implies, pictogram uses pictures, paintings, or drawings


to give a quick and easy meaning to statistical data. Figure 3.1 represents students who
scored the grades as shown in a statistics exam:

Figure 3.1: Pictogram showing grades of Students in a Statistics Examination


(Pictogram image: a row of student icons for each grade, A to F.)

B. Histogram: This is a set of vertical bars or columns whose areas are proportional
to the frequencies they represent. The bars of a histogram are joined together. The
students’ grade in statistics examination in figure 3.1 is represented in the histogram
below:
Figure 3.2: Histogram showing grades of Students in a Statistics Examination

(Histogram image: frequency on the vertical axis, grades A to F on the horizontal axis.)

C. Bar Chart: A bar chart is similar to a histogram. It is also a set of vertical bars
whose areas are proportional to the frequencies they represent. However, in a bar
chart the bars are not joined together but are separated from each other by equal
gaps. The students' grades in the statistics examination in figure 3.1 are represented in
the bar chart below:

Figure 3.3: Bar Chart showing grades of Students in a Statistics Examination

D. Pie Chart: A pie chart is a graphical representation in the form of a circular 'pie'.
To prepare a pie chart, the values of a distribution are first converted into degrees, and
the sum of all converted values MUST equal 360 degrees. In figure 3.1, the sum total
of the students' grades in the statistics examination (20) makes up the whole pie of
360°. Each piece of the pie is a sector of the circle, as represented in the pie chart
below:

Figure 3.4: Pie Chart showing grades of Students in a Statistics Examination

E. Line Graph/Frequency Polygon: A line graph is an alternative to the histogram.
It is another graphical method of data presentation in which straight line segments
join the points plotted at the mid-points of the class intervals against their
frequencies. It is also called a frequency polygon. The students' grades in the statistics
examination in figure 3.1 are represented in the line graph below:

Figure 3.5: Line graph showing grades of Students in a Statistics Examination

3.4 Audio-Visual Method of Data Presentation
Recent innovations in computer and information technology have significantly
facilitated the presentation and management of data. Today, data can easily be
inputted and processed via electronic devices such as personal computers, film
projectors, television gadgets, mobile phones, etc. The use of electronic devices to
present data for easy and accurate data management is the audio-visual method of
data presentation. It is audio where the electronic device is capable of producing
sound and visual where the data can be viewed on a screen.

Chapter Four
Statistical Data Analysis and Frequency Distributions

4.0 Objectives

After studying this chapter, students will:


 Clearly understand the proper meaning and steps of statistical analysis
 Understand the basic concepts in statistical analysis such as descriptive and
inferential statistics, quantitative and qualitative statistics; parametric and
non-parametric statistics, among others
 Know how to construct and use frequency distributions
 Know the methods and benefits of calculating the Arithmetic mean, median
and mode (i.e., central tendencies)
 Know the rationale and methods of calculating the key measures of
dispersion (i.e., variance and standard deviation)
 Distinguish between absolute and relative dispersion
 Understand how to estimate the coefficient of variation or dispersion
 Understand what is meant by skewness and kurtosis of distributions

4.1 What is Statistical Data Analysis?


Simply put, it is a process of analyzing data to obtain useful information. Yes, but
this definition is not enough. More precisely, statistical analysis deals with
quantitative data. It is better conceptualized as a scientific method of analyzing
masses of numerical/quantitative data so as to summarize the essential features
and relationships of the data in order to generalize from the analysis to determine
patterns of behaviour, particular outcomes, or future tendencies.

The theory underlying the concept of statistical analysis is based on the


mathematics of probability which provides the bases for determining not only the
general characteristics of data but also the reliability of each generalization.
Statistics can be applied in any field in which there is extensive numerical data. It
can be applied in the field of finance, accounting, business, economics, public

administration, engineering, medicine, among others. From the viewpoint of
management sciences, the basic steps in statistical data analysis will include:
 Collecting the data from record or other sources or from sample surveys;
 Arranging the data into manageable form;
 Analysing and interpreting the figures by means of statistical techniques;
 Using the calculated results to make rational/informed decisions.

4.2 Basic Concepts in Statistical Analysis

A. Descriptive and Inferential Statistics: Descriptive statistics deals with the


presentation of numeric facts in either tables or graphs. The major uses are to
describe a sample and to get a feel for data. Inferential statistics on the other hand
is a process by which conclusions are drawn about some measures or attribute of a
population based upon analysis of sample data.

B. Qualitative and Quantitative Statistics: Quantitative data are measures of
values or counts expressed in numbers, for example: how many, how much, how
often, etc. Qualitative data, on the other hand, are measures of 'types' and may be
represented by a name, a symbol, or a number code. They are data about categorical
variables, e.g. 'what type'.

Hence, quantitative statistics categorize information collected through


experimental means expressed and evaluated numerically, while qualitative
statistics evaluate such information related to the quality, character, or
composition of the subject being studied.

C. Parametric and non-Parametric Statistics: Parametric statistics or test is


concerned with determining the type of relationships existing between sets of
numeric otherwise called cardinal variables in a population or sample. They are
normally concerned with the parameter of a normal distribution such as mean,
proportion, or standard deviation. Examples of parametric tests include T-test, F-
test, Analysis of variance (ANOVA), Pearson Product moment of correlation,
Regression, etc.

Non-parametric statistics or test on the other hand is concerned with determining
the type of relationship existing between sets of non-numeric otherwise called
ordinal variables in a population or sample. Examples of non-parametric tests
include chi-square test, Spearman’s rank order correlation, etc.

D. Hypothesis and Hypothesis Testing: A hypothesis is a testable belief,
assumption, or intelligent guess at the answer to a question, whose validity is subject
to empirical verification. Hypothesis testing is the process by which the belief or
assumption is tested by statistical means.

4.3 Frequency Distribution Concepts


Frequency means the number of times something happens. For example, in
Chapter Three of this manual above, six students got grade B in figure 3.1. The
frequency of grade B is therefore six.

In statistics, before data can effectively be analyzed, it is normal to arrange the


raw data into a manageable form. Frequency distribution involves a process of
arranging data into manageable form for it to be effectively used or analyzed. A
frequency distribution is expected to show:
 The variable values, and;
 The number of occurrences (frequency) of each value or class of values.

Frequency distribution can be a simple distribution or a grouped distribution.

4.3.1 Simple Frequency Distributions


A simple frequency distribution as the name implies is a distribution with few
manageable elements or values. The elements or raw scores in a simple
distribution are not many and as such do not require to be grouped when preparing
the data for analysis. Analysis of a simple distribution will require the use of the
raw values in the distribution. The advantage of this is that the final result from
analysis of a simple distributed data is more accurate and reliable.

Case 5: Example of a simple frequency distribution could be the scores of 20
students in a business statistics test viz: 60, 40, 60, 50, 40, 45, 45, 70, 70, 60, 40,
45, 44, 72, 60, 40, 72, 50, 40, and 60.

To form an array of the distribution, the data are re-arranged in ascending order viz: 40,
40, 40, 40, 40, 44, 45, 45, 45, 50, 50, 60, 60, 60, 60, 60, 70, 70, 72, 72

Table 4.1:

Simple Frequency Distribution of Students Score in a Statistics Examination


Scores of Students (x) Frequency (f) Fx
40 5 200
44 1 44
45 3 135
50 2 100
60 5 300
70 2 140
72 2 144
Total 20 1063

4.3.2 Grouped Frequency Distributions


Oftentimes however, a frequency distribution can be so large that it becomes
difficult to treat the elements (scores) in the distribution individually when
analyzing the distribution. At this point, it becomes imperative to group the
elements into classes. This will give rise to another form of distribution called
grouped frequency distribution. Hence, a grouped frequency distribution shows
the variable values and the number of occurrences of each class of values.

Case 5: The outputs of 50 operators in Salem University pure water factory were
recorded during a shift as follows:
Table 4.2: Bags of Pure Water Bagged by Workers in Salem Pure
Water Factory
601 702 876 965 1001
1023 787 1290 548 1196
845 1321 779 1123 799
670 789 898 987 1135
921 902 615 1189 1056
1019 1098 908 876 966
987 589 890 824 690
1022 1242 1280 800 567
934 1390 812 1399 1043

1156 1278 912 1479 1485
The data in table 4.2 could be re-arranged in ascending or descending order and
would then be termed an ‘array’. More simply, the values would be grouped into
classes and the frequency of the class entered. Such an arrangement would be as
follows:

Table 4.3: Grouped Frequency Distribution of Bags of Pure Water Bagged by


Workers in Salem University Pure Water Factory
Output of Bags of Pure water Number of Workers (Frequency)
500 < 600 3
600 < 700 4
700 < 800 5
800 < 900 8
900 < 1000 9
1000< 1100 7
1100< 1200 5
1200< 1300 4
1300< 1400 3
1400< 1500 2
∑f 50

Notes:
 A grouped frequency table such as table 4.3 is a convenient and
informative method of summarizing the original raw data albeit with some
loss of accuracy.
 The above table uses equal class intervals. However, in some occasions,
unequal or open ended class intervals are used.
 You will observe that ∑f = 50, i.e. the number of recordings in the original
data.
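As an illustrative sketch (not part of the manual), the tallying that produced Table 4.3 can be done in Python; the function name and the short list of sample values are my own.

Python sketch:
def grouped_frequencies(outputs, lower=500, upper=1500, width=100):
    """Count how many raw outputs fall in each class interval [lo, lo + width)."""
    classes = {(lo, lo + width): 0 for lo in range(lower, upper, width)}
    for value in outputs:
        for (lo, hi) in classes:
            if lo <= value < hi:
                classes[(lo, hi)] += 1
                break
    return classes

# Applied to the full 50 readings of Table 4.2 this reproduces the frequencies of Table 4.3.
print(grouped_frequencies([601, 702, 876, 965, 1001, 548]))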

4.4 Charting the Grouped Frequency Distributions


A grouped frequency distribution can be charted using the different graphical methods
of data presentation demonstrated in section 3.3 of the previous chapter. However, the
most common and frequently used methods are the histogram and the line graph.

Case 6: Chart the above grouped frequency distribution in a concise Histogram
and line graph using a scale of 2cm = 100units on the X axis and 2cm = 1 unit on
the Y axis. Depict the same grouped frequency distribution by a Cumulative
Frequency Curve using a scale of 2cm = 100units on the X axis and 2cm = 5 units
on the Y axis respectively. From your cumulative frequency curve, determine the
output of the median worker.

Solution:
The histogram, line graph and cumulative frequency curve in figures 4.1 to 4.3 below
were generated from the grouped frequency distribution of bags of pure water bagged
by workers in Salem University pure water factory in table 4.3 above with the aid of
the Excel computer software. Meanwhile, also find attached a manually drawn
histogram, line graph and cumulative frequency curve using the approved standard
statistical graph sheet.

You will remember that a histogram as earlier stated is a set of vertical bars or
columns whose areas are proportional to the frequencies they represent.

Figure 4.1:
Histogram Showing Bags of Pure Water Bagged By Workers in Salem University
Pure Water Factory

(Histogram image: output classes from 500 to 1500 bags on the horizontal axis, number of workers on the vertical axis.)

Similarly, a line graph or frequency polygon on the other hand is a straight line
joining the mid-points of the class intervals proportionately to the frequencies.

Figure 4.2:
Frequency Polygon Showing Bags of Pure Water Bagged By Workers in Salem
University Pure Water Factory

(Frequency polygon image: class mid-points from 550 to 1450 bags on the horizontal axis, number of workers on the vertical axis.)

Cumulative frequency curve: Otherwise called Ogive, cumulative frequency


curve is formed by plotting the cumulative frequencies of a distribution against the
upper limit of each class interval. Cumulative frequency curve is an easy way of
determining the median of a distribution.
Before sketching our graph, we will first compute the cumulative frequencies of
the distribution in table 4.3 above as follows:
Table 4.4: Cumulative Frequency Table of Pure Water Bagged By Workers in the Factory
Output of Pure water No. of Workers (Freq) Cumulative Frequency
500 < 600 3 3
600 < 700 4 7
700 < 800 5 12
800 < 900 8 20
900 < 1000 9 29
1000< 1100 7 36
1100< 1200 5 41
1200< 1300 4 45
1300< 1400 3 48
1400< 1500 2 50
∑f 50
Figure 4.3:
Cumulative Frequency Curve Showing Bags of Pure Water Bagged By Workers in Salem
University Pure Water Factory

(Ogive image: output of bags from 500 to 1500 on the horizontal axis, cumulative frequency from 0 to 50 on the vertical axis.)
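A hedged Python/matplotlib sketch (the manual used Excel and graph sheets; matplotlib is my substitution) that reproduces charts like figures 4.1 and 4.3 from the Table 4.3 and 4.4 figures:

Python sketch:
import matplotlib.pyplot as plt

lower_bounds = [500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400]
frequencies  = [3, 4, 5, 8, 9, 7, 5, 4, 3, 2]
upper_bounds = [lo + 100 for lo in lower_bounds]

# Running totals give the cumulative frequencies 3, 7, 12, ..., 50 of Table 4.4.
cumulative, running = [], 0
for f in frequencies:
    running += f
    cumulative.append(running)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.bar(lower_bounds, frequencies, width=100, align="edge", edgecolor="black")
ax1.set(title="Histogram", xlabel="Output of bags", ylabel="Number of workers")
ax2.plot(upper_bounds, cumulative, marker="o")
ax2.set(title="Cumulative frequency curve (ogive)", xlabel="Output of bags",
        ylabel="Cumulative frequency")
plt.show()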

4.5 Characteristics of Distributions


Statistical analysis, as earlier explained, seeks to provide summary measures which
describe the important features or characteristics of a distribution of values. There
are four such important features of distributions, namely:

 Averages: That is, the typical size of the distribution otherwise known as the
central tendency. For our purpose, the most important measures are the
arithmetic mean, the median, and the mode.
 Dispersion: This is the variation, spread, or scatter of the distribution for
which the most important measures are the variance and the standard
deviation. Other measures of variation are the range (i.e., the difference
between the largest and smallest values); and the semi-interquartile range
(i.e., half the range of the middle 50% of items).
 Skewness: This is the lopsidedness or asymmetry of a distribution.
 Kurtosis: The peakedness or height of a distribution.

4.5.1 Averages
The most important measure of central tendency or average of a distribution is the
arithmetic mean or simply the mean. It is mathematically denoted by x̄.

A. Mean of Frequency Distributions

i. Mean of a Simple Frequency Distribution

For an array of a simple distribution, the mean is calculated thus:

x̄ = ∑x/n

Where x = each element or value in the distribution
n = total number of elements

Case 7: Using the array of scores of the 20 students in a business statistics test in
case 5 above, determine the mean (x̄) of the simple distribution:

Solution:
Mean (x̄) =

40+40+40+40+40+44+45+45+45+50+50+60+60+60+60+60+70+70+72+72
20

x̄ = 1063/20 = 53.15

But for a tabulated simple frequency distribution, the mean is calculated thus:

x̄ = ∑fx/∑f

Where fx = each value (score) in the distribution multiplied by its frequency
f = frequency or number of occurrences of each value in the distribution

Case 8: Using data from the tabulated simple frequency table of scores of the 20
students in a business statistics test above, here reproduced below, determine the
mean (x̄) of the simple distribution:

Table 4.5:

Simple Frequency Distribution of Students Score in a Statistics Examination
Scores of Students (x) Frequency (f) Fx
40 5 200
44 1 44
45 3 135
50 2 100
60 5 300
70 2 140
72 2 144
Total 20 1063

Solution:
Mean (x̄) = ∑fx/∑f
Mean (x̄) = 1063/20 = 53.15

ii. Mean of a Grouped Frequency Distribution

The formula of the mean of a grouped frequency distribution is similar to that of


the tabulated simple frequency distribution above but with slight variation in
computation. It is also calculated thus:

x̄ = ∑fx/∑f

Here, fx = each class of value (i.e., midpoint of each class interval) in


the grouped distribution multiplied by the frequency or
number of values that fall within each class interval.
f = frequency or number of values that fall within each class
interval in the grouped distribution.

Case 9: Using data from the above grouped frequency distribution table 4.3 of
bags of pure water bagged by workers in Salem University pure water factory,
determine the mean (x̄) of the distribution:

Solution:

Hint: - Because the distribution is grouped, we will need to compute the x column
first before fx column.

Table 4.6: Mean Distribution of Bags of Pure Water Bagged by Workers in


Salem University Pure Water Factory
Output of Bags Class Midpoint No of Workers Fx
of Pure water (x) (f)
500 < 600 550 3 1650
600 < 700 650 4 2600
700 < 800 750 5 3750
800 < 900 850 8 6800
900 < 1000 950 9 8550
1000< 1100 1050 7 7350
1100< 1200 1150 5 5750
1200< 1300 1250 4 5000
1300< 1400 1350 3 4050
1400< 1500 1450 2 2900
∑f = 50 ∑fx = 48,400

Hence; Mean () = ∑fx/∑f


Mean () = 48,400/50 = 968 Bags of Pure Water.

B. Median of Frequency Distributions


For a simple distribution, the median of any set of data is the middle value (n) in
order of size if n is odd. On the other hand, the mean of the two middle items
becomes the median if n is even. However, in a grouped frequency distribution,
the median is best determined using the cumulative frequency curve (Ogive). This
is done by determining the 50th percentile of the total cumulative frequency value;
then, tracing the y-value coordinates to x-coordinates. The median output for our
pure water business above is 955 bags (See the attached cumulative frequency
curve).

Note; where the data in a frequency distribution contain a few very large or small
values, the median value is often considered to be a more representative value
than the mean although it cannot be used for subsequent calculations as is possible
with the mean.
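As a hedged alternative to reading the ogive, the grouped median can also be obtained by linear interpolation within the median class; the Python sketch below uses the Table 4.4 classes and is my own illustration.

Python sketch:
classes     = [(500, 600), (600, 700), (700, 800), (800, 900), (900, 1000),
               (1000, 1100), (1100, 1200), (1200, 1300), (1300, 1400), (1400, 1500)]
frequencies = [3, 4, 5, 8, 9, 7, 5, 4, 3, 2]

target = sum(frequencies) / 2        # the 25th worker
cumulative = 0
for (lo, hi), f in zip(classes, frequencies):
    if cumulative + f >= target:     # the median class: 900 <= x < 1000
        median = lo + (target - cumulative) / f * (hi - lo)
        break
    cumulative += f
print(round(median))                 # about 956, close to the 955 read off the ogive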
C. Mode of Frequency Distributions

Simply put, the mode is the value with the highest frequency. More precisely, the
mode of a frequency distribution is the value which occurs most often, or the value
around which there is the greatest degree of clustering. Ordinarily, the mode is only
meaningful if there is a marked or significant clustering of values around a single
point; otherwise, the mode is not often of much concern.

N/B: In a symmetrical distribution, the mean, median, and mode must have the
same value. But in asymmetric conditions, their values differ (see skewness
below).

4.5.2 Dispersion

Dispersion is the scatter or variation of a set of values in a distribution. A measure


of the degree of dispersion of data is needed for two basic reasons:

 To assess the reliability of the average of the data, and;


 To serve as a basis for control of the variability. For instance, assessment
of the degree of quality or output variation is an essential part of quality
and quantity control procedure in our Salem University pure water factory.
The most important measures of dispersion are standard deviation denoted
in this manual by δ and variance denoted by δ2.

For an ungrouped distribution, the formulae for the variance and standard deviation are:

Variance (δ²) = ∑(x - µ)² / N

Standard Deviation (δ) = √[∑(x - µ)² / N]

Where µ = population mean and N = population size.

For a grouped distribution, the formulae for the variance and standard deviation are:

Variance (δ²) = ∑f(x - x̄)² / (∑f - 1)

Standard Deviation (δ) = √[∑f(x - x̄)² / (∑f - 1)]

Case 10: Using data from the same grouped frequency distribution table 4.3
above determine the output variance and standard deviation of bags of pure water
produced in the SU factory:

Solution:

Hint: - From the formula, we shall need to compute the following columns: x,
(x - x̄), (x - x̄)² and f(x - x̄)² respectively. However, remember that the mean (x̄) of the
distribution is 968 bags, as already determined in case 9.

Table 4.7: Grouped Frequency Distribution of the Variance and Standard


Deviation of Bags of Pure Water in SU Pure Water Factory
Output (x) (f) (x - x̄) (x - x̄)² f(x - x̄)²
500 < 600 550 3 -418 174724 524172
600 < 700 650 4 -318 101124 404496
700 < 800 750 5 -218 47524 237620
800 < 900 850 8 -118 13924 111392
900 < 1000 950 9 -18 324 2916
1000< 1100 1050 7 82 6724 47068
1100< 1200 1150 5 182 33124 165620
1200< 1300 1250 4 282 79524 318096
1300< 1400 1350 3 382 145924 437772
1400< 1500 1450 2 482 232324 464648
50 2,713,800

Hence; Variance (δ²) = ∑f(x - x̄)² / (∑f - 1)

Output Variance (δ²) = 2,713,800 / (50 - 1) ≈ 55,384 (bags²).

Standard Deviation (δ) = √[∑f(x - x̄)² / (∑f - 1)]

= √(2,713,800 / 49)

≈ 235 bags
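A minimal Python sketch of the grouped variance and standard deviation, using the manual's divisor ∑f - 1 and the Table 4.7 figures (illustration only):

Python sketch:
midpoints   = [550, 650, 750, 850, 950, 1050, 1150, 1250, 1350, 1450]
frequencies = [  3,   4,   5,   8,   9,    7,    5,    4,    3,    2]

sum_f = sum(frequencies)                                                  # 50
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / sum_f         # 968
ss = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies))     # 2,713,800
variance = ss / (sum_f - 1)          # about 55,384 (bags squared)
std_dev = variance ** 0.5            # about 235 bags
print(round(variance), round(std_dev))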

Relative and Absolute Dispersions


The standard deviation provides basic information on the absolute dispersion of a
given distribution. Normally, the higher the standard deviation of a distribution,
the greater the amount of variation or scatter of such a distribution.

It is often necessary to have a measure of the relative dispersion of a distribution,
particularly when distributions are being compared. The measure which provides this
relative view is the coefficient of variation, otherwise called the coefficient of
dispersion. This is simply the ratio of the standard deviation of a distribution to the
mean of the distribution, expressed as a percentage.
Mathematically, this can be demonstrated thus:

Coefficient of variation = (δ / x̄) × 100
Case 11: Take the grouped frequency distribution table 4.3 above as distribution
‘A’ with mean of 968 and standard deviation of 235. Assume that another set of
distribution ‘B’ has a mean of 735 and standard deviation of 321. Does
distribution ‘A’ vary more or less than distribution ‘B’?

Solution:
δA = 235; x̄A = 968
δB = 321; x̄B = 735

Coefficient of variation (A) = (235 / 968) × 100 = 24.28%
Coefficient of variation (B) = (321 / 735) × 100 = 43.67%

Therefore, distribution B is relatively more variable.
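A minimal Python sketch of the Case 11 comparison (illustration; the helper name is my own):

Python sketch:
def coefficient_of_variation(std_dev, mean):
    """Relative dispersion: standard deviation as a percentage of the mean."""
    return std_dev / mean * 100

cv_a = coefficient_of_variation(235, 968)    # 24.28 %
cv_b = coefficient_of_variation(321, 735)    # 43.67 %
print(round(cv_a, 2), round(cv_b, 2))        # distribution B varies more, relatively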

4.5.3 Skewness of Distributions
Skewness occurs where there is a lack of symmetry or evenness in a distribution.
When there is no symmetry in a distribution, such distribution is said to be
asymmetric. The effect of asymmetric distribution is that the mean, median, and
mode will manifest differing values. A distribution could be positively skewed or
negatively skewed. The diagrams below illustrate the two forms of skewness:

Figure 4.4: Skewness of Distributions

(Sketch: two frequency curves plotted against value. In the positively skewed curve the mode comes first, then the median, then the mean; in the negatively skewed curve the order is reversed, with the mean first, then the median, then the mode.)

Skewness of a distribution is usually treated in descriptive terms rather than


summarized by a single measure. To estimate the accurate measure of skewness of
a distribution will require advanced techniques. However, an appropriate and
reliable measure of the amount and direction of skewness can be found using the
Pearson coefficient of skewness represented mathematically thus:

SK = 3(Mean - Median) / δ

However, it is important to assert that the above formula still has limited practical
application.
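A minimal Python sketch of the Pearson coefficient of skewness, applied to the pure water distribution (mean 968, median about 955, standard deviation 235 from the earlier cases); the function name is my own.

Python sketch:
def pearson_skewness(mean, median, std_dev):
    """SK = 3(mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev

print(round(pearson_skewness(968, 955, 235), 2))   # about 0.17, a mild positive skew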

Review question 1:
Estimate the nature of skewness of the two distributions ‘A’ and ‘B’ in Case 11
above assuming the median output of ‘A’ is 520 bags and ‘B’ 450 bags
respectively.

4.5.4 Kurtosis of Distributions

Kurtosis relates to the peakedness of a distribution viewed from the frequency


distribution curve. It ranges from distributions which are:

 Platykurtic – That is, flatter than normal curve


 Mesokurtic – That is, normal distribution curve; and,
 Leptokurtic – That is, more peaked than normal curve.

Figure 4.5 below is an illustration of these forms of kurtosis:

Figure 4.5: Kurtosis of Distributions

A = Leptokurtic
B = Mesokurtic
C = Platykurtic

Chapter Five
Correlation Statistics

5.0 Objectives

After studying this chapter, students will:


 Understand the meaning and forms of correlation
 Distinguish between zero correlation and “nonsense” correlation
 Understand the key measures of correlation, viz:
 Spearman’s rank order coefficient of correlation (R) and;
 Pearson Product Moment coefficient of correlation (r)
 Coefficient of Determination (r2)

5.1 Meaning of Correlation


Correlation as the name suggest, is a measure of the degree to which two variables
are related. For instance, if x and y are two variables, Correlation would be a linear
association between them, it can be regarded as a method for measuring how well the
best relationship of a specified kind fits a set of data.

5.2 Forms of Correlation

A. Perfect Correlations

A perfect correlation is a relationship in which a straight line can be drawn through all
the points. The coefficient of a perfect positive correlation is +1, while that of a
perfect negative correlation is -1.
B. Partial Correlations

A partial correlation is a form of correlation in which some interrelationship, but not
an exact relationship, exists. The coefficient of a partial positive correlation lies
between 0 and +1, while that of a partial negative correlation lies between 0 and -1.

C. Zero Correlation (Uncorrelated)

A zero correlation is a case of no relationship at all. The coefficient of a zero
correlation is zero.

 Nonsense Correlation: A nonsense correlation occurs in a situation where two
variables produce a high calculated ‘r’ value and yet no causal relationship
exists between the two variables.

5.3 Measures of Correlation


A standard measure of correlation is the coefficient of correlation. Two basic
coefficient models are often used in management and social sciences: Spearman’s
rank order and Pearson’s product moment coefficients of correlation.

5.3.1 Spearman’s Rank Order Coefficient of Correlation (R)


The Spearman’s Rank Order coefficient of correlation model is of the form:

R = 1 - [6∑d² / n(n² - 1)]

Case 12: A group of 8 Accounting and Business Administration students are tested in
business mathematics and business statistics tests. Their performance ranking in the
two tests were as shown in table 5.1.
Table 5.1: Positions of Students in Business Statistics and Business Mathematics Tests
Students Business Statistics Business Mathematics
A 2 3
B 7 6
C 6 4
D 1 2
E 4 5
F 3 1
G 5 8
H 8 7

You are required to determine the nature of association (R) between the two subjects
and interpret your result.

Solution:
Looking at the formula, a new table capturing 'd' and d² is constructed as follows:

Table 5.2: Spearman’s Rank Order Coefficient of Correlation Table
Students Statistics Mathematics d d²
A 2 3 -1 1
B 7 6 1 1
C 6 4 2 4
D 1 2 -1 1
E 4 5 -1 1
F 3 1 2 4
G 5 8 -3 9
H 8 7 1 1
∑d² 22

Hence; Coefficient of Corr. (R) = 1 - [6∑d² / n(n² - 1)]

R = 1 - [6 × 22 / 8(8² - 1)]
R = +0.74
Interpretation:
A positive R of 0.74 for the students' performance implies a partial but strong
positive agreement or correlation between the two subjects.
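A minimal Python sketch of the Case 12 computation (illustration only; the list names are mine):

Python sketch:
statistics_rank  = [2, 7, 6, 1, 4, 3, 5, 8]   # students A to H
mathematics_rank = [3, 6, 4, 2, 5, 1, 8, 7]

n = len(statistics_rank)
sum_d2 = sum((s - m) ** 2 for s, m in zip(statistics_rank, mathematics_rank))   # 22
R = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(round(R, 2))   # 0.74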

Tied Rankings

A slight adjustment to the above formula is necessary if some students obtain the same
marks in the test and thus are given the same ranking.

The adjustment is of the form:

(t³ - t) / 12

Where t is the number of tied rankings. Given this, therefore, the adjusted formula is
thus:

Coefficient of Corr. (R) = 1 - 6[∑d² + (t³ - t)/12] / [n(n² - 1)]

Review Question 2: Assume that students E and F achieved equal marks in statistics
and were given equal third place; determine the coefficient of correlation of the
students’ performance.

5.3.2 Pearson’s Product Moment Coefficient of Correlation (r)
The Pearson’s product moment coefficient of correlation model is of the form:

r = [n∑xy - ∑x∑y] / [√(n∑x² - (∑x)²) · √(n∑y² - (∑y)²)]

Case 13: The following data have been collected in respect of sales and advertising
expenditure in a manufacturing company.
Table 5.3: Relationship between Advertising and Sales Volume of a Firm
Advertising Expenditure (N million) Y Sales Volume (N million) X
8.5 210
9.2 250
7.9 290
8.6 330
9.4 370
10.1 410
Determine the nature and rate of correlation (r) between advertising expenditure and
sales volume.

Solution:
Looking at the Pearson's model above, we will first create a table that
accommodates all the variables of the model as follows:

Table 5.4 Pearson’s Coefficient of Correlation Table for Advertising and Sales
Volume of a Firm
Advert Cost in NM (Y) Sales NM (X) Y2 X2 XY
8.5 210 72.25 44100 1785
9.2 250 84.64 62500 2300
7.9 290 62.41 84100 2291
8.6 330 73.96 108900 2838
9.4 370 88.36 136900 3478
10.1 410 102.01 168100 4141
53.7 1,860 483.63 604600 16833

Coefficient of Corr. (r) = [n∑xy - ∑x∑y] / [√(n∑x² - (∑x)²) · √(n∑y² - (∑y)²)]

r = [6(16833) - (1860 × 53.7)] / [√(6(604600) - (1860)²) × √(6(483.63) - (53.7)²)]

r = 0.64

Interpretation:
The positive r of 0.64 indicates a partial but moderately strong positive correlation or
agreement between advertising expenditure and the volume of sales of the firm.
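A minimal Python sketch of the Case 13 computation (illustration only):

Python sketch:
import math

y = [8.5, 9.2, 7.9, 8.6, 9.4, 10.1]    # advertising expenditure (N million)
x = [210, 250, 290, 330, 370, 410]     # sales volume (N million)
n = len(x)

numerator = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
denominator = (math.sqrt(n * sum(a * a for a in x) - sum(x) ** 2)
               * math.sqrt(n * sum(b * b for b in y) - sum(y) ** 2))
print(round(numerator / denominator, 2))   # 0.64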

5.4 Coefficient of Determination (r2)

The coefficient of determination model is of the form:

r² = {[n∑xy - ∑x∑y] / [√(n∑x² - (∑x)²) · √(n∑y² - (∑y)²)]}²

Case 14: Using the data provided in table 5.3 on the relationship between advertising
and sales volume, evaluate the coefficient of determination (r2) and interpret your
result.

Solution:
A cursory look at the coefficient of determination model above shows that the formula
is simply the square of the coefficient of correlation model. Hence;

r2 = (0.64)2
= 0.41.

Interpretation:
The coefficient of determination value of 0.41 means that approximately 41% of the
variation in sales volume is explained by changes in advertising expenditure, while the
remaining 59% (i.e., 1 - 0.41) is attributable to factors other than advertising cost.

Chapter Six
Regression Statistics
6.0 Objectives
After studying this chapter, students will:
 Be able to explain the concept of regression
 Know the basic types of regression statistics
 Determine the equation of a straight line
 Estimate the coefficients of regression
 Interpret the nature of various regressions results

6.1 Concept of Regression

While correlation measures the degree of association between two variables, regression
tells us the exact kind of linear relationship that exists between them.

Regression analysis involves identifying the relationship between a dependent


variable and one or more independent variables. A model of the relationship is
hypothesized, and estimates of the parameter values are used to develop an estimated
regression equation. Various tests are then employed to determine if the model is
satisfactory. If the model is deemed satisfactory, the estimated regression equation can
be used to predict the value of the dependent variable given values for the independent
variables.

6.2 Basic Types of Regression


Two major types of regression could be found: Simple linear regression and multiple
regression. In simple linear regression, the model used to describe the relationship
between a single dependent variable y and a single independent variable x is y = a0 +
a1x + k. Here a0 and a1 are referred to as the model parameters, and k is a probabilistic
error term that accounts for the variability in y that cannot be explained by the linear
relationship with x. If the error term were not present, the model would be
deterministic; in that case, knowledge of the value of x would be sufficient to
determine the value of y.

6.3 Estimating the Coefficients of Simple Linear Regression (r)
Coefficient of correlation and regression are both concerned with association or
interrelationship between variables. Hence, the product moment coefficient of
correlation (r) model used in the correlation statistics above is also employed in the
case of coefficient of regression thus.

r = [n∑xy - ∑x∑y] / [√(n∑x² - (∑x)²) · √(n∑y² - (∑y)²)]

Case 15: The table below represents the demand (Y) of a commodity of an enterprise
at various prices (X) within the defined period 2000-2009.

Table 6.1: Demand and Prices of Commodity of a Firm


Year Price (N Million) X Quantity Demanded (in Bags) Y
2000 5 100
2001 7 75
2002 6 80
2003 6 70
2004 8 50
2005 7 65
2006 5 90
2007 4 100
2008 3 110
2009 9 60

What is the nature of the relationship between X and Y (i.e., determine ‘r’ and
interpret result)

Solution:

Table 6.2: Regression Table for the Relationship between Prices and Demand of
Commodity of a Firm
Year Price (NM) X Q. DD (Y) XY Y2 X2
2000 5 100 500 10000 25
2001 7 75 525 5625 49
2002 6 80 480 6400 36
2003 6 70 420 4900 36
2004 8 50 400 2500 64
2005 7 65 455 4225 49
2006 5 90 450 8100 25
2007 4 100 400 10000 16
2008 3 110 330 12100 9
2009 9 60 540 3600 81
Total 60 800 4500 67450 390
Coefficient of Regression (r) = [n∑xy - ∑x∑y] / [√(n∑x² - (∑x)²) · √(n∑y² - (∑y)²)]

r = [10(4500) - (60 × 800)] / [√(10(390) - (60)²) × √(10(67450) - (800)²)]

r = -0.93

Interpretation:
A regression coefficient of -0.93 indicates a very strong negative association between
price and the quantity demanded of the firm's commodity. This also means that the
higher the price, the lower the quantity demanded of the commodity.

6.4 Equation of a Straight Line


There are two major methods of evaluating the equation of a straight line: the normal
simultaneous equation and the Cramer’s rule. The normal equation is thus stated:

an + b∑x = ∑y …………….. equation 1

a∑x + b∑x² = ∑xy ……………. equation 2

The Cramer’s model is of the form:

a = (∑y - b∑x) / n

b = [n∑xy - ∑x∑y] / [n∑x² - (∑x)²]

Case 16: State the equation of a straight line. Using data from table 6.1, determine the
values of a and b in the stated equation.

A typical equation of a straight line is stated thus:

Y = a+bx

Adopting Cramer’s model implies;
b = n∑xy - ∑x∑y
n∑x2 – (∑x)2
Where: n = 10
∑xy = 4500
∑x = 60
∑y = 800
∑x2 = 390
b = 10 (4500) – (60x800)
(10x390) – (60)2

b = - 10
Substituting ‘b’ in Cramer’s equation for ‘a’ implies;

a = (∑y - b∑x) / n

a = [800 - (-10)(60)] / 10

a = 140

Hence, equation of the straight line: Y = 140 – 10x
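A minimal Python sketch of the Cramer's-rule estimates for Case 16 (illustration only):

Python sketch:
x = [5, 7, 6, 6, 8, 7, 5, 4, 3, 9]                  # price
y = [100, 75, 80, 70, 50, 65, 90, 100, 110, 60]     # quantity demanded
n = len(x)

b = (n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)) / \
    (n * sum(xi ** 2 for xi in x) - sum(x) ** 2)
a = (sum(y) - b * sum(x)) / n
print(a, b)   # 140.0 and -10.0, i.e. Y = 140 - 10X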

Review Question 3: Using data from table 6.1, determine the values of a and b in the
equation of the straight line above, this time using the normal simultaneous equations
(equations 1 and 2).

6.5 Multiple Linear Regression


Sometimes, the value of r2 in the simple linear model y = a+bx may not be considered
satisfactory. This means that the model at that point is not a good predictor. In such
situation, they are possible causes of action. One of this could be that the movements
in the dependent variable ‘Y’ are caused by several independent factors and not just
one, thereby prompting the need to incorporate these factors in a model.

A multiple regression model is thus a model which incorporates several independent
variables. The model is of the form: Y = a + b1x1 + b2x2 + … + µ.

Where:

b1 = [(∑∆Y∆X1)(∑∆X2²) - (∑∆Y∆X2)(∑∆X1∆X2)] / [(∑∆X1²)(∑∆X2²) - (∑∆X1∆X2)²]

b2 = [(∑∆Y∆X2)(∑∆X1²) - (∑∆Y∆X1)(∑∆X1∆X2)] / [(∑∆X1²)(∑∆X2²) - (∑∆X1∆X2)²]

a = ∑Y/N - (b1∑X1/N) - (b2∑X2/N)

Here ∆ denotes a deviation from the variable's mean (e.g., ∆Y = Y - Ȳ).
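A hedged Python sketch of the deviation-form formulas above (my own helper names; it can be applied to the Table 6.3 data when answering Review Question 4):

Python sketch:
def deviations(values):
    """Deviations of each value from the mean of the series."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def multiple_regression(y, x1, x2):
    """Return a, b1, b2 for Y = a + b1*X1 + b2*X2 using the deviation formulas."""
    dy, d1, d2 = deviations(y), deviations(x1), deviations(x2)
    s11 = sum(v * v for v in d1)
    s22 = sum(v * v for v in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    denom = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / denom
    b2 = (s2y * s11 - s1y * s12) / denom
    a = sum(y) / len(y) - b1 * sum(x1) / len(x1) - b2 * sum(x2) / len(x2)
    return a, b1, b2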

An investigation into the other possible causes of variation in the demand for the above
commodity in table 6.1 reveals that the income level of consumers could be another
variable, as shown in table 6.3 below:
Table 6.3: Demand Prices and Income of Consumer of Commodity of a Firm
Year Quantity Demanded (in Bags) Price (N Million) Income (N000)
2000 100 5 10
2001 75 7 6
2002 80 6 12
2003 70 6 5
2004 50 8 3
2005 65 7 4
2006 90 5 13
2007 100 4 11
2008 110 3 13
2009 60 9 3

Review Question 4: Find the relationship existing between Y and X1, X2. That is,
determine the values of a, b1 and b2 in the relation Y = a + b1X1 + b2X2

Chapter Seven

Decision Making Analysis
7.0 Objectives

After studying this chapter, students will:


 Understand the concept of Risk and Uncertainty
 Be able to explain the meaning of probability
 Understand the basic rules of probability
 Differentiate between subjective and objective probability
 Understand how to calculate and use expected values
 Understand the concept of permutation
 Know the rationale and how to calculate permutation
 Understand the concept and computation of combination

7.1 Risk and Uncertainty

In an ideal scenario, business decisions could be predicted with certainty. But in reality,
certainty conditions are rare in the business world. One can only speak of certainty
conditions whenever the number of possible outcomes from a business activity falls
within a very narrow range of possible values. In that case, there would be only a very
remote possibility of divergence between expected and realized outcomes. Because of
this unpredictability, virtually all business decisions are taken under conditions of
either risk or uncertainty. What then are risk and uncertainty?

Risk and uncertainty are in most cases used interchangeably. However, different
authors have identified slight variations and relationships between them. Literally, the
term 'risk' means exposure to danger or economic adversity. This is the layman's
viewpoint. The New Oxford Advanced Learner's Dictionary (2000) defined risk as the possibility of
meeting danger or of suffering harm or loss. However, a more embracing definition is
found in Okafor (1983) who posits that risk is the exposure to loss arising from
variations between the expected and the actual outcome of investment activities. He
further argues that where the range of possible outcomes is wide, exposure to risk
would be high, but if the range is narrow, exposure to risk would be low. Deducing
from this position, a condition of risk will occur where an investor knows exactly the
range of possible outcomes to expect from a business opportunity and the possible
occurrences of each outcome as well.

Uncertainty, on the other hand, refers to a situation where alternative outcomes exist with unknown probabilities. That is, the future outcome of an event cannot be predicted with any degree of confidence from a knowledge of past or existing events. A condition of uncertainty implies near-complete ignorance of the future outcomes of present decisions. It arises where the decision maker has no dependable information about the nature of the factors which impinge on his investment activities. Uncertainty is a subjective phenomenon. This means that two or more investors are unlikely to have identical perceptions of the outcome of an investment decision taken under conditions of uncertainty. Consequently, it is very difficult to generate a universally acceptable model for dealing with uncertainty. A decision maker faced with conditions of uncertainty would attempt to generate a probability distribution of possible outcomes on the grounds of his personal judgment of the situation. By so doing, a condition of uncertainty would, at least conceptually, be reduced to one of risk.

7.2 Concept of Probability


Probability can be considered as the quantification of uncertainty or risk. Probability is represented by p, and can only take values ranging from 0 (impossibility) to 1 (certainty). For example, it is impossible to fly to the moon unaided and it is certain that one day we will die. This is expressed as:

P (flying to the moon unaided) = 0


and P (dying) = 1

More generally, we may define the probability of an event, E, as follows:

P(E) = Number of favorable outcomes / Total number of possible outcomes
where an event is some occurrence, e.g. spinning a coin to show a head, drawing an
Ace from a pack of cards, etc. Probabilities can thus be regarded as relative
frequencies, as shown in the following examples.

Case 17: What is the probability of drawing an Ace from a shuffled pack of cards?

P (drawing an Ace) = 4/52 = 1/13

The numerator is 4 because there are four Aces in a pack and the denominator is 52
because there are fifty two cards in a pack.

What is the probability of throwing a 3 with a six-sided unbiased die?

P (throwing a 3) = 1/6

The numerator is 1 because only one side depicts a three and the denominator is 6
because there are six sides on the die.
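The two workings above can be reproduced with Python's standard fractions module; the snippet below is a minimal illustration of the P(E) formula.

```python
from fractions import Fraction

# P(E) = number of favorable outcomes / total number of possible outcomes
p_ace   = Fraction(4, 52)   # four Aces in a fifty-two card pack
p_three = Fraction(1, 6)    # one face showing a 3 on a six-sided die

print(p_ace, p_three)       # 1/13 1/6
```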

7.3 Objective and Subjective Probability

Where the probability of an event is based on past data and the circumstances are
repeatable by test, the probability is known as statistical or objective probability. For
example, the probability of tossing a coin and a head showing is 50% or ½ or 0.5. This
value can be shown to be correct by repeated trials. In most circumstances objective
probabilities are not available in business, so that subjective probabilities must be
used. These are quantifications of personal judgment, experience and expertise. For
example, the Sales Manager considers that there is a 40% chance (i.e. p = 0.4) of
obtaining the order for which the firm has just quoted. Clearly this value cannot be
tested by repeated trials. In spite of the undoubted shortcomings of subjective probabilities, they are all that are normally available, and so they are used to help in the decision-making process. It should be emphasized that the use of probabilities does not of itself make the decision; it merely provides more information on which a more informed decision can be taken.
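The statement that an objective probability "can be shown to be correct by repeated trials" can be illustrated with a quick simulation. The sketch below uses Python's random module; the seed and the number of trials are arbitrary choices made for illustration.

```python
import random

random.seed(42)          # fixed seed so the run is reproducible
trials = 100_000
heads = sum(random.random() < 0.5 for _ in range(trials))

# The relative frequency of heads settles close to the objective value of 0.5
print(heads / trials)
```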

Mutually Exclusive and Independent Events: Events which cannot happen at the same time are said to be mutually exclusive events. If one happens, the other(s) cannot occur. For example, if we are considering the classification of people, the events ‘male’ and ‘female’ are mutually exclusive: if a person is ‘female’, this automatically excludes the possibility of being ‘male’. But two or more events are independent if the occurrence or non-occurrence of any one event does not affect the occurrence or non-occurrence of the others. For example, the outcome of any throw of a die is independent of the outcome of any preceding or succeeding throws.

7.4 Permutation and Combination

Permutation is simply an ordered arrangement of items. Thus, AB is a different permutation from BA even though the two individual items A and B are the same; they are in a different order. Frequently, an analyst has to work out the number of ways that an event can occur in order to calculate a required probability. If each possibility has to be listed, the task can be time consuming as well as error prone. The factorial formula (n!) is often used to make the task easy. It implies: n! = n(n-1)(n-2)(n-3) ... 3 x 2 x 1.

Case 18: A restaurant offers a choice of 3 starters, 4 main courses and 3 sweets. How
many different meals are available?

The solution is 3 x 4 x 3 = 36 different meals

But there will be occasions when selections are made where the order does not matter, meaning that the arrangement A, B will be the same as B, A. This is known as a combination.

Case 19: Six apprentices A, B, C, D, E, F have to be paired into twos for an exercise.


In how many ways may this be done?

The solution is thus:


AB BC CD DE EF
AC BD CE DF
AD BE CF
AE BF
AF

There are 15 ways.
However, it is not often necessary to list all the ways as we did above. The following
formula can be used for combinations:
n! / (r!(n - r)!)

Where n is the total number of items and r is the number of items per arrangement.
The above example works out as follows:

6! / (2!(6-2)!) = (6x5x4x3x2x1) / ((2x1)(4x3x2x1)) = 720 / 48 = 15 ways

This type of arrangement is known as a combination of n items r at a time and is denoted nCr.
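The counting rules in this section can be verified with Python's math module (math.comb and math.perm are available from Python 3.8 onwards); the snippet below simply re-checks Cases 18 and 19.

```python
import math

# Multiplication principle (Case 18): 3 starters x 4 main courses x 3 sweets
print(3 * 4 * 3)          # 36 different meals

# Combination (Case 19): order does not matter, nCr = n! / (r!(n - r)!)
print(math.comb(6, 2))    # 15 ways of pairing six apprentices

# Permutation for comparison: order matters, nPr = n! / (n - r)!
print(math.perm(6, 2))    # 30 ordered pairs, since AB and BA count separately
```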
7.5 Expected Value
The basic rules of probability so far outlined help to quantify the options open to
management and thereby may help in coming to a decision. Where the options have
values (so much profit, contribution, etc.) as well as probabilities, the concept of
expected value is often used. The expected value of a decision is the sum, over all possible outcomes, of each outcome's value multiplied by its probability; it represents the average result that would be obtained over a series of trials.

Case 20: A company is considering investing in either of two projects, A and B. You are expected to calculate the expected value of each of the two projects and advise on which is preferable.

Solution:
Possible Outcomes    Project A                        Project B
                     Value (N)  Prob.  Exp. V (N)     Value (N)  Prob.  Exp. V (N)
Optimistic           6000       0.2    1200           6500       0.1    650
Most Likely          3500       0.5    1750           4000       0.6    2400
Pessimistic          2500       0.3    750            1000       0.3    300
Project E.V                            3700                             3350

Decision:
On the basis of Expected Value, Project A would be preferred as it has the higher
expected value.
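A minimal Python sketch of the expected value comparison in Case 20, using the values and probabilities shown in the table above:

```python
# Each project: (value, probability) pairs for the three possible outcomes
project_a = [(6000, 0.2), (3500, 0.5), (2500, 0.3)]
project_b = [(6500, 0.1), (4000, 0.6), (1000, 0.3)]

def expected_value(outcomes):
    # EV = sum of (value x probability) over all outcomes
    return sum(value * prob for value, prob in outcomes)

ev_a = expected_value(project_a)   # 3700.0
ev_b = expected_value(project_b)   # 3350.0
print("Prefer Project A" if ev_a > ev_b else "Prefer Project B")
```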

Notes:
a) Although the EV of A is N3700, strictly this value would only be achieved in the
long run over many similar decisions — extremely unlikely circumstances!
b) If the project was implemented, any of the three outcomes could occur, with the
values stated.

Advantages and Disadvantages of Expected Value


Expected value is a useful summarizing technique, but it has advantages and disadvantages similar to those of all averaging methods.

Advantages:
a) Simple to understand and calculate
b) Represents whole distribution by a single figure
c) Arithmetically takes account of the expected variabilities of all outcomes.

Disadvantages:
a) By representing the whole distribution with a single figure it ignores the other
characteristics of the distribution, e.g. the range and skewness.
b) Makes the assumption that the decision maker is risk neutral, i.e. that he would rank equally any distributions of pessimistic, most likely and optimistic outcomes that share the same expected value. It is of course unlikely that any decision maker would rank them equally, owing to his personal attitude to risk.
c) Although it appears to be widely used for the purpose, expected value is not
particularly well suited to one-off decisions. Expected value can strictly only be
interpreted as the value that would be obtained if a large number of similar decisions
were taken with the same ranges of outcome and associated probabilities.

