STATISTICS
MEASURES OF DISPERSION
CHARACTERISTICS OF A GOOD MEASURE OF DISPERSION
RELATIVE AND ABSOLUTE MEASURES – DEFINITION, TYPES, MERITS AND DEMERITS
STANDARD DEVIATION
SKEWNESS AND KURTOSIS
ELEMENTS OF PROBABILITY
BASIC CONCEPTS OF PROBABILITY, TYPES AND USES
COUNTING TECHNIQUES, SET THEORY
LAWS OF PROBABILITY AND APPLICATIONS – ADDITION AND MULTIPLICATION LAWS
CONDITIONAL PROBABILITY – CALCULATION AND BAYES' THEOREM, APPLICATIONS IN DECISION MAKING
PROBABILITY DISTRIBUTIONS AND THEIR APPLICATIONS – NORMAL AND POISSON DISTRIBUTIONS
SAMPLING
REASONS FOR SAMPLING
SAMPLING AND CENSUS
TYPES OF SAMPLING AND THEIR LIMITATIONS
NETWORK ANALYSIS
NETWORK DISTRIBUTION
IMPORTANCE OF NETWORK ANALYSIS
NETWORK CONSTRUCTION
CRITICAL PATH DETERMINATION
APPLICATIONS OF NETWORK ANALYSIS TO DECISION AND COST
SCHEDULING
MANPOWER SCHEDULING
INTRODUCTION TO STATISTICS
Statistics is the science of organizing, describing and analyzing quantitative data.
The term statistics is also used to refer to indices which are derived from data through statistical procedures. Examples of such indices include means/averages, standard deviations, correlation coefficients, etc.
However, the word statistics has two meanings: one in the singular sense and the other in the plural sense.
Statistics in the singular sense – it refers to the science which studies statistical methods, i.e. the science in which we study how numerical data are to be collected, analyzed, presented and interpreted.
Statistics as a science has been defined differently by different scholars:
1. Bowley
He has called statistics “the science of counting and averages”
He also called it the science of measurement of the social organism regarded as a whole in all its
manifestations.
2. Boddington
He defines statistics as “the science of estimates and probabilities”.
3. Croxton and Cowden
They define statistics as the science of the collection, presentation, analysis and interpretation of numerical data.
Statistics in the plural sense – the numerical facts obtained by applying the science of statistics are called statistics in the plural.
Statistics in this sense have been defined by Horace Secrist.
He defined statistics as aggregates of facts, affected to a marked extent by a multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other.
Examples of statistics in the plural sense are the total population of a country, the number of students in a school, college or university, the number of wild animals in a park, etc.
Types of statistics
There are mainly two types, i.e. descriptive and inferential statistics.
Descriptive statistics
These are indices that describe a given sample, for example measures of central tendency (mean, mode and median), measures of dispersion (range, standard deviation, variance), distributions (percentages, frequencies) and relationships (correlation).
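The descriptive indices listed above can be computed with Python's standard statistics module. The sketch below is a minimal illustration on a small hypothetical set of marks; the data and the helper name describe are assumptions, and it uses the population formulas (pvariance, pstdev) rather than the sample formulas (variance, stdev).

```python
import statistics

def describe(data):
    """Return common descriptive statistics for a list of numbers (hypothetical helper)."""
    return {
        "mean": statistics.mean(data),           # measure of central tendency
        "median": statistics.median(data),       # middle value of the sorted data
        "mode": statistics.mode(data),           # most frequent value
        "range": max(data) - min(data),          # simplest measure of dispersion
        "variance": statistics.pvariance(data),  # population variance
        "std_dev": statistics.pstdev(data),      # population standard deviation
    }

marks = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical marks of eight students
summary = describe(marks)
for name, value in summary.items():
    print(f"{name:>9}: {value}")
```

For these marks the mean is 5, the median 4.5, the mode 4, the range 7 and the population standard deviation 2.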
Inferential statistics
These use particular indices in order to draw inferences about a given phenomenon in a population. Such inferences are based on the results from randomly selected samples.
The purpose of inferential statistics is to test hypotheses so that the researcher can generalize the results from the sample to the population.
Characteristics of statistical data
1. Are aggregates of facts e.g. total sales of a firm for a year
2. Are affected to a marked extent by a multiplicity of causes, e.g. the volume of wheat production in a given year depends on a number of factors such as poor rainfall and poor-quality seeds
3. Are numerically expressed, e.g. the population of Kenya rose to 3 million in the year 1966
4. Are estimated according to reasonable standards of accuracy, e.g. 90% accuracy
5. Are collected in a systematic manner, e.g. through interviews, observation, questionnaires and sampling
6. Are collected for a predetermined purpose, e.g. to determine the population growth rate
7. Should be placed in relation to each other
Statistical methods
These are the devices by which complex numerical data are systematically treated and presented so as to give a clear view of them. They include:
Collection of data
Organization of data
Presentation of data
Analysis of data
Interpretation of data
1. Collection of data
Collection of data constitutes the first step in any statistical investigation. Utmost care must be exercised in collecting data because they form the foundation of statistical analysis.
If the data are faulty, the conclusions drawn can be unreliable. The data may be available from existing published or unpublished sources, or else may be collected by the investigator him/herself.
2. Data organization
Data collected from published sources are generally in an organized form. However, the large mass of figures/data collected from a survey frequently needs to be organized.
The first step is editing; collected data must be edited very carefully so that omissions, inconsistencies, inaccuracies, irrelevant answers or responses and wrong (mathematical) computations from a survey may be corrected or adjusted.
After the data have been edited, the next step is to classify the data; classification is the process
of arranging data according to some common characteristics possessed by the items constituting
the data.
The last step is tabulation; the objective is to arrange data in columns and rows so that there is
absolute clarity of data presented.
3. Data presentation
After the data have been collected and organized, they are ready for presentation; they can be presented by means of tables, diagrams, charts, etc.
Data presented in an organized manner facilitates statistical analysis.
4. Analysis
After collection, organization and presentation, the next step is analyzing the data. The focus at this stage is on the methods used to analyze the presented data.
The methods used are numerous, ranging from simple observation to complicated, sophisticated and highly mathematical techniques.
5. Interpretation
If the data analyzed are not properly interpreted, then the whole object of the investigation may
be defeated and fallacious conclusions drawn.
N/B: Correct interpretation leads to correct conclusions and hence to correct decision making.
Reasons for studying statistics/importance
Is an aid to supervision – statistics help in controlling the affairs of an organization; the statistical records maintained can be used to evaluate the performance of employees, and the management of such an organization can thus decide, on the basis of these statistics, whether the policies are being implemented effectively or not.
Offers a basis for planning – statistics provide a base for future planning; on the basis of records of accurate and relevant data, we are able to prepare and develop plans for the expansion of business organizations and, in the case of the government, for the development of the country.
Acts as the eyes of administration – statistics are required by the government or any organization to study the causes of, and find remedies/solutions to, various problems, e.g. the government needs adequate statistical data in order to control crime, create employment, and improve the supply of water, electricity, etc.
Acts as the arithmetic of human welfare – statistics are used to understand problems of human beings such as poverty, food shortage, disease prevalence, illiteracy, etc. that cannot be understood without statistical data.
Helps disclose the connection between related facts – it indicates the connection between related facts, e.g. there is a relation between prices, demand and supply.
Functions of statistics
1. Simplifies data
It simplifies the complicated data and presents them in a manner so that they become intelligible
i.e. the complicated data may be reduced to totals, averages, percentages etc. and presented
graphically or diagrammatically.
2. Increases one’s experience
One major function of statistics is to enlarge an individual’s experience or knowledge. For example, one may conduct an investigation and collect the requisite data, which will give him/her clearer and more adequate information about that particular phenomenon.
DATA COLLECTION
Statistical data are usually collected from different sources, and different methods are adopted to collect adequate and reliable data.
These data are collected to conduct some enquiries or to analyze some problems.
In order to collect the data for a particular investigation, it is important to keep in mind the following:
Objectives and scope of inquiry
Nature and types of inquiry
Statistical units
Degree of accuracy
Statistical units
Units to be used for the collection of data are called statistical units and are of two types:
Physical units
Arbitrary units
Physical units are those units used in day-to-day life to determine the measurements of given objects, e.g. Kenyan shillings (Kshs) for money, kilograms for weight, litres for capacity/volume, etc.
Arbitrary units are usually adopted by investigators for their use in statistics, e.g. wage rates, literacy level, etc. These units are to be defined by the investigator him/herself so that they are not understood differently.
Statistical units should have the following qualities
Suit the purpose of inquiry
Be stable i.e. not be easily affected by other factors
Be homogeneous, i.e. uniform, to allow general understanding
Be defined correctly and clearly
Degree of accuracy
It is highly important, while planning a statistical inquiry, to determine the degree of accuracy required.
The degree of accuracy is decided by taking into account the nature and type of the inquiry undertaken, and once decided it should be kept uniform throughout the inquiry.
Sources of data
There are two main sources of data:
Primary data
Secondary data
Primary data are data collected for the first time, whether directly or indirectly, by the investigator him/herself.
Secondary data, on the other hand, are data obtained from already researched and published articles by other individuals or agencies.
Methods of collecting primary data
They include observation, interviews, questionnaires, sampling.
Data required for a specific inquiry may not be available from existing records. In such cases, surveys are usually conducted to obtain the figures at the source. In order to collect primary data, the following methods are adopted:
Observation
Interviews
Questionnaires
Sampling
Observation
In observation, the investigator asks no questions but instead observes the objects or actions in which he/she is interested. He/she observes and records the desired information.
In this method, the investigator may either participate in some activities or simply watch in person.
Advantages
Data collected are highly reliable
Relatively cheap in comparison with other methods
Gives more relevant and accurate information
Disadvantages
The investigator may misinterpret what is observed, making the data unreliable
Consumes time and is tiresome
The presence of the investigator may make the person observed behave/work in a different manner that may not depict reality
Results of observation may differ under different conditions, depending on the factors prevailing at that moment.
Interviews
According to this method, the investigator interviews different people and asks them various questions relating to the problem under investigation.
The investigator meets different persons face to face and collects the information from the sources concerned.
This method is also referred to as direct personal investigation.
Advantages
Immediate responses can be achieved
Where something is unclear, the investigator can clarify it
Information collected is usually reliable and accurate
Is good for intensive investigation
Gives sufficient information i.e. through probing
Disadvantages
Time consuming and thus not suitable for extensive inquiry
Tiresome and expensive
The bias on the part of investigator can damage the whole inquiry i.e. he/she may
influence the response of the interviewee
Sometimes the respondent/informant may be reluctant to answer the questions
Questionnaires
According to this method, a standard list of questions relating to a particular investigation is
prepared. This list of questions is called a questionnaire.
The data are then collected by sending the questionnaire to the informants and requesting them to return it after answering the questions.
In order to make this method successful, a very polite letter should be sent to the informants, emphasizing the need for and usefulness of the investigation.
The informants should also be assured that the information they give will be kept secret.
At times, enumerators are appointed to go to the informants with the questionnaires and help them in recording their answers or responses.
Advantages
It is less expensive and helps to collect information over a wide area; however, many respondents may not take the trouble to fill in questionnaires and at times do not return them.
When questionnaires are in the charge of enumerators, the method is very useful for extensive inquiry, and no time is wasted in collecting the questionnaires since the enumerators collect them personally
Features of a good questionnaire
The use of questionnaires provides a quick cheap method for conducting a survey or
investigation.
Drafting a questionnaire means preparing the questions which are to be asked of the persons concerned.
While preparing a questionnaire, the following points should be taken into consideration:
Short and clear – questions should be short and clear so that they can be easily understood by the respondent. If some technical terms are used in the questionnaire, their definitions should be given.
Questions should be few in number to avoid boredom
Questions should be definite, i.e. have specific responses/answers
Be non-confidential – questions should be of a non-confidential nature because most informants may not like to answer questions relating to personal or confidential aspects of life.
Questions should be in a logical sequence, i.e. the questions and their responses should be put in some logical order. This will help in analyzing the data gathered easily and quickly
Be relevant to the problem under investigation
Sampling
Population – total number of items in a specific area of inquiry
Sample – a few units selected out of the general population under inquiry
Units – individual items
According to the sampling method, a few units from the whole population are selected and studied, and conclusions about the whole population are drawn on the basis of the results obtained from these units.
N/B: When samples are selected from a population, the units must be selected at random, i.e. each unit of the population must have an equal chance of being selected.
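Selecting units at random so that each has an equal chance of selection can be sketched with random.sample from Python's standard library, which draws without replacement. The population of student IDs and the helper name simple_random_sample are hypothetical; the seed is only there so that the same draw can be repeated.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit has an equal chance of selection."""
    rng = random.Random(seed)         # seeded generator so the draw can be reproduced
    return rng.sample(population, n)  # sampling without replacement

students = [f"S{i:03d}" for i in range(1, 201)]  # hypothetical population of 200 students
sample = simple_random_sample(students, 10, seed=42)
print(sample)
```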
Advantages
Relatively cheap because only a small part of the whole population is studied
Saves a lot of time since data can be collected and analyzed quickly
A good quality of labour with better supervision can be provided because only part of
whole units is studied
Gives more detailed information since only a few units are studied
Sources of secondary data
Journals and periodicals
Newspapers and magazines
Internet
Videos and tapes
Publications and research works by private individuals
Reports by business concerns
Publications by research institutes like universities and colleges
Official publications by central and provincial governments
Editing the collected data
Editing is the scrutiny or examination of the collected data in order to find mistakes, errors and omissions.
In editing, aspects of accuracy, approximation and errors are analyzed.
Accuracy
Perfect accuracy means to describe a phenomenon exactly as it is.
Absolute/perfect accuracy cannot be obtained, and therefore in statistics relative accuracy, not absolute accuracy, is required.
However, the degree of accuracy depends on the nature and purpose of the inquiry and also on the means of its measurement.
Approximation
Approximation is the rounding off of figures with a view to simplifying them without sacrificing reasonable accuracy.
Errors
Error has a special meaning in statistics. Mistakes in statistics mean incorrect presentation or calculation due to human factors. They may occur in the collection of data, e.g. the respondent may mistakenly tick the “yes” box instead of the “no” box.
An error, on the other hand, means the difference between the actual figure and the estimated figure. This deviation arises by chance and not due to the carelessness of human beings.
The main sources of statistical errors are
Errors of origin – errors that arise from faulty definition of the units, bias or erratic trends in the data.
Errors of inadequacy – errors due to inadequacy of samples or incomplete information
Errors of manipulation – errors that result from unconscious mistakes while measuring, weighing and counting.
Kinds of errors
There are two kinds of error, i.e. sampling and non-sampling errors.
Sampling errors
Sampling error means the difference between the estimate of a value as obtained from the sample and the actual value in the population. It arises because even when a sample is chosen in a correct manner, it cannot be exactly representative of the population from which it is chosen. The amount of sampling error depends on the size of the sample, i.e. the larger the sample, the smaller the sampling error, and vice versa.
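The effect of sample size on sampling error can be illustrated by simulation. The sketch below assumes a hypothetical population consisting of the numbers 1 to 10,000, repeatedly draws simple random samples of different sizes, and reports the average absolute difference between the sample mean and the population mean. The helper name average_sampling_error and the number of trials are illustrative choices, not part of the original notes.

```python
import random
import statistics

def average_sampling_error(population, n, trials=200, seed=1):
    """Average absolute difference between sample means and the population mean."""
    rng = random.Random(seed)                # seeded so the simulation is repeatable
    true_mean = statistics.mean(population)  # the actual population value
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, n)   # simple random sample without replacement
        errors.append(abs(statistics.mean(sample) - true_mean))
    return statistics.mean(errors)

population = list(range(1, 10001))  # hypothetical population of 10,000 values
errors = {n: average_sampling_error(population, n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"sample size {n:>4}: average sampling error = {err:.1f}")
```

The average error falls roughly as 1/√n as the sample size n grows, in line with the statement that larger samples give smaller sampling errors.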
Non-sampling errors
When errors arise for reasons other than sampling, they are known as non-sampling errors. They may arise for the following reasons:
i) Biased behaviour on the part of the investigator
ii) Haphazard selection of units
iii) Failure to cover all the units under study or items
iv) Use of approximate or estimated values
Measurement of errors
Errors are measured either absolutely or relatively
Absolute error is the difference between the actual value and the estimated value and is usually expressed as
$$Ae = A - E$$
where
Ae refers to the absolute error,
A represents the actual value and
E represents the estimated value.
Example
If the population of a town was estimated as 1,424,880 whereas the actual population is 1,578,620, determine the absolute error.
$$Ae = 1{,}578{,}620 - 1{,}424{,}880 = 153{,}740$$
Relative error is the ratio of the absolute error to the actual value and is expressed as
$$Re = \frac{A - E}{A} = \frac{Ae}{A}$$
In the above example,
$$Re = \frac{153{,}740}{1{,}578{,}620} = 0.097389 \approx 0.097$$
N/B: Relative error can be converted to a percentage to give the percentage error. Thus, from the above example,
$$\text{Percentage error} = 0.097 \times 100\% = 9.7\%$$
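The three error measures in the example can be checked with a few lines of Python. The function names are hypothetical; the figures are those of the town-population example, and the sign convention Ae = A − E follows the notes.

```python
def absolute_error(actual, estimated):
    """Ae = A - E"""
    return actual - estimated

def relative_error(actual, estimated):
    """Re = Ae / A"""
    return absolute_error(actual, estimated) / actual

def percentage_error(actual, estimated):
    """Relative error expressed as a percentage."""
    return relative_error(actual, estimated) * 100

A, E = 1_578_620, 1_424_880  # actual and estimated population from the example
ae = absolute_error(A, E)
rel = relative_error(A, E)
pe = percentage_error(A, E)
print(ae, round(rel, 6), round(pe, 1))  # 153740 0.097389 9.7
```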
Classification and tabulation of data
After the data have been collected, they must be organized or sorted in a form in which they are
easily understood. The organization of the data refers to classification and tabulation of data.
Collected data are mostly large in quantity, and it is necessary to organize them in such a way that further analysis and interpretation can be carried out easily and correctly.
Data classification
Is the act of arranging data in groups and classes according to some resemblance of the data in
each group or class. In data classification, the elements which possess same characteristics are
grouped in one class.
L. N. Connor defines classification as the process of arranging things in groups or classes
according to their resemblances and affinities. Classification is thus the sorting of data into homogeneous groups according to their observed characteristics.
Types of data classification
Qualitative data
Classification according to some attribute or quality such as religion, literacy, sex, etc. is a qualitative classification.
Quantitative data
Refers to classification of data according to some characteristic that can be measured quantitatively, e.g. height, weight, age, etc. In this, data are classified by assigning arbitrary limits, i.e. class limits.
Temporal classification
Is classification of data with respect to time
Spatial classification
Is classification of data with respect to space or place
Categories of quantitative data classification
Discrete data
Discrete data assume only countable values, i.e. there are no intermediate values between consecutive values of a discrete variable.
Continuous data
Continuous data can take any value within a continuum of measurement.
Scales of measurement
Measurement may be defined as the assignment of numerals to objects or events according to certain rules. The use of different rules for the assignment of numerals leads to different types of measurement, which in turn lead to different measurement scales.
There are four basic scales of measurement:
Nominal scale of measurement
It is used for qualitative data that can only be expressed as categories; it entails the arbitrary assignment of numbers (as labels) to the different categories that make up a variable.
The different categories simply constitute a classification and cannot be ranked. Furthermore, mathematical operations such as addition, subtraction, multiplication and division cannot be performed on such data.
Ordinal scale of measurement
It applies to data that can be divided into categories which can be ranked; however, as in the case of the nominal scale, arithmetic operations are meaningless for the ordinal scale.
Example: if you ask a sample of people how satisfied they are with their jobs and present them with the following possible responses, you will have the following ordinal scale:
a) Very satisfied
b) Fairly satisfied
c) Undecided i.e. neither satisfied nor dissatisfied
d) Fairly dissatisfied
e) Very dissatisfied
Interval scale of measurement
It applies to data that can be ranked and for which the differences between values can be measured, e.g. scores in a test.
The data may contain a zero point, but it is arbitrary and does not indicate the absence of the quantity being measured.
Ratio scale of measurement
It applies to data that can be ranked and on which all arithmetic operations can be performed, because the scale has a true zero point, e.g. weight and height.
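One way to summarize the four scales is by the statistics usually considered meaningful on each. The mapping below is a sketch based on the common textbook convention (mode for nominal data, median added for ordinal data, mean and differences for interval data, ratios only for ratio data); the names PERMISSIBLE and is_meaningful are hypothetical.

```python
# Statistics commonly regarded as meaningful on each scale (textbook convention, assumed here).
PERMISSIBLE = {
    "nominal": {"frequency", "mode"},
    "ordinal": {"frequency", "mode", "median"},
    "interval": {"frequency", "mode", "median", "mean", "difference"},
    "ratio": {"frequency", "mode", "median", "mean", "difference", "ratio"},
}

def is_meaningful(scale, operation):
    """Return True if the operation is meaningful for data measured on the given scale."""
    return operation in PERMISSIBLE[scale]

print(is_meaningful("ordinal", "median"))  # True: satisfaction levels can be ranked
print(is_meaningful("ordinal", "mean"))    # False: arithmetic on ranks is not meaningful
print(is_meaningful("interval", "ratio"))  # False: the zero point is arbitrary
```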
Main reasons for classification
Helps us eliminate unnecessary details
Brings out clearly points of similarity and dissimilarity
Enables one to make comparisons and draw inferences or conclusions
Helps an individual reflect on the important aspects of data
Helps in utilizing data for further statistical analysis
DATA TABULATION
Refers to systematic arrangement of the statistical data in columns and rows
The main advantage of data tabulation is that a large mass of data that is confusing to the mind can be presented in a logical sequence, in the shape of a statistical table that answers all the questions of the problem under investigation.
Advantages of tabulation
Tabulated data can be understood easily as compared to data given in narrative form
Comparisons between different classes of data can be made easily
The required data can be located easily
The unnecessary details are avoided
Tabulated data take less space
Principles of table construction
The format of a table depends upon the nature of data and purpose of constructing the table.
A table should be constructed in such a way that it achieves its purpose in the best possible
manner.
Principles followed
The table should be self explanatory
Each table should have a title
The size of the table should be suitable
The source of the data must be stated or quoted
The headings to the columns and rows should be clear
Subtotals for separate classes must be given where appropriate
The columns whose data are to be compared should be placed in adjacent columns and
rows i.e. place them next to each other
The units of measurement used should be clearly mentioned
Example
In January 1995, a firm employed 90 staff, of whom 79 were men. During the year, 17 staff left, 13 of whom were men. The total recruitment during the year was 13, 3 of whom were women. During 1996, wastage declined by 3 amongst men compared with 1995, and no woman left. 6 more men but 2 fewer women were recruited than in the previous year. The total number employed as at 1st January 1997 amounted to 93. Arrange the above information in a concise tabular form showing relevant totals and subtotals.
Title: Recruitment of staff for the year
Year                                  1995   1996
Number of employees, 1st Jan – Men     79     76
Number of employees, 1st Jan – Women   11     10
Recruitment – Men                      10     16
Recruitment – Women                     3      1
Wastage – Men                          13     10
Wastage – Women                         4      0
Recruitment of staff for the years 1995 – 1996
                                           1995                1996
                                      Men  Women  Total   Men  Women  Total
Number of employees as at 1st Jan      79     11     90    76     10     86
Recruitment during the year            10      3     13    16      1     17
Wastage during the year                13      4     17    10      0     10
Number of employees at end of year     76     10     86    82     11     93
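The closing figures in the table follow from the rule closing = opening + recruitment − wastage, with each year's closing figures becoming the next year's opening figures. A minimal sketch of that bookkeeping, using the figures from the example (the variable and function names are hypothetical):

```python
# Figures from the example above (numbers of staff).
opening_1995 = {"men": 79, "women": 11}
movements = {
    1995: {"recruited": {"men": 10, "women": 3}, "left": {"men": 13, "women": 4}},
    1996: {"recruited": {"men": 16, "women": 1}, "left": {"men": 10, "women": 0}},
}

def closing_staff(opening, recruited, left):
    """Closing number = opening number + recruitment - wastage, for each group."""
    return {group: opening[group] + recruited[group] - left[group] for group in opening}

closings = {}
staff = opening_1995
for year in (1995, 1996):
    m = movements[year]
    closings[year] = closing_staff(staff, m["recruited"], m["left"])
    print(year, closings[year], "total", sum(closings[year].values()))
    staff = closings[year]  # this year's closing figures are next year's opening figures
```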
N/B: The class intervals for different classes should be equal; if the class intervals are not equal, the results may be misleading.
Class limits
These are the values which actually identify the groups; they should be representative of the data as far as possible.
Frequency in each class
The number of values falling in a particular class is called its frequency. It should be counted carefully, e.g. by the tally sheet method. In this method, the class intervals are written on a sheet of paper called a tally sheet, and a stroke is marked against the class interval in which each value lies.
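The tally sheet method amounts to counting how many values fall into each class interval. The sketch below assumes hypothetical marks and the convention that each class includes its lower limit but not its upper limit (except the last class, which also includes its upper limit); values outside the limits are ignored. The helper name tally is hypothetical.

```python
def tally(values, edges):
    """Count values per class [edges[i], edges[i+1]); the last class also includes its upper limit."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        for i in range(len(edges) - 1):
            lower, upper = edges[i], edges[i + 1]
            is_last = i == len(edges) - 2
            if lower <= x < upper or (is_last and x == upper):
                counts[i] += 1  # one stroke on the tally sheet
                break           # values outside all classes are not counted
    return counts

edges = [0, 20, 40, 60, 80, 100]
raw_marks = [5, 12, 25, 33, 38, 41, 59, 60, 77, 100]  # hypothetical marks
counts = tally(raw_marks, edges)
for lower, upper, c in zip(edges, edges[1:], counts):
    print(f"{lower:>3} - {upper:<3} {'|' * c:<6} {c}")
```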
Types of frequency distribution tables
There are 2 main types of frequency distribution table.
These include
Discrete series
Continuous series
Discrete series
The values are capable of exact measurement, each value of the data is separate and complete, and definite breaks are visible between different values, e.g. we can count the number of persons whose salary is exactly 5,000 per month. This gives rise to a discrete series, e.g.
Marks 11 12 13 15 16 17
Number of students 5 7 11 26 18 13
Continuous series
In these series, statistical units are arranged in groups or classes because they cannot be measured exactly and are only approximations.
Example of continuous series
Marks obtained    Number of students
0 – 20            15
20 – 40           25
40 – 60           35
60 – 80           16
80 – 100          6
Cumulative frequency distribution
This is a distribution that shows the cumulative frequency below the upper real limit of each corresponding interval. A cumulative frequency distribution aids in the interpretation of frequencies, and it also helps in obtaining the median and various percentile ranks of scores.
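Using the continuous series of marks above, the cumulative frequencies are running totals of the class frequencies. The sketch also locates the median with the grouped-data interpolation formula Median = L + ((N/2 − cf) / f) × h, where L is the lower limit of the median class, cf the cumulative frequency before it, f its frequency and h its width; that formula is the usual textbook one and is an addition, not something spelled out in these notes. The helper name grouped_median is hypothetical.

```python
from itertools import accumulate

# Continuous series from the example above: class limits and number of students.
classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
freq = [15, 25, 35, 16, 6]

cum_freq = list(accumulate(freq))  # cumulative frequency below each upper limit
for (lower, upper), f, cf in zip(classes, freq, cum_freq):
    print(f"{lower:>3} - {upper:<3} f = {f:>2}  cf = {cf:>2}")

def grouped_median(classes, freq):
    """Median = L + ((N/2 - cf_before) / f) * h for the class containing the N/2-th item."""
    n = sum(freq)
    cf_before = 0
    for (lower, upper), f in zip(classes, freq):
        if cf_before + f >= n / 2:
            return lower + (n / 2 - cf_before) / f * (upper - lower)
        cf_before += f
    raise ValueError("empty distribution")

median = grouped_median(classes, freq)
print(f"median = {median:.2f}")
```

Here N = 97, so the median class is 40 – 60 and the median is about 44.86 marks.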
Diagrammatic and graphical presentation of data
Presentation of data
The main object of statistics is to simplify the complexity of the quantitative data to make them
easily intelligible. Diagrams and graphs help to understand the information in an easy and
comprehensive form. Business data like sales, production, prices etc. are frequently presented in
the forms of diagrams or graphs. A diagrammatic or graphical presentation of statistical data is
one of the simple devices which helps us to remove complexity and brings out main and
important features.
Diagrams: diagrammatic representation is best suited to spatial series and to data split into different categories. Whenever a comparison of the same type of data at different places is to be made, diagrams are the best way to do it.
Advantages of diagrams
1. They provide an easy and attractive means of representing data
2. They make the information contained in data readily intelligible
3. They facilitate comparisons
4. They save time and labour
5. They give an effective impression
6. They have great memorizing value as compared to mere figures
Limitations of diagrams
1. Diagrams do not give accurate results, only a rough idea
2. Only a technical hand (a skilled person) can construct a diagram; a common man may not be able to do this correctly
3. Diagrams cannot be compared if the unit is not common or the phenomenon is not the same
4. This method of presentation of data is very expensive
5. Many people are not accustomed to it, and they generally do not attach much importance to it.
6. These can be misused very easily
Construction of diagrams
The following points must be kept in mind while constructing diagram
a) The diagrams should be neat and clean so that they may have an attractive impression on
the mind of the reader
b) A brief heading should be given at the top of the diagram so that the reader may form an idea about the diagram before studying it.
c) The relevant data should be given near the diagram so that it gives a correct view
d) The scale to be used should be suitable and mentioned on the right hand top or left hand
bottom.
e) All types of symbols used should be explained
Types of diagrams
a) One dimensional diagrams e.g. bar diagrams
b) Two dimensional diagrams e.g. rectangles, squares and circles
c) Pictograms and maps
Bar charts
In bar charts, data are represented by a series of bars. They may be of the following kinds:
Simple bar charts
In simple bar charts, data are represented by a series of bars. The height or length of each bar
indicates the size of the figure represented.
The number of bars depends on the number of figures.
The width of the bars is not taken into account and it should be uniform for all bars.
Component bar charts
These are also referred to as subdivided bar charts. These are like simple bar charts except that
the bars are subdivided into component parts.
This sort of chart is constructed when each total figure is built up from two or more component figures.
They can be of two kinds:
Component bar chart (actual)
Percentage component bar chart
Multiple bar charts
In multiple bar charts, the components are shown as separate bars adjoining each other, and each bar represents the actual value of the component figure. These charts are suitable when totals of components are not required.
Advantages of bar charts
They are easy to construct
They can depict data more easily/accurately
They can be used to indicate the sizes of component figures
Disadvantages
They are not very informative
They are restricted to three or four component figures only
Pie charts
A pie chart is a circle divided by radial lines into sections so that the area of each section is
proportional to the size of the figure represented.
It is particularly useful where it is desired to show the relative proportions of the figures that go to make up a single overall total. In order to construct a sub-divided circle, first find the angles which represent the various sectors, using the formula
$$\text{Angle of each component} = \frac{\text{Component value}}{\text{Total of components}} \times 360^\circ$$
When the angles of the various sectors are known, arrange them in ascending order of magnitude and draw a circle. This circle is then divided into sectors according to those angles.
Advantages of pie charts
a) A pie chart can be used where it is desired to show the relative proportion of the figures
that go to make up a single overall total.
b) Unlike the bar chart, it is not restricted to 3 or 4 component figures only
Disadvantages
A pie chart cannot be used effectively where a series of figures is involved
Changes in the overall total cannot be shown by changing the size of the pie chart
Pictograms
When relative values of items are represented by pictures they are known as pictograms. There
are 2 kinds of pictograms
1. Those in which the same picture, always the same size, is shown repeatedly. The value of
a figure represented is indicated by the number of pictures shown
2. Those in which pictures change in size; the value of a figure represented is indicated by
the size of the picture
The main use is to illustrate comparison between sets of data
Example
XYZ Ltd manufactures 3 products – biscuits, bread and cakes (figures in shs ’000):
Year    Biscuits    Bread    Cakes    Total
1995    50          80       40       170
1996    60          100      50       210
1997    70          110      30       210
1998    90          120      50       260
i) Draw a simple bar chart
ii) Draw a component bar chart
iii) Draw a multiple bar chart
[Figure: percentage component bar chart of biscuits, bread and cakes, 1995 – 1998 (y-axis 0% – 100%)]
[Figure: component bar chart of biscuits, bread and cakes (shs ’000), 1995 – 1998 (y-axis 0 – 250)]
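For the percentage component bar chart, each year's bar is drawn to 100% and divided in proportion to the products' shares of that year's total. The sketch below computes those shares from the XYZ Ltd table; the function name percentage_components is hypothetical.

```python
# Figures (shs '000) of XYZ Ltd from the table above.
sales = {
    1995: {"Biscuits": 50, "Bread": 80, "Cakes": 40},
    1996: {"Biscuits": 60, "Bread": 100, "Cakes": 50},
    1997: {"Biscuits": 70, "Bread": 110, "Cakes": 30},
    1998: {"Biscuits": 90, "Bread": 120, "Cakes": 50},
}

def percentage_components(row):
    """Express each component as a percentage of the year's total (bar height = 100%)."""
    total = sum(row.values())
    return {item: value / total * 100 for item, value in row.items()}

percentages = {year: percentage_components(row) for year, row in sales.items()}
for year, row in percentages.items():
    parts = ", ".join(f"{item} {pct:.1f}%" for item, pct in row.items())
    print(year, parts)
```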
Example 3
From the following information, construct a pie chart
Product Sales (shs, 000s)
A 200
B 150
C 100
D 150
TOTAL 600
Solution
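To construct a pie chart, the number of degrees for each product is calculated.
Total sales of shs 600,000 are shown as 360°.
$$\text{Product A} = \frac{200}{600} \times 360^\circ = 120^\circ$$
$$\text{Product B} = \frac{150}{600} \times 360^\circ = 90^\circ$$
$$\text{Product C} = \frac{100}{600} \times 360^\circ = 60^\circ$$
$$\text{Product D} = \frac{150}{600} \times 360^\circ = 90^\circ$$
The four angles add up to 360°.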
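The same angle calculation can be expressed as a short function. Multiplying before dividing keeps the results exact for these figures; the function name pie_angles is hypothetical.

```python
def pie_angles(values):
    """Angle of each sector = component * 360 / total, in degrees."""
    total = sum(values.values())
    return {name: value * 360 / total for name, value in values.items()}

product_sales = {"A": 200, "B": 150, "C": 100, "D": 150}  # shs '000, from Example 3
angles = pie_angles(product_sales)
print(angles)  # {'A': 120.0, 'B': 90.0, 'C': 60.0, 'D': 90.0}
```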
[Figure: pie chart of sales by product (Products A, B, C and D)]
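[Figure: the four quadrants of the XY plane about the origin 0 – Quadrant I: X positive (+ve), Y positive (+ve); Quadrant II: X negative (–ve), Y positive (+ve); Quadrant III: X negative, Y negative; Quadrant IV: X positive, Y negative]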
All the points in these four quadrants indicate a relationship between two variables taken along the X and Y axes. The coordinates of a point determine its position in relation to the origin and the axes. Each point has its own pair of coordinates, and no other point can have that pair of coordinates. These pairs of coordinates are shown in brackets, and the X coordinate is always shown first, e.g. (3, 5).
Example 1
Plot the following values of X and Y on graph paper
a) X = 3, Y = 5
b) X = -2, Y = 4
c) X = -3, Y =-2
d) X = 4, Y = -3
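Before plotting, the signs of the coordinates tell us which quadrant each point falls in. A small Python sketch; the helper `quadrant` is hypothetical, written only for this illustration.

```python
def quadrant(x, y):
    """Return the quadrant (I to IV) in which the point (x, y) lies."""
    if x == 0 or y == 0:
        return "on an axis"
    if x > 0:
        return "I" if y > 0 else "IV"
    return "II" if y > 0 else "III"


points = {"a": (3, 5), "b": (-2, 4), "c": (-3, -2), "d": (4, -3)}
for label, (x, y) in points.items():
    print(f"{label}) ({x}, {y}) lies in quadrant {quadrant(x, y)}")
```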
Functional relationships
Functional relationships between two variables exist when a change in one variable results in a change in the other variable. For example, when the price of a commodity rises, its demand decreases; there is therefore a functional relationship between price and demand.
Whenever a point is located by means of coordinates, this point expresses a relation between two
factors. These factors are known as variables.
A variable may be defined as a measurable quantity which varies from one value to another.
Examples of variables are time, temperature, production, sales etc.
Variables may be of the following two types
a) Independent
b) Dependent
Independent variables
These are variables which change arbitrarily, or whose cause of change is not taken into consideration; e.g. time is an independent variable.
Dependent variable
This is a variable whose value depends on the value of an independent variable. For example, time is the independent variable in a time series, while values such as sales, production and prices, which change over a period of time, are dependent variables. The independent variable is plotted along the X-axis and the dependent variable along the Y-axis.
When two variables X and Y are so related that the value of Y can be calculated from a given value of X, then Y is said to be a function of X, written $Y = f(X)$. If the values of the independent variable are taken along the X-axis and the corresponding values of the dependent variable along the Y-axis, a graphical representation of the function is obtained in the form of a line or curve.
For example, assume $Y = 3X + 7$.
In this equation, Y increases by 3 for every increase of 1 in X, so Y is a function of X. A linear relationship such as this is shown on a graph by a straight line; other functional relationships (for example, quadratic ones) are shown by curves such as a parabola.
Example 2
If y =5x + 4, then explain the functional relationship between X and Y by the help of a graph.
Solution
We can assume the following values of X and the corresponding values can be calculated
accordingly
X Y
-3 -11
-2 -6
-1 -1
0 4
1 9
2 14
3 19
The values of X and Y can then be plotted on a graph; because the relationship is linear, the plotted points lie on a straight line.
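The table of values for Example 2 can be generated directly from the equation. A minimal sketch; the function name `f` simply mirrors the notation $Y = f(X)$.

```python
def f(x):
    """The functional relationship of Example 2: y = 5x + 4."""
    return 5 * x + 4


table = [(x, f(x)) for x in range(-3, 4)]
print("X   Y")
for x, y in table:
    print(f"{x:>2} {y:>3}")

# Equal steps in X give equal steps in Y, so the points lie on a straight line
steps = {y2 - y1 for (_, y1), (_, y2) in zip(table, table[1:])}
print("Change in Y per unit change in X:", steps)
```

The constant change of 5 per unit of X is the slope of the line, which is why a straight line joins the plotted points.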
Characteristics of a graph
The graph must give the correct impression
The graph must give a clear and comprehensive impression
The graph must not be overcrowded with curves
The curves must be distinct from one another
The scales chosen along the X-axis and Y-axis must suit the given data
The graph must be neat and clean
Graph drawing conventions
The independent variable should always be taken along the horizontal (X) axis and the dependent variable along the vertical (Y) axis.
The vertical scale should always start at zero. Where this is not possible, a definite break in the scale is shown between zero and the next number.
The scale chosen must be such that it can easily accommodate the whole data.
There are no hard and fast rules regarding the scales of the two axes; conventionally, it is convenient if the X-axis is taken as 1½ times the length of the Y-axis.
Equal distances on the Y-axis should mean equal absolute amounts. For example, if one centimetre represents shs. 1,000 of sales revenue, two centimetres should represent shs. 2,000 and three centimetres shs. 3,000; likewise, equal distances on the X-axis should mean equal values of the independent variable.
It should be remembered that against each given value of the independent variable there is a corresponding value of the dependent variable.
The rules regarding the joining of the points are:
(i) If the figures relate to a continuous variable, the plotted points should be connected by a smooth curve.
(ii) If the variable is discrete, the plotted points should be joined by straight lines.
The title, head note and foot note are shown in the same manner as in a diagram.
The scale caption for the X-axis is placed under the centre of the horizontal axis. The scale caption for the Y-axis is placed at the top of the Y-scale.
If more than one curve is plotted on the same graph, a different type of line should be used for each, e.g. a full line, dotted line, dashed line, dot-and-dash line etc.
All lettering must be horizontal
The source of data must be given
Types of graphs
1) Time series graphs or historigrams
2) Z – charts
3) Scatter diagrams
4) Semi-logarithmic graphs or ratio scale graphs
5) Lorenz curve
6) Graphs of frequency distribution
Time series graphs
In a time series, the values of a variable are given at different periods of time. When a graph of such a series is drawn, it shows the changes in the value of the variable with the passage of time. The graphical presentation of such a series is called a historigram.
Time series graphs are used to study:
i) Changes in one variable over a period of time
ii) Changes in two or more variables over a period of time
While constructing a historigram, time is taken along the X-axis and the values along the Y-axis; the data are then plotted and the points are joined by straight lines to get the historigram.
Examples of time series
Population of a country over a specific period of time
Sales of a business enterprise over a period of one year
Prices of some specific commodities over a period of time
Temperature over a period of time
Example 3
Monthly sales of AB stores for the year 19 – 8 were as follows
Month Jan Feb. Mar April May Jun Jul Aug Sept Oct Nov Dec
Sales (000s) 50 40 60 70 50 80 100 90 110 80 70 120
Construct a graph from the above figures
Example 4
The following table gives the sales of a certain firm in 6 years
Draw a graph of the time series
Year 1991 1992 1993 1994 1995 1996
Sales (000s) 820 950 1000 950 900 1050
This graph requires a false base line. When the fluctuations in a variable are small relative to its size, a definite break in the scale is shown between zero and the next number. In this case, instead of showing the entire scale from zero to the highest value involved, only as much of the scale as is necessary for the purpose is shown.
The portion which lies between zero and the lowest value of the variable is left out. This method is termed the false base line approach to drawing time series graphs.
Z – Charts
A Z-chart is simply a time series chart incorporating three curves for:
i) Individual monthly figures
ii) Monthly cumulative figures for the year
iii) A moving annual total
The Z-chart takes its name from the fact that the 3 curves together tend to look like the letter Z.
A Z-chart is of great importance for representing business data over a period of one year. The
information given in a Z-chart can be explained as under
1) Monthly totals
This simply shows the monthly results at a glance, together with any rising or falling trends and seasonal variations.
2) Cumulative totals
This shows the performance to date and can be easily compared with planned or budgeted
performance.
3) Annual moving totals
This shows a comparison of the current levels of performance with those of the previous year. If the moving annual total line is rising, this year's monthly results are better than the results of the corresponding months last year, and vice versa.
Sometimes separate vertical scales are used to plot the monthly data and the data for the cumulative and moving annual totals; in other cases, the same vertical scale is used for all three. The decision on whether to use the same vertical scale should be made in view of the nature of the given data.
Example
The following are the sales of ABC Ltd for the years 1995 and 1996
Month 1995 1996
Jan 400 420
Feb 480 450
Mar 420 600
April 580 640
May 600 580
June 800 700
July 750 800
Aug 600 750
Sept 600 750
Oct 500 480
Nov 600 550
Dec 900 950
Construct a Z chart for the year 1996
Solution
Monthly cumulative totals are obtained by adding each month's figure to the total of the preceding months. Moving annual totals are obtained by adding the current month's figure to, and subtracting the corresponding month's figure for last year from, the preceding month's annual total.
In this example, the total sales for 1995 (from the table above) are 7,230. To obtain the moving annual total at the end of January 1996, add January 1996's sales to 7,230 and subtract from it the sales of January 1995: 7,230 + 420 − 400 = 7,250.
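The three Z-chart series for 1996 can be computed together. This is a minimal sketch using the ABC Ltd figures exactly as tabulated; the function name `z_chart` is ours. Note that the 1995 column, as printed, sums to 7,230, and that is the starting annual total the code uses.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
sales_1995 = [400, 480, 420, 580, 600, 800, 750, 600, 600, 500, 600, 900]
sales_1996 = [420, 450, 600, 640, 580, 700, 800, 750, 750, 480, 550, 950]


def z_chart(previous, current):
    """Return the monthly, cumulative and moving annual total series."""
    cumulative, moving_annual = [], []
    running = 0
    annual = sum(previous)  # annual total for the previous year
    for last_year, this_year in zip(previous, current):
        running += this_year
        cumulative.append(running)
        # Add this month's figure and drop the same month of last year
        annual += this_year - last_year
        moving_annual.append(annual)
    return current, cumulative, moving_annual


monthly, cumulative, mat = z_chart(sales_1995, sales_1996)
print(f"{'Month':<6}{'Monthly':>8}{'Cumulative':>12}{'MAT':>8}")
for month, m, c, t in zip(MONTHS, monthly, cumulative, mat):
    print(f"{month:<6}{m:>8}{c:>12}{t:>8}")
```

In December the moving annual total equals the cumulative total (both are the 1996 annual total), which is where the top stroke and the diagonal of the Z meet.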
Scatter graphs
Scatter graphs are those graphs which are used to indicate the relationship between two
variables. The X-axis is used to represent the data of one variable and the Y-axis to represent the data of the other variable.
In order to construct a scatter graph or scatter diagram, we must have several pairs of two
variables. Each pair of these variables shows the value of one variable and the corresponding
value of the other variable. Each pair of data is plotted on the graph, and the resulting graph shows a number of plotted points scattered over it.
Scatter graphs are usually drawn to indicate the relationship between two variables. For this purpose, a line of best fit is established from the scatter graph.
The line of best fit is the line from which the total deviation of the points plotted on the scatter diagram is a minimum.
The line of best fit indicates the relation or association between the two variables; it is one way of measuring correlation. In a scatter graph, the line of best fit is drawn approximately (by eye); a rising or falling line shows a positive or negative relationship between the two variables, respectively.
Example 6
Sales and advertising expenditure of RST Ltd are given below for a period of 7 months.
Advertising expenditure (shs. 000’s) 20 25 30 35 40 45 50
Sales (shs. 000’s) 650 550 700 500 750 900 850
In this example, the advertising expenditure is taken along the X-axis because it is the independent variable, and sales are taken along the Y-axis because they are the dependent variable.
It can be observed from the graph that the plotted data although scattered represent the rising
trend; it means the increase in advertising expenditure results in higher sales. This trend shows
there is a positive relationship between these two variables.
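The text draws the line of best fit by eye. A common alternative, swapped in here for illustration, is the least-squares method, which chooses the line minimising the sum of squared vertical deviations. A minimal sketch for the RST Ltd data; the function name `least_squares_line` is ours.

```python
advertising = [20, 25, 30, 35, 40, 45, 50]   # X, shs. 000's
sales = [650, 550, 700, 500, 750, 900, 850]  # Y, shs. 000's


def least_squares_line(xs, ys):
    """Fit y = a + b*x by minimising the sum of squared vertical deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b


a, b = least_squares_line(advertising, sales)
print(f"Sales = {a:.1f} + {b:.2f} x Advertising")
print("Positive relationship" if b > 0 else "Negative relationship")
```

The positive slope (about 9.64) agrees with the rising trend read from the scatter graph.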
Semi – logarithmic graphs
A semi – logarithmic graph is that graph on which the vertical scale is logarithmic. It is also
known as ratio scale graph. These graphs are useful to study the relative movement instead of
absolute movement.
Semi – logarithmic graphs are usually used when:
1) Visual comparisons are to be made between series of greatly different magnitudes
2) The series are quoted in non – comparable units
3) The data are to be examined to see whether the rate of change is constant, in which case the graph appears as a straight line.
Ratio scale (semi-logarithmic) graphs can be constructed in 3 ways:
By using semi-log graph paper
By using a slide rule
By plotting the logs of the variables
Actual values can also be shown on the vertical scale; zero has no log and should not be inserted
on the vertical scale of a semi – log graph.
In semi –log graphs, the horizontal scale is the same as an ordinary graph whereas the vertical
scale is the ratio scale or logarithmic values of the variable.
If the logarithmic curve becomes steeper as it moves upward, the rate of growth is increasing; if it flattens out, the rate of growth is decreasing. If the curve is a straight line, the rate of growth is constant.
Example 7
The following are the profits of Pombe Breweries Ltd over the calendar year 1996
Month Profits in (shs.000’s)
Jan 10
Feb 11
Mar 13
April 15
May 15
Jun 18
July 16
Aug 19
Sept 20
Oct 17
Nov 18
Dec 24
Using ordinary graph paper, plot the time series of the profits using the logarithmic values (ratio scale).
Solution
Month Profits (shs. 000's) Logarithmic value
Jan 10 4.0
Feb 11 4.0
Mar 13 4.1
April 15 4.2
May 15 4.2
June 18 4.3
July 16 4.2
Aug 19 4.3
Sept 20 4.3
Oct 17 4.2
Nov 18 4.3
Dec 24 4.4
Note: the profit for January is shs. 10,000, whose logarithm is 4.0 (characteristic 4); the other values are obtained in the same way and rounded to one decimal place.
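The logarithmic values can be reproduced with common (base-10) logarithms. A minimal sketch using the monthly profits as given in the question (including July's 16); each figure is converted from thousands to shillings before taking the log, as in the note.

```python
import math

profits = {  # Pombe Breweries Ltd, 1996, shs. 000's
    "Jan": 10, "Feb": 11, "Mar": 13, "Apr": 15, "May": 15, "Jun": 18,
    "Jul": 16, "Aug": 19, "Sep": 20, "Oct": 17, "Nov": 18, "Dec": 24,
}

# Convert to shillings, take common logarithms, round to one decimal place
log_values = {m: round(math.log10(p * 1000), 1) for m, p in profits.items()}
for month, value in log_values.items():
    print(f"{month:<4}{profits[month] * 1000:>8,}{value:>6}")
```

Plotting these log values on ordinary graph paper gives the same shape as plotting the raw profits on semi-log paper.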
Lorenz curve
This is a graph used to measure dispersion; it was devised by Dr. Lorenz to measure inequalities in the distribution of wealth. An important use of the Lorenz curve is in measuring the extent to which income is unevenly distributed between the various income groups. Disparity of proportions is a common economic phenomenon, and this disparity can be demonstrated with the help of a Lorenz curve.
A Lorenz curve is constructed as follows
Write down the values of the two variables being plotted
Express the values as percentage of the total
Compute the cumulative percentage of each variable
Draw a horizontal and vertical axis and plot 0% to 100% on each axis
Mark the cumulative percentages on the graph and join the points with a freehand curve. This is the Lorenz curve.
Draw a line of equal distribution by joining the 0% point to the 100% point with a straight line.
The farther the Lorenz curve lies from the line of equal distribution, the greater the disparity or inequality, and vice versa.
Example 8
The following figures are taken from survey on “Business Prospects for 1996”
Maize flour sales
Number of establishments Net output (shs.)
23 104
26 450
24 860
19 1350
14 2190
6 3125
Draw a Lorenz curve using the above data
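The Lorenz curve needs the cumulative percentage of each variable. A minimal sketch for Example 8, assuming the number of establishments goes on the horizontal axis and net output on the vertical axis (the example does not say which); the helper `cumulative_percentages` is ours.

```python
establishments = [23, 26, 24, 19, 14, 6]
net_output = [104, 450, 860, 1350, 2190, 3125]  # shs.


def cumulative_percentages(values):
    """Running totals expressed as percentages of the grand total."""
    total, running, result = sum(values), 0, []
    for v in values:
        running += v
        result.append(100 * running / total)
    return result


x = cumulative_percentages(establishments)  # horizontal axis
y = cumulative_percentages(net_output)      # vertical axis
print(f"{'Cum % establishments':>22}{'Cum % output':>14}")
for cx, cy in zip([0.0] + x, [0.0] + y):    # the curve starts at (0, 0)
    print(f"{cx:>22.1f}{cy:>14.1f}")
```

Every point lies below the line of equal distribution: for instance, the first 65% of establishments account for only about 17.5% of net output, which is the inequality the curve displays.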
Graphs of frequency distribution
The graphs of frequency distribution of continuous type are as under
Ogive curve
Histogram
Frequency polygon
Frequency curve
Ogive curve
An ogive curve is the name given to the curve obtained when the cumulative frequencies of a
distribution are graphed. It is also called cumulative frequency curve. The following steps are
adopted to construct an ogive;
1) Compute the cumulative frequency of the distribution.
2) Prepare a graph with the cumulative frequency on the vertical axis and the class intervals on the horizontal axis.
3) Plot a starting point at zero on the vertical scale against the lower class limit of the first class.
4) Plot the cumulative frequencies on the graph at the upper class limits of the classes to which they refer.
5) Join all these points with a curve.
An ogive curve is used to find the values of the median, quartiles, deciles and percentiles graphically.
Example 9
From the following information, draw an ogive curve
Class frequency
0 – 10 5
10 – 20 10
20 – 30 15
30 – 40 8
40 – 50 7
Solution
To draw an ogive curve, the frequency is to be converted into cumulative frequency as follows
Class frequency cumulative frequency
0 – 10 5 5
10 – 20 10 15
20 – 30 15 30
30 – 40 8 38
40 – 50 7 45
Mark the cumulative frequencies (c.f.) on the graph paper; the c.f. of each group is marked against the upper limit of that group.
Percentage Ogive
In a percentage ogive, the cumulative frequencies are shown in percentages.
A percentage ogive is constructed as follows:
(i) Find the percentage frequencies of the distribution
(ii) Find the cumulative percentage frequencies
(iii) Take the cumulative percentage frequencies along the vertical axis and the class limits along the horizontal axis
(iv) Mark the respective percentage cumulative frequencies against the upper limits
(v) Join these points by the help of a curve
The percentage ogive is normally used to compare the ogive curves of two distributions with different numbers of items.
Example 10
The following is the age distribution of employees in Natex industry.
Age group in years Number of employees
Less than 15 20
15 – 25 80
25 – 35 200
35 – 45 120
45 – 55 60
Over 55 20
Draw a percentage ogive curve
Solution
Age group Number of employees Percentage Cumulative percentage
5 – 15 20 4 4
15 -25 80 16 20
25 – 35 200 40 60
35 – 45 120 24 84
45 – 55 60 12 96
55 – 65 20 4 100
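The percentage and cumulative percentage columns can be computed as below. A minimal sketch that, following the solution table, closes the open-ended classes as 5 to 15 and 55 to 65.

```python
groups = ["5-15", "15-25", "25-35", "35-45", "45-55", "55-65"]
employees = [20, 80, 200, 120, 60, 20]

total = sum(employees)
percentages = [100 * e / total for e in employees]
cumulative = []
running = 0.0
for p in percentages:
    running += p
    cumulative.append(running)

for g, e, p, c in zip(groups, employees, percentages, cumulative):
    # Each cumulative percentage is plotted against the upper limit of its group
    print(f"{g:<7}{e:>5}{p:>6.0f}{c:>6.0f}")
```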
Histogram
A histogram is a graph that represents the class frequencies in a frequency distribution by vertical rectangles. It consists of a series of rectangles, each having a base along the X-axis proportional to the class interval and an area proportional to the class frequency. Where the class intervals are not equal, the frequencies are adjusted according to the ratio between the class intervals, and the results are known as frequency densities. The histogram is used to find the value of the mode graphically.
Example 11
Draw a histogram from the following data
Wages (sh) Number of workers
0 – 10 15
10 – 20 17
20 - 30 19
30 – 40 25
40 – 50 16
50 – 60 15
60 – 70 13
70 – 80 10
80 – 90 5
90 – 100 2
Solution
Take the wages along the X-axis and the number of workers along the Y-axis, and draw the histogram.
In the histogram, the frequency of each class interval is drawn as a rectangle whose height is the frequency and which extends from the lower limit to the upper limit of that class interval.
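The rectangle for each class can be computed as (left edge, right edge, height). This minimal sketch adjusts heights to the narrowest class width so that areas stay proportional to frequencies; choosing the narrowest width as the standard is an assumption, one common convention. For Example 11 all classes are 10 wide, so the heights are simply the frequencies. The helper `histogram_bars` is hypothetical.

```python
def histogram_bars(classes, freqs):
    """Return (left, right, height) for each rectangle of a histogram.

    Heights are frequencies adjusted to the narrowest class width, so that
    each rectangle's area stays proportional to its class frequency.
    """
    standard = min(upper - lower for lower, upper in classes)
    bars = []
    for (lower, upper), f in zip(classes, freqs):
        height = f * standard / (upper - lower)
        bars.append((lower, upper, height))
    return bars


# Example 11: wages (sh) and number of workers; all classes are 10 wide
wages = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50),
         (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
workers = [15, 17, 19, 25, 16, 15, 13, 10, 5, 2]
for left, right, height in histogram_bars(wages, workers):
    # A rough text preview: one '#' per unit of height
    print(f"{left:>3}-{right:<4}{'#' * int(height)}")
```

With unequal classes, a class twice as wide as the standard would be drawn at half its frequency, keeping its area correct.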
Frequency polygon
A frequency polygon is drawn from the histogram by joining the midpoints of the tops of the class-interval rectangles. The frequency polygon encloses the same area as the histogram, since the triangles it cuts off from the histogram are equal in area to the triangles it adds outside it. The points of the frequency polygon are joined with straight lines drawn with a ruler. The frequency polygon is sometimes used to derive the mode.
Example 12
Draw a histogram and superimpose frequency polygon from the following data.
Marks Number of students
0–5 7
5 – 10 8
10 – 15 15
15 – 20 16
20 – 25 19
25 – 30 13
30 – 35 12
35 – 40 10
40 – 45 5
45 – 50 2
Solution
Draw the histogram as usual and then join the midpoints of the tops of the class-interval rectangles with straight lines.
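The polygon's vertices for Example 12 are the class midpoints at the height of each frequency. This minimal sketch also closes the polygon at zero at the midpoints of imaginary empty classes beyond each end, a common convention that the text does not specify; with that closure, the trapezium-rule area of the polygon equals the histogram's area. The helpers `polygon_points` and `polygon_area` are ours.

```python
marks = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25),
         (25, 30), (30, 35), (35, 40), (40, 45), (45, 50)]
students = [7, 8, 15, 16, 19, 13, 12, 10, 5, 2]


def polygon_points(classes, freqs):
    """Midpoints of the rectangle tops, closed at zero one class beyond each end."""
    width = classes[0][1] - classes[0][0]  # assumes equal class widths
    mids = [(lower + upper) / 2 for lower, upper in classes]
    inner = list(zip(mids, freqs))
    return [(mids[0] - width, 0)] + inner + [(mids[-1] + width, 0)]


def polygon_area(points):
    """Area under the polygon by the trapezium rule."""
    return sum((y0 + y1) / 2 * (x1 - x0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


points = polygon_points(marks, students)
histogram_area = sum(f * (upper - lower) for (lower, upper), f in zip(marks, students))
print("Polygon vertices:", points)
print("Histogram area:", histogram_area, "Polygon area:", polygon_area(points))
```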
Frequency curve
A frequency curve is drawn in the same way as a frequency polygon, except that the midpoints of the tops of the class-interval rectangles are joined by a smooth freehand curve, smoothing off the peaks so that the curve follows the general trend. The only difference between a frequency polygon and a frequency curve therefore lies in the way the points are joined.
Example 13
Draw a histogram from the data given below and superimpose frequency curve
Wages per day (sh) Number of employees
0 – 10 7
10 -20 7
20 – 30 8
30 – 40 9
40 – 50 15
50 – 60 20
60 – 70 18
70 – 80 13
80 – 90 10
90 – 100 4
Solution
Draw the histogram as usual and then join the midpoints of the tops of the class-interval rectangles with a smooth curve.
MEASURES OF CENTRAL TENDENCY
All individual observations comprising a set of data exhibit a tendency to cluster or centre
together. Generally they tend to be closer to a particular value than others. This peculiar
characteristic of data is referred to as central tendency. The value around which the individual observations tend to cluster is called the central value.
We therefore define central tendency as an index of central location used in the description of
frequency distributions.
There are 3 principal measures of central tendency:
Arithmetic mean
Median
Mode
Other, less frequently used measures are the geometric mean and the harmonic mean.
Qualities of a good measure of central tendency
Should be rigidly defined
Should be based on all the observations
Should be easily understood and calculated
Should be least affected by the fluctuations of sampling
Should be capable of further algebraic or statistical treatment
Should be least affected by extreme values
Arithmetic mean
The arithmetic mean is the sum of the scores or values of a variable divided by the number of scores. It is the most frequently used measure of central tendency.
The arithmetic mean of population data is denoted by μ (mu), while that of sample data is typically denoted by $\bar{X}$, usually expressed as
$\bar{X} = \dfrac{X_1 + X_2 + X_3 + \cdots + X_n}{N} = \dfrac{\sum X}{N}$
i.e. the sum of all items divided by the number of items, where
$\bar{X}$ is the arithmetic mean
N is the number of items in the series
$X_1, X_2, X_3, \ldots, X_n$ are the values of the items in the series
Computation of arithmetic mean
Given the values 7, 28, 11, 25 and 10, compute the arithmetic mean.
$\bar{X} = \dfrac{7 + 28 + 11 + 25 + 10}{5} = \dfrac{81}{5} = 16.2$
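The same calculation in Python, as a minimal sketch; `arithmetic_mean` is our own helper, cross-checked against the standard library's `statistics.mean`.

```python
import statistics


def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


data = [7, 28, 11, 25, 10]
print(arithmetic_mean(data))   # 81 / 5
print(statistics.mean(data))   # standard-library cross-check
```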
Obtaining the arithmetic mean from a grouped frequency distribution
i) Obtain the midpoint of each class ($M_i$)
ii) Multiply each midpoint by the corresponding class frequency to obtain $M_i f_i$
iii) Sum the products $M_i f_i$ for all the classes
iv) Divide the sum obtained above, $\sum M_i f_i$, by the sum of all the frequencies, $\sum f_i$, to obtain the arithmetic mean
Example
Given the following information, compute arithmetic mean
Expenditure on food (X) Number of respondents (f)
0–5 2
5 – 10 6
10 – 15 8
15 – 20 12
20 – 25 10
25 – 30 4
30 – 35 2
X Mi fi Mifi
0–5 2.5 2 5.0
5 – 10 7.5 6 45.0
10 – 15 12.5 8 100.0
15 – 20 17.5 12 210.0
20 – 25 22.5 10 225.0
25 – 30 27.5 4 110.0
30 – 35 32.5 2 65.0
$\sum f_i = 44$, $\sum M_i f_i = 760$
Therefore $\bar{X} = \dfrac{760}{44} \approx 17.27$
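The four steps above translate directly into code. A minimal sketch for the food-expenditure example; the helper `grouped_mean` is ours.

```python
classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30), (30, 35)]
respondents = [2, 6, 8, 12, 10, 4, 2]


def grouped_mean(classes, freqs):
    """Arithmetic mean of a grouped distribution using class midpoints."""
    mids = [(lower + upper) / 2 for lower, upper in classes]  # Mi
    sum_mf = sum(m * f for m, f in zip(mids, freqs))          # sum of Mifi
    return sum_mf / sum(freqs)                                # divide by sum of fi


print(f"Mean = {grouped_mean(classes, respondents):.2f}")
```

Using midpoints assumes the items in each class are spread evenly about the midpoint, so the grouped mean is an approximation to the mean of the raw data.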
Calculating arithmetic mean by assumed mean formula
Under the assumed mean method, a value from the given series is assumed to be the mean; it is known as the provisional mean or assumed mean (AM).
The differences between the values of the various items of the series and the assumed mean are known as deviations, denoted by d.
In this method, the arithmetic mean is obtained as
$\bar{X} = AM + \dfrac{\sum fd}{\sum f}$
where AM is the assumed mean
d is the deviation of each item from the assumed mean ($d = X - AM$)
$\sum fd$ is the sum of the products of the deviations and their frequencies
$\sum f$ is the sum of the frequencies
1. Calculate the arithmetic mean from the following data using the assumed mean method
Values Frequency
5 20
10 43
15 75
20 67
25 72
30 45
35 39
40 9
45 8
50 6
2. Calculate the arithmetic mean from the data below using the AM method
Marks Number of students
0 – 20 5
20 – 40 7
40 – 60 13
60 – 80 8
80 – 100 7
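The assumed mean formula can be sketched as below and applied to both exercises. The assumed means (25 and 50) are our own choices; any value gives the same answer, which is the point of the method. For Exercise 2, the class midpoints stand in for X.

```python
def assumed_mean(values, freqs, am):
    """Arithmetic mean by the assumed-mean (short-cut) method."""
    deviations = [x - am for x in values]                   # d = X - AM
    sum_fd = sum(f * d for f, d in zip(freqs, deviations))  # sum of fd
    return am + sum_fd / sum(freqs)


# Exercise 1: discrete values, assumed mean 25
values = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
freq = [20, 43, 75, 67, 72, 45, 39, 9, 8, 6]
print(f"Exercise 1: {assumed_mean(values, freq, 25):.2f}")

# Exercise 2: class midpoints of 0-20, 20-40, ... as X, assumed mean 50
midpoints = [10, 30, 50, 70, 90]
students = [5, 7, 13, 8, 7]
print(f"Exercise 2: {assumed_mean(midpoints, students, 50):.2f}")
```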
Advantages of Arithmetic mean
Can be easily understood
Takes into account all the items of the series
It is not necessary to arrange the data before calculating the arithmetic average
Is capable of algebraic treatment
Is a good method for comparison
Is definite, i.e. it always has a determinate value
Disadvantages
Is affected by extreme values to a great extent
May be a figure which does not exist in the series, e.g. the mean of 8, 9, 10 and 11 is 9.5
Cannot be calculated if any item in the series is not known
Cannot be used in cases of qualitative data
Properties of Arithmetic Mean
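The product of the arithmetic mean and the number of items in a given series equals the sum of all the given values. It can be expressed as follows:
Since $\bar{X} = \dfrac{\sum X}{N}$,
therefore $N\bar{X} = \sum X$.
The sum of the squares of the deviations from the arithmetic mean is the least, i.e. smaller than the sum of the squared deviations from any other value. For example, for the values 3, 5, 7, 9, $\bar{X} = 6$ and the squared deviations sum to $9 + 1 + 1 + 9 = 20$, whereas about 5 they sum to $4 + 0 + 4 + 16 = 24$.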
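Both properties can be checked numerically for the series 3, 5, 7, 9. A minimal sketch; `sum_sq_dev` is our own helper.

```python
values = [3, 5, 7, 9]
mean = sum(values) / len(values)  # 6.0


def sum_sq_dev(values, about):
    """Sum of squared deviations of the values about a given point."""
    return sum((x - about) ** 2 for x in values)


# Property 1: N times the mean equals the sum of the values
print(len(values) * mean == sum(values))
# Property 2: squared deviations are smallest about the mean
for point in (4, 5, mean, 7, 8):
    print(point, sum_sq_dev(values, point))
```

The sums rise symmetrically on either side of 6 (24 at 5 and 7, 36 at 4 and 8), with the minimum of 20 at the mean.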
The Median
The median is the middle value of an ordered array of data, i.e. the value of the middle item of a series when the values are arranged in ascending or descending order.
If the total number of observations is odd, the median is the middle value. If the total number of observations is even, the median is the average of the two middle observations.
Example
Given the following data series, determine the median: 5, 19, 37, 39, 45
Procedure
Arrange the scores in ascending order
Determine the number of observations, N
When N is odd, choose the middle value:
median = $\left(\dfrac{N+1}{2}\right)$th term
e.g. for 5, 19, 37, 39, 45, N = 5, so the median is the 3rd term, 37
When N is even:
median = $\dfrac{1}{2}\left[\left(\dfrac{N}{2}\right)\text{th term} + \left(\dfrac{N+2}{2}\right)\text{th term}\right]$
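The odd and even rules can be combined into one function. A minimal sketch; positions in the formulas are 1-based, while Python lists are 0-based, hence the `- 1` offsets.

```python
def median(values):
    """Middle term if N is odd; mean of the two middle terms if N is even."""
    ordered = sorted(values)  # arrange in ascending order
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # (N+1)/2 th term
    lower = ordered[n // 2 - 1]           # (N/2) th term
    upper = ordered[n // 2]               # (N+2)/2 th term
    return (lower + upper) / 2


print(median([5, 19, 37, 39, 45]))
print(median([8, 9, 10, 11]))
```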
Calculating median in a discrete series
In a discrete series, the items are first arranged in ascending or descending order of magnitude and their corresponding frequencies are written against them. The frequencies are then accumulated and the value of the middle item, the $\left(\dfrac{N+1}{2}\right)$th item, is located.
Example
The following data relate to sizes of shoes sold at a store during a given week. Find the median
size
Size of shoes (x) Number of pairs (f) Cumulative frequency (c.f)
4.5 1 1
5.0 2 3
5.5 4 7
6.0 5 12
6.5 15 27
7.0 30 57
7.5 60 117
8.0 95 212
8.5 82 294
9.0 75 369
9.5 44 413
10.0 25 438
10.5 15 453
11.0 4 457
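Locating the middle item through the cumulative frequencies can be sketched as follows for the shoe-size data. With N = 457, the median is the 229th item, which falls in the class whose cumulative frequency first reaches 229. This minimal sketch (the helper `discrete_median` is ours) follows the (N+1)/2 rule; when N is even and the two middle items have different values it returns the larger one, so those two values should then be averaged.

```python
sizes = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0]
pairs = [1, 2, 4, 5, 15, 30, 60, 95, 82, 75, 44, 25, 15, 4]


def discrete_median(values, freqs):
    """Value of the (N+1)/2 th item in a discrete frequency series."""
    n = sum(freqs)
    position = (n + 1) / 2
    cumulative = 0
    for value, f in zip(values, freqs):
        cumulative += f
        if cumulative >= position:  # first c.f. that reaches the middle item
            return value
    raise ValueError("empty distribution")


print("N =", sum(pairs), "-> median size:", discrete_median(sizes, pairs))
```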
Computation of the median from a grouped frequency distribution
The formula for computing the median from a continuous data series is
$\text{Median} = L_1 + \dfrac{i\,(m - c)}{f}$
where $L_1$ = lower class boundary (limit) of the median class
i = class interval of the median class
f = frequency of the median class
m = the middle item, $\dfrac{N}{2}$
c = cumulative frequency of the class preceding the median class
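The interpolation formula can be sketched in Python and applied to the food-expenditure distribution from the arithmetic mean section, where N = 44, m = 22, the median class is 15 to 20, c = 16 and f = 12. The helper `grouped_median` is ours.

```python
def grouped_median(classes, freqs):
    """Median = L1 + i(m - c)/f for a continuous frequency distribution."""
    m = sum(freqs) / 2  # middle item, N/2
    c = 0               # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if c + f >= m:  # this is the median class
            i = upper - lower
            return lower + i * (m - c) / f
        c += f
    raise ValueError("empty distribution")


classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30), (30, 35)]
respondents = [2, 6, 8, 12, 10, 4, 2]
print(grouped_median(classes, respondents))  # 15 + 5 x (22 - 16) / 12
```

The same function gives 25 for the ogive example (classes 0 to 50 with frequencies 5, 10, 15, 8, 7), matching the value read graphically from a straight-line ogive.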
Advantages of median
Is easy to calculate
Simple and easy to understand
Is less affected by the value of extreme items
Is especially useful in the study of phenomena of a qualitative nature
Disadvantages of median
Is not a suitable representative of the series in most cases
Is not suitable for further algebraic treatment
Cannot be determined exactly in the case of a continuous series