Stat 1

Uploaded by

Tesfahun Tegegn
Copyright © All Rights Reserved

GUIDELINE-1:

DEFINITION, SCOPE AND LIMITATIONS OF
STATISTICS IN RELATION TO BUSINESS

Structure
1.0 Objectives
1.1 Introduction
1.2 Origin and Growth of Statistics
1.3 Definitions of Statistics
1.4 Scope of Statistics
1.5 Limitations of Statistics
1.6 Summary
1.7 Glossary
1.8 Unit End Questions
1.9 Readings

1.0 Objectives
a) To present importance, origin and growth of statistics,
b) To understand various definitions of statistics,
c) To acquaint the student with the scope of statistics,
d) To present the limitations of statistics.
1.1 Introduction
The word ‘Statistics’ conveys a variety of meanings to people. To some, Statistics is an
imposing form of mathematics, whereas to others it suggests tables, charts and figures.
Each day we are exposed to a wide assortment of numerical information, which often has a
profound impact on our lives. For example, we come across statements like, “there are 932
females per 1,000 males in India, whereas in the U.S.S.R. there are 1,170 females per 1,000
males.” Numbers play an essential role in statistics. They provide the raw material of statistics.
These materials must be processed to be useful, just as crude oil must be refined into petrol
before it can be used by an automobile engine. The study of Statistics involves methods of refining
numerical and non-numerical information into useful forms. Whenever numbers are collected
and compiled, regardless of what they represent, they become statistics. In other words, the
term statistics is considered synonymous with numbers, figures or data.

In addition to meaning data, ‘Statistics’ also refers to a subject. In this sense, Statistics is
a body of methods of obtaining and analyzing data in order to base decisions on them. Thus the
word ‘Statistics’ refers to either quantitative information or a method of dealing with quantitative
information. In the first sense, it is used as a plural noun: the statistics of births, deaths,
imports, exports, etc. In the second sense, the word is used as a singular noun: Statistics deals with
the collection, presentation, analysis and interpretation of quantitative information.

Statistical methods and principles have found applications in many fields: business, social
sciences, engineering, and the natural and physical sciences. In fact, the modern age is the age of Statistics.
Almost every aspect of human and natural phenomena and other activity is now subjected to
measurement and interpretation in terms of statistics. Statistical methods are powerful and important
tools with which we can interpret the complex world of today. Statistics provides a means for
presenting information in meaningful and understandable form. It is a vehicle for transmitting and
storing information. Statistics serves as the informational segment of the decision-making process.
Management at all levels is guided by facts obtained through analysis of records rather than by
knowledge obtained merely through personal observation and experience.

1.2 Origin and Growth of Statistics

It may be interesting to point out that Statistics is not a new discipline but as old as the
human society itself. It has been used right from the existence of life on earth, though its use was
very much limited.

1.2.1 Origin of Statistics

In the good old days Statistics was regarded as the ‘science of state craft’ and was the
by-product of the administrative activity of the State. It has been the traditional function of the
governments to keep records of population, births, deaths, taxes, crop yields and many other
types of activities. Counting and measuring these events may generate many kinds of numerical
data.

Censuses of population and wealth were taken even in ancient times. According to a
Greek historian, in 1400 B.C. a census of all lands in Egypt was taken. Similar reports on the
ancient Chinese, Greeks and Romans are also available. People and land were the earliest objects
of statistical enquiry.

The word ‘Statistics’ comes from the Italian word ‘statista’ or the German word ‘statistik’,
each of which means a political state. Professor Gottfried Achenwall first used it in 1749 to refer to the
subject matter as a whole. Achenwall defined Statistics as “the political state of the
several countries”.

1.2.2 Growth of Statistics

Although Statistics originated as a science of kings, there has been a phenomenal
development in the use of Statistics in several fields. Statistics is now regarded as one of the most
important tools for taking decisions in the midst of uncertainty. In fact, there is hardly any branch
of science today that does not make use of Statistics. The following two main factors
are responsible for the development of Statistics in modern times.

a) Increased demand for Statistics

In the present century, considerable development has taken place in the field of business
and commerce, governmental activities and science. Statistics helps in formulating suitable policies,
and as such its need is increasingly felt in all these spheres. Taking the case of business, not only
has the magnitude of business considerably increased but also growing size of business has made
its problems more complex.

b) Decreasing cost of Statistics

The time and cost of collecting data are very important limiting factors in the use of
Statistics. However, with the development of electronic machines, such as calculators, computers
etc, the cost of analyzing data has considerably gone down. This has led to increasing use of
Statistics in solving various problems. Moreover, with the development of statistical theory the
cost of collecting and processing data has gone down.

1.3 Definitions of Statistics

According to J. M. Keynes, Achenwall of Germany is credited with giving Statistics the
status of a separate science; it was therefore essential to accord it a definition. However, there are as many
definitions of Statistics as there are authors. In 1869 Quetelet collected 180 definitions of Statistics.
In the last few decades, Statistics has assumed great importance; therefore, there has been
a change in the form and number of definitions also. A few definitions are analytically examined
below.

1.3.1 Statistical Data:

Quantitative or numerical information may be found almost everywhere in business,
economics and many other areas. It is probably more common to refer to data in quantitative
form as statistical data. But not all numerical data are statistical, and hence it is necessary to
examine a few definitions of Statistics to understand the characteristics of statistical data.

Webster defined statistics as “the classified facts representing the conditions of the
people in a state, especially those facts which can be stated in numbers or in
tables of numbers or in any tabular or classified arrangement”.

The above definition is too narrow as it confines the scope of Statistics to only such facts
and figures as relate to the conditions of people in a State.

Yule and Kendall defined Statistics as follows: “By Statistics we mean quantitative data affected
to a marked extent by multiplicity of causes”.

This definition is less comprehensive than the one given by Horace Secrist, who defined
Statistics as follows.

“By statistics we mean aggregates of facts affected to a marked extent by multiplicity of
causes, numerically expressed, enumerated or estimated according to reasonable standards of
accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to
each other.”

This definition clearly points out certain characteristics, which numerical data must possess
in order that they may be called statistics. These are as follows:

i) Statistics are aggregates of facts:

Single and isolated figures are not statistics for the simple reason that such figures are
unrelated and cannot be compared. To illustrate, if it is stated that the income of Mr. X is Rs.
3,00,000 per annum, this would not constitute statistics although it is a numerical statement of
fact.

ii) Statistics are affected to a marked extent by multiplicity of causes:

Generally speaking, facts and figures are affected to a considerable extent by a number
of forces operating together. For example, statistics of production of paddy are affected by
rainfall, quality of soil, seeds and manure, method of cultivation, etc. It is very difficult to study
separately the effect of each of these forces on the production of paddy. The same is true of
statistics of prices, imports, exports, sales, profits, etc.

iii) Statistics are numerically expressed:

All statistics are numerical statements of facts, i.e., expressed in numbers. Qualitative
statements such as ‘the population of India is rapidly increasing’, or ‘the production of wheat is
not sufficient’, or ‘India is a poor country’ do not constitute statistics.

iv) Statistics are enumerated or estimated according to reasonable standards of accuracy:

Facts and figures about any phenomenon can be derived in two ways, viz., by actual
counting and measurement or by estimate. Estimates cannot be as precise and accurate as actual
counts or measurements. The degree of accuracy desired largely depends upon the nature and
object of the enquiry. For example, in measuring the height of persons even 1/10th of a cm is
material, whereas in measuring the distance between two places even a fraction of a kilometer can be
ignored.

v) Statistics are collected in a systematic manner:

Before collecting statistics, a suitable plan of data collection should be prepared and the
work carried out in a systematic manner. Data collected in a haphazard manner would very likely
lead to fallacious conclusions.

vi) Statistics are collected for a predetermined purpose:

The purpose of collecting data must be decided in advance. The purpose should be
specific and well defined. A general statement of purpose is not enough. For example, if the
objective is to collect data on prices, it would not serve any useful purpose unless one knows
whether he wants to collect data on wholesale or retail prices and what are the relevant
commodities in view.

vii) Statistics should be placed in relation to each other:

If numerical facts are to be called statistics, they should be comparable. Statistical data
are often compared period-wise or region-wise. For instance, the population of India at a particular
point of time may be compared with that of earlier years or with the population of other countries.
Valid comparisons can be made only if the data are homogeneous, i.e., relate to the same
phenomenon or subject.

1.3.2 Statistical Methods:

The large volume of numerical information gives rise to the need for systematic methods,
which can be used to organize, present, analyze and interpret the information effectively. Statistical
methods are primarily developed to meet this need. Different writers too have defined the term
Statistics in this sense differently. A few definitions are examined below.

Prof A.L.Bowley has given some definitions. At one place he says, “Statistics may be
called the science of counting.” This definition is too narrow because it covers only one aspect of
the science, namely, the collection of data. Other aspects like analysis, presentation, interpretation
etc are completely ignored.

At another place Bowley says, “Statistics may rightly be called the science of averages.”
This definition also is not satisfactory because averages are only one of the devices used in
statistical analysis. The other devices like dispersion, skewness, correlation etc., are not at all
covered by this definition.

Boddington defines Statistics as “the science of estimates and probabilities”. This definition
is also incomplete because estimates and probabilities are only a part of statistical methods.

According to Berenson and Revin, “The science of statistics can be viewed as the
application of the scientific method in the analysis of numerical data for the purpose of making
rational decisions”.

Croxton and Cowden have given a very simple and concise definition of Statistics. In
their view, “Statistics may be defined as the collection, presentation, analysis and interpretation
of numerical data”. This definition clearly points out four stages in a statistical investigation, viz.,
collection, presentation, analysis and interpretation of data.

However, to the above stages one more may be added, namely the organization
of data. Thus, Statistics may be defined as the science of collection, organization, presentation,
analysis and interpretation of numerical data.

According to the above definition, there are five stages in a statistical investigation.

i) Collection of data:

Collection of data constitutes the first step in a statistical investigation. Utmost care must
be exercised in collecting data because they form the foundation of statistical analysis. If data are
faulty, the conclusions drawn can never be reliable. The data may be available from existing
published or unpublished sources or else may be collected by the investigator himself.

ii) Organization of data:

Data collected from published sources are generally in organized form. However, a
large mass of figures collected from a survey frequently needs organization. The first step
in organizing a group of data is editing. The collected data must be edited very carefully so that
the omissions, inconsistencies, irrelevant answers and wrong computations in the returns from a
survey may be corrected or adjusted. After the data have been edited, the next step is to classify
them. The purpose of classification is to arrange the data according to some common characteristics
possessed by the items constituting the data. The last step is tabulation. The purpose of tabulation
is to arrange the data in columns and rows so that there is absolute clarity in the data presented.

iii) Presentation of data:

After the data have been collected and organized, they are ready for presentation. Data
presented in an orderly manner facilitate statistical analysis. There are two different methods in
which the collected data may be presented, viz., diagrams and graphs.

iv) Analysis of data:

After collection, organization and presentation, the next step is that of analysis. The
purpose of analyzing data is to dig out information useful for decision-making. Methods used in
analyzing the presented data are numerous, ranging from simple observation of the data to
complicated, sophisticated and highly mathematical techniques such as measures of central tendency,
measures of dispersion, correlation, regression, etc.

v) Interpretation of data:

The last stage in statistical investigation is interpretation, i.e., drawing conclusions from
the data collected and analyzed. The interpretation of data is a difficult task and necessitates a
high degree of skill and experience. Correct interpretation will lead to valid conclusions from the
study and can aid one in taking suitable decisions.
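
The organization and analysis stages described above can be sketched in a few lines of Python. This is only an illustrative sketch: the survey returns, the field layout and the function names (edit, classify, tabulate, pearson) are hypothetical and are not taken from any particular survey or textbook.

```python
from statistics import mean, pstdev

# Hypothetical survey returns (assumed layout):
# (respondent id, department, monthly income in Rs., years of service).
# None marks an omitted answer; a negative income is an inconsistency.
returns = [
    (1, "Sales", 30000, 4),
    (2, "Sales", None, 6),
    (3, "Production", 25000, 2),
    (4, "Production", -500, 3),
    (5, "Finance", 40000, 8),
    (6, "Sales", 35000, 5),
]

def edit(records):
    """Editing: keep only returns with a usable income figure."""
    return [r for r in records if r[2] is not None and r[2] > 0]

def classify(records):
    """Classification: group incomes by a common characteristic (department)."""
    groups = {}
    for _, dept, income, _ in records:
        groups.setdefault(dept, []).append(income)
    return groups

def tabulate(groups):
    """Tabulation: arrange the classified data in rows and columns."""
    header = f"{'Department':<12}{'Count':>6}{'Mean income':>14}"
    rows = [
        f"{dept:<12}{len(values):>6}{mean(values):>14.2f}"
        for dept, values in sorted(groups.items())
    ]
    return "\n".join([header] + rows)

def pearson(xs, ys):
    """Analysis: Pearson correlation coefficient between two series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

edited = edit(returns)
groups = classify(edited)
print(tabulate(groups))

incomes = [r[2] for r in edited]
years = [r[3] for r in edited]
print("Mean income (central tendency):", mean(incomes))
print("Standard deviation (dispersion):", round(pstdev(incomes), 2))
print("Correlation of income with service:", round(pearson(incomes, years), 3))
```

Editing here simply discards unusable returns; in a real survey the investigator may instead correct or adjust them, as the text notes.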

It may be noted that over the past few decades the primary emphasis in Statistics has
been on developing procedures that can be used to deal with uncertainty. This modern conception
of the subject is a far cry from the one usually held by laymen. Indeed, even the pioneers in
statistical research have adopted it only within the past three decades.

1.4 Scope of Statistics:

The scope of Statistics is so vast and ever expanding that it is not only difficult to define
it but also unwise to do so. Statistics pervades all subject matter; its use has permeated almost
every facet of our lives. It is a tool of all sciences, indispensable to research and intelligent judgment,
and has become a recognized discipline in its own right. There is hardly any field, whether it is
trade, industry or commerce, economics, biology, astronomy, physics, chemistry, education,
medicine, sociology, or meteorology, where statistical tools are not applicable.

1.4.1 Statistics and the State:

Since ancient times the ruling kings and chiefs have relied heavily on Statistics in framing
suitable military and fiscal policies. Most of the statistics, such as those of crimes, military strength,
population and taxes, which were collected by them, were a by-product of administrative
activity. In recent years the functions of the State have increased tremendously. The concept of a
State has changed from that of simply maintaining law and order to that of a welfare state.
Statistical data and statistical methods are of great help in promoting human welfare.

1.4.2 Statistics and Business:

The problems of the business enterprises are becoming complex due to the growing size
and competition. They are using more and more statistics in decision making. However, the
employment of statistical methods in the solution of business problems belongs almost exclusively
to the 20th century. In earlier days when business firms were small, owners of the firms were
directly engaged in almost all the areas of business activity. With the growth in the size of business
firms it has often become impossible for the owners to maintain personal contact with the thousands
and lakhs of customers. Management has become a specialized job and a manager is called
upon to plan, organize, supervise and control the operations of the business house. Since very
little personal contact is possible with customers these days, a modern business firm faces a
much greater degree of uncertainty concerning future operations. A businessman who has to deal
in an atmosphere of uncertainty can no longer adopt the method of trial and error in taking
decisions. The businessman has to apply statistical methods systematically in order to deal with
the uncertainty. In recent years it has become increasingly evident that Statistics and statistical
methods have provided the businessman with one of his most valuable tools for decision-making.

Business activities can broadly be grouped under the following heads, viz., Production, Sales,
Purchase, Finance, Personnel, Accounting, Marketing and Product research and Quality control.

1.4.3 Statistics and Economics:

Statistical data and statistical methods are of immense help in the proper understanding
of the economic problems and in the formulation of economic policies. In fact, these are the tools
and appliances of an economist’s laboratory. Statistics of production help in adjusting supply
to demand. Statistics of consumption enable us to find out the way in which people of different
strata of society spend their income. Such statistics are very helpful in knowing the standards of
living and taxable capacity of the people.

Statistical methods help not only in formulating appropriate economic policies but also
in evaluating their effect. For example, in order to check the ever-growing population, if emphasis
has been placed on the family planning methods, one can ascertain statistically the efficacy of
such methods in attaining the desired goal. In recent years, econometrics, which comprises the
application of statistical methods to the theoretical economic methods, is widely used in economic
research. Statistical methods of sampling are useful for collecting the basic data of economic
studies. Statistical methodology also indicates the reliability of the data and the significance to be
attached to them.

1.4.4 Statistics and Physical Sciences:

The physical sciences, especially astronomy, geology and physics, were among the fields
in which statistical methods were first developed and applied. But, until recently these sciences
have not shared the 20th century developments of Statistics to the same extent as the biological
and social sciences. Currently, however, the physical sciences seem to be making increasing use
of Statistics, especially in astronomy, chemistry, engineering, geology, meteorology and certain
branches of physics.

1.4.5 Statistics and Natural Sciences:

Statistical techniques have proved to be extremely useful in the study of all natural sciences
like astronomy, biology, medicine, zoology, botany, etc. For example, in diagnosing the correct
disease, the doctor has to rely heavily on actual data like temperature of the body, pulse rate and
blood pressure. Similarly, in judging the efficacy of a particular drug for curing a certain disease,
experiments have to be conducted and the success or failure would depend upon the number of
people who are cured after using the drug.

1.4.6 Statistics and Research:

Statistics is indispensable in research work. Most of the advancement in knowledge has
taken place because of experiments conducted with the help of statistical methods. For example,
experiments about crop yields and different types of fertilizers and different types of soils or the
growth of animals under different diets and environments are frequently designed and analyzed
with the help of statistical methods. Statistical methods also affect research in medicine and
public health.

1.4.7 Statistics and Other Uses:

The significance of Statistics in some important fields has been discussed above. Besides
these, Statistics is useful to bankers, brokers, insurance companies, social workers, labor unions,
trade associations, chambers of commerce and politicians. For example, the banks
have to make a very careful study of the cash requirements, otherwise, they may find they are
short of cash and their existence is at stake. Similarly, the premium rates of the life insurance
companies are based upon very careful study of the expectation of life.

1.5 Limitations of Statistics:

In spite of so many merits, Statistics is not completely free from certain limitations. Unless
the data are properly collected and critically interpreted, there is every possibility of drawing
wrong conclusions. Therefore, it is also necessary to know the limitations and the possible misuses
of Statistics. The following are the important limitations of Statistics:

a) Statistics does not deal with individual measurements:

Since Statistics deals with aggregates of facts, the study of individual measurements lies
outside the scope of Statistics. Data are statistical when they relate to measurement of masses,
not statistical when they relate to an individual item or event as a separate entity. For example, the
wage earned by an individual worker at any one time, taken by itself, is not a statistical datum. But
the wages of the workers of a factory can be used statistically.

b) Statistics deals only with quantitative characteristics:

Statistics are numerical statements of facts. Such characteristics as cannot be expressed
in numbers are incapable of statistical analysis. Thus qualitative characteristics like honesty,
efficiency, intelligence, etc., cannot be studied directly.

c) Statistical results are true only on an average:

The conclusions obtained statistically are not universally true. They are true only under certain
conditions and only on an average. This is because Statistics as a science is less exact than the natural sciences.

d) Statistics is only one of the methods of studying a problem:

Statistical tools do not provide the best solution under all circumstances. Very often, it is
necessary to consider a problem in the light of a country’s culture, religion and philosophy.

e) Statistics can be misused:

The greatest limitation of Statistics is that it is liable to be misused. The misuse of Statistics may
arise because of several reasons. For example, if statistical conclusions are based on incomplete
information, one may arrive at wrong conclusions. Statistics are like clay and they can be moulded
in any manner so as to establish right or wrong conclusions. The very fact that it may lead to
wrong conclusions in the hands of inexperienced people limits the possibility of mass popularity
of such a useful science.
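
Limitations (a) and (c) above can be illustrated with a small sketch. The wage figures below are invented purely for illustration; the point is that an average describes the group as a whole and need not hold for any individual worker.

```python
from statistics import mean

# Hypothetical monthly wages (Rs.) of the workers of one factory.
wages = [8000, 9500, 10000, 12000, 30500]

# The average is a statement about the aggregate, not about any one worker.
average_wage = mean(wages)

# Workers whose own wage falls below the group average.
below_average = [w for w in wages if w < average_wage]

print(f"Average wage: Rs. {average_wage}")
print(f"{len(below_average)} of {len(wages)} workers earn less than the average")
```

In this made-up factory most workers earn less than the average wage, so a conclusion that is true of the group on an average would be wrong if applied to a typical individual.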

1.6 Summary:

The term Statistics is used both in singular and plural sense. In plural sense it represents
figures or data. In singular sense it represents statistical methods. The word Statistics comes
from the Italian word ‘statista’ or the German word ‘statistik’, each of which means a political state.
Although Statistics originated as a science of kings, there has been a phenomenal development in
the use of Statistics in several fields. Statistics is now regarded as one of the most important tools
for taking decisions under uncertainty.

Different writers have defined Statistics both in terms of data and in terms of methods. Statistical
methods are being applied in almost every field: government policy, business, economics, physical
sciences, natural sciences, research and many others. As in the case of other branches of
knowledge, Statistics also has certain limitations. A proper understanding of the subject is
needed in order to use it to full advantage.

1.7 Glossary:

1. Data: Data refers to any group of measurements that happens to interest us. These
measurements provide information the decision maker uses.

2. Statistics: Statistics is the use of data to help the decision maker reach better decisions.

3. Qualitative Data: Qualitative data reflect non-numeric features or qualities of experimental
units.

4. Quantitative Data: Data that possess numerical properties are known as quantitative
data.

5. Variable: A variable is a characteristic that may take on different values at different times,
places or situations.

1.8 Unit End Questions:

1. Distinguish between ‘Statistics as data’ and ‘Statistics as method’.

2. Discuss various definitions of statistics in terms of data as well as a method.

3. Discuss the nature and scope of Statistics.

4. What are the limitations of Statistics?

1.9 Readings:

1. S. P. Gupta: Statistical methods.

2. B. N. Gupta: Statistics.

3. R. I. Levin & D. S. Rubin: Statistics for management.

***************

GUIDELINE-2:
SOURCES OF DATA AND METHODS OF DATA
COLLECTION

Structure
2.0 Objectives
2.1 Introduction
2.2 Primary Sources of Data
2.3 Methods of Collecting Primary Data
2.4 Secondary Sources of Data
2.5 Classification of Sources of Secondary Data
2.6 Choice between Primary and Secondary Data
2.7 Summary
2.8 Glossary
2.9 Unit End Questions
2.10 Readings

2.0 Objectives:
a) To present the sources of data
b) To distinguish between primary and secondary data
c) To highlight the methods of collecting primary data
d) To identify methods of collecting secondary data
2.1 Introduction
Collection of the required and relevant data is the first step in any statistical investigation.
Utmost care must be exercised while collecting data because data constitute the foundation on
which the superstructure of statistical analysis is built. The results obtained from the analysis are
properly interpreted and policy decisions are taken. Hence, if the data are inaccurate and
inadequate, the whole analysis may be faulty and the decisions taken misleading.

Data may be collected either from a primary source or from a secondary source. A primary
source is one that itself collects the data. A secondary source is one that makes available data
which were collected by some other agency. For example, the data collected by the Ministry of
Industries and made available through various publications constitute a primary source. However, if
the Ministry of Industries uses data collected by some other organisation, say, the National Sample
Survey Organisation, this will constitute a secondary source for the Ministry.

2.2 Primary Sources of Data

Primary data are information collected or generated by the researcher for the purposes
of the project. Primary data are obtained by a study specifically designed to fulfill the data needs
of the problem at hand. For example, suppose an investigator wants to know about the level of job
satisfaction enjoyed by the teachers in a University. He can prepare a schedule, meet a sample
of teachers and ask for their opinions. This is going to be the information collected for
the purpose of this study and hence becomes primary in character. When the data are collected
for the first time, the responsibility for the processing of data also rests with the original investigator.
Ordinarily, experiments and surveys constitute the principal sources of primary data. In order to
understand the nature of primary sources of data better, we have to consider the advantages and
disadvantages of primary data.

2.2.1. Advantages of Primary Data:

The following are some of the important advantages of this type of data.

i) Primary data are the first-hand account of the situation. We can observe the phenomenon
as it is taking place.

ii) There is greater scope for reliability of the information. As the investigator collects the
data for himself he can take all precautions to ensure the reliability of data.

iii) Primary data are the logical starting point for research in several disciplines. Unless
someone gathers and accumulates fact or information, there is no body of knowledge.

iv) For the purpose of knowing opinions, personal qualities, etc., primary data are the only
source.

2.2.2 Disadvantages of Primary Data:

The chief disadvantages are as follows:

i) Collection of primary data is expensive in terms of both time and money. To accumulate
the needed data, we may sometimes have to spend years. It is for this reason that
individual researchers try to limit their scope to a manageable level, unlike the studies
undertaken by research organisations.

ii) There is greater scope for bias of the researcher. Unless the research investigator is fair
to the respondents and methods of data collection, the results of the study will not be
reliable.

iii) Sample selection is yet another problem in the collection of primary data. If the conclusions
of the study are to be meaningful, the researcher must select a representative sample.
But the selection of such representative sample is not an easy task.

iv) The limitations of methods of collecting primary data turn out to be disadvantageous for
this source. For example, limitations of observation technique like non-cooperation of
respondents, non-observability of the situation, low reliability of conclusions etc, become
the disadvantages of the primary source of data.
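
The difficulty of selecting a representative sample, noted in point (iii) above, can be made concrete with a short sketch. It uses proportional stratified random sampling, which is one common approach and is not prescribed by this unit; the sampling frame of teachers, the faculty names and the function name stratified_sample are all hypothetical.

```python
import random

# Hypothetical sampling frame: 60 teachers in Arts, 30 in Science, 10 in Commerce.
frame = (
    [("Arts", i) for i in range(60)]
    + [("Science", i) for i in range(30)]
    + [("Commerce", i) for i in range(10)]
)

def stratified_sample(frame, fraction, seed=0):
    """Draw the same fraction at random from every faculty (stratum)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = {}
    for faculty, teacher in frame:
        strata.setdefault(faculty, []).append((faculty, teacher))
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

sample = stratified_sample(frame, fraction=0.2)
for faculty in ("Arts", "Science", "Commerce"):
    count = sum(1 for f, _ in sample if f == faculty)
    print(faculty, count)
```

Because each faculty contributes in proportion to its size, the sample mirrors the composition of the frame, which a haphazard selection of whichever teachers are easiest to meet would not guarantee.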

2.3 Methods of Collecting Primary Data:

The primary data are the information generated to meet the specific requirements of the
investigation at hand. As such, the investigator is required to collect data separately for the study
taken up by him. The primary data are collected by different methods such as, direct personal
interviews, indirect oral interviews, information from correspondents, mailed questionnaires,
schedules and observation methods. These methods are discussed in detail below.

2.3.1 Direct Personal Interviews:

Personal interview is one of the techniques of data collection through primary sources. It
is a verbal method of securing data in field surveys. Information is obtained by conversing
with the respondents. Though talking has importance in the conduct of interview, it is not a simple
two way conversation between the researcher and the respondent. Gestures, glances, facial
expressions and pauses often reveal the subtle feelings of the respondents.

Interview is the most direct means of conducting enquiry into any research problem.
Since many of the social science research problems involve the personal contact of the respondents,
interview is, probably, the only method through which the researcher would be able to establish
such direct contact.

2.3.2 Indirect Oral Interviews:

Under this method of collecting data, the investigator contacts third parties called witnesses
capable of supplying the necessary information. The method is generally adopted in those cases
where the information to be obtained is of a complex nature and the informants are not inclined to
respond if approached directly. For example, in an enquiry regarding addiction to drugs, alcohol,
etc., people may be reluctant to supply information about their own habits. It would be necessary
in that case to get the desired information from those dealing in drugs or liquor, or from other people who
may know them, for example, their neighbours, friends, etc. Enquiry Committees and
Commissions appointed by the Government generally adopt this method to get people’s views
and all possible details of facts relating to the enquiry.

2.3.3. Information from Correspondents:

Under this method, the investigator appoints local agents or correspondents in different
places to collect information. These correspondents collect and transmit the information to the
central office where the data are processed. Newspaper agencies generally adopt this method.
Correspondents in different places supply information relating to such events as accidents, riots,
strikes, etc., to the head office. This method is also adopted by various departments of Government
in such cases where regular information is to be collected from a wide area. This method is
particularly suitable in case of crop estimates. The special advantage of this method is that it is
cheap and appropriate for extensive investigation.

2.3.4. Mailed Questionnaire:

A questionnaire is a tool or device for securing answers to the set of questions by the
respondent who fills in the form of questionnaire himself. It is a systematic compilation of questions
that are submitted to a sampling of population from which information is desired. The questions
are normally arranged in a sequence depending on the nature of the study and are administered
for reply. Usually the questionnaires are posted to the respondents and hence they are called
mailed questionnaires.

In this process, a questionnaire is normally cyclostyled or printed with details of the object
or purpose of the investigation. The respondents are asked to complete the questionnaire and
post it to the address given within a specified time period. This method of data collection is
preferred when the investigator cannot reach the respondents in person, either because of the
cost involved or because there is no particular reason to see them personally.

The questionnaire is a popular method of data collection in social science research.
It is a cheaper method of collecting data, as the investigator is not required to approach the
respondents personally. He can simply post the questionnaire and appeal to the respondents to
return it in time. The questionnaire helps in collecting data on limited, specific areas rather than
very extensive bodies of data. It is effective because the respondents are able to express their
reactions clearly and with greater openness, as there is less fear when there is no immediate listener.

2.3.5. Schedules Sent Through Enumerators:

A schedule is a list of questions which helps to collect data from the field. It is generally
filled in by the enumerator or the researcher, who sits with the respondent face to face and fills up
the data sheet by asking him the questions. According to Goode and Hatt, schedule is the name
usually applied to a set of questions which are asked and filled in by the interviewer in a face to face
situation with another person. One can get accurate and first hand information through this method. As
the researcher meets the respondents in person, he can talk to them, explain to them the utility of
the study and convince them to cooperate with him, in making the study meaningful.

Data collection under this method proceeds in a systematic manner. The investigators or
enumerators proceed to the field with the schedules and administer them to the sample selected
by them. They go on asking the questions incorporated in the schedule and note down the responses
of the respondents. If there is any difficulty, the enumerators are supposed to assist respondents
in overcoming the difficulty. As such, the quality of the data depends on the people who go to
the field and collect the data. Normally in case of individual researchers they themselves meet
every respondent and collect the data.

2.3.6. Observation:

Observation, in simple terms, is defined as watching the things with some purpose in
view. According to Goode and Hatt, science begins with observation and must ultimately return
to observation for its final validation. In the words of P.V. Young, it is a systematic viewing coupled
with consideration of the seen phenomenon. In yet another way, observation is defined as the
process of recognizing and noting people, objects and occurrences rather than asking for
information.

Observation is one of the cheaper and more effective techniques of data collection. For
example, instead of asking consumers what brands they buy or what television programmes they
view, a better alternative may be to simply observe what products are bought and what programmes
are viewed. This approach to the collection of information is as old as the human race. In the
fields of commerce and economics, observation of prices, markets and capital flows is a
common activity, which serves as a good example of probing into the behaviour of the
phenomena. Observation is considered to be a handy tool in marketing research.

2.4. Secondary Sources of Data:

Secondary data refer to information that has been collected by someone other than
the researcher, for purposes other than those involved in the research project at hand. Books,
journals, manuscripts, diaries, letters etc., all become secondary sources of data as they are
written or compiled for a separate purpose. The researcher depending on his necessity and
relevance may use the data, findings or results incorporated in these documents.

There are various factors such as the nature of the study, status of the investigator,
availability of financial resources, time and degree of accuracy of the results desired, that decide
the choice of the sources of data.

2.4.1 Advantages of Secondary Data:

When compared to primary data, secondary data have the following advantages.

i) Economy is clearly the greatest advantage of secondary data. Instead of printing data
collection forms, hiring field workers, sending them to places throughout the field
area, and tabulating and analysing the data, we can get ready results from the secondary data
compiled by somebody else.

ii) Besides economy, speed is another advantage of secondary data.
For example, suppose we require the factual position regarding the reasons for absenteeism in
Indian industry. If we start collecting data on this, it may take perhaps one to two years.
But the same can be obtained with the help of secondary data in a few days.

iii) Another great advantage of secondary data is that they provide information that may
not be secured by the individual researcher. For example, suppose a research organisation like the
National Institute of Public Finance and Policy is studying the problems of tax evasion in
India and the measures to curtail the same. The organisation, with its credibility
for conducting such studies, may be able to gather information from the Ministry of Finance
and the Central Board of Direct Taxes, which would be difficult for an individual
researcher to obtain.

2.4.2. Disadvantages of Secondary Data:


Though secondary data have some advantages, there are many disadvantages associated
with them. They are as follows:
i) Quite often, secondary data do not satisfy the needs of the study. Basically, secondary
data result from a study undertaken by some one with a specific purpose. As such, the
results and data incorporated in that study may not be suitable for the study at hand of a
researcher. This is the reason why every research organisation wants to generate its own
source of information.
ii) Inaccuracy and unreliability are two other factors which limit the utility of
secondary data. Most of the time the researcher is unsure about the accuracy of the
data available to him and cannot decide to what extent he can rely on such data. In
such circumstances, the researcher may go into details like the organisation that
collected the data and the purpose of its collection.
iii) The method used for data collection by the organisation or individual researcher may not
be precise and apt to the situations, rendering the whole data questionable.
iv) Not only the methods of data collection, but also their analysis has to be observed
carefully. If the conclusions drawn are not based on the information collected, or they
lack sophistication, the utility of the secondary data is limited.
v) Another danger with secondary data is that the researcher often tries to report preliminary
data as final and fails to incorporate revised data when they become available.

2.5. Classification of Sources of Secondary Data:

Sources of secondary data may be classified broadly as internal and external.

2.5.1. Internal Sources:

Internal sources of data represent the data that are already available with the research
organisation or company. In the case of a research organisation, the data might have been collected for
some other purpose. For example, the Central Statistical Organisation collects data on savings,
consumption, investment etc. These can be used for several purposes. All of it is internal to
the organisation. Similarly, in the case of a company, information is compiled on several items like
sales, cost, assets, liabilities, profits, production etc. The data, usually compiled for record purposes,
may be used in several studies undertaken by the company itself or by an outside research organisation.

2.5.2 External Sources:

Much of the secondary data is external in nature. All that is available with outside
organisations falls into this category. The sources of external information are quite
numerous and vast. These external sources can again be divided into two, namely personal and
public sources.

A. Personal Sources:

This is the information compiled by individuals. An individual may record his views and
thoughts about himself, others, society, etc., for his own sake or as a memory. Whatever be the
reason for recording one’s own thoughts, several people do carry out this activity. These sources
would be helpful to a researcher who is probing into personalities, history, events etc.

Personal sources may be of the following four kinds.

i) Autobiographies or Life Histories:

Usually people who have some stature and have dedicated their lives to a particular cause
think it useful to record their life and experiences for posterity. These autobiographies, or life
histories written by others, serve as useful documents to highlight a particular point. For example,
the autobiographies of freedom fighters like Mahatma Gandhi, Netaji, Jawaharlal Nehru, etc.,
will serve as better sources.

ii) Diaries:

Many have the habit of recording their daily events in the form of a book. They write not
only about their own activities, but also their viewpoint on issues, the reactions of people and many
other things. Diaries of important people are also published. These serve as a most important
source for knowing the life history of a person and the contemporary society, if they have been
written continuously over a long period.

iii) Letters:

Letters also occupy an important place among the personal sources of information. Many of us
know that freedom fighters like Gandhi, Nehru and Patel wrote several letters to their leaders and
relatives, discussing various matters. The letters of Radhakrishnan are widely acknowledged in the
discipline of philosophy. These help researchers to understand the more intimate aspects of an event
and to clarify the stand taken by the writers regarding it. Letters are helpful in giving an idea of the
attitudes of a person and the trend of his mind.

iv) Memoirs:

Some people write memoirs of their travels, important events of their life and other significant
phenomena that they come across. These memoirs provide useful material in the study of many
social phenomena. Memoirs are different from diaries, in the sense that they describe only some
events and are more elaborate than the diary.

B. Public Sources:

Much of the material for business research is obtained from these sources. These are
termed public for the reason that they deal with issues rather than the lives and histories of people.
These sources are again divided into two categories, viz., unpublished and published.

i) Unpublished Sources:

For several reasons, some material, though of public interest, is not published for the
benefit of many. It would be available only at the place of its origin. Theses submitted to
universities by researchers come into this category. Till a thesis is published by the researcher or by
some outside agency, copies of it would be available only with the researcher and with the
library. Besides these, the proceedings of committees, minutes of meetings and notings on files are
all unpublished sources of information. Though unpublished, these sources have their own uses.

ii) Published Sources

There are a variety of published sources from which one can get the required data. A
brief description of such published sources is given below.

a) Books:

Books are significant among the published sources of information. Ever since man secured
superiority over the other beings in nature, he has sought to codify what he has learnt and
practised. A lot of information can also be obtained from edited books consisting of papers submitted
at seminars, conferences etc.

b) Journals:

Besides books, journals provide a lot of research material to an investigator. Journals are
a more rapid medium of communication than books. They supply material of current interest
and help to arouse discussion on a subject. The general tendency in the publication of journals is
that they relate to a particular subject like commerce and management. Alternatively, they
may be devoted to the development of a particular aspect of the discipline like foreign trade,
banking, marketing, personnel etc.

c) Newspapers:

Useful data can be obtained from newspapers published daily or otherwise. The Economic
Times, Financial Express, Business Standard are the three important dailies that are regarded as
highly useful to the research students in commerce, economics and management. These three
newspapers carry articles, news, and other information on several matters connected with these
three disciplines mainly. Occasionally they also bring out special issues focusing on the specific
topics of interest. Besides, they have introduced weekly features to cover special areas like
management, taxation, finance, personnel, advertising, small industry and entrepreneurship.

d) Reports of the Government Departments:

In our country, every Government department attached to a Ministry brings out annual
and other periodical reports on its working. These are later published as status
reports. For instance, the Departments of Agriculture, Industry, Education and Public Enterprises
bring out reports on the working of the establishments, institutions and undertakings functioning under
their jurisdiction. The Publications Division of the Government of India publishes
these reports for the use of the departments, offices etc.

e) Reports of the Government Bodies and Autonomous Organisations:

Several independent bodies have come up in India by virtue of constitutional provisions,
ordinances and legislation. While carrying out the objectives for which they were created, they
also publish accounts of their operations in the form of reports. These reports serve as authentic material on
the subject probed into by them. Important among such bodies are the Office of the Comptroller
and Auditor General of India, the Committee on Public Undertakings and the Public Accounts Committee.
Besides, we have autonomous corporations created through separate enactments, like the Reserve
Bank of India, the Industrial Credit and Investment Corporation of India etc. The Reserve Bank of India
and other special financial institutions bring out various publications which serve as useful
sources of data.

f) Publications of Research Organisations, Centers, Institutes, etc.

There are various non-profit organisations established for the purpose of promoting
academic pursuits, like the National Council for Applied Economic Research (NCAER), National
Institute of Public Finance and Policy (NIPFP), National Institute for Educational Planning and
Administration (NIEPA), National Institute of Bank Management (NIBM), National Institute of
Personnel Management (NIPM), Indian Institute of Foreign Trade (IIFT) and Institute of Public
Enterprise (IPE). Other such organisations are the National Institute of Rural Development
(NIRD) and the National Institute for Small Industry Extension Training (NISIET). Besides, there are
four management institutes and over 150 university departments. These autonomous organisations
bring out from time to time various publications in the form of monographs, occasional papers,
research bulletins, abstracts etc. All these serve as useful references for research students.

2.6 Choice between Primary and Secondary Data.

In any statistical investigation, the researcher has to select between primary and secondary
data very carefully. In most of the studies both the sources may be used. But the choice of the
data depends on the nature and purpose of the study. The following are said to govern the choice
of the sources of data.

i) Nature and Scope of Enquiry:

This is the first and foremost factor that decides the type of data to be collected. For
example, in order to understand the perception and factors influencing entrepreneurship among a
particular community, one has to collect primary data. On the other hand, the trends in the capital
market activity can be adequately presented with the help of secondary data.

ii) Availability of Time and Money :

Sometimes, time and money also decide the nature and scope of the study to be conducted.
These two considerations influence the choice of a particular source of data.
If one cannot spend much time and money, one prefers to avoid primary sources and
depend on secondary sources.

iii) Degree of Accuracy Desired :

If one wants to ensure a high degree of accuracy and unquestionable findings, one may have
to collect data from primary sources. As indicated already, secondary data are those collected
by somebody else for purposes other than those of the investigator. Such data may or may not be
suitable to the requirements of the present problem.

iv) Status of the Investigator:

Status implies whether the researcher is an individual, a corporation, a Government
department or a research organisation. The status of the investigator becomes material in terms of
the time, money and organizational resources at his command.

2.7 Summary:

There are two sources of data, viz., primary and secondary. If the investigator
collects the data himself or with the assistance of somebody, they are called primary data. If the data
have already been collected by some other agency, which makes them available, they are called secondary
data. There are certain advantages and limitations in both primary and secondary data.

The primary data are collected by different methods such as direct personal interviews, indirect oral
interviews, information from correspondents, mailed questionnaires, schedules and observation
methods. Books, journals, newspapers, government reports and publications of research organisations
all become secondary sources of data. The choice between primary and secondary
data depends on the nature and scope of the study, availability of time and money, degree of accuracy
desired, and status of the investigator.

2.8 Glossary :

1. Primary source: It is one that itself collects the data.

2. Secondary source: It is one that makes available data collected by some other agency.

3. Investigator: Investigator is a person who collects the information.

4. Respondent: A person who fills the questionnaire or supplies the required information for
a schedule.

2.9. Unit end Questions:

1. What are primary and secondary sources of data? Explain the relative merits and limitations
of these two sources of data.

2. What is the importance of primary data? Explain different methods of collecting primary
data.

3. What is the importance of secondary data? How can you classify various sources of
secondary data?

2.10. Readings :

1. K. V. Rao: Research Methodology in Commerce and Management.

2. S.P.Gupta: Statistical Methods.

3. D.C. Sancheti & V.K.Kapoor: Statistics -Theory, Methods and Applications.

******************

GUIDELINE-3 :
CLASSIFICATION AND
TABULATION OF DATA

Structure

3.0 Objectives

3.1 Introduction

3.2 Classification

3.3 Types of Classification

3.4 Tabulation of Data

3.5 Types of Tables

3.6 Summary

3.7 Unit end Questions and Exercises

3.8 Readings

3.0 Objectives

a) To identify various types of classification.

b) To acquaint the student with preparation of discrete and continuous frequency distributions.

c) To present different parts of a table

d) To examine different types of tables.

3.1 Introduction

The collected data are usually contained in schedules and questionnaires. But they are
not in an easily understandable form. The answers will require some analysis if their salient points
are to be brought out. As a rule, the first step in the analysis is to classify and tabulate the
information collected. In case published data have been collected, the investigator has to rearrange
these into new groups and tabulate the new arrangement. In case of some investigations, the
classification and tabulation may give such a clear picture of the significance of the material
arranged that no further analysis is required. Although the phrase “classification and tabulation”
has been used, classification is, in effect, only the first step in tabulation, for in general, items
having common characteristics must be brought together before the data can be displayed in
tabular form.

3.2 Classification :

Classification is the process of arranging data in groups or classes on the basis of common
characteristics. Data having a common characteristic are placed in one class, and in this way the
entire data get divided into a number of groups or classes. Classification of data is a function very
similar to that of sorting letters in a post office. It is well known that letters collected in a post
office are sorted into different lots on a geographical basis, i.e., in accordance with their
destinations such as Chennai, Kolkata, Mumbai, Delhi etc. They are then put in separate bags,
each containing letters with a common characteristic, viz., having the same destination. Classification
of statistical data is comparable to the sorting operation.

3.2.1 Objectives of Classification

The principal objectives of classifying the data are:

i) To condense the mass of data in such a manner that similarities and dissimilarities can be
readily apprehended.

ii) To facilitate comparison.

iii) To pinpoint the most significant features of the data at a glance.

iv) To give prominence to the important information gathered while dropping out the
unnecessary elements.

v) To enable a statistical treatment of the material collected.

3.3 Types of Classification

Broadly, the data can be classified on various bases, viz., geographical, chronological,
qualitative and quantitative.

3.3.1 Geographical Classification
In this type of classification, data are classified on the basis of geographical or locational
differences between various items like, states, regions, cities, zones, areas etc. For example,
procurement of rice in India may be presented state wise in the following manner.
State-wise Procurement of Rice in 2004-05

State / UT          Quantity (lakh tonnes)

Andhra Pradesh       39.04
Bihar                 3.43
Chhattisgarh         23.37
Haryana              16.62
Maharashtra           2.05
Orissa               15.90
Punjab               91.06
Tamil Nadu            6.52
Uttar Pradesh        29.71
West Bengal           9.44
Others                4.69
Total               246.83

Source: Economic Survey, 2005-06, p. 93.

Geographical classifications are usually listed in alphabetical order for easy reference.
Items may also be listed by size to emphasize the important areas, as in ranking the states by
population. Normally in reference tables the first approach is followed and in summary tables the
second approach is followed.
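
The two ways of ordering a geographical classification can be illustrated with a short Python sketch. It is only an illustration: the figures are those of the rice procurement table above (the "Others" and "Total" rows are left out), and the variable names are chosen here for convenience.

```python
# State-wise rice procurement in 2004-05 (lakh tonnes), from the table above.
procurement = {
    "Andhra Pradesh": 39.04,
    "Bihar": 3.43,
    "Chhattisgarh": 23.37,
    "Haryana": 16.62,
    "Maharashtra": 2.05,
    "Orissa": 15.90,
    "Punjab": 91.06,
    "Tamil Nadu": 6.52,
    "Uttar Pradesh": 29.71,
    "West Bengal": 9.44,
}

# Reference-table style: alphabetical order of the state names.
alphabetical = sorted(procurement.items())

# Summary-table style: largest quantity first, to emphasize the important areas.
by_size = sorted(procurement.items(), key=lambda item: item[1], reverse=True)

for state, quantity in by_size:
    print(f"{state:<16}{quantity:>8.2f}")
```

Listing by size brings Punjab to the top of the table, whereas the alphabetical listing makes any one state easy to look up.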
3.3.2 Chronological Classification
When data are observed over a period of time the type of classification is known as
chronological classification. For example, we may present the production of tea in India for the
last eight years as follows.

Production of Tea in India

Year        Production (million kg)

1997-98     835.6
1998-99     855.2
1999-00     836.8
2000-01     848.4
2001-02     847.4
2002-03     846.0
2003-04     850.5
2004-05     830.7

Source: Economic Survey, 2005-06, p. 161.

Time series are usually listed in chronological order, normally starting with the earliest period.
When the major emphasis falls on the most recent events, a reverse time order may be used.

3.3.3.1 Qualitative Classification:

Qualitative classification refers to classification according to attributes. The qualities or
characteristics of human beings which are not capable of direct quantitative measurement are called
attributes. These attributes are descriptive, such as sex, literacy, marital status, employment. We
have to note that in this type of classification, the attributes under study cannot be measured
directly; one can only find out whether an attribute is present or absent in the units of the population
under study.

When only one attribute is studied, two classes are formed, one possessing the attribute
and the other not possessing the attribute. This type of classification is known as simple classification.
For example, the population under study may be divided into two categories as follows.

Population
    Urban
    Rural

In a similar manner, we may classify the population on the basis of sex, i.e. into males and
females, or literacy, i.e., into literates and illiterates and so on. The type of classification where
only two classes are formed is also called two fold or simple classification.

If instead of forming only two classes we further divide the data on the basis of some
attribute or attributes so as to form several classes, the classification is known as manifold
classification. For example, we may first divide the population into males and females on the
basis of the attribute sex; each of these classes may be further subdivided into literates and
illiterates on the attribute literacy. Further classification can be made on the basis of some other
attribute, say, employment. An example of manifold classification is given below.

Population
    Males
        Literates
            Employed
            Unemployed
        Illiterates
            Employed
            Unemployed
    Females
        Literates
            Employed
            Unemployed
        Illiterates
            Employed
            Unemployed
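
The difference between simple and manifold classification can be sketched in Python as follows. The eight records are hypothetical and serve only to show the counting; each person is described by the three attributes used above (sex, literacy, employment).

```python
from collections import Counter

# Hypothetical records: (sex, literacy, employment) for eight persons.
people = [
    ("Male", "Literate", "Employed"),
    ("Male", "Literate", "Unemployed"),
    ("Male", "Illiterate", "Employed"),
    ("Female", "Literate", "Employed"),
    ("Female", "Literate", "Employed"),
    ("Female", "Illiterate", "Unemployed"),
    ("Female", "Illiterate", "Employed"),
    ("Male", "Literate", "Employed"),
]

# Simple (two-fold) classification: one attribute, two classes.
by_sex = Counter(sex for sex, _, _ in people)

# Manifold classification: each class is subdivided by the further attributes,
# so every combination of the three attributes becomes a class of its own.
manifold = Counter(people)

print(by_sex)
for (sex, literacy, employment), count in sorted(manifold.items()):
    print(f"{sex:<8}{literacy:<12}{employment:<12}{count}")
```

Whatever the number of attributes used, the class counts always add up to the total number of units classified.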

3.3.3.2 Quantitative Classification (Formation of a Frequency Distribution)

Quantitative classification refers to the classification of data according to some
characteristics which can be measured, such as height, weight, income, sales, production etc.
For example, the students of a college may be classified according to weight as follows.

Weight (lb)     No. of Students

 90-100            50
100-110           200
110-120           260
120-130           360
130-140            90
140-150            40
Total            1000

Such a distribution is known as an empirical frequency distribution or simply a frequency
distribution.

In this type of classification, there are two elements, namely (i) the variable, i.e., the
weight in the above example, and (ii) the frequency, i.e., the number of students in each class.
There were 50 students having weight ranging from 90 to 100 lb, 200 students having weight
ranging from 100 to 110 lb, and so on. Thus we can find out the ways in which the frequencies
are distributed.
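
How individual measurements are grouped into classes to give such a table can be sketched in Python. The ten weights below are hypothetical, and the function name `classify` is chosen here for illustration. The sketch also assumes that a value equal to a class boundary (say 110) is counted in the class that begins at that value; other conventions are possible.

```python
from collections import Counter

def classify(values, lower, width, n_classes):
    """Count how many values fall in each class of equal width.

    Assumption: each class includes its lower limit and excludes its upper limit.
    """
    counts = Counter()
    for value in values:
        index = int((value - lower) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return [
        (lower + i * width, lower + (i + 1) * width, counts[i])
        for i in range(n_classes)
    ]

# Hypothetical weights of ten students, in lb.
weights = [92, 101, 105, 118, 121, 124, 129, 133, 147, 110]

table = classify(weights, lower=90, width=10, n_classes=6)

print("Weight (lb)   No. of Students")
for low, high, frequency in table:
    print(f"{low}-{high:<10}{frequency}")
```

The variable (weight) supplies the classes and the count of students in each class supplies the frequency, which are the two elements described above.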

A frequency distribution refers to data classified on the basis of some variable that can be
measured such as prices, wages, age, number of units produced or consumed. The term ‘variable’
refers to the characteristic that varies in amount or magnitude in a frequency distribution. A
variable may be either continuous or discrete. A continuous variable is capable of manifesting
every conceivable fractional value within the range of possibilities, such as the height or weight of
persons or the weight of a product. Thus, in a continuous variable data are obtained by numerical
measurements rather than counting. For example, when a student grows, say, from 90 cm to
150 cm, his height passes through all values between these limits.

A discrete variable is one which can vary only by finite “jumps” and cannot manifest
every conceivable fractional value. For instance, the number of rooms in a house can only take
certain values such as 1, 2, 3 etc. Similarly, the number of machines in an establishment is a
discrete variable. Generally speaking, continuous data are obtained through measurements,
while discrete data are derived by counting. Series which can be described by a continuous
variable are called continuous series. Series represented by a discrete variable are called discrete
series. The following are two examples of discrete and continuous frequency distributions.

Discrete frequency distribution

No. of Children     No. of Families

0                    10
1                    40
2                    80
3                   100
4                   250
5                   150
6                    50
Total               680

Continuous frequency distribution

Weight (lbs)        No. of Persons

100-110              10
110-120              15
120-130              10
130-140              45
140-150              20
150-160               4
Total               134

Although the theoretical distinction between continuous and discrete variation is clear
and precise, in practical statistical work it is only an approximation. The reason is that even the
most precise instruments of measurement can be used only to a finite number of places. Thus,
no theoretically continuous series can be expected to flow continuously, with one
measurement touching another without any break, in actual observations.

A. Formation of a Discrete Frequency Distribution

The process of preparing this type of distribution is very simple. We have just to count
the number of times a particular value is repeated, which is called the frequency of that class. In
order to facilitate counting, prepare a column of tallies. In another column, place all possible
values of the variable from the lowest to the highest. Then, for each observation, put a bar
(vertical line) opposite the particular value to which it relates. To facilitate counting, blocks of
five bars are prepared and some space is left between blocks. We finally count the number of bars
and get the frequency.

Illustration I: In a survey of 35 families in a village, the number of children per family was recorded
and the following data obtained.

1, 0, 2, 3, 4, 5, 6, 7, 2, 3, 4, 0, 2, 5, 8, 4, 5, 4, 6, 3, 2, 7, 6, 5, 3, 3, 7, 8, 9, 7, 9, 4, 5, 4, 3

Represent the data in the form of a discrete frequency distribution.

Solution:

Frequency distribution of number of children

No. of Children     Tallies       Frequency

0                   ||             2
1                   |              1
2                   ||||           4
3                   ||||| |        6
4                   ||||| |        6
5                   |||||          5
6                   |||            3
7                   ||||           4
8                   ||             2
9                   ||             2
Total                             35
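
The tallying in this illustration can be checked with a short Python sketch. The data are the 35 observations given in the illustration; the helper name `tally_marks` is introduced here only for the example.

```python
from collections import Counter

# Number of children in each of the 35 families surveyed (from the illustration).
children = [1, 0, 2, 3, 4, 5, 6, 7, 2, 3, 4, 0, 2, 5, 8, 4, 5, 4, 6, 3,
            2, 7, 6, 5, 3, 3, 7, 8, 9, 7, 9, 4, 5, 4, 3]

def tally_marks(count):
    """Return tally bars grouped in blocks of five with a space between blocks."""
    full_blocks, remainder = divmod(count, 5)
    blocks = ["|||||"] * full_blocks
    if remainder:
        blocks.append("|" * remainder)
    return " ".join(blocks)

# Count how many times each value is repeated.
frequency = Counter(children)

print(f"{'No. of Children':<18}{'Tallies':<12}{'Frequency':>9}")
for value in range(min(children), max(children) + 1):
    print(f"{value:<18}{tally_marks(frequency[value]):<12}{frequency[value]:>9}")
print(f"{'Total':<18}{'':<12}{sum(frequency.values()):>9}")
```

The frequencies add up to 35, the number of families surveyed, which is a useful check that no observation has been missed or counted twice.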

B. Formation of Continuous Frequency Distribution

This type of classification is most popular in practice. The following technical terms are
important when a continuous frequency distribution is formed, or data are classified according to
class intervals.

i) Class Limits

The class limits are the lowest and the highest values that can be included in the class. For
example, take the class 10-20. The lowest value of the class is 10 and the highest 20. Thus 10 is
called the lower limit and 20 is called the upper limit of that class. The way in which class limits are
stated depends upon the nature of the data.

ii) Class Intervals:

The difference between the upper and lower limit of a class is known as the class interval of
that class. For example, in the class 100-200, the class interval is 100. An important decision
while constructing a frequency distribution is about the width of the class interval, i.e., whether it
should be 10, 20, 50, 100, 500 etc. The decision would depend upon a number of factors such
as the range in the data, i.e., the difference between the largest and smallest item, the number of
classes to be formed etc. A simple formula to obtain an estimate of the appropriate class interval, C, is

$$C = \frac{L - S}{K}$$

where L = largest item, S = smallest item and K = the number of classes.

The question now is how to fix the number of classes, i.e., K. The number can be either
fixed arbitrarily, keeping in view the nature of the problem under study, or it can be decided with
the help of Sturges' Rule. According to Sturges, the number of classes can be determined by the formula
$K = 1 + 3.322 \log_{10} N$, where N = total number of observations.
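As a rough illustration of the two formulas, the sketch below computes K by Sturges' Rule and then the class interval C. The figures used (N = 50, L = 96, S = 0) are assumed values chosen only for the example, and the function names are hypothetical.

```python
import math

def sturges_classes(n):
    """Number of classes by Sturges' Rule: K = 1 + 3.322 * log10(N)."""
    return 1 + 3.322 * math.log10(n)

def class_width(largest, smallest, k):
    """Estimated class interval: C = (L - S) / K."""
    return (largest - smallest) / k

n = 50                       # assumed number of observations
k_exact = sturges_classes(n)
k = round(k_exact)           # the number of classes must be a whole number
c = class_width(96, 0, k)    # assumed largest and smallest items

print(f"K by Sturges' Rule = {k_exact:.3f}, rounded to {k}")
print(f"Estimated class interval C = {c:.2f}")
```

In practice the estimate of C is usually rounded to a convenient figure such as 10 or 15, since the formula gives only a starting point.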

iii) Class Frequency:

The number of observations corresponding to a particular class is known as the frequency
of that class. If we add together the frequencies of all individual classes, we obtain the total
frequency.

iv) Class Mid Point or Class Mark:

It is the value lying halfway between the lower and upper class limits of a class interval.
The mid point of a class is ascertained as follows:

$$\text{Mid point of a class} = \frac{\text{Upper limit of the class} + \text{Lower limit of the class}}{2}$$

For the purpose of further calculations in statistical work, the mid point of each class is taken to
represent that class.

v) Types of Class Intervals:

There are two methods of classifying the data according to class intervals, viz., exclusive
and inclusive.

a) Exclusive Method: When the class intervals are so fixed that the upper limit of one class is the
lower limit of the next class, it is known as the exclusive method of classification. The following
data are classified on this basis.
Income (Rs.) No. of Persons
1000-1100 50
1100-1200 100
1200-1300 200
1300-1400 150
1400-1500 40
1500-1600 10
Total 550
It is clear that the exclusive method ensures continuity of data inasmuch as the upper
limit of one class is the lower limit of the next class. Even though it is a widely used method, it
can sometimes cause confusion. If an item equal to 20 occurs, a doubt arises as to whether it
belongs to the class 10-20 or to the class 20-30. However, it is always presumed that the upper limit
is exclusive, i.e. the item 20 should be included in the 20-30 class.
b) Inclusive Method: Under the inclusive method of classification the upper limit of one
class is included in that class itself. The following example illustrates the method.
Income (Rs.) No. of Persons
1000-1099 50
1100-1199 100
1200-1299 200
1300-1399 150
1400-1499 40
1500-1599 10
Total 550
To decide whether to use the inclusive or the exclusive method, it is important to determine
whether the variable under observation is a continuous or discrete one. In case of continuous
variables the upper limit exclusive method must be used. In general, the inclusive method should
be used in case of discrete variables.
Illustration 1: Prepare a frequency table for the following data with width of each class
interval as 10. Use exclusive method of classification.
57, 44, 80, 75, 00, 18, 45, 14, 04, 64, 72, 51, 69, 34, 22, 83, 70, 20, 57, 28, 96, 56,
50, 47, 10, 34, 61, 66, 80, 46, 22, 10, 84, 50, 47, 73, 42, 33, 48, 65, 10, 34, 66, 53, 75, 90,
58, 46, 38, 69
Solution:
Class Interval    Tallies        Frequency
0-10              ||             2
10-20             |||||          5
20-30             ||||           4
30-40             |||||          5
40-50             ||||| |||      8
50-60             ||||| |||      8
60-70             ||||| ||       7
70-80             |||||          5
80-90             ||||           4
90-100            ||             2
Total                            50
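The exclusive method used in this solution can be sketched in Python as follows. This is only an illustrative check, assuming a class width of 10 and classes starting at 0; an item equal to an upper limit (for example 20) is placed in the next class (20-30), as the exclusive method requires.

```python
# The 50 observations of the illustration (00 and 04 are written as 0 and 4).
marks = [57, 44, 80, 75, 0, 18, 45, 14, 4, 64, 72, 51, 69, 34, 22, 83, 70,
         20, 57, 28, 96, 56, 50, 47, 10, 34, 61, 66, 80, 46, 22, 10, 84, 50,
         47, 73, 42, 33, 48, 65, 10, 34, 66, 53, 75, 90, 58, 46, 38, 69]

width = 10
# Lower limits of the classes 0-10, 10-20, ..., 90-100.
frequency = {lower: 0 for lower in range(0, 100, width)}

for x in marks:
    # Integer division sends x to the class whose lower limit is <= x
    # and whose upper limit is > x (upper limit excluded).
    lower = (x // width) * width
    frequency[lower] += 1

print("Class interval  Frequency")
for lower, count in frequency.items():
    print(f"{lower:>3}-{lower + width:<10} {count:>9}")
print(f"{'Total':<14} {sum(frequency.values()):>9}")
```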

3.4 Tabulation of Data :


One of the simplest and most revealing devices for summarizing data and presenting
them in meaningful fashion is the statistical table. A table is a systematic arrangement of statistical
data in columns and rows. Rows are horizontal arrangements whereas columns are vertical ones.
The purpose of a table is to simplify the presentation and to facilitate comparisons. The simplification
results from the clear cut and systematic arrangement, which enables the reader to quickly locate
desired information. Comparison is facilitated by bringing related items of information close together.
3.4.1 Significance of Tabulation
Tables make it possible for the analyst to present a huge mass of data in a detailed
orderly manner within a minimum of space. Because of this, tabular presentation is the cornerstone
of statistical reporting.
The significance of tabulation will be clear from the following points.

i) It simplifies complex data.

When data are tabulated, all unnecessary details and repetitions are avoided. Data are
presented systematically in columns and rows. Hence, the reader gets a very clear idea of what
the table represents. There is thus a considerable saving in the time taken in understanding what
is represented by the data, and all confusion is avoided. Also, a large amount of space is saved
because of non-duplication of headings and designations; the description at the top of a column
serves for all the items beneath it.

ii) It facilitates comparison

Tabulation facilitates comparison. Since a table is divided into various parts and for each
part there are totals and subtotals, the relationship between different parts of data can be studied
much more easily with the help of a table than without it.

iii) It gives identities to the data

When the data are arranged in a table with a title and number, they can be distinctly
identified and can be used as a source reference in the interpretation of a problem.

iv) It reveals patterns

Tabulation reveals patterns within the figures which cannot be seen in the narrative form. It also
facilitates the summation of the figures if the reader desires to check the total.

3.4.2 Parts of a table

The number of parts of a table varies from case to case depending upon the given data.
However, the main parts of a table in general are: table number, title of the table, caption, stub,
body of the table, head note and footnote. These parts are explained below.

i) Table Number

Each table should be numbered. There are different practices with regard to the place
where this number is to be given. The number may be given either in the centre at the top above
the title, or to the side of the title at the top, or at the bottom of the table on the left hand side.

ii) Title of the table

Every table must be given a suitable title. The title is a description of the contents of the
table. A complete title has to answer the questions what, where and when in that sequence. The
title should be clear, brief and self explanatory. However, clarity should not be sacrificed for the
sake of brevity. The title should be so worded that it permits one and only one interpretation. It
should be in the form of a series of phrases rather than complete sentences. Its lettering should be
the most prominent of any lettering on the table.

iii) Caption

Caption refers to the column heading. It explains what the column represents. It may consist of
one or more column headings. Under a column heading there may be subheads. The caption
should be clearly defined and placed at the middle of the column. If different columns are expressed
in different units, the units should be mentioned with the captions. As compared with the main part
of the table, the caption should be shown in small letters. This helps in saving space.

(iv) Stub

As distinguished from captions, stubs are the designations of the rows or row headings.
They are at the extreme left and perform the same function for the horizontal rows of numbers in
the table as the column headings do for the vertical columns of numbers. The stubs are usually
wider than column headings but should be kept as narrow as possible without sacrificing precision
and clarity of statements.

v) Body:

The body of the table contains the numerical information. This is the most vital part of the
table. Data presented in the body are arranged according to the descriptions or classifications of the
captions and stubs.

vi) Head note:

It is a brief explanatory statement applying to all or a major part of the material in the
table, and is placed below the title, centred and enclosed in brackets. It is used to explain
certain points relating to the whole table that have not been included in the title or in the captions
or stubs. For example, the unit of measurement is frequently written as a headnote, such as “in
thousands” or “in million tonnes” or “in crores” etc.

vii) Foot notes:

Anything in a table which the reader may find difficult to understand from the title,
captions and stubs should be explained in footnotes. If footnotes are needed, they are placed
directly below the body of the table. Footnotes are used for the following main purposes:

a) To point out any exceptions as to the basis of arriving at the data.

b) To indicate any special circumstances affecting the data, for example, strike, lockout, fire etc.

c) To clarify anything in the table.

d) To give the source in case of secondary data. The reference to the source should be
complete in itself.

There are various systems of identifying the footnotes. One is numbering them
consecutively with small numbers 1, 2, 3... or letters a, b, c, d....

The following is a specimen of a table indicating the above parts.

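Format of a Table

Table Number
Title
(Headnote)

+--------------+---------------------------------+
| Stub Heading | Caption                         |
|              +----------------+----------------+
|              | Column heading | Column heading |
+--------------+----------------+----------------+
| Stub Entries | BODY                            |
+--------------+---------------------------------+

Source:
Footnotes: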

3.5 Types of Tables

Tables may be broadly classified in two ways, viz., into simple and complex tables, and into
general purpose and special purpose tables.

3.5.1 Simple and complex tables

The distinction between simple and complex tables is based on the number of characteristics
studied. In a simple table only one characteristic is shown. This type of table is also known as a
one-way table. On the other hand, in a complex table, two or more characteristics are shown.
Such tables are more popular in practice because they enable full information to be incorporated
and facilitate a proper consideration of all related facts. When two characteristics are shown,
such a table is known as a two-way table. When three characteristics are shown in a table, this
type of tabulation is known as treble tabulation. When four or more characteristics are
simultaneously shown it is a case of manifold tabulation. The following examples will illustrate the
distinction between simple and complex tables.

A. Simple table or one-way table

In this type of table only one characteristic is shown. This is the simplest type of table. The
following is an illustration of such a table.

Number of Employees in State Bank According to Age Group

Age (in years) No. of Employees

Below 25 ..............

25-35 ..............

35-45 ..............

45-55 ..............

Above 55 ..............

Total ..............

B. Two-way table

Such a table shows two characteristics and is formed when either the stub or the caption
is divided into two coordinate parts. The following example illustrates the nature of such a table.

Number of Employees of State Bank in Different Age Groups According to Sex

                  No. of Employees
Age (in years)    Males          Females        Total

Below 25 .............. .............. ..............

25-35 .............. .............. ..............

35-45 .............. .............. ..............

45-55 .............. .............. ..............

Above 55 .............. .............. ..............

Total .............. .............. ..............
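To see how the cells of such a two-way table are filled, consider the following minimal sketch. The employee records are hypothetical figures invented purely for illustration (the table above gives no actual data), and the helper name age_group is likewise an assumption. The class "Above 55" is treated here as "55 and above" so that every age falls in exactly one class.

```python
from collections import Counter

# Hypothetical (age, sex) records, invented only for this illustration.
employees = [(23, "Males"), (31, "Females"), (42, "Males"), (29, "Males"),
             (51, "Females"), (58, "Males"), (37, "Females"), (24, "Females")]

GROUPS = ["Below 25", "25-35", "35-45", "45-55", "Above 55"]

def age_group(age):
    """Assign an age to a class, upper limits being exclusive."""
    if age < 25:
        return "Below 25"
    if age < 35:
        return "25-35"
    if age < 45:
        return "35-45"
    if age < 55:
        return "45-55"
    return "Above 55"

# Each cell of the two-way table is the count of one (age group, sex) pair.
cells = Counter((age_group(age), sex) for age, sex in employees)

print(f"{'Age (in years)':<16}{'Males':>8}{'Females':>9}{'Total':>7}")
for group in GROUPS:
    males = cells[(group, "Males")]
    females = cells[(group, "Females")]
    print(f"{group:<16}{males:>8}{females:>9}{males + females:>7}")
print(f"{'Total':<16}"
      f"{sum(v for (g, s), v in cells.items() if s == 'Males'):>8}"
      f"{sum(v for (g, s), v in cells.items() if s == 'Females'):>9}"
      f"{len(employees):>7}")
```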

C. Higher Order Table

When three or more characteristics are represented in the same table, such a table is
called higher order table. The need for such a table arises when we are interested in presenting a
number of characteristics simultaneously. While constructing such a table it is necessary to first
establish an order of precedence among the attributes or characteristics sought to be classified
having regard to their relative importance.

3.5.2 General and Special Purpose Tables

A. General Purpose Tables:

General purpose tables, also known as reference tables or repository tables, provide
information for general use or reference. They usually contain detailed information and are not
constructed for specific discussion. In other words, these tables serve as a repository of information
and are arranged for easy reference. Tables published by Governmental agencies are mostly of
this kind, such as the tables contained in the Statistical Abstract of the Indian Union and the detailed tables
contained in the census reports. Such tables present facts which are not meant for any particular discussion.
When such tables are used by a researcher, they are usually placed in the appendix of the report
for easy reference.

B. Special Purpose Tables:

Special purpose tables, also known as summary tables, provide information for particular
discussion. When attached to a report they are found in the body of the text. These tables are also
called derivative tables since they are often derived from general tables. Thus the large detailed
tables in the census records of the Government of India are general purpose tables. When such
data are used, they are ordinarily taken from the general purpose tables and presented as special
purpose tables, which emphasise the relation the user wishes to stress. A special purpose table
should be designed in such a way that the reader may easily refer to the table for comparison,
analysis or emphasis concerning the particular discussion.

3.6 Summary :

Classification and tabulation are the first steps in the analysis of any collected data.
Classification refers to the process of dividing the entire data into different groups or classes. The
data can be classified on various bases, viz., geographical, chronological, qualitative and
quantitative. Quantitative classification of data into discrete or continuous distributions greatly
helps the investigator in further analysis.

The process of arranging the statistical data in columns and rows in a systematic manner
is called tabulation. Tabulation has got its own significance. In general a table consists of different
parts, viz., table number, title of the table, captions, stubs, main body, headnote and footnotes.
Tables may be broadly classified into two categories, viz., simple and complex tables, and general
and special purpose tables.

3.7 Unit end Questions and Exercises :

1. What is meant by classification? How can statistical data be classified on different
bases?

2. State the advantages of tabular representation of data. Explain different parts of a table.

3. Explain the significance of tabulation of data. What are the different types of tables?

4. Prepare a frequency distribution with class interval as 10 from the following sequence of
observations.

67, 34, 36, 48, 49, 31, 61, 34, 43, 45, 38, 32, 27, 61, 29, 47, 36, 50, 46, 30, 46, 32, 30, 33,
45, 49, 48, 41, 53, 36, 37, 47, 47, 30, 46, 50, 28, 35, 35, 38, 36, 46, 43, 34, 62, 69, 50, 28, 44, 43

3.8 Readings :

1. S.P.Gupta : Statistical Methods

2. K.V.Rao : Research Methodology in Commerce and Management

3. D.C.Sancheti & V.K.Kapoor : Statistics - Theory, Methods and Applications.

GUIDELINE - 4 :
GRAPHIC PRESENTATION OF DATA

Structure

4.0 Objectives

4.1 Introduction

4.2 Advantages and Limitations of Graphic Presentation

4.3 General Rules for Graphic Presentation

4.4 Graphs of Time Series

4.5 Graphs of Frequency Distributions

4.6 Summary

4.7 Unit end Questions & Exercises

4.8 Readings

4.0 Objectives :

a) To understand the meaning of graphic presentation.

b) To present the graphs of time series.

c) To acquaint the student with presentation of histogram.

d) To distinguish between frequency polygon and frequency curve.

e) To highlight the ogives.

4.1 Introduction :

One of the most convincing and appealing ways in which statistical results may be presented
is through diagrams and graphs. Evidence of this can be found in newspapers, magazines, journals,
advertisements etc. There are numerous ways in which statistical data may be displayed pictorially,
such as different types of diagrams and graphs. Very often, the problem is that of selecting the
best out of several methods that may be available. This is a difficult task and requires a great deal
of artistic talent and imagination on the part of the individual or agency engaged in the preparation
of diagrams and graphs.

In a graphic mode of presentation, the points or lines of various kinds are used to represent
data. Each graph paper has thick lines for each division of an inch or centimeter measure and thin
lines for smaller part of the same. A graph, of whatever size, is divided into four quadrants but
normally the first quadrant is used unless there are negative figures to be shown on either of the
axes. The horizontal axis is called the x-axis and the vertical axis is called the y-axis. These intersect at a
point called the origin, indicated by O. Negative values of a variable shown on the
horizontal axis lie to the left of the origin. Negative values of a variable shown on the
vertical axis lie below the origin.

4.2 Advantages and Limitations of Graphic Presentation :

Graphic presentation of any statistical data has got certain advantages and some limitations
also.

4.2.1 Advantages

Graphic presentation has the following advantages,

i) Render complex data simple: Graphic presentation renders complex data simple and
easily understandable by giving a picturesque view.

ii) Give attractive, interesting and impressive view: A graph looks more attractive than
a table of figures. The features of data become visible at a glance from a graph. So it
becomes very easy to study the tendency and fluctuations in data.

iii) Save time and labour: Graphic method is the simplest method of presenting statistical
data. Therefore, it saves time and labour of both the statistician and the observer.
Display of time series and frequency distributions can be made quite effective through
a graph.

iv) Make comparisons easy: Comparisons between two or more phenomena can be made
very easily with the help of a graph. In the words of Dickson Hail well, “illustrations
including graphs tend to simplify comparisons of statistical matter and trend.”

v) Avoid knowledge of mathematics: No special knowledge of mathematics is required to
understand the message of the data from the graph.

vi) Certain statistical measures can be ascertained: Graphic presentation of statistical data is
helpful in interpolation, extrapolation and forecasting. With the help of graphs, one can
also determine the median, quartiles and mode.

Due to the above advantages, graphic presentation of statistical data is becoming more
and more popular with the statisticians.

4.2.2 Limitations

Graphic presentation of data has the following limitations.

1. A curve simply shows tendency and fluctuations; actual values are not known.

2. Complete accuracy is not possible on a graph.

3. Graphs cannot be quoted to support some statement.

4. Only a few characteristics can be depicted on a graph.

5. Not all graphic devices are simple and straightforward.

4.3 General Rules for Graphic Presentation

While graphing statistical data, the following points should be borne in mind.

1. Title: Every graph must have a clear and comprehensive title so that what facts are
represented in the graph may be known.

2. Structural framework: The independent variable should always be measured along the
x-axis and the dependent variable along the y-axis. The scale along the y-axis should begin
from zero as origin. For actual plotting of the data, it should be remembered that for
every value of the independent variable, there is a corresponding value of the dependent
variable.

3. Choice of scale: The choice of the scale should be so made as to accommodate the
whole data. In the words of A.L. Bowley, “It is difficult to lay down rules for the proper
choice of scales by which the figures should be plotted out. It is only the ratio between
the horizontal and vertical scales that need to be considered.” One has to note that the
scale need not be the same for both x-axis and y-axis.

4. Use of false base line: When fluctuations in a variable are small relative to its size and it is
desired to visualize these fluctuations properly, the vertical scale may be stretched. This
can be done if, instead of showing the entire scale from zero to the highest value involved,
only as much is shown as is necessary for the purpose. The portion which lies between
zero and the lowest value of the variable is left out. This omission is indicated by a scale
break.

5. Use of ratio or logarithmic scale: For showing proportional changes, a ratio or logarithmic
scale should be used.

6. Line designs: If more than one line is plotted on the same graph, it is necessary to distinguish
them by different patterns.

7. Captions: The scale caption for the x-axis is placed under the centre of the horizontal
axis. The scale caption for the y-axis is placed at the top or middle of the y-scale.

8. Index: An index should be given to show the scales and the meaning of different curves.

4.4 Graphs of Time Series :

A large variety of graphs are used in practice. However, here we shall discuss only some
of the important types of graphs, which are more popular. Broadly, the various graphs can be divided
under two heads, viz., graphs of time series and graphs of frequency distributions. The first category
is dealt with in this section and the second category in the next section.

When we observe the values of a variable at different points of time, the series so formed
is known as a time series. The technique of graphic presentation is extremely helpful in analysing
changes at different points of time. On the x-axis we generally take the time and on the y-axis the
value of the variable and join the various points by straight lines. The graph so formed is known
as the time series graph.

Graphs of time series can be constructed either on a natural scale or on a ratio scale. In
natural or arithmetic scale, absolute changes from one period to another are shown whereas, in
a ratio scale the rates of change or the relative changes are shown.
Illustration 1: Represent the following data of per capita income graphically.

Year        Per capita Income (Rs.)
            (at current prices)
1980-81 1627.2
1981-82 1851.0
1982-83 1993.4
1983-84 2287.9
1984-85 2493.4
1985-86 2734.0
1986-87 2974.2
1987-88 3284.2
Solution :
[Figure: line graph of per capita income (in Rupees) at current prices. X-axis: Years, 1980-81 to 1987-88. Y-axis: Income, scaled from 1600 to 3200.]
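The difference between the natural scale (absolute changes) and the ratio scale (relative changes) can be seen by computing both kinds of change for the per capita income data above. The following is only a sketch; the variable names are chosen here for illustration.

```python
years = ["1980-81", "1981-82", "1982-83", "1983-84",
         "1984-85", "1985-86", "1986-87", "1987-88"]
income = [1627.2, 1851.0, 1993.4, 2287.9, 2493.4, 2734.0, 2974.2, 3284.2]

# Absolute change from one year to the next (what a natural scale shows).
absolute = [b - a for a, b in zip(income, income[1:])]

# Percentage change from one year to the next (what a ratio scale shows).
relative = [100 * (b - a) / a for a, b in zip(income, income[1:])]

print("Year      Absolute change (Rs.)  Relative change (%)")
for year, d, r in zip(years[1:], absolute, relative):
    print(f"{year}  {d:>21.1f}  {r:>19.2f}")
```

Equal vertical distances on a natural scale correspond to equal values in the first column, whereas on a ratio scale they correspond to equal values in the second.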

4.5 Graphs of Frequency Distributions :

In a frequency graph, the size or the value of the item is presented on the horizontal axis
and the frequency or the number of items on the vertical axis. A frequency distribution can be
presented graphically in any one of the following ways, viz., histogram, frequency polygon, frequency
curve and ogives.

4.5.1 Histogram :

Out of several methods of presenting a frequency distribution graphically, the histogram is
the most popular and widely used in practice. A histogram is a set of vertical bars whose areas
are proportional to the frequencies represented.

While constructing histogram, the variable is always taken on the x-axis and the frequencies
depending on it on the y-axis. Each class is then represented by a distance on the scale that is
proportional to its class interval. The distance for each rectangle on the x-axis shall remain the
same in case the class intervals are uniform throughout. If the class intervals are different, the
width of the rectangles shall also vary. The y-axis represents the frequencies of each class which
constitute the height of its rectangle. In this manner we get a series of rectangles each having a
class interval distance as its width and the frequency as its height. The area of the histogram
represents the total frequency as distributed throughout the classes.

The technique of constructing histogram is presented below (a) for distributions having
equal class intervals and (b) for distributions having unequal class intervals.

a) Equal class intervals :

When class intervals are equal, take the variable on the x-axis and the corresponding
frequency on y-axis, and construct adjacent rectangles. In such a case, the height of the rectangles
will be proportional to the frequencies.

Illustration 1: Draw the histogram for the following data.

Variable 100-110 110-120 120-130 130-140 140-150 150-160 160-170

Frequency 11 28 36 49 33 20 8

[Figure: Histogram. X-axis: Variable (class limits from 100 to 170). Y-axis: Frequency (0 to 60).]

b) Unequal class intervals:


When class intervals are unequal, a correction for unequal class intervals must be made.
The correction consists of finding the frequency density for each class. The frequency density is
the frequency for that class divided by the width of that class. A histogram from these density
values would have the same general appearance as the corresponding graphical display developed
from equal class intervals.
For making the adjustment, we take the class which has the lowest class interval and adjust
the frequencies of the other classes in the following manner. If one class interval is twice as wide as
the one having the lowest class interval, we divide the height of its rectangle by two; if it is three times
as wide, we divide the height of its rectangle by three, etc. The heights will be proportional to the ratio
of the frequency to the width of the class.
Illustration 2: Represent the following data by means of a histogram.

Weekly wages (Rs.)   10-15 15-20 20-25 25-30 30-40 40-60 60-80

No. of workers       7 19 27 15 12 12 8

Solution:

We observe that the class interval for the 5th class is 10, which is double the minimum class
interval of 5. Hence we have to divide the frequency 12 by 2 and take the height as 6. Similarly,
for the last two classes, the class interval is 20, which is 4 times the minimum class interval of 5.
So, we have to divide both the class frequencies 12 and 8 by 4 and take the heights as 3 and 2
respectively.
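The adjustment described in this solution can be written out as a short calculation. The sketch below is illustrative only: each frequency is divided by the ratio of its class width to the smallest class width, which gives the heights of the rectangles.

```python
# (lower limit, upper limit, frequency) for each class of weekly wages.
classes = [(10, 15, 7), (15, 20, 19), (20, 25, 27), (25, 30, 15),
           (30, 40, 12), (40, 60, 12), (60, 80, 8)]

# The smallest class interval is the unit against which others are adjusted.
min_width = min(upper - lower for lower, upper, _ in classes)

# Height of each rectangle = frequency / (class width / smallest width).
heights = [freq / ((upper - lower) / min_width)
           for lower, upper, freq in classes]

print("Class    Width  Frequency  Height")
for (lower, upper, freq), h in zip(classes, heights):
    print(f"{lower:>2}-{upper:<5} {upper - lower:>5}  {freq:>9}  {h:>6.1f}")
```

With these heights the area of each rectangle remains proportional to the frequency of its class.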

4.5.2 Frequency polygon :

A frequency polygon is a graph of a frequency distribution. It is particularly effective in
comparing two or more frequency distributions. There are two ways in which a frequency polygon
may be constructed.

1. With the help of a histogram: We may draw a histogram of the given data and
then join by straight lines the mid points of the upper horizontal side of each
rectangle with the adjacent ones. The figure so formed is called a frequency
polygon. It is an accepted practice to close the polygon at both ends of the
distribution by extending them to the base line. When this is done, two
hypothetical classes, one at each end, would have to be included, each with a frequency
of zero.

2. With mid values of class intervals: Another method of constructing a frequency
polygon is to take the mid points of the various class intervals and then plot the
frequency corresponding to each point and to join all these points by straight
lines. The figure obtained would be exactly the same as that obtained by the earlier
method.

By constructing a frequency polygon the value of the mode can be easily ascertained. If a
perpendicular is drawn from the apex of the polygon to the x-axis we get the value of the mode.
Moreover, frequency polygons facilitate comparison of two or more frequency distributions on
the same graph. In the construction of a frequency polygon the same difficulties are faced as in the
case of histograms. They cannot be used for open-end distributions. Also, suitable adjustment has
to be made when there are unequal intervals.
Illustration 3: Draw a frequency polygon for the following data.

Marks             20-30 30-40 40-50 50-60 60-70 70-80 80-90 90-100

No. of students   4 6 14 16 14 8 6 5

Solution: If we calculate the mid values of the class intervals,
they will be 25, 35, 45, 55, 65, 75, 85 and 95.

[Figure: Frequency polygon. X-axis: Marks (mid values of class intervals, 25 to 95). Y-axis: No. of students.]
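The points to be plotted for this polygon can be obtained as in the following sketch, which also adds the two hypothetical zero-frequency classes used to close the polygon at both ends. It assumes equal class intervals, as in this illustration; the variable names are chosen only for the example.

```python
# (lower limit, upper limit, frequency) for each class of marks.
classes = [(20, 30, 4), (30, 40, 6), (40, 50, 14), (50, 60, 16),
           (60, 70, 14), (70, 80, 8), (80, 90, 6), (90, 100, 5)]

# Each vertex of the polygon is (mid value of the class, frequency).
points = [((lower + upper) / 2, freq) for lower, upper, freq in classes]

# Close the polygon with one zero-frequency class at each end.
width = classes[0][1] - classes[0][0]
closed = ([(points[0][0] - width, 0)] + points
          + [(points[-1][0] + width, 0)])

for mid, freq in closed:
    print(f"mid value {mid:>5.1f}  frequency {freq}")
```

Joining the printed points in order by straight lines gives the closed frequency polygon.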

4.5.3 Frequency curve

The procedure for drawing a frequency curve is quite similar to that for a frequency polygon,
except that the points are joined by a smooth freehand curve instead of line segments. As in the case of a frequency
polygon, the curve can be obtained from a histogram or by calculating the middle values of the class
intervals. We have to take the middle values of the class intervals on the x-axis and the corresponding frequencies
on the y-axis, plot the points and then join the points freehand.

Illustration 4: Represent the following frequency distribution by means of a frequency curve.

Salary (Rs.)   300-400 400-500 500-600 600-700 700-800 800-900 900-1000

No. of Emp.    20 30 60 75 115 100 60

Solution:

[Figure: Frequency curve. X-axis: Salary (Rs.). Y-axis: No. of Employees (20 to 120).]

4.5.4 Ogives or Cumulative Frequency curves :

If we go on adding the frequencies in a distribution, the running totals are called cumulative frequencies.
If these frequencies are listed in a table, it is called a cumulative frequency table. The curve
obtained by plotting cumulative frequencies is called a cumulative frequency curve or an ogive.

There are two methods of constructing ogives, namely ‘less than’ and ‘more than’ methods.

a) Less than method: In the less than method, we start with the upper limits of the
classes and go on adding the frequencies. When these frequencies are plotted,
we get a rising curve.

b) More than method: In the more than method, we start with the lower limits of the
classes and from the total frequency we go on subtracting frequency of each
class. When these frequencies are plotted, we get a declining curve.

From the stand point of graphic presentation, the ogives are used for some special
purposes. Ogives are used to determine the number or proportion of cases above or below a
given value. Ogives are also drawn for determining certain values graphically such as median,
quartiles, deciles etc.

Illustration 5: Draw less than and more than cumulative frequency curves for the following
data.

Marks 10-20 20-30 30-40 40-50 50-60 60-70

No. of Students 4 6 10 20 18 2

Solution: Cumulative frequency distributions.

Marks less than No. of Students Marks more than No. of students

20 4 10 60

30 10 20 56

40 20 30 50

50 40 40 40

60 58 50 20

70 60 60 2

[Figures: less than cumulative frequency curve; more than cumulative frequency curve]
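The two cumulative frequency columns of this illustration can be reproduced with a short calculation. The sketch below also reads the median off the less than ogive by straight-line interpolation at half the total frequency; this interpolation is an assumed way of imitating the graphical reading, and the function name is hypothetical.

```python
from itertools import accumulate

# (lower limit, upper limit, frequency) for each class of marks.
classes = [(10, 20, 4), (20, 30, 6), (30, 40, 10),
           (40, 50, 20), (50, 60, 18), (60, 70, 2)]
freqs = [f for _, _, f in classes]
total = sum(freqs)

# Less than method: running totals, plotted against the upper limits.
less_than = list(accumulate(freqs))

# More than method: total minus the frequencies of all earlier classes,
# plotted against the lower limits.
more_than = [total - c for c in [0] + less_than[:-1]]

def median_from_ogive(classes, less_than):
    """Value at which the less than ogive reaches half the total frequency."""
    half = less_than[-1] / 2
    previous = 0
    for (lower, upper, freq), cumulative in zip(classes, less_than):
        if cumulative >= half:
            return lower + (half - previous) / freq * (upper - lower)
        previous = cumulative

print("Less than:", less_than)
print("More than:", more_than)
print("Median read from the less than ogive:",
      median_from_ogive(classes, less_than))
```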

4.6 Summary:

Graphical presentation is one of the most convincing ways in which statistical results may
be presented. Graphic presentations render complex data simple. They give an attractive, interesting
and impressive view. They save time and labour. Comparisons can be made very easily with
graphic presentation. Some graphs also help to obtain certain statistical measures like the median,
quartiles etc. Certain general rules are prescribed for graphic presentation.

Graphs can broadly be divided into two categories, viz., graphs of time series and graphs
of frequency distributions. Histogram, frequency polygon, frequency curve and ogives are the
most popular graphic presentations of frequency distributions.

4.7 Unit end Questions & Exercises :

1. What is the significance of graphic presentation of statistical data? What are the general
rules to be followed for graphic presentation?

2. Explain various methods of presenting a frequency distribution graphically.

3. Distinguish between a histogram and ogives.

4. The index numbers of Indian industrial projects, with 1950 = 100 as the base year, are
given in the following table. Present the data by a suitable graph.

55

60
Year 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960

Index 187 222 246 239 234 229 192 260 182 247
Number

(Hint: Draw a Time series graph)

5. The annual steel production of a company for a period of nine years is given in the
following table. Draw an appropriate diagram to present the data.

Year                  1965  1966  1967  1968  1969  1970  1971  1972  1973

Production (Tonnes)    212   220   235   257   289   263   250   283   290

(Hint: Draw a Time series graph)

6. The monthly profits in thousand rupees of 100 shops are distributed as follows:

Monthly profits (1000 Rupees)   0-5   5-10   10-15   15-20   20-25   25-30

No. of Shops                     12    18     27      20      17       6

Draw a histogram, frequency polygon, frequency curve and ogives for the above data.

7. Draw a histogram and ogives for the following data

Mid Value 15 25 35 45 55 65 75

Frequency 10 24 40 32 20 14 4

(Hint: Construct the class intervals)

8. You are given the following frequency distribution of monthly expenditure on food incurred
by a sample of 100 families.

Expenditure on food (Rs.)   0-50   50-150   150-250   250-400   400-600   600-800

No. of families               7      24        30        27         8         4

Draw a histogram for the above data.

(Note: Unequal class intervals)

4.8 Readings:

1. S.P.Gupta : Statistical Methods.

2. D.C.Sancheti & V.K.Kapoor : Statistics - Theory, Methods and Applications.

3. K.V.Rao : Research Methodology in Commerce and Management.

GUIDELINE-5 :
DIAGRAMMATIC PRESENTATION OF DATA

Structure :

5.0 Objectives

5.1 Introduction

5.2 Merits and Limitations of Diagrammatic Presentation

5.3 Types of Diagrams

5.4 Summary

5.5 Unit end Questions

5.6 Readings

5.0 Objectives:

a) To understand the meaning of the concept of diagrammatic representation,

b) To give insight into the merits and limitations of diagrammatic representation,

c) To present various types of diagrams.

5.1 Introduction :

One of the most convincing and appealing ways in which statistical results may be presented
is through diagrams and graphs. A diagram is a visual form for presentation of statistical data,
highlighting their basic facts and relationship. Diagrams are used with great effectiveness in the
presentation of all types of data. When properly constructed, they readily show information that
might otherwise be lost amid the details of numerical tabulation.

5.2 Merits and Limitations of Diagrammatic Presentation:

A properly constructed diagram appeals to the eye and also to the mind because it is
practical, clear and easily understandable even by those unacquainted with the methods of
presentation. Though diagrams do not add any new meaning to the statistical facts, they
exhibit the results more clearly. Diagrammatic representation of statistical facts is the best way of
appealing to the mind through the eyes. We can observe the following merits and limitations of
diagrammatic presentation.

5.2.1 Merits:

a) They are attractive and impressive:

Diagrams are attractive and create a lasting impression. A person who does not like to
devote even a single minute to the study of a page containing numerical tables would, in most cases,
not like to take his eyes away from an attractively constructed diagram drawn from the
same data. Diagrams do not strain the mind of the observer.

b) They make data simple and intelligible:

Diagrams have the merit of rendering the whole data readily intelligible. The mass of
complex data, when depicted through a diagram, can be understood easily. Diagrams bring forth
the characteristics of data.

c) They make comparison possible:

Diagrams make comparison between two sets of data possible. This is one of the objectives
of diagrammatic presentation. In absolute figures, the comparison is sometimes not very clear, but
diagrammatic presentation makes it simpler and easier.

d) They save time and labour :

Diagrammatic presentation saves a lot of time which would otherwise be lost in
grasping the significance of numerical data. Data which would take hours to understand in
numerical form reveal their basic characteristics in minutes when presented diagrammatically.

e) They have universal utility :

Diagrammatic presentation of statistical data is practised universally. It is a widely used
technique in economics, business, administration, social and other fields.

f) They give more information:

A diagram depicts more information than the data shown in a table. It clarifies the existing
trend in the data and how the trend changes. Such information is present in the tables too,
but finding the trend from them is a difficult and time-consuming job.

5.2.2 Limitations:

Diagrams are important tools, but they are not substitutes for classification and tabulation;
they are complementary to those processes. Only a rough idea of the data can be had from a
diagram. Diagrammatic presentation is risky in the hands of those who draw conclusions from
it without careful study. Care has to be taken when using and interpreting a diagrammatic
presentation, to guard against misunderstanding or misrepresenting it.

Diagrammatic presentation has the following limitations.

i) Diagrammatic presentation of statistical data is useful to the common man. Its utility
to an expert is limited.

ii) A diagram can show only a limited amount of information.

iii) A diagram can show only approximate values.

iv) It is not possible to present a precise difference between two sets of data.

v) If there is a wide gap between different measurements, this also cannot be shown
meaningfully in a diagram.

vi) A diagram is limited to the portrayal of two or three aspects of a set of data;
otherwise it becomes too complex to be understood.

vii) Diagrams cannot be analysed further.

viii) A diagram is only a means of drawing conclusions. It is better to adopt a practice of
double presentation: tables for detailed reference and diagrams for rapid understanding.

5.3 Types of Diagrams :

There is a large number of diagrammatic forms to choose from. The choice of the type of
diagram in which the data are to be presented is a difficult one. Selection of the type will
depend upon ability and experience. The commonly used types of diagrams are: one dimensional,
two dimensional, three dimensional diagrams, pictograms and cartograms.

5.3.1 One Dimensional Diagrams :

In such diagrams, only one dimensional measurement, i.e., height, is used and the width
is not considered. Such diagrams are in the form of line or bar charts. The heights of the lines or
bars are drawn according to the sizes of the figures. In bar charts the bars do have width, but it has no
relation to the measurement. Width is used only to make the diagram look attractive.
One dimensional diagrams may be of the following types.

A. Line Diagram

A line diagram is used where there are many items to be shown and there is not
much difference in their values. Such a diagram is prepared by drawing a vertical line for each
item according to the scale. The distance between lines is kept uniform. A line diagram makes
comparison easy, but it is less attractive.

Illustration 1: Show the following data by a line chart.

No. of Children    0    1    2    3    4    5

Frequency         15   10   13    6    3    3

Solution:

[Figure: Line Chart; vertical axis: Frequency; horizontal axis: No. of Children]

B. Simple Bar Diagram
A simple bar diagram is used to represent only one variable. For example, the figures of
sales, production, population etc., for various years may be shown by means of a simple bar
diagram. Simple bar diagrams are very popular in practice. They can be vertical or horizontal. In
practice, vertical bars are more popular. In bar diagrams, the width of each bar is the same, but
the length varies according to the numerical data.
Illustration 2: The following table gives the birth rate per thousand of different countries
over a certain period.
Country      Birth Rate      Country        Birth Rate
India            33          China              40
Germany          16          New Zealand        30
U.K.             20          Sweden             15
Represent the above data by a suitable diagram.
Solution:
[Figure: Bar Diagram; vertical axis: Birth Rate; horizontal axis: Country]
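
As a rough stand-in for a drawn chart, the rule that every bar has the same width while its length follows the value can be sketched in a few lines of Python. This is only an illustration using the birth rates of Illustration 2: it prints horizontal bars made of `#` characters, and the function name and the choice of one character per unit are assumptions made for this example.

```python
# Birth rates per thousand from Illustration 2.
birth_rates = {
    "India": 33,
    "Germany": 16,
    "U.K.": 20,
    "China": 40,
    "New Zealand": 30,
    "Sweden": 15,
}


def text_bar_chart(data, unit=1):
    """Return one text line per item; bar length is value // unit characters."""
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * (value // unit)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines


lines = text_bar_chart(birth_rates)
for line in lines:
    print(line)
```

Every bar occupies one text line (equal width), and only the number of `#` characters changes from country to country.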

C. Component Bar Diagram (Subdivided bar diagram)

In component bar diagram, each bar representing the magnitude of a given phenomenon
is further subdivided into its various components. Each component occupies a part of the bar
proportional to its share in the total. The subdivisions are distinguished by different colours or
crossings or dottings.
Illustration 3: The numbers of students in five different colleges are given below.
College     Boys     Girls     Total
A            450      620      1070
B           1260      910      2170
C           1590     1260      2850
D           1340     1150      2490
E           1050      830      1880
Represent the data by a suitable diagram:
Solution:
[Figure: Component Bar Diagram; horizontal axis: College (A to E); vertical axis: number of students; legend: Girls, Boys]
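
The way a component bar is built up can be sketched as follows. This is a minimal Python illustration using the figures of Illustration 3; it assumes the boys' segment is drawn at the bottom of each bar and the girls' segment above it, and the function name is made up for the example. Each segment starts where the previous one ends, so the top of the last segment equals the total.

```python
# Enrolment figures from Illustration 3: (boys, girls) for each college.
students = {
    "A": (450, 620),
    "B": (1260, 910),
    "C": (1590, 1260),
    "D": (1340, 1150),
    "E": (1050, 830),
}


def bar_segments(components):
    """Return (start, end) heights for each stacked segment of one bar."""
    segments = []
    base = 0
    for size in components:
        segments.append((base, base + size))
        base += size
    return segments


for college, counts in students.items():
    boys_segment, girls_segment = bar_segments(counts)
    print(college, "boys", boys_segment, "girls", girls_segment,
          "total", girls_segment[1])
```

For college A, for example, the boys' segment runs from 0 to 450 and the girls' segment from 450 to 1070.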
D. Multiple Bar Diagram
In a multiple bar diagram, two or more sets of interrelated data are represented. The
technique of drawing such a diagram is the same as that of simple bar diagram. The only difference
is that, since more than one phenomenon is represented, different shades, colours, dots or crossing
are used to distinguish between the bars. Whenever a comparison between two or more related
variables is to be made, multiple bar diagram should be preferred.
Illustration 4: Draw a suitable diagram from the following data.
Year     Sales     Gross profit     Net profit
2001      120           40              20
2002      135           45              30
2003      140           55              35
2004      150           60              40
[Figure: Multiple Bar Diagram; vertical axis: Sales, Gross profit, Net profit (Rs. 1000); horizontal axis: Year; legend: Sales, Gross profit, Net profit]

E. Percentage Bar Diagram

Percentage bars are particularly useful in statistical work, which requires the portrayal of
relative changes in data. When such diagrams are prepared, the length of the bars is kept equal
to 100 and segments are cut in these bars to represent the components (percentages) of an
aggregate.

Illustration 5: Represent the following by subdivided bar diagram on the percentage basis.

Particulars                1986    1987    1988

1. Cost per chair

   a) Wages                   9      15      21

   b) Other cost              6      10      14

   c) Polishing               3       5       7

   Total cost                18      30      42

2. Sale proceeds/chair       20      30      40

   Profit/loss                2       -      -2

Solution: Take the sale price per chair as 100 and express the other figures in percentages. The
percentages so obtained are given below.

Particulars         1986     1987     1988

Wages               45.0     50.0     52.5

Other cost          30.0     33.3     35.0

Polishing           15.0     16.7     17.5

Total costs         90.0    100.0    105.0

Sale price         100.0    100.0    100.0

Profit or loss      10.0       -      -5.0
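
The conversion to percentages can be reproduced with a short calculation. The sketch below, in Python, uses the figures of Illustration 5 and takes the sale price of each year as 100; the dictionary layout and function name are assumptions made for the example.

```python
# Cost and sale figures per chair from Illustration 5.
costs = {
    1986: {"Wages": 9, "Other cost": 6, "Polishing": 3},
    1987: {"Wages": 15, "Other cost": 10, "Polishing": 5},
    1988: {"Wages": 21, "Other cost": 14, "Polishing": 7},
}
sale_price = {1986: 20, 1987: 30, 1988: 40}


def as_percent_of_sale(year):
    """Express each cost item, total cost and profit as % of sale price."""
    base = sale_price[year]
    percents = {item: round(100 * value / base, 1)
                for item, value in costs[year].items()}
    total_cost = sum(costs[year].values())
    percents["Total cost"] = round(100 * total_cost / base, 1)
    # A negative figure here means a loss.
    percents["Profit or loss"] = round(100 * (base - total_cost) / base, 1)
    return percents


for year in costs:
    print(year, as_percent_of_sale(year))
```

In 1988 the total cost comes to 105 per cent of the sale price, so the cost segments extend beyond the 100 mark and the loss of 5 per cent is shown below the base line.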

[Figure: Percentage bar diagram showing Cost, Sale price and Profit & Loss per chair]

5.3.2 Two Dimensional Diagrams :

As distinguished from one dimensional diagrams, in which only the length is taken into
account, in two dimensional diagrams the area is taken into consideration. There are
many types of two dimensional diagrams, of which the pie diagram is the most popular.

Pie Diagram :

Pie diagrams are very popularly used in practice to show percentage breakdowns. For
example, with the help of a pie diagram we can show how the expenditure of the Government is
distributed over different heads like Agriculture, Irrigation, Industry, Transport, Defence etc.
Similarly, through a pie diagram we can show how the expenditures incurred by an industry are
divided under different heads like raw materials, wages and salaries, selling and distribution costs
etc. The pie chart is so called because the entire diagram looks like a pie, and the components
resemble slices cut from a pie.

While making comparisons, pie diagrams should be used on a percentage basis and not
on an absolute basis, since a series of pie diagrams showing absolute figures would require that
larger totals be represented by larger circles. Such presentation involves the difficulties of
two-dimensional comparisons. However, when pie diagrams are constructed on a percentage basis,
percentages can be presented by circles equal in size. It may be noted that this problem does not
arise in the use of a single pie diagram.

In laying out the sectors of a pie chart, it is desirable to follow some logical arrangement
or sequence. It is a common practice to begin the largest component sector of a pie diagram at
the 12 o'clock position on the circle. Usually the other component sectors are placed in clockwise
succession in descending order of magnitude, except for catch-all components like “miscellaneous”
and “all others”, which are shown last, in contrast with the adjacent sectors.

In constructing a pie chart, the first step is to prepare the data so that the various component
values can be transposed into corresponding degrees on the circle. The second step is to draw a
circle of appropriate size with a compass. The size of the radius depends upon the available
space and other factors of presentation.

The third step is to measure points on the circle representing the size of each sector with
the help of a protractor. The ordinary protractor is based upon a scale in which the total circle is
360°, but it is possible to purchase a protractor in which the entire circle is divided not into 360
but into 100 equal parts, so that the angle representing any desired percentage can be read directly.

An essential feature of the pie chart is careful identification of each sector with some kind
of explanatory or descriptive label. If there is sufficient room, the labels can be placed inside the
sectors; otherwise the labels should be placed in contiguous positions outside the circle, usually
with an arrow pointing to the appropriate sector.

Limitations of Pie Diagrams:

Pie diagrams are at times less effective than bar diagrams for accurate reading and
interpretation, particularly when series are divided into a large number of components or the
difference among the components is very small. It is generally inadvisable to attempt to portray a
series of more than five or six categories by means of a pie chart.

Illustration 6: Draw a pie diagram for the following data on Sixth Five Year Plan public sector
outlays.

Agricultural and rural development     12.9%

Irrigation etc.                        12.5%

Energy                                 27.2%

Industry and minerals                  15.4%

Transport, communication etc.          15.9%

Social services and others             16.1%

Solution: The angle at the center is given by

\[ \text{Angle} = \frac{\text{Percentage outlay}}{100} \times 360^\circ = \text{Percentage outlay} \times 3.6^\circ \]

Computation of Angles

The angles are computed in the following table.

Sector                                 Percentage outlay     Angle

Agriculture and Rural Development            12.9            12.9 x 3.6 = 46°

Irrigation etc.                              12.5            12.5 x 3.6 = 45°

Energy                                       27.2            27.2 x 3.6 = 98°

Industry and minerals                        15.4            15.4 x 3.6 = 56°

Transport, communication etc.                15.9            15.9 x 3.6 = 57°

Social services & others                     16.1            16.1 x 3.6 = 58°

Total:                                      100.0            360°
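
The angle computation can be verified with a few lines of Python. This is only a sketch using the percentage outlays of Illustration 6; the function name is illustrative. It prints each angle to two decimal places alongside the nearest whole degree.

```python
# Percentage outlays from Illustration 6.
outlays = {
    "Agriculture and rural development": 12.9,
    "Irrigation etc.": 12.5,
    "Energy": 27.2,
    "Industry and minerals": 15.4,
    "Transport, communication etc.": 15.9,
    "Social services and others": 16.1,
}


def sector_angles(percentages):
    """Angle at the centre for each sector: percentage x 3.6 degrees."""
    return {name: pct * 360 / 100 for name, pct in percentages.items()}


angles = sector_angles(outlays)
for name, angle in angles.items():
    print(f"{name}: {angle:.2f} degrees (about {round(angle)})")
print(f"Total: {sum(angles.values()):.2f} degrees")
```

The exact angles add up to 360°. Rounding each one separately to a whole degree gives 46, 45, 98, 55, 57 and 58, which total 359°; the table above shows 56° for Industry and minerals (15.4 x 3.6 = 55.44), which brings the total to 360°.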

Now a circle is drawn to suit the size of the paper and divided into 6 parts
according to the angles at the center. The angles have been arranged in descending order and
the diagram is presented below.

[Figure: Pie Diagram showing Sixth Five Year Plan Public Sector Outlays]

5.4 Summary:

A diagram is a visual form for presentation of statistical data, highlighting their basic facts
and relationship. Diagrammatic presentation has certain merits as well as some limitations.

Diagrams are attractive and impressive. They make data simple and intelligible. They
make comparison possible. They save time and labour. They have universal utility. They give
more information. On the other hand, diagrams can show only a limited amount of information
and cannot be analysed further.

The commonly used diagrams are bar diagrams of different types and pie diagrams.
Selection of an appropriate diagram is a somewhat difficult problem. The choice would primarily
depend upon two factors, namely, the nature of the data and the type of people for whom the
diagram is meant. There are different types of bars, and the appropriate type of bar chart can be
decided on the following basis.

a) Simple bar charts should be used where changes in totals are required to be
conveyed.

b) Component bar charts are more useful where changes in totals as well as in the
size of component figures are required to be displayed.

c) Percentage comparison bar charts are better suited where changes in the relative
size of component figures are to be exhibited.

d) Multiple bar charts should be used where changes in the absolute values of the
component figures are to be emphasised and the overall total is of no importance.

A pie chart is particularly useful where it is desired to show the relative proportions of the
figures that go to make up a single overall total. Unlike bar charts, it is not restricted to three or
four component figures, although its effectiveness tends to dwindle with more than seven or eight
components.
5.5 Unit end Questions & Exercises :
1. Discuss the meaning, utility and limitations of diagrammatic presentation of statistical data.
2. State briefly the purposes served by diagrammatic presentation. Explain different types
of bar diagrams.
3. Discuss the types of data which are usually represented by pie diagrams. Explain the
procedure of constructing a pie diagram.
4. The distribution of factories in 6 districts of Karnataka state is given below. Present the
data by a suitable bar diagram.
Name of the District No. of Factories
Bangalore 1,001
Belgaum 244
Bijapur 122
Bidar 14
Bellary 127
Coorg 27
(Draw a simple bar diagram)
5. The table given below gives the data relating to exports and imports in a country during
four years ending 1976-77.

Year        Exports             Imports
            (Crores of Rs.)     (Crores of Rs.)
1973-74          320                 250
1974-75          340                 260
1975-76          340                 240
1976-77          310                 200
Draw a multiple bar diagram to present the above data.
6. Draw suitable diagrams to present the following data.
Distribution of India’s population: 1971
Religion        Percentage of Population
                Rural       Urban
Hinduism        84.33       76.25
Islam            9.96       16.21
Christian        2.43        3.26
Sikhism          1.92        1.81
Others           1.36        2.47
Total          100         100
(Hint: Use percentage bar diagram)
7. Represent the following data by subdivided bars drawn on percentage basis.
Particulars 1974 1975 1976
(Rs.) (Rs.) (Rs.)
Materials 48 68 99
Wages 36 52 63
Other costs 24 30 45
Total cost 108 150 207
Profit / loss 12 Nil -27
Sale price 120 150 180

8. Monthly expenditure of a family on various items is given in the following table. Draw a
pie diagram for the data.
Item Expenditure (Rs.)
Food 240
Clothing 160
House rent 120
Education 80
Fuel & Lighting 40
Miscellaneous 40
Total 680

5.6 Readings:
1. S.P.Gupta : Statistical Methods.
2. D.C.Sancheti & V.K.Kapoor : Statistics - Theory, Methods & Applications.
3. K.V.Rao: Research Methodology in Commerce and Management.

GUIDELINE-6 :
MEASURES OF CENTRAL TENDENCY

Structure :
6.0 Objectives
6.1 Introduction
6.2 Requisites of a Good Average
6.3 Types of Averages
6.4 Summary
6.5 Unit end Questions
6.6 Readings

6.0 Objectives
a) To present the significance of studying averages,
b) To enable the student to understand the concept of arithmetic mean,
c) To look into methods of obtaining median.
d) To present the concept and method of obtaining mode.

6.1 Introduction:
One of the most important objectives of statistical analysis is to get one single value that
describes the characteristic of the entire mass of unwieldy data. Such a value is called the central
value or an average. The word average is very commonly used in day to day conversation. For
example, we often talk of the average boy in a class, the average height or life of an Indian, average
income etc. When we say ‘he is an average student’, what it means is that he is neither very good
nor very bad, just a mediocre type of student. However, in Statistics, the term average has a
different meaning.
The word ‘average’ has been defined differently by various authors. “An average value
is a single value within the range of the data that is used to represent all of the values in the series.
Since an average is somewhere within the range of the data, it is also called a measure of central
value.” - Croxton & Cowden.

6.2 Requisites of a Good Average:

Since an average is a single value representing a group of values, it is desired that such a
value satisfies the following properties.

i) Easy to understand: Since statistical methods are designed to simplify complexity, it is
desirable that an average be such that it can readily be understood; otherwise, its use is
bound to be very limited.

ii) Simple to compute: An average should not only be easy to understand but also simple to
compute so that it can be used widely. However, though ease of computation is desirable,
it should not be sought at the expense of other advantages.

iii) Based on all items: The average should depend upon each and every item of the series so
that if any of the items is dropped the average itself is altered.

iv) Not be unduly affected by extreme values: Although each and every item should influence
the value of the average, none of the items should influence it unduly. If one or two very
small or very large items unduly affect the average, i.e., either increase its value or reduce
its value, the average cannot be really typical of the entire series. In other words, extremes
may distort the average and reduce its usefulness.

v) Rigidly defined: An average should be properly defined so that it has one and only one
interpretation. It should preferably be defined by an algebraic formula so that if different
people compute the average from the same figures, they will get the same answer. The
average should not depend upon the personal prejudice and bias of the investigator;
otherwise the results can be misleading.

vi) Capable of further algebraic treatment: We should prefer to have an average that could
be used for further statistical computations so that its utility is enhanced. For example, if
we are given the data about the average income and number of employees of two or
more factories, we should be able to compute the combined average.

vii) Sampling stability: We should prefer to get a value which has sampling stability. This
means that if we pick 10 different groups of college students, and compute the average
of each group, we should expect to get approximately the same value. It does not mean,
however, that there can be no difference in the values of different samples. There may be
some difference, but those samples in which this difference, called sampling fluctuation,
is less are considered better than those in which this difference is more.
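
The combined average mentioned under further algebraic treatment can be sketched as follows. This is a minimal Python illustration: the factory figures are hypothetical and the function name is made up for the example. Each group's average is weighted by the number of items in that group.

```python
def combined_mean(groups):
    """Weighted mean of group averages; `groups` holds (count, mean) pairs."""
    total_count = sum(count for count, _ in groups)
    if total_count == 0:
        raise ValueError("at least one group must be non-empty")
    weighted_sum = sum(count * mean for count, mean in groups)
    return weighted_sum / total_count


# Hypothetical figures: (number of employees, average income) per factory.
factories = [(40, 500.0), (60, 600.0)]
print(combined_mean(factories))  # (40*500 + 60*600) / 100 = 560.0
```

Note that the combined average (560) is not the simple average of 500 and 600, because the second factory has more employees.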

6.3 Types of Averages :

The very commonly used averages or measures of central tendency are arithmetic mean,
median, mode, geometric mean and harmonic mean. These measures are discussed in detail in
the following sections.

6.3.1 Arithmetic Mean:

The most popular and widely used measure for representing the entire data by one value
is what is called the ‘average’ in common usage. Statisticians call it the arithmetic mean. Its value
is obtained by adding together all the items and dividing this total by the number of items.
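
The definition translates directly into a calculation. The following Python sketch uses hypothetical income figures, and the function name is illustrative. The second call shows how a single extreme item pulls the mean upward, since every item enters the total.

```python
def arithmetic_mean(items):
    """Sum of the items divided by the number of items."""
    if not items:
        raise ValueError("the mean of an empty series is undefined")
    return sum(items) / len(items)


# Hypothetical monthly incomes of five workers (in rupees).
incomes = [400, 450, 500, 550, 600]
print(arithmetic_mean(incomes))  # 2500 / 5 = 500.0

# Replacing the last item by an extreme value changes the mean sharply.
incomes_with_extreme = [400, 450, 500, 550, 6000]
print(arithmetic_mean(incomes_with_extreme))  # 7900 / 5 = 1580.0
```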

A. Merits of Arithmetic Mean:

Arithmetic mean is most widely used in practice because of the following reasons:

1. It is the simplest average to understand and the easiest to compute. Neither the
arraying of data required for calculating the median nor the grouping of data
required for calculating the mode is needed while calculating the mean.

2. It is affected by the value of every item in the series.

3. It is defined by a rigid mathematical formula, with the result that everyone who
computes the average gets the same answer.

4. Being determined by a rigid formula, it lends itself to subsequent algebraic
treatment better than the median or mode.
