Stat 1
Structure
1.0 Objectives
1.1 Introduction
1.2 Origin and Growth of Statistics
1.3 Definitions of Statistics
1.4 Scope of Statistics
1.5 Limitations of Statistics
1.6 Summary
1.7 Glossary
1.8 Unit End Questions
1.9 Readings
1.0 Objectives
a) To present importance, origin and growth of statistics,
b) To understand various definitions of statistics,
c) To acquaint the student with the scope of statistics,
d) To present the limitations of statistics.
1.1 Introduction
The word ‘Statistics’ conveys a variety of meanings to people. To some, Statistics is an
imposing form of mathematics, whereas to some others, it suggests tables, charts and figures.
Each day we are exposed to a wide assortment of numerical information, which often has a
profound impact on our lives. For example, we come across statements like, “there are 932
females per 1,000 males in India, whereas in the U.S.S.R. there are 1,170 females per 1,000
males.” Numbers play an essential role in statistics. They provide the raw material of statistics.
These materials must be processed to be useful, just as crude oil must be refined into petrol
before it can be used by an automobile engine. The study of Statistics involves methods of refining
numerical and non-numerical information into useful forms. Whenever numbers are collected
and compiled, regardless of what they represent, they become statistics. In other words, the
term statistics is considered synonymous with numbers, figures or data.
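The sex ratio quoted above is itself a refined figure: raw head counts are converted into a rate per 1,000 males so that countries of different sizes can be compared. The following is a minimal Python sketch of that arithmetic. The function name and the head counts are hypothetical, chosen only so that the result matches the ratio of 932 quoted for India; they are not census figures.

```python
def females_per_thousand_males(females, males):
    """Refine raw head counts into a comparable rate per 1,000 males."""
    if males <= 0:
        raise ValueError("number of males must be positive")
    return round(females / males * 1000)

# Hypothetical raw counts for one area (illustrative only, not real data)
ratio = females_per_thousand_males(females=466_000, males=500_000)
print(ratio)
```

Expressing the result per 1,000 males, rather than as two raw totals, is what makes the figures for two countries directly comparable.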
Statistical methods and principles have found applications in many fields: business, social
sciences, engineering, and the natural and physical sciences. In fact, the modern age is the age of Statistics.
Almost every aspect of human and natural phenomena and other activity is now subjected to
measurement and interpretation in terms of statistics. Statistical methods are powerful and important
tools with which we can interpret the complex world of today. Statistics provides a means for
presenting information in meaningful and understandable form. It is a vehicle for transmitting and
storing information. Statistics serves as the informational segment of the decision-making process.
Management at all levels is guided by facts obtained through analysis of records rather than by
knowledge obtained merely through personal observation and experience.
It may be interesting to point out that Statistics is not a new discipline but as old as the
human society itself. It has been used right from the existence of life on earth, though its use was
very much limited.
In the good old days Statistics was regarded as the ‘science of state craft’ and was the
by-product of the administrative activity of the State. It has been the traditional function of the
governments to keep records of population, births, deaths, taxes, crop yields and many other
types of activities. Counting and measuring these events may generate many kinds of numerical
data.
Censuses of population and wealth were taken even in ancient times. According to a
Greek historian, in 1400 B.C., a census of all lands in Egypt was taken. Similar reports on the
ancient Chinese, Greeks and Romans are also available. People and land were the earliest objects
of statistical enquiry.
The word ‘Statistics’ comes from the Italian word ‘statista’ or the German word ‘statistik’,
each of which means a Political State. Professor Gottfried Achenwall first used it in 1749 to
refer to the subject matter as a whole. Achenwall defined Statistics as “the political state of the
several countries”.
In the present century, considerable development has taken place in the field of business
and commerce, governmental activities and science. Statistics helps in formulating suitable policies,
and as such its need is increasingly felt in all these spheres. Taking the case of business, not only
has the magnitude of business considerably increased, but its growing size has also made
its problems more complex.
The time and cost of collecting data are very important limiting factors in the use of
Statistics. However, with the development of electronic machines, such as calculators, computers
etc, the cost of analyzing data has considerably gone down. This has led to increasing use of
Statistics in solving various problems. Moreover, with the development of statistical theory the
cost of collecting and processing data has gone down.
1.3.1 Statistical Data:
Webster defined statistics as “the classified facts representing the conditions of the
people in a state, especially those facts which can be stated in numbers or in
tables of numbers in any tabular or classified arrangement”.
The above definition is too narrow as it confines the scope of Statistics to only such facts
and figures, which relate to the conditions of people in a State.
Yule and Kendall defined Statistics as follows: “By Statistics we mean quantitative data affected
to a marked extent by multiplicity of causes”.
This definition is less comprehensive than the one given by Horace Secrist, who defined
Statistics as follows.
This definition clearly points out certain characteristics, which numerical data must possess
in order that they may be called statistics. These are as follows:
Single and isolated figures are not statistics for the simple reason that such figures are
unrelated and cannot be compared. To illustrate, if it is stated that the income of Mr. X is Rs.
3,00,000 per annum, this would not constitute statistics although it is a numerical statement of
fact.
Generally speaking, facts and figures are affected to a considerable extent by a number
of forces operating together. For example, statistics of production of paddy are affected by
rainfall, quality of soil, seeds and manure, method of cultivation etc. It is very difficult to study
separately the effect of each of these forces on the production of paddy. The same is true of
statistics of prices, imports, exports, sales, profits etc.
All statistics are numerical statements of facts, i.e., expressed in numbers. Qualitative
statements such as ‘the population of India is rapidly increasing’, or ‘the production of wheat is
not sufficient’, or ‘India is a poor country’ do not constitute statistics.
Facts and figures about any phenomenon can be derived in two ways, viz., by actual
counting and measurement or by estimate. Estimates cannot be as precise and accurate as actual
counts or measurements. The degree of accuracy desired largely depends upon the nature and
object of the enquiry. For example, in measuring the height of persons even 1/10th of a cm is
material, whereas in measuring the distance between two places even a fraction of a kilometer can be
ignored.
Before collecting statistics, a suitable plan of data collection should be prepared and the
work carried out in a systematic manner. Data collected in a haphazard manner would very likely
lead to fallacious conclusions.
The purpose of collecting data must be decided in advance. The purpose should be
specific and well defined. A general statement of purpose is not enough. For example, if the
objective is to collect data on prices, it would not serve any useful purpose unless one knows
whether he wants to collect data on wholesale or retail prices and what are the relevant
commodities in view.
If numerical facts are to be called statistics, they should be comparable. Statistical data
are often compared period wise or region wise. For instance, the population of India at a particular
point of time may be compared with that of earlier years or with the population of other countries.
Valid comparisons can be made only if the data are homogeneous, i.e., relate to the same
phenomenon or subject.
The large volume of numerical information gives rise to the need for systematic methods,
which can be used to organize, present, analyze and interpret the information effectively. Statistical
methods are primarily developed to meet this need. Different writers too have defined the term
Statistics in this sense differently. A few definitions are examined below.
Prof A.L.Bowley has given some definitions. At one place he says, “Statistics may be
called the science of counting.” This definition is too narrow because it covers only one aspect of
the science, namely, the collection of data. Other aspects like analysis, presentation, interpretation
etc are completely ignored.
At another place Bowley says, “Statistics may rightly be called the science of averages.”
This definition also is not satisfactory because averages are only one of the devices used in
statistical analysis. The other devices like dispersion, skewness, correlation etc., are not at all
covered by this definition.
Boddington defines Statistics as “the science of estimates and probabilities”. This definition
is also incomplete because estimates and probabilities are only a part of statistical methods.
According to Berenson and Levine, “The science of statistics can be viewed as the
application of the scientific method in the analysis of numerical data for the purpose of making
rational decisions”.
Croxton and Cowden have given a very simple and concise definition of Statistics. In
their view, “Statistics may be defined as the collection, presentation, analysis and interpretation
of numerical data”. This definition clearly points out four stages in a statistical investigation, viz.,
collection, presentation, analysis and interpretation of data.
However, to the above stages one more stage may be added, and that is the organization
of data. Thus, Statistics may be defined as the science of collection, organization, presentation,
analysis and interpretation of numerical data.
According to the above definition, there are five stages in a statistical investigation.
i) Collection of data:
Collection of data constitutes the first step in statistical investigations. Utmost care must
be exercised in collecting data because they form the foundation of statistical analysis. If data are
faulty, the conclusions drawn can never be reliable. The data may be available from existing
published or unpublished sources or else may be collected by the investigator himself.
Data collected from published sources are generally in organized form. However, a
large mass of figures that are collected from a survey frequently needs organization. The first step
in organizing a group of data is editing. The collected data must be edited very carefully so that
the omissions, inconsistencies, irrelevant answers and wrong computations in the returns from a
survey may be corrected or adjusted. After the data have been edited, the next step is to classify
them. The purpose of classification is to arrange the data according to some common characteristics
possessed by the items constituting the data. The last step is tabulation. The purpose of tabulation
is to arrange the data in columns and rows so that there is absolute clarity in the data presented.
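The three steps of organization described above (editing, classification and tabulation) can be sketched in a few lines of Python. This is only an illustrative sketch: the survey returns, field names and function names are hypothetical, and faulty returns are simply dropped here, whereas in practice they may be corrected or adjusted as the text notes.

```python
# Hypothetical survey returns: (respondent id, occupation, monthly income in Rs.)
raw_returns = [
    (1, "farmer", 12000),
    (2, "teacher", 30000),
    (3, "farmer", None),     # omission: income not reported
    (4, "trader", -5000),    # inconsistency: negative income
    (5, "teacher", 28000),
    (6, "trader", 22000),
]

def edit(returns):
    """Editing: keep only returns whose income is present and non-negative."""
    return [r for r in returns if r[2] is not None and r[2] >= 0]

def classify(returns):
    """Classification: group incomes by a common characteristic (occupation)."""
    groups = {}
    for _, occupation, income in returns:
        groups.setdefault(occupation, []).append(income)
    return groups

def tabulate(groups):
    """Tabulation: arrange the classified data in rows and columns."""
    lines = [f"{'Occupation':<12}{'Count':>6}{'Total income':>14}"]
    for occupation in sorted(groups):
        incomes = groups[occupation]
        lines.append(f"{occupation:<12}{len(incomes):>6}{sum(incomes):>14}")
    return "\n".join(lines)

edited = edit(raw_returns)
groups = classify(edited)
table = tabulate(groups)
print(table)
```

Keeping the three steps as separate functions mirrors the order in which they are carried out: the table is built only from data that have already been edited and classified.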
After the data have been collected and organized, they are ready for presentation. Data
presented in an orderly manner facilitate statistical analysis. There are two different methods by
which the collected data may be presented, viz., diagrams and graphs.
After collection, organization and presentation, the next step is that of analysis. The
purpose of analyzing data is to dig out information useful for decision-making. Methods used in
analyzing the presented data are numerous, ranging from simple observation of the data to
complicated, sophisticated and highly mathematical techniques such as measures of central tendency,
measures of dispersion, correlation, regression etc.
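To make the techniques named above concrete, the following minimal Python sketch computes one measure of each kind on a small data set: the mean (central tendency), the standard deviation (dispersion), the correlation coefficient and a least-squares regression line. The rainfall and paddy yield figures are hypothetical, and the helper function and variable names are assumptions made for illustration only.

```python
import statistics

# Hypothetical paired observations (illustrative only, not real data)
rainfall = [80, 95, 100, 110, 115]      # cm per season
paddy_yield = [20, 24, 25, 28, 28]      # quintals per hectare

def covariance(xs, ys):
    """Population covariance of two equally long series."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

# Measure of central tendency
mean_yield = statistics.mean(paddy_yield)

# Measure of dispersion (population standard deviation)
sd_yield = statistics.pstdev(paddy_yield)

# Correlation coefficient between rainfall and yield
cov_xy = covariance(rainfall, paddy_yield)
r = cov_xy / (statistics.pstdev(rainfall) * sd_yield)

# Regression of yield on rainfall: yield = intercept + slope * rainfall
slope = cov_xy / statistics.pvariance(rainfall)
intercept = mean_yield - slope * statistics.mean(rainfall)

print(mean_yield, sd_yield, r, slope, intercept)
```

The sketch uses population formulas throughout (dividing by the number of observations) so that the correlation and regression results are mutually consistent; sample formulas would divide by one less.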
v) Interpretation of data:
The last stage in statistical investigation is interpretation, i.e., drawing conclusions from
the data collected and analyzed. The interpretation of data is a difficult task and necessitates a
high degree of skill and experience. Correct interpretation leads to valid conclusions from the
study, which can aid one in taking suitable decisions.
It may be noted that over the past few decades the primary emphasis on Statistics has
been to develop procedures that can be used to deal with uncertainty. This modern conception
of the subject is a far cry from the one usually held by laymen. Indeed, even the pioneers in
statistical research have adopted it only within the past three decades.
The scope of Statistics is so vast and ever expanding that it is not only difficult to define
it but also unwise to do so. Statistics pervades all subject matter; its use has permeated almost
every facet of our lives. It is a tool of all sciences, indispensable to research and intelligent judgment,
and has become a recognized discipline in its own right. There is hardly any field whether it is
trade, industry or commerce, economics, biology, astronomy, physics, chemistry, education,
medicine, sociology, or meteorology where statistical tools are not applicable.
Since ancient times the ruling kings and chiefs have relied heavily on Statistics in framing
suitable military and fiscal policies. Most of the statistics such as that of crimes, military strength,
population, and taxes etc, which were collected by them, were a by-product of administrative
activity. In recent years the functions of the State have increased tremendously. The concept of a
State has changed from that of simply maintaining law and order to that of a welfare state.
Statistical data and statistical methods are of great help in promoting human welfare.
The problems of the business enterprises are becoming complex due to the growing size
and competition. They are using more and more statistics in decision making. However, the
employment of statistical methods in the solution of business problems belongs almost exclusively
to the 20th century. In earlier days when business firms were small, owners of the firms were
directly engaged in almost all the areas of business activity. With the growth in the size of business
firms it has often become impossible for the owners to maintain personal contact with the thousands
and lakhs of customers. Management has become a specialized job and a manager is called
upon to plan, organize, supervise and control the operations of the business house. Since very
little personal contact is possible with customers these days, a modern business firm faces a
much greater degree of uncertainty concerning future operations. A businessman who has to deal
in an atmosphere of uncertainty can no longer adopt the method of trial and error in taking
decisions. The businessman has to apply statistical methods systematically in order to deal with
the uncertainty. In recent years it has become increasingly evident that Statistics and statistical
methods have provided the businessman with one of his most valuable tools for decision-making.
Business activities can broadly be grouped under the following heads, viz., Production, Sales,
Purchase, Finance, Personnel, Accounting, Marketing and Product research and Quality control.
Statistical data and statistical methods are of immense help in the proper understanding
of the economic problems and in the formulation of economic policies. In fact, these are the tools
and appliances of an economist’s laboratory. Statistics of production help in adjusting supply
to demand. Statistics of consumption enable us to find out the way in which people of different
strata of society spend their income. Such Statistics are very helpful in knowing the standards of
living and taxable capacity of the people.
Statistical methods help not only in formulating appropriate economic policies but also
in evaluating their effect. For example, in order to check the ever-growing population, if emphasis
has been placed on the family planning methods, one can ascertain statistically the efficacy of
such methods in attaining the desired goal. In recent years, econometrics, which comprises the
application of statistical methods to the theoretical economic methods, is widely used in economic
research. Statistical methods of sampling are useful for collecting the basic data of economic
studies. Statistical methodology also indicates the reliability of the data and the significance to be
attached to them.
The physical sciences, especially astronomy, geology and physics, were among the fields
in which statistical methods were first developed and applied. But, until recently these sciences
have not shared the 20th century developments of Statistics to the same extent as the biological
and social sciences. Currently, however, the physical sciences seem to be making increasing use
of Statistics, especially in astronomy, chemistry, engineering, geology, meteorology and certain
branches of physics.
Statistical techniques have proved to be extremely useful in the study of all natural sciences
like astronomy, biology, medicine, zoology, botany, etc. For example, in diagnosing a disease
correctly, the doctor has to rely heavily on actual data like temperature of the body, pulse rate,
blood pressure. Similarly, in judging the efficacy of a particular drug for curing a certain disease,
experiments have to be conducted and the success or failure would depend upon the number of
people who are cured after using the drug.
1.4.6 Statistics and Research:
The significance of Statistics in some important fields has been discussed above. Besides
these, Statistics is useful to bankers, brokers, insurance companies, social workers, labor unions,
and trade associations, chambers of commerce and to the politicians. For example, the banks
have to make a very careful study of the cash requirements, otherwise, they may find they are
short of cash and their existence is at stake. Similarly, the premium rates of the life insurance
companies are based upon very careful study of the expectation of life.
In spite of so many merits, Statistics is not completely free from certain limitations. Unless
the data are properly collected and critically interpreted, there is every possibility of drawing
wrong conclusions. Therefore, it is also necessary to know the limitations and the possible misuses
of Statistics. The following are the important limitations of Statistics:
Since statistics deals with aggregates of facts, the study of individual measurements lies
outside the scope of Statistics. Data are statistical when they relate to measurement of masses,
not statistical where they relate to an individual item or event as a separate entity. For example, the
wage earned by an individual worker at any one time, taken by itself, is not a statistical datum. But
the wages of workers of a factory can be used statistically.
c) Statistical results are true only on an average:
The conclusions obtained statistically are not universally true. They are true only under certain
conditions. This is because Statistics as a science is less exact as compared to natural sciences.
Statistical tools do not provide the best solution under all circumstances. Very often, it is
necessary to consider a problem in the light of a country’s culture, religion and philosophy.
The greatest limitation of Statistics is that it is liable to be misused. The misuse of Statistics may
arise because of several reasons. For example, if statistical conclusions are based on incomplete
information, one may arrive at wrong conclusions. Statistics are like clay and they can be moulded
in any manner so as to establish right or wrong conclusions. The very fact that it may lead to
wrong conclusions in the hands of inexperienced people limits the possibility of mass popularity
of such a useful science.
1.6 Summary:
The term Statistics is used both in singular and plural sense. In plural sense it represents
figures or data. In singular sense it represents statistical methods. The word Statistics comes
from the Italian word ‘statista’ or the German word ‘statistik’, which means a Political State.
Although Statistics originated as a science of kings, there has been a phenomenal development in
the use of Statistics in several fields. Statistics is now regarded as one of the most important tools
for taking decisions under uncertainty.
Different writers have defined Statistics both in terms of data and methods. Statistical
methods are being applied in almost every field: government policies, business, economics, physical
sciences, natural sciences, research and many other fields. As in the case of other branches of
knowledge, Statistics also has certain limitations. A proper understanding of the subject is
needed in order to use it to full advantage.
1.7 Glossary:
1. Data: Data refers to any group of measurements that happens to interest us. These
measurements provide information the decision maker uses.
2. Statistics: Statistics is the use of data to help the decision maker reach better decisions.
4. Quantitative Data: Data that possess numerical properties are known as quantitative
data.
5. Variable: A variable is a characteristic that may take on different values at different times,
places or situations.
1.9 Readings:
2. B.N.Gupta: Statistics.
***************
GUIDELINE -2:
SOURCES OF DATA AND METHODS OF DATA
COLLECTION
Structure :
2.0 Objectives
2.1 Introduction
2.2 Primary Sources of Data
2.3 Methods of Collecting Primary Data
2.4 Secondary Sources of Data
2.5 Classification of Sources of Secondary Data
2.6 Choice between Primary and Secondary Data.
2.7 Summary
2.8 Glossary
2.9 Unit End Questions
2.10 Readings
2.0 Objectives:
a) To present the sources of data
b) To distinguish between primary and secondary data
c) To highlight the methods of collecting primary data
d) To identify methods of collecting secondary data
2.1 Introduction
Collection of the required and relevant data is the first step in any statistical investigation.
Utmost care must be exercised while collecting data because data constitute the foundation on
which the superstructure of statistical analysis is built. The results obtained from the analysis are
properly interpreted and policy decisions are taken. Hence, if the data are inaccurate and
inadequate the whole analysis may be faulty and the decisions taken misleading.
Data may be collected either from the primary source or the secondary source. A primary
source is one that itself collects the data. A secondary source is one that makes available data,
which were collected by some other agency. For example, the data collected by the Ministry of
Industries and made available through various publications constitute a primary source. However, if
the Ministry of Industries uses data collected by some other organisation, say, National Sample
Survey Organisation, this will constitute a secondary source for the Ministry.
Primary data are information collected or generated by the researcher for the purposes
of the project. Primary data are obtained by a study specifically designed to fulfill the data needs
of the problem at hand. For example, an investigator wants to know about the level of job
satisfaction enjoyed by the teachers in a University. He can prepare a schedule and meet a sample
of teachers and ask for their opinions. This is going to be the information collected for
the purpose of this study and hence becomes primary in character. When the data are collected
for the first time, the responsibility for the processing of data also rests with the original investigator.
Ordinarily, experiments and surveys constitute the principal sources of primary data. In order to
understand the nature of primary sources of data better, we have to consider the advantages and
disadvantages of primary data.
The following are some of the important advantages of this type of data.
i) Primary data are the first-hand account of the situation. We can observe the phenomenon
as it is taking place.
ii) There is greater scope for reliability of the information. As the investigator collects the
data for himself he can take all precautions to ensure the reliability of data.
iii) Primary data are the logical starting point for research in several disciplines. Unless
someone gathers and accumulates fact or information, there is no body of knowledge.
iv) For the purpose of knowing opinions, personal qualities, etc., primary data are the only
source.
2.2.2 Disadvantages of Primary Data:
i) Collection of primary data is expensive in terms of both time and money. To accumulate
the needed data, we may sometimes have to spend years. It is for this reason that
individual researchers try to limit their scope to a manageable level, unlike the studies
undertaken by research organisations.
ii) There is greater scope for bias of the researcher. Unless the research investigator is fair
to the respondents and methods of data collection, the results of the study will not be
reliable.
iii) Sample selection is yet another problem in the collection of primary data. If the conclusions
of the study are to be meaningful, the researcher must select a representative sample.
But the selection of such representative sample is not an easy task.
iv) The limitations of methods of collecting primary data turn out to be disadvantageous for
this source. For example, limitations of observation technique like non-cooperation of
respondents, non-observability of the situation, low reliability of conclusions etc, become
the disadvantages of the primary source of data.
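Point iii) above notes that selecting a representative sample is not easy. One common approach, which the text does not itself prescribe, is simple random sampling, where every unit in the sampling frame has the same chance of selection. The following minimal Python sketch assumes a hypothetical frame of 200 teacher IDs; the function name, the ID format and the seed are illustrative assumptions.

```python
import random

# Hypothetical sampling frame: IDs of 200 teachers in a University
population = [f"T{n:03d}" for n in range(1, 201)]

def simple_random_sample(frame, size, seed=None):
    """Draw `size` distinct units from the frame, each equally likely."""
    rng = random.Random(seed)   # a fixed seed makes the draw repeatable
    return rng.sample(frame, size)

sample = simple_random_sample(population, 20, seed=42)
print(sample)
```

Random selection removes the investigator's personal choice from the process, which addresses the bias mentioned in point ii), although it does not by itself guarantee that a small sample mirrors the population in every respect.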
The primary data are the information generated to meet the specific requirements of the
investigation at hand. As such, the investigator is required to collect data separately for the study
taken up by him. The primary data are collected by different methods such as, direct personal
interviews, indirect oral interviews, information from correspondents, mailed questionnaires,
schedules and observation methods. The methods are discussed in detail below.
Personal interview is one of the techniques of data collection through primary sources. It
is a verbal method of securing data in the field surveys. Information is obtained by conversing
with the respondents. Though talking has importance in the conduct of interview, it is not a simple
two way conversation between the researcher and the respondent. Gestures, glances, facial
expressions and pauses often reveal the subtle feelings of the respondents.
Interview is the most direct means of conducting enquiry into any research problem.
Since many of the social science research problems involve the personal contact of the respondents,
interview is, probably, the only method through which the researcher would be able to establish
such direct contact.
Under this method of collecting data, the investigator contacts third parties called witnesses
capable of supplying the necessary information. The method is generally adopted in those cases
where the information to be obtained is of a complex nature and the informants are not inclined to
respond if approached directly. For example, in an enquiry regarding addiction to drugs, alcohol,
etc., people may be reluctant to supply information about their own habits. It would be necessary
in that case to get the desired information from those dealing in drugs, liquor or other people who
may be knowing them, for example, their neighbours, friends, etc. Enquiry Committees and
Commissions appointed by the Government generally adopt this method to get people’s views
and all possible details of facts relating to the enquiry.
Under this method, the investigator appoints local agents or correspondents in different
places to collect information. These correspondents collect and transmit the information to the
central office where the data are processed. Newspaper agencies generally adopt this method.
Correspondents in different places supply information relating to such events as accidents, riots,
strikes, etc., to the head office. This method is also adopted by various departments of Government
in such cases where regular information is to be collected from a wide area. This method is
particularly suitable in case of crop estimates. The special advantage of this method is that it is
cheap and appropriate for extensive investigation.
A questionnaire is a tool or device for securing answers to the set of questions by the
respondent who fills in the form of questionnaire himself. It is a systematic compilation of questions
that are submitted to a sampling of population from which information is desired. The questions
are normally arranged in a sequence depending on the nature of the study and are administered
for reply. Usually the questionnaires are posted to the respondents and hence they are called
mailed questionnaires.
post it to the address given within a specified time period. This method of data collection is
preferred when the investigator cannot reach the respondents in person either because of the
cost involved or there is no particular reason to see them personally.
The questionnaire is one of the more popular methods of data collection used in social science research.
It is a cheaper method of collecting data, as the investigator is not required to approach the
respondents personally. He can simply post the questionnaire and appeal to the respondents to
return it in time. The questionnaire helps in collecting areas of data and not very extensive bodies
of data. It is effective because the respondents are able to express their reactions clearly with
greater openness as there is less fear when there is no immediate listener.
A schedule is a list of questions, which helps to collect data from the field. This is generally
filled in by the enumerator or the researcher. He sits with the respondent face to face and fills up
the data sheet by asking him the questions. According to Goode and Hatt, schedule is the name
usually applied to a set of questions which are asked and filled by the interviewer, in a face to face
situation with another. One can get accurate and first-hand information through this method. As
the researcher meets the respondents in person, he can talk to them, explain to them the utility of
the study and convince them to cooperate with him, in making the study meaningful.
Data collection under this method proceeds in a systematic manner. The investigators or
enumerators proceed to the field with the schedules and administer them to the sample selected
by them. They go on asking the questions incorporated in the schedule and note down the responses
of the respondents. If there is any difficulty, the enumerators are supposed to assist respondents
for overcoming the difficulty. As such, the quality of the data depends on the people who go to
the field and collect the data. Normally in case of individual researchers they themselves meet
every respondent and collect the data.
2.3.6. Observation:
Observation, in simple terms, is defined as watching the things with some purpose in
view. According to Goode and Hatt, science begins with observation and must ultimately return
to observation for its final validation. In the words of P. V. Young, it is a systematic viewing coupled
with consideration of the seen phenomenon. In yet another way, observation is defined as the
process of recognizing and noting people, objects and occurrences rather than asking for
information.
Observation is one of the cheaper and more effective techniques of data collection. For
example, instead of asking consumers what brands they buy or what television programmes they
view, a better alternative may be to simply observe what products are bought and what programmes
are viewed. This approach to the collection of information is as old as the human race. In the
fields of commerce and economics, observation of the prices, markets and capital flows is
a common activity, which serves as a good example for probing into the behaviour of the
phenomena. Observation is considered to be a handy tool in marketing research.
Secondary data refer to information that has been collected by someone other than
the researcher for purposes other than those involved in the research project at hand. Books,
journals, manuscripts, diaries, letters etc., all become secondary sources of data as they are
written or compiled for a separate purpose. The researcher depending on his necessity and
relevance may use the data, findings or results incorporated in these documents.
There are various factors, such as the nature of the study, status of the investigator,
availability of financial resources, time and degree of accuracy of the results desired, that decide
the choice of the sources of data.
When compared to primary data, secondary data have the following advantages.
i) Economy is clearly the greatest advantage of secondary data. Instead of printing data
collection forms, hiring field workers, sending them to places throughout the field
area, and tabulating and analysing the data, we can get ready results from the secondary data
compiled by somebody else.
ii) Besides economy, quickness in obtaining data is another factor associated with secondary data.
For example, suppose we require the factual position regarding the reasons for absenteeism in
Indian industry. If we start collecting data on this, it may take one to two years.
But the same can be obtained with the help of secondary data in a few days.
iii) Another great advantage of secondary data is that they provide information that may
not be secured by the individual researcher. For example, a research organisation like the
National Institute of Public Finance and Policy is studying the problems of tax evasion in
India and the measures to curtail the same. Probably the organisation, with its credibility
for conducting such studies, may be able to gather information from the Ministry of Finance
and the Central Board of Direct Taxes, which would be difficult for an individual
researcher to obtain.
some other purpose. For example, Central Statistical Organisation is collecting data on savings,
consumption, investment etc. These can be used for several purposes. It all becomes internal to
the organisation. Similarly, in case of a company, information is compiled on several items like
sales, cost, assets, liabilities, profits, production etc. The data, usually compiled for record purposes,
may be used in several studies undertaken by the company itself or by an outside research organisation.
2.5.2 External Sources:
Much of the secondary data is external in nature. All that is available with outside
organisations falls into this category. By all means, the sources of external information are quite
numerous and vast. These external sources again can be divided into two, namely personal and
public sources.
A. Personal Sources:
This is the information compiled by individuals. An individual may record his views and
thoughts about himself, others, society, etc., for his own sake or as a memory. Whatever be the
reason for recording one’s own thoughts, several people do carry out this activity. These sources
would be helpful to a researcher who is probing into personalities, history, events etc.
Usually people who have some stature and have dedicated their lives to a particular cause
think it useful to record their life and experiences for posterity. These autobiographies, or life
histories written by others, serve as useful documents to highlight a particular point. For example,
the autobiographies of freedom fighters like Mahatma Gandhi, Netaji, Jawaharlal Nehru, etc.,
will serve as better sources.
ii) Diaries:
Many have the habit of recording their daily events in the form of a book. They write not
only about their own activities, but also their viewpoint on issues, the reactions of people and many
other things. Diaries of important people are also published. These serve as the most important
source for knowing the life history of a person and the contemporary society if they have been
written continuously over a long period.
iii) Letters:
Letters also occupy an important place among the personal sources of information. Many of us
know that freedom fighters like Gandhi, Nehru and Patel wrote several letters to their leaders and relatives
discussing various aspects. The letters of Radhakrishnan are widely acknowledged in the discipline
of philosophy. These help researchers to understand the more intimate aspects of an event and to
clarify the stand taken by the writers regarding that aspect. Letters are helpful in giving an idea of the
attitudes of a person and the trend of his mind.
iv) Memoirs:
Some people write memoirs of their travel, important events of their life and other significant
phenomena that they come across. These memoirs provide useful material in the study of many
social phenomena. Memoirs are different from diaries, in the sense that they describe only some
events and are more elaborate than the diary.
B. Public Sources:
Much of the material for business research is obtained from these sources. These are
termed public for the reason that they deal with issues rather than the lives and histories of people.
These sources are again divided into two categories, viz., unpublished and published.
i) Unpublished Sources:
For several reasons, some materials, though of public interest, are not published for the
benefit of many. They would be available only at the place of their origin. Theses submitted to
universities by several researchers come into this category. Till it is published by the researcher himself or by
some outside agency, the copy of the thesis would be available only with the researcher and with the
library. Besides these, the proceedings of committees, minutes of meetings and notings on files are
all unpublished sources of information. Though unpublished, these sources have their own uses.
There are a variety of published sources from which one can get the required data. A
brief description of such published sources is given below.
a) Books:
Books are significant among the published sources of information. Ever since man secured
superiority over the other beings in nature, he has been in search of ways of codifying what he learnt and
practiced. A lot of information can also be obtained from edited books consisting of papers submitted
in seminars, conferences etc.
b) Journals:
Besides books, journals provide a lot of research material to an investigator. Journals are
a more rapid source of communication than books. They supply material of current interest
and help to arouse discussion on a subject. The general tendency in the publication of journals is
that they relate to a particular subject like commerce and management. On the other hand, they
may be devoted to the development of a particular aspect of the discipline like foreign trade,
banking, marketing, personnel etc.
c) Newspapers:
Useful data can be obtained from newspapers published daily or otherwise. The Economic
Times, Financial Express, Business Standard are the three important dailies that are regarded as
highly useful to the research students in commerce, economics and management. These three
newspapers carry articles, news, and other information on several matters connected with these
three disciplines mainly. Occasionally they also bring out special issues focusing on the specific
topics of interest. Besides, they have introduced weekly features to cover special areas like
management, taxation, finance, personnel, advertising, small industry and entrepreneurship.
In our country, every Government department attached to a Ministry brings out annual
and other periodical reports on the working of the department. These are later published as status
reports. For instance, the Departments of Agriculture, Industry, Education and Public Enterprises
bring out reports on the working of the establishments, institutions and undertakings functioning under
their jurisdiction. The Publications Division of the Government of India is always busy publishing
the same for the use of the departments, offices etc.
the subject probed into by them. Important among such bodies are the Office of the Comptroller
and Auditor General of India, Committee on Public Undertakings and Public Accounts Committee.
Besides, we have autonomous corporations created through a separate enactment like Reserve
Bank of India, Industrial Credit and Investment Corporation of India etc. Reserve Bank of India
and other special financial institutions are bringing out various publications which serve as useful
sources of data.
There are various non-profit organisations established for the purpose of promoting
academic pursuit like National Council for Applied Economic Research (NCAER), National
Institute for Public Finance and Policy (NIPFP), National Institute for Educational Planning and
Administration (NIEPA), National Institute of Bank Management (NIBM), National Institute of
Personnel Management (NIPM), Indian Institute of Foreign Trade (IIFT), Institute of Public
Enterprise (IPE). Other such organisations are National Institute of Rural Development
(NIRD) and National Institute for Small Industry Extension Training (NISIET). Besides, there are
four management institutes and over 150 university departments. These autonomous organisations
bring out from time to time various publications in the form of monographs, occasional papers,
research bulletins, abstracts etc. All these serve as a useful reference to the research students.
In any statistical investigation, the researcher has to select between primary and secondary
data very carefully. In most of the studies both the sources may be used. But the choice of the
data depends on the nature and purpose of the study. The following are said to govern the choice
of the sources of data.
This is the first and foremost factor that decides the type of data to be collected. For
example, in order to understand the perception and factors influencing entrepreneurship among a
particular community, one has to collect primary data. On the other hand, the trends in the capital
market activity can be adequately presented with the help of secondary data.
Sometimes, time and money also decide the nature and scope of the study to be conducted.
These two considerations have their own impact as to the choice of a particular source of data.
If one cannot spend much of his time and money, he prefers to avoid primary sources and
depend on the secondary sources.
If one wants to ensure a high degree of accuracy and unquestionable findings, he may have
to collect data from the primary sources. As indicated already, secondary data are those collected
by somebody for purposes other than those of the investigator. Such data may or may not be
suitable to the requirements of the present problem.
2.7 Summary:
There are two sources of data, viz., primary and secondary. If the investigator
collects data himself or with the assistance of somebody, they are called primary data. If the data
are already collected and made available by some other agency, they are called secondary
data. There are certain advantages and limitations in both primary and secondary data.
The primary data are collected by different methods such as, personal interviews, oral
interviews, information from correspondents, mailed questionnaires, schedules and observation
methods. Books, journals, news papers, government reports, publications of research organisations,
all become secondary sources of data. The choice between selection of primary and secondary
data depends on nature and scope of the study, availability of time and money, degree of accuracy
desired, and status of the investigator.
2.8 Glossary :
2. Secondary source: It is one that makes available data collected by some other agency.
4. Respondent: A person who fills the questionnaire or supplies the required information for
a schedule.
2.9. Unit end Questions:
1. What are primary and secondary sources of data? Explain the relative merits and limitations
of these two sources of data.
2. What is the importance of primary data? Explain different methods of collecting primary
data.
3. What is the importance of secondary data? How can you classify various sources of
secondary data?
2.10. Readings :
******************
GUIDELINE-3 :
CLASSIFICATION AND
TABULATION OF DATA
Structure
3.0 Objectives
3.1 Introduction
3.2 Classification
3.6 Summary
3.8 Readings
3.0 Objectives
b) To acquaint the student with preparation of discrete and continuous frequency distributions.
3.1 Introduction
The collected data are usually contained in schedules and questionnaires. But they are
not in an easily understandable form. The answers will require some analysis if their salient points
are to be brought out. As a rule, the first step in the analysis is to classify and tabulate the
information collected. In case published data have been collected, the investigator has to rearrange
these into new groups and tabulate the new arrangement. In case of some investigations, the
classification and tabulation may give such a clear picture of the significance of the material
arranged that no further analysis is required. Although the phrase “classification and tabulation”
has been used, classification is, in effect, only the first step in tabulation, for in general, items
having common characteristics must be brought together before the data can be displayed in
tabular form.
3.2 Classification :
Classification is the process of arranging data in groups or classes on the basis of common
characteristics. Data having a common characteristic are placed in one class, and in this way the
entire data get divided into a number of groups or classes. Classification of data is a function very
similar to that of sorting letters in a post office. It is well known that letters collected in a post
office are sorted into different lots on a geographical basis, i.e., in accordance with their
destinations such as Chennai, Kolkata, Mumbai, Delhi etc. They are then put in separate bags,
each containing letters with a common characteristic, viz., having the same destination. Classification
of statistical data is comparable to the sorting operation.
i) To condense the mass of data in such a manner that similarities and dissimilarities can be
readily apprehended.
iv) To give prominence to the important information gathered while dropping out the
unnecessary elements.
Broadly, the data can be classified on various bases, viz., geographical, chronological,
qualitative and quantitative.
3.3.1 Geographical Classification
In this type of classification, data are classified on the basis of geographical or locational
differences between various items like, states, regions, cities, zones, areas etc. For example,
procurement of rice in India may be presented state wise in the following manner.
State wise procurement of Rice in 2004-05
State wise Procurement of Rice
Year        Production (Million Kgs.)
1997-98     835.6
1998-99     855.2
1999-00     836.8
2000-01     848.4
2001-02     847.4
2002-03     846.0
2003-04     850.5
2004-05     830.7
Time series are usually listed in chronological order, normally starting with the earliest period.
When the major emphasis falls on the most recent events, a reverse time order may be used.
When only one attribute is studied, two classes are formed, one possessing the attribute
and the other not possessing the attribute. This type of classification is known as simple classification.
For example, the population under study may be divided into two categories as follows.
Population
Urban Rural
In a similar manner, we may classify the population on the basis of sex, i.e. into males and
females, or literacy, i.e., into literates and illiterates and so on. The type of classification where
only two classes are formed is also called two fold or simple classification.
If instead of forming only two classes we further divide the data on the basis of some
attribute or attributes so as to form several classes, the classification is known as manifold
classification. For example, we may first divide the population into males and females on the
basis of the attribute sex; each of these classes may be further subdivided into literates and
illiterates on the attribute literacy. Further classification can be made on the basis of some other
attribute, say, employment. An example of manifold classification is given below.
Population
Males Females
Literates Illiterates Literates Illiterates
Emp. Unemp. Emp. Unemp. Emp. Unemp. Emp. Unemp.
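The simple and manifold classifications described above can be sketched in a few lines of Python. This is only an illustrative sketch: the six records in `people` are hypothetical, and the helper name `classify` is an assumption introduced here, not part of the text.

```python
from collections import Counter

# Hypothetical records, each giving (sex, literacy, employment) for one person.
people = [
    ("Male", "Literate", "Employed"),
    ("Male", "Literate", "Unemployed"),
    ("Male", "Illiterate", "Employed"),
    ("Female", "Literate", "Employed"),
    ("Female", "Literate", "Employed"),
    ("Female", "Illiterate", "Unemployed"),
]

def classify(records, levels):
    """Count records by their first `levels` attributes.

    levels=1 gives a simple (two fold) classification by sex;
    levels=2 or 3 gives a manifold classification.
    """
    return Counter(record[:levels] for record in records)

by_sex = classify(people, 1)
by_sex_literacy = classify(people, 2)
by_all_three = classify(people, 3)

# Print the finest classification, one class per line.
for key, count in sorted(by_all_three.items()):
    print(" > ".join(key), count)
```

Each additional attribute splits every existing class further, which is what the tree under "Population" shows.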
Weight No. of Students
90-100 50
100-110 200
110-120 260
120-130 360
130-140 90
140-150 40
Total 1000
In this type of classification, there are two elements, namely (i) the variable, i.e., the
weight in the above example, and (ii) the frequency, i.e., the number of students in each class.
There were 50 students having weight ranging from 90 to 100 lb, 200 students having weight
ranging from 100 to 110 lb, and so on. Thus we can find out the ways in which the frequencies
are distributed.
A frequency distribution refers to data classified on the basis of some variable that can be
measured such as prices, wages, age, number of units produced or consumed. The term ‘variable’
refers to the characteristic that varies in amount or magnitude in a frequency distribution. A
variable may be either continuous or discrete. A continuous variable is capable of manifesting
every conceivable fractional value within the range of possibilities, such as the height or weight of
persons or the weight of a product. Thus, in a continuous variable, data are obtained by numerical
measurements rather than counting. For example, when a student grows, say, from 90 cm to
150 cm, his height passes through all values between these limits.
A discrete variable is one which can vary only by finite "jumps" and cannot manifest
every conceivable fractional value. For instance, the number of rooms in a house can only take
certain values such as 1, 2, 3 etc. Similarly, the number of machines in an establishment is a
discrete variable. Generally speaking, continuous data are obtained through measurements,
while discrete data are derived by counting. Series which can be described by a continuous
variable are called continuous series. Series represented by a discrete variable are called discrete
series. The following are two examples of discrete and continuous frequency distributions.
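a) Discrete frequency distribution
Value of variable    Frequency
0                    10
1                    40
2                    80
3                    100
4                    250
5                    150
6                    50
Total                680

b) Continuous frequency distribution
Class interval       Frequency
100-110              10
110-120              15
120-130              10
130-140              45
140-150              20
150-160              4
Total                134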
Although the theoretical distinction between continuous and discrete variation is clear
and precise, in practical statistical work it is only an approximation. The reason is that even the
most precise instruments of measurement can be used only to a finite number of places. Thus,
a theoretically continuous series can never be expected to flow continuously with one
measurement touching another without any break in actual observations.
The process of preparing this type of distribution is very simple. We have just to count
the number of times a particular value is repeated, which is called the frequency of that class. In
order to facilitate counting, prepare a column of tallies. In another column, place all possible
values of the variable from the lowest to the highest. Then put a bar (vertical line) opposite the
particular value to which it relates. To facilitate counting, blocks of five bars are prepared and
some space is left in between each block. We finally count the number of bars and get the frequency.
Illustration I: In a survey of 35 families in a village, the number of children per family was recorded
and the following data obtained.
1, 0, 2, 3, 4, 5, 6, 7, 2, 3, 4, 0, 2, 5, 8, 4, 5, 4, 6, 3, 2, 7, 6, 5, 3, 3, 7, 8, 9, 7, 9, 4, 5, 4, 3
Represent the data in the form of a discrete frequency distribution.
Solution:
No. of children   Tallies      Frequency
0                 ||           2
1                 |            1
2                 ||||         4
3                 ||||| |      6
4                 ||||| |      6
5                 |||||        5
6                 |||          3
7                 ||||         4
8                 ||           2
9                 ||           2
Total                          35
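The tallying procedure for this illustration can be checked with a short Python sketch. The variable and function names (`children`, `frequency`, `tally`) are assumptions introduced for illustration, and the tally bars are drawn as plain `|` characters in blocks of five.

```python
from collections import Counter

# Number of children per family for the 35 families in the illustration.
children = [1, 0, 2, 3, 4, 5, 6, 7, 2, 3, 4, 0, 2, 5, 8, 4, 5, 4, 6, 3,
            2, 7, 6, 5, 3, 3, 7, 8, 9, 7, 9, 4, 5, 4, 3]

# Count how many times each value occurs (its frequency).
frequency = Counter(children)

def tally(count):
    """Draw `count` bars in blocks of five, with a space between blocks."""
    blocks, rest = divmod(count, 5)
    return " ".join(["|||||"] * blocks + (["|" * rest] if rest else []))

# List every possible value from the lowest to the highest.
for value in range(min(children), max(children) + 1):
    print(f"{value:<6}{tally(frequency[value]):<14}{frequency[value]}")
print(f"{'Total':<20}{sum(frequency.values())}")
```

The frequencies obtained this way agree with the solution table, and their total equals the 35 families surveyed.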
This type of classification is most popular in practice. The following technical terms are
important when a continuous frequency distribution is formed, or data are classified according to
class intervals.
i) Class Limits
The class limits are the lowest and the highest values that can be included in the class. For
example, take the class 10-20. The lowest value of the class is 10 and the highest 20. Thus 10 is
called the lower limit and 20 is called the upper limit of that class. The way in which class limits are
stated depends upon the nature of the data.
The difference between the upper and lower limit of a class is known as the class interval of
that class. For example, in the class 100-200, the class interval is 100. An important decision
while constructing a frequency distribution is about the width of the class interval, i.e., whether it
should be 10, 20, 50, 100, 500 etc. The decision would depend upon a number of factors such
as the range in the data, i.e., the difference between the largest and smallest item, the number of
classes to be formed etc. A simple formula to obtain the estimate of the appropriate class interval, i.e.,
C, is
C = (L - S) / K
where L is the largest item, S is the smallest item and K is the number of classes.
The question now is how to fix the number of classes, i.e., K. The number can be either
fixed arbitrarily keeping in view the nature of the problem under study or it can be decided with
the help of Sturges' Rule. According to Sturges, the number of classes can be determined by the formula
K = 1 + 3.322 log N, where N = total number of observations.
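As a rough sketch, Sturges' Rule and the resulting class interval can be computed as follows. Three assumptions are made here and are not stated in the text: the logarithm is taken to base 10, K is rounded up to the next whole number, and the class interval is estimated as the range divided by K. The function names are hypothetical.

```python
import math

def sturges_classes(n):
    """Number of classes K = 1 + 3.322 log10(N), rounded up (an assumption)."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_interval(largest, smallest, k):
    """Estimated class width: range of the data divided by the number of classes."""
    return (largest - smallest) / k

# Example: 50 observations whose values run from 0 to 96.
k = sturges_classes(50)
c = class_interval(96, 0, k)
print(k, round(c, 2))
```

In practice the estimated width is usually rounded to a convenient figure such as 10 or 20, so the rule serves as a guide rather than a fixed prescription.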
It is the value lying half way between the lower and upper class limits of a class interval.
Mid point of a class is ascertained as follows:
Mid point of a class = (Upper limit of the class + Lower limit of the class) / 2
For the purpose of further calculations in statistical work, the mid point of each class is taken to
represent that class.
There are two types of classifying the data according to class intervals, viz., exclusive
and inclusive.
a) Exclusive Method: When the class intervals are so fixed that the upper limit of one class is the
lower limit of the next class, it is known as the exclusive method of classification. The following
data are classified on this basis.
Income (Rs.) No. of Persons
1000-1100 50
1100-1200 100
1200-1300 200
1300-1400 150
1400-1500 40
1500-1600 10
Total 550
It is clear that the exclusive method ensures continuity of data inasmuch as the upper
limit of one class is the lower limit of the next class. Even though it is a widely used method, it can
sometimes cause confusion. If an item 20 occurs, a doubt arises as to whether it is to be included
in the class 10-20 or in the class 20-30. However, it is always presumed that the upper limit
is exclusive, i.e., the item 20 should be included in the 20-30 class.
b) Inclusive Method: Under the inclusive method of classification the upper limit of one
class is included in that class itself. The following example illustrates the method.
Income (Rs.) No. of Persons
1000-1099 50
1100-1199 100
1200-1299 200
1300-1399 150
1400-1499 40
1500-1599 10
Total 550
To decide whether to use the inclusive or the exclusive method, it is important to determine
whether the variable under observation is a continuous or discrete one. In case of continuous
variables the upper limit exclusive method must be used. In general, the inclusive method should
be used in case of discrete variables.
Illustration 1: Prepare a frequency table for the following data with width of each class
interval as 10. Use exclusive method of classification.
57, 44, 80, 75, 00, 18, 45, 14, 04, 64, 72, 51, 69, 34, 22, 83, 70, 20, 57, 28, 96, 56,
50, 47, 10, 34, 61, 66, 80, 46, 22, 10, 84, 50, 47, 73, 42, 33, 48, 65, 10, 34, 66, 53, 75, 90,
58, 46, 38, 69
Solution:
Class Interval   Tallies        Frequency
0-10             ||             2
10-20            |||||          5
20-30            ||||           4
30-40            |||||          5
40-50            ||||| |||      8
50-60            ||||| |||      8
60-70            ||||| ||       7
70-80            |||||          5
80-90            ||||           4
90-100           ||             2
Total                           50
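The same table can be reproduced with a short Python sketch of the exclusive method. The names `observations`, `WIDTH` and `exclusive_class` are assumptions made for illustration. Integer division places a value equal to an upper limit (for example 20) in the next class, as the exclusive method requires.

```python
from collections import Counter

# The 50 observations of the illustration (00 and 04 are written as 0 and 4).
observations = [57, 44, 80, 75, 0, 18, 45, 14, 4, 64, 72, 51, 69, 34, 22, 83,
                70, 20, 57, 28, 96, 56, 50, 47, 10, 34, 61, 66, 80, 46, 22, 10,
                84, 50, 47, 73, 42, 33, 48, 65, 10, 34, 66, 53, 75, 90, 58, 46,
                38, 69]

WIDTH = 10  # width of each class interval

def exclusive_class(value, width=WIDTH):
    """Return (lower, upper) limits of the class holding `value`.

    The upper limit is excluded, so 20 falls in 20-30 and not in 10-20.
    """
    lower = (value // width) * width
    return lower, lower + width

table = Counter(exclusive_class(v) for v in observations)

for lower, upper in sorted(table):
    print(f"{lower}-{upper:<8}{table[(lower, upper)]}")
print(f"{'Total':<11}{sum(table.values())}")
```

For the inclusive method, the printed label would instead run from the lower limit to the upper limit minus one (0-9, 10-19 and so on), which suits discrete data such as these.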
i) It simplifies complex data.
When data are tabulated, all unnecessary details and repetitions are avoided. Data are
presented systematically in columns and rows. Hence, the reader gets a very clear idea of what
the table represents. There is thus a considerable saving in the time taken in understanding what
is represented by the data, and all confusion is avoided. Also a large amount of space is saved
because headings and designations are not duplicated; the description at the top of a column
serves for all the items beneath it.
Tabulation facilitates comparison. Since a table is divided into various parts and for each
part there are totals and subtotals, the relationship between different parts of the data can be studied
much more easily with the help of a table than without it.
When the data are arranged in a table with a title and number, they can be distinctly
identified and can be used as a source reference in the interpretation of a problem.
Tabulation reveals patterns within the figures which cannot be seen in the narrative form. It also
facilitates the summation of the figures if the reader desires to check the total.
The number of parts of a table varies from case to case depending upon the given data.
However, the main parts of a table in general are: table number, title of the table, caption, stub,
body of the table, headnote and footnote. These parts are explained below.
i) Table Number
Each table should be numbered. There are different practices with regard to the place
where this number is to be given. The number may be given either in the centre at the top above
the title or inside of the title at the top or in the bottom of the table on the left hand side.
ii) Title of the table
Every table must be given a suitable title. The title is a description of the contents of the
table. A complete title has to answer the questions what, where and when in that sequence. The
title should be clear, brief and self explanatory. However, clarity should not be sacrificed for the
sake of brevity. The title should be so worded that it permits one and only one interpretation. It
should be in the form of a series of phrases rather than complete sentences. Its lettering should be
the most prominent of any lettering on the table.
iii) Caption
Caption refers to the column heading. It explains what the column represents. It may consist of
one or more column headings. Under a column heading there may be subheads. The caption
should be clearly defined and placed at the middle of the column. If different columns are expressed
in different units, the units should be mentioned with the captions. As compared with the main part
of the table, the caption should be shown in small letters. This helps in saving space.
(iv) Stub
As distinguished from captions, stubs are the designations of the rows or row headings.
They are at the extreme left and perform the same function for the horizontal rows of numbers in
the table as the column headings do for the vertical columns of numbers. The stubs are usually
wider than column headings but should be kept as narrow as possible without sacrificing precision
and clarity of statements.
v) Body:
The body of the table contains the numerical information. This is the most vital part of the
table. The data presented in the body are arranged according to the descriptions or classifications of the
captions and stubs.
It is a brief explanatory statement applying to all or a major part of the material in the
table, and is placed below the title, centered and enclosed in brackets. It is used to explain
certain points relating to the whole table that have not been included in the title or in the captions
or stubs. For example, the unit of measurement is frequently written as a headnote, such as “in
thousand” or “in million tonnes” or “in crores” etc.
Anything in a table which the reader may find difficult to understand from the title,
captions and stubs should be explained in footnotes. If footnotes are needed, they are placed
directly below the body of the table. Footnotes are used for the following main purposes:
b) Any special circumstances affecting the data, for example, strike, lockout, fire etc.
d) To give the source in case of secondary data. The reference to the source should be
complete in itself.
There are various systems of identifying the footnotes. One is numbering them
consecutively with small numbers 1, 2, 3... or letters a, b, c, d....
Format of a Table
Table Number
Title
Headnote
Source
Foot notes.
Tables may be broadly classified in two ways, viz., (i) simple and complex tables, and
(ii) general purpose and special purpose tables.
3.5.1 Simple and complex tables
The distinction between simple and complex tables is based on the number of characteristics
studied. In a simple table only one characteristic is shown. This type of table is also known as a
one-way table. On the other hand, in a complex table, two or more characteristics are shown.
Such tables are more popular in practice because they enable full information to be incorporated
and facilitate a proper consideration of all related facts. When two characteristics are shown,
such a table is known as a two-way table. When three characteristics are shown in a table, this
type of tabulation is known as treble tabulation. When four or more characteristics are
simultaneously shown, it is a case of manifold tabulation. The following examples will illustrate the
distinction between simple and complex tables.
In this type of table only one characteristic is shown. This is the simplest type of table. The
following is an illustration of such a table.
Below 25 ..............
25-35 ..............
35-45 ..............
45-55 ..............
Above 55 ..............
Total ..............
B. Two-way table
Such a table shows two characteristics and is formed when either the stub or the caption
is divided into two coordinate parts. The following example illustrates the nature of such a table.
No. of Employees Total
When three or more characteristics are represented in the same table, such a table is
called a higher order table. The need for such a table arises when we are interested in presenting a
number of characteristics simultaneously. While constructing such a table it is necessary to first
establish an order of precedence among the attributes or characteristics sought to be classified
having regard to their relative importance.
General purpose tables, also known as reference tables or repository tables, provide
information for general use or reference. They usually contain detailed information and are not
constructed for specific discussion. In other words, these tables serve as a repository of information
and are arranged for easy reference. Tables published by Governmental agencies are mostly of
this kind, such as the tables contained in the Statistical Abstract of the Indian Union, detailed tables
contained in the census reports etc. Such tables present facts which are not meant for any particular discussion.
When such tables are used by a researcher, they are usually placed in the appendix of the report
for easy reference.
Special purpose tables, also known as summary tables, provide information for particular
discussion. When attached to a report they are found in the body of the text. These tables are also
called derivative tables since they are often derived from general tables. Thus the large detailed
tables in the census records of the Government of India are general purpose tables. When such
data are used, they are ordinarily taken from the general purpose tables and presented as special
purpose tables, which emphasise the relation the user wishes to stress. A special purpose table
should be designed in such a way that the reader may easily refer to the table for comparison,
analysis or emphasis concerning the particular discussion.
3.6 Summary :
Classification and tabulation is the first step in the analysis of any collected data.
Classification refers to the process of dividing the entire data into different groups or classes. The
data can be classified on various bases, viz., geographical, chronological, qualitative and
quantitative. Quantitative classification of data into discrete or continuous distributions helps
the investigator to a great extent in further analysis.
The process of arranging the statistical data in columns and rows in a systematic manner
is called tabulation. Tabulation has got its own significance. In general a table consists of different
parts, viz., table number, title of the table, captions, stubs, main body, headnote and footnotes.
Tables may be broadly classified into two categories, viz., simple and complex tables, and general
and special purpose tables.
2. State the advantages of tabular representation of data. Explain different parts of a table.
3. Explain the significance of tabulation of data. What are the different types of tables?
4. Prepare a frequency distribution with class interval as 10 from the following sequence of
observations.
67, 34, 36, 48, 49, 31, 61, 34, 43, 45, 38, 32, 27, 61, 29, 47, 36, 50, 46, 30, 46, 32, 30, 33,
45, 49, 48, 41, 53, 36, 37, 47, 47, 30, 46, 50, 28, 35, 35, 38, 36, 46, 43, 34, 62, 69, 50, 28, 44, 43
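One possible way to check a hand-made tally for an exercise of this kind is sketched below. It is only an illustration, under three assumptions that are not stated in the text: the last two entries of the list are read as 44 and 43, classes are formed by the exclusive method (lower limit included, upper limit excluded), and the first class starts at the multiple of 10 just below the smallest observation. The function name `frequency_distribution` is made up for this sketch.

```python
from collections import Counter

# Observations copied from the exercise above; the last two entries are
# read as 44 and 43 (an assumption about the printed list).
observations = [
    67, 34, 36, 48, 49, 31, 61, 34, 43, 45, 38, 32, 27, 61, 29, 47, 36, 50,
    46, 30, 46, 32, 30, 33, 45, 49, 48, 41, 53, 36, 37, 47, 47, 30, 46, 50,
    28, 35, 35, 38, 36, 46, 43, 34, 62, 69, 50, 28, 44, 43,
]

CLASS_WIDTH = 10  # class interval asked for in the exercise


def frequency_distribution(values, width):
    """Count values in classes [lower, lower + width) (exclusive method)."""
    counts = Counter((v // width) * width for v in values)
    start, stop = min(counts), max(counts)
    return [
        (lower, lower + width, counts.get(lower, 0))
        for lower in range(start, stop + width, width)
    ]


table = frequency_distribution(observations, CLASS_WIDTH)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

The class frequencies printed by the loop should add up to the number of observations, which is a quick check on any tally.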
3.8 Readings :
GUIDELINE - 4 :
GRAPHIC PRESENTATION OF DATA
Structure
4.0 Objectives
4.1 Introduction
4.6 Summary
4.8 Readings
4.0 Objectives :
4.1 Introduction :
One of the most convincing and appealing ways in which statistical results may be presented
is through diagrams and graphs. Evidence of this can be found in newspapers, magazines, journals,
advertisements etc. There are numerous ways in which statistical data may be displayed pictorially,
such as through different types of diagrams and graphs. Very often, the problem is that of selecting the
best out of several methods that may be available. This is a difficult task and requires a great deal
of artistic talent and imagination on the part of the individual or agency engaged in the preparation
of diagrams and graphs.
In a graphic mode of presentation, points or lines of various kinds are used to represent
data. Graph paper has thick lines for each division of an inch or centimetre and thin
lines for the smaller parts of the same. A graph, of whatever size, is divided into four quadrants, but
normally only the first quadrant is used unless there are negative figures to be shown on either of the
axes. The horizontal axis is called the x-axis and the vertical axis is called the y-axis. These intersect at a
point called the origin, indicated by O. Negative quantities of a variable shown on the
horizontal axis lie to the left of the origin. Negative quantities of a variable shown on the
vertical axis lie below the origin.
Graphic presentation of any statistical data has got certain advantages and some limitations
also.
4.2.1 Advantages
i) Render complex data simple: Graphic presentation renders complex data simple and
easily understandable by giving a picturesque view.
ii) Give attractive, interesting and impressive view: A graph looks more attractive than
a table of figures. The features of data become visible at a glance from a graph. So it becomes
very easy to study the tendency and fluctuations in data.
iii) Save time and labour: Graphic method is the simplest method of presenting statistical
data. Therefore, it saves time and labour of both the statistician and the observer.
Display of time series and frequency distributions can be made quite effective through
a graph.
iv) Make comparisons easy: Comparisons between two or more phenomena can be made
very easily with the help of a graph. In the words of Dickson Hail well, “illustrations
including graphs tend to simplify comparisons of statistical matter and trend.”
v) Avoid knowledge of mathematics: No special knowledge of mathematics is required to
understand the message of the data from the graph.
vi) Certain statistical measures can be ascertained: Graphic presentation of statistical data is
helpful in interpolation, extrapolation and forecasting. With the help of them, one can
also determine median, quartiles and mode.
Due to the above advantages, graphic presentation of statistical data is becoming more
and more popular with the statisticians.
4.2.2 Limitations
1. A curve simply shows tendency and fluctuations; actual values are not known.
While graphing statistical data, the following points should be borne in mind.
1. Title: Every graph must have a clear and comprehensive title so that what facts are
represented in the graph may be known.
2. Structural framework: The independent variable should always be measured along the
x-axis and the dependent variable along the y-axis. The scale along the y-axis should begin
from zero as origin. For actual plotting of the data, it should be remembered that for
every value of the independent variable, there is a corresponding value of the dependent
variable.
3. Choice of scale: The choice of the scale should be so made as to accommodate the
whole data. In the words of A.L. Bowley, “It is difficult to lay down rules for the proper
choice of scales by which the figures should be plotted out. It is only the ratio between
the horizontal and vertical scales that need to be considered.” One has to note that the
scale need not be the same for both x-axis and y-axis.
4. Use of false base line: When fluctuations in a variable are small relative to its size and it is
desired to visualize these fluctuations properly, the vertical scale may be stretched. This
can be done if, instead of showing the entire scale from zero to the highest value involved,
only as much is shown as is necessary for the purpose. The portion which lies between
zero and the lowest value of the variable is left out. This omission is indicated by a scale
break.
5. Use of ratio or logarithmic scale: For showing proportional changes, a ratio or logarithmic
scale should be used.
6. Line designs: If more than one line is plotted on the same graph, it is necessary to distinguish
them by different patterns.
7. Captions: The scale caption for the x-axis is placed under the centre of the horizontal
axis. The scale caption for the y-axis is placed at the top or middle of the y-scale.
8. Index: An index should be given to show the scales and the meaning of different curves.
A large variety of graphs are used in practice. However, here we shall discuss only some
of the important types of graphs, which are more popular. Broadly, the various graphs can be divided
under two heads, viz., graphs of time series and graphs of frequency distributions. The first category
is dealt with in this section and the second category in the next section.
When we observe the values of a variable at different points of time, the series so formed
is known as a time series. The technique of graphic presentation is extremely helpful in analysing
changes at different points of time. On the x-axis we generally take the time and on the y-axis the
value of the variable and join the various points by straight lines. The graph so formed is known
as the time series graph.
Graphs of time series can be constructed either on a natural scale or on a ratio scale. In
natural or arithmetic scale, absolute changes from one period to another are shown whereas, in
a ratio scale the rates of change or the relative changes are shown.
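The difference between the two scales can be illustrated with a small sketch. The series below is hypothetical (it is not taken from the text) and is chosen to double in every period. On a natural scale the vertical steps follow the absolute changes, whereas on a ratio (logarithmic) scale the vertical positions are the logarithms of the values, so equal rates of change give equal steps.

```python
import math

# Hypothetical series that doubles every period (illustrative only).
values = [100, 200, 400, 800]

# Natural (arithmetic) scale: vertical steps equal the absolute changes.
absolute_changes = [b - a for a, b in zip(values, values[1:])]

# Relative changes from one period to the next.
relative_changes = [b / a for a, b in zip(values, values[1:])]

# Ratio scale: plot log10(value); steps are differences of logarithms.
log_positions = [math.log10(v) for v in values]
log_steps = [b - a for a, b in zip(log_positions, log_positions[1:])]

print("absolute changes:", absolute_changes)
print("relative changes:", relative_changes)
print("steps on ratio scale:", [round(s, 4) for s in log_steps])
```

The absolute changes grow from period to period, while the steps on the ratio scale stay the same, which is why a constant rate of growth appears as a straight line on a ratio scale.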
Illustration 1: Represent the following data of per capita income graphically.
[Figure: line graph of per capita income (vertical axis, 1600 to 2400) by year (horizontal axis, 1980-81 to 1987-88)]
4.5 Graphs of Frequency Distributions :
In a frequency graph, the size or the value of the item is presented on the horizontal axis
and the frequency or the number of items on the vertical axis. A frequency distribution can be
presented graphically in any one of the following ways, viz., histogram, frequency polygon, frequency
curve and ogives.
4.5.1 Histogram :
While constructing histogram, the variable is always taken on the x-axis and the frequencies
depending on it on the y-axis. Each class is then represented by a distance on the scale that is
proportional to its class interval. The distance for each rectangle on the x-axis shall remain the
same in case the class intervals are uniform throughout. If the class intervals are different, the
width of the rectangles shall also vary. The y-axis represents the frequencies of each class which
constitute the height of its rectangle. In this manner we get a series of rectangles each having a
class interval distance as its width and the frequency as its height. The area of the histogram
represents the total frequency as distributed throughout the classes.
The technique of constructing histogram is presented below (a) for distributions having
equal class intervals and (b) for distributions having unequal class intervals.
When class intervals are equal, take the variable on the x-axis and the corresponding
frequency on y-axis, and construct adjacent rectangles. In such a case, the height of the rectangles
will be proportional to the frequencies.
Frequency 11 28 36 49 33 20 8
[Figure: Histogram, with the variable on the horizontal axis (100 to 160) and frequency on the vertical axis (0 to 60)]
Solution:
We observe that the class interval for the 5th class is 10, which is double the minimum class
interval of 5. Hence we have to divide the frequency 12 by 2 and take the height as 6. Similarly
for the last two classes, the class interval is 20, which is 4 times the minimum class interval of 5.
So, we have to divide both the class frequencies 12 and 8 by 4 and take the heights as 3 and 2
respectively.
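The adjustment described in this solution can be written as a small rule: divide each frequency by the number of times its class interval contains the minimum class interval. The sketch below applies that rule to the three classes mentioned in the solution; the function name `adjusted_height` is hypothetical and only serves the illustration.

```python
def adjusted_height(frequency, class_width, min_width):
    """Height of a histogram rectangle when class intervals are unequal.

    The frequency is divided by the factor by which the class width
    exceeds the minimum class width, so that area stays proportional
    to frequency.
    """
    factor = class_width / min_width
    return frequency / factor


MIN_WIDTH = 5  # minimum class interval stated in the solution

# (frequency, class width) for the three classes discussed above.
classes = [(12, 10), (12, 20), (8, 20)]

heights = [adjusted_height(f, w, MIN_WIDTH) for f, w in classes]
print(heights)
```

A class whose width equals the minimum keeps its frequency as its height, since the factor is then 1.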
1. With the help of a histogram: We may draw a histogram of the given data and
then join by straight lines the mid points of the upper horizontal side of each
rectangle with the adjacent ones. The figure so formed is called a frequency
polygon. It is an accepted practice to close the polygon at both ends of the
distribution by extending them to the base line. When this is done, two
hypothetical classes, one at each end, have to be included, each with a frequency
of zero.
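The points to be joined for a frequency polygon can be listed as follows. This is a minimal sketch that assumes equal class intervals; the class limits and frequencies are hypothetical, and the function name `polygon_vertices` is invented for the example. The two extra points with zero frequency are the hypothetical classes used to close the polygon on the base line.

```python
def polygon_vertices(lower_limits, width, frequencies):
    """Return (mid value, frequency) points of a closed frequency polygon."""
    mids = [lower + width / 2 for lower in lower_limits]
    points = list(zip(mids, frequencies))
    # Hypothetical classes with zero frequency, one at each end.
    first = (mids[0] - width, 0)
    last = (mids[-1] + width, 0)
    return [first] + points + [last]


# Hypothetical distribution with classes 30-40, 40-50, 50-60, 60-70.
vertices = polygon_vertices([30, 40, 50, 60], 10, [5, 12, 9, 4])
for mid, freq in vertices:
    print(mid, freq)
```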
[Figure: Frequency Polygon, with marks (mid values of class intervals, 25 to 95) on the horizontal axis and number of students (0 to 15) on the vertical axis]
4.5.3 Frequency curve
The procedure of drawing a frequency curve is quite similar to that of a frequency polygon,
except that the points are joined by a free hand curve instead of line segments. As in the case of the frequency
polygon, the curve can be obtained from a histogram or by calculating the middle values of class
intervals. We have to take the middle values of class intervals on the x-axis and the corresponding frequencies
on the y-axis, plot the points and then join the points with a free hand.
Solution:
[Figure: Frequency curve, with salary (Rs.) on the horizontal axis and number of employees (20 to 120) on the vertical axis]
There are two methods of constructing ogives, namely ‘less than’ and ‘more than’ methods.
a) Less than method: In the less than method, we start with the upper limits of the
classes and go on adding the frequencies. When these frequencies are plotted,
we get a rising curve.
b) More than method: In the more than method, we start with the lower limits of the
classes and from the total frequency we go on subtracting frequency of each
class. When these frequencies are plotted, we get a declining curve.
From the standpoint of graphic presentation, the ogives are used for some special
purposes. Ogives are used to determine the number or proportion of cases above or below a
given value. Ogives are also drawn for determining certain values graphically such as median,
quartiles, deciles etc.
Illustration 5: Draw less than and more than cumulative frequency curves for the following
data.
Marks             10-20  20-30  30-40  40-50  50-60  60-70
No. of Students       4      6     10     20     18      2

Marks less than   No. of students   Marks more than   No. of students
20                 4                10                60
30                10                20                56
40                20                30                50
50                40                40                40
60                58                50                20
70                60                60                 2
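The two cumulative columns of this illustration can be reproduced with a short sketch. The class limits 10-20 to 60-70 are inferred from the "less than" and "more than" columns of the table; the variable names are chosen for the example only.

```python
from itertools import accumulate

# Class limits inferred from the cumulative table of Illustration 5.
lower_limits = [10, 20, 30, 40, 50, 60]
upper_limits = [20, 30, 40, 50, 60, 70]
frequencies = [4, 6, 10, 20, 18, 2]

# Less than method: keep adding frequencies, paired with upper limits.
less_than = list(accumulate(frequencies))

# More than method: start from the total and subtract each class
# frequency in turn, paired with lower limits.
total = sum(frequencies)
more_than = [total - before for before in [0] + less_than[:-1]]

for upper, cum in zip(upper_limits, less_than):
    print(f"less than {upper}: {cum}")
for lower, cum in zip(lower_limits, more_than):
    print(f"more than {lower}: {cum}")
```

Plotting the first set of pairs gives the rising curve and the second set the declining curve.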
[Figures: Less than cumulative frequency curve; More than cumulative frequency curve]
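Reading the median or a quartile off a less than ogive amounts to finding the mark at which the curve reaches a given cumulative frequency. The sketch below imitates this by straight-line interpolation between the plotted points of Illustration 5. It assumes that the points are joined by straight lines and that no student scored below 10, so that the curve starts at (10, 0); a smooth free hand curve could give slightly different readings. The function name `value_at` is hypothetical.

```python
def value_at(cumulative_points, target):
    """Interpolate the x-value at which the cumulative frequency hits target."""
    pairs = zip(cumulative_points, cumulative_points[1:])
    for (x0, c0), (x1, c1) in pairs:
        if c1 > c0 and c0 <= target <= c1:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the cumulative range")


# (marks, number of students scoring less than that mark);
# the starting point (10, 0) is an assumption.
less_than_points = [
    (10, 0), (20, 4), (30, 10), (40, 20), (50, 40), (60, 58), (70, 60),
]

total = less_than_points[-1][1]
median = value_at(less_than_points, total / 2)
first_quartile = value_at(less_than_points, total / 4)
third_quartile = value_at(less_than_points, 3 * total / 4)

print(median, first_quartile, round(third_quartile, 2))
```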
4.6 Summary:
Graphical presentation is one of the most convincing ways in which statistical results may
be presented. Graphic presentations render complex data simple. They give an attractive, interesting
and impressive view. They save time and labour. Comparisons can be made very easily with
graphic presentation. Some graphs also help to obtain certain statistical measures like median,
quartiles etc. Certain general rules are prescribed for graphic presentation.
Graphs can broadly be divided into two categories, viz., graphs of time series and graphs
of frequency distributions. Histogram, frequency polygon, frequency curve and ogives are the
most popular graphic presentations of frequency distributions.
1. What is the significance of graphic presentation of statistical data? What are the general
rules to be followed for graphic presentation?
4. The index numbers of Indian industrial projects, with 1950 = 100 as the base year, are
given in the following table. Present the data by a suitable graph.
Year          1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
Index Number   187  222  246  239  234  229  192  260  182  247
5. The annual steel production of a company for a period of nine years is given in the
following table. Draw an appropriate diagram to present the data.
Year                 1965 1966 1967 1968 1969 1970 1971 1972 1973
Production (Tonnes)   212  220  235  257  289  263  250  283  290
6. The monthly profits in thousand rupees of 100 shops are distributed as follows:
No. of Shops 12 18 27 20 17 6
Draw a histogram, frequency polygon, frequency curve and ogives for the above data.
Mid Value 15 25 35 45 55 65 75
Frequency 10 24 40 32 20 14 4
8. You are given the following frequency distribution of monthly expenditure on food incurred
by a sample of 100 families.
No. of families 7 24 30 27 8 4
4.8 Readings:
GUIDELINE-5 :
DIAGRAMMATIC PRESENTATION OF DATA
Structure :
5.0 Objectives
5.1 Introduction
5.4 Summary
5.6 Readings
5.0 Objectives:
5.1 Introduction :
One of the most convincing and appealing ways in which statistical results may be presented
is through diagrams and graphs. A diagram is a visual form for presentation of statistical data,
highlighting their basic facts and relationship. Diagrams are used with great effectiveness in the
presentation of all types of data. When properly constructed, they readily show information that
might otherwise be lost amid the details of numerical tabulation.
A properly constructed diagram appeals to the eye and also to the mind because it is
practical, clear and easily understandable even by those unacquainted with the methods of
presentation. Though diagrams do not add any new meaning to the statistical facts, yet they
exhibit the results more clearly. Diagrammatic representation of statistical facts is the best way of
appealing to the mind through the eyes. We can observe the following merits and limitations of
diagrammatic presentation.
5.2.1 Merits:
Diagrams are attractive and create a lasting impression. A person who does not like to
devote even a single minute to the study of a page containing numerical tables, in most cases
would not like to take his eyes away from an attractively constructed diagram even from the
same data. They do not strain the mind of the observer.
Diagrams have the merit of rendering the whole data readily intelligible. The mass of
complex data, when depicted through a diagram, can be understood easily. Diagrams bring forth
the characteristics of data.
Diagrams make comparison between two sets of data possible. This is one of the objectives
of diagrammatic presentation. In absolute figures, the comparison is sometimes not very clear, but
diagrammatic presentation makes it simpler and easier.
Diagrammatic presentation saves a lot of time which would otherwise be lost in
grasping the significance of numerical data. Data that would take hours to understand
can have their basic characteristics made clear in minutes by a diagram.
A diagram depicts more information than the data shown in a table. It clarifies the existing
trend in the data and how the trend changes. Though such information is there in the tables also,
finding the trend from them is a difficult and time consuming job.
5.2.2 Limitations:
Diagrams are important tools. But they are not substitutes for classification and tabulation.
They are complementary to those processes. Only a rough idea can be had about data from a
diagram. Diagrammatic presentation is risky in the hands of those who draw conclusions from
it without making a careful study. Care has to be taken while using and interpreting diagrammatic
presentation to guard against misunderstanding or misrepresentation.
iv) It is not possible to present a precise difference between two sets of data.
v) If there is a wide gap between different measurements, this also cannot be shown
meaningfully in a diagram.
vi) A diagram is limited to the portrayal of two or three aspects of a set of data,
otherwise it becomes too complex to be understood.
There is a large number of diagrammatic forms to choose from. The choice of the type of
diagram in which the data are to be presented is a difficult one. Selection of the type will
depend upon ability and experience. The commonly used types of diagrams are: one dimensional,
two dimensional and three dimensional diagrams, pictograms and cartograms.
5.3.1 One Dimensional Diagrams :
In such diagrams, only one dimensional measurement, i.e., height, is used and the width
is not considered. Such diagrams are in the form of line or bar charts. On the basis of sizes of
figures, heights of lines or bars are drawn. In bar charts, no doubt, width is kept, but it has no
relation with the measurement. Width is used only to make the diagram look beautiful and attractive.
One dimensional diagrams may be of the following types.
A. Line Diagram
Line diagram is used in cases where there are many items to be shown and there is not
much difference in their values. Such a diagram is prepared by drawing a vertical line for each
item according to the scale. The distance between lines is kept uniform. Line diagram makes
comparison easy, but it is less attractive.
No.of Children 0 1 2 3 4 5
Frequency 15 10 13 6 3 3
Solution:
[Figure: Line Chart, with number of children (0 to 5) on the horizontal axis and frequency on the vertical axis]
B. Simple Bar Diagram
A simple bar diagram is used to represent only one variable. For example, the figures of
sales, production, population etc., for various years may be shown by means of a simple bar
diagram. Simple bar diagrams are very popular in practice. They can be vertical or horizontal. In
practice, vertical bars are more popular. In bar diagrams, the width of each bar is the same, but
the length varies according to the numerical data.
Illustration 2: The following table gives the birth rate per thousand of different countries
over a certain period.
Country    Birth Rate    Country        Birth Rate
India      33            China          40
Germany    16            New Zealand    30
U.K.       20            Sweden         15
Represent the above data by a suitable diagram.
Solution:
[Figure: Bar Diagram of birth rate (vertical axis, 0 to 45) for Germany, Sweden, New Zealand, China, U.K. and India]
C. Component Bar Diagram
In a component bar diagram, each bar representing the magnitude of a given phenomenon
is further subdivided into its various components. Each component occupies a part of the bar
proportional to its share in the total. The subdivisions are distinguished by different colours or
crossings or dottings.
Illustration 3: The number of students in five different colleges are given below.
College Boys Girls Total
A 450 620 1070
B 1260 910 2170
C 1590 1260 2850
D 1340 1150 2490
E 1050 830 1880
Represent the data by a suitable diagram:
Solution:
[Figure: Component Bar Diagram, with colleges A to E on the horizontal axis and number of students (0 to 3000) on the vertical axis, each bar divided into Boys and Girls]
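Before drawing a component bar diagram it helps to work out where each segment of a bar begins and ends. The sketch below does this for the data of Illustration 3. Placing boys at the base of each bar is an assumption made for the example, and the function name `stacked_segments` is hypothetical.

```python
# Data of Illustration 3 (number of students by college).
colleges = {
    "A": {"Boys": 450, "Girls": 620},
    "B": {"Boys": 1260, "Girls": 910},
    "C": {"Boys": 1590, "Girls": 1260},
    "D": {"Boys": 1340, "Girls": 1150},
    "E": {"Boys": 1050, "Girls": 830},
}


def stacked_segments(components):
    """Return (label, bottom, top) for each segment of one component bar."""
    segments, bottom = [], 0
    for label, value in components.items():
        segments.append((label, bottom, bottom + value))
        bottom += value
    return segments


for name, components in colleges.items():
    print(name, stacked_segments(components))
```

The top of the last segment of each bar equals the total for that college, which gives a check against the Total column of the table.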
D. Multiple Bar Diagram
In a multiple bar diagram, two or more sets of interrelated data are represented. The
technique of drawing such a diagram is the same as that of simple bar diagram. The only difference
is that, since more than one phenomenon is represented, different shades, colours, dots or crossing
are used to distinguish between the bars. Whenever a comparison between two or more related
variables is to be made, multiple bar diagram should be preferred.
Illustration 4: Draw a suitable diagram from the following data.
Year Sales Gross Profit Net profit
2001 120 40 20
2002 135 45 30
2003 140 55 35
2004 150 60 40
[Figure: Multiple Bar Diagram, with year on the horizontal axis and sales, gross profit and net profit (Rs. 1000, 0 to 180) on the vertical axis]
E. Percentage Bar Diagram
Percentage bars are particularly useful in statistical work, which requires the portrayal of
relative changes in data. When such diagrams are prepared, the length of the bars is kept equal
to 100 and segments are cut in these bars to represent the components (percentages) of an
aggregate.
Illustration 5: Represent the following by subdivided bar diagram on the percentage basis.
a) Wages                   9   15   21
b) Other cost              6   10   14
c) Polishing               3    5    7
Total cost                18   30   42
2. Sale proceeds/chair    20   30   40
Profit/loss                2    -   -2
Solution: Take the sale price per chair as 100 and express the other figures in percentages. The
percentages so obtained are given below.
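The percentages can be worked out as in the sketch below, which takes the sale price per chair in each column as 100. The column headings of the original table are not available, so the three columns are simply kept in the order given; the helper name `as_percent` is hypothetical, and results are rounded to two decimal places.

```python
# Figures per chair from Illustration 5, one entry per column of the table.
sale_price = [20, 30, 40]
costs = {
    "Wages": [9, 15, 21],
    "Other cost": [6, 10, 14],
    "Polishing": [3, 5, 7],
}


def as_percent(value, base):
    """Express value as a percentage of base, rounded to two decimals."""
    return round(100 * value / base, 2)


percentages = {
    item: [as_percent(v, s) for v, s in zip(values, sale_price)]
    for item, values in costs.items()
}

total_cost = [sum(column) for column in zip(*costs.values())]
percentages["Total cost"] = [
    as_percent(t, s) for t, s in zip(total_cost, sale_price)
]
percentages["Profit/loss"] = [
    as_percent(s - t, s) for t, s in zip(total_cost, sale_price)
]

for item, row in percentages.items():
    print(item, row)
```

A total cost above 100 corresponds to a loss, which is shown below the base line in the percentage bar.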
[Figure: Percentage bar diagram showing cost, sale price and profit and loss per chair]
As distinguished from one dimensional diagrams in which only the length is taken into
account, in two dimensional diagrams the area will be taken into consideration. There are so
many types of two dimensional diagrams, of which pie diagram is the most popular one.
Pie Diagram :
Pie diagrams are very popularly used in practice to show percentage breakdowns. For
example, with the help of a pie diagram we can show how the expenditure of the Government is
distributed over different heads like Agriculture, Irrigation, Industry, Transport & Defence etc.
Similarly through a pie diagram we can show how the expenditures incurred by an industry are
divided under different heads like raw materials, wages and salaries, selling and distribution costs
etc. The pie chart is so called because the entire diagram looks like a pie, and the components
resemble slices cut from a pie.
While making comparisons, pie diagrams should be used on a percentage basis and not
on an absolute basis, since a series of pie diagrams showing absolute figures would require that
larger totals be represented by larger circles. Such presentation involves difficulties of two-dimensional
comparisons. However, when pie diagrams are constructed on a percentage basis,
percentages can be presented by circles equal in size. It may be noted that this problem does not
arise in the use of a single pie diagram.
In laying out the sectors for pie chart, it is desirable to follow some logical arrangement
or sequence. It is a common practice to begin the largest component sector of a pie diagram at
the 12 o'clock position on the circle. Usually the other component sectors are placed in clockwise
succession in descending order of magnitude, except for catch-all components like “miscellaneous”
and “all others”, which are shown last, in contrast with adjacent sectors.
In constructing a pie chart, the first step is to prepare the data so that the various component
values can be transposed into corresponding degrees on the circle. The second step is to draw a
circle of appropriate size with a compass. The size of the radius depends upon the available
space and other factors of presentation.
The third step is to measure points on the circle representing the size of each sector with
the help of a protractor. The ordinary protractor is based upon a scale in which the total circle is
360°, but it is possible to purchase a protractor in which the entire circle is divided not into 360
but 100 equal parts so that the angle representing any desired percentage can be read directly.
An essential feature of the pie chart is careful identification of each sector with some kind
of explanatory or descriptive label. If there is sufficient room, the labels can be placed inside the
sectors; otherwise the labels should be placed in contiguous positions outside the circle, usually
with an arrow pointing to the appropriate sector.
Pie diagrams are at times less effective than bar diagrams for accurate reading and
interpretation, particularly when series are divided into a large number of components or the
difference among the components is very small. It is generally inadvisable to attempt to portray a
series of more than five or six categories by means of a pie chart.
Illustration 6: Draw a pie diagram for the following data of sixth five year plan public sector
outlays.
Agricultural and rural development 12.9%
Energy 27.2%
Angle of each sector: $\frac{\text{Percentage outlay}}{100} \times 360^\circ = \text{Percentage outlay} \times 3.6^\circ$
Computation of Angles
Now a circle shall be drawn suited to the size of the paper and divided into 6 parts
according to the degrees of the angles at the centre. The angles have been arranged in descending order and
the diagram is presented below.
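The conversion of percentages into angles follows the rule that each one per cent of the total takes 3.6 degrees of the circle. The sketch below applies it to the two outlay percentages that are legible in the illustration; the remaining heads are not reproduced here and are left out rather than guessed. The function name `sector_angle` is hypothetical.

```python
def sector_angle(percentage):
    """Angle in degrees at the centre for a given percentage share."""
    return round(percentage * 3.6, 2)


# Only the two heads legible in the illustration are used here.
outlays = {
    "Agricultural and rural development": 12.9,
    "Energy": 27.2,
}

angles = {head: sector_angle(share) for head, share in outlays.items()}
for head, angle in angles.items():
    print(f"{head}: {angle} degrees")
```

When all the heads are included, the percentages add up to 100 and the angles to 360 degrees, which serves as a check on the computation.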
[Figure: Pie Diagram showing Sixth Five Year Plan Public Sector Outlays]
5.4 Summary:
A diagram is a visual form for presentation of statistical data, highlighting their basic facts
and relationship. Diagrammatic presentation has got certain merits as well as some limitations.
Diagrams are attractive and impressive. They make data simple and intelligible. They
make comparison possible. They save time and labour. They have universal utility. They give
more information. Diagrams can show only a limited amount of information. Diagrams cannot be
analysed further.
The commonly used diagrams are bar diagrams of different types and pie diagrams.
Selection of an appropriate diagram is also a somewhat difficult problem. The choice would primarily
depend upon two factors, namely, the nature of the data and the type of people for whom the
diagram is meant. There are different types of bars and the appropriate type of bar chart can be
decided on the following basis.
a) Simple bar charts should be used where changes in totals are required to be
conveyed.
b) Component bar charts are more useful where changes in totals as well as in the
size of component figures are required to be displayed.
c) Percentage comparison bar charts are better suited where changes in the relative
size of component figures are to be exhibited.
d) Multiple bar charts should be used where changes in the absolute values of the
component figures are to be emphasised and the overall total is of no importance.
A pie chart is particularly useful where it is desired to show the relative proportions of the
figures that go to make up a single overall total. Unlike bar charts, it is not restricted to three or
four component figures, although its effectiveness tends to dwindle with more than seven or eight
components.
5.5 Unit end Questions & Exercises :
1. Discuss the meaning, utility and limitations of diagrammatic presentation of statistical data.
2. State briefly the purposes served by diagrammatic presentation. Explain different types
of bar diagrams.
3. Discuss the types of data which are usually represented by pie diagrams. Explain the
procedure of constructing a pie diagram.
4. The distribution of factories in 6 districts of Karnataka state is given below. Present the
data by a suitable bar diagram.
Name of the District No. of Factories
Bangalore 1,001
Belgaum 244
Bijapur 122
Bidar 14
Bellary 127
Coorg 27
(Draw a simple bar diagram)
5. The table given below gives the data relating to exports and imports in a country during
four years ending 1976-77.
Year Exports Imports
(Crores of Rs.) (Crores of Rs.)
1973-74 320 250
1974-75 340 260
1975-76 340 240
1976-77 310 200
Draw a multiple bar diagram to present the above data.
6. Draw suitable diagrams to present the following data.
Distribution of India’s population: 1971
Religion Percentage of Population
Rural Urban
Hinduism 84.33 76.25
Islam 9.96 16.21
Christian 2.43 3.26
Sikhism 1.92 1.81
Others 1.36 2.47
Total 100 100
(Hint: Use percentage bar diagram)
7. Represent the following data by subdivided bars drawn on percentage basis.
Particulars 1974 1975 1976
(Rs.) (Rs.) (Rs.)
Materials 48 68 99
Wages 36 52 63
Other costs 24 30 45
Total cost 108 150 207
Profit / loss 12 Nil -27
Sale price 120 150 180
8. Monthly expenditure of a family on various items is given in the following table. Draw a
pie diagram to represent the data.
Item Expenditure (Rs.)
Food 240
Clothing 160
House rent 120
Education 80
Fuel & Lighting 40
Miscellaneous 40
Total 680
5.6 Readings:
1. S.P.Gupta : Statistical Methods.
2. D.C.Sancheti & V.K.Kapoor : Statistics - Theory, Methods & Applications.
3. K.V.Rao: Research Methodology in Commerce and Management.
GUIDELINE-6 :
MEASURES OF CENTRAL TENDENCY
Structure :
6.0 Objectives
6.1 Introduction
6.2 Requisites of a Good Average
6.3 Types of Averages
6.4 Summary
6.5 Unit end Questions
6.6 Readings
6.0 Objectives
a) To present the significance of studying averages,
b) To enable the student to understand the concept of arithmetic mean,
c) To look into methods of obtaining median.
d) To present the concept and method of obtaining mode.
6.1 Introduction:
One of the most important objectives of statistical analysis is to get one single value that
describes the characteristic of the entire mass of unwieldy data. Such a value is called the central
value or an average. The word average is very commonly used in day to day conversation. For
example, we often talk of the average boy in a class, the average height or life of an Indian, average
income etc. When we say ‘he is an average student’, we mean that he is neither very good
nor very bad, just a mediocre type of student. However, in Statistics, the term average has a
different meaning.
The word ‘average’ has been defined differently by various authors. “An average value
is a single value within the range of the data that is used to represent all of the values in the series.
Since an average is somewhere within the range of the data, it is also called a measure of central
value.” - Croxton & Cowden.
Since an average is a single value representing a group of values, it is desired that such a
value satisfies the following properties.
i) Simple to compute: An average should not only be easy to understand but also simple to
compute so that it can be used widely. However, though ease of computation is desirable,
it should not be sought at the expense of other advantages.
ii) Based on all items: The average should depend upon each and every item of the series so
that if any of the items is dropped the average itself is altered.
iii) Not be unduly affected by extreme values: Although each and every item should influence
the value of the average, none of the items should influence it unduly. If one or two very
small or very large items unduly affect the average, i.e., either increase its value or reduce
its value, the average cannot be really typical of the entire series. In other words, extremes
may distort the average and reduce its usefulness.
iv) Rigidly defined: An average should be properly defined so that it has one and only one
interpretation. It should preferably be defined by an algebraic formula so that if different
people compute the average from the same figures, they will get the same answer. The
average should not depend upon the personal prejudice and bias of the investigator;
otherwise the results can be misleading.
v) Capable of further algebraic treatment: We should prefer to have an average that could
be used for further statistical computations so that its utility is enhanced. For example, if
we are given the data about the average income and number of employees of two or
more factories, we should be able to compute the combined average.
vi) Sampling stability: We should prefer to get a value which has sampling stability. This
means that if we pick 10 different groups of college students, and compute the average
of each group, we should expect to get approximately the same value. It does not mean,
however, that there can be no difference in the values of different samples. There may be
some difference, but those samples in which this difference, called sampling fluctuation,
is less are considered better than those in which this difference is more.
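The combined average mentioned under property (v) can be obtained by weighting each group's average by the size of the group. The sketch below shows the calculation for two factories; the figures are hypothetical and the function name `combined_mean` is invented for the example.

```python
def combined_mean(means, sizes):
    """Combined mean of several groups from their means and sizes."""
    if len(means) != len(sizes) or not sizes:
        raise ValueError("means and sizes must be non-empty and equal in length")
    weighted_total = sum(m * n for m, n in zip(means, sizes))
    return weighted_total / sum(sizes)


# Hypothetical data: average income and number of employees of two factories.
average_income = [5000, 6000]
employees = [100, 300]

print(combined_mean(average_income, employees))
```

The result lies closer to the average of the larger factory, since that factory contributes more employees to the combined group.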
The very commonly used averages or measures of central tendency are arithmetic mean,
median, mode, geometric mean and harmonic mean. These measures are discussed in detail in
the following sections.
The most popular and widely used measure for representing the entire data by one value
is what is called the ‘average’ in common usage. Statisticians call this the arithmetic mean. Its value is obtained
by adding together all the items and dividing this total by the number of items.
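The definition above translates directly into a short computation. The sketch below is only an illustration with hypothetical items; the function name `arithmetic_mean` is chosen for the example, and the result is compared with `statistics.mean` from the Python standard library.

```python
import statistics


def arithmetic_mean(items):
    """Sum of all the items divided by the number of items."""
    if not items:
        raise ValueError("at least one item is required")
    return sum(items) / len(items)


# Hypothetical items (illustrative only).
items = [10, 20, 30, 40]

mean_value = arithmetic_mean(items)
print(mean_value)
print(statistics.mean(items))
```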
Arithmetic mean is most widely used in practice because of the following reasons:
3. It is defined by a rigid mathematical formula with the result that everyone who
computes the average gets the same answer.