2. Collection of Data
(Statistics)
Collection of Data
Definition -
Data collection is defined as the procedure of collecting, measuring and analyzing accurate
insights for research using standard validated techniques. A researcher can evaluate their
hypothesis on the basis of collected data. In most cases, data collection is the primary and
most important step for research, irrespective of the field of research. The approach of data
collection is different for different fields of study, depending on the required information.
Data is a tool that helps in reaching a sound conclusion by providing the information needed for it.
For a statistical investigation, collection of data is the first and foremost step.
❖ Sources of Data:
Primary Source
Secondary Sources
Published sources
Unpublished sources
❖ Primary Data: Data originally collected in the process of investigation are known as
primary data. This is the original form of data, collected for the first time. It is
collected directly from its source of origin.
❖ Secondary Data: Data that have already been collected and processed by some other
agency are known as secondary data.
Published sources:
❖ Govt. publication.
❖ Semi-Govt. publication.
❖ Private publications, e.g., journals and newspapers, publications of research institutes,
and publications of trade associations.
❖ International publications.
Unpublished Sources -
The statistical data needn’t always be published. There are various sources of unpublished
statistical material such as the records maintained by private firms, business enterprises,
scholars, research workers, etc. They may not like to release their data to any outside
agency.
Important Questions
6. Under random sampling, each item of the universe has __ chance of being selected.
(a) equal
(b) unequal
(c) zero
(d) none of these
7. Which of the following methods is used for the estimation of population in a country?
(a) Census method
(b) Sampling method
(c) Both (a) and (b)
(d) None of these
8. Personal bias is possible under:
(a) random sampling
(b) purposive sampling
(c) stratified sampling
(d) quota sampling
9. If the investigator wants to select a sample on the basis of diverse characteristics of
the population, which method should he use?
(a) Convenience sampling method
(b) Quota sampling method
(c) Stratified sampling method
(d) Both (b) and (c)
10. For drawing a lottery, _________________ sampling is used.
(a) random
(b) purposive
(c) stratified
(d) quota
11. Which of the following methods is used for the estimation of population in a country?
(a) Sampling Method
(b) Census Method
ANSWER KEY
Multiple Choice Answers-
1. B
2. A
3. C
4. D
5. D
6. A
7. A
8. B
9. D
10. A
11. B
12. B
13. A
14. D
15. A
• Government publication
• Semi-government publication
5. The Census of India publishes statistical information on the following parameters:
• Population projection
• Sex composition of a population
• Density of population
• Size, growth rate, and distribution of people in India
6. The demerits of indirect oral investigation are:
• Less accurate
• Biased
• Doubtful conclusion
7. The progress report of a railway published by the railway department is secondary
data.
8. The direct personal investigation method is suitable for collecting primary data in
the following situations:
• When the field of investigation is limited
• When authentic and accurate information is required
• When the data are to be kept secret
• When direct contact with the informants is needed
9. A good Questionnaire should have the following qualities:
• Limited number of questions
• Should be clear
• Proper order of questions
• Non-controversial
• Questions related to the topic
• Request for return
10. A pilot survey is essential because of the following:
• It helps in assessing the quality and suitability of Questions.
Primary source of data implies collection of data from its source of origin. It offers you
first-hand quantitative information relating to your statistical study.
“... which are primary in the hands of one party may be secondary in the hands of
the other.”
Primary and Secondary Data—The Basic Difference
• If we are collecting data from its source of origin, for the first time, it is
primary data.
• If we are using data which have already been collected by somebody else, it is
secondary data.
Note: If you are getting data from somebody else who collected it from its source of
origin but did not use it for his own study, it will be deemed as primary data.
4. The direct personal investigation is the method by which data are personally collected
by the investigator from the informants. In other words, the investigator establishes
direct relation with the persons from whom the information is to be obtained. The
success of this method, however, requires that the investigator should be very
diligent, efficient, impartial and tolerant.
Direct contact with the workers of an industry to obtain information about their
economic conditions is an example of this method.
Suitability
This method of collecting primary data is suitable particularly when:
(i) the field of investigation is limited or not very large.
(ii) a greater degree of originality of the data is required.
(iii) information is to be kept secret.
(iv) accuracy of data is of great significance, and
(v) direct contact with the informants is required.
Merits
Data, thus, collected have the following merits:
(i) Originality: Data have a high degree of originality.
(ii) Accuracy: Data are fairly accurate when personally collected.
(iii) Reliability: Because the information is collected by the investigator himself,
reliability of the data is not doubted.
(iv) Related Information: When in direct contact with the informants, the investigator
may obtain other related information as well.
(v) Uniformity: There is a fair degree of uniformity in the data collected by the
investigator himself from the informants. It facilitates comparison.
(vi) Elastic: This method is fairly elastic because the investigator can always make
necessary adjustments in his set of questions.
Demerits
However, the method of direct personal investigation suffers from certain demerits, as
under:
(i) Difficult to Cover Wide Areas: Direct personal investigation becomes very difficult
when the area of the study is very wide.
(ii) Personal Bias: This method is highly prone to personal bias of the investigator. As a
result, the data may lose their credibility.
(iii) Costly: This method is very expensive in terms of the time, money and efforts
involved.
(iv) Limited Coverage: In this method, area of investigation is generally small. The
results are, therefore, less representative. This may lead to wrong conclusions.
5. Indirect oral investigation is the method by which information is obtained not from
the persons about whom it is needed, but orally from other persons who are expected
to possess the necessary information. These other persons are known as witnesses.
For example, by this method, the data on the economic conditions of the workers may
be collected from their employers rather than the workers themselves.
Suitability
This method is suitable particularly when:
(i) the field of investigation is relatively large.
(ii) it is not possible to have direct contact with the concerned informants.
(iii) the concerned informants are not capable of giving information because of their
ignorance or illiteracy.
(iv) investigation is so complex in nature that only experts can give information.
This method is mostly used by government or non-government committees or
commissions.
Merits
Some of the notable merits of this method are as under:
(i) Wide Coverage: This method can be applied even when the field of investigation is
very wide.
(ii) Less Expensive: This is relatively a less expensive method as compared to Direct
Personal Investigation.
(iii) Expert Opinion: Using this method an investigator can seek opinion of the experts
and thereby can make his information more reliable.
(iv) Free from Bias: This method is relatively free from the personal bias of the
investigator.
(v) Simple: This is relatively a simple approach of data collection.
Demerits:
However, there are some demerits, as under:
(i) Less Accurate: The data collected by this method are relatively less accurate. This is
because the information is obtained from persons other than the concerned
informants.
(ii) Biased: There is possibility of personal bias of the witnesses giving information.
(iii) Doubtful Conclusions: This method may lead to doubtful conclusions due to
carelessness of the witnesses.
6. The difference between direct personal investigation and indirect oral investigation is
as under:
i. In the case of direct personal investigation, the investigator establishes direct
contact with the informants. On the other hand, in the case of indirect oral
investigation, information is obtained by contacting persons other than those
about whom information is sought.
ii. Direct Personal Investigation is generally possible when the field of investigation
is small. On the other hand, indirect oral investigation is generally preferred when
the field of investigation is relatively large.
iii. In the Direct Personal Investigation, the investigator must be well versed in the
language and cultural habits of the informants. There is no such requirement in
the case of Indirect Oral Investigation.
iv. Direct investigation is relatively costlier than the indirect investigation.
(iv) Publications of Trade Associations: Some of the big trade associations, through
their statistical and research divisions, collect and publish data on various aspects of
trading activity. For example, Sugar Mills Association publishes information regarding
sugar mills in India.
(v) Publications of Research Institutions: Various universities and research institutions
publish information as findings of their research activities. In India, for example, Indian
Statistical Institute, National Council of Applied Economic Research publish a variety of
statistical data as a regular feature.
(vi) Journals and Papers: Many newspapers such as ‘The Economic Times’ as well as
magazines such as Commerce, Facts for You also supply a large variety of statistical
information.
(vii) Publications of Research Scholars: Individual research scholars also sometimes
publish their research work containing some useful statistical information.
(viii) International Publications: International organisations such as UNO, IMF, World
Bank, ILO, and foreign governments etc., also publish a lot of statistical information.
These are used as secondary data.
(2) Unpublished Sources
There are some unpublished secondary data as well. These data are collected by the
government organisations and others, generally for their self use or office record.
These data are not published. This unpublished numerical information may, however,
be used as secondary data.
A Note of Caution for the Users of Secondary Data
Users of secondary data must check:
(i) reliability of data,
(ii) suitability of data, and
(iii) adequacy of data.
9. Statistical errors are broadly classified as (i) sampling errors, and (ii) non-sampling
errors. Following are the details:
(i) Sampling Errors: These are related to the size or nature of the sample selected for
the study. Due to a very small size of the sample selected for study or due to the
non-representative nature of the sample, the estimated value may differ from the
actual value of a parameter. The error thus emerging is called sampling error. For
example, if the estimated value of a parameter is found to be 10 while the
actual/true value is 20, then the sampling error = estimated value of the parameter
- true value of the parameter = 10 - 20 = -10.
(ii) Non-sampling Errors: These are errors related to the collection of data. These are
of the following types:
Error of Measurement: Error of measurement may occur due to (a) difference in
the scale of measurement, and (b) difference in the rounding off procedure
adopted by different investigators.
Error of Non-response: This arises when the respondents do not offer the required
information.
Error of Misinterpretation: This arises when the respondent fails to interpret the
questions in the questionnaire correctly.
Error of Calculation or Arithmetical Error: It occurs in the course of addition,
subtraction or multiplication of data.
Error of Sampling Bias: It occurs when, for some reason or the other, a part of
the target population cannot be included in the choice of a sample.
The larger the field of investigation or the larger the population size, the greater is
the possibility of errors related to the collection of data (data acquisition). It must
be noted here that a non-sampling error is more serious than a sampling error,
because a sampling error can be minimised by opting for a larger sample size. No
such possibility exists in the case of non-sampling errors.
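The two points made above about sampling errors (the formula, and the effect of a larger sample) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the population of 10,000 random values, the sample sizes, the number of trials and the helper names are all hypothetical and do not come from the text.

```python
import random
import statistics


def sampling_error(estimate, true_value):
    """Sampling error = estimated value of the parameter - true value."""
    return estimate - true_value


# Worked example from the text: estimate 10, true value 20.
print(sampling_error(10, 20))  # -10


def mean_abs_sampling_error(population, sample_size, trials, rng):
    """Average absolute sampling error of the sample mean over repeated
    simple random samples drawn without replacement."""
    true_mean = statistics.mean(population)
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        estimate = statistics.mean(sample)
        errors.append(abs(sampling_error(estimate, true_mean)))
    return statistics.mean(errors)


# Hypothetical population (assumption): 10,000 values between 1 and 100.
rng = random.Random(0)  # fixed seed so the run is repeatable
population = [rng.randint(1, 100) for _ in range(10_000)]

small = mean_abs_sampling_error(population, 10, 200, rng)
large = mean_abs_sampling_error(population, 1000, 200, rng)
print("average error, sample of 10:  ", small)
print("average error, sample of 1000:", large)
```

In this setup the average error for the sample of 1000 comes out well below that for the sample of 10, which is the sense in which a larger sample minimises sampling error. The sketch says nothing about non-sampling errors, since those arise in the collection of data and not in the choice of sample.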