RM Lecture 23-24 Data Collection and Its Handling
1-Information
Information refers to a body of facts that are in a format suitable for decision making, whereas
data are simply recorded measures of certain phenomena. The raw data collected in the field
must be transformed into information that will answer the sponsor’s (e.g., manager’s) questions.
The conversion of raw data into information requires that the data be edited and coded so that the
data may be transferred to a computer or other data storage medium.
If the database is large, there are many advantages to using a computer. Assuming a large
database, entering the data into the computer follows the coding procedure.
2-Editing
Occasionally, a fieldworker makes a mistake and records an improbable answer (e.g., birth year:
1843) or interviews an ineligible respondent (e.g., someone too young to qualify). Seemingly
contradictory answers, such as “no” to automobile ownership but “yes” to an expenditure on
automobile insurance, may appear on a questionnaire. There are many problems like these that
must be dealt with before the data can be coded. Editing procedures are conducted to make the
data ready for coding and transfer to data storage.
Editing is the process of checking and adjusting the data for omissions, legibility, and consistency.
Editing may be differentiated from coding, which is the assignment of numerical scores or
classifying symbols to previously edited data.
The purpose of editing is to ensure the completeness, consistency, and readability of the data to be
transferred to data storage. The editor’s task is to check for errors and omissions on the
questionnaires or other data collection forms.
The editor may have to reconstruct some data. For instance, a respondent may indicate weekly
income rather than monthly income, as requested on the questionnaire. The editor must convert
the information to monthly data without adding any extraneous information. The editor “should
bring to light all hidden values and extract all possible information from a questionnaire, while
adding nothing extraneous.”
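As a small illustration of reconstructing data, the weekly-to-monthly conversion can be sketched as below. It assumes 52 weeks and 12 months per year; the helper name `weekly_to_monthly` is hypothetical, and a study's editing instructions may specify a different conversion factor.

```python
WEEKS_PER_YEAR = 52   # assumption: 52 weeks in a year
MONTHS_PER_YEAR = 12


def weekly_to_monthly(weekly_income: float) -> float:
    """Rescale a reported weekly income to the monthly figure the questionnaire asked for.

    Hypothetical helper: nothing extraneous is added; the respondent's own
    figure is only converted from a weekly to a monthly basis.
    """
    return round(weekly_income * WEEKS_PER_YEAR / MONTHS_PER_YEAR, 2)


# A respondent wrote 300 per week where monthly income was requested.
print(weekly_to_monthly(300))  # 1300.0
```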
a-Field Editing
In large projects, field supervisors are often responsible for conducting preliminary field
edits. The purpose of field editing the same day as the interview is to catch technical
omissions (such as a blank page), check legibility of the handwriting, and clarify
responses that are logically or conceptually inconsistent. If daily field editing is
conducted, a supervisor who edits completed questionnaires will frequently be able to
question the interviewers, who may be able to recall the interview well enough to correct
any problems. The number of “no answers” or incomplete answers can be reduced with a
rapid follow-up stimulated by a field edit. The daily edit also allows fieldworkers to
re-contact the respondent to fill in omissions before the situation has changed. The field edit
may also indicate the need for further training of interviewers.
b-In-House Editing
Although almost simultaneous editing in the field is highly desirable, in many situations
(particularly with mail questionnaires), early reviewing of the data is not possible. In-
house editing rigorously investigates the results of data collection.
c-Editing for Consistency:
The in-house editor’s task is to ensure that inconsistent or contradictory responses are
adjusted and that answers will not be a problem for coders and keyboard punchers.
Consider the situation in which a telephone interviewer has been instructed to interview
only registered voters, and registration requires voters to be at least 18 years old. If the
editor’s review of a questionnaire indicates that the respondent was only 17 years of age, the editor’s task is to
eliminate this obviously incorrect sampling unit. Thus, in this example, the editor’s job is
to make sure that the sampling unit is consistent with the objectives of the study.
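The eligibility check in the voter example can be sketched as a simple filter. The field names (`id`, `age`) and the sample questionnaires are hypothetical; only the 18-year criterion comes from the example.

```python
MIN_AGE = 18  # sampling criterion: registered voters must be at least 18


def is_eligible(questionnaire: dict) -> bool:
    """True if the respondent meets the study's sampling criterion."""
    return questionnaire["age"] >= MIN_AGE


# Hypothetical questionnaires; field names are illustrative only.
questionnaires = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 17},  # ineligible sampling unit: the editor eliminates it
    {"id": 3, "age": 52},
]

kept = [q for q in questionnaires if is_eligible(q)]
eliminated = [q["id"] for q in questionnaires if not is_eligible(q)]
print([q["id"] for q in kept], eliminated)  # [1, 3] [2]
```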
Editing requires checking for logically consistent responses. The in-house editor must
determine if the answers given by a respondent to one question are consistent with those
given to other, related questions. Many surveys utilize filter questions or skip questions
that direct the sequence of questions, depending upon the respondent’s answer. In some cases
the respondent will have answered a sequence of questions that should not have been
asked. The editor should adjust these answers, usually to “no answer” or “inapplicable,” so
that the responses will be consistent.
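A skip-pattern edit of this kind might look like the sketch below. It assumes a hypothetical layout in which `q5` is the filter question ("Do you own an automobile?") and `q6` and `q7` should only be asked of owners; the question numbers are illustrative, not from any real questionnaire.

```python
INAPPLICABLE = "inapplicable"

# Hypothetical layout: q5 is the filter question; q6 and q7 apply only when q5 == "yes".
FILTER_QUESTION = "q5"
DEPENDENT_QUESTIONS = ("q6", "q7")


def enforce_skip_pattern(answers: dict) -> dict:
    """Return a copy with answers to questions that should have been skipped set to 'inapplicable'."""
    edited = dict(answers)  # leave the original record untouched
    if edited.get(FILTER_QUESTION) == "no":
        for q in DEPENDENT_QUESTIONS:
            if q in edited:
                edited[q] = INAPPLICABLE
    return edited


raw = {"q5": "no", "q6": "yes", "q7": 1200}  # non-owner who answered the insurance questions
edited = enforce_skip_pattern(raw)
print(edited)  # {'q5': 'no', 'q6': 'inapplicable', 'q7': 'inapplicable'}
```

Working on a copy keeps the raw questionnaire data available in case the edit later needs to be reviewed.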
e-Item Non-response:
Item non-response is the technical term for an unanswered question on an otherwise complete questionnaire.
Specific decision rules for handling this problem should be meticulously outlined in the
editorial instructions. In many situations the decision rule will be to do nothing with the
unanswered question: the editor merely indicates an item non-response by writing a
message instructing the coder to record a “missing value” or blank as the response.
However, when a response is needed, the editor uses a plug value. The decision
rule may be to “plug in” an average or neutral value in each case of missing data. For a blank
response to an interval-scale item with a mid-point, one rule is to assign the mid-point of the
scale as the response to that particular item. Another way is to assign to the item the mean
value of the responses of all those who have responded to that particular item. Another choice
is to give the item the mean of this particular respondent’s responses to all other questions
measuring the same variable. Another decision rule may be to alternate the choice of the response
categories used as plug values (e.g., “yes” the first time, “no” the second time, “yes” the third time,
and so on).
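The four plug-value rules above can be sketched as small functions. This is an illustration, not a prescribed procedure: missing answers are assumed to be stored as `None`, and the function names are hypothetical.

```python
from itertools import cycle
from statistics import mean


def plug_midpoint(scale_min: int, scale_max: int) -> float:
    """Rule 1: assign the mid-point of the interval scale."""
    return (scale_min + scale_max) / 2


def plug_item_mean(item_responses: list) -> float:
    """Rule 2: mean response of everyone who answered this item (None = missing)."""
    return mean(r for r in item_responses if r is not None)


def plug_respondent_mean(own_responses: list) -> float:
    """Rule 3: this respondent's mean on the other items measuring the same variable."""
    return mean(r for r in own_responses if r is not None)


def plug_alternating(responses: list, categories=("yes", "no")) -> list:
    """Rule 4: fill successive blanks by alternating categories: yes, no, yes, ..."""
    choices = cycle(categories)
    return [next(choices) if r is None else r for r in responses]


print(plug_midpoint(1, 5))                    # 3.0 on a 1-5 scale
print(plug_item_mean([4, None, 2, 5, 3]))     # 3.5
print(plug_respondent_mean([4, 5, None, 3]))  # 4
print(plug_alternating(["yes", None, None, "no", None]))
# ['yes', 'yes', 'no', 'no', 'yes']
```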
The editor must also decide whether or not an entire questionnaire is “usable.” When a
questionnaire has too many (say 25%) answers missing, it may not be suitable for the
planned data analysis. In such a situation the editor simply records the fact that a particular
incomplete questionnaire has been dropped from the sample.
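A usability check along these lines might look as follows. It takes the "say 25%" figure as an illustrative cut-off and assumes that a questionnaire reaching that share of missing answers is dropped; the questionnaire IDs and answers are made up.

```python
MAX_MISSING_SHARE = 0.25  # illustrative cut-off from the "say 25%" in the text


def is_usable(answers: list) -> bool:
    """Usable if fewer than 25% of the answers are missing (None)."""
    missing = sum(1 for a in answers if a is None)
    return missing / len(answers) < MAX_MISSING_SHARE


sample = {
    "Q001": [1, 2, 3, 4, 5, 1, 2, 3],
    "Q002": [1, None, None, 4, None, 1, 2, 3],  # 3 of 8 missing (37.5%)
    "Q003": [1, 2, None, 4, 5, 1, 2, 3],        # 1 of 8 missing (12.5%)
}

usable = {qid: answers for qid, answers in sample.items() if is_usable(answers)}
dropped = [qid for qid in sample if qid not in usable]
print("Dropped from the sample:", dropped)  # Dropped from the sample: ['Q002']
```

Keeping the `dropped` list is the programmatic equivalent of the editor recording which questionnaires were removed.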
3-Coding
a-Code Construction
When the question has a fixed-alternative (closed ended) format, the number of categories
requiring codes is determined during the questionnaire design stage. The codes 8 and 9 are
conventionally given to “don’t know” (DK) and “no answer” (NA) respectively. However,
many computer programs recognize a blank field or a certain character symbol, such
as a period (.), as indicating a missing value (no answer).
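The missing-value conventions above can be sketched as a field parser that treats codes 8 and 9, a blank, or a period as missing. This assumes the item has no more than seven substantive categories (otherwise 8 and 9 would clash with real answers), and it collapses DK and NA into one `None` value; a study that analyses "don't know" separately would keep them apart.

```python
DONT_KNOW = 8               # conventional code for "don't know" (DK)
NO_ANSWER = 9               # conventional code for "no answer" (NA)
MISSING_SYMBOLS = {"", "."}  # blank field or a period


def parse_code(field: str):
    """Turn one raw coded field into an integer code, or None if it is missing."""
    field = field.strip()
    if field in MISSING_SYMBOLS:
        return None
    code = int(field)
    if code in (DONT_KNOW, NO_ANSWER):
        return None
    return code


raw_fields = ["1", " 3", "8", "9", "", "."]
print([parse_code(f) for f in raw_fields])  # [1, 3, None, None, None, None]
```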
There are two basic rules for code construction. First, the coding categories should be
exhaustive – that is, coding categories should be provided for all subjects or objects or
responses. With a categorical variable such as gender, making categories exhaustive is not
a problem. However, when the response represents a small number of subjects or when the
responses might be categorized in a class not typically found, there may be a problem.
Second, the coding categories should also be mutually exclusive and independent. This
means that there should be no overlap between the categories, to ensure that a subject or
response can be placed in only one category. This frequently requires that an “other” code
category be included, so that the categories are all inclusive and mutually exclusive. For
example, managerial span of control might be coded 1, 2, 3, 4, and “5 or more.” The “5 or
more” category ensures everyone a place in a category.
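The span-of-control scheme can be written as a coding function to show both rules at work: every manager with at least one subordinate receives a code (exhaustive), and each count maps to exactly one code (mutually exclusive). The function name is hypothetical.

```python
def code_span_of_control(n_subordinates: int) -> int:
    """Code span of control as 1, 2, 3, 4, or 5 (= "5 or more")."""
    if n_subordinates < 1:
        raise ValueError("span of control must be at least 1")
    return min(n_subordinates, 5)  # 5 absorbs every count of five or more


print([code_span_of_control(n) for n in [1, 2, 3, 4, 5, 12]])  # [1, 2, 3, 4, 5, 5]
```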
When a questionnaire is highly structured, pre-coding of the categories typically occurs
before the data are collected. In many cases, such as when researchers are using open-
ended response questions, a framework for classifying responses to questions cannot be
established before data collection. This situation requires some careful thought concerning
the determination of categories after the editing process has been completed. This is called
post-coding or simply coding. The purpose of coding open-ended response questions is
to reduce the large number of individual responses to a few general categories of answers
that can be assigned numerical scores. Code construction in these situations necessarily
must reflect the judgment of the researcher. A major objective in the code-building process is
to accurately transfer the meaning from written answers to numeric codes.
b-Code Book
A book that identifies each variable in a study and its position in the data matrix is called
a code book. The book is used to identify a variable’s description, code name, and field.
Here is a rough sample:

Q/V No.      Field/Col. No.   Code values
--           1-5              Study number
--           6                City: 1 = Lahore, 2 = Rawalpindi, 3 = Karachi
--           7-9              Interview No.
Gender       10               1 = Male, 0 = Female
Age          11-12            Actual
Education    13               1 = Non literate, 2 = Literate
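To show how a code book locates each variable in the data matrix, the sketch below reads one fixed-width data line using the column positions from the rough sample. The data line itself is made up, and the dictionary and function names are hypothetical.

```python
# Column positions (1-based, inclusive) from the sample code book.
CODE_BOOK = {
    "study_number": (1, 5),
    "city": (6, 6),
    "interview_no": (7, 9),
    "gender": (10, 10),
    "age": (11, 12),
    "education": (13, 13),
}
CITY = {1: "Lahore", 2: "Rawalpindi", 3: "Karachi"}
GENDER = {1: "Male", 0: "Female"}
EDUCATION = {1: "Non literate", 2: "Literate"}
RECORD_LENGTH = 13


def parse_record(line: str) -> dict:
    """Split one fixed-width data line into named integer variables."""
    if len(line) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} columns, got {len(line)}")
    # Python slices are 0-based and end-exclusive, so columns a..b become [a-1:b].
    return {name: int(line[start - 1:end]) for name, (start, end) in CODE_BOOK.items()}


row = parse_record("0004220451342")  # hypothetical data line
print(row)
# {'study_number': 42, 'city': 2, 'interview_no': 45, 'gender': 1, 'age': 34, 'education': 2}
print(CITY[row["city"]], GENDER[row["gender"]], EDUCATION[row["education"]])
# Rawalpindi Male Literate
```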
c-Production Coding
Transferring the data from the questionnaire or data collection form after the data have
been collected is called production coding. Depending upon the nature of the data
collection form, codes may be written directly on the instrument or on a special coding
sheet.
4-Data Entry
Manual keying (typing the responses into the computer) is the method normally used. Alternatively,
scanner sheets used for data collection may allow the responses to be entered directly into the
computer without manually keying in the data. In studies involving highly structured paper
questionnaires, an optical scanning system may be used to read the material directly into the
computer’s memory. Optical scanners process the mark-sensed questionnaires and store the
answers in a file.
5-Cleaning Data
The final stage in the coding process is the error checking and verification, or “data cleaning”
stage, which is a check to make sure that all codes are legitimate. Accuracy is extremely
important when coding data. Errors made when coding or entering data into a computer threaten
the validity of measures and cause misleading results. A researcher who has a perfect sample,
perfect measures, and no errors in gathering data, but who makes errors in the coding process or
in entering data into a computer, can ruin a whole research project.
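A basic cleaning pass that checks whether all codes are legitimate can be sketched as follows. The legal codes come from the sample code book; the age range of 18 to 99 is a hypothetical plausibility rule that a researcher would set for the particular study, and the records are invented.

```python
# Legitimate codes for the categorical variables in the sample code book.
LEGAL_CODES = {
    "city": {1, 2, 3},
    "gender": {0, 1},
    "education": {1, 2},
}
AGE_RANGE = range(18, 100)  # hypothetical plausible range chosen by the researcher


def find_illegal_codes(records: list) -> list:
    """Return (record index, variable, value) for every code that is not legitimate."""
    problems = []
    for i, rec in enumerate(records):
        for var, legal in LEGAL_CODES.items():
            if rec[var] not in legal:
                problems.append((i, var, rec[var]))
        if rec["age"] not in AGE_RANGE:
            problems.append((i, "age", rec["age"]))
    return problems


records = [
    {"city": 2, "gender": 1, "education": 2, "age": 34},  # clean
    {"city": 4, "gender": 1, "education": 2, "age": 34},  # city 4 is not in the code book
    {"city": 1, "gender": 7, "education": 1, "age": 5},   # gender 7 and age 5 are errors
]
for problem in find_illegal_codes(records):
    print(problem)
# (1, 'city', 4)
# (2, 'gender', 7)
# (2, 'age', 5)
```

Each flagged entry points back to a specific record and variable, so it can be checked against the original questionnaire rather than silently corrected.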