
Data Preparation: March 6, 2010

The document outlines the key steps in preparing data for analysis: 1) checking questionnaires for completeness and quality, 2) editing unsatisfactory responses by returning to respondents or assigning missing values, 3) coding responses numerically, 4) transcribing data electronically, 5) cleaning data by checking for errors and inconsistencies, 6) statistically adjusting the data through weighting, variable respecification, and standardization to make it representative and suitable for analysis.

Uploaded by

Atisha Juneja
Copyright
© Attribution Non-Commercial (BY-NC)

Data Preparation

March 6, 2010
Data Preparation Process
Prepare Preliminary Plan of Data Analysis

Check Questionnaire

Edit

Code

Transcribe

Clean Data

Statistically Adjust the Data

Select Data Analysis Strategy


Questionnaire Checking
A questionnaire returned from the field may be unacceptable
for several reasons.
– Parts of the questionnaire may be incomplete.
– The pattern of responses may indicate that the
respondent did not understand or follow the instructions.
– The responses show little variance.
– One or more pages are missing.
– The questionnaire is received after the preestablished
cutoff date.
– The questionnaire is answered by someone who does not
qualify for participation.
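Most of these reasons can be checked mechanically as questionnaires come back from the field. The sketch below is a minimal Python example under stated assumptions. The `Questionnaire` fields, the 20% missing-answer threshold and the minimum-variance threshold are hypothetical choices. The "did not follow the instructions" check is left out because it depends on each questionnaire's own skip patterns.

```python
from dataclasses import dataclass
from datetime import date
from statistics import pvariance


@dataclass
class Questionnaire:
    respondent_id: str
    answers: list          # one entry per question; None means unanswered
    pages_received: int
    received_on: date
    qualified: bool        # passed the screening criteria


def rejection_reasons(q, expected_pages, cutoff,
                      max_missing_share=0.2, min_variance=0.1):
    """Return the reasons (if any) why a returned questionnaire is unacceptable.

    Both thresholds are hypothetical; a real study would fix them in advance.
    """
    reasons = []
    missing = sum(a is None for a in q.answers)
    if missing / len(q.answers) > max_missing_share:
        reasons.append("incomplete")
    answered = [a for a in q.answers if a is not None]
    if len(answered) > 1 and pvariance(answered) < min_variance:
        reasons.append("little variance")      # e.g. the same rating everywhere
    if q.pages_received < expected_pages:
        reasons.append("missing pages")
    if q.received_on > cutoff:
        reasons.append("after cutoff")
    if not q.qualified:
        reasons.append("unqualified respondent")
    return reasons


straight_liner = Questionnaire("001", [4, 4, 4, 4, 4], 4, date(2010, 3, 1), True)
print(rejection_reasons(straight_liner, expected_pages=4, cutoff=date(2010, 3, 5)))
```

A respondent who gives the same rating to every item is flagged for little variance even though the questionnaire is otherwise complete.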
Editing
Treatment of Unsatisfactory Results
– Returning to the Field – The questionnaires with
unsatisfactory responses may be returned to the field,
where the interviewers recontact the respondents.
– Assigning Missing Values – If returning the questionnaires
to the field is not feasible, the editor may assign missing
values to unsatisfactory responses.
– Discarding Unsatisfactory Respondents – In this
approach, the respondents with unsatisfactory responses
are simply discarded.
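Returning to the field is a fieldwork decision, but the other two treatments are simple data operations. Here is a minimal Python sketch, assuming responses are stored as a dictionary per respondent and the editor has flagged unsatisfactory answers as (respondent, question) pairs. Using `None` as the missing-value marker is a hypothetical choice.

```python
MISSING = None  # hypothetical missing-value marker


def assign_missing(responses, unsatisfactory):
    """Replace each flagged answer with a missing value; keep every respondent.

    responses: {respondent_id: {question: answer}}
    unsatisfactory: set of (respondent_id, question) pairs flagged by the editor
    """
    return {
        rid: {qn: (MISSING if (rid, qn) in unsatisfactory else ans)
              for qn, ans in answers.items()}
        for rid, answers in responses.items()
    }


def discard_respondents(responses, unsatisfactory):
    """Drop every respondent who has at least one flagged answer."""
    bad = {rid for rid, _ in unsatisfactory}
    return {rid: answers for rid, answers in responses.items() if rid not in bad}


responses = {"001": {"q1": 5, "q2": 3}, "002": {"q1": 9, "q2": 4}}
flags = {("002", "q1")}
```

With the same flags, assigning missing values keeps respondent 002 with one blank answer, while discarding removes that respondent entirely.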
Coding
Coding means assigning a code, usually a number, to each possible
response to each question. The code includes an indication of the
column position (field) and data record it will occupy.

Coding Questions

• Fixed field codes, which mean that the number of records for each
respondent is the same and the same data appear in the same
column(s) for all respondents, are highly desirable.
• If possible, standard codes should be used for missing data. Coding of
structured questions is relatively simple, since the response options
are predetermined.
• In questions that permit a large number of responses, each possible
response option should be assigned a separate column.
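These rules can be illustrated with a small Python sketch. It assumes a hypothetical standard missing-data code of 9, a hypothetical structured gender question, and a hypothetical multiple-response media question. Every respondent receives the same number of fields in the same order, and each media option gets its own column.

```python
MISSING_CODE = 9  # hypothetical standard code for missing data

# Structured question: response options are predetermined, so coding is a lookup.
GENDER_CODES = {"male": 1, "female": 2}

# Question permitting many responses: one column per option (1 = mentioned, 0 = not).
MEDIA_OPTIONS = ["tv", "radio", "newspaper", "internet"]


def code_respondent(gender, media_mentioned):
    """Return a fixed-length list of codes with the same layout for every respondent."""
    row = [GENDER_CODES.get(gender, MISSING_CODE)]
    if media_mentioned is None:  # the question was not answered at all
        row += [MISSING_CODE] * len(MEDIA_OPTIONS)
    else:
        row += [1 if opt in media_mentioned else 0 for opt in MEDIA_OPTIONS]
    return row


print(code_respondent("female", {"tv", "internet"}))
```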
Coding
Guidelines for coding unstructured questions:
• Category codes should be mutually exclusive and collectively
exhaustive.
• Only a few (10% or less) of the responses should fall into the
“other” category.
• Category codes should be assigned for critical issues even if
no one has mentioned them.
• Data should be coded to retain as much detail as possible.
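In practice trained coders assign categories to open-ended answers. The Python sketch below substitutes simple keyword matching, purely as an assumption for illustration, to show how the guidelines translate into checks. Each answer receives exactly one code (mutually exclusive). Unmatched answers fall into "other" (collectively exhaustive). A critical-issue category exists even if nobody mentions it. The share of "other" can then be compared with the 10% guideline. The question, categories, keywords and code 99 are all hypothetical.

```python
from collections import Counter

# Hypothetical codes for an open-ended "Why did you choose this store?" question.
CATEGORIES = {
    1: ("price", ["cheap", "price", "sale"]),
    2: ("location", ["close", "near", "parking"]),
    3: ("service", ["staff", "helpful", "friendly"]),
    4: ("safety", ["safe", "security"]),  # critical issue, coded even if unmentioned
}
OTHER = 99


def code_open_ended(text):
    """Return exactly one code per answer; the first matching category wins."""
    lowered = text.lower()
    for code, (_, keywords) in CATEGORIES.items():
        if any(k in lowered for k in keywords):
            return code
    return OTHER


def other_share(answers):
    """Fraction of answers coded as 'other'; the guideline asks for 10% or less."""
    counts = Counter(code_open_ended(a) for a in answers)
    return counts[OTHER] / len(answers)


answers = ["Great sale prices", "It's near my office", "Friendly staff", "No reason"]
```

Here one answer in four lands in "other" (25%), which exceeds the 10% guideline and suggests the category set needs refining.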
Codebook
A codebook contains coding instructions and the necessary
information about variables in the data set. A codebook
generally contains the following information:
• column number
• record number
• variable number
• variable name
• question number
• instructions for coding
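A codebook maps naturally onto a small record type with one field per item in the list above. The Python sketch below is illustrative only. The entries, variable names and coding instructions are hypothetical. The point is that coders and analysts look up the same definition for each variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodebookEntry:
    column: str               # column position(s), e.g. "1-3"
    record: int
    variable_number: int
    variable_name: str
    question_number: str
    coding_instructions: str


# Hypothetical entries for illustration only.
CODEBOOK = [
    CodebookEntry("1-3", 1, 1, "ID", "none", "Respondent ID, 001 to 999"),
    CodebookEntry("4", 1, 2, "Record", "none", "Record number"),
    CodebookEntry("26", 1, 3, "Rating1", "1", "Rating 1 to 7; 9 = missing"),
]


def lookup(codebook, variable_name):
    """Return the codebook entry for a variable, or raise KeyError if absent."""
    for entry in codebook:
        if entry.variable_name == variable_name:
            return entry
    raise KeyError(variable_name)
```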
Coding Questionnaires
• The respondent code and the record number appear on each
record in the data.
• The first record contains the additional codes: project code,
interviewer code, date and time codes, and validation code.
• It is a good practice to insert blanks between parts.
An Illustrative Computer File
                          Fields (Column Numbers)
Records        1-3   4   5-6   7-8   ...   26 ... 35   ...   77

Record 1       001   1   31    01          6544234553        5
Record 11      002   1   31    01          5564435433        4
Record 21      003   1   31    01          4655243324        4
Record 31      004   1   31    01          5463244645        6
Record 2701    271   1   31    55          6652354435        5
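Because fixed-field coding puts the same data in the same columns for every respondent, a record can be read back by slicing column ranges. The Python sketch below uses the column numbers from the illustrative file and assumes the ten-digit block occupies columns 26 to 35. The field names and the 80-column record width are hypothetical labels; the file itself shows only column numbers.

```python
# Column layout read off the illustrative file (1-based, inclusive column numbers).
# Field names are hypothetical labels; the file shows only the column numbers.
LAYOUT = {
    "respondent": (1, 3),
    "record": (4, 4),
    "project": (5, 6),
    "interviewer": (7, 8),
    "ratings": (26, 35),
    "col77": (77, 77),
}


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-field record into named fields (still strings)."""
    return {name: line[start - 1:end] for name, (start, end) in layout.items()}


def build_record(fields, layout=LAYOUT, width=80):
    """Inverse of parse_record: write each value into its columns, blanks elsewhere."""
    chars = [" "] * width
    for name, (start, end) in layout.items():
        value = fields[name]
        if len(value) != end - start + 1:
            raise ValueError(f"{name!r} must occupy exactly {end - start + 1} columns")
        chars[start - 1:end] = value
    return "".join(chars)


row = build_record({"respondent": "001", "record": "1", "project": "31",
                    "interviewer": "01", "ratings": "6544234553", "col77": "5"})
fields = parse_record(row)
ratings = [int(d) for d in fields["ratings"]]  # one digit per column, 26 to 35
```

Keeping the parsed values as strings until the last step preserves leading zeros in codes such as "001" and "01".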
Data Transcription
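Raw Data are transcribed by one of the following methods:
– CATI/CAPI
– Keypunching via CRT Terminal, followed by Verification: Correct Keypunching Errors
– Mark Sense Forms
– Optical Scanning
– Computerized Sensory Analysis
The data are then held in Computer Memory, Disks, or Magnetic Tapes, yielding the Transcribed Data.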
Data Cleaning
Consistency Checks

Consistency checks identify data that are out of range,
logically inconsistent, or have extreme values.
– Computer packages like SPSS, SAS, Excel, and Minitab can
be programmed to identify out-of-range values for each
variable and print out the respondent code, variable code,
variable name, record number, column number, and
out-of-range value.
– Extreme values should be closely examined.
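The same checks a statistics package would run can be sketched in a few lines of Python. The permitted ranges and the logical rule below are hypothetical examples of what a codebook might specify. The out-of-range report lists the respondent code, variable name and offending value, like the package printout described above.

```python
# Hypothetical permitted ranges, as a codebook might specify them.
VALID_RANGES = {"rating1": (1, 7), "rating2": (1, 7), "hours_tv": (0, 24)}


def out_of_range(rows, valid_ranges=VALID_RANGES):
    """List (respondent code, variable name, value) for every out-of-range value."""
    problems = []
    for row in rows:
        for var, (lo, hi) in valid_ranges.items():
            if not lo <= row[var] <= hi:
                problems.append((row["id"], var, row[var]))
    return problems


def logically_inconsistent(rows):
    """Hypothetical rule: someone who says they never watch TV cannot report TV hours."""
    return [row["id"] for row in rows if row["watches_tv"] == 0 and row["hours_tv"] > 0]


rows = [
    {"id": "001", "rating1": 5, "rating2": 3, "hours_tv": 2, "watches_tv": 1},
    {"id": "002", "rating1": 8, "rating2": 3, "hours_tv": 0, "watches_tv": 0},
    {"id": "003", "rating1": 4, "rating2": 0, "hours_tv": 3, "watches_tv": 0},
]
```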
Data Cleaning
Treatment of Missing Responses
• Substitute a Neutral Value – A neutral value, typically the mean
response to the variable, is substituted for the missing responses.
• Substitute an Imputed Response – The respondent's pattern of
responses to other questions is used to impute or calculate a
suitable response to the missing questions.
• In casewise deletion, cases, or respondents, with any missing
responses are discarded from the analysis.
• In pairwise deletion, instead of discarding all cases with any missing
values, the researcher uses only the cases or respondents with
complete responses for each calculation.
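The four treatments can be contrasted on a tiny data set in which `None` marks a missing response. This is a minimal Python sketch. The "imputed response" here is deliberately crude: it uses the respondent's own mean on the other items as a stand-in for a proper imputation model. The data are hypothetical.

```python
from math import sqrt


def mean_substitution(values):
    """Neutral value: replace each missing response with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    m = sum(observed) / len(observed)
    return [m if v is None else v for v in values]


def impute_from_own_answers(row, var):
    """Crude imputation stand-in: the respondent's mean on the other answered items."""
    others = [v for k, v in row.items() if k != var and v is not None]
    return sum(others) / len(others)


def casewise(rows):
    """Casewise deletion: keep only respondents with no missing responses."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pairwise_correlation(rows, a, b):
    """Pairwise deletion: use every respondent who answered both a and b."""
    pairs = [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


rows = [
    {"q1": 4, "q2": 5, "q3": 3},
    {"q1": 2, "q2": None, "q3": 2},
    {"q1": None, "q2": 3, "q3": 4},
    {"q1": 5, "q2": 4, "q3": None},
    {"q1": 3, "q2": 2, "q3": 3},
]
```

Casewise deletion leaves only two of the five respondents. The pairwise correlation of q1 and q3 uses three, because only the respondents missing q1 or q3 are excluded from that calculation.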
Statistically Adjusting the Data
Weighting

• In weighting, each case or respondent in the
database is assigned a weight to reflect its
importance relative to other cases or respondents.
• Weighting is most widely used to make the sample
data more representative of a target population on
specific characteristics.
• Yet another use of weighting is to adjust the sample
so that greater importance is attached to
respondents with certain characteristics.
Statistically Adjusting the Data
Use of Weighting for Representativeness

Years of Education          Sample        Population     Weight
                            Percentage    Percentage

Elementary School
  0 to 7 years                2.49           4.23         1.70
  8 years                     1.26           2.19         1.74

High School
  1 to 3 years                6.39           8.65         1.35
  4 years                    25.39          29.24         1.15

College
  1 to 3 years               22.33          29.42         1.32
  4 years                    15.02          12.01         0.80
  5 to 6 years               14.94           7.36         0.49
  7 years or more            12.18           6.90         0.57

Totals                      100.00         100.00
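Every weight in the table equals the population percentage divided by the sample percentage, rounded to two decimals. For example, 4.23 / 2.49 ≈ 1.70. The Python sketch below reproduces the weights from the table's own figures. It also shows a hypothetical `weighted_mean` helper for applying a weight to each respondent in later calculations.

```python
# (category, sample %, population %) copied from the table above.
EDUCATION = [
    ("Elementary: 0 to 7 years", 2.49, 4.23),
    ("Elementary: 8 years", 1.26, 2.19),
    ("High school: 1 to 3 years", 6.39, 8.65),
    ("High school: 4 years", 25.39, 29.24),
    ("College: 1 to 3 years", 22.33, 29.42),
    ("College: 4 years", 15.02, 12.01),
    ("College: 5 to 6 years", 14.94, 7.36),
    ("College: 7 years or more", 12.18, 6.90),
]


def representativeness_weights(table):
    """Weight for each category = population percentage / sample percentage."""
    return {name: pop / sample for name, sample, pop in table}


def weighted_mean(values, weights):
    """Mean in which each respondent counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


weights = representativeness_weights(EDUCATION)
for name, w in weights.items():
    print(f"{name:28s} {w:.2f}")
```

Under-represented groups (weight above 1) count more and over-represented groups count less. After weighting, the sample percentages reproduce the population distribution.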
Statistically Adjusting the Data
Variable Respecification

• Variable respecification involves the transformation of data
to create new variables or modify existing variables.
• For example, the researcher may create new variables that are
composites of several other variables.
• Dummy variables are used for respecifying categorical
variables. The general rule is that to respecify a categorical
variable with K categories, K-1 dummy variables are needed.
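A composite variable is one form of respecification: a new variable is computed from several existing ones. The Python sketch below assumes a hypothetical set of three rating items that together measure store image. It takes their mean as the composite; a sum would work equally well.

```python
# Hypothetical rating items (1-7 scale) that together measure store image.
IMAGE_ITEMS = ["quality", "variety", "service"]


def add_composite(rows, items, name):
    """Append a new variable equal to the mean of several existing variables."""
    for row in rows:
        row[name] = sum(row[i] for i in items) / len(items)
    return rows


rows = add_composite([{"quality": 6, "variety": 5, "service": 7},
                      {"quality": 2, "variety": 4, "service": 3}],
                     IMAGE_ITEMS, "image")
```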
Statistically Adjusting the Data
Variable Respecification

Product Usage     Original          Dummy Variable Code
Category          Variable Code     X1    X2    X3

Nonusers          1                 1     0     0
Light users       2                 0     1     0
Medium users      3                 0     0     1
Heavy users       4                 0     0     0

Note that X1 = 1 for nonusers and 0 for all others. Likewise, X2 = 1 for
light users and 0 for all others, and X3 = 1 for medium users and 0 for all
others. In analyzing the data, X1, X2, and X3 are used to represent all
user/nonuser groups.
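The table follows the K − 1 rule. Four usage categories need three dummies, and heavy users act as the reference category coded 0, 0, 0. A minimal Python sketch that reproduces the table:

```python
# Original codes for the product-usage variable, as in the table above.
USAGE_CODES = {1: "Nonusers", 2: "Light users", 3: "Medium users", 4: "Heavy users"}


def dummy_code(code, categories=(1, 2, 3, 4), reference=4):
    """K categories -> K-1 dummies [X1, X2, X3]; the reference category is all zeros."""
    return [1 if code == c else 0 for c in categories if c != reference]


for code, label in USAGE_CODES.items():
    print(f"{label:13s} {code}  {dummy_code(code)}")
```

Leaving one category out avoids redundancy. If all four dummies were used, their sum would always be 1, so any one of them could be computed from the other three.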
Statistically Adjusting the Data
Scale Transformation and Standardization

Scale transformation involves a manipulation of scale values
to ensure comparability with other scales or otherwise make
the data suitable for analysis.

A more common transformation procedure is
standardization. Standardized scores, Zi, may be obtained as:

$$Z_i = \frac{X_i - \bar{X}}{s_x}$$

where \bar{X} is the mean of X and s_x is its standard deviation.
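The formula translates directly into code. This Python sketch assumes s_x is the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes. With `statistics.pstdev` the population version would be used instead.

```python
from statistics import mean, stdev


def standardize(xs):
    """Z_i = (X_i - mean of X) / s_x, with s_x the sample standard deviation."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
```

The standardized scores have mean 0 and standard deviation 1, so variables measured on different scales become directly comparable.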
