Session 2 "Getting Started": Core Skills For Data Processing
ORSC 2004 - Internal Training
Objective
Various data formats
Single Card data
Serial Number/ Respondent ID
1000290022 00061860200310041324 040800100000000000 1.3979167 R1
Record length
The respondent ID is the unique ID for the record
The number of lines in the file equals the sample size
Maximum record length = 32,767 (the largest value of a 16-bit signed integer)
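The three rules above can be checked mechanically. The following Python sketch is illustrative only: the function name `check_single_card` and the ID column positions passed to it are assumptions, not part of Quantum, and the sample lines are made up.

```python
from collections import Counter

MAX_RECORD_LENGTH = 32767  # limit stated on the slide


def check_single_card(lines, id_start, id_end):
    """Summarise a single-card file held as a list of text lines.

    id_start and id_end are the 1-based, inclusive column positions of
    the respondent ID (hypothetical; they depend on the real layout).
    """
    records = [line.rstrip("\n") for line in lines if line.strip()]
    ids = [rec[id_start - 1:id_end] for rec in records]
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    too_long = [i for i, rec in zip(ids, records)
                if len(rec) > MAX_RECORD_LENGTH]
    return {
        "sample_size": len(records),  # one line per respondent
        "duplicate_ids": duplicates,  # should be empty
        "too_long": too_long,         # should be empty
    }


# Made-up data: the ID "0002" appears twice on purpose.
sample = [
    "0001 12345",
    "0002 54321",
    "0002 11111",
]
report = check_single_card(sample, 1, 4)
```

A non-empty `duplicate_ids` list means the line count no longer equals the sample size, which is usually the first thing to check before tabulating.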
Multicard data
00048011 01 04070917213204070917374232570237550
000480202837525750 111020744t242-345235849862468-2486
R1 0004803 1 111-4 208050505050810 245248609824096
0004804001010 55334333333433453145555413155 646890
0004805 2115245444433353443442343435514334333 425924
00070011 01 040709173010040709175624 245982496
000700201395277173 231019074646464060
R2 0007003 1 112-7 105080803050308 426246
0007004030707 33543553245533535255452355555553
0007005 21113123322&2133222122431232323212313
Each respondent has more than one line of information; each line is called a “card”
In general a card is 99 characters long, but cards can also be longer than 99 characters
The unique identifier in this data format is Respondent ID + Card ID
Maximum record length = 32,767 (size of integer). The record length in this
case is the sum of the lengths of all of the respondent's cards
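To make the Respondent ID + Card ID key concrete, here is a minimal Python sketch that groups card lines by respondent and sums their lengths. It assumes the respondent ID sits in columns 1-5 and the card ID in columns 6-7, which is how the sample above appears to be laid out. The function names and the short sample lines are hypothetical.

```python
from collections import defaultdict


def group_cards(lines, ser=(1, 5), crd=(6, 7)):
    """Group card lines by respondent ID.

    ser and crd are 1-based, inclusive column ranges for the respondent
    ID and the card ID (assumed positions; adjust to the real layout).
    Returns {respondent_id: {card_id: line}}.
    """
    respondents = defaultdict(dict)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        serial = line[ser[0] - 1:ser[1]]
        card = line[crd[0] - 1:crd[1]]
        if card in respondents[serial]:
            raise ValueError(f"duplicate card {card} for respondent {serial}")
        respondents[serial][card] = line
    return dict(respondents)


def total_record_length(cards):
    """Sum of the lengths of all cards belonging to one respondent."""
    return sum(len(line) for line in cards.values())


# Made-up lines: respondent 00048 has cards 01 and 02, 00070 has card 01.
lines = ["0004801 1 01", "0004802 2837", "0007001 1 01"]
grouped = group_cards(lines)
```

Here `total_record_length(grouped["00048"])` is the figure that has to stay within the 32,767 limit.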
Quantum data format
Quantum can handle both single card/ multicard data formats
Introducing Quantum – What does it do?
Check and validate the data
Edit and correct the data
Produce different types of lists and reports of data
Produce new data files
Recode data and produce new variables
Generate tables
Perform Statistical Calculations
Underlying concepts
Quantum consists of 2 phases or sections
Edit section
Tabulation section
Underlying concepts
Edit section
•Data examination
•Data modification
•Data correction
Tables section
•Cross tabulation of data
•Control statements to determine layout
Layout of a table
[Figure: annotated sample table. Labelled parts: table title, project heading, X-break, base title, base size, side headings, frequency, percentage, mean score]
Coding conventions
A Quantum program is a file created using a text editor
The tables section consists of statement types
Coding conventions
A Sample of Quantum Program
/*
/* Here is a comment
/*
tab q5 brk1;c=c115'1';nz
+dsp
Fundamentals and Terminology
Fundamentals
Individual constants
Examples:
Fundamentals
Individual constants
Fundamentals
Numbers
- Whole Numbers
- Real Numbers
Variables/ column referencing
Columns are referred to by their actual position in the data. If you open the data file
in an editor and place the cursor on a data character, the column position is the
cursor position
In a single-card data file, the column position itself is used directly to refer to a
column. For example, c12 refers to column 12 in a single-card data file
In a multicard data file, the column must be referred to in combination with the card
number. The format of the column reference is “cXNN” if there are 9 cards or fewer
and “cXXNN” if there are more than 9 cards, where X refers to the card number and NN
refers to the column position. One-digit column positions must be preceded by “0”,
so column 5 of card 1 is c105
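The naming rule can be written as a small helper. This is a Python sketch for illustration, not Quantum code. The function name `column_ref` is hypothetical, and it assumes column positions 1-99 on a card, as in the cXNN format described above.

```python
def column_ref(column, card=None):
    """Build a column reference such as c12, c105 or c1015.

    column -- column position (on the card, for multicard data)
    card   -- card number, or None for single-card data
    """
    if card is None:
        # Single-card data: the column position is used directly.
        return f"c{column}"
    if not 1 <= column <= 99:
        raise ValueError("column must be between 1 and 99 in this sketch")
    # Multicard data: card number followed by a two-digit column position.
    return f"c{card}{column:02d}"


single = column_ref(12)                 # "c12"
card1_col5 = column_ref(5, card=1)      # "c105"
card10_col15 = column_ref(15, card=10)  # "c1015"
```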
Variables/ column referencing
A series of columns may be considered as either string or numeric and is
referenced as c(m,n) where m is the start column position and n is the
end column position
Examples:
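As an illustration of what c(m,n) selects, here is a minimal Python sketch (not Quantum syntax). The helper names `field` and `numeric_field` are hypothetical, the record is made up, and returning `None` for blank columns is a choice made for this sketch only.

```python
def field(record, m, n=None):
    """Return columns m..n of a record (1-based, inclusive), like c(m,n)."""
    if n is None:
        n = m  # a single column, like cN
    return record[m - 1:n]


def numeric_field(record, m, n=None):
    """Read the same columns as a whole number; blank columns give None."""
    text = field(record, m, n).strip()
    return int(text) if text else None


record = "00048 27 M"
serial = field(record, 1, 5)       # columns 1-5 as a string
age = numeric_field(record, 7, 8)  # columns 7-8 as a number
code = field(record, 10)           # column 10 on its own
```

The same span of columns can be read either way, which is the point of the string/numeric distinction above.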
Describing Data Structure
Data Structure
By default Quantum reads one record (one line) from your data file at a
time. Each record may be up to 100 columns long
Some surveys instead have long records with more than 100 columns of
data, or several cards per respondent. The struct statement describes these structures
Format: struct;options
Data Structure – contd..
Specifying long records
struct;reclen=n
Specifying multicard records (ser= gives the serial number columns, crd= the card type columns)
struct;read=2;ser=c(m,n);crd=c(p,q)
Example: struct;read=2;ser=c(1,4);crd=c80
Data Structure – contd..
When a multi-card set is read, the cards are defined as follows:
Card 1 Columns 101-200
Card 2 Columns 201-300
Card 3 Columns 301-400
Card 4 Columns 401-500
…..
Card 10 Columns 1001-1100
The option max=n is used to define the maximum number of cards in the set
Example:
struct;read=2;ser=c(1,5);crd=c(6,7); max=19
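The card-to-column table follows a simple pattern: card k occupies columns k×100+1 to k×100+100. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, and `max_cards` only mimics the role of max=n for this illustration.

```python
def card_columns(card, max_cards=None):
    """Return (first, last) combined column numbers for a card."""
    if card < 1:
        raise ValueError("card numbers start at 1")
    if max_cards is not None and card > max_cards:
        raise ValueError(f"card {card} exceeds max={max_cards}")
    first = card * 100 + 1
    return first, first + 99


def locate(column):
    """Inverse mapping: (card, position on card) for a combined column."""
    card, pos = divmod(column - 1, 100)
    return card, pos + 1


card1 = card_columns(1)                  # (101, 200)
card10 = card_columns(10, max_cards=19)  # (1001, 1100)
where = locate(1015)                     # card 10, column 15
```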
Data Structure – contd..
Checking the structure of multi-card data sets
Quantum automatically checks for duplicate card types within a serial number
and for adjacent duplicate serial numbers
Not every card has to be present for every respondent in a
multicard data file. Cards that must be present are listed with the req= option
Example:
struct;read=2;ser=c(1,5);crd=c(6,7); max=19;req=1,2
In this example each record must have cards 1 and 2 present. If either or both
are missing, the record is rejected
If you require a series of cards to be present, specify the first and last separated
by a slash
struct;read=2;ser=c(1,5);crd=c(6,7); max=19;req=1/5
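The two req= forms can be mimicked in a few lines of Python to show which records would be rejected. This is a sketch of the rule as described above, not Quantum's own implementation. The function names are hypothetical, and accepting a mix of commas and slashes in one value is an assumption of the sketch.

```python
def parse_req(spec):
    """Expand a req= value such as "1,2" or "1/5" into a set of card numbers."""
    required = set()
    for part in spec.split(","):
        if "/" in part:
            # A slash gives the first and last card of a series.
            first, last = part.split("/")
            required.update(range(int(first), int(last) + 1))
        else:
            required.add(int(part))
    return required


def missing_cards(present, spec):
    """Return the required cards that are absent, in ascending order."""
    return sorted(parse_req(spec) - set(present))


# A respondent with cards 1, 3 and 4 only.
gaps_list = missing_cards([1, 3, 4], "1,2")   # card 2 is missing
gaps_range = missing_cards([1, 3, 4], "1/5")  # cards 2 and 5 are missing
```

A non-empty result corresponds to a record that would be rejected under the matching req= setting.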