Data Management _SKIMS_June_2022
Inaamul Haq
Assistant Professor
Department of Community Medicine
Government Medical College, Srinagar
Steps in Data Analysis
Data Entry
Data Cleaning
Data Exploration
Descriptive Statistics
Inferential Statistics
Learning Objectives
Data Entry
Data Cleaning
The Spreadsheet – Microsoft Excel
Spreadsheet from hell
http://biostat.mc.vanderbilt.edu/wiki/Main/DanielByrne
Spreadsheet from heaven
The Ten Data Entry Commandments
1
The Ten Data Entry Commandments
2
The Ten Data Entry Commandments
3
The Ten Data Entry Commandments
4
The Ten Data Entry Commandments
5
Mother and daughter databases
Household-level data

HousID  Location  Community  HousIncom
1       A         3          1
2       B         1          2
3       C         35         2
4       D         67         1
5       E         2          1
6       F         2          1
5       G         2          1
…       …         …          …

Individual-level data

HousID  PersonID  Diseased  Exposed
1       101       1         1
1       102       2         1
2       201       2         2
2       202       1         2

• Each database has its own unique identifier
• Link these relational databases using a common index identifier
• Merge files when needed
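
The linking and merging described above can be sketched in Python. This is only a minimal illustration using the standard library: the rows are the first four households and the four individuals from the example tables, and the function name `merge_on_household` is hypothetical. In practice a statistical package would usually do the merge.

```python
# Household-level ("mother") records; HousID is the unique identifier.
households = [
    {"HousID": 1, "Location": "A", "Community": 3, "HousIncom": 1},
    {"HousID": 2, "Location": "B", "Community": 1, "HousIncom": 2},
    {"HousID": 3, "Location": "C", "Community": 35, "HousIncom": 2},
    {"HousID": 4, "Location": "D", "Community": 67, "HousIncom": 1},
]

# Individual-level ("daughter") records; PersonID is unique and
# HousID is the common index identifier that links back to the household.
individuals = [
    {"HousID": 1, "PersonID": 101, "Diseased": 1, "Exposed": 1},
    {"HousID": 1, "PersonID": 102, "Diseased": 2, "Exposed": 1},
    {"HousID": 2, "PersonID": 201, "Diseased": 2, "Exposed": 2},
    {"HousID": 2, "PersonID": 202, "Diseased": 1, "Exposed": 2},
]


def merge_on_household(households, individuals):
    """Attach household fields to each individual via the shared HousID."""
    by_id = {}
    for household in households:
        # A repeated identifier would make the link ambiguous, so stop early.
        if household["HousID"] in by_id:
            raise ValueError(f"duplicate HousID: {household['HousID']}")
        by_id[household["HousID"]] = household

    merged = []
    for person in individuals:
        household = by_id.get(person["HousID"])
        if household is None:
            raise KeyError(f"no household for PersonID {person['PersonID']}")
        # One output row per individual, carrying the household columns too.
        merged.append({**household, **person})
    return merged


merged = merge_on_household(households, individuals)
```

Keeping the two files separate until analysis time means household facts are entered once per household rather than once per person; the merge is done only when an analysis needs both levels.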
The Ten Data Entry Commandments
6
The Ten Data Entry Commandments
The Ten Data Entry Commandments
8
Constructing a data dictionary
• Contains, for each variable:
  • Variable name
  • Values (e.g., 1, 2, 3)
  • Meaning of each value (e.g., 1 = Yes, 2 = No)
• Typical columns: Question, Variable name, Type, Format, Values, Logical checks, etc.
• Some software creates a variable catalogue automatically; ideally, the investigator constructs it.
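
A data dictionary can also be kept in machine-readable form so that entered records are checked against it. The sketch below is an illustration only: the variables `diseased` and `age`, the question wording, the 0 to 120 range, and the function name `check_record` are all assumptions, not part of the original material.

```python
# One entry per variable. Every entry here is illustrative (assumed).
DATA_DICTIONARY = {
    "diseased": {
        "question": "Does the person have the disease?",
        "type": "categorical",
        "values": {1: "Yes", 2: "No"},  # meaning of each allowed code
    },
    "age": {
        "question": "Age in completed years",
        "type": "numeric",
        # Logical check: an assumed plausible range for age.
        "logical_check": lambda value: 0 <= value <= 120,
    },
}


def check_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of problems found in one entered record."""
    problems = []
    for name, spec in dictionary.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        # Coded variables must use one of the documented codes.
        if "values" in spec and value not in spec["values"]:
            problems.append(f"{name}: {value!r} is not an allowed code")
        # Apply the logical check, if the dictionary defines one.
        check = spec.get("logical_check")
        if check is not None and not check(value):
            problems.append(f"{name}: {value!r} fails the logical check")
    return problems
```

For example, `check_record({"diseased": 3, "age": 150})` reports both the undocumented code and the implausible age, while a clean record returns an empty list.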
The Ten Data Entry Commandments
10
Data Cleaning and Data Exploration
• Frequency tables
• Histograms
• Cross-tabulations
• Scatterplots
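
Frequency tables and cross-tabulations are the simplest of these tools to try out. The sketch below uses only the Python standard library and the four individual-level rows from the mother and daughter example; the function names and the set of allowed codes {1, 2} are assumptions. Histograms and scatterplots need a plotting library and are not shown.

```python
from collections import Counter

# Individual-level rows from the earlier example (Diseased and Exposed codes).
records = [
    {"PersonID": 101, "Diseased": 1, "Exposed": 1},
    {"PersonID": 102, "Diseased": 2, "Exposed": 1},
    {"PersonID": 201, "Diseased": 2, "Exposed": 2},
    {"PersonID": 202, "Diseased": 1, "Exposed": 2},
]


def frequency_table(records, variable):
    """Count how often each value of one variable occurs."""
    return Counter(record[variable] for record in records)


def cross_tabulation(records, row_var, col_var):
    """Count each (row value, column value) combination."""
    return Counter((record[row_var], record[col_var]) for record in records)


freq = frequency_table(records, "Diseased")
xtab = cross_tabulation(records, "Exposed", "Diseased")

# Cleaning step: any code outside the assumed allowed set needs checking.
ALLOWED_CODES = {1, 2}
unexpected = sorted(code for code in freq if code not in ALLOWED_CODES)
```

A frequency table exposes impossible codes and missing values at a glance, and a cross-tabulation exposes impossible combinations of two variables.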
Thank You