Data Management _SKIMS_June_2022

The document outlines key steps in data management, focusing on data entry and cleaning processes. It emphasizes the importance of following specific commandments for effective data entry, such as using unique column names and entering data consistently. Additionally, it highlights the significance of constructing a data dictionary and involving a biostatistician in the review process.

Data Management

Inaamul Haq
Assistant Professor
Department of Community Medicine
Government Medical College, Srinagar
Steps in Data Analysis
1. Data Entry
2. Data Cleaning
3. Data Exploration
4. Descriptive Statistics
5. Inferential Statistics
Learning Objectives

Data Entry
Data Cleaning
The Spreadsheet – Microsoft Excel
Spreadsheet from hell

http://biostat.mc.vanderbilt.edu/wiki/Main/DanielByrne
Spreadsheet from heaven

The Ten Data Entry Commandments
1. Enter all or most of the data as numbers.
• Avoid entering letters, words, string variables (e.g., NA, 22%, <3.6).
• In Excel, all columns, with the exception of names and text comments, should be formatted as numbers or dates (not as General or Text).

http://biostat.mc.vanderbilt.edu/wiki/Main/DanielByrne
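The first commandment can be checked mechanically once a sheet has been exported. The following is a minimal sketch, not part of the original slides: the function name `non_numeric_cells`, the column names, and the sample rows are all hypothetical, and date columns are deliberately left out of scope.

```python
def non_numeric_cells(rows, text_columns=()):
    """Return (row_number, column, value) for entries that are not plain numbers.

    rows is a list of dicts, e.g. as produced by csv.DictReader. Row numbers
    start at 2 because row 1 of the spreadsheet holds the column names.
    """
    problems = []
    for row_number, row in enumerate(rows, start=2):
        for column, value in row.items():
            if column in text_columns or value == "":
                continue  # free-text columns and blank cells are not checked here
            try:
                float(value)
            except ValueError:
                problems.append((row_number, column, value))
    return problems


# Hypothetical rows containing the three kinds of entry the slide warns about.
sample = [
    {"id": "1", "hb": "11.2", "spo2": "97", "crp": "4.1"},
    {"id": "2", "hb": "NA", "spo2": "22%", "crp": "<3.6"},
]
for problem in non_numeric_cells(sample):
    print(problem)
```

Each flagged cell is reported with its spreadsheet row number so it can be corrected at the source rather than patched during analysis.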
The Ten Data Entry Commandments
2. Give each column a unique, simple, 1-word name, 8 characters or less with no spaces, beginning with a letter, and place this name in the first row.
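The naming rule above translates naturally into a pattern check. This is an illustrative sketch under stated assumptions: the helper `bad_column_names` and the sample header are hypothetical, "1-word" is interpreted as letters and digits only (so underscores are rejected), and names differing only by case are treated as duplicates.

```python
import re

# One word: starts with a letter, letters/digits only, at most 8 characters.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")


def bad_column_names(names):
    """Return a dict mapping each offending name to the rule it breaks."""
    problems = {}
    seen = set()
    for name in names:
        key = name.lower()  # assumption: case-insensitive uniqueness
        if key in seen:
            problems[name] = "duplicate name"
        elif not NAME_PATTERN.fullmatch(name):
            problems[name] = "not one word of 1-8 characters starting with a letter"
        seen.add(key)
    return problems


# Hypothetical first row of a spreadsheet.
header = ["id", "age", "sex", "Blood Pressure", "2ndvisit", "haemoglobin", "age"]
for name, reason in bad_column_names(header).items():
    print(f"{name!r}: {reason}")
```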

The Ten Data Entry Commandments
3. Put only one variable in a column. Do not combine variables in the same column.
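A common violation of this rule is blood pressure typed as a single entry such as 120/80. The sketch below is an illustration only, not taken from the slides: the column names `bp`, `sbp`, and `dbp` and the sample values are hypothetical.

```python
def split_blood_pressure(value):
    """Split a combined 'systolic/diastolic' entry into two integers."""
    systolic, diastolic = value.split("/")
    return int(systolic), int(diastolic)


# Hypothetical rows with two variables squeezed into one column.
combined = [{"id": 1, "bp": "120/80"}, {"id": 2, "bp": "135/90"}]

separated = []
for row in combined:
    sbp, dbp = split_blood_pressure(row["bp"])
    separated.append({"id": row["id"], "sbp": sbp, "dbp": dbp})

print(separated)
```

Entering the two values in separate columns from the start avoids this repair step altogether.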

The Ten Data Entry Commandments
4. Enter each patient (or unit of analysis) on a separate line, beginning on the second line.

The Ten Data Entry Commandments
5. Give each research participant or patient a unique case number (1, 2, 3, etc.) in the first column. Delete the patient name and any identifying information before sending the spreadsheet to a statistician. Always save the spreadsheet with a password.
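The de-identification step described above could look roughly like the following. This is a sketch under assumptions: the function `deidentify`, the column name `caseno`, and the sample rows are hypothetical, and password protection of the file is not shown.

```python
def deidentify(rows, identifying_columns):
    """Replace identifying columns with a case number in the first column.

    Returns (clean_rows, key). The key maps case numbers back to the removed
    identifiers and should stay with the investigator, never with the shared file.
    """
    clean_rows, key = [], {}
    for case_number, row in enumerate(rows, start=1):
        key[case_number] = {c: row[c] for c in identifying_columns}
        clean = {"caseno": case_number}  # inserted first so it becomes the first column
        clean.update({c: v for c, v in row.items() if c not in identifying_columns})
        clean_rows.append(clean)
    return clean_rows, key


# Hypothetical records; the names are placeholders.
raw = [
    {"name": "Patient A", "phone": "0000000001", "age": 34, "sbp": 120},
    {"name": "Patient B", "phone": "0000000002", "age": 51, "sbp": 142},
]
clean_rows, key = deidentify(raw, identifying_columns=("name", "phone"))
print(clean_rows)
```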

Mother and daughter databases
Household level data

HousID  Location  Community  HousIncom
1       A         3          1
2       B         1          2
3       C         35         2
4       D         67         1
5       E         2          1
6       F         2          1
5       G         2          1
…       …         …          …

Individual level data

HousID  PersonID  Diseased  Exposed
1       101       1         1
1       102       2         1
2       201       2         2
2       202       1         2

• Each database has its own unique identifier
• Link these relational databases using a common index identifier
• Merge files when needed
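Linking the two tables through the common identifier can be sketched as below. The rows are the first entries of the two tables above; the function name `merge_on` is hypothetical, and the sketch assumes every individual's HousID exists in the household table (a missing one raises KeyError).

```python
households = [
    {"HousID": 1, "Location": "A", "Community": 3, "HousIncom": 1},
    {"HousID": 2, "Location": "B", "Community": 1, "HousIncom": 2},
]
individuals = [
    {"HousID": 1, "PersonID": 101, "Diseased": 1, "Exposed": 1},
    {"HousID": 1, "PersonID": 102, "Diseased": 2, "Exposed": 1},
    {"HousID": 2, "PersonID": 201, "Diseased": 2, "Exposed": 2},
    {"HousID": 2, "PersonID": 202, "Diseased": 1, "Exposed": 2},
]


def merge_on(mother, daughter, key):
    """Attach the matching mother row to every daughter row (many-to-one merge)."""
    lookup = {row[key]: row for row in mother}
    if len(lookup) != len(mother):
        raise ValueError(f"{key} is not unique in the mother table")
    return [{**lookup[row[key]], **row} for row in daughter]


merged = merge_on(households, individuals, key="HousID")
for row in merged:
    print(row)
```

The uniqueness check matters: a repeated identifier in the mother table would silently attach the wrong household to some individuals.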
The Ten Data Entry Commandments
6. Enter cases and controls in the same spreadsheet. Use one variable to define the control group (TREATED 0=no, 1=yes or GROUP 1=Drug A, 2=Drug B).
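If the two groups were entered separately, they can be stacked into one table with a grouping variable. A minimal sketch, assuming the TREATED coding from the slide (0=no, 1=yes); the function name `stack_groups` and the sample rows are hypothetical.

```python
def stack_groups(treated_rows, control_rows):
    """Combine two lists of rows into one table, adding TREATED (1=yes, 0=no)."""
    combined = [{**row, "TREATED": 1} for row in treated_rows]
    combined += [{**row, "TREATED": 0} for row in control_rows]
    return combined


# Hypothetical rows that were entered in two separate sheets.
treated = [{"caseno": 1, "sbp": 128}, {"caseno": 2, "sbp": 131}]
controls = [{"caseno": 3, "sbp": 140}]

for row in stack_groups(treated, controls):
    print(row)
```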

The Ten Data Entry Commandments

7. Quantify. Enter continuous measurements when possible.

The Ten Data Entry Commandments
8. Create a simple guide (or key) using a word processor/text editor to explain variable abbreviations, value coding, and how missing values were entered. Be consistent.

Constructing a data dictionary
• Contains, for each variable:
  • Variable name
  • Description of questionnaire item
  • Various values of variable (e.g., 1, 2, 3)
  • Meaning of each value (e.g., 1=Yes, 2=No)

Question  Variable name  Type     Format  Values                         Logical checks
1         EXERDAILY      Integer          Yes=1, No=2                    Skip pattern
2         EXERTYPE       Integer          Walking=1, Cycling=2, ETC…

Some software creates the variable catalogue automatically; ideally, the investigator constructs it.

• The catalogue is particularly useful:
  • When a database is shared with others
  • If the researcher has to get back to the database later
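A data dictionary can also be kept in machine-readable form and used to check entries. The sketch below uses the two variables and codes shown in the table; the structure, the function name `check_record`, and the specific skip rule (EXERTYPE answered only when EXERDAILY is 1) are assumptions, since the slide says only "Skip pattern" and elides further EXERTYPE codes.

```python
DATA_DICTIONARY = {
    "EXERDAILY": {"question": 1, "values": {1: "Yes", 2: "No"}},
    # Only the codes shown on the slide; it lists further codes as "ETC…".
    "EXERTYPE": {"question": 2, "values": {1: "Walking", 2: "Cycling"}},
}


def check_record(record):
    """Return messages for undefined codes and for breaches of the assumed skip rule."""
    messages = []
    for name, spec in DATA_DICTIONARY.items():
        value = record.get(name)
        if value is not None and value not in spec["values"]:
            messages.append(f"{name}: {value!r} is not a defined code")
    # Assumed skip pattern: EXERTYPE is answered only when EXERDAILY is 1 (Yes).
    if record.get("EXERDAILY") == 2 and record.get("EXERTYPE") is not None:
        messages.append("EXERTYPE should be blank when EXERDAILY is 2 (No)")
    return messages


print(check_record({"EXERDAILY": 1, "EXERTYPE": 2}))
print(check_record({"EXERDAILY": 2, "EXERTYPE": 1}))
print(check_record({"EXERDAILY": 3, "EXERTYPE": None}))
```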
The Ten Data Entry Commandments
9. Think through the analysis before collecting any data.

The Ten Data Entry Commandments
10. Have a biostatistician review the coding before data entry and again after the first 10 patients have been entered.

Learning Objectives

Data Entry
Data Cleaning
Data Cleaning and Data Exploration

• Frequency tables
• Histograms
• Cross-tabulations
• Scatterplots
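Frequency tables and cross-tabulations are often the quickest way to spot entry errors. The following sketch uses only the standard library; the records are hypothetical (the variable names Diseased and Exposed are borrowed from the individual level table earlier in the deck), and the value 9 is a deliberately planted error.

```python
from collections import Counter

# Hypothetical records; codes 1 and 2 are valid, 9 is a planted entry error.
records = [
    {"Diseased": 1, "Exposed": 1},
    {"Diseased": 2, "Exposed": 1},
    {"Diseased": 2, "Exposed": 2},
    {"Diseased": 1, "Exposed": 2},
    {"Diseased": 9, "Exposed": 1},
]

# Frequency table: how often each code occurs in one variable.
frequency = Counter(r["Diseased"] for r in records)

# Cross-tabulation: counts for each (Exposed, Diseased) combination.
crosstab = Counter((r["Exposed"], r["Diseased"]) for r in records)

for code, count in sorted(frequency.items()):
    print(f"Diseased={code}: {count}")
for (exposed, diseased), count in sorted(crosstab.items()):
    print(f"Exposed={exposed}, Diseased={diseased}: {count}")
```

An unexpected code in the frequency table, such as the 9 here, points to a row that needs checking against the original form.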
Thank You
