Data Management and Analysis For Successful Clinical Research
Timeline
• For abstracts, please send us the datasets
at least 4 weeks in advance
• Please contact us even if you don’t
have the dataset ready, so we can
schedule other projects and leave
room for yours
1. Writing the Proposal
• Background
• Why this research is important
• Be concise
• Specific Aims, Testable Hypothesis
• Be focused, clearly conceptualized, and
feasible
• The most important section of the proposal
• Consult mentors, colleagues and visit us
1. Writing the Proposal
• Methods/Experimental Design
• Participants
• Inclusion/Exclusion Criteria
• Recruiting Process
• How the measurements will be made
1. Writing the Proposal
• Challenges/Potential Problems
• Loss to follow up
• Bias - Confounding variables and other
sources
• Human Subjects Protection Plan
• Informed consent
• Adverse events
• Privacy, confidentiality issues
Bias
Definition - any systematic error in the
design, conduct or analysis of a study
that results in a mistaken estimate of
an exposure’s effect on the risk of
disease
Confounding - definition
In a study of whether factor A is a
cause of disease B, we say a third
factor, factor X, is a confounder if both
• Factor X is a known risk factor for
disease B
• Factor X is associated with factor A, but
is not a result of factor A
Confounding – an example
coffee drinking and pancreatic cancer
If an association is observed between
coffee drinking and pancreatic cancer,
then either
• Coffee drinking causes pancreatic cancer
or
• Smoking is a risk factor for pancreatic
cancer and smoking is associated with
coffee drinking (confounding)
1. Writing the Proposal
Confounding – ways to deal with it
• in design phase
• match cases to controls on confounding
variables
• in analysis phase
• stratification
• adjustment
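As a rough illustration of stratification in the analysis phase, the sketch below compares a crude odds ratio with a Mantel-Haenszel odds ratio pooled across strata of the confounder (smoking, as in the coffee example). The counts are hypothetical, invented only for illustration, and the function names are assumptions.

```python
# Hypothetical counts, invented for illustration only (not real study data).
# Each table is (exposed cases, exposed controls, unexposed cases, unexposed controls),
# where "exposed" means coffee drinker.
strata = {
    "smokers":     (80, 40, 20, 10),
    "non_smokers": (10, 40, 20, 80),
}

def odds_ratio(a, b, c, d):
    """Odds ratio for a single 2x2 table."""
    return (a * d) / (b * c)

def crude_odds_ratio(tables):
    """Odds ratio after collapsing all strata into one table (ignores the confounder)."""
    a, b, c, d = (sum(t[i] for t in tables) for i in range(4))
    return odds_ratio(a, b, c, d)

def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel odds ratio pooled across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

tables = list(strata.values())
crude = crude_odds_ratio(tables)               # association ignoring smoking
adjusted = mantel_haenszel_odds_ratio(tables)  # association within smoking strata
```

With these made-up counts the crude odds ratio is well above 1 while the stratified estimate is 1, which is the pattern expected when smoking alone explains the observed association.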
1. Writing the Proposal
• Statistical Analysis (provided by the
statisticians)
• Sample size/Power calculations
• Analysis Plan
1. Writing the Proposal
• A good example
• Dr Malow’s template
2. Create a Data Dictionary
Name   Description      Units  Type      Values (permissible ranges)
group  treatment group         discrete  1 = placebo, 2 = trt
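One possible way to put a data dictionary to work is to keep it in machine-readable form and check entered values against it. The sketch below is a minimal example; only the "group" entry comes from the slide, while the "age" entry, its range, and the function name are assumptions made for illustration.

```python
# Data dictionary as a plain Python structure.
data_dictionary = {
    "group": {
        "description": "treatment group",
        "units": None,
        "type": "discrete",
        "values": {1: "placebo", 2: "trt"},
    },
    # Assumed entry, added only to show a permissible range.
    "age": {
        "description": "age at enrollment",
        "units": "years",
        "type": "continuous",
        "range": (18, 100),
    },
}

def check_value(name, value, dictionary=data_dictionary):
    """Return True if the value is permitted for the variable."""
    entry = dictionary[name]
    if value is None:  # blank cell = missing, not a range violation
        return True
    if entry["type"] == "discrete":
        return value in entry["values"]
    low, high = entry["range"]
    return low <= value <= high
```

For example, `check_value("group", 3)` is rejected because 3 is not a listed code, and `check_value("age", 0.5)` is rejected because it falls outside the assumed range.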
3. Create a Patient Directory
ID  FirstName  LastName  Address  Phone  ...
1   John       Smith
2   Mary       Ann
3   Joe        Kim
4. Prepare datasets for Statistical
Analysis – A good example
ID  group  age  sex  ht  wt   bp_sys  bp_dias  stage  race  date0       complic
1   1      25   1    61  350  120     80       3      3.0   1/15/1999   0
2   1      65   2    68  161  140     90       2      1.0   2/5/1999    1
3   1      25   1    47  150  160     110      4      2.0   1/15/1998   1
4   1      31   1    66  161  140     105      2      2.0   4/1/1999    0
5   1      42   2    72  177  130     70       2      1.0   2/15/1999   0
6   1      45   2    67  160  120     80       1      2.0   3/6/1999    0
7   1      44   1    72  145  120     80       1      1.0   2/28/1999   0
8   1      55   1    72  161  120     95       4      2.0   6/15/2000   1
9   1      0.5  2    66  174  160     110      3      4.0   12/14/2000  1
10  1      21   2    60  155  190     120      2      2.0   11/14/2000  0
4. Prepare datasets for
Statistical Analysis
• First - strip off any confidential
information (name, address, phone #)
• Rows - one per subject (sample,
observation)
• Columns - one per measurement
(variable)
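A minimal sketch of the first step, assuming the data are held as a list of row dictionaries: identifying columns go to the patient directory, everything else goes to the analysis dataset, and the ID stays in both so they can be linked. The column names in `CONFIDENTIAL` and the function name are assumptions.

```python
# Hypothetical names of the identifying columns.
CONFIDENTIAL = {"FirstName", "LastName", "Address", "Phone"}

def split_confidential(rows, id_column="ID"):
    """Split rows into a patient directory and a de-identified analysis dataset."""
    directory, analysis = [], []
    for row in rows:
        # Directory keeps the ID plus identifying fields.
        directory.append({k: v for k, v in row.items()
                          if k == id_column or k in CONFIDENTIAL})
        # Analysis dataset keeps everything except identifying fields.
        analysis.append({k: v for k, v in row.items() if k not in CONFIDENTIAL})
    return directory, analysis

rows = [
    {"ID": 1, "FirstName": "John", "LastName": "Smith", "group": 1, "age": 25},
    {"ID": 2, "FirstName": "Mary", "LastName": "Ann", "group": 1, "age": 65},
]
directory, analysis = split_confidential(rows)
```

Only `analysis` would be sent on for statistical work; `directory` stays with the investigator.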
4. Preparing datasets
• Variable Names (column labels)
• No special characters (“<”, etc.) except
“_”
• Start with a letter, not a number
• Fewer than 8 characters
• Should be unique
• No spaces
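The naming rules above can be checked mechanically. The following is a sketch under stated assumptions: the function name is hypothetical, and treating names that differ only in case as duplicates is an added assumption, not a rule from the slide.

```python
import re

# A letter first, then letters, digits or "_", fewer than 8 characters in total.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,6}")

def check_variable_names(names):
    """Return a dict mapping each problematic name to a short reason."""
    problems = {}
    seen = set()
    for name in names:
        if not NAME_PATTERN.fullmatch(name):
            problems[name] = "invalid name"
        elif name.lower() in seen:  # assumption: uniqueness ignores case
            problems[name] = "duplicate name"
        seen.add(name.lower())
    return problems
```

For instance, `check_variable_names(["ID", "bp_sys", "bp_dias", "date0"])` reports no problems, while names such as `"24hrhct"` or `"blood pressure"` are flagged.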
4. Preparing datasets
• Data Values
• Be consistent: “M” ≠ “m”, date format,
upper/lower case
• No spaces
• No embedded formulas – use “paste
special”, then “paste values”
• Missing data: leave the cell blank
• Exception: if there are different reasons for missing data,
code each reason as a different value
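To illustrate consistent coding and reason-specific missing codes, here is a small sketch for one variable. The numeric codes, the mapping of 1/2 to male/female, the reason labels, and the function name are all assumptions chosen for the example.

```python
# Hypothetical codes for different reasons a value is missing.
MISSING_CODES = {"refused": -7, "not measured": -8, "unknown": -9}

# Hypothetical mapping of inconsistent entries to one consistent code.
SEX_CODES = {"m": 1, "male": 1, "f": 2, "fem": 2, "female": 2}

def clean_sex(raw):
    """Return a consistent code, a missing-reason code, or None for a blank cell."""
    text = raw.strip().lower()  # removes stray spaces and case differences
    if text == "":
        return None             # plain missing value: leave blank
    if text in SEX_CODES:
        return SEX_CODES[text]
    if text in MISSING_CODES:
        return MISSING_CODES[text]
    raise ValueError(f"unrecognized value: {raw!r}")
```

Raising an error on anything unrecognized, rather than guessing, keeps ambiguous entries visible so they can be resolved against the source records.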
4. Preparing datasets
• Only 1 variable in each column; use
separate columns for values that are
not mutually exclusive
• Derived variables – statisticians can
do those
• Keep all information as continuous
variables; once values are grouped into
categories, the original information can’t
be recovered
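The sketch below shows why the continuous values are worth keeping: derived variables and categories can always be computed from them later, but not the other way round. It assumes height is in inches and weight in pounds; the age cut points and function names are hypothetical.

```python
def bmi(ht_inches, wt_pounds):
    """Body mass index (kg/m^2) from height in inches and weight in pounds."""
    return 703 * wt_pounds / ht_inches ** 2

def age_group(age_years):
    """Hypothetical cut points, chosen only for illustration."""
    if age_years < 40:
        return "<40"
    if age_years < 60:
        return "40-59"
    return "60+"

# Continuous values as recorded (ht, wt and age of subject 2 in the good example).
subject = {"ht": 68, "wt": 161, "age": 65}

# Derived at analysis time; the recorded values stay untouched.
derived = {
    "bmi": round(bmi(subject["ht"], subject["wt"]), 1),
    "age_grp": age_group(subject["age"]),
}
```

Had only the age group been recorded, the exact age could not be recovered, and a different grouping could not be tried later.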
4. Preparing datasets
• It’s OK to have separate data sheets
for demographic info and clinical
measurements
• As long as there is a unique identifier
(ID) that links all data sheets
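A minimal sketch of linking separate sheets through the shared ID, assuming each sheet is a list of row dictionaries. The function name is hypothetical, and keeping demographic rows that have no clinical match is a design choice made here for illustration.

```python
def link_by_id(demographics, clinical, id_column="ID"):
    """Join two lists of row dicts on a shared unique identifier."""
    clinical_by_id = {row[id_column]: row for row in clinical}
    if len(clinical_by_id) != len(clinical):
        raise ValueError("IDs in the clinical sheet are not unique")
    linked = []
    for row in demographics:
        # Subjects without a clinical row keep only their demographic fields.
        match = clinical_by_id.get(row[id_column], {})
        linked.append({**row, **match})
    return linked

demographics = [{"ID": 1, "age": 25, "sex": 1}, {"ID": 2, "age": 65, "sex": 2}]
clinical = [{"ID": 2, "bp_sys": 140, "bp_dias": 90},
            {"ID": 1, "bp_sys": 120, "bp_dias": 80}]
linked = link_by_id(demographics, clinical)
```

Row order in the two sheets does not matter here, because rows are matched on the ID rather than on position.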
4. Preparing Datasets
• If you are in a hurry
• Record data in a file and call it “Raw_xxx.xls”
• Later transform it into the desired format
• It’s OK to format only the variables needed
for analysis and send only those to the
statisticians
• Good idea: visit us after you’ve entered the
first 5 patients and completed the data
dictionary
What’s wrong with this data sheet?
Comparison of Drug A and Drug B
Drug A  Age of Patient  Patient Gender  Height (inches)  Weight (pound)  24hrhct  blood pressure  tumor stage  Race  Date enrolled  complications
Drug B
1 55 m 61 145 normal 120/80 120/90 IV ative Americ 6/20/ 3
2 45 f 4"11 166 ? 135/95 2b none 7/14/99 n
3 32 male 5'13" 171 38 140/80 not staged NA 8/30/99 n
4 44 na 65 ? 40 120/80 2 ? 09/01/00 n
5 66 fem 71 0 41 140/90 4 w Sep 14th y, sepsis
6 71 unknown 172 199 38 >160/110 3 b unknown y, died
7 45 m ? 204 32 140 sys 105 dias 1 b 12/25/00 n
8 34 m NA 145 36 130 3 w July 97 n
9 13 m 66 161 39 166/115 2a w 06/06/99 n
10 66 m 68 176 41 1120/80 3 w 01/21/58 n
Average 45 65 155 38
Acknowledgement
• Guideline for data collection and data
entry
http://biostat.mc.vanderbilt.edu/wiki/Main/TheresaScott