Data Management and Analysis For Successful Clinical Research

Lily Wang, PhD
Department of Biostatistics
Vanderbilt University

Goals of This Presentation
• Provide an overview of the data management and analysis aspects of clinical research
• Minimize errors in datasets
• Ensure statistical software packages will recognize data correctly
• Facilitate efficient data analysis for projects
An Overview of the Process
1. Write the protocol
- consult mentors and colleagues, and visit us, to finalize the specific aims, testable hypotheses and study design
2. Create a Data Dictionary
3. Create a Patient Directory
4. Prepare datasets for statistical analysis
5. The statisticians will assist with statistical tests
6. Review results and start thinking about writing the paper
7. Additional tables and figures
8. Write the paper/abstract

Timeline
• For an abstract, please send us the datasets at least 4 weeks in advance
• Please contact us even if you don’t have the dataset ready, so we can schedule other projects and leave room for yours

1. Writing the Proposal
• Background
  - Why this research is important
  - Be concise
• Specific Aims, Testable Hypothesis
  - Be focused, clearly conceptualized, and feasible
  - The most important section of the proposal
  - Consult mentors, colleagues and visit us

1. Writing the Proposal
• Methods/Experimental Design
  - Participants
  - Inclusion/Exclusion Criteria
  - Recruiting Process
  - How the measurements will be made

1. Writing the Proposal
• Challenges/Potential Problems
  - Loss to follow-up
  - Bias: confounding variables and other sources
• Human Subjects Protection Plan
  - Informed consent
  - Adverse events
  - Privacy, confidentiality issues

Bias
Definition - any systematic error in the design, conduct or analysis of a study that results in a mistaken estimate of an exposure’s effect on the risk of disease

Confounding - definition
In a study of whether factor A is a cause of disease B, we say a third factor, factor X, is a confounder if
• Factor X is a known risk factor for disease B
• Factor X is associated with factor A, but is not a result of factor A

Confounding – an example
coffee drinking and pancreatic cancer
If an association is observed between coffee drinking and pancreatic cancer, then either
• Coffee drinking causes pancreatic cancer
or
• Smoking confounds the association: smoking is a risk factor for pancreatic cancer, and smoking is associated with coffee drinking

1. Writing the Proposal
Confounding – ways to deal with it
• In the design phase
  - match cases to controls on confounding variables
• In the analysis phase
  - stratification
  - adjustment
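As a rough illustration of stratification in the analysis phase, the sketch below compares a crude odds ratio with stratum-specific and Mantel-Haenszel odds ratios for the coffee/smoking example. The counts are invented purely for illustration (they do not come from any study), and the function names are hypothetical.

```python
# Hypothetical case-control counts, invented for illustration only.
# Each stratum: (exposed cases, exposed controls, unexposed cases, unexposed controls)
# "exposed" = coffee drinker; strata are defined by the confounder (smoking).
strata = {
    "smokers": (80, 40, 20, 10),
    "non_smokers": (10, 20, 40, 80),
}


def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table: (a * d) / (b * c)."""
    return (a * d) / (b * c)


def crude_odds_ratio(strata):
    """Ignore the confounder: add the strata together, then compute one odds ratio."""
    a, b, c, d = (sum(column) for column in zip(*strata.values()))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio across strata (Mantel-Haenszel estimator)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
    return numerator / denominator


crude = crude_odds_ratio(strata)
by_stratum = {name: odds_ratio(*counts) for name, counts in strata.items()}
pooled = mantel_haenszel_odds_ratio(strata)

print("crude OR:", crude)
print("stratum ORs:", by_stratum)
print("Mantel-Haenszel OR:", pooled)
```

With these made-up counts the crude odds ratio suggests an association, while the odds ratio within each smoking stratum, and the pooled estimate, show none. That is the pattern expected when smoking confounds the coffee/cancer relationship.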

1. Writing the Proposal
• Statistical Analysis (provided by the statisticians)
  - Sample size/Power calculations
  - Analysis Plan

1. Writing the Proposal
• A good example
• Dr Malow’s template

2. Create a Data Dictionary
Name     Description                   Units  Type        Values (Permissible ranges)
group    treatment group                      discrete    1 = placebo, 2 = trt
age      age in years                  year   continuous  10 – 79
bp_sys   systolic blood pressure       mmHg   continuous  100 – 160
bp_dias  diastolic blood pressure      mmHg   continuous  80 – 150
date0    date for baseline assessment         date        mm/dd/yyyy
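One way to put a data dictionary to work is to store it in machine-readable form and check each record against it. The sketch below is a minimal example that assumes the five variables listed above; the dict layout and the function names (`check_value`, `check_record`) are hypothetical, and blank values are treated as missing rather than as errors.

```python
from datetime import datetime

# The data dictionary from the slide, written as a Python dict (layout is an assumption).
DATA_DICTIONARY = {
    "group": {"type": "discrete", "values": {1: "placebo", 2: "trt"}},
    "age": {"type": "continuous", "units": "year", "range": (10, 79)},
    "bp_sys": {"type": "continuous", "units": "mmHg", "range": (100, 160)},
    "bp_dias": {"type": "continuous", "units": "mmHg", "range": (80, 150)},
    "date0": {"type": "date", "format": "%m/%d/%Y"},  # mm/dd/yyyy
}


def check_value(name, value):
    """Return None if the value is permissible, otherwise a short message."""
    spec = DATA_DICTIONARY[name]
    if value is None or value == "":
        return None  # blank cell = missing data, not an error
    if spec["type"] == "discrete":
        if value not in spec["values"]:
            return f"{name}={value!r} is not one of {sorted(spec['values'])}"
    elif spec["type"] == "continuous":
        low, high = spec["range"]
        if not isinstance(value, (int, float)) or not low <= value <= high:
            return f"{name}={value!r} is outside {low}-{high}"
    elif spec["type"] == "date":
        try:
            datetime.strptime(value, spec["format"])
        except (TypeError, ValueError):
            return f"{name}={value!r} is not a valid mm/dd/yyyy date"
    return None


def check_record(record):
    """Check every dictionary variable present in one record (one row)."""
    messages = (check_value(name, record[name]) for name in DATA_DICTIONARY if name in record)
    return [message for message in messages if message is not None]


# Made-up records for illustration.
ok_record = {"group": 1, "age": 25, "bp_sys": 120, "bp_dias": 80, "date0": "1/15/1999"}
bad_record = {"group": 3, "age": 0.5, "bp_sys": 120, "bp_dias": 80, "date0": "2/30/1999"}

print(check_record(ok_record))
for problem in check_record(bad_record):
    print(problem)
```

Running a check like this after the first few patients are entered catches out-of-range values and impossible dates before they reach the analysis.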

3. Create a Patient Directory
ID  FirstName  LastName  Address  Phone  ...
1   John       Smith
2   Mary       Ann
3   Joe        Kim

• Include any other information you would like to record for reference
• Keep this file to yourself, and don’t send it to us

4. Prepare datasets for Statistical Analysis – A good example
ID group age sex ht wt bp_sys bp_dias stage race date0 complic
1 1 25 1 61 350 120 80 3 3.0 1/15/1999 0
2 1 65 2 68 161 140 90 2 1.0 2/5/1999 1
3 1 25 1 47 150 160 110 4 2.0 1/15/1998 1
4 1 31 1 66 161 140 105 2 2.0 4/1/1999 0
5 1 42 2 72 177 130 70 2 1.0 2/15/1999 0
6 1 45 2 67 160 120 80 1 2.0 3/6/1999 0
7 1 44 1 72 145 120 80 1 1.0 2/28/1999 0
8 1 55 1 72 161 120 95 4 2.0 6/15/2000 1
9 1 0.5 2 66 174 160 110 3 4.0 12/14/2000 1
10 1 21 2 60 155 190 120 2 2.0 11/14/2000 0

4. Prepare datasets for Statistical Analysis
• First - strip off any confidential information (name, address, phone #)
• Rows - each subject (sample, observation)
• Columns - each measurement (variable)
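The first step above can be sketched as a small function that separates identifying columns from the analysis columns while keeping the ID in both. The set of confidential column names and the function name are assumptions for illustration; the rows reuse the names from the patient directory slide and the first two rows of the good example.

```python
# Columns assumed to hold identifying information (adjust to your own sheet).
CONFIDENTIAL = {"FirstName", "LastName", "Address", "Phone"}


def split_confidential(rows, confidential=CONFIDENTIAL, id_column="ID"):
    """Split each row into a patient-directory part and a de-identified analysis part.

    The ID column is kept in both parts so they can be linked again later.
    """
    directory, analysis = [], []
    for row in rows:
        directory.append(
            {key: value for key, value in row.items() if key == id_column or key in confidential}
        )
        analysis.append({key: value for key, value in row.items() if key not in confidential})
    return directory, analysis


rows = [
    {"ID": 1, "FirstName": "John", "LastName": "Smith", "group": 1, "age": 25},
    {"ID": 2, "FirstName": "Mary", "LastName": "Ann", "group": 1, "age": 65},
]
directory, analysis = split_confidential(rows)

print(directory)  # stays with the investigator
print(analysis)   # one row per subject, one column per variable
```

Only the second list would be sent on for analysis; the first is the patient directory that stays with the investigator.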

4. Preparing datasets
• Variable Names (column labels)
  - No special characters (“<” etc.) except “_”
  - Start with letters, not numbers
  - Fewer than 8 characters
  - Should be unique
  - No spaces
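The naming rules above are mechanical enough to check automatically. The following is a minimal sketch using a regular expression; the function name is hypothetical, and treating names that differ only by case as duplicates is an added assumption (some statistical packages ignore case in variable names).

```python
import re

# A letter first, then letters, digits or "_", fewer than 8 characters in total.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,6}")


def check_variable_names(names):
    """Return {name: [reasons]} for every column label that breaks a rule."""
    problems = {}
    seen = set()
    for name in names:
        reasons = []
        if not NAME_PATTERN.fullmatch(name):
            reasons.append("special character, space, leading digit, or 8+ characters")
        if name.lower() in seen:
            reasons.append("not unique")
        seen.add(name.lower())
        if reasons:
            problems[name] = reasons
    return problems


good = ["ID", "group", "age", "sex", "ht", "wt", "bp_sys", "bp_dias",
        "stage", "race", "date0", "complic"]
bad = ["24hrhct", "blood pressure", "age", "Age"]

print(check_variable_names(good))
print(check_variable_names(bad))
```

The first list is the header row of the good example and passes every rule; the second mixes labels that start with a digit, contain a space, or repeat an earlier name.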

4. Preparing datasets
• Data Values
  - Be consistent: “M” ≠ “m”, date format, upper/lower case
  - No spaces
  - No embedded formulas – use “paste special”, then “paste values”
  - Missing data: leave the cell blank, unless there are different reasons for missing data, in which case code each reason as a different value
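To show why “M” and “m” matter, the sketch below scans a column for values that differ only by case or stray spaces, and for cells that still contain a formula. Both function names are hypothetical, the sample column is made up, and formula detection here simply assumes that a formula cell is exported as text beginning with “=”.

```python
from collections import defaultdict


def find_inconsistent_codes(values):
    """Group non-blank values that differ only by case or surrounding spaces."""
    variants = defaultdict(set)
    for value in values:
        if value is None or str(value).strip() == "":
            continue  # blank cell = missing data
        text = str(value)
        variants[text.strip().lower()].add(text)
    return {key: sorted(forms) for key, forms in variants.items() if len(forms) > 1}


def find_formulas(values):
    """Return cells that look like embedded spreadsheet formulas."""
    return [value for value in values if isinstance(value, str) and value.lstrip().startswith("=")]


# Made-up column values for illustration.
sex = ["M", "m", "F", "f ", "", None]
wt = ["161", "=B2*2.2", "150"]

print(find_inconsistent_codes(sex))
print(find_formulas(wt))
```

A statistical package would read "M" and "m" as two different categories, so each group reported by `find_inconsistent_codes` should be collapsed to a single spelling before analysis.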

4. Preparing datasets
• Only 1 variable in each column; use separate columns for values that are not mutually exclusive
• Derived variables – statisticians can do those
• Keep all information as continuous variables; once a continuous value is categorized, the lost information can’t be recovered
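When a subject can have several values at once (for example, more than one complication), one column per value keeps each column to a single variable. The sketch below turns a free-text cell into 0/1 indicator columns; the function name, the “;” separator, the short column names and the sample cells are all assumptions made for illustration.

```python
def to_indicator_columns(cells, columns, sep=";"):
    """Turn multi-valued cells into one 0/1 column per value.

    `columns` maps each recognised value to the column name that will hold it.
    """
    rows = []
    for cell in cells:
        present = {part.strip().lower() for part in (cell or "").split(sep) if part.strip()}
        rows.append({column: int(value in present) for value, column in columns.items()})
    return rows


# Made-up complication cells for three subjects.
cells = ["pneumonia; sepsis", "sepsis", ""]
columns = {"pneumonia": "pneum", "sepsis": "sepsis"}

for row in to_indicator_columns(cells, columns):
    print(row)
```

Each subject now has one value per column, and a subject with both complications is represented without inventing a combined category.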
4. Preparing datasets
• It’s OK to have separate data sheets for demographic info and clinical measurements, as long as there is a unique identifier (ID) that links all data sheets
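A minimal sketch of how a unique ID links separate sheets: each sheet is indexed by ID, rows with the same ID are combined, and IDs found in only one sheet are reported. The function name is hypothetical, and the sample rows reuse a few values from the good example.

```python
def merge_by_id(demographics, measurements, id_column="ID"):
    """Combine rows from two sheets that share the same ID.

    Returns (merged_rows, unmatched_ids). Raises ValueError if an ID is
    repeated within a sheet, because the identifier must be unique.
    """
    def index(rows):
        by_id = {}
        for row in rows:
            key = row[id_column]
            if key in by_id:
                raise ValueError(f"duplicate {id_column}: {key!r}")
            by_id[key] = row
        return by_id

    demo, meas = index(demographics), index(measurements)
    merged = [{**demo[key], **meas[key]} for key in demo if key in meas]
    unmatched = sorted(set(demo) ^ set(meas))  # IDs present in only one sheet
    return merged, unmatched


demographics = [
    {"ID": 1, "age": 25, "sex": 1},
    {"ID": 2, "age": 65, "sex": 2},
    {"ID": 3, "age": 25, "sex": 1},
]
measurements = [
    {"ID": 1, "bp_sys": 120, "bp_dias": 80},
    {"ID": 2, "bp_sys": 140, "bp_dias": 90},
]

merged, unmatched = merge_by_id(demographics, measurements)
print(merged)
print("IDs in only one sheet:", unmatched)
```

Reporting unmatched IDs is a design choice in this sketch: it makes a missing or mistyped identifier visible instead of silently dropping the subject.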

4. Preparing Datasets
• If you are in a hurry
  - Record data in a file and call it “Raw_xxx.xls”
  - Later transform it into the desired format
  - It’s OK to format only the variables needed for analysis and send only these variables to the statisticians
• Good idea: visit us after you’ve entered the first 5 patients and completed the data dictionary

What’s wrong with this data sheet?
Comparison of Drug A and Drug B
Drug A  Age of Patient  Patient Gender  Height (inches)  Weight (pound)  24hrhct  blood pressure  tumor stage  Race  Date enrolled  complications
1 25 Male 61" >350 38% 120/80 2-3 Hipanic 1/15/99 no
2 65+ female 5'8" 161 32 140/90 II White 2/05/1999 yes
3 ? Male 120cm 12 >160/110 IV Black Jan 98 yes, pneumonia
4 31 m 5'6" obse 40 140 sys 105 dias ? ican-Americ ?
5 42 f >6 ft normal 39 missing =>2 W Feb 99
6 45 f 5.7 160 29 80/120 NA B last fall n
7 unknown ? 6 145 35 normal 1 W 2/30/99 n
8 55 m 72 161.45 12/39 120/95 4 ican-Americ 6-15-00 y
9 6 months f 66 174 38 160/110 3 Asian 14/12/00 y
10 21 f 5'

Drug B
1 55 m 61 145 normal 120/80 120/90 IV ative Americ 6/20/ 3
2 45 f 4"11 166 ? 135/95 2b none 7/14/99 n
3 32 male 5'13" 171 38 140/80 not staged NA 8/30/99 n
4 44 na 65 ? 40 120/80 2 ? 09/01/00 n
5 66 fem 71 0 41 140/90 4 w Sep 14th y, sepsis
6 71 unknown 172 199 38 >160/110 3 b unknown y, died
7 45 m ? 204 32 140 sys 105 dias 1 b 12/25/00 n
8 34 m NA 145 36 130 3 w July 97 n
9 13 m 66 161 39 166/115 2a w 06/06/99 n
10 66 m 68 176 41 1120/80 3 w 01/21/58 n

Average 45 65 155 38
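One of the problems with this sheet can be demonstrated directly: statistical software cannot read many of the cells as numbers or dates. The sketch below uses the Drug A age values and several of the Drug A enrollment dates copied from the sheet above; the helper names are hypothetical, and mm/dd/yyyy is assumed as the target date format.

```python
from datetime import datetime


def to_number(text):
    """Return the value as a float, or None if it is not a plain number."""
    try:
        return float(text)
    except ValueError:
        return None


def is_valid_date(text, date_format="%m/%d/%Y"):
    """True only for real calendar dates written as mm/dd/yyyy."""
    try:
        datetime.strptime(text, date_format)
        return True
    except ValueError:
        return False


# Values copied from the Drug A rows of the sheet above.
ages = ["25", "65+", "?", "31", "42", "45", "unknown", "55", "6 months", "21"]
dates = ["1/15/99", "2/05/1999", "Jan 98", "last fall", "2/30/99", "6-15-00", "14/12/00"]

not_numeric = [age for age in ages if to_number(age) is None]
valid_dates = [date for date in dates if is_valid_date(date)]

print("ages a package cannot read as numbers:", not_numeric)
print("dates in mm/dd/yyyy format:", valid_dates)
```

A single non-numeric entry is usually enough for a package to read the whole column as text, which is why one consistent format per column matters.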

Acknowledgement
• Guideline for data collection and data entry
http://biostat.mc.vanderbilt.edu/wiki/Main/TheresaScott

• “10 Data Entry Commandments”, “Spreadsheet from Heaven/Hell”
http://biostat.mc.vanderbilt.edu/wiki/Main/DanielByrne
