DIRTY DATA?

CLEAN IT USING SAS

AN INTRODUCTION TO DATA CLEANING PRINCIPLES

CYP-C Research Champion Webinar

August 11, 2017
Giancarlo Di Giuseppe, MPH
Pediatric Oncology Group of Ontario

Healthcare innovation | Survivor care | Family assistance

Population data | Policy development | Education | Research
Outline
• SAS overview and procedures – revisited
• Fundamental principles to build a clean dataset
• Inclusion / exclusion criteria
• Visualizing data distributions
• Outliers
• Invalid or inconsistent character variables
• Dealing with missing data
• Creating data checkpoints
SAS Overview - Revisited
• For our purposes, there are only two major things you can do in SAS:
– DATA step - Manipulate the data in some way
• Reading in Data
• Creating and Redefining Variables
• Sub-Setting Data
• Working with Dates
• Working with Formats

– PROCedure step
• Analyze the data
• Produce frequency tables
• Estimate a regression model
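As a minimal sketch of the two step types, the SAS code below builds a tiny dataset in a DATA step (creating a new variable along the way) and then summarizes it in a PROC step. The dataset DEMO and its variables are hypothetical toy data, not part of the CYP-C data.

```sas
/* DATA STEP: READ HYPOTHETICAL TOY DATA AND CREATE A NEW VARIABLE */
DATA DEMO;
  INPUT ID AGE SEX $;
  IF AGE = . THEN AGE_GROUP = .;        /* KEEP MISSING AS MISSING */
  ELSE IF AGE < 10 THEN AGE_GROUP = 1;  /* 1 = UNDER 10 */
  ELSE AGE_GROUP = 2;                   /* 2 = 10 OR OLDER */
  DATALINES;
1 4 F
2 12 M
3 9 F
;
RUN;

/* PROC STEP: ANALYZE THE DATA, HERE WITH FREQUENCY TABLES */
PROC FREQ DATA=DEMO;
  TABLES AGE_GROUP SEX;
RUN;
```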

SAS Procedures – Revisited
• SAS Procedures
– PROC FREQ
– PROC PRINT
– PROC MEANS
– PROC UNIVARIATE
– PROC SORT
– PROC CONTENTS

PRINCIPLES FOR CLEANLINESS

Understanding Your Dirty Data Source
• No database is ever “clean” from the start
• Databases are not constructed with our own specific research questions in mind
• Researchers must be familiar with the purpose of the database, how its variables are captured and defined, and its structure
[Image source: http://3rdsectorlabs.com/wp-content/uploads/2014/06/TSL-data-shower.png]

Having an Analysis Plan
• Having clean data requires a sound analysis plan
– Envision what the analysis dataset will look like, with all variables and formats, before performing data cleaning
• Determine your study population denominator before you begin cleaning
– Is it the patient population? The total number of diagnoses (so that multiple diagnoses per patient are possible)? Person-time? Etc.
– Base the choice on the research question!
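To see how much the denominator choice matters, the sketch below counts the same hypothetical diagnosis-level records two ways: as diagnoses and as distinct patients. It assumes one row per diagnosis keyed by CYPCID; the dataset DX and its values are made up for illustration, and person-time is not covered here.

```sas
/* HYPOTHETICAL TOY DATA: ONE ROW PER DIAGNOSIS; PATIENT 101 HAS TWO */
DATA DX;
  INPUT CYPCID DX_DATE :DATE9.;
  FORMAT DX_DATE DATE9.;
  DATALINES;
101 15MAR2005
101 02JUN2009
102 20JAN2007
103 11NOV2010
;
RUN;

/* TWO CANDIDATE DENOMINATORS FROM THE SAME DATA */
PROC SQL;
  CREATE TABLE DENOM AS
  SELECT COUNT(*)               AS N_DIAGNOSES,
         COUNT(DISTINCT CYPCID) AS N_PATIENTS
  FROM DX;
QUIT;

PROC PRINT DATA=DENOM NOOBS; RUN;
```

Here the diagnosis-level denominator is 4 while the patient-level denominator is 3; which one is correct depends on the research question.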

Data Manipulation and Data Cleaning:
A Simultaneous Process
• Data manipulation and data cleaning are not mutually exclusive; rather, they go hand-in-hand!
• Both can (and should) be performed within a single DATA step
• This keeps SAS programming efficient and easy to follow
[Image source: http://i.telegraph.co.uk/multimedia/archive/03219/handshake1_3219777k.jpg]

SUB-SETTING YOUR DATA

Receiving Your Data Cut
• Data is typically requested with slightly more information than needed
– This allows wiggle room if the hypotheses change slightly
• No data cut is ever perfect
– Data still needs to be cleaned
• Initial data cuts are never ready to be analyzed; they must first be cleaned

Cleaning Using Inclusion & Exclusion Criteria
PROC SORT DATA = T7 OUT=T7_SORT; BY CYPCID DX_DATE; RUN;

DATA T8; SET T7_SORT; BY CYPCID;

/* FIRST CANCERS: INTERESTED IN PRIMARY DIAGNOSIS ONLY */
IF FIRST.CYPCID;
IF ORDINAL_PRIMARY IN (1);

/* AGE INCLUSION CRITERIA: CHILDREN AGED 0 TO 14 */
IF 0 <= DX_AGE < 15;

/* AGE GROUPS (DX_AGE_GR. IS A USER-DEFINED FORMAT; ITS PROC FORMAT STEP IS NOT SHOWN) */
IF 0 <= DX_AGE < 1 THEN DX_AGE_GR=1;
ELSE IF DX_AGE < 7 THEN DX_AGE_GR=2;
ELSE IF DX_AGE < 11 THEN DX_AGE_GR=3;
ELSE DX_AGE_GR=4;
LABEL DX_AGE_GR = "AGE AT FIRST DIAGNOSIS - GROUPED";
FORMAT DX_AGE_GR DX_AGE_GR.;

/* SELECTS THOSE DIAGNOSED BETWEEN 2002 AND 2012 */
IF 2002 <= YEAR(DX_DATE) <= 2012;
DX1_YEAR = YEAR(DX_DATE);

/* ONLY CONCERNED WITH LEUKEMIA CASES */
IF ICCC_MAIN = 1010 OR ICDO_M_CODE IN (9826, 9835, 9836, 9837);
RUN; *N=2,492;

Note: Data cleaning and data manipulation are done simultaneously!
Keep logs of sample size in your DATA steps!
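The FORMAT statement above refers to DX_AGE_GR., which is not a built-in SAS format, so a PROC FORMAT step defining it has to run first. A possible definition is sketched below; the labels are inferred from the cut-points in the DATA step (<1, <7, <11, <15) and are an assumption, not the author's original format. The short DATA step that follows applies the same grouping to a few hypothetical ages as a quick check.

```sas
/* ASSUMED DEFINITION OF THE USER-DEFINED FORMAT; LABELS MIRROR THE CUT-POINTS */
PROC FORMAT;
  VALUE DX_AGE_GR
    1 = "<1"
    2 = "1 TO <7"
    3 = "7 TO <11"
    4 = "11 TO <15";
RUN;

/* QUICK CHECK ON HYPOTHETICAL AGES, USING THE SAME IF/ELSE LOGIC AS ABOVE */
DATA AGE_CHECK;
  DO DX_AGE = 0.5, 3, 8, 14;
    IF 0 <= DX_AGE < 1 THEN DX_AGE_GR = 1;
    ELSE IF DX_AGE < 7 THEN DX_AGE_GR = 2;
    ELSE IF DX_AGE < 11 THEN DX_AGE_GR = 3;
    ELSE DX_AGE_GR = 4;
    OUTPUT;
  END;
  FORMAT DX_AGE_GR DX_AGE_GR.;
RUN;

PROC PRINT DATA=AGE_CHECK NOOBS; RUN;
```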

DATA DISTRIBUTION

Recall From Last Session
• PROC FREQ produces frequency outputs
– Can be used for numeric or character variables
– Useful for counts and proportions

• PROC MEANS and UNIVARIATE produce outputs describing the data distribution for numeric variables
– Checkpoint for data distributions and normality

• PROC FREQ and PROC MEANS/UNIVARIATE are

Distribution of Continuous Data
ODS GRAPHICS ON;
PROC UNIVARIATE DATA = T8 NORMAL;
ID CYPCID;
VAR WBC_COUNT;
HISTOGRAM WBC_COUNT / NORMAL;
RUN;

Distribution of Continuous Data II

Distribution of Continuous Data III

Normality of Continuous Data

OUTLIERS

Dealing With Outliers
• If there are many outliers, they can introduce bias into your study
• There are many options for handling such skewed data:
– Report the median + IQR instead of the mean
– Define a logical range of values and assign any outlier the upper bound of that range
– Categorize your data based on the distribution or on clinically meaningful ranges
• Whichever approach is used should be justified!
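One way to make "outlier" concrete before choosing among these options is Tukey's rule, which flags values more than 1.5 × IQR below Q1 or above Q3. This rule is a swapped-in technique, not one the slides prescribe; the sketch below applies it with PROC MEANS and a DATA step to hypothetical WBC values (in practice the input would be T8 and WBC_COUNT).

```sas
/* HYPOTHETICAL TOY DATA; ONE EXTREME VALUE (900) */
DATA WBC;
  INPUT CYPCID WBC_COUNT;
  DATALINES;
1 5
2 12
3 20
4 35
5 48
6 60
7 900
;
RUN;

/* STEP 1: QUARTILES OF WBC_COUNT INTO A ONE-ROW DATASET */
PROC MEANS DATA=WBC NOPRINT;
  VAR WBC_COUNT;
  OUTPUT OUT=WBC_Q Q1=Q1 Q3=Q3;
RUN;

/* STEP 2: ATTACH Q1/Q3 TO EVERY ROW AND FLAG VALUES OUTSIDE THE FENCES */
DATA WBC_FLAGGED;
  IF _N_ = 1 THEN SET WBC_Q(KEEP=Q1 Q3);   /* READ ONCE; VALUES ARE RETAINED */
  SET WBC;
  IQR = Q3 - Q1;
  IF WBC_COUNT ^= . AND (WBC_COUNT < Q1 - 1.5*IQR OR WBC_COUNT > Q3 + 1.5*IQR)
    THEN OUTLIER_FLAG = 1;
  ELSE OUTLIER_FLAG = 0;
RUN;

PROC PRINT DATA=WBC_FLAGGED NOOBS;
  WHERE OUTLIER_FLAG = 1;
  VAR CYPCID WBC_COUNT Q1 Q3;
RUN;
```

The flag only identifies candidates for review; whether to cap, categorize, or keep them is still the justified choice described above.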

Dealing With Outliers II
DATA T8; SET T8;
/* UPPER LIMIT TO OUTLIERS: CAP AT 500 (MISSING STAYS MISSING, SINCE . < 500) */
IF WBC_COUNT >= 500 THEN WBC_COUNT_CLEAN = 500;
ELSE WBC_COUNT_CLEAN = WBC_COUNT;

/* CREATING CLINICAL CATEGORIES */
/* DO GROUP (DO ... END): CATEGORIZE NON-MISSING VALUES ONLY */
IF WBC_COUNT ^= . THEN DO;
  IF WBC_COUNT < 50 THEN WBC_GROUP = 1;
  ELSE IF WBC_COUNT < 100 THEN WBC_GROUP = 2;
  ELSE IF WBC_COUNT < 200 THEN WBC_GROUP = 3;
  ELSE IF WBC_COUNT < 300 THEN WBC_GROUP = 4;
  ELSE IF WBC_COUNT < 400 THEN WBC_GROUP = 5;
  ELSE WBC_GROUP = 6;
END;
RUN;

Dealing With Outliers III
/* MEAN VS MEDIAN + IQR */
PROC MEANS DATA=T8 MEAN MIN MAX Q1 MEDIAN Q3;
VAR WBC_COUNT_CLEAN;
RUN;

/* DATA CATEGORIZATION */
PROC FREQ DATA=T8;
TABLES WBC_GROUP /MISSING;
RUN;

CLEANING CHARACTER VARIABLES

CApITaLIzATioN Matters!
PROC FREQ DATA=T8;
TABLES PROTOCOL_NAME;
RUN;

CAPITALIZATION Matters! Use UPCASE
DATA T8; SET T8;
PROTOCOL_NAME = UPCASE(PROTOCOL_NAME);
RUN;
PROC FREQ DATA=T8; TABLES PROTOCOL_NAME; RUN;
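UPCASE fixes case but not stray spaces, so a value such as "ALL  PROTOCOL C" (two blanks between words) would still be counted as a separate category. A minimal sketch, assuming hypothetical protocol names, combines UPCASE with COMPBL (collapse repeated blanks) and STRIP (remove leading and trailing blanks):

```sas
/* HYPOTHETICAL PROTOCOL NAMES WITH INCONSISTENT CASE AND SPACING */
DATA PROTO;
  INFILE DATALINES TRUNCOVER;
  INPUT PROTOCOL_NAME $CHAR30.;   /* $CHAR KEEPS LEADING BLANKS */
  DATALINES;
ALL Protocol C
  all protocol c
ALL  PROTOCOL   C
;
RUN;

DATA PROTO_CLEAN;
  SET PROTO;
  /* UPCASE: ONE CASE; COMPBL: COLLAPSE REPEATED BLANKS; STRIP: TRIM BOTH ENDS */
  PROTOCOL_CLEAN = STRIP(COMPBL(UPCASE(PROTOCOL_NAME)));
RUN;

PROC FREQ DATA=PROTO_CLEAN;
  TABLES PROTOCOL_NAME PROTOCOL_CLEAN;
RUN;
```

All three raw spellings collapse to the single value ALL PROTOCOL C in PROTOCOL_CLEAN.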

FINDing, Cleaning, and Manipulating
DATA T9; SET T8;
PROTOCOL_NAME = UPCASE(PROTOCOL_NAME);

/* DO GROUP (DO ... END): BOTH ASSIGNMENTS RUN ONLY WHEN THE PHRASE IS FOUND */
IF FIND(PROTOCOL_NAME,"ALL PROTOCOL C") THEN DO;
  PROTOCOL_NAME = "ALL PROTOCOL C";
  ALL_RISK = "HIGH RISK";
END;
RUN;
PROC FREQ DATA=T9; TABLES PROTOCOL_NAME; RUN;

Use Caution When Searching Text
• When using character search functions in SAS, choose the search phrase carefully
• A poorly chosen phrase can lead to errors in data cleaning
• The search term should be unique enough to prevent unwanted matches
• If “ALL PROTOCOL B” had been searched using FIND(), the BFM-90 protocol would have been misclassified as Protocol B
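The pitfall can be reproduced with a few hypothetical protocol names. The sketch assumes, as the slide implies, that the BFM-90 protocol is recorded as "ALL PROTOCOL BFM-90", so it contains "ALL PROTOCOL B" as a substring; an exact comparison after cleaning avoids the false match. The names are illustrative only.

```sas
/* HYPOTHETICAL PROTOCOL NAMES */
DATA PROTO;
  INFILE DATALINES TRUNCOVER;
  INPUT PROTOCOL_NAME $CHAR30.;
  DATALINES;
ALL PROTOCOL B
ALL PROTOCOL BFM-90
ALL PROTOCOL C
;
RUN;

DATA PROTO_MATCH;
  SET PROTO;
  /* SUBSTRING SEARCH: ALSO MATCHES "ALL PROTOCOL BFM-90" */
  LOOSE_B  = (FIND(PROTOCOL_NAME, "ALL PROTOCOL B") > 0);
  /* EXACT COMPARISON AFTER CLEANING: MATCHES PROTOCOL B ONLY */
  STRICT_B = (STRIP(UPCASE(PROTOCOL_NAME)) = "ALL PROTOCOL B");
RUN;

PROC PRINT DATA=PROTO_MATCH NOOBS; RUN;
```

LOOSE_B is 1 for both the Protocol B and BFM-90 rows, while STRICT_B is 1 only for Protocol B.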

MISSING DATA

Recall: Viewing Missing Data
PROC FREQ DATA = T8;
TABLES STAGE_CODE /MISSING;
RUN;
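Tabulating one variable at a time gets slow in a wide dataset. A common alternative, sketched here on hypothetical toy data (T8_DEMO and its values are made up, and STAGE_CODE is assumed numeric), maps every value to MISSING or PRESENT with two user-defined formats and tabulates all variables in one PROC FREQ call.

```sas
/* HYPOTHETICAL TOY DATA; EMPTY FIELDS BETWEEN COMMAS ARE READ AS MISSING */
DATA T8_DEMO;
  INFILE DATALINES DSD TRUNCOVER;
  INPUT CYPCID STAGE_CODE WBC_COUNT PROTOCOL_NAME :$20.;
  DATALINES;
1,2,45,ALL PROTOCOL C
2,,120,
3,,,ALL PROTOCOL B
;
RUN;

/* FORMATS THAT COLLAPSE EVERY VALUE INTO MISSING / PRESENT */
PROC FORMAT;
  VALUE $MISSCHAR ' ' = 'MISSING' OTHER = 'PRESENT';
  VALUE MISSNUM   .   = 'MISSING' OTHER = 'PRESENT';
RUN;

/* ONE TABLE PER VARIABLE, SHOWING MISSING VS PRESENT COUNTS */
PROC FREQ DATA=T8_DEMO;
  FORMAT _CHARACTER_ $MISSCHAR. _NUMERIC_ MISSNUM.;
  TABLES _ALL_ / MISSING NOCUM;
RUN;
```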

Understanding Your Missing Data
PROC FREQ DATA = T8;
WHERE DX1_GRP = 2;
TABLES STAGE_CODE /MISSING;
RUN;
• Staging is not done for leukemias, which represent a high % of childhood cancers
• Staging is important for lymphomas
• Know your data!
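When a value is missing for a known structural reason, such as staging for leukemias, it can help to record that reason instead of leaving an ordinary missing value. SAS special missing values (.A through .Z, and ._) allow this, so "not applicable" can be told apart from "unknown" (.) in later steps. The sketch below is an illustration under assumptions: STAGE_CODE is taken to be numeric, and DX1_GRP = 1 is a hypothetical code for leukemia.

```sas
/* HYPOTHETICAL DATA: DX1_GRP = 1 STANDS FOR LEUKEMIA IN THIS TOY EXAMPLE */
DATA STAGE_DEMO;
  INPUT CYPCID DX1_GRP STAGE_CODE;
  DATALINES;
1 1 .
2 2 3
3 2 .
;
RUN;

DATA STAGE_DEMO2;
  SET STAGE_DEMO;
  /* STAGING IS NOT DONE FOR LEUKEMIAS: MARK AS NOT APPLICABLE (.N), NOT UNKNOWN (.) */
  IF DX1_GRP = 1 AND STAGE_CODE = . THEN STAGE_CODE = .N;
RUN;

PROC FREQ DATA=STAGE_DEMO2;
  TABLES DX1_GRP * STAGE_CODE / MISSING LIST;
RUN;
```

Note that STAGE_CODE = . is false for .N, while the MISSING() function is true for both, so later checks can choose which kind of missing they mean.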

DATA CHECKPOINTS

Date Checkpoints I
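DATA FLAGS; SET T8 (KEEP=PATIENT_ID DOB DX_DATE1 DOD);
/* DEATH RECORDED BEFORE DIAGNOSIS */
IF DOD < DX_DATE1 AND DOD ^=. THEN DEATH_FLAG = 1;
ELSE DEATH_FLAG=0;
/* DIAGNOSIS RECORDED BEFORE BIRTH; EXCLUDE MISSING DX_DATE1, WHICH WOULD COMPARE AS LOWER THAN ANY DATE */
IF DX_DATE1 < DOB AND DX_DATE1 ^=. THEN DX_FLAG = 1;
ELSE DX_FLAG = 0;
RUN;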
PROC FREQ DATA=FLAGS; TABLES DEATH_FLAG DX_FLAG; RUN;

Date Checkpoints II
PROC PRINT DATA=T8 NOOBS;
WHERE DOD < DX_DATE1 AND DOD ^=. ;
VAR PATIENT_ID DX_DATE1 DOD;
RUN;
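Dates can also be cross-checked against derived variables. As an illustration, the sketch below recomputes age at diagnosis from DOB and DX_DATE1 with YRDIF (the 'AGE' basis requires SAS 9.3 or later) and flags records where it disagrees with the stored DX_AGE. It assumes DX_AGE is stored in completed years; the records are hypothetical.

```sas
/* HYPOTHETICAL RECORDS; DX_AGE IS THE STORED AGE AT DIAGNOSIS */
DATA AGE_CHK;
  INPUT PATIENT_ID DOB :DATE9. DX_DATE1 :DATE9. DX_AGE;
  FORMAT DOB DX_DATE1 DATE9.;
  /* RECOMPUTE AGE IN COMPLETED YEARS WHEN BOTH DATES ARE PRESENT */
  IF NMISS(DOB, DX_DATE1) = 0 THEN CALC_AGE = FLOOR(YRDIF(DOB, DX_DATE1, 'AGE'));
  /* FLAG DISAGREEMENT BETWEEN STORED AND RECOMPUTED AGE */
  IF CALC_AGE ^= . AND DX_AGE ^= . AND CALC_AGE ^= DX_AGE THEN AGE_FLAG = 1;
  ELSE AGE_FLAG = 0;
  DATALINES;
1 10JAN2000 05MAR2006 6
2 22JUL2003 01AUG2008 7
;
RUN;

PROC PRINT DATA=AGE_CHK NOOBS;
  WHERE AGE_FLAG = 1;
RUN;
```

Patient 2 is flagged: the dates give an age of 5, but 7 was recorded.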

Treatment Checkpoints
DATA TX_FLAGS;
MERGE T8 (IN=MASTER) CHEMO (IN=A) SURG (IN=B)
      BMT (IN=C) RAD (IN=D);
BY CYPCID;

/* TREATMENT FLAGS */
IF A THEN CHEMO = 1; ELSE CHEMO = 0;
IF B THEN SURGERY = 1; ELSE SURGERY = 0;
IF C THEN BMT = 1; ELSE BMT = 0;
IF D THEN RAD = 1; ELSE RAD = 0;

NUM_TX_MODALITIES = SUM(CHEMO,SURGERY,BMT,RAD);

IF FIRST.CYPCID;
IF MASTER THEN OUTPUT;
RUN;
REMEMBER: All datasets involved in a merge must be sorted by the common identifier (i.e., CYPCID).
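The sort requirement matters because a MERGE with a BY statement stops with an error when an input is not in BY-variable order. A scaled-down sketch of the pattern above, using hypothetical toy datasets COHORT, CHEMO and SURG, sorts every input first; assigning the IN= variable directly (CHEMO_FLAG = A) is equivalent to the IF/ELSE form used on the slide.

```sas
/* HYPOTHETICAL TOY INPUTS, DELIBERATELY UNSORTED */
DATA COHORT;
  INPUT CYPCID;
  DATALINES;
3
1
2
;
RUN;

DATA CHEMO;   /* SEVERAL CHEMO RECORDS PER PATIENT ARE POSSIBLE */
  INPUT CYPCID;
  DATALINES;
2
1
2
;
RUN;

DATA SURG;
  INPUT CYPCID;
  DATALINES;
3
;
RUN;

/* EVERY DATASET IN THE MERGE MUST BE SORTED BY THE BY VARIABLE FIRST */
PROC SORT DATA=COHORT; BY CYPCID; RUN;
PROC SORT DATA=CHEMO;  BY CYPCID; RUN;
PROC SORT DATA=SURG;   BY CYPCID; RUN;

DATA TX_DEMO;
  MERGE COHORT (IN=MASTER) CHEMO (IN=A) SURG (IN=B);
  BY CYPCID;
  CHEMO_FLAG   = A;   /* IN= VARIABLES ARE 1 IF THE DATASET CONTRIBUTED, ELSE 0 */
  SURGERY_FLAG = B;
  IF FIRST.CYPCID AND MASTER;   /* ONE ROW PER COHORT MEMBER */
RUN;

PROC PRINT DATA=TX_DEMO NOOBS; RUN;
```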
Treatment Checkpoints II
PROC FREQ DATA=TX_FLAGS;
TABLES DX1_GRP * (CHEMO SURGERY BMT RAD);
TABLES DX1_GRP * NUM_TX_MODALITIES;
RUN;

Topics Covered
• Key principles to build a clean dataset
• Using Inclusion / exclusion criteria
• Visualizing data distributions
• Handling data outliers
• Cleaning character variables
• Dealing with missing data
• Creating data checkpoints

THANK YOU!
