how to populate variables in demographics domain in sdtm
In SDTM, the Demographics (DM) domain contains subject-level information such as sex,
race, age, and other key demographic details. This data is typically derived from case report
forms (CRFs) and other clinical trial data sources.
The SDTM DM domain includes standard variables. Below are the key ones and how to derive them:
Variable Name | Label | Derivation / Mapping
STUDYID | Study Identifier | Populate with the study protocol number.
DOMAIN | Domain Abbreviation | Always set to "DM".
USUBJID | Unique Subject Identifier | A concatenation of STUDYID, SITEID, and SUBJID (e.g., XYZ-1001-0005).
SUBJID | Subject Identifier for Study | Taken directly from the CRF.
RFSTDTC | Reference Start Date | Derived from informed consent date or first exposure date.
RFENDTC | Reference End Date | Derived from last visit date or last known status.
SITEID | Study Site Identifier | Taken from the CRF.
AGE | Age | Extracted from the CRF, or calculated as RFSTDTC - BRTHDTC.
AGEU | Age Units | Usually "YEARS".
SEX | Sex | Collected directly from the CRF (e.g., "M" for Male, "F" for Female).
RACE | Race | Coded based on study protocol requirements (e.g., "WHITE", "BLACK", "ASIAN").
ETHNIC | Ethnicity | Collected as per regulatory guidelines (e.g., "HISPANIC OR LATINO", "NOT HISPANIC OR LATINO").
ARM | Description of Study Arm | Derived from randomization dataset.
ARMCD | Study Arm Code | Shorter version of ARM.
COUNTRY | Country of Subject | Taken from the CRF.
BRTHDTC | Birth Date | Derived from CRF (format: YYYY-MM-DD).
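sas
data DM;
  /* Declare lengths before SET so they take effect */
  length STUDYID $20 DOMAIN $2 USUBJID $30 SUBJID $10 SITEID $10
         RFSTDTC RFENDTC BRTHDTC $10 AGEU $5 SEX $1 RACE $20;
  set source.demographics; /* Source dataset */
  STUDYID = "XYZ123";
  DOMAIN = "DM";
  USUBJID = catx("-", STUDYID, SITEID, SUBJID);
  /* Assumes the raw dates RFSTDAT, RFENDAT, and BRTHDAT are character YYYY-MM-DD values */
  _rfst = input(RFSTDAT, ?? yymmdd10.);
  _rfen = input(RFENDAT, ?? yymmdd10.);
  _brth = input(BRTHDAT, ?? yymmdd10.);
  /* --DTC variables are character ISO 8601 strings, not numeric SAS dates */
  if not missing(_rfst) then RFSTDTC = put(_rfst, yymmdd10.); /* Convert start date */
  if not missing(_rfen) then RFENDTC = put(_rfen, yymmdd10.); /* Convert end date */
  if not missing(_brth) then BRTHDTC = put(_brth, yymmdd10.);
  AGEU = "YEARS";
  /* Calculate age in completed years at the reference start date */
  if not missing(_brth) and not missing(_rfst) then
    AGE = floor(yrdif(_brth, _rfst, "AGE"));
  keep STUDYID DOMAIN USUBJID SUBJID SITEID RFSTDTC RFENDTC AGE AGEU SEX
       RACE ETHNIC ARM ARMCD COUNTRY BRTHDTC;
run;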
4. Validate and QC
Scenario: A subject is enrolled in a clinical trial, and their demographic data is collected as follows:
STUDYID | DOMAIN | USUBJID | SUBJID | SITEID | RFSTDTC | AGE | AGEU | SEX | RACE | ETHNIC | COUNTRY
XYZ123 | DM | XYZ123-1001-0005 | 0005 | 1001 | 2023-05-10 | 43 | YEARS | M | ASIAN | NOT HISPANIC OR LATINO | USA
The Demographics (DM) domain in SDTM is a special-purpose domain that contains subject-
level information, such as age, sex, race, and study arm. Below are the key assumptions when
creating the DM domain:
1. General Assumptions
o RACE: Must follow controlled terminology values such as WHITE, BLACK OR AFRICAN AMERICAN, ASIAN, OTHER.
o ETHNIC: Should be HISPANIC OR LATINO or NOT HISPANIC OR LATINO.
8. Subjects Who Did Not Start the Study Must Still Have a DM Record
o Even if a subject was screened but not randomized, they should still have a DM
record with appropriate ARMCD = "SCRNFAIL" (Screen Failure).
RFSTDTC should match the first recorded date in the Exposure (EX) domain, if
applicable.
If a subject has an Exposure (EX) record, they should have a non-missing RFSTDTC.
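The EX-to-DM consistency rule above can be checked with a small query. This is only a sketch under stated assumptions: the toy `dm` and `ex` datasets and their subject values are made up for illustration, and `rfstdtc_issues` is a hypothetical name for the output table.

```sas
/* Toy DM and EX data (hypothetical subjects) */
data dm;
  infile datalines dsd truncover;
  length USUBJID $20 RFSTDTC $10;
  input USUBJID $ RFSTDTC $;
datalines;
XYZ123-1001-0005,2023-05-10
XYZ123-1001-0006,
;
run;

data ex;
  infile datalines dsd truncover;
  length USUBJID $20 EXSTDTC $10;
  input USUBJID $ EXSTDTC $;
datalines;
XYZ123-1001-0005,2023-05-10
XYZ123-1001-0006,2023-05-12
;
run;

/* Subjects with an EX record whose DM record has no RFSTDTC (or no DM record at all) */
proc sql;
  create table rfstdtc_issues as
  select distinct ex.USUBJID
  from ex
       left join dm
       on ex.USUBJID = dm.USUBJID
  where missing(dm.RFSTDTC);
quit;
```

With this toy data, the second subject has an exposure record but a blank RFSTDTC, so it is the one the query is meant to surface.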
12. Subjects Who Discontinue Must Have a Record in Disposition (DS) Domain
In SDTM (Study Data Tabulation Model), the Disposition (DS) domain is used to capture
protocol milestones, subject disposition, and other relevant events. It includes information such
as screening status, study completion status, and reasons for discontinuation.
Raw datasets (e.g., Enrollment logs, Subject Status datasets, Disposition logs)
Protocol-defined milestones (e.g., Randomization, Treatment, Completion)
EDC (Electronic Data Capture) system exports
sas
proc sort data=ds_raw; by USUBJID DSSTDTC; run;
data ds;
set ds_raw;
by USUBJID;
if first.USUBJID then DSSEQ = 1;
else DSSEQ + 1;
run;
DSDECOD: Map DSTERM values to controlled terminology.
DSDY: Calculate relative study day.
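sas
/* DSSTDTC and RFSTDTC are character ISO 8601 dates, so convert them before subtracting. */
/* There is no study day 0: 1 is added only when the event is on or after RFSTDTC. */
DSDY = input(DSSTDTC, ?? yymmdd10.) - input(RFSTDTC, ?? yymmdd10.)
     + (input(DSSTDTC, ?? yymmdd10.) >= input(RFSTDTC, ?? yymmdd10.));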
Ensure all required fields (STUDYID, USUBJID, DSTERM, DSSTDTC) are populated.
Confirm consistency with controlled terminology for DSDECOD.
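As a rough illustration of mapping DSTERM to DSDECOD, a lookup format is one common approach. The raw terms, the `$dsdecod` format name, and the toy `ds_raw` rows below are assumptions for the example; a real study would map against the controlled terminology version named in its specifications.

```sas
/* Hypothetical raw-term to DSDECOD lookup */
proc format;
  value $dsdecod
    "WITHDRAWN"        = "WITHDRAWAL BY SUBJECT"
    "COMPLETED"        = "COMPLETED"
    "LOST TO FOLLOWUP" = "LOST TO FOLLOW-UP"
    other              = " ";
run;

/* Toy raw disposition data (made-up subjects and terms) */
data ds_raw;
  infile datalines dsd truncover;
  length USUBJID $20 DSTERM $40 DSSTDTC $10;
  input USUBJID $ DSTERM $ DSSTDTC $;
datalines;
XYZ123-1001-0005,Withdrawn,2025-02-20
XYZ123-1001-0006,Completed,2025-03-15
XYZ123-1001-0007,Moved abroad,2025-03-01
;
run;

data ds_mapped;
  set ds_raw;
  length DSDECOD $40;
  /* Normalize case and spacing before the lookup */
  DSDECOD = put(upcase(strip(DSTERM)), $dsdecod.);
  /* Terms with no mapping stay blank and are listed in the log for review */
  if missing(DSDECOD) then put "Unmapped DSTERM: " USUBJID= DSTERM=;
run;
```

Leaving unmapped terms blank, rather than copying the verbatim text, keeps DSDECOD limited to controlled values and makes the gaps easy to find during QC.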
Real-Time Scenario
Scenario:
A subject is randomized on 2025-02-01, withdraws on 2025-02-20, and the study database
records WITHDRAWN as the reason.
Derived Data:
DSDY: 20 - 1 + 1 = 20
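The arithmetic in this scenario can be reproduced in a short DATA step. It is a minimal sketch that assumes the randomization date is used as RFSTDTC, as the scenario implies; the variable names `rfdt` and `dsdt` are just working names.

```sas
data _null_;
  RFSTDTC = "2025-02-01";  /* randomization date from the scenario */
  DSSTDTC = "2025-02-20";  /* withdrawal date from the scenario */
  /* Convert the ISO 8601 strings to numeric SAS dates */
  rfdt = input(RFSTDTC, yymmdd10.);
  dsdt = input(DSSTDTC, yymmdd10.);
  /* No day 0: add 1 when the event is on or after the reference date */
  if dsdt >= rfdt then DSDY = dsdt - rfdt + 1;
  else DSDY = dsdt - rfdt;
  put DSDY=;  /* writes DSDY=20 to the log */
run;
```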
You said:
assumptions of disposition domain
ChatGPT said:
In SDTM (Study Data Tabulation Model), the Disposition (DS) domain captures information about
subjects’ protocol milestones, such as screening status, treatment discontinuation, study completion,
and withdrawal reasons. When working with the DS domain, the following assumptions should be
considered:
Would you like an example of how to structure DS domain data in SAS or SQL?
You said:
yes
ChatGPT said:
Here’s an example of how to structure the Disposition (DS) domain data using both SAS and SQL.
STUDY01 | 101 | 1 | SCREEN FAILURE | Did not meet eligibility | DISPOSITION EVENT | SCREENING
You said:
assumptions of adverse events domain
ChatGPT said:
In the Adverse Events (AE) domain of SDTM, several assumptions ensure consistency,
accuracy, and regulatory compliance when handling adverse event data. Here are the key
assumptions:
1. General Assumptions
The AE domain captures all adverse events (AEs) occurring during the study, including
pre-treatment, on-treatment, and post-treatment AEs.
AEs are recorded regardless of their relationship to the investigational product; causality is captured separately (e.g., in AEREL).
Only unique AEs should be recorded, meaning the same event should not be duplicated
unless it reoccurs after resolution.
AEs should be coded using a standard dictionary such as MedDRA (Medical
Dictionary for Regulatory Activities).
2. Structural Assumptions
Each row in the AE dataset represents one adverse event per subject (i.e., a subject may
have multiple records if they experience multiple events).
Start and end dates must be provided if available. Partial or missing dates are stored as
collected in ISO 8601 form rather than imputed; any imputation belongs in the analysis datasets.
The AE term (AETERM) should be captured verbatim as reported by the investigator
before dictionary coding.
AE relationship (AEREL), severity (AESEV), seriousness (AESER), and outcome
(AEOUT) should be populated based on study protocol definitions.
You said:
how to populate variables in disposition domain in sdtm
ChatGPT said:
The Disposition (DS) domain in SDTM is used to capture information about the subject’s
disposition (e.g., study completion, discontinuation, or protocol deviations). It includes details
about study milestones such as screening, randomization, treatment discontinuation, and study
withdrawal.
DSTERM | Disposition Term | The event term (e.g., "COMPLETED," "WITHDRAWN," "LOST TO FOLL
DSDECOD | Standardized Disposition Term | SDTM-controlled terminology (e.g., "COMPLETED," "ADVERSE EVENT,"
DSSTDTC | Date/Time of Disposition | Date of disposition event (ISO 8601 format, e.g., "2023-08-15")
DSSPID | Sponsor-defined Identifier | Tracking identifier from raw data (if applicable)
Real-Time Scenario
Scenario: A subject was randomized but later withdrew due to an adverse event on 2025-01-15.
You said:
assumptions of adverse events domain in sdtm
ChatGPT said:
The Adverse Events (AE) domain in SDTM is used to capture and standardize adverse event
data in clinical trials. The assumptions for the AE domain are as follows:
1. General Assumptions:
The AE domain includes all adverse events that occur during the study, whether related
or unrelated to the treatment.
Each row represents one adverse event occurrence for a subject.
The timing variables (e.g., AESTDTC, AEENDTC) indicate when the AE started and
ended.
The severity, seriousness, causality, and outcome of the AE are captured.
3. Timing Considerations:
AESTDTC (Start Date of AE) and AEENDTC (End Date of AE):
Should be in ISO 8601 format (YYYY-MM-DD or YYYY-MM-DDThh:mm:ss).
If AE is ongoing at the time of reporting, AEENDTC may be missing.
AEENRF (End Relative to Reference):
Used when AE end date is missing, indicating if the event was ongoing at the end
of the study.
AESTDY / AEENDY (Study Day of AE Start and End):
Calculated based on RFSTDTC (Reference Start Date of the Subject).
5. Additional Considerations:
Duplicate AEs should be avoided – separate occurrences of the same AE should have
distinct records.
An AE that starts before study treatment begins is still recorded if relevant to the study.
Ongoing AEs at the end of the study should be flagged properly.
A verbatim AETERM should not be copied into AEDECOD uncoded; every event should be mapped
to a MedDRA term.
You said:
how to populate variables in adverse events domain in sdtm
ChatGPT said:
In SDTM, the Adverse Events (AE) domain is used to capture information about adverse events
experienced by study subjects. Below is a step-by-step guide on how to populate key variables in the AE
domain.
Typically, the adverse events data comes from the CRF (Case Report Form) and sponsor-defined
adverse event coding dictionaries (e.g., MedDRA).
SDTM Variable | Label | Mapping/Derivation
AETERM | Reported Term for the AE | Directly from the CRF (verbatim term).
AEMODIFY | Modified Reported Term | Can be derived if any modifications are needed.
AEDECOD | Dictionary-Derived Term | Assign the Preferred Term (PT) from MedDRA.
AESOC | System Organ Class | Assign System Organ Class (SOC) from MedDRA.
AEACN | Action Taken with Study Treatment | Captured from CRF (Drug Withdrawn, Dose Reduce
AECONTRT | Concomitant Treatment Given | "Y" if any treatment was given for AE, else "N".
sas
data sdtm.AE;
  set raw.ae;
  /* ifc() returns a $200 value by default, so set the flag lengths explicitly */
  length AESER AEONGO AESDTH AECONTRT $1 AESTDTC AEENDTC $10 AEENRF $7;
  /* Assign study-level variables */
  STUDYID = "XYZ123";
  DOMAIN = "AE";
  USUBJID = catx("-", STUDYID, SITEID, SUBJID);
  /* Unique sequence number: the record number is unique across the dataset, hence within each subject */
  AESEQ = _N_;
  /* Adverse event terms */
  AETERM = AE_TERM;
  AEMODIFY = AE_TERM_MODIFIED;
  AEDECOD = AE_PREFERRED_TERM;
  AESOC = AE_SYSTEM_ORGAN_CLASS;
  /* Severity and seriousness */
  AESEV = AE_SEVERITY;
  AESER = ifc(AE_SERIOUS = "Yes", "Y", "N");
  /* Relationship, action taken, and outcome */
  AEREL = AE_RELATIONSHIP;
  AEACN = AE_ACTION_TAKEN;
  AEOUT = AE_OUTCOME;
  /* Dates conversion to ISO 8601 (assumes numeric SAS dates in the raw data) */
  if AE_START_DATE ne . then AESTDTC = put(AE_START_DATE, yymmdd10.);
  if AE_END_DATE ne . then AEENDTC = put(AE_END_DATE, yymmdd10.);
  /* Ongoing AE flag */
  if AEONGOING = "Yes" then do; AEONGO = "Y"; AEENRF = "ONGOING"; end;
  else do; AEONGO = "N"; AEENRF = ""; end;
  /* Death-related flag (case-insensitive comparison on the outcome) */
  AESDTH = ifc(upcase(AEOUT) = "FATAL", "Y", "N");
  /* Concomitant treatment given */
  AECONTRT = ifc(AE_TREATMENT_GIVEN = "Yes", "Y", "N");
  /* Assign toxicity grade if available */
  AETOXGR = AE_TOXICITY_GRADE;
  /* Keep necessary SDTM variables */
  keep STUDYID DOMAIN USUBJID AESEQ AETERM AEMODIFY AEDECOD AESOC AESEV AESER
       AEREL AEACN AEOUT AESTDTC AEENDTC AEONGO AEENRF AESDTH AECONTRT AETOXGR;
run;
5. Summary
Identify and extract adverse event data from raw datasets.
Ensure proper ISO 8601 date format conversion.
Derive severity, seriousness, outcome, and treatment action.
Use MedDRA coding for AEDECOD and AESOC.
Handle missing or ongoing AEs properly.
Validate data against SDTM rules (e.g., AEDECOD must not be missing).
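As one example of the last validation point, a query can list AE records that still lack a coded term. This is a sketch only: the toy `ae` dataset and the output name `ae_uncoded` are made up for illustration.

```sas
/* Toy AE data (hypothetical records); the second event has no coded term yet */
data ae;
  infile datalines dsd truncover;
  length USUBJID $20 AETERM $40 AEDECOD $40;
  input USUBJID $ AESEQ AETERM $ AEDECOD $;
datalines;
XYZ123-1001-0005,1,Headache,Headache
XYZ123-1001-0005,2,Feeling dizzy,
;
run;

/* Records that would fail an "AEDECOD must not be missing" check */
proc sql;
  create table ae_uncoded as
  select USUBJID, AESEQ, AETERM
  from ae
  where missing(AEDECOD);
quit;
```

The resulting list is typically sent back to the coding team rather than fixed in the SDTM program itself.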
You said:
how to populate variables in concomitant medication domain sdtm
ChatGPT said:
In SDTM, the Concomitant Medications (CM) domain is used to capture information about
medications taken by a subject during a clinical study, except for investigational products. Here’s
how you can populate key variables in the CM domain:
Map CMTRT (as reported by the site) to CMDECOD (coded term from WHO-DD
dictionary).
If medication is still ongoing, set CMONGO = "Y" and leave CMENDTC blank.
4. Real-Time Scenarios
Scenario 1: Missing End Date
If the medication is ongoing, keep CMENDTC blank and set CMONGO = "Y".
If a medication does not have a coded term in WHO-DD, CMDECOD remains blank.
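The ongoing-medication rule can be sketched as follows. The raw layout here is an assumption: `ONGOING` stands for a hypothetical CRF checkbox and `END_DATE` for a hypothetical raw end date already in YYYY-MM-DD form. CMONGO is used because the text above uses it; some implementations carry this information in CMENRF or CMENRTPT instead.

```sas
/* Toy raw CM data (hypothetical layout) */
data raw_cm_demo;
  infile datalines dsd truncover;
  length USUBJID $20 CMTRT $40 ONGOING $3 END_DATE $10;
  input USUBJID $ CMTRT $ ONGOING $ END_DATE $;
datalines;
101-001,Ibuprofen,No,2023-01-20
101-001,Lipitor,Yes,
;
run;

data cm_demo;
  set raw_cm_demo;
  length CMONGO $1 CMENDTC $10;
  if upcase(ONGOING) = "YES" then do;
    CMONGO = "Y";
    CMENDTC = " ";        /* ongoing: the end date stays blank */
  end;
  else do;
    CMONGO = "N";
    CMENDTC = END_DATE;   /* already an ISO 8601 date in this toy data */
  end;
  drop ONGOING END_DATE;
run;
```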
You said:
yes
ChatGPT said:
Here’s a SAS example code to derive the CM (Concomitant Medications) domain from raw
data:
Expected Output
STUDYID | DOMAIN | USUBJID | CMSEQ | CMTRT | CMDECOD | CMDOSE
ABC123 | CM | 101-001 | 1 | Ibuprofen | Ibuprofen | 200
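data raw_cm;
  infile datalines dsd truncover;
  length USUBJID $20 CMTRT $40 CMDOSU $10 CMDOSFRQ $10 CMROUTE $20
         CMSTDTC CMENDTC $10;
  input USUBJID $ CMTRT $ CMDOSE CMDOSU $ CMDOSFRQ $ CMROUTE $ CMSTDTC $ CMENDTC $;
datalines;
101-001,Ibuprofen,200,mg,BID,Oral,2023-01-15,2023-01-20
101-001,Lipitor,10,mg,QD,Oral,2022-12-01,
101-002,Metformin,500,mg,BID,Oral,2023-02,
101-003,Amoxicillin,250,mg,TID,Oral,,
;
run;
data cm_decoded;
  set raw_cm;
  /* Placeholder: the WHO-DD decode step is not shown in the source.
     CMDECOD simply copies CMTRT here, matching the expected output above. */
  length CMDECOD $40;
  CMDECOD = CMTRT;
run;
data cm_final;
  set cm_decoded;
  by USUBJID; /* raw_cm is already ordered by USUBJID */
  length STUDYID $10 DOMAIN $2 CMONGO $1;
  STUDYID = "ABC123";
  DOMAIN = "CM";
  /* Sequence number restarts at 1 for each subject */
  if first.USUBJID then CMSEQ = 0;
  CMSEQ + 1;
  /* Assumption: a blank end date is treated as ongoing; the raw data has no explicit ongoing flag */
  if missing(CMENDTC) then CMONGO = "Y";
run;
proc print data=cm_final noobs;
  var STUDYID DOMAIN USUBJID CMSEQ CMTRT CMDECOD CMDOSE CMDOSU CMDOSFRQ CMROUTE
      CMSTDTC CMENDTC CMONGO;
run;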
You said:
assumptions of concomitant medication domain in sdtm
ChatGPT said:
In SDTM (Study Data Tabulation Model), the Concomitant Medication (CM) domain is
used to capture information about medications that a subject is taking in addition to the
investigational product during the study. When working with the CM domain, certain
assumptions and guidelines should be followed to ensure data standardization and compliance.
Here are the key assumptions:
You said:
assumptions of exposure domain in sdtm
ChatGPT said:
In SDTM (Study Data Tabulation Model), the Exposure (EX) domain is used to capture
information about the study treatment that a subject actually received. The assumptions for the
EX domain are:
You said:
how to populate variables in exposure domain in sdtm
ChatGPT said:
In the Exposure (EX) domain of SDTM, we capture details about the study drug administration,
including dosage, frequency, route, and treatment duration. Populating variables in the EX
domain requires mapping raw clinical trial data (from EDC or CRF) into standardized SDTM
format.
The EX domain is typically derived from drug administration records collected in raw
datasets such as:
Dosing CRF (Electronic Case Report Form)
Medication logs
IVRS (Interactive Voice Response System)
Drug accountability records
Below are some essential EX domain variables and how they are populated:
USUBJID | Unique Subject ID | Concatenation of study ID, site ID, and subject number (e.g., "XYZ-123-
EXDOSE | Dose Administered | Captured from the dosing CRF as a number (e.g., 10, with the unit "mg" in EXDOSU)
EXDOSFRQ | Dosing Frequency | "QD" (once daily), "BID" (twice daily), "TID" (three times daily)
EXREASND | Reason for Dose Not Given | "Missed Visit", "Adverse Event", "Withdrawal", etc.
3. Example of EX Dataset
Let's assume subject 1001 took a drug orally for 3 days:
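If only year and month are available, truncate the value after the month; do not pad the missing day with hyphens.
Example: EXSTDTC = "2025-03"
If only year is available, record the year alone.
Example: EXSTDTC = "2025"
Hyphens stand in only for a missing component that is followed by a known one (e.g., "2025---15" when the month is unknown but the day is known).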
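Partial dates in SDTM are written as right-truncated ISO 8601 values. A minimal sketch of building EXSTDTC from separately collected components is shown below; the numeric inputs `yr`, `mo`, and `dy` are hypothetical raw fields, and the case of a known day with an unknown month is deliberately not handled here.

```sas
data partial_dates;
  infile datalines dsd truncover;
  input yr mo dy;
  length EXSTDTC $10;
  /* Build the date left to right and stop at the first missing component */
  if not missing(yr) then do;
    EXSTDTC = put(yr, z4.);
    if not missing(mo) then do;
      EXSTDTC = catx("-", EXSTDTC, put(mo, z2.));
      if not missing(dy) then EXSTDTC = catx("-", EXSTDTC, put(dy, z2.));
    end;
  end;
  put EXSTDTC=;
datalines;
2025,3,15
2025,3,
2025,,
;
run;
```

The three toy rows are written to the log as 2025-03-15, 2025-03, and 2025, which is the truncated form rather than a hyphen-padded one.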
If the subject missed a dose due to an Adverse Event (AE) or any other reason:
Set EXDOSE = 0
Capture reason in EXREASND
Example:
USUBJID = "XYZ-123-1002"
EXSEQ = 4
EXDOSE = 0
EXREASND = "Adverse Event"
If a subject had a dose change, add a new record with the updated dose and flag
adjustment in EXADJ.
raw;
E8601DT.); keep STUDYID DOMAIN USUBJID EXSEQ EXTRT EXDOSE EXDOSU EXDOSFRM
EXDOSFRQ EXROUTE EXSTDTC EXENDTC; run;
You said:
assumptions of vital signs domain in sdtm
ChatGPT said:
In SDTM, the Vital Signs (VS) domain is used to store subject-level vital sign measurements
such as blood pressure, heart rate, temperature, and respiratory rate. When working with the VS
domain, certain assumptions must be followed to ensure compliance with CDISC standards.
Here are the key assumptions:
You said:
how to populate variables in vital signs domain in sdtm
ChatGPT said:
In SDTM, the Vital Signs (VS) domain is used to store vital sign measurements such as blood pressure,
heart rate, temperature, and respiratory rate. Here’s how you can populate variables in the VS domain
step by step:
The vital signs data is usually collected from CRF (Case Report Forms) or electronic health records (EHR).
It contains measurements for different visits and time points.
VSTESTCD | Vital Signs Test Short Name | Predefined list (e.g., SYSBP, DIABP, PULSE, TEMP, RESP)
VSTEST | Vital Signs Test Name | Predefined list (e.g., Systolic Blood Pressure, Diastolic Blood Pr
VSORRESU | Original Units | e.g., "mmHg" for blood pressure, "bpm" for pulse, "°C" for tem
sas
data VS;
  set source_data; /* USUBJID is assumed to be present in source_data */
  length STUDYID $10 DOMAIN $2 VSTESTCD $10 VSTEST $50 VSORRES $10
         VSORRESU VSSTRESU $12;
  STUDYID = "ABC123"; /* Study ID */
  DOMAIN = "VS";
  VSDTC = VSDATE; /* Assumes VSDATE is already a character ISO 8601 date */
  /* Mapping Vital Signs: one output row per non-missing measurement */
  array tests{5} SYSBP DIABP PULSE TEMP RESP; /* numeric source columns */
  /* Temporary lookup arrays with explicit lengths (the default of $8 would truncate) */
  array testcds{5} $10 _temporary_ ('SYSBP', 'DIABP', 'PULSE', 'TEMP', 'RESP');
  array testnames{5} $50 _temporary_ ('Systolic Blood Pressure',
    'Diastolic Blood Pressure', 'Pulse Rate', 'Body Temperature', 'Respiratory Rate');
  array units{5} $12 _temporary_ ('mmHg', 'mmHg', 'bpm', '°C', 'breaths/min');
  do i = 1 to 5;
    if tests{i} ne . then do;
      VSSEQ + 1; /* Running counter: unique across the dataset, hence within each subject */
      VSTESTCD = testcds{i};
      VSTEST = testnames{i};
      VSORRES = strip(put(tests{i}, best.));
      VSORRESU = units{i};
      VSSTRESN = tests{i};
      VSSTRESU = VSORRESU;
      output;
    end;
  end;
  drop i SYSBP DIABP PULSE TEMP RESP;
run;
1. Check for missing values – Ensure no key variables like USUBJID, VSTESTCD, VSDTC are missing.
2. Verify unit consistency – Ensure VSSTRESU matches the expected standard units.
3. Ensure correct ISO 8601 format for data
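The three checks above could be sketched like this. The toy `vs` rows, the `ISSUE` variable, and the `vs_issues` dataset name are assumptions made for the example, and the date pattern only covers date-only values (complete or right-truncated), not times.

```sas
/* Toy VS data (hypothetical): row 2 has a non-ISO date, row 3 has no USUBJID */
data vs;
  infile datalines dsd truncover;
  length USUBJID $20 VSTESTCD $8 VSSTRESU $12 VSDTC $19;
  input USUBJID $ VSTESTCD $ VSSTRESN VSSTRESU $ VSDTC $;
datalines;
ABC123-001,SYSBP,120,mmHg,2023-01-15
ABC123-001,TEMP,36.8,C,15JAN2023
,PULSE,72,bpm,2023-01-15
;
run;

data vs_issues;
  set vs;
  length ISSUE $40;
  /* Check 1: key variables must be populated */
  if missing(USUBJID) or missing(VSTESTCD) or missing(VSDTC) then do;
    ISSUE = "Missing key variable";
    output;
  end;
  /* Check 3: VSDTC should look like YYYY, YYYY-MM, or YYYY-MM-DD */
  if not missing(VSDTC) and
     not prxmatch('/^\d{4}(-\d{2}(-\d{2})?)?$/', strip(VSDTC)) then do;
    ISSUE = "VSDTC is not an ISO 8601 date";
    output;
  end;
run;

/* Check 2: review which units appear for each test */
proc freq data=vs;
  tables VSTESTCD*VSSTRESU / list missing;
run;
```

A record can appear in `vs_issues` more than once if it fails more than one check, which keeps each finding on its own row for review.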
You said:
assumptions of inclusion and exclusions criteria in sdtm
ChatGPT said:
In SDTM (Study Data Tabulation Model), inclusion and exclusion criteria are crucial for
defining the study population. These criteria are typically documented in the IE
(Inclusion/Exclusion) domain and are based on protocol requirements. Here are some key
assumptions associated with inclusion and exclusion criteria in SDTM:
You said:
how to populate variables in inclusion and exclusion criteria domain sdtm
ChatGPT said:
In SDTM, the Inclusion/Exclusion Criteria (IE) Domain is used to capture whether a subject
meets the inclusion and exclusion criteria for a clinical trial. The IE domain follows a standard
structure with key variables that need to be populated accurately. Below is a step-by-step guide
on how to populate variables in the IE domain.
DOMAIN | Domain Abbreviation | Always "IE" for the Inclusion/Exclusion Criteria domain.
IEORRES | Original Result | The subject's actual result (e.g., "Not Met", "Met").
IESTRESC | Standardized Result | Standardized result (e.g., "Y" for met, "N" for not met).
IESTAT | Inclusion/Exclusion Status | Populated if the test was not performed (e.g., "NOT DON
IEDTC | Date/Time of Criterion Evaluation | Date when the inclusion/exclusion criterion was assessed.
VISITNUM, VISIT, VISITDY | Visit Information | If applicable, visit when the criterion was assessed.
Real-Time Scenario
Scenario:
You can populate the IE domain using PROC SQL or DATA STEP in SAS.
sas
data ie;
  set source_data; /* Assuming source_data contains eligibility results */
  length STUDYID $10 DOMAIN $2 USUBJID $20 IECAT $30 IETESTCD $10 IETEST $50
         IEORRES $10 IESTRESC $1 IEDTC $10;
  DOMAIN = "IE";
  STUDYID = "STUDY01";
  IEDTC = put(screening_date, yymmdd10.);
  /* Inclusion Criteria: Age Requirement */
  IECAT = "INCLUSION CRITERIA";
  IETESTCD = "AGE";
  IETEST = "Age Requirement";
  if age >= 18 and age <= 65 then do; IEORRES = "Met"; IESTRESC = "Y"; end;
  else do; IEORRES = "Not Met"; IESTRESC = "N"; end;
  output;
  /* Exclusion Criteria: Serious Medical Condition */
  IECAT = "EXCLUSION CRITERIA";
  IETESTCD = "MEDCOND";
  IETEST = "No Serious Medical Condition";
  if serious_med_cond = "No" then do; IEORRES = "Met"; IESTRESC = "Y"; end;
  else do; IEORRES = "Not Met"; IESTRESC = "N"; end;
  output;
run;
Key Takeaways
✔ IE domain is crucial in determining subject eligibility.
✔ Accurate mapping of inclusion/exclusion criteria from protocol is required.
✔ Ensure correct values for IEORRES and IESTRESC.
✔ Missing data handling using IESTAT if the criterion was not assessed.
✔ SAS coding can be used to automate the population of IE data.