Pooling Study
ABSTRACT
Integration and analysis of data across all studies in a submission is a vital part of applications for
regulatory approval in the pharma industry. The existing ADaM classes (ADSL, BDS, and OCCDS)
already support some simple cases of integration analysis. However, there has been a need for an
integration standard that supports the more complex cases.
To address this need, the ADaM Integration sub-team is developing the upcoming ADaM Integration
standards document. This paper introduces the new IADSL, IBDS, and IOCCDS classes found in this
document. IADSL allows for multiple records per subject. IBDS and IOCCDS work effectively with the new
IADSL class. This paper also discusses the analysis needs that necessitated the creation of the new
classes, and provides examples in the form of usage scenarios, data, and metadata. With them, no future
integration will prove too complex.
INTRODUCTION
Integrated clinical data is data collected from multiple clinical studies, combined together in a consistent
manner, to support safety and efficacy analyses as required by regulatory agencies. Other uses include
annual safety reports, investigator brochures (IB), ongoing safety monitoring and risk management,
marketing materials, integrated PK analyses, extension studies, and answers to regulatory authorities'
questions.
In particular, the FDA requires an Integrated Summary of Safety (ISS) and an Integrated Summary of
Efficacy (ISE) and expects that "the ISE and ISS are not summaries but rather detailed integrated
analyses of all relevant data from the clinical study reports that belong in Module 5." [1] Because ISS and
ISE analysis data are critical components of pharmaceutical and biotech applications to the FDA, the
ADaM integrated data structures were developed with this purpose in mind.
This paper describes two approaches for preparing integrated ADaM data: 1) In cases where the
producer determines that using existing ADaM classes (ADSL, BDS, OCCDS) will support analysis,
provide clear metadata, and offer traceability back to source data, then the existing classes may be used.
2) For all other cases, the ADaM Integration sub-team has prepared new data structure classes (IADSL,
IBDS, IOCCDS) to support the unique challenges of integration analysis. This paper will introduce and
give examples of the use of these new classes. Please note that the ADaM Integration sub-team does not
require or recommend a given data flow when creating integrated ADaM datasets.
This paper assumes the reader has working knowledge of CDISC ADaM standards and experience with
integration analysis, specifically integration pool definitions and analyses based on pools.
ADaM Structures for Integration: A Preview, continued
In the simple case, where each subject in the integration participates in only one study, the study-level
ADSL datasets may be set together to create an integrated one-record-per-subject ADSL dataset with a
class of ADSL. BDS and OCCDS study-level data may similarly be combined.
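As a minimal sketch of the simple case, the following Python sets study-level ADSL records together and checks the one-record-per-subject assumption. The function name `stack_adsl`, the subject identifiers, and the treatment labels are hypothetical and are not taken from the integration document.

```python
from collections import Counter


def stack_adsl(*study_adsl):
    """Set study-level ADSL record lists together into one integrated ADSL."""
    integrated = [dict(rec) for adsl in study_adsl for rec in adsl]
    # The simple case assumes each subject appears in exactly one study.
    counts = Counter(rec["USUBJID"] for rec in integrated)
    repeats = sorted(u for u, n in counts.items() if n > 1)
    if repeats:
        raise ValueError(f"subjects in more than one study: {repeats}")
    return integrated


# Hypothetical study-level ADSL records (values are illustrative only).
s001 = [{"STUDYID": "S001", "USUBJID": "S001-001", "TRT01P": "Placebo"}]
s002 = [{"STUDYID": "S002", "USUBJID": "S002-001", "TRT01P": "Drug A"}]

adsl = stack_adsl(s001, s002)
print(len(adsl), "records in the integrated ADSL")
```

The uniqueness check is what separates the simple case from the complex one: if it fails, the producer has to either choose one value per subject or move to the pool-based structures.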
In complex cases, where subjects participate in more than one trial, existing classes may still be used.
However, existing rules must still be followed, e.g. required ADSL variables such as STUDYID and SUBJID
must still be populated, which means choosing one value for subjects participating in more than one
study.
If the integrated SAP defines pool analysis with treatment, baseline, or covariate values that vary for a
subject depending on the pool, variables to support these differences will be needed. E.g. if the first
exposure date TRTSDT differs for a subject depending on the pool, then pool-specific versions of this
variable will be needed. Due to the breadth of variables that can be impacted by requirements unique to
integration, variable naming is left to the sponsor; standardized variable naming for integration analysis
will be a part of the approach using new integration structure classes.
The primary efficacy analysis in this ISE is a change from baseline to endpoint for total PANSS score for
all randomized treatment groups in the three studies. Each study medication (Drug A) dose group will be
compared with Placebo using an analysis of covariance model (ANCOVA) based on the pooled ITT
analysis population excluding the active control group (Drug B 10 mg). Sample metadata and data are
presented below.
ADSL
The ADSL metadata in Table 1 below describes the derivation and data origin for variables in the
integrated ADSL dataset. Please note that the double-blind period for study S001 has been aligned with
the other two studies. The actual integrated ADSL dataset will contain many more variables; this example
has been edited for length for this paper.
Table 1: Sample ADSL Metadata
Variable Where Condition Source/Derivation/Comments
Table 2 below shows sample ADSL records for the variables described in Table 1. Other than changes in
the value for variable STUDYID, there is little to distinguish this dataset from a study-level ADSL dataset.
Table 2: Sample ADSL Data
ROW STUDYID USUBJID SITEID ARMCD TRT01P SAFFL ITTFL AGE REGION1
ADEFF
The ADEFF metadata in Table 3 below describes the derivation and data origin for variables in the
integrated ADEFF dataset. Please note that the TRTPN variable is rederived to ensure variables TRTP
and TRTPN share a one-to-one relationship as required by the BDS structure. The variables STUDYID
and RSSEQ allow records to be traced back to the original SDTM source record.
Table 3: Sample ADEFF Metadata
Variable Label Source/Derivation/Comment
USUBJID Unique Subject Identifier ADEFF.USUBJID from individual study ADaM dataset
TRTPN Planned Treatment (N) Mapped into unique number according to combined data for TRTP values.
AVISITN Analysis Visit (N) ADEFF.AVISITN from individual study ADaM dataset
CHG Change from Baseline ADEFF.CHG from individual study ADaM dataset
PCHG Percent Change from Baseline ADEFF.PCHG from individual study ADaM dataset
ABLFL Baseline Record Flag ADEFF.ABLFL from individual study ADaM dataset
ANL01FL Analysis Record Flag 01 ADEFF.ANL01FL from individual study ADaM dataset (If multiple visits fall
into the same visit window, then the one closest to the target day is chosen
for analysis. These are flagged with ANL01FL="Y").
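The TRTPN re-derivation described for Table 3 can be sketched as follows. This is an illustration under stated assumptions, not the sponsor's actual mapping: the function name `rederive_trtpn`, the treatment labels, and the chosen numeric codes are all hypothetical.

```python
def rederive_trtpn(records, order):
    """Assign TRTPN from TRTP using one shared mapping across all studies."""
    mapping = {trtp: n for n, trtp in enumerate(order, start=1)}
    missing = {r["TRTP"] for r in records} - set(mapping)
    if missing:
        raise ValueError(f"TRTP values without a code: {sorted(missing)}")
    # Overwrite the study-level TRTPN so TRTP and TRTPN are one-to-one.
    return [{**r, "TRTPN": mapping[r["TRTP"]]} for r in records]


# Hypothetical stacked records: the study-level codes collide, since
# TRTPN=1 means a different treatment in each study.
stacked = [
    {"STUDYID": "S001", "TRTP": "Drug A low dose", "TRTPN": 1},
    {"STUDYID": "S002", "TRTP": "Placebo", "TRTPN": 1},
    {"STUDYID": "S003", "TRTP": "Drug A low dose", "TRTPN": 2},
]

adeff = rederive_trtpn(stacked, order=["Placebo", "Drug A low dose"])
pairs = {(r["TRTP"], r["TRTPN"]) for r in adeff}
print(sorted(pairs))
```

Driving the codes from a single ordered list keeps the mapping explicit and makes an unmapped treatment fail loudly instead of silently receiving a missing code.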
Table 4 below shows sample ADEFF records for the variables described in Table 3. SDTM variables
identify the records copied from SDTM, and derived records are copied directly from study ADaM,
resulting in ADaM datasets that are simple to produce.
Table 4: Sample ADEFF Data
ROW STUDYID USUBJID RSSEQ TRTP TRTPN PARAMCD ADY VISITNUM
ROW VISIT AVISITN AVISIT AVAL BASE CHG PCHG ABLFL ANL01FL DTYPE
The simplicity of this example is possible due to both the straightforward design of the three pivotal
studies and the effort to standardize the analysis datasets in the individual study analysis. Without either,
the creation of the ISE analysis datasets would involve far more complexity.
In taking this approach, the same variable in ADSL can take on a different value for each pool. E.g. if the
first exposure date TRTSDT for a subject differs depending on the pool, the variable TRTSDT will hold
the appropriate date in each pool-specific record.
The new IBDS and IOCCDS structure classes are designed for merging with the new IADSL structure.
Where IADSL is a set of subject-level records for each pool, IBDS and IOCCDS are sets of records for
each pool, allowing merging with ADSL by pool. Sets of records should only be created as needed for
analysis, e.g. if a pool is not used for AE analysis, a set of records does not need to be created for that pool
in ADAE. Records may also be omitted from a pool if they are not needed, e.g. if a pool only analyzes studies
2 and 3, records from study 1 do not need to be included for that pool.
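The by-pool merge can be illustrated with a short Python sketch. The helper `merge_by_pool` and all dates below are hypothetical; the only element taken from the text is that USUBJID and POOL together act as the merge key.

```python
from datetime import date


def merge_by_pool(iadsl, records, keep):
    """Merge selected IADSL variables onto pool-level records by USUBJID and POOL."""
    lookup = {(r["USUBJID"], r["POOL"]): r for r in iadsl}
    merged = []
    for rec in records:
        subj = lookup[(rec["USUBJID"], rec["POOL"])]
        merged.append({**rec, **{var: subj[var] for var in keep}})
    return merged


# Hypothetical IADSL: one record per subject per pool, with a
# pool-specific first exposure date.
iadsl = [
    {"USUBJID": "02-003", "POOL": 2, "TR01SDT": date(2001, 8, 21)},
    {"USUBJID": "02-003", "POOL": 3, "TR01SDT": date(2000, 9, 10)},
]
# Hypothetical occurrence records: the same event appears once per pool.
adae = [
    {"USUBJID": "02-003", "POOL": 2, "AESEQ": 2},
    {"USUBJID": "02-003", "POOL": 3, "AESEQ": 2},
]

merged = merge_by_pool(iadsl, adae, keep=["TR01SDT"])
for rec in merged:
    print(rec["POOL"], rec["TR01SDT"])
```

Because the key includes POOL, the same source event picks up a different reference date in each pool without any pool-specific variable names.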
The example below provides a demonstration of the new structure classes. Please note more integration
ADaM standard variables and use cases may be found in the official ADaM Integration document.
Figure 5 shows sample pool definitions from an integration SAP for the ISS analysis. Three pools are
defined, listing the studies and treatment periods included for each pool.
In addition to the definitions above, the SAP defines baseline values as the last value collected before the
first exposure in each pool, and bases treatment-emergent adverse event (TEAE) periods on the
treatments received in each pool.
For subjects such as 02-003 and 02-004, their first exposure date for Pools 1 and 3 will be based on their
participation in study MD-201, while their first exposure date for Pool 2 will be based on their participation
in studies MD-301/302. Their baseline values are similarly impacted.
To support the pool-specific definitions for analysis, it is decided to use the new Integration structure
classes.
IADSL
The ADSL metadata below in Table 5 includes the new integration variable POOL and two other new
variables POOLC (the character description of POOL) and STUDIES (multiple STUDYID values in one
string). The treatment and analysis period variables are aligned to the SAP definition for each pool.
Table 5: Sample ADSL Metadata
Variable Label                                Source/Derivation/Comments
POOL     Pool                                 For each SAP defined pool a subject belongs to, create a record. 1 for overall
                                              pool if patient was enrolled in any study (101, 201, 301, 302, 320), 2 for
                                              Pivotal pool if patient was enrolled in study 301 or 302, 3 for comparison
                                              pool if patient was enrolled in study 201, 302 or 320.
STUDIES  Study or Studies Identifier          A list, delimited by comma ',', of ISS.DM.STUDYID values in the order of
                                              participation for each subject as applicable for the pool
TRT01P   Planned Treatment for Period 01      First treatment period in ISS.DM.ARM where ISS.DM.STUDYID is in the list
                                              of STUDIES
TR01SDT  Date of First Exposure in Period 01  The date of the first dose in the period described by TRT01P. Take EX
                                              records for the described period by checking ISS.EX.STUDYID and
                                              ISS.EX.EXTRT, take the minimum ISS.EX.EXSTDTC value converted to
                                              SAS® date9 format.
TR01EDT  Date of Last Exposure in Period 01   The date of the last dose in the period described by TRT01P. Take EX
                                              records for the described period by checking ISS.EX.STUDYID and
                                              ISS.EX.EXTRT, take the maximum ISS.EX.EXENDTC value converted to
                                              SAS date9 format.
AP01EDT  Period 01 End Date                   The minimum of TR01EDT+7 and the next period start date-1
AP02EDT  ...                                  The minimum of TR02EDT+7 and the next period start date-1
AP03EDT  ...                                  The minimum of TR03EDT+7 and the next period start date-1
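The APxxEDT rule in Table 5 (the minimum of TRxxEDT+7 and the next period start date-1) can be sketched in Python as below. The function name `analysis_period_end` is hypothetical, and the handling of the last period (no next period, so TRxxEDT+7 is used alone) is an assumption that the table does not state.

```python
from datetime import date, timedelta


def analysis_period_end(trt_end, next_start=None, window_days=7):
    """Return APxxEDT as min(TRxxEDT + 7, next period start - 1)."""
    candidate = trt_end + timedelta(days=window_days)
    if next_start is None:
        # Assumption: with no following period, only the 7-day window applies.
        return candidate
    return min(candidate, next_start - timedelta(days=1))


# Illustrative dates: a following period that starts well after the window,
# and one that starts inside the window.
far_next = analysis_period_end(date(2000, 10, 3), date(2001, 8, 21))
near_next = analysis_period_end(date(2000, 10, 3), date(2000, 10, 8))
print(far_next, near_next)
```

Truncating at the day before the next period start keeps analysis periods from overlapping, so each event date can fall in at most one period.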
Table 6 below shows sample ADSL records for the variables described in Table 5. There is a record for
each subject and pool the subject belongs in. The variable STUDIES lists the studies included in each
record, and all values on the record reflect pool-specific derivations. Additional variables such as
demographics, population flags, covariates, and baseline variables are expected in a real study.
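To make the one-record-per-subject-per-pool layout concrete, the sketch below builds POOL and STUDIES values from a subject's study participation. The pool memberships follow the POOL row of Table 5; the function name `pool_records` and the example participation history are hypothetical.

```python
# Pool membership as listed in the POOL row of Table 5.
POOL_STUDIES = {
    1: {"101", "201", "301", "302", "320"},  # overall pool
    2: {"301", "302"},                       # pivotal pool
    3: {"201", "302", "320"},                # comparison pool
}


def pool_records(usubjid, studies_in_order):
    """Create one record per pool the subject belongs to.

    studies_in_order lists the subject's studies in order of participation.
    """
    records = []
    for pool, members in sorted(POOL_STUDIES.items()):
        in_pool = [s for s in studies_in_order if s in members]
        if in_pool:
            records.append(
                {"USUBJID": usubjid, "POOL": pool, "STUDIES": ",".join(in_pool)}
            )
    return records


# Hypothetical subject who took part in study 201 and then study 301.
for rec in pool_records("02-003", ["201", "301"]):
    print(rec)
```

A subject with no study in a given pool simply gets no record for that pool, which matches the idea of creating records only where they are needed for analysis.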
Table 6: Sample ADSL Data
ROW USUBJID POOL POOLC STUDIES TRT01P TR01SDT TR01EDT AP01SDT
2(C) 2000-09-09 SOC 20mg 2000-09-10 2000-10-03 2000-09-10 2000-10-10 MD 10mg 2001-08-21
3(C) 2002-04-18
4(C) 2000-09-09 SOC 20mg 2000-09-10 2000-10-03 2000-09-10 2000-11-02 MD 10mg 2001-08-21
5(C) 2000-10-01 MD 10mg 2000-10-02 2000-10-27 2000-10-02 2000-11-03 SOC 20mg 2001-09-06
6(C) 2002-05-01
7(C) 2000-10-01 MD 10mg 2000-10-02 2000-10-27 2000-10-02 2000-11-03 SOC 20mg 2001-09-06
1(C)
3(C)
6(C)
IOCCDS
The ADAE metadata below in Table 7 includes the new integration variable POOL merged in from ADSL.
Other variables from ADSL are implicitly included and not relisted. A set of ADAE records is created for
pools 2 and 3 only, intended to show that the producer may create records as needed. The variables
STUDYID and AESEQ provide data point traceability back to the SDTM source records. The derivations of
TRTP and TRTEMFL do not need to refer to pool definitions because the variables APxxSDT/APxxEDT
merged in from ADSL are already pool-specific, which simplifies both derivations.
TRTP     Planned Treatment                 If APxxSDT<=ASTDT<=APxxEDT for xx=01 thru 04, then set TRTP=the
                                           corresponding TRTxxP variable. For example, if
                                           AP02SDT<=ASTDT<=AP02EDT, then set TRTP=TRT02P
TRTEMFL  Treatment Emergent Analysis Flag  Treatment emergent adverse events are defined as those that start within
                                           a treatment exposure period + 7 days or the day before the start of the
                                           next exposure period, whichever is less.
                                           If APxxSDT<=ASTDT<=APxxEDT for xx=01 thru 04, then set
                                           TRTEMFL='Y'.
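The TRTP and TRTEMFL rules above reduce to a lookup over the pool-specific analysis periods. The Python sketch below is illustrative only: the function name `derive_trtp_trtemfl` and all dates are hypothetical, and leaving both variables missing when the event falls outside every period is an assumption.

```python
from datetime import date


def derive_trtp_trtemfl(astdt, adsl_rec, max_periods=4):
    """Derive TRTP and TRTEMFL from pool-specific APxxSDT/APxxEDT windows."""
    for xx in range(1, max_periods + 1):
        start = adsl_rec.get(f"AP{xx:02d}SDT")
        end = adsl_rec.get(f"AP{xx:02d}EDT")
        if start is not None and end is not None and start <= astdt <= end:
            return {"TRTP": adsl_rec[f"TRT{xx:02d}P"], "TRTEMFL": "Y"}
    # Assumption: events outside all periods get no treatment and no flag.
    return {"TRTP": None, "TRTEMFL": None}


# Hypothetical pool-specific ADSL records for the same subject.
pool2 = {
    "TRT01P": "MD 10mg",
    "AP01SDT": date(2001, 8, 21), "AP01EDT": date(2001, 9, 20),
}
pool3 = {
    "TRT01P": "SOC 20mg",
    "AP01SDT": date(2000, 9, 10), "AP01EDT": date(2000, 10, 10),
    "TRT02P": "MD 10mg",
    "AP02SDT": date(2001, 8, 21), "AP02EDT": date(2001, 9, 20),
}

ae_start = date(2000, 9, 20)  # one AE start date, evaluated in each pool
print(derive_trtp_trtemfl(ae_start, pool2))
print(derive_trtp_trtemfl(ae_start, pool3))
```

The derivation itself never mentions a pool: the same code gives different answers only because the period dates merged in from ADSL differ by pool.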
Table 8 below shows sample ADAE records for the variables described in Table 7, omitting AEBODSYS.
Records supporting pool 2 only include records from studies MD-301/302 while pool 3 records also
include records from MD-301. Data point traceability is clear from SDTM variables and derivation
traceability for TRTP and TRTEMFL would be clear from the inclusion of ADSL variables.
Table 8: Sample ADAE Data
USUBJID POOL STUDYID AESEQ AEDECOD ASTDT TRTP TRTEMFL
Please note that the same AE record for subject 02-003 from MD-301 with AESEQ=2 is considered a
TEAE for pool 3 but not for pool 2. SAP pool definitions can result in different values for TRTEMFL, ADY,
TRTP, TRTA, APERIOD, and other variables for the same AE record. Having a set of records for each
pool means standard OCCDS variables may be used, whereas pool-specific custom variables would
need to be created if the same record were to support all pools.
IBDS
The ADLB metadata below in Table 9 includes the new integration variable POOL merged in from ADSL.
A set of records is created for pools 2 and 3 only as defined in the SAP. The variables STUDYID and
LBSEQ provide data point traceability back to SDTM. The values of AVISIT, ADY, ABLFL, and TRTP may
differ depending on the pool; however, because pool-specific ADSL variable values are merged in, the
derivations remain simple.
ABLFL Baseline Record Flag Set to "Y" for last pre-dose record where ADY<=1 and
ISS.LB.LBTPT=PREDOSE (Note: Time point is not shown in this
example)
TRTP Planned Treatment If APxxSDT<=ASTDT<=APxxEDT for xx=01 thru 04, then set
TRTP=the corresponding TRTxxP variable. For example, if
AP02SDT<=ASTDT<=AP02EDT, then set TRTP=TRT02P
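The ABLFL rule can be sketched in the same spirit. In this Python illustration the study day follows the usual ADaM convention (no day 0), the reference date is assumed to be the pool-specific first exposure date merged in from ADSL, and the function names, record layout, and dates are all hypothetical.

```python
from datetime import date


def study_day(adt, ref_start):
    """Relative day with no day 0: the reference date itself is day 1."""
    delta = (adt - ref_start).days
    return delta + 1 if delta >= 0 else delta


def flag_baseline(records, tr01sdt):
    """Flag the last pre-dose record (ADY<=1 and LBTPT=PREDOSE) as baseline.

    records holds one subject, one pool, and one parameter.
    """
    out = [{**r, "ADY": study_day(r["ADT"], tr01sdt), "ABLFL": None} for r in records]
    candidates = [r for r in out if r["ADY"] <= 1 and r.get("LBTPT") == "PREDOSE"]
    if candidates:
        max(candidates, key=lambda r: r["ADT"])["ABLFL"] = "Y"
    return out


# Hypothetical lab records for one subject and one parameter.
labs = [
    {"STUDYID": "MD-201", "ADT": date(2000, 9, 9), "LBTPT": "PREDOSE"},
    {"STUDYID": "MD-201", "ADT": date(2000, 9, 24), "LBTPT": None},
    {"STUDYID": "MD-301", "ADT": date(2001, 8, 20), "LBTPT": "PREDOSE"},
    {"STUDYID": "MD-301", "ADT": date(2001, 9, 5), "LBTPT": None},
]

# Pool 3 is anchored on the earlier exposure, pool 2 on the later one.
pool3 = flag_baseline(labs, date(2000, 9, 10))
pool2 = flag_baseline([r for r in labs if r["STUDYID"] == "MD-301"], date(2001, 8, 21))
print([r["ABLFL"] for r in pool3], [r["ABLFL"] for r in pool2])
```

The same lab record can be a baseline in one pool and a post-baseline value in another, which is the behavior the separate sets of records are meant to support.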
Table 10 below shows sample ADLB records for the variables described in Table 9. Records supporting
pool 2 only include records from studies MD-301/302 while pool 3 records also include records from
MD-301. Data point traceability is clear from SDTM variables and derivation traceability for timing related
variables would be clear from the inclusion of ADSL variables.
USUBJID POOL STUDYID LBSEQ PARAM AVAL ADT ADY AVISIT ABLFL TRTP
Please note that conceivably, the variable BASETYPE could be used for the same purpose as POOL in
this dataset. For the goal of consistency with ADSL by sharing key variables USUBJID and POOL, the
variable POOL is required. The variable BASETYPE remains available if there are multiple baselines
within a pool.
CONCLUSION
There are more IADSL standard variables defined in the ADaM Integration standards document than the
ones presented in this paper. On the other hand, there are no new standard variables for the IBDS and
IOCCDS structures other than those merged in from the ADSL dataset (IADSL class). When merged in,
the new IADSL class variables have an impact on the usage of variables such as ABLFL, BASE,
OCCzzFL, and others. In order to promote understanding of these changes, and permit compliance rules
around them, new structure classes are necessary.
Existing ADaM classes may be used for integration. As the ISE example shows, when studies are planned
in advance with similar study and dataset designs and the same controlled terminology, integration effort
is greatly reduced.
For the challenge of more complex integrations, namely when analyses vary by pool, the new
IADSL/IBDS/IOCCDS structures build this variability vertically into the analysis datasets. The advantages
of this approach are the reusability of the existing ADSL class variables pertaining to subjects within pool
rather than just subjects, and the ability to subset all the ADaM datasets by pool and quickly understand
the experience of a subject within a pool.
REFERENCES
[1] FDA. “Guidance for Industry Integrated Summaries of Effectiveness and Safety: Location Within the
Common Technical Document.” Published April 2009. Available at
https://www.fda.gov/downloads/drugs/guidances/ucm136174.pdf.
RECOMMENDED READING
• Analysis Data Model Implementation Guide version 1.1
• Analysis Data Model Structure for Occurrence Data (OCCDS) version 1.0
• CDISC Define-XML Specification Version 2.0
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the authors at:
Wayne Zhong
Accretion Softworks
Kimberly Minkalis
The Griesser Group
Deborah Bauer
Sanofi
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.