Causal Analysis For Multivariate Integrated Clinic

This study investigates causal relationships in a large-scale electronic health record dataset of asthma patients, focusing on demographic, clinical, and environmental factors. Using the Integrated Clinical and Environmental Service (ICEES), the authors identify significant predictors of asthma attacks and perform simulated interventions to assess causal effects. The findings highlight the importance of causal inference in medical decision-making and the challenges of accessing clinical data for research.


Sinha et al. BMC Medical Informatics and Decision Making (2025) 25:27
https://doi.org/10.1186/s12911-025-02849-4

RESEARCH Open Access

Causal analysis for multivariate integrated clinical and environmental exposures data
Meghamala Sinha1*, Perry Haaland2, Ashok Krishnamurthy3,4, Bo Lan5, Stephen A. Ramsey1, Patrick L. Schmitt3,
Priya Sharma3, Hao Xu3 and Karamarie Fecho3

Abstract
Electronic health records (EHRs) provide a rich source of observational patient data that can be explored to infer
underlying causal relationships. These causal relationships can be applied to augment medical decision-making
or suggest hypotheses for healthcare research. In this study, we explored a large-scale EHR dataset on patients
with asthma or related conditions (N = 14,937). The dataset included integrated data on features representing
demographic factors, clinical measures, and environmental exposures. The data were accessed via a service named
the Integrated Clinical and Environmental Exposures Service (ICEES). We estimated underlying causal relationships from the data
to identify significant predictors of asthma attacks. We also performed simulated interventions on the inferred causal
network to detect the causal effects, in terms of shifts in probability distribution for asthma attacks.
Keywords Causal inference, Structure learning, Open clinical data, Asthma

*Correspondence:
Meghamala Sinha
[email protected]
1 School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
2 Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
3 Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
4 Department of Computer Science, University of North Carolina, Chapel Hill, North Carolina, USA
5 UNC Highway Safety Research Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA

© The Author(s) 2025. Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Introduction
Causal inference [1] is re-emerging as an important tool in the domain of health sciences for informatics work such as finding effects of a drug or risk factors for a disease. Causality has traditionally been a core concept across all branches of medical science and considered when diagnosing patients based on their symptoms, effects of treatment, and years of historical evidence [2]. Electronic health records (EHRs) present a potential data source to analyze digital patient information like medical history, diagnoses, medications, and laboratory results. Inferring causal relationships from these data is useful for important tasks like prediction and explanation. For prediction, we want to measure the likelihood of occurrence of an event as a result of another event, for example, the occurrence of lung cancer based on exposure to smoke in the environment. However, such predictions are susceptible to fallacies if made only based on associations; for instance, an increase in the sales of matches (e.g., in a blackout-prone area) can also be associated with lung cancer. Most black-box prediction models, unlike causal inference, are not able to identify confounding variables and hence cannot differentiate causal versus spurious associations.
Another aspect of causal inference is the ability to provide an explanation for the relationship between two events. For instance, causal inference helps us to unearth why a patient is sick and diagnose them based on the underlying cause of their symptoms and other aspects of their disease. Access to EHR data is thus critical for the advancement of clinical research and practice. However, due to the many regulations that surround clinical data,
while necessary to ensure patient privacy and protection of sensitive data, access to the data for research is often challenging.
In this research, we analyzed a patient-level dataset extracted from a regulatory-compliant open service called the Integrated Clinical and Environmental Exposures Service (ICEES). ICEES supports several use cases including asthma. The ICEES data are constructed by integrating clinical data elements derived from patient EHRs and environmental exposures data derived from a variety of public sources of environmental exposures data before binning or recoding the data and stripping all protected health information per the Safe Harbor method of the Health Insurance Portability and Accountability Act [3].
The ICEES data are then exposed via an open application programming interface (OpenAPI). For our principal application use case, we asked if there is a causal relationship between asthma attacks and the following features: sex, race, prescriptions for prednisone, diagnoses of obesity, residential proximity to a major roadway or highway, residential density, and exposure to high levels of airborne pollutants. These features were selected because published studies, including our prior work [4, 5], have recognized them to be associated with asthma attacks. We focused on an existing ICEES cohort of patients with asthma or related conditions (see [4] for details), and we considered the number of annual emergency department (ED) or inpatient visits for respiratory issues as the primary outcome measure and indicator of asthma attacks. We used the ICEES OpenAPI to extract features that might be causally related to each other and used the resultant multivariate table for causal inference modeling. Because EHR data are purely observational, we also demonstrate a way to perform simulated external intervention, given a known causal network, to help answer important questions about the effects of clinical interventions. We use subject matter expert knowledge and publication support as our ground truth to measure the correctness of our causal inference modeling. Finally, we discuss our findings, including the benefits and limitations of our causal inference model and approach.

Analysis of the multivariate ICEES table
We queried the ICEES OpenAPI to generate an eight-feature multivariate table. The multivariate table analysed in this work comprised data on 14,937 patients (rows represent individual patients in the asthma cohort) and eight ICEES feature variables per patient, namely, TotalEDInpatientVisits, Sex, Race, Prednisone, Obesity, PM2.5Exposure, RoadwayExposure, and EstResidentialDensity, where TotalEDInpatientVisits is our primary outcome variable (Table 1). In Fig. 1, we plot bar charts to show comparisons of the number of TotalEDInpatientVisits among the discrete categories of each feature. We can see that the count for zero TotalEDInpatientVisits is the largest among all categories. Upon further analysis, we found that the multivariate data table extracted from the OpenAPI largely consisted of patients who were inactive in the year 2010. Hence, to avoid bias and reduce noise in our analysis, we removed patients who were not active in the year of interest, meaning their EHR did not indicate any healthcare usage, by applying the “Active_In_Year” feature as a filter to extract a multivariate table, with Active_In_Year = 1 to select only patients who were active in 2010. We show the bar charts for the number of ED/inpatient visits for each feature in Fig. 1. We can observe that most patients who were active in 2010 visited the ED or an inpatient clinic only once. We can also see that there is an imbalance among the levels in some features like Prednisone, Obesity, Race, RoadwayExposure, and PM2.5Exposure.

Table 1 Feature variables used to generate multivariate table

Feature Variable | Variable Definition and Enumeration
Sex | Male (0), Female (1)
Race | Caucasian, African American, Asian, Native Hawaiian/Pacific Islander, American/Alaskan Native, Other
Prednisone | Common medication for asthma-like conditions (1=Yes, 0=No)
Obesity | Diagnostic code for obesity anytime over ‘study’ period (1=Yes, 0=No)
Airborne Particulate Exposure | Abbreviated herein as “PM2.5Exposure”. US Environmental Protection Agency estimated maximum daily exposure to particulate matter ≤2.5-microns in diameter over ‘study’ period, binned using pandas.cut
Roadway Exposure | Abbreviated herein as “RoadwayExposure”. US Department of Transportation distance in meters from household to nearest roadway (1 = 0–49, 2 = 50–99, 3 = 100–149, 4 = 150–199, 5 = 200–249, 6 = ≥ 250 meters)
Residential Density | Abbreviated herein as “EstResidentialDensity”. US Census Bureau American Community Survey 2007–2011 estimated total population [block group], binned according to US Census Bureau definitions
Emergency Department Visits | Abbreviated herein as “TotalEDInpatientVisits”. Total number ED or inpatient visits for respiratory issue(s) over the ‘study’ period (0, 1, 2, 3, ...)
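As an illustration of the kind of binning listed in Table 1, the RoadwayExposure coding can be reproduced with pandas.cut and explicit bin edges. The distances below are hypothetical, and it is an assumption that the service applies a call of this shape to RoadwayExposure; Table 1 names pandas.cut only for PM2.5Exposure.

```python
import pandas as pd

# Hypothetical distances in meters from household to nearest roadway.
distance_m = pd.Series([0, 49.9, 50, 120, 249, 250, 1300])

# Left-closed intervals [0, 50), [50, 100), ..., [250, inf) approximate the
# Table 1 coding 1 = 0-49, 2 = 50-99, ..., 6 = >= 250 meters.
edges = [0, 50, 100, 150, 200, 250, float("inf")]
roadway_exposure = pd.cut(distance_m, bins=edges, right=False, labels=[1, 2, 3, 4, 5, 6])

# Convert the categorical result to plain integer codes.
codes = roadway_exposure.astype(int).tolist()
print(codes)
```

Passing `right=False` makes each bin include its lower edge, so a household exactly 50 meters from a roadway falls in level 2 rather than level 1.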

Fig. 1 Stacked bar chart representing the number of TotalEDInpatientVisits across each level of the feature variables. See Table 1 for feature variable
definitions

Feature importance
We evaluated the importance of each feature and its contribution towards the model performance using a tree-based machine learning model: random forest. We leveraged the caret R package [6] to evaluate the feature importance. We controlled the parameters for training by using the repeatedcv method, with ten-fold cross-validation repeated three times. We found Prednisone, Race, and ObesityDx to be the highest contributing factors, as shown in Fig. 2.

Fig. 2 Relative feature importance for all features with respect to TotalEDInpatientVisits. See Table 1 for feature variable definitions

Modeling causal networks
Most of the naturally occurring trends that we come across are simply passive observations of events occurring in the world that are either coincidental or unexplained associations. For example, statements like
“Drinking beer every day increases the chance of prostate cancer” are common in the news and scientific reporting and in our day-to-day personal beliefs. These associations can be easily mistaken for causation, making us susceptible to logical fallacies without knowing the real underlying cause. Causal inference is the science of learning cause from effect [1]. It is an important field of research because it helps us rule out spurious correlations [7, 8]. The primary aim of inferring causal relations from data is to discover interactions between different entities in the form $V_i \to V_j$, where $V_i$ and $V_j$ are observable features in the domain and the arrow indicates that the state of $V_i$ influences the state of $V_j$. Causal relations can be discovered either through observational measurements (seeing) or from measurements after performing some external manipulation/intervention (doing). A causal network [1, 9] can be represented with a directed acyclic graph (DAG) $G = (V, E)$, where $V = \{V_1, \ldots, V_n\}$ denotes the set of features and $E \subseteq V \times V$ denotes the set of edges that are causal in nature. For a causal edge $(V_i, V_j)$, we say that $V_i$ is a cause (parent) of $V_j$, and $V_j$ is the resulting effect (child) of $V_i$. Let $\mathrm{pa}(V_i)$ denote the set of parents of $V_i$. The conditional probability distribution $P_i$ defines the probability of $V_i$ given the state of its parents $\mathrm{pa}(V_i)$. A causal network represents a joint distribution $P$ over variables $V$ as long as it satisfies two main assumptions:

(a) Causal Markov assumption: Any given variable $V_i$ is independent of its non-descendants, conditioned on all of its direct causes (parents). This implies that the joint distribution $P(V)$ can be factored as: $P(V) = \prod_{i=1}^{n} P_i(V_i \mid \mathrm{pa}(V_i))$.
(b) Faithfulness assumption: The joint distribution $P(V_1, \ldots, V_n)$ is faithful to $G$ if every conditional independence relation in the probability distribution $P$ is entailed by the Markov assumption applied to $G$ [10].

To reconstruct a causal graph from data, we generally start by finding an approximation of the graph, given $V$, and then optimize based on conditions on data. The two main approaches used for causal network inference are:

(1) Score-based: This is based on a Bayesian scoring function $S(G \mid D)$, which estimates the goodness-of-fit of graph $G$ to the data $D$ [11] and is used as an objective function to maximize, while favoring simpler structures. The score function is usually combined with a search heuristic that explores the space of all possible graphs. Score-based methods are robust and can be extended to include interventional studies (if available), but they do not scale well as network or data size increases.
(2) Constraint-based: This method is based on estimating some of the conditional (in)dependencies in the distribution $P$ from the data $D$ by performing hypothesis tests of conditional independence. Constraint-based methods usually start with a fully connected, undirected graph and progressively remove edges whenever a new conditional independence relation is discovered, while satisfying the corresponding d-separation statements. In this work, we will use a constraint-based approach called the PC algorithm, given that the dataset is observational. To infer the causal graph from data, we learn the equivalence class of a directed acyclic graph (DAG) from data with the traditional constraint-based PC algorithm proposed by [9]. Given a dataset $D$ having $n$ features $V_1, \ldots, V_n$, we conduct the following steps. We start with a complete undirected graph given $n$ features. We then eliminate edges between variables that are unconditionally independent. For each pair of variables $(V_i, V_j)$ with an edge between them, and for each variable $V_k$ with an edge connected to either of them, we eliminate the edge between $V_i$ and $V_j$ if $V_i \perp\!\!\!\perp V_j \mid V_k$. For each pair of variables $V_i, V_j$ having an edge between them, and for each pair of variables $V_k, V_l$ with edges both connected to $V_i$ or both connected to $V_j$, we eliminate the edge between $V_i$ and $V_j$ if $V_i \perp\!\!\!\perp V_j \mid V_k, V_l$. We continue to check independencies conditional on subsets of variables of increasing size $m$ until there are no more adjacent pairs $(V_i, V_j)$ such that there is a subset of variables of size $m$ in which all of the variables in the subset are adjacent to $V_i$ or adjacent to $V_j$. For each triple of variables $(V_i, V_j, V_k)$ such that $V_i$ and $V_j$ are adjacent, $V_j$ and $V_k$ are adjacent, and $V_i$ and $V_k$ are not adjacent, we orient the edges $V_i - V_j - V_k$ as $V_i \to V_j \leftarrow V_k$ if $V_j$ is not in the set conditioning on which $V_i$ and $V_k$ became independent and the edge between them was accordingly eliminated. We call such a triple of variables a v-structure. For each triple of variables such that $V_i \to V_j - V_k$, and $V_i$ and $V_k$ are not adjacent, we orient the edge $V_j - V_k$ as $V_j \to V_k$. This is called orientation propagation.

Results
Inferring causal graphs
We first applied the PC algorithm to the ICEES multivariate feature table. We show the inferred causal graphs, first using the entire table with all eight features (Fig. 3a) and second using only the top four important features with respect to TotalEDInpatientVisits (Fig. 3b), as determined in the Feature importance section. Expected relationships between features based on subject matter expertise
and published literature are represented in black (solid and dashed) lines. There are eight such expected edges, which we use to measure the structure-learning accuracy of the causal algorithm. Solid black lines represent expected edges (true positives) that are reported via the PC algorithm, while dashed lines are edges which were expected but missed (false negatives). Newly found relationships inferred by the PC algorithm that are not expected are represented in red (false positives). We note that there were a few undirected edges detected, for which the algorithm was not able to determine directionality.
Three of eight expected edges were inferred. For the determined false positive edges, we conducted a further literature survey to find multiple (5+) citations where a relationship between the features was reported. We marked them as reported edges, along with the true positives. Two out of the three additional edges detected were found in the literature; hence, we marked them as reported. The expected directed edge from Race → TotalEDInpatientVisits was also missed.
As discussed in the Analysis of the multivariate ICEES table section, we queried the OpenAPI a second time to generate a multivariate table containing only the top four important features (Prednisone, Race, Obesity and RoadwayExposure) with respect to TotalEDInpatientVisits, as identified by random forest (Fig. 3b). We found significant improvement in accuracy. Three out of five expected edges were detected. An undirected edge between Race and ObesityDx was also detected; these features are reported in the literature as highly associated.

Fig. 3 Inferred causal graph. Solid black lines represent true positives, dashed lines represent false negatives and red lines represent false positives

Effects of intervention
Having learned a causal network from the data, we now use it to answer relevant questions by making inferences. To evaluate this, we computed the effects of interventions on features by modifying the network to simulate interventions. Firstly, because some of the edges detected in Fig. 3a were undirected, we removed them. We then learned the parameters of our learned causal DAG given the network structure and the data. Next, we constructed a mutilated network to simulate a perfect intervention by setting a target node to a particular value. Finally, we tested the effects of these interventions, while verifying the correctness of the learned causal network, to substantiate some commonly known causal links like the following expected claims:

– Claim (a). Obesity should have a direct effect on TotalEDInpatientVisits. Hence, conducting an intervention on the node “Obesity” should reflect a change (increase or decrease, accordingly) in the probability distribution of TotalEDInpatientVisits.
– Claim (b). Prednisone should have a direct effect on TotalEDInpatientVisits. Hence, conducting an intervention on the node “Prednisone” should reflect a change (increase or decrease, accordingly) in the probability distribution of TotalEDInpatientVisits.
– Claim (c). Sex2 should not have a direct effect on TotalEDInpatientVisits. Hence, conducting an intervention on the node “Sex2” should not reflect a change (increase or decrease, accordingly) in the probability distribution of TotalEDInpatientVisits.

We conducted these three interventions on our learned causal network. To test Claim (a), we created a mutilated network by fixing the state of ObesityDx to 1, which means we are forcing ObesityDx to be present. For Claim (b), we fixed the state of Prednisone to be 1, again meaning that we are forcing prednisone to be present. For Claim (c), we fixed the state of Sex2 to be Male. Next, we compared the changes in the probability distribution of TotalEDInpatientVisits before and after these three ad hoc interventions to confirm the expected causal influences. We plotted the changes in the probability distribution of TotalEDInpatientVisits in Fig. 4. As expected, there were changes in the probability distribution of TotalEDInpatientVisits for interventions a and b, reflected in Fig. 4a and b, respectively. For intervention c, the changes before and after intervention were negligible, meaning that Sex2 had no causal effect on the frequency of TotalEDInpatientVisits.

Discussion
We demonstrated the ability to use the ICEES OpenAPI to answer important questions about causal relationships between factors affecting asthma attacks. We focused on a large cohort of patients with asthma or related conditions and a dataset that included data derived from EHRs and a variety of public sources of environmental exposures data. We applied the PC algorithm, a constraint-based causal learning algorithm, to the dataset and identified prednisone, race, and obesity as significant predictors of annual ED or inpatient visits for respiratory issues, followed by residential distance from a major roadway/highway, airborne particulate exposure, and sex. Of those, prednisone and obesity were found to be causally related to annual ED or inpatient visits in our causal inference model, and sex and race were found to be indirectly related to annual ED or inpatient visits via a causal relationship to obesity. On a smaller dataset, comprising only the four most important features, as determined by random forest analysis, we identified residential distance from a major roadway/highway as an additional variable that is causally related to annual ED or inpatient visits for respiratory issues.
We validated our findings based on expert knowledge and prior published literature. Most of our results are consistent with previously published literature [12]. For instance, prednisone, which is commonly prescribed for patients who are non-responsive to first-line treatments such as inhaled albuterol [13], has been identified as a factor associated with asthma exacerbations and ED or inpatient visits for respiratory issues [14]. Female sex, obesity, and African American race have previously been identified as factors that contribute to asthma attacks [15]. In another work by our group [5] and others [16], obesity and sex have been found to be highly related to asthma attacks. Several other works [3, 17] have additionally found a significant association between African American race and increased risk of asthma attacks. Exposure to major roadways or highways has also been found to be a risk factor for asthma. Several studies [18, 19] have demonstrated an increase in asthma attacks among patients residing in close proximity to a major roadway or highway. Our findings on the relationship between roadway exposures and asthma exacerbations have been inconsistent, with evidence to support [14] and negate [12] a relationship.
One factor that we expected to find in our model as causally related to asthma attacks, but did not, is exposure to airborne particulate matter. Exposure to airborne particulate matter is a well-established trigger for asthma attacks [4, 12, 14, 15, 20]. The failure to detect a causal relationship between exposure to airborne particulate matter and asthma attacks likely reflects the imbalance in the distribution of patients across bins. Indeed, we are actively refining both our exposure models and our binning strategy. For instance, instead of using a Python algorithm to bin the airborne pollutant exposures, we are considering a binning strategy based on subject matter expertise.

Fig. 4 Effect of intervention on a Obesity, b Prednisone and c Sex: change in the probability distribution of TotalEDInpatientVisits before (red)
and after (blue) intervention

Acknowledgements
The authors wish to acknowledge Stanley C. Ahalt, Director of the Renaissance Computing Institute, for support and advice on the work described herein; David B. Peden for his expertise on the asthma use case; Emily R. Pfaff and James Champion for their help with the patient data; and Sarav Arunachalam, Stephen A. Appold, Alejandro Valencia Arias, and Lisa Stillwell for their help with the environmental exposures data. The authors also thank Ms. Marie Rape of the Regulatory Service at the UNC Chapel Hill NC Translational and Clinical Sciences Institute for regulatory guidance (CTSA - UM1TR004406).

Authors’ contributions
MS did the major writing and the experimentation behind this research. BL, PLS, PS and HX helped in the data generation and data analysis behind this research. PH and KF were our advisors and guided us throughout the research with their valuable feedback and mentoring. AK and SR helped with the final review. KF did the major revision and the point-by-point response to the reviewers.

Funding
This project was funded with awards from the National Center for Advancing Translational Sciences, National Institutes of Health [OT3TR002020, OT2TR003430, UL1TR002489, UL1TR002489-03S4, OT2TR003428].

Data availability
No datasets were generated or analysed during the current study.

Declarations

Ethics approval and consent to participate
A waiver of informed consent for research [45 CFR 46.116(d)] and a waiver of HIPAA authorization [45 CFR 164.512(i)(2)(ii)] were granted by the Institutional Review Board at the University of North Carolina at Chapel Hill (protocol 16-2978).

Consent for publication
Not applicable.

Competing interests
The authors declare no competing interests.

Received: 27 November 2023 Accepted: 1 January 2025

References
1. Pearl J. Causality: models, reasoning, and inference. Econ Theory. 2003;19(675–685):46.
2. Rizzi DA. Causal reasoning and the diagnostic process. Theor Med. 1994;15(3):315–33.
3. Xu H, Cox S, Stillwell L, Pfaff E, Champion J, Ahalt SC, et al. FHIR PIT: an open software application for spatiotemporal integration of clinical data and environmental exposures data. BMC Med Inform Decis Making. 2020;20(1):1–8.
4. Fecho K, Pfaff E, Xu H, Champion J, Cox S, Stillwell L, et al. A novel approach for exposing and sharing clinical data: the Translator Integrated Clinical and Environmental Exposures Service. J Am Med Inform Assoc. 2019;26(10):1064–73.
5. Fecho K, Ahalt SC, Arunachalam S, Champion J, Chute CG, Davis S, et al. Sex, obesity, diabetes, and exposure to particulate matter among patients with severe asthma: Scientific insights from a comparative analysis of open clinical data sources during a five-day hackathon. J Biomed Inform. 2019;100:103325.
6. Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008;28(1):1–26.
7. Sinha M, Tadepalli P, Ramsey SA. Pooling vs Voting: An Empirical Study of Learning Causal Structures. 2019.
8. Sinha M, Tadepalli P, Ramsey SA. Voting-based integration algorithm improves causal network learning from interventional and observational data: an application to cell signaling network inference. PLoS ONE. 2021;16(2):e0245776.
9. Spirtes P, Glymour C, Scheines R. Causation, prediction, and search. Adaptive computation and machine learning. Cambridge: MIT Press; 2000.
10. Druzdzel MJ. The role of assumptions in causal discovery. 2009.
11. Pearl J. Graphical models for probabilistic and causal reasoning. Quantified representation of uncertainty and imprecision. 1998:367–89.
12. Fecho K, Haaland P, Krishnamurthy A, Lan B, Ramsey SA, Schmitt PL, et al. An approach for open multivariate analysis of integrated clinical and environmental exposures data. Inform Med Unlocked. 2021;26:100733.
13. Alangari AA. Corticosteroids in the treatment of acute asthma. Ann Thorac Med. 2014;9(4):187.
14. Fecho K, Ahalt SC, Appold S, Arunachalam S, Pfaff E, Stillwell L, et al. Development and Application of an Open Tool for Sharing and Analyzing Integrated Clinical and Environmental Exposures Data: Asthma Use Case. JMIR Formative Res. 2022;6(4):e32357.
15. Lan B, Haaland P, Krishnamurthy A, Peden DB, Schmitt PL, Sharma P, et al. Open Application of Statistical and Machine Learning Models to Explore the Impact of Environmental Exposures on Health and Disease: An Asthma Use Case. Int J Environ Res Public Health. 2021;18(21):11398.
16. Greenblatt RE, Zhao EJ, Henrickson SE, Apter AJ, Hubbard RA, Himes BE. Factors associated with exacerbations among adults with asthma according to electronic health record data. Asthma Res Pract. 2019;5(1):1–11.
17. Keet CA, McCormack MC, Pollack CE, Peng RD, McGowan E, Matsui EC. Neighborhood poverty, urban residence, race/ethnicity, and asthma: rethinking the inner-city asthma epidemic. J Allergy Clin Immunol. 2015;135(3):655–62.
18. Perez L, Lurmann F, Wilson J, Pastor M, Brandt SJ, Künzli N, et al. Near-roadway pollution and childhood asthma: implications for developing “win-win” compact urban development and clean vehicle strategies. Environ Health Perspect. 2012;120(11):1619–26.
19. Schurman SH, Bravo MA, Innes CL, Jackson WB, McGrath JA, Miranda ML, et al. Toll-like receptor 4 pathway polymorphisms interact with pollution to influence asthma diagnosis and severity. Sci Rep. 2018;8(1):1–11.
20. Mirabelli MC, Vaidyanathan A, Flanders WD, Qin X, Garbe P. Outdoor PM2.5, ambient air temperature, and asthma symptoms in the past 14 days among adults with active asthma. Environ Health Perspect. 2016;124(12):1882–90.

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
