Data Harmonization for a Molecularly
Driven Health System
Warren A. Kibbe, Ph.D.
Professor, Biostats & Bioinformatics
Chief Data Officer, Duke Cancer Institute
warren.kibbe@duke.edu
@wakibbe
#DataSharing
#LearningHealthSystem
#DataHarmonization
Sections
• Learning Health Systems
• Data Commons
• Data Harmonization
The World is Changing
• Pace of Commercialization
• Reach of Markets
• Role of Data
• Change in Healthcare
• Change in Computing
• Societal Changes
Is the US able to keep up?
R&D By Country
US R&D Funding as share of GDP
R&D spending / STEM
How do we continue to innovate?
Data Science
Twitter impacts science
Eric Topol
Changes in Computing
• Converged devices
• Converged IT
• Ubiquity of devices, data, mHealth
Timeline, 1999 to 2017 (axis marks: 2002, 2007, 2012, 2017; ages as annotated on the slide):
• 9/16/1999 (~3 yrs old): 802.11b WiFi
• 10/23/2001 (~5 yrs old): iPod (10 GB max)
• 4/23/2005 (~8 yrs old)
• 7/15/2006
• 9/26/2006 (~9 yrs old)
• 1/9/2007 (~10 yrs old): iPhone (EDGE, 16 GB max)
• 2/7/2007
• 7/11/2008 (~11 yrs old): iPhone 3G (16 GB max)
• 4/3/2010 (~13 yrs old): iPad (EDGE, 64 GB max)
• 4/24/2012 (~15 yrs old): Google Drive
• 9/12/2012 (~15 yrs old): iPhone 5 (LTE, 128 GB max)
• 7/14/2014 (~17 yrs old): Google Baseline
• 3/9/2015 (~18 yrs old): Apple ResearchKit
• 4/5/2016 (~19 yrs old): HTC VR Headset
• NextGen
Courtesy of Jerry Lee, NCI
Changes in Technology
Pace of Technology Adoption
Changes in Commercialization
Changes in Oncology
• Cancer is a grand challenge
• Anatomic vs molecular classification
• Health vs Disease
Understanding Cancer
• Precision medicine will lead to a fundamental
understanding of the complex interplay between
genetics, epigenetics, nutrition, environment, and
clinical presentation, and will direct effective,
evidence-based prevention and treatment.
Ramifications across many aspects of health care
IOM (now NAM) Report
NAM Workshops, 2006-11
“Science, informatics, incentives, and
culture are aligned for continuous
improvement and innovation, with best
practices seamlessly embedded in the
delivery process and new knowledge
captured as an integral by-product of the
delivery experience.”
—Institute of Medicine
LEARNING HEALTH SYSTEMS
Another imperative is that such systems
do their work:
• Transparently (how does one learn
without well documented processes?)
• Reproducibly (good practices must
always be repeatable at scale and
scientifically reproducible)
• Only with the above can the science in
“data science” be done with sufficient
rigor
LEARNING HEALTH SYSTEMS
ASSEMBLE → ANALYZE → INTERPRET → FEEDBACK → CHANGE
LEARNING HEALTH SYSTEMS
Learning Health Systems in NEJM
Goals
• Contain rising cost of healthcare
• Maximize the value of care
• Increase public discourse and
marketplace for healthcare
Drivers
• Decision Making is too complex
• Clinical decisions are based on
practice, not evidence
• Inefficiency and waste in healthcare
Human cognitive capacity is constant
Lack of Evidence
EHRs and the Learning Health System
LHS definition
Problems for LHS to solve
Inefficient Healthcare
Poor Health in spite of high expenditures
Curve hasn’t improved
2015
View from 2006
EHRs are now ubiquitous
But evidence-driven decision support
remains a future vision
Hope
Cloud computing, data commons, and
service-based computing provide
powerful tools for solving data access,
data analysis, data analytics, and data
visualization problems securely and at
scale.
Sebastian Thrun
So what is a Data Commons?
Commons Topology
Compute Platform: Cloud or HPC
Services: APIs, Containers, Indexing
Software: Services & Tools, scientific analysis tools/workflows
Data: “Reference” Data Sets, User defined data
Digital Object Compliance
App store/User Interface
Service layers: IaaS, PaaS, SaaS
https://datascience.nih.gov/commons
Commons Compliance
• Treat products of research (data,
methods, papers, etc.) as digital
objects
• These digital objects exist in a
shared virtual space
• Digital object compliance through
FAIR principles:
– Findable
– Accessible (and usable)
– Interoperable
– Reusable
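Digital object compliance can be read as a checklist applied to each object's metadata. The sketch below is illustrative only: the field names (`identifier`, `access_url`, `data_format`, `license`) and the one-field-per-principle rule are assumptions made for this example and are not part of the Commons or the FAIR principles themselves.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DigitalObject:
    """Hypothetical metadata record for a data set, method, or paper."""
    identifier: Optional[str] = None    # persistent ID, e.g. a DOI (Findable)
    access_url: Optional[str] = None    # where the object can be retrieved (Accessible)
    data_format: Optional[str] = None   # standard format name (Interoperable)
    license: Optional[str] = None       # reuse terms (Reusable)


def fair_report(obj: DigitalObject) -> Dict[str, bool]:
    """Return a pass/fail flag per FAIR principle (deliberately simplistic)."""
    return {
        "findable": bool(obj.identifier),
        "accessible": bool(obj.access_url),
        "interoperable": bool(obj.data_format),
        "reusable": bool(obj.license),
    }


def fair_score(obj: DigitalObject) -> float:
    """Fraction of the four principles satisfied, between 0.0 and 1.0."""
    report = fair_report(obj)
    return sum(report.values()) / len(report)


# Example object with no license recorded, so "reusable" fails.
example = DigitalObject(
    identifier="doi:10.0000/example",
    access_url="https://example.org/data.csv",
    data_format="text/csv",
)
print(fair_report(example))
print(fair_score(example))  # 0.75
```

A real compliance check would inspect far more than the presence of a field, but a per-principle report of this kind is one simple way to make compliance measurable.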
Data Sharing and the FAIR Principles
FAIR –
Making data
Findable,
Accessible,
Attributable,
Interoperable,
Reusable,
and providing Recognition
Force11 white paper
https://www.force11.org/group/fairgroup/fairprinciples
“The Commons is an effort at
creating a sharing economy and
for building community. We hope
for a more cost effective and
productive research
environment while bringing
people together in a unique
way.”
Phil Bourne
Blue Ribbon Panel Report
Cancer Moonshot℠ Blue Ribbon Panel
“The Cancer Moonshot Task Force was
directed to consult with external experts
from relevant scientific sectors, including
the presidentially appointed National
Cancer Advisory Board (NCAB).
A Blue Ribbon Panel of scientific experts
was created to advise the NCAB.”
Vision:
Enable the creation of a Learning Healthcare System for
Cancer, where as a nation we learn from the contributed
knowledge and experience of every cancer patient. As
part of the Cancer Moonshot, we want to unleash the
power of data to enhance, improve, and inform the journey
of every cancer patient from the point of diagnosis
through survivorship.
A National Cancer Data Ecosystem
Cancer Research
Data Commons
SBG CGC
Broad FireCloud ISB CGC
Courtesy NCI-CBIIT
Data Commons Framework – What Is It?
Modular Components
Secure user authentication and authorization
Metadata validation and tools
Domain-specific, extensible data models and dictionaries
API and container environment for tools and pipelines
Access to computational workspaces for storing data, tools, and
results
Reusable, expandable
framework for a Data
Commons
Core principles and
structures for a Data
Commons
Set of modular
components that can be
leveraged across Data
Commons
Narrow Middle Architecture (End-to-End Design)
1. AuthN / AuthZ
2. Metadata validation
3. Extensible data model
4. APIs for containers, workflows & tools
5. Workspaces
data in, science out
Courtesy Bob
Grossman, U. Chicago
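Components 2 and 3 of the narrow middle (metadata validation against an extensible data model) can be sketched as below. This is a minimal illustration under stated assumptions: the dictionary layout, field names, and permissible values are hypothetical and are not taken from any NCI or Gen3 data dictionary.

```python
from typing import Any, Dict, List

# Hypothetical base dictionary: field name -> validation rule.
BASE_DICTIONARY: Dict[str, Dict[str, Any]] = {
    "case_id": {"type": str, "required": True},
    "primary_site": {"type": str, "required": True},
    "age_at_diagnosis": {"type": int, "required": False},
}


def extend_dictionary(base: Dict[str, Dict[str, Any]],
                      extra: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Return a new dictionary with domain-specific fields added; base is not modified."""
    merged = dict(base)
    merged.update(extra)
    return merged


def validate(record: Dict[str, Any],
             dictionary: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of validation errors; an empty list means the record is valid."""
    errors: List[str] = []
    for name, rule in dictionary.items():
        if name not in record:
            if rule.get("required", False):
                errors.append(f"missing required field: {name}")
            continue
        value = record[name]
        if not isinstance(value, rule["type"]):
            errors.append(f"{name}: expected {rule['type'].__name__}")
        elif "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name}: value {value!r} not permitted")
    for name in record:
        if name not in dictionary:
            errors.append(f"unknown field: {name}")
    return errors


# A domain-specific extension (invented) that adds an imaging field.
IMAGING_DICTIONARY = extend_dictionary(
    BASE_DICTIONARY,
    {"modality": {"type": str, "required": True, "enum": {"CT", "MR", "PT"}}},
)

good = {"case_id": "C-001", "primary_site": "Lung", "modality": "CT"}
bad = {"case_id": "C-002", "modality": "XR", "scanner": "unknown"}
print(validate(good, IMAGING_DICTIONARY))
print(validate(bad, IMAGING_DICTIONARY))
```

Keeping the validator generic and putting all domain knowledge in the dictionary is one way to let several commons share the same component while extending the model independently.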
NCI Cancer Research Data Commons (CRDC) - Concept
NCI Scope: “Create a data
science infrastructure necessary
to connect repositories, analytical
tools, and knowledge bases”
Data commons co-locate data,
storage and computing
infrastructure with commonly
used services, tools & apps for
analyzing and sharing data to
create an interoperable resource
for the research community.*
*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson, and Walt Wells, A
Case for Data Commons: Toward Data Science as a Service, Computing in Science
& Engineering, 2016. Source of image: The CDIS, GDC, & OCC data commons
infrastructure at the University of Chicago Kenwood Data Center.
Data Commons Framework
Data types: Clinical, Proteomics, Imaging, Genomics, Immuno-oncology, Animal Models, Cancer Biomarkers
NCI Cancer Research Data Commons
SBG CGC, Broad FireCloud, ISB CGC
Clinical Proteomics Tumor Analysis Consortium*
The Cancer Imaging Archive (TCIA)*
Framework components: Authentication & Authorization, Data Models & Dictionaries, Metadata Validation & Tools, APIs, Data Submission, Web Interface, Query, Visualization, Analysis, Elastic Compute, Tool Deployment, Tool Repositories, Computational Workspaces
Data Contributors and Consumers
Courtesy NCI-CBIIT
Gen3 Data Commons
NCI Genomic Data Commons
Data Harmonization
• The process of semantic and
syntactic mapping of data to a set of
definitions, predefined data
elements, or a data model.
• Validation and harmonization of
primary and secondary data are crucial
to enable analysis and reuse
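To make the distinction concrete, the sketch below separates syntactic mapping (field names, date formats, units) from semantic mapping (local codes to a shared value set). All source field names, local codes, and target element names are invented for illustration; they do not come from any published set of common data elements.

```python
from datetime import datetime
from typing import Any, Dict

# Semantic mapping: local codes -> shared permissible values (hypothetical).
SEX_VALUE_MAP = {"M": "Male", "F": "Female", "1": "Male", "2": "Female"}

LB_TO_KG = 0.45359237  # exact definition of the international pound in kilograms


def harmonize(source: Dict[str, Any]) -> Dict[str, Any]:
    """Map one site-specific record onto hypothetical common data elements."""
    target: Dict[str, Any] = {}

    # Syntactic: rename the field and normalize the date to ISO 8601.
    target["birth_date"] = datetime.strptime(source["DOB"], "%m/%d/%Y").date().isoformat()

    # Syntactic: convert units (pounds to kilograms, one decimal place).
    target["weight_kg"] = round(float(source["wt_lb"]) * LB_TO_KG, 1)

    # Semantic: translate local codes to the shared value set; reject unmapped codes
    # so that validation failures surface instead of passing through silently.
    code = str(source["sex"]).strip().upper()
    if code not in SEX_VALUE_MAP:
        raise ValueError(f"unmapped sex code: {source['sex']!r}")
    target["sex"] = SEX_VALUE_MAP[code]
    return target


record = {"DOB": "03/09/1980", "wt_lb": "154", "sex": "f"}
print(harmonize(record))
# {'birth_date': '1980-03-09', 'weight_kg': 69.9, 'sex': 'Female'}
```

Raising on an unmapped code, rather than copying it through, is a design choice that ties harmonization to validation: data that cannot be mapped is flagged before it reaches analysis.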
Spanning the Semantic Chasm of Despair
Building a Translational Bridge
CD2H
Thanks to Melissa Haendel
Project Highlight: Harmonizing clinical data models
Sentinel
i2b2/ACT
OMOP
PCORnet
▪ Different countries use different “outlets”.
▪ There is a need for travel adapters.
The Solution:
▪ Use a converter between various adapters.
▪ Allow researchers to ask a question once and
receive results from many different sources
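The "ask once, receive results from many sources" idea can be sketched as a set of per-source adapters behind a single query function. The two record layouts below are invented toy structures and do not reflect the actual Sentinel, i2b2/ACT, OMOP, or PCORnet schemas; the site names and identifiers are likewise hypothetical.

```python
from typing import Callable, Dict, List

# Two toy "sites" that store the same kind of fact in different shapes.
SITE_A = [{"person": 1, "dx_code": "C50.9"}, {"person": 2, "dx_code": "E11.9"}]
SITE_B = [{"PATID": "p-7", "DX": "C509"}, {"PATID": "p-8", "DX": "C509"}]


def adapter_a(icd10: str) -> List[str]:
    """Site A stores dotted ICD-10 codes, so the query term is used as given."""
    return [f"A:{row['person']}" for row in SITE_A if row["dx_code"] == icd10]


def adapter_b(icd10: str) -> List[str]:
    """Site B stores undotted codes, so the adapter rewrites the query term."""
    undotted = icd10.replace(".", "")
    return [f"B:{row['PATID']}" for row in SITE_B if row["DX"] == undotted]


# Registry of adapters; adding a source means adding one entry here.
ADAPTERS: Dict[str, Callable[[str], List[str]]] = {
    "site_a": adapter_a,
    "site_b": adapter_b,
}


def ask_once(icd10: str) -> Dict[str, List[str]]:
    """Send one question to every adapter and collect the matches per source."""
    return {name: adapter(icd10) for name, adapter in ADAPTERS.items()}


print(ask_once("C50.9"))
# {'site_a': ['A:1'], 'site_b': ['B:p-7', 'B:p-8']}
```

The researcher writes the question once in a shared vocabulary; each adapter owns the translation into its source's local conventions, which is the role the travel-adapter analogy assigns to the converter.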
Project Highlight: LOINC2HPO
◆ Develop a software tool to map
LOINC codes to HPO terms
◆ Develop software to convert
EHR observations into HPO
terms for use in clinical
research
Steps
Develop a tool for converting LOINC laboratory codes and values into more
phenotypically meaningful language (Human Phenotype Ontology) to allow for
translational interoperability and new analytics
LOINC
2657-5    “Nitrite [Mass/volume] in Urine”                        Numeric
20407-3   “Nitrite [Mass/volume] in Urine by Test strip”          Numeric
32710-6   “Nitrite [Presence] in Urine”                           Positive/Negative
5802-4    “Nitrite [Presence] in Urine by Test strip”             Positive/Negative
50558-6   “Nitrite [Presence] in Urine by Automated test strip”   Positive/Negative
Outcome
HPO: Nitrituria
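The mapping from the LOINC codes listed above to a single HPO term can be sketched as a small lookup. This is a simplified illustration, not the LOINC2HPO tool's API: the function name is hypothetical, the numeric cut-off is a placeholder rather than a clinical reference range, and a non-abnormal result simply returns `None` here.

```python
from typing import Optional, Union

# LOINC codes from the slide, grouped by how the result is reported.
NUMERIC_NITRITE = {"2657-5", "20407-3"}
PRESENCE_NITRITE = {"32710-6", "5802-4", "50558-6"}

# Placeholder cut-off for numeric results; NOT a clinical reference range.
ASSUMED_UPPER_LIMIT = 0.0


def loinc_to_hpo(loinc_code: str, value: Union[float, str]) -> Optional[str]:
    """Return the HPO label implied by a urine nitrite observation, or None."""
    if loinc_code in PRESENCE_NITRITE:
        is_positive = str(value).strip().lower() == "positive"
        return "Nitrituria" if is_positive else None
    if loinc_code in NUMERIC_NITRITE:
        return "Nitrituria" if float(value) > ASSUMED_UPPER_LIMIT else None
    raise KeyError(f"no mapping for LOINC code {loinc_code}")


print(loinc_to_hpo("5802-4", "Positive"))   # Nitrituria
print(loinc_to_hpo("32710-6", "Negative"))  # None
print(loinc_to_hpo("2657-5", 0.3))          # Nitrituria
```

Five different laboratory codes, reported on two different scales, collapse to one phenotype term, which is what allows EHR observations from different test methods to be analyzed together.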
INSERT CDE Browser Screenshot?
CIBMTR Center for Cancer
Research
Over 35 NCI Programs, Plus
Cancer Centers and Consortia
GDC
Data Sharing Index
• We need metrics for data, software,
algorithm use, usability, conformance
• Data sharing stimulates science,
innovation, commercialization
• Providing recognition and attribution
to data providers and software &
algorithm builders is critical for a
robust data sharing ecosystem
• Support and measure FAIRness!
Questions?
Warren Kibbe, Ph.D.
warren.kibbe@duke.edu
@wakibbe
