Integration and analysis of heterogeneous big data for precision
medicine and suggested treatments for different types of patients.
SC1-PM-18-2016: Big Data supporting Public Health policies
Big Data Analytics
Maria-Esther Vidal
Leibniz Information Centre For
Science and Technology
University Library (TIB), Germany
06.11.2017
BigDataEurope Workshop on Big Data in Climate Action,
Environment, Resource Efficiency and Raw Materials
http://project-iasis.eu
@Project_IASIS
Big Data Analytics and Drug Side Effects
Big Data: Electronic Health Records (EHRs) of nearly 3 million people and trillions of pieces of medical data
Heart Attack: A Big Data analytics study found an association between the use of proton-pump inhibitors and the likelihood of a heart attack
[Shah, SH. Clopidogrel Dosing and CYP2C19. Medscape. July 1, 2011. Medscape web site.]
Big Data Analytics and Disease Predisposition
Big Data: Researchers analyzed more than 7,700 brain images from 1,171 people in various stages of Alzheimer's progression, using a variety of techniques including magnetic resonance imaging (MRI) and positron emission tomography (PET).
Alzheimer's Disease Evolution: A Big Data analytics study found that changes in blood flow are the earliest known warning sign of Alzheimer's.
https://www.sciencedaily.com/releases/2016/07/160712130229.htm
Medical Image Processing from a Big Data Point of View
Big Data: Medical Images
• Imaging techniques produce large volumes of data
• The volume of medical images is growing exponentially
• Obtaining annotated data, or structured methods for annotating medical images, is challenging
• Big Data analytical methods allow the depicted contents to be interpreted
• Scalable methods are needed for collecting, compressing, sharing, and anonymizing medical data
Electronic Health Records (EHRs) from a Big Data Point of View
Big Data: Clinical Data
• Large volumes of clinical data need to be stored, retrieved, and aggregated
• Scalable methods for collecting, compressing, sharing, and anonymizing medical data
• Scalable methods for signal processing and for developing Big Data based clinical decision support systems (CDSSs)
Genomics from a Big Data Point of View
Big Data: Genomics Data
• The human genome was initially estimated to contain 30,000 to 35,000 genes (later estimates are closer to 20,000 protein-coding genes)
• Scalable methods for pathway analysis and for discovering associations between observed gene expression changes and predicted functional effects
• Scalable methods for the reconstruction of metabolic and regulatory networks
Challenges of Big Data Management and Analytics
Big Data Pipeline
Agenda
1. iASiS: Big Data to Support Precision Medicine and Public
Health Policy
2. The iASiS Architecture
3. Big Data Management and Analytics in iASiS
Drug Ineffectiveness
[Source: Brian B. Spear, Margo Heath-Chiozzi, Jeffrey Huff, “Clinical application of pharmacogenetics,” Trends in Molecular Medicine, Volume 7, Issue 5, 1 May 2001, pages 201-204.]
Drug Effectiveness
https://healthy.kaiserpermanente.org/static/drugency/images/ACI02770.JPG
Methotrexate is a chemotherapy agent and immune system suppressant
Drugs Approved for Small Cell Lung Cancer
https://www.cancer.gov/about-cancer/treatment/drugs/lung
The right drug and dosage are selected based on a patient's genome
Drug Side Effects
https://www.topcanadianpharmacy.org/wp-content/uploads/2016/09/Plavix-package.jpg
Clopidogrel (Plavix):
• Drug designed to prevent blood clots
• The second-best-selling drug in the world
• Its effectiveness in protecting stent patients from thrombosis depends on the patient's genetic variants within CYP2C19
• CYP2C19 encodes an enzyme that converts the drug from an inactive to an active state
[Shah, SH. Clopidogrel Dosing and CYP2C19. Medscape. July 1, 2011. Medscape web site.]
The right drug and dosage are selected based on a patient's genome
Precision Medicine
Medical model that proposes the
customization of healthcare, with medical
decisions, practices, or products being
tailored to the individual patient
Related terms:
• Stratified Medicine
• Targeted Therapy
• Personalized Medicine
• P4 Medicine: Personalized, Predictive, Preventive, Participatory
iASiS: Vision and Objectives
iASiS Vision:
Turn clinical, pharmacogenomics, and other Big Data into actionable knowledge for personalized medicine and health policy-making
iASiS Objectives:
• Integrate automated unstructured and structured data analysis, image analysis, and sequence analysis into a Big Data framework
• Use the iASiS framework to support personalized diagnosis and treatment
The iASiS Pipeline
The iASiS Schema
http://vocol.iais.fraunhofer.de/iasis/ (powered by VoCol)
Pilot 1: Lung Cancer
Motivation:
• Lung cancer is among the most
  • common and deadly diseases
  • costly cancers
• Lung cancer is a heterogeneous disease. Characteristics differ among
  • patients
  • tumor regions
iASiS will enable:
• Discovery of correlations among tumor spread, prognosis, and response to treatment
• Unraveling molecular mechanisms (signatures) that predict response in different tumor types
Big Data in the Lung Cancer Pilot
• Pharmacological knowledge extracted from publicly available datasets
• Biomedical ontologies and taxonomies
  • terminology standardization
  • semantically describing the EHRs
• EHRs in Spanish
• PET/CT Images
• Genomic Data/Liquid Biopsy Samples
Pilot 2: Alzheimer's Disease
Motivation:
• Approximately 10% of people over 65 suffer from Alzheimer's
• Heterogeneity of the symptoms impedes accurate diagnosis and treatment
iASiS will enable:
• Discovery of patterns associated with prognosis, outcomes, and response to treatments
• Association of medical and lifestyle advice with Alzheimer's risk and stages of severity
Big Data in the Alzheimer's Disease Pilot
• EHRs in English
• MRI Brain Images
• Genomic Data
• Pharmacological knowledge extracted from publicly available datasets
• Biomedical ontologies and taxonomies
  • terminology standardization
  • semantically describing the EHRs
Clinical Big Data Analytics
[Diagram components: Clinical Notes; NLP; Data Preprocessing; Medical Images; Deep Learning; Predictive Models; Medical Vocabularies; Knowledge; Data Mining; Knowledge Graph]
Genomic Big Data Analytics
[Diagram components: Identification of RNAs regulated by RBP; Comparison with available information; Integration with transcriptomic data; Hospital-derived data; Knowledge Graph; Identification of key genes and interactions]
Open Big Data Analytics
• Heterogeneous open data
• Semantic indexing of the data via ontologies and thesauri
• Knowledge extraction from the data
• NLP and network analysis technologies
The iASiS Architecture
[Diagram components: Knowledge Graph; Central Node; Node@Partner1; Node@Partner2]
Big Data Management & Analytics
Ontario: Big Data Management
Knowledge Graph
Ontario: Query Processing
Ontario: Evaluation Study
Benchmark by Ali Hasnain et al., "BioFed: federated query processing over life sciences linked open data," Journal of Biomedical Semantics, 2017.
RDF Dataset Name    Number of RDF Triples
Chebi               4,772,706
DrugBank            517,023
Kegg                1,090,830
Affymetrix          44,207,146
Dailymed            162,972
Diseasome           72,445
Sider               101,542
Medicare            44,500
LinkedCT            9,804,652
Linked TCGA-A       35,329,868
Query categories in the benchmark (Hasnain et al., BioFed, Journal of Biomedical Semantics, 2017):
Category          #Triple Patterns   #Star-Shaped Subqueries   #Union      #Optional
                  Min   Max          Min   Max                 Min   Max   Min   Max
Simple Queries    3     8            2     4                   0     1     0     1
Complex Queries   6     12           2     5                   0     1     0     1
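The "#Star-Shaped Subqueries" column can be read as the number of groups of triple patterns that share the same subject, which is the usual definition of a star-shaped subquery. The following is a minimal sketch of that grouping, assuming triple patterns are given as (subject, predicate, object) tuples. The function name `star_shaped_subqueries` and the example query are hypothetical; they are not taken from Ontario or from the BioFed benchmark.

```python
from collections import OrderedDict

def star_shaped_subqueries(triple_patterns):
    """Group triple patterns by subject; each group is one star-shaped subquery."""
    stars = OrderedDict()
    for subject, predicate, obj in triple_patterns:
        stars.setdefault(subject, []).append((subject, predicate, obj))
    return list(stars.values())

# Hypothetical basic graph pattern with three triple patterns.
bgp = [
    ("?drug", "rdf:type", "ex:Drug"),
    ("?drug", "ex:target", "?target"),
    ("?target", "rdfs:label", "?name"),
]

stars = star_shaped_subqueries(bgp)
print(len(bgp), "triple patterns,", len(stars), "star-shaped subqueries")
```

Under this reading, the example query would count as 3 triple patterns and 2 star-shaped subqueries.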
Ontario: Evaluation Study
• Ontario is compared with state-of-the-art federated query processing tools (ANAPSID and FedX) and with a direct SPARQL endpoint
• Ontario exhibits better performance than the other engines in terms of throughput
Big Data Analytics and Pattern Discovery
[Diagram labels: Pragmatics; Context; Semantics; Drug Similarity; Target Similarity; ChEBI Ontology; Pattern Detection Prediction Principle]
[Steps: Computing Similarity; Detecting Patterns; Predicting Interactions]
[Inputs and outputs: Network of Drugs, Targets, and interactions; Patterns of similar Drugs, similar Targets, and their interactions; Discovered Drug-Target Interactions]
Patterns between Drug-Protein Interactions
Predicted Interactions between Drugs and Proteins
Patterns between Drug and Side Effects
Predicted Interactions between Drugs and Side Effects
Drug-Target Interaction Discovery
                               Nuclear Receptor   GPCR   Ion Channel   Enzyme
Drugs                          54                 223    210           445
Targets                        26                 95     204           664
Interactions                   90                 635    1,476         2,926
Avg. Interactions per Target   3.46               6.68   7.23          4.4
Avg. Interactions per Drug     1.66               2.84   7.02          6.57
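The two average rows follow from the three count rows: interactions divided by targets, and interactions divided by drugs. A short sketch that recomputes them from the counts in the table is given below. The slide values appear to be truncated to two decimals rather than rounded, so the sketch truncates as well; that is an inference from the numbers, not something the slides state.

```python
import math

# Counts copied from the table: (drugs, targets, interactions) per target class.
datasets = {
    "Nuclear Receptor": (54, 26, 90),
    "GPCR": (223, 95, 635),
    "Ion Channel": (210, 204, 1476),
    "Enzyme": (445, 664, 2926),
}

def truncate2(x):
    """Truncate (not round) a positive number to two decimals."""
    return math.floor(x * 100) / 100

def averages(drugs, targets, interactions):
    """Return (avg. interactions per target, avg. interactions per drug)."""
    return truncate2(interactions / targets), truncate2(interactions / drugs)

for name, counts in datasets.items():
    per_target, per_drug = averages(*counts)
    print(f"{name}: {per_target} per target, {per_drug} per drug")
```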
Supervised Machine Learning Approaches
Machine Learning for Drug-Target Interaction Discovery:
• BLM: Bipartite Local Method [Cheng et al]
• LapRLS: Laplacian Regularized Least Squares [Xia et al]
• GIP: Gaussian Interaction Profile [Van Laarhoven et al]
• KBMF2K: Kernelized Bayesian Matrix Factorization with
twin Kernels [Gonen]
• NBI: Network-Based Inference [Cheng et al]
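Of the methods listed, GIP is easy to illustrate in a few lines. The slides give no formula, so the sketch below follows the commonly published definition of the Gaussian interaction profile kernel as an assumption: K(i, j) = exp(-gamma * ||y_i - y_j||^2), where y_i is the binary interaction profile of drug i and gamma is a bandwidth parameter divided by the mean squared norm of all profiles. The function name `gip_kernel`, the default `gamma_prime=1.0`, and the toy matrix are illustrative choices, not part of the presented study.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over binary interaction profiles.

    profiles[i] is the 0/1 interaction vector of entity i (for example,
    one drug against all targets). gamma_prime is an assumed bandwidth.
    """
    n = len(profiles)
    # Normalize the bandwidth by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [[math.exp(-gamma * sq_dist(a, b)) for b in profiles]
            for a in profiles]

# Toy drug-by-target matrix (hypothetical, not from the benchmark datasets).
Y = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
K = gip_kernel(Y)
print(K[0][1], K[0][2])
```

Drugs with identical interaction profiles get similarity 1, and the similarity decays as the profiles diverge. The same function can be applied to the transposed matrix to obtain a target-target kernel.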
Unsupervised Approaches
• semEP: Partitions the graph into communities, from which new interactions are predicted [Palma, Vidal, and Raschid]
• Metis: Multilevel recursive bisection [George Karypis and Vipin Kumar]
• Ncut: Normalized Cuts [Shi and Malik]
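The idea of predicting interactions from communities can be sketched as follows. This covers only the prediction step and assumes the communities have already been computed by some partitioning method; it is a simplified illustration, not the semEP, Metis, or Ncut implementation. The function name `predict_from_communities` and the toy data are hypothetical.

```python
def predict_from_communities(communities, known):
    """Predict drug-target pairs that co-occur in a community but are not known.

    communities: iterable of sets of (drug, target) edges.
    known: set of all known (drug, target) interactions.
    """
    predictions = set()
    for edges in communities:
        drugs = {d for d, _ in edges}
        targets = {t for _, t in edges}
        # Every drug-target pair inside a community is a candidate interaction.
        for d in drugs:
            for t in targets:
                if (d, t) not in known:
                    predictions.add((d, t))
    return predictions

# Toy data: four known interactions split into two communities.
known = {("d1", "t1"), ("d2", "t1"), ("d2", "t2"), ("d3", "t3")}
communities = [
    {("d1", "t1"), ("d2", "t1"), ("d2", "t2")},
    {("d3", "t3")},
]
predicted = predict_from_communities(communities, known)
print(predicted)
```

In the toy data, d1 and d2 share target t1 and fall in the same community, so the only missing pair in that community, (d1, t2), is the predicted interaction.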
Drug-Target Interaction Discovery
Top-5 novel interactions are interactions that do not appear in the dataset and can be validated in STITCH (http://stitch.embl.de/) and KEGG (http://www.kegg.jp)
Method    Nuclear Receptor   GPCR   Ion Channel   Enzyme
semEP     5                  5      4             1
Metis     3                  5      2             1
Ncut      3                  4      1             1
BLM       2                  1      0             0
NBI       1                  1      1             2
GIP       4                  2      1             3
LapRLS    4                  4      2             2
KBMF2K    4                  4      4             4
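One way to read the table is to sum, per method, how many of the top-5 predictions were validated across the four datasets (out of a possible 20). The sketch below does this with the values copied from the table. The aggregate and the ranking are an illustrative summary added here, not a metric reported in the slides.

```python
# Validated top-5 novel interactions per method, in the column order
# (Nuclear Receptor, GPCR, Ion Channel, Enzyme), copied from the table.
validated = {
    "semEP": (5, 5, 4, 1),
    "Metis": (3, 5, 2, 1),
    "Ncut": (3, 4, 1, 1),
    "BLM": (2, 1, 0, 0),
    "NBI": (1, 1, 1, 2),
    "GIP": (4, 2, 1, 3),
    "LapRLS": (4, 4, 2, 2),
    "KBMF2K": (4, 4, 4, 4),
}

# Total validated predictions per method (maximum 4 datasets x 5 = 20).
totals = {method: sum(hits) for method, hits in validated.items()}
ranking = sorted(totals, key=totals.get, reverse=True)

for method in ranking:
    print(f"{method}: {totals[method]}/20")
```

A single total hides per-dataset differences: semEP, for example, validates 5 of 5 on Nuclear Receptor and GPCR but only 1 of 5 on Enzyme.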
Conclusions and Future Work
Biomedicine Big Data
The iASiS Pipeline
The iASiS Architecture
Big Data Management and Analytics
Next Steps:
● Collect Big Data
● Extract Knowledge from Big Data sources
● Create the iASiS Knowledge Graph
● Enforce Data Access and Privacy Policies
● Big Data Processing and Analytics
The iASiS Partners
The iASiS Team
Scientific Data Management Group at TIB
Maria-Esther Vidal
Guillermo Palma, Guillermo Betancourt, Yerson Roa, Kemele Endris, Farah Karim
(Group roles shown on the slide: Bachelor Students, PhD Students, PostDoc)
Many thanks for your attention!
Questions?


Big Data Analytics in the Health Domain
