Genomic Big Data
Management, Integration, and Mining
E. Weitschek (1,2)
(1) Department of Engineering, Uninettuno International University, Italy
(2) Institute of Systems Analysis and Computer Science, National Research Council, Italy
Joint work with P. Bertolazzi, G. Felici, F. Cumbo, G. Fiscon, E. Cappelli
2
Outline
• Growth of biological data
• Next generation sequencing
• Biological data sources
• Biological data management
• Biological data integration
• Big data bioinformatics
• Knowledge extraction
• Supervised Learning
• Biomedical applications
• Conclusions and future directions
3
Growth of biological data
• Advances in molecular biology, supported by computer science, have led to an exponential growth of biological data
‒ originated with the DNA sequencing method invented by Sanger in the late seventies
‒ late nineties: significant advances in sequence generation, e.g. the Human Genome Project
‒ genomic sequence data are currently doubling about every 18 months
‒ GenBank: collection of all publicly available nucleotide sequences (160 M sequences)
4
Growth of biological data
• Advances in molecular biology, supported by computer science, have led to an exponential growth of biological data
‒ Today, next generation high-throughput data are collected from modern parallel sequencing machines, and huge amounts of biological data are available from public and private sources
‒ 10000 Human Genomes project (3000 Mbp)
‒ Nowadays: $1000 genome
• Very large data sets, generated by many different biological experiments, need to be automatically processed and analyzed with computer science methods
5
DNA Sequencing
• DNA (deoxyribonucleic acid) is the hereditary material in almost all organisms
• DNA sequencing is the process of determining the order of nucleotides within a DNA
molecule
• It includes any method or technology that is used to determine the order of
the four bases—adenine (A), cytosine (C), guanine (G), and thymine (T)
• Originated with the DNA sequencing method invented by Sanger in the late seventies
• In the late nineties, significant advances in sequence generation techniques, largely driven by massive projects such as the Human Genome Project
• High cost and long duration, e.g., $5 billion and 13 years for the Human Genome Project
6
Next Generation Sequencing (NGS)
• Today: next generation high-throughput data from modern parallel sequencing machines
‒ Roche 454, Illumina, Applied Biosystems SOLiD, Helicos HeliScope, Complete Genomics, Pacific Biosciences SMRT, Ion Torrent
‒ Next generation sequencing (NGS) machines output a large amount of short DNA sequences, called reads (in FASTQ format)
‒ They cannot read an entire genome one nucleotide at a time from beginning to end
‒ Instead, they shred the genome and generate short reads
‒ Low cost per base ($1000 for a whole human genome)
‒ High speed (24 h to sequence a whole human genome)
‒ Large number of reads
‒ Problems: data storage and analysis, high costs for IT infrastructure
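To make the notion of a read concrete, the following is a minimal sketch of a FASTQ reader. It assumes the common four-lines-per-record layout (header starting with "@", sequence, separator starting with "+", quality string) and the Phred+33 quality encoding; the names Read and parse_fastq and the two example reads are illustrative and not taken from the slides.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    sequence: str
    quality: List[int]  # Phred scores, assuming the Phred+33 ASCII offset


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield reads from a four-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (an empty line also stops parsing)
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield Read(header[1:], sequence, [ord(c) - 33 for c in quality])


# Two made-up reads, only to exercise the parser.
example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n"
reads = list(parse_fastq(io.StringIO(example)))
for read in reads:
    print(read.name, read.sequence, read.quality)
```

Because the function is a generator, a file with many millions of reads can be streamed record by record instead of being loaded into memory, which matters given the data volumes discussed above.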
7
Next Generation Sequencing (NGS)
• Data dimension, time and cost of Next Generation Sequencing
Seq type | Data | Price ($) | Time
Human Genome | 90 GB | 1000 | 1 day
Human Gene Expression | 9 GB | 500 | 12 h
Plant Genome | 150 GB | 2000 | 5 days
Bacterial Genome | 1 GB | 300 | 6 h
8
Biological data sources
• Several heterogeneous sources of biomedical data are available
• Sequence Read Archive
• The Gene Expression Omnibus
• NCBI
• ELIXIR
• The Cancer Genome Atlas (TCGA)
9
Biological data management
10
Biological data integration
• Challenge for the research community
• Allow everyone to store, organize, access, and analyze the information
available on the web and/or on private repositories
• Integration of data: providing a unified access to heterogeneous and
independent data sources as a single source
• Many solutions from the I.T. and from the bioinformatics community, e.g.
− Heterogeneous Database Systems
− Distributed Database Systems
− SRS
− NCBI Entrez
− Federated databases (BioKleisli)
− Multi-databases (TAMBIS)
− Mediator-based (Bio-DataServer)
− Data warehousing (BioWarehouse)
• Integration of clinical and genomic data
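The idea of "unified access to heterogeneous and independent data sources as a single source" can be sketched with a small mediator. This is only an illustration of the pattern: the two sources are hypothetical in-memory stand-ins with toy values, and the class and function names are assumptions, not the interface of any of the systems listed above.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Source = Callable[[str], Iterable[Record]]  # maps a gene symbol to matching records


def sequence_source(gene: str) -> Iterable[Record]:
    # Hypothetical stand-in for a sequence repository (toy values).
    data = {"GENE_A": [{"accession": "SEQ0001", "length_bp": "1200"}]}
    return data.get(gene, [])


def expression_source(gene: str) -> Iterable[Record]:
    # Hypothetical stand-in for an expression repository (toy values).
    data = {"GENE_A": [{"sample": "S1", "level": "12.4"},
                       {"sample": "S2", "level": "3.1"}]}
    return data.get(gene, [])


class Mediator:
    """Presents several independent sources as a single queryable source."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, name: str, source: Source) -> None:
        self._sources[name] = source

    def query(self, gene: str) -> List[Record]:
        results: List[Record] = []
        for name, source in self._sources.items():
            for record in source(gene):
                # Tag every record with its origin so provenance is not lost.
                results.append({"source": name, "gene": gene, **record})
        return results


mediator = Mediator()
mediator.register("sequences", sequence_source)
mediator.register("expression", expression_source)
hits = mediator.query("GENE_A")
for hit in hits:
    print(hit)
```

In this sketch the sources stay independent and are queried on demand; a data warehouse would instead copy the records into one store ahead of time.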
11
Bioinformatics
• New methods able to extract relevant information from biological data sets are needed
• Effective and efficient computer science methods are needed to support the analysis of complex biological data sets
• Modern biology is frequently combined with computer science, leading to Bioinformatics
• Bioinformatics is a discipline where biology and computer science merge in order to design and develop efficient methods for analyzing biological data, for supporting in vivo, in vitro and in silico experiments, and for automatically solving complex life science problems
• Bioinformatician: a computer scientist and biology domain expert who is able to deal with the computer-aided resolution of life science problems
12
Big Data Bioinformatics
• The attention to Big Data in bioinformatics is steadily increasing, in proportion to the growth of the amount of biological data obtained through sequencing
• Dealing with such an amount of data, recorded at different stages of a person's life and stored for dynamic analysis studies, requires scalable systems suitable for collection, management, and analysis
• Biological Big Data Bases
13
The Cancer Genome Atlas (TCGA)
• Comprehensive genomic characterization and analysis of more than 30 cancer types
• National Cancer Institute (NCI), National Human Genome Research Institute (NHGRI), and National Institutes of Health (NIH)
• Aim: improve the ability to diagnose, treat, and prevent cancer
• A freely available platform to search, download, and analyze data sets
• 33 tumors with more than 10000 patients
• Public data distributed with the open access paradigm
• Genomic experiments
– Copy Number Variation (CNV)
– DNA-methylation
– DNA-sequencing (whole genome, whole exome, mutations)
– Gene expression data (RNA-Seq V1, V2)
– MicroRNA sequencing
– Meta data (Clinical and Biospecimen)
• Contains more than 15 TB of genomic and clinical data, whose analysis and interpretation pose great challenges to the bioinformatics community
14
TCGA2BED
Data integration from external dbs
15
Genomic data integration
data set: DNA-Methylation
data set: RNA-sequencing
Typical problem in Bioinformatics:
• More than 1000 samples (patients), 450 000 features (genes, sites, clinical variables, proteins, …)
• Aim: distinguish healthy vs diseased samples
• Not addressable by a classic machine learning algorithm
• Big Data solutions
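As an illustration of integrating the two data sets into one feature matrix, the sketch below joins per-sample DNA-methylation and RNA-sequencing values on the sample identifier and suffixes each feature with its origin. The suffixes dMet and rnaSeq mirror the feature names that appear in the rules later in this deck; the function name, the sample identifiers, and all values are made up for the example.

```python
from typing import Dict

Matrix = Dict[str, Dict[str, float]]  # sample id -> {feature name -> value}


def integrate(datasets: Dict[str, Matrix]) -> Matrix:
    """Join per-sample features, suffixing each feature with its data set."""
    # Keep only the samples measured in every data set.
    shared = set.intersection(*(set(matrix) for matrix in datasets.values()))
    merged: Matrix = {}
    for sample in sorted(shared):
        row: Dict[str, float] = {}
        for suffix, matrix in datasets.items():
            for feature, value in matrix[sample].items():
                row[f"{feature}_{suffix}"] = value
        merged[sample] = row
    return merged


# Toy values; sample ids and numbers are illustrative only.
methylation: Matrix = {"P1": {"GENE_A": 0.62}, "P2": {"GENE_A": 0.10}}
expression: Matrix = {"P1": {"GENE_A": 887.0}, "P3": {"GENE_A": 12.0}}
merged = integrate({"dMet": methylation, "rnaSeq": expression})
print(merged)
```

Only P1 survives the join here because it is the only sample present in both data sets; with real data, this intersection step determines how many patients remain for classification.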
16
Classification and supervised machine learning
• Aims: distinguish the diseased from the healthy samples and predict the class of new samples
• Input: a training set (reference library) containing samples with a priori known class membership
• Model building: based on this training set, the software computes the classification model
• The classification model can be applied to a test set (query set), which contains samples that require classification:
− query samples with unknown class membership or
− samples that also have a priori known class membership, allowing verification of the classifications
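The training-set / model / test-set workflow can be sketched with a deliberately tiny classifier: a single threshold on one feature. This is a stand-in chosen for brevity, not the method used in the work presented here; the feature values and labels are toy data.

```python
from typing import List, Tuple

Sample = Tuple[float, str]  # (value of one feature, class label)


def train_threshold(training: List[Sample], positive: str = "case") -> float:
    """Model building: pick the threshold t that best fits
    'value >= t -> positive' on the training set."""
    def correct(threshold: float) -> int:
        return sum((value >= threshold) == (label == positive)
                   for value, label in training)
    candidates = sorted({value for value, _ in training})
    return max(candidates, key=correct)


def predict(threshold: float, value: float,
            positive: str = "case", negative: str = "control") -> str:
    return positive if value >= threshold else negative


# Training set (reference library) with known class membership.
training = [(1.0, "control"), (2.0, "control"), (5.0, "case"), (6.0, "case")]
# Test set with known membership, which allows verification.
test = [(1.5, "control"), (5.5, "case"), (4.0, "case")]

threshold = train_threshold(training)
predictions = [predict(threshold, value) for value, _ in test]
accuracy = sum(p == label for p, (_, label) in zip(predictions, test)) / len(test)
print(threshold, predictions, accuracy)
```

The third test sample is misclassified, which is exactly what a test set with known labels is meant to reveal.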
17
Rule-based classification
A rule-based classifier is a technique for classifying samples by using a collection of “if… then” rules, named logic formulas:
– Antecedent → Consequent
– (Condition_1) or (Condition_2) or … or (Condition_n) → Class
– Condition_i: (A_1 op v_1) and (A_2 op v_2) and … and (A_m op v_m)
– A = attribute; v = value; op = operator {=, ≠, <, >, ≤, ≥}
• An example of a logic classification formula is:
“IF Aph1b<0.507 then the experimental sample is CONTROL”
• The logic formulas, and how well they assign samples to the right class, are evaluated according to:
– Percentage split or cross validation sampling
– Accuracy
– F-measure
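A formula of this shape (an "or" of conditions, each an "and" of attribute-operator-value terms) can be evaluated with a few lines of code. The sketch below uses the example rule from the slide; the sample values, the true labels, and the choice of CASE as the default class when the rule does not fire are assumptions made for illustration.

```python
import operator
from typing import Dict, List, Tuple

OPS = {"=": operator.eq, "≠": operator.ne, "<": operator.lt,
       ">": operator.gt, "≤": operator.le, "≥": operator.ge}

Term = Tuple[str, str, float]   # (attribute, operator, value)
Condition = List[Term]          # terms joined by "and"
Formula = List[Condition]       # conditions joined by "or"


def fires(formula: Formula, sample: Dict[str, float]) -> bool:
    """True if at least one condition has all of its terms satisfied."""
    return any(all(OPS[op](sample[attr], value) for attr, op, value in cond)
               for cond in formula)


def f_measure(predicted: List[str], actual: List[str], positive: str) -> float:
    """Harmonic mean of precision and recall for the given class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# "IF Aph1b<0.507 then the experimental sample is CONTROL"
formula: Formula = [[("Aph1b", "<", 0.507)]]
# Toy samples and labels, for illustration only.
samples = [{"Aph1b": 0.20}, {"Aph1b": 0.40}, {"Aph1b": 0.90}, {"Aph1b": 0.45}]
actual = ["CONTROL", "CONTROL", "CASE", "CASE"]

predicted = ["CONTROL" if fires(formula, s) else "CASE" for s in samples]
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
control_f = f_measure(predicted, actual, positive="CONTROL")
print(predicted, accuracy, control_f)
```

On these four toy samples the rule reaches an accuracy of 0.75 and an F-measure of 0.8 for the CONTROL class, because one CASE sample falls below the threshold.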
18
CAMUR
• Classifier with Alternative and Multiple Rule-based models (CAMUR)
• New method for classifying RNA-seq case-control samples, able to compute multiple human-readable classification models
• Aims of CAMUR:
1) To classify RNA-seq experiments
2) To extract several alternative and equivalent rule-based models, which represent relevant sets of genes related to the case and control samples
• CAMUR extracts multiple classification models by adopting a feature elimination technique and by iterating the classification procedure
• Prerequisite: Gene expression normalization (RPKM or RSEM)
• Available at: http://dmb.iasi.cnr.it/camur.php
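For the normalization prerequisite, RPKM (reads per kilobase of transcript per million mapped reads) follows the standard definition: reads mapped to the gene, scaled by gene length in kilobases and by total mapped reads in millions. The sketch below is a generic implementation of that formula, not code from CAMUR; the function name and the example counts are illustrative.

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Toy example: 500 reads on a 2000 bp gene in a library of 10 million mapped reads.
value = rpkm(gene_reads=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
print(value)
```

Normalizing in this way makes expression values comparable across genes of different lengths and across samples sequenced to different depths, which is needed before thresholds on expression can be learned.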
19
CAMUR: method
• CAMUR is based on:
1) a rule-based classifier (in this work, RIPPER)
2) an iterative feature elimination technique
3) a repeated classification procedure
4) an ad-hoc storage structure for the classification rules (CAMUR database)
• In brief, CAMUR:
• iteratively computes a rule-based classification model through the supervised RIPPER algorithm,
• calculates the power set (or a partial combination) of the features present in the rules,
• iteratively eliminates those combinations from the data set, and
• performs the classification procedure again until a stopping criterion is met:
– F-measure < threshold
– Maximum number of iterations reached
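The loop described above can be sketched in a simplified form. Two assumptions depart from the actual method and should be kept in mind: a one-feature threshold learner stands in for RIPPER, and each iteration simply removes the feature used by the latest model instead of enumerating the power set of rule features. All names and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Sample = Dict[str, float]
Rule = Tuple[str, float]  # (feature, threshold): "feature >= threshold -> case"


def f_measure(predicted: List[str], actual: List[str], positive: str = "case") -> float:
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def learn_one_rule(samples: List[Sample], labels: List[str],
                   features: Set[str]) -> Tuple[Rule, float]:
    """Stand-in for RIPPER: best single-feature threshold rule by F-measure."""
    best_rule, best_score = ("", 0.0), -1.0
    for feature in sorted(features):
        for threshold in sorted({s[feature] for s in samples}):
            predicted = ["case" if s[feature] >= threshold else "control"
                         for s in samples]
            score = f_measure(predicted, labels)
            if score > best_score:
                best_rule, best_score = (feature, threshold), score
    return best_rule, best_score


def extract_models(samples: List[Sample], labels: List[str],
                   min_f: float = 0.8, max_iterations: int = 10
                   ) -> List[Tuple[Rule, float]]:
    remaining = set(samples[0])
    models: List[Tuple[Rule, float]] = []
    for _ in range(max_iterations):       # stop: maximum number of iterations
        if not remaining:
            break
        rule, score = learn_one_rule(samples, labels, remaining)
        if score < min_f:                 # stop: F-measure < threshold
            break
        models.append((rule, score))
        remaining.discard(rule[0])        # eliminate the feature just used
    return models


# Toy data: g1 and g2 both separate the classes, g3 does not.
samples = [{"g1": 9.0, "g2": 7.0, "g3": 5.0}, {"g1": 8.0, "g2": 6.0, "g3": 1.0},
           {"g1": 1.0, "g2": 2.0, "g3": 4.0}, {"g1": 2.0, "g2": 3.0, "g3": 2.0}]
labels = ["case", "case", "control", "control"]
models = extract_models(samples, labels)
print(models)
```

On the toy data the loop returns two alternative models, one on g1 and one on g2, and stops when only g3 is left because no rule on it reaches the F-measure threshold. This mirrors the aim of obtaining several equivalent models built on different genes.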
20
Experimentation and results
21
Experimentation and results
22
Classification models for breast cancer
Supervised model extraction
(MAMDC2_dMet >= 6.63) and (ACACB_rnaSeq >= 887.80) => class=normal (19.0/3.0)
[ ] => class=tumoral (1102.0/1.0)
Correctly Classified Instances 98.11 %
Incorrectly Classified Instances 1.88 %
CAMUR: occurrences
Gene | Occurrences
FIGF_rnaSeq | 44
SPRY2_dMet | 37
SCN3A_rnaSeq | 25
PAMR1_dMet | 20
MMP11_rnaSeq | 20
CAMUR: rules
Class | Rule | Accuracy
Normal | (FIGF_rnaSeq >= 184.15) and (CLEC5A_dMet <= 5.44) || (TSHZ2_rnaSeq >= 471.04) and (DLGAP2_dMet >= 10.06) | 9.800
Normal | (SPRY2_dMet >= 0.55) and (CD300LG_rnaSeq >= 454.24) || (PAMR1_rnaSeq >= 712.17) and (PARP8_dMet >= 2.17) | 9.700
23
Other applications on biomedical data
Aim: To extract relevant features from the ever-increasing amount of biological data and to apply supervised learning to classify them
Biology Issue | Features | Software | Data source
Clinical patient classification | Clinical variables (blood, imaging, psychometric tests…) | DMB, Weka | Heterogeneous health care facilities
Gene Expression Analysis | Discretized gene expression profiles | Gela, CAMUR | TCGA, EBRI
DNA barcoding | Nucleotide sequences of DNA-barcode | Blog, Fasta2Weka | Barcode of Life Consortium
Polyoma/Rhino Viruses | Nucleotide sequences of Polyoma/Rhino viruses | DMB, MISSAL | Istituto Superiore di Sanità
EEG signals processing | Fourier Coefficients extracted from EEG recordings | Matlab, Weka, DMB | IRCCS Centro di Neurolesi “Bonino-Pulejo” of Messina
Biomedical image processing | Oriented FAST and Rotated BRIEF | Matlab, Weka, DMB | Alzheimer's Disease Neuroimaging Initiative
24
Conclusions and future directions
• Exponential growth of biomedical data
• Release of many public databases and of data collection and data management projects
• Data integration
• Supervised classification analysis
• Advanced systems for data integration
• New big data approaches
25
Acknowledgments
Emanuel Weitschek
Department of Engineering
Uninettuno International University
www.iasi.cnr.it/~eweitschek
emanuel@iasi.cnr.it
Supporting high throughput high-biotechnologies in today’s research environme...
Ed Dodds
 
Bioinformatics t1-introduction wim-vancriekinge_v2013
Bioinformatics t1-introduction wim-vancriekinge_v2013Bioinformatics t1-introduction wim-vancriekinge_v2013
Bioinformatics t1-introduction wim-vancriekinge_v2013
Prof. Wim Van Criekinge
 
Basics Of Bioinformatics .pptx
Basics Of Bioinformatics .pptxBasics Of Bioinformatics .pptx
Basics Of Bioinformatics .pptx
Mohdkaifkhan18
 
Lecture_1_Introduction_Bioinformatics.pptx
Lecture_1_Introduction_Bioinformatics.pptxLecture_1_Introduction_Bioinformatics.pptx
Lecture_1_Introduction_Bioinformatics.pptx
90loiq2y9
 
Bioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxBioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptx
xRowlet
 
Microarry andd NGS.pdf
Microarry andd NGS.pdfMicroarry andd NGS.pdf
Microarry andd NGS.pdf
nedalalazzwy
 
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseDevelopment of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Nathan Olson
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven Research
European Bioinformatics Institute
 
Genomics and Bioinformatics
Genomics and BioinformaticsGenomics and Bioinformatics
Genomics and Bioinformatics
Amit Garg
 
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseDevelopment of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
nist-spin
 
bioinfomatics
bioinfomaticsbioinfomatics
bioinfomatics
nguyenpg
 
[DSC Europe 23][DigiHealth] Vesna Pajic - Machine Learning Techniques for omi...
[DSC Europe 23][DigiHealth] Vesna Pajic - Machine Learning Techniques for omi...[DSC Europe 23][DigiHealth] Vesna Pajic - Machine Learning Techniques for omi...
[DSC Europe 23][DigiHealth] Vesna Pajic - Machine Learning Techniques for omi...
DataScienceConferenc1
 
Biological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdfBiological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdf
BioinformaticsCentre
 
Share_Introduction to Bioinformatics-WPS_Office.pptx
Share_Introduction to Bioinformatics-WPS_Office.pptxShare_Introduction to Bioinformatics-WPS_Office.pptx
Share_Introduction to Bioinformatics-WPS_Office.pptx
ShashiKala434918
 
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek

  • 5. DNA Sequencing
      • DNA (deoxyribonucleic acid) is the hereditary material in almost all organisms
      • DNA sequencing is the process of determining the order of nucleotides within a DNA molecule
      • It includes any method or technology used to determine the order of the four bases: adenine (A), cytosine (C), guanine (G), and thymine (T)
      • It originated with the DNA sequencing method invented by Sanger in the late seventies
      • In the late nineties, sequence generation techniques advanced significantly, largely driven by massive projects such as the Human Genome Project
      • High cost and long duration, e.g., $5 billion and 13 years for the Human Genome Project
  • 6. Next Generation Sequencing (NGS)
      • Today: next-generation, high-throughput data from modern parallel sequencing machines
        ‒ Platforms: Roche 454, Illumina, Applied Biosystems SOLiD, Helicos HeliScope, Complete Genomics, Pacific Biosciences SMRT, Ion Torrent
        ‒ NGS machines output a large number of short DNA sequences, called reads (in FASTQ format)
        ‒ They cannot read an entire genome one nucleotide at a time from beginning to end; instead they shred the genome and generate short reads
        ‒ Low cost per base ($1,000 for a whole human genome)
        ‒ High speed (24 h to sequence a whole human genome)
        ‒ Large number of reads
        ‒ Problems: data storage and analysis, high costs for IT infrastructure
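To make the notion of reads in FASTQ format concrete, here is a minimal sketch of a reader for the simple four-line-per-record layout (header starting with `@`, sequence, separator starting with `+`, quality string). The names `Read`, `parse_fastq`, and `mean_phred` are hypothetical helpers introduced for illustration, the two records are made up, and the Phred+33 quality encoding is an assumption, not something stated in the slides.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield reads from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (or a blank line): stop reading
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield Read(header[1:], sequence, quality)


def mean_phred(read: Read, offset: int = 33) -> float:
    """Average base quality, assuming the Phred+33 encoding."""
    return sum(ord(c) - offset for c in read.quality) / len(read.quality)


# Two made-up reads, only to exercise the parser.
example = io.StringIO(
    "@read1\nGATTACA\n+\nIIIIIII\n"
    "@read2\nACGT\n+\n!!!!\n"
)
reads = list(parse_fastq(example))
for read in reads:
    print(read.name, len(read.sequence), mean_phred(read))
```

Because the parser is a generator, it holds one record in memory at a time, which matters when a single run produces files of many gigabytes.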
  • 7. Next Generation Sequencing (NGS)
      • Data size, time, and cost of next generation sequencing

        Seq type                Data     Price ($)   Time
        Human Genome            90 GB    1000        1 day
        Human Gene Expression   9 GB     500         12 h
        Plant Genome            150 GB   2000        5 days
        Bacterial Genome        1 GB     300         6 h
  • 8. Biological data sources
      • Several heterogeneous sources of biomedical data are available
      • Sequence Read Archive
      • The Gene Expression Omnibus
      • NCBI
      • ELIXIR
      • The Cancer Genome Atlas (TCGA)
  • 10. Biological data integration
      • A challenge for the research community
      • Goal: allow everyone to store, organize, access, and analyze the information available on the web and/or in private repositories
      • Data integration: providing unified access to heterogeneous and independent data sources as if they were a single source
      • Many solutions from the IT and bioinformatics communities, e.g.
        ‒ Heterogeneous Database Systems
        ‒ Distributed Database Systems
        ‒ SRS
        ‒ NCBI Entrez
        ‒ Federated databases (BioKleisli)
        ‒ Multi-databases (TAMBIS)
        ‒ Mediator-based (Bio-DataServer)
        ‒ Data warehousing (BioWarehouse)
      • Integration of clinical and genomic data
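As a toy illustration of "unified access to heterogeneous sources as a single source", the sketch below merges a clinical source and a genomic source into one per-patient view. It is not any of the systems listed above: the two in-memory sources, the patient and gene identifiers, and the function `integrated_view` are all hypothetical and made up for the example.

```python
from typing import Any, Dict, List

# Hypothetical clinical source: a list of row-like records.
clinical_source: List[Dict[str, str]] = [
    {"patient": "P1", "status": "diseased"},
    {"patient": "P2", "status": "healthy"},
]

# Hypothetical genomic source: expression values keyed by patient id.
genomic_source: Dict[str, Dict[str, float]] = {
    "P1": {"GENE_A": 8.1, "GENE_B": 0.4},
    "P3": {"GENE_A": 2.3, "GENE_B": 5.0},
}


def integrated_view() -> Dict[str, Dict[str, Any]]:
    """Full outer join of both sources on the patient id.

    Fields missing from one source are reported as None, so callers
    see a single schema regardless of where the data came from.
    """
    view: Dict[str, Dict[str, Any]] = {}
    for row in clinical_source:
        entry = view.setdefault(row["patient"], {"status": None, "expression": None})
        entry["status"] = row["status"]
    for patient, expression in genomic_source.items():
        entry = view.setdefault(patient, {"status": None, "expression": None})
        entry["expression"] = expression
    return view


view = integrated_view()
for patient in sorted(view):
    print(patient, view[patient])
```

Keeping patients that appear in only one source (P2 and P3 here) makes gaps between clinical and genomic records visible instead of silently dropping them.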
  • 11. Bioinformatics
      • New methods are needed that can extract relevant information from biological data sets
      • Effective and efficient computer science methods are needed to support the analysis of complex biological data sets
      • Modern biology is frequently combined with computer science, leading to bioinformatics
      • Bioinformatics is a discipline in which biology and computer science merge in order to design and develop efficient methods for analyzing biological data, for supporting in vivo, in vitro, and in silico experiments, and for automatically solving complex life science problems
      • Bioinformatician: a computer scientist and biology domain expert who is able to deal with the computer-aided resolution of life science problems
  • 12. Big Data Bioinformatics
    • The attention to Big Data in bioinformatics is steadily increasing, in proportion to the growth in the amount of biological data obtained through sequencing
    • Dealing with such an amount of data, recorded at different stages of a person's life and stored for dynamic analysis studies, requires scalable systems suitable for data collection, management, and analysis
    • [Figure: Biological Big Data Bases]
  • 13. The Cancer Genome Atlas (TCGA)
    • Comprehensive genomic characterization and analysis of more than 30 cancer types
    • National Cancer Institute (NCI), National Human Genome Research Institute (NHGRI), and National Institutes of Health (NIH)
    • Aim: improve the ability to diagnose, treat, and prevent cancer
    • A freely available platform to search, download, and analyze data sets
    • 33 tumor types with more than 10000 patients
    • Public data distributed with the open access paradigm
    • Genomic experiments
      ‒ Copy Number Variation (CNV)
      ‒ DNA-methylation
      ‒ DNA-sequencing (whole genome, whole exome, mutations)
      ‒ Gene expression data (RNA-Seq V1, V2)
      ‒ MicroRNA sequencing
      ‒ Meta data (Clinical and Biospecimen)
    • Contains more than 15 TB of genomic and clinical data, whose analysis and interpretation pose great challenges to the bioinformatics community
  • 15. Genomic data integration
    • Data sets: DNA-methylation, RNA-sequencing
    • Typical problem in bioinformatics:
      ‒ More than 1000 samples (patients), 450 000 features (genes, sites, clinical variables, proteins, ...)
      ‒ Aim: distinguish healthy vs diseased samples
      ‒ Not addressable by a classic machine learning algorithm
      ‒ Big Data solutions
  • 16. Classification and supervised machine learning
    • Aims: distinguish the diseased from the healthy samples, and predict the class of new samples
    • Input: a training set (reference library) containing samples with a priori known class membership
    • Model building: based on this training set, the software computes the classification model
    • The classification model can be applied to a test set (query set), which contains samples that require classification:
      ‒ query samples with unknown class membership (e.g., species), or
      ‒ samples that also have a priori known class membership, allowing verification of the classifications
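The separation between training set and test set described above can be produced, for instance, by a percentage split or by k-fold cross-validation. The following is a minimal sketch using only the Python standard library; the function names, the fixed seed, and the sample counts are illustrative assumptions rather than anything prescribed by the slides.

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, test indices)


def percentage_split(n_samples: int, train_fraction: float, seed: int = 0) -> Split:
    """Shuffle sample indices and cut them once into training and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]


def k_fold(n_samples: int, k: int, seed: int = 0) -> List[Split]:
    """Build k splits; every sample is used for testing exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, folds[i]))
    return splits


# Illustrative numbers: 10 samples, 70 % for training, or 5 folds.
train_idx, test_idx = percentage_split(10, 0.7)
cv_splits = k_fold(10, 5)
```

The model is built only on the training indices and evaluated on the test indices; with cross-validation the evaluation is repeated once per fold and the scores are averaged.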
  • 17. Rule-based classification
    • A rule-based classifier is a technique for classifying samples by using a collection of “if… then” rules, named logic formulas:
      ‒ Antecedent → Consequent
      ‒ (Condition1) or (Condition2) or … or (Conditionn) → Class
      ‒ Conditioni: (A1 op v1) and (A2 op v2) and … and (Am op vm)
      ‒ A = attribute; v = value; op = operator {=, ≠, <, >, ≤, ≥}
    • An example of a logic classification formula is:
      “IF Aph1b<0.507 then the experimental sample is CONTROL”
    • The logic formulas are evaluated, and the assignment of the samples to the right class is assessed, according to:
      ‒ Percentage split or cross validation sampling
      ‒ Accuracy
      ‒ F-measure
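The structure of such logic formulas (an OR of conditions, each an AND of attribute comparisons) can be sketched in a few lines. The example below encodes the Aph1b rule from the slide; the expression values, the true labels, and the name of the default class ("CASE") are invented for illustration and are not taken from any data set.

```python
import operator
from typing import Dict, List, Sequence, Tuple

OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}

Literal = Tuple[str, str, float]   # (attribute, operator, value)
Condition = List[Literal]          # literals joined by AND
Rule = List[Condition]             # conditions joined by OR
Sample = Dict[str, float]


def rule_fires(rule: Rule, sample: Sample) -> bool:
    """True if at least one condition has all of its literals satisfied."""
    return any(
        all(OPS[op](sample[attr], value) for attr, op, value in condition)
        for condition in rule
    )


def classify(rule: Rule, rule_class: str, default_class: str, sample: Sample) -> str:
    return rule_class if rule_fires(rule, sample) else default_class


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F-measure of the positive class, written as 2TP / (2TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# "IF Aph1b < 0.507 THEN CONTROL"; anything else falls to the assumed default class.
control_rule: Rule = [[("Aph1b", "<", 0.507)]]

# Invented expression values and labels, for illustration only.
samples = [{"Aph1b": 0.31}, {"Aph1b": 0.45}, {"Aph1b": 0.62},
           {"Aph1b": 0.88}, {"Aph1b": 0.49}]
labels = ["CONTROL", "CONTROL", "CASE", "CASE", "CASE"]

predicted = [classify(control_rule, "CONTROL", "CASE", s) for s in samples]
rule_accuracy = accuracy(labels, predicted)
rule_f_measure = f_measure(labels, predicted, positive="CONTROL")
```

On these toy values the rule misclassifies only the last sample, so accuracy and the F-measure of the CONTROL class both come out at 0.8.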
  • 18. CAMUR
    • Classifier with Alternative and Multiple Rule-based models (CAMUR)
    • New method for classifying RNA-seq case-control samples, which is able to compute multiple human-readable classification models
    • Aims of CAMUR:
      1) To classify RNA-seq experiments
      2) To extract several alternative and equivalent rule-based models, which represent relevant sets of genes related to the case and control samples
    • CAMUR extracts multiple classification models by adopting a feature elimination technique and by iterating the classification procedure
    • Prerequisite: gene expression normalization (RPKM or RSEM)
    • Available at: http://dmb.iasi.cnr.it/camur.php
  • 19. CAMUR: method
    • CAMUR is based on:
      1) a rule-based classifier (i.e., in this work RIPPER)
      2) an iterative feature elimination technique
      3) a repeated classification procedure
      4) an ad-hoc storage structure for the classification rules (CAMUR database)
    • In brief, CAMUR:
      ‒ iteratively computes a rule-based classification model through the supervised RIPPER algorithm,
      ‒ calculates the power set (or a partial combination) of the features present in the rules,
      ‒ iteratively eliminates those combinations from the data set, and
      ‒ performs the classification procedure again until a stopping criterion is met:
        ‒ F-measure < threshold
        ‒ Maximum number of iterations reached
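The loop described above can be sketched as follows. This is not the CAMUR implementation: RIPPER is replaced here by a deliberately simple stand-in learner (a single rule of the form "feature >= threshold"), the F-measure is computed on the training data itself, and the stopping criterion is interpreted as abandoning any feature combination whose model falls below the threshold. All names, the toy data, and the threshold are assumptions made for illustration.

```python
from itertools import chain, combinations
from typing import Dict, FrozenSet, Iterable, List, Optional, Sequence, Tuple

Sample = Dict[str, float]
Stump = Tuple[str, float, float]  # (feature, threshold, F-measure)


def f_measure(labels: Sequence[str], predictions: Sequence[str], positive: str) -> float:
    tp = sum(t == positive and p == positive for t, p in zip(labels, predictions))
    fp = sum(t != positive and p == positive for t, p in zip(labels, predictions))
    fn = sum(t == positive and p != positive for t, p in zip(labels, predictions))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def non_empty_subsets(features: Iterable[str]) -> Iterable[Tuple[str, ...]]:
    """Power set of the features used by a model, without the empty set."""
    items = sorted(features)
    return chain.from_iterable(combinations(items, k) for k in range(1, len(items) + 1))


def learn_stump(samples: List[Sample], labels: List[str], features: FrozenSet[str],
                positive: str, negative: str) -> Optional[Stump]:
    """Stand-in for RIPPER: best single rule 'feature >= threshold => positive'."""
    best: Optional[Stump] = None
    for feature in sorted(features):
        for threshold in sorted({s[feature] for s in samples}):
            predictions = [positive if s[feature] >= threshold else negative
                           for s in samples]
            score = f_measure(labels, predictions, positive)
            if best is None or score > best[2]:
                best = (feature, threshold, score)
    return best


def camur_sketch(samples: List[Sample], labels: List[str], features: FrozenSet[str],
                 positive: str, negative: str,
                 f_threshold: float, max_iterations: int) -> List[Stump]:
    models: List[Stump] = []
    pending: List[FrozenSet[str]] = [frozenset()]  # feature sets hidden from the learner
    seen = {frozenset()}
    iterations = 0
    while pending and iterations < max_iterations:
        excluded = pending.pop(0)
        iterations += 1
        available = features - excluded
        model = learn_stump(samples, labels, available, positive, negative) if available else None
        if model is None or model[2] < f_threshold:
            continue  # this combination yields no acceptable model
        models.append(model)
        # Eliminate every combination of the features used by the new model.
        for subset in non_empty_subsets({model[0]}):
            candidate = excluded | frozenset(subset)
            if candidate not in seen:
                seen.add(candidate)
                pending.append(candidate)
    return models


# Invented toy data: g1 and g2 both separate the classes, g3 does not.
toy_samples = [
    {"g1": 5, "g2": 9, "g3": 1}, {"g1": 6, "g2": 8, "g3": 4}, {"g1": 7, "g2": 7, "g3": 2},
    {"g1": 1, "g2": 2, "g3": 3}, {"g1": 2, "g2": 3, "g3": 1}, {"g1": 3, "g2": 1, "g3": 4},
]
toy_labels = ["case", "case", "case", "control", "control", "control"]

alternative_models = camur_sketch(toy_samples, toy_labels, frozenset({"g1", "g2", "g3"}),
                                  positive="case", negative="control",
                                  f_threshold=0.9, max_iterations=10)
```

With a one-feature stand-in learner the power set of each model is trivial (a single subset); a learner that returns multi-feature rules, as RIPPER does, would make `non_empty_subsets` enqueue several combinations per model. On the toy data the sketch returns two alternative models, one on g1 and one on g2, and discards the g3-only model.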
  • 22. Supervised model extraction: classification models for breast cancer
    • CAMUR: rules
        (MAMDC2_dMet >= 6.63) and (ACACB_rnaSeq >= 887.80) => class=normal (19.0/3.0)
        [ ] => class=tumoral (1102.0/1.0)
        Correctly Classified Instances      98.11 %
        Incorrectly Classified Instances     1.88 %
    • CAMUR: occurrences

        Gene            Occurrences
        FIGF_rnaSeq     44
        SPRY2_dMet      37
        SCN3A_rnaSeq    25
        PAMR1_dMet      20
        MMP11_rnaSeq    20

    • Alternative rules (class, rule, accuracy)
      ‒ Normal: (FIGF_rnaSeq >= 184.15) and (CLEC5A_dMet <= 5.44) || (TSHZ2_rnaSeq >= 471.04) and (DLGAP2_dMet >= 10.06); accuracy 9.800
      ‒ Normal: (SPRY2_dMet >= 0.55) and (CD300LG_rnaSeq >= 454.24) || (PAMR1_rnaSeq >= 712.17) and (PARP8_dMet >= 2.17); accuracy 9.700
  • 23. Other applications on biomedical data
    • Aim: to extract relevant features from the ever-increasing amount of biological data and to apply supervised learning to classify them

      Biology issue | Features | Software | Data source
      Clinical patient classification | Clinical variables (blood, imaging, psychometric tests, …) | DMB, Weka | Heterogeneous health care facilities
      Gene expression analysis | Discretized gene expression profiles | Gela, CAMUR | TCGA, EBRI
      DNA barcoding | Nucleotide sequences of DNA barcodes | Blog, Fasta2Weka | Barcode of Life Consortium
      Polyoma/Rhino viruses | Nucleotide sequences of Polyoma/Rhino viruses | DMB, MISSAL | Istituto Superiore di Sanità
      EEG signal processing | Fourier coefficients extracted from EEG recordings | Matlab, Weka, DMB | IRCCS Centro di Neurolesi “Bonino-Pulejo” of Messina
      Biomedical image processing | Oriented FAST and Rotated BRIEF (ORB) | Matlab, Weka, DMB | Alzheimer's Disease Neuroimaging Initiative
  • 24. Conclusions and future directions
    • Exponential growth of biomedical data
    • Release of many public databases, data collection projects, and data management projects
    • Data integration
    • Supervised classification analysis
    • Advanced systems for data integration
    • New big data approaches
  • 25. Acknowledgments
    Emanuel Weitschek
    Department of Engineering, Uninettuno International University
    www.iasi.cnr.it/~eweitschek