Predicting Outcomes When Your
Outcomes are Graphs (or functions)
Bill Shannon, PhD, MBA
Co-Founder and CEO, BioRankings
Professor Emeritus of Biostatistics in Medicine, WUSM
bill@biorankings.com, 314-704-8725
With big data come new complex data
formats – data as graphs
Functional MRI Data
• Brains are inserted into MRI
scanner
• 30 gigabytes raw data
• Parcellation
• Networks
– Nodes are regions of the
brain
– Edges are the correlations
between pairs of nodes
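The step from parcellated regions to a network can be sketched as follows. This is a minimal illustration, assuming each brain region has already been reduced to one time series; the region names, the toy signals, and the 0.5 threshold are hypothetical and do not come from the talk.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def connectome(series, threshold=0.5):
    """Return {(region_a, region_b): r} for pairs with |r| >= threshold."""
    names = sorted(series)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(series[a], series[b])
            if abs(r) >= threshold:
                edges[(a, b)] = r
    return edges

# Hypothetical toy time series for three regions (the nodes).
series = {
    "region_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "region_B": [2.0, 4.1, 5.9, 8.0, 10.1],  # tracks region_A closely
    "region_C": [3.0, 1.0, 3.0, 1.0, 3.0],   # oscillation unrelated to the trend
}
graph = connectome(series)
print(graph)
```

Only the region_A/region_B pair survives the threshold here, so the resulting graph has a single edge. Real pipelines differ in how (and whether) they threshold correlations.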
Connectome Graph
With big data come new complex data
formats – data as graphs
Microbiome Data
• Sample from human,
animal, field (soil),
environment
• Next Generation
Sequencing (write once,
read never data)
• Genomic analysis
processing
– Annotation to taxonomic
label (i.e., genus, species)
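One way to picture how annotated reads become a tree is to roll counts up a taxonomy. The sketch below assumes reads have already been annotated with a lineage; the two-level (phylum, genus) lineages and the counts are hypothetical illustration data, and `build_tree` is a made-up helper, not part of any sequencing toolkit.

```python
def build_tree(counts):
    """Roll read counts up a taxonomy; each node stores its total count."""
    tree = {"name": "root", "count": 0, "children": {}}
    for lineage, n in counts.items():
        tree["count"] += n
        node = tree
        for taxon in lineage:
            # Create the child node on first sight, then add this lineage's reads.
            child = node["children"].setdefault(
                taxon, {"name": taxon, "count": 0, "children": {}}
            )
            child["count"] += n
            node = child
    return tree

# Hypothetical annotated read counts keyed by (phylum, genus).
counts = {
    ("Firmicutes", "Lactobacillus"): 40,
    ("Firmicutes", "Clostridium"): 10,
    ("Bacteroidetes", "Bacteroides"): 50,
}
tree = build_tree(counts)
print(tree["children"]["Firmicutes"]["count"])
```

Each subject then contributes one such tree, with the same taxa as nodes and subject-specific counts as weights.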
Microbiome Tree
Statistics is about inferring properties
of a whole population from a sample
Sample to Population Inference
• Collect a bunch of graphs – 1
per subject
• Plot graphs
• Estimate mean and variance
(or g* and tau)
• Does this plot teach us about
the graphs in terms of how
they are distributed and what
the central tendency is?
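A central graph and a dispersion can be computed directly from a sample of graphs. The sketch below assumes every subject's graph is on the same labeled nodes and is encoded as a tuple of 0/1 edge indicators, and it assumes Hamming distance (the number of differing edges); both are illustrative choices, and the five graphs are hypothetical.

```python
def hamming(g, h):
    """Number of edges present in one graph but not the other."""
    return sum(a != b for a, b in zip(g, h))

def central_graph(graphs):
    """Edge-wise majority vote: minimizes total Hamming distance to the sample."""
    n = len(graphs)
    return tuple(int(2 * sum(col) > n) for col in zip(*graphs))

# Hypothetical sample: one graph per subject on 3 nodes,
# edges ordered as (1-2, 1-3, 2-3).
graphs = [
    (1, 1, 0),
    (1, 1, 0),
    (1, 0, 0),
    (1, 1, 1),
    (0, 1, 0),
]
g_star = central_graph(graphs)
mean_dist = sum(hamming(g_star, g) for g in graphs) / len(graphs)
print(g_star, mean_dist)
```

The mean distance to the central graph plays the role that a variance plays for ordinary numeric data.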
Does this plot teach us anything?
Graphs are too complex – let’s simplify
Network metrics
Average connectivity
Small world network
Species diversity
Taxa counts
Enterotype
Many-to-one mapping is not necessarily a good
way to simplify data for analysis
Simplifying in fMRI and Microbiome
fMRI
• Average Node Connectivity
• Consider two brain scans
– Patient 1
• Right half ANC = 10
• Left half ANC = 0
– Patient 2
• Right half ANC = 5
• Left half ANC = 5
• Both whole brain ANC = 5
Microbiome
• Species Diversity
• Consider two samples
– Patient 1
• Proportion Taxa A, B, C = 1/3
• Proportion Taxa D, E, F = 0
– Patient 2
• Proportion Taxa A, B, C = 0
• Proportion Taxa D, E, F = 1/3
• Both have Simpson diversity
= 0.33
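The two examples above can be checked numerically. This sketch assumes the two brain halves contain the same number of nodes (so the whole-brain ANC is the plain average) and uses the Simpson index defined as the sum of squared proportions; the function names are illustrative.

```python
import math

def whole_brain_anc(right, left):
    """Whole-brain average node connectivity, assuming equal-sized halves."""
    return (right + left) / 2

def simpson(proportions):
    """Simpson index: sum of squared taxa proportions."""
    return sum(p * p for p in proportions)

# fMRI example: very different brains, identical summary.
anc_patient1 = whole_brain_anc(right=10, left=0)
anc_patient2 = whole_brain_anc(right=5, left=5)

# Microbiome example: disjoint communities, identical summary.
taxa_patient1 = {"A": 1/3, "B": 1/3, "C": 1/3, "D": 0.0, "E": 0.0, "F": 0.0}
taxa_patient2 = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 1/3, "E": 1/3, "F": 1/3}
div_patient1 = simpson(taxa_patient1.values())
div_patient2 = simpson(taxa_patient2.values())

print(anc_patient1, anc_patient2)
print(round(div_patient1, 2), round(div_patient2, 2))
print(math.isclose(div_patient1, div_patient2), taxa_patient1 == taxa_patient2)
```

In both cases the summary statistic is equal across patients while the underlying objects share nothing, which is the many-to-one problem in miniature.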
We analyze graphical data the same
way as we analyze columns of data
Gibbs distribution
• Let $G$ be a finite set of graphs and denote the elements of $G$ by $g$. Let $d$ be an arbitrary distance metric on $G$. The Gibbs distribution on the graphs $G$ is denoted by
$$\mathbb{P}(g;\, g^*, \tau) = c(g^*, \tau)\, \exp\{-\tau\, d(g^*, g)\}, \quad \forall g \in G,$$
with parameters $g^*$, the central or average graph, and $\tau$, a non-negative number that is a measure of the dispersion of the observed connectome data around $g^*$. $c(g^*, \tau)$ is the normalizing constant. $\mathbb{P}(g_i;\, g^*, \tau)$ is the probability of observing a specific graph $g_i$ given the parameters $(g^*, \tau)$.
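The definition can be made concrete on a tiny graph space. The sketch below takes G to be all graphs on three labeled nodes (eight binary edge vectors) and takes d to be Hamming distance; both are assumptions made for illustration, since the definition leaves d arbitrary. The normalizing constant is computed by brute-force enumeration.

```python
import math
from itertools import product

def hamming(g, h):
    """Number of edges on which two graphs disagree."""
    return sum(a != b for a, b in zip(g, h))

def gibbs_pmf(g_star, tau, distance=hamming):
    """Return {g: P(g; g_star, tau)} over all binary edge vectors of len(g_star)."""
    space = list(product((0, 1), repeat=len(g_star)))
    weights = {g: math.exp(-tau * distance(g_star, g)) for g in space}
    c = 1.0 / sum(weights.values())  # normalizing constant c(g*, tau)
    return {g: c * w for g, w in weights.items()}

g_star = (1, 1, 0)  # hypothetical central graph, edges ordered (1-2, 1-3, 2-3)
pmf = gibbs_pmf(g_star, tau=1.0)

for g, p in sorted(pmf.items(), key=lambda item: -item[1]):
    print(g, round(p, 4))
```

Under this parameterization the central graph is always the most probable one, a larger tau concentrates probability on g*, and tau = 0 gives the uniform distribution over G. Enumeration is only feasible for very small graphs; realistic connectomes need a closed-form or approximate normalizing constant.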
Statistics on Graphs
We analyze graphical data the same
way as we analyze columns of data
Recursive partitioning
• Regress the graphs on
covariates
• In this example of Parkinson's
disease
– Y = connectome
– X = group, sex, age
• RP splits the connectomes into
homogeneous groups based
on the Gibbs likelihood
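A single split of such a procedure might look like the sketch below. It is not the talk's implementation: it assumes binary covariates only, Hamming distance, and a fixed tau. Under those assumptions (with G being all binary edge vectors) the normalizing constant does not depend on g*, so maximizing the Gibbs likelihood of a split is equivalent to minimizing the summed within-group distance to each group's central graph. The graphs and covariates are hypothetical.

```python
def hamming(g, h):
    return sum(a != b for a, b in zip(g, h))

def central_graph(graphs):
    """Edge-wise majority vote, which minimizes total Hamming distance."""
    n = len(graphs)
    return tuple(int(2 * sum(col) > n) for col in zip(*graphs))

def total_distance(graphs):
    g_star = central_graph(graphs)
    return sum(hamming(g_star, g) for g in graphs)

def best_split(graphs, covariates):
    """Return (covariate, cost) for the binary split with the lowest
    summed within-group distance, or None if no valid split exists."""
    best = None
    for name in covariates[0]:
        left = [g for g, x in zip(graphs, covariates) if x[name] == 0]
        right = [g for g, x in zip(graphs, covariates) if x[name] == 1]
        if not left or not right:
            continue  # a split must leave subjects on both sides
        cost = total_distance(left) + total_distance(right)
        if best is None or cost < best[1]:
            best = (name, cost)
    return best

# Hypothetical data: Y = connectome (3 possible edges), X = group and sex.
graphs = [(1, 1, 0), (1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 0, 1), (0, 1, 1)]
covariates = [
    {"group": 0, "sex": 0}, {"group": 0, "sex": 1}, {"group": 0, "sex": 0},
    {"group": 1, "sex": 1}, {"group": 1, "sex": 0}, {"group": 1, "sex": 1},
]
print(best_split(graphs, covariates))
```

A full recursive partitioning would apply `best_split` again inside each resulting group, search thresholds for continuous covariates such as age, and use a stopping rule.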
Statistics on Graphs
What else can be analyzed with graphical
object oriented data analysis (OODA)?
IoT
Blockchain
Cybersecurity
What about data which are functional
objects?
Untargeted Metabolomics
• Liquid chromatography and
mass spec – LC/MS
• RT x m/z plots
• Which peaks correspond to
metabolites (known or
unknown), and which peaks
are different in patients
who live and die?
RT x m/z plots are too complex – let’s
simplify
Looking for things that look
different and then testing them
statistically is wrong – P values
don’t mean anything in these
cases.
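A small simulation illustrates why testing after looking is a problem. The sketch assumes pure-noise data (no peak truly differs between groups), unit-variance Gaussian intensities, and a z-test with known variance; the peak count, group size, and seed are arbitrary illustrative values.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def simulate(n_sims=500, n_peaks=50, n_per_group=10, alpha=0.05, seed=0):
    """Return false positive rates for a pre-specified peak and for the
    peak chosen because it looked most different."""
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n_per_group)  # std. error of a difference in means
    hits_prespecified = 0
    hits_selected = 0
    for _ in range(n_sims):
        z_scores = []
        for _ in range(n_peaks):
            mean_a = sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
            mean_b = sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
            z_scores.append((mean_a - mean_b) / se)
        # Peak chosen before seeing the data.
        hits_prespecified += two_sided_p(z_scores[0]) < alpha
        # Peak chosen because it looked most different.
        hits_selected += two_sided_p(max(z_scores, key=abs)) < alpha
    return hits_prespecified / n_sims, hits_selected / n_sims

rate_prespecified, rate_selected = simulate()
print(rate_prespecified, rate_selected)
```

With 50 independent null peaks, the chance that the most extreme one reaches p < 0.05 is 1 - 0.95^50, about 0.92, so the selected-peak rate lands far above the nominal 5% while the pre-specified rate stays near it.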
Why not analyze functions using
functional OODA?
Projects in object oriented data analysis

| Field | Enabling Technology | Bioinformatics | Exploratory Analysis | Translational Statistics |
| Microbiome | Next generation sequencing | Assembly, annotation, chimera checking | Cluster analysis, multidimensional scaling, heatmaps | Dirichlet-multinomial for taxa counts; Gibbs distribution for taxonomic trees |
| Brain Imaging | Functional MRI (fMRI) | Image registration, parcellation | Generalized linear models with multiple testing adjustment, graph metrics | Gibbs distribution for connectome |
| Metabolomics | LC/MS | Peak detection, centering | Mass univariate testing with multiple testing adjustment | Functional data analysis, Gibbs distribution, Co-Inertia, and the Exploratory-Validation Model for experimental design |
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
How iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost FundsHow iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost Funds
ireneschmid345
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 

Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017

  • 1. Predicting Outcomes When Your Outcomes are Graphs (or functions) Bill Shannon, PhD, MBA Co-Founder and CEO, BioRankings Professor Emeritus of Biostatistics in Medicine, WUSM bill@biorankings.com, 314-704-8725
  • 2. With big data come new complex data formats – data as graphs Functional MRI Data • Brains are inserted into MRI scanner • 30 gigabytes raw data • Parcellation • Networks – Nodes are regions of the brain – Edges are the correlations between pairs of nodes
  • 4. With big data come new complex data formats – data as graphs Data Microbiome • Sample from human, animal, field (soil), environment • Next Generation Sequencing (write once, read never data) • Genomic analysis processing – Annotation to taxonomic label (i.e., genus, species)
  • 6. Statistics is interested in inferring things about everything from a sample Sample to Population Inference • Collect a bunch of graphs – 1 per subject • Plot graphs • Estimate mean and variance (or g* and tau) • Does this plot teach us about the graphs in terms of how they are distributed and what the central tendency is?
  • 7. Does this plot teach us anything?
  • 8. Graphs are too complex – let’s simplify Network metrics Average connectivity Small world network Species diversity Taxa counts Enterotype
  • 9. Many-to-one mapping is not necessarily a good way to simplify data for analysis
  • 10. Simplifying in fMRI and Microbiome fMRI • Average Node Connectivity • Consider two brain scans – Patient 1 • Right half ANC = 10 • Left half ANC = 0 – Patient 2 • Right half ANC = 5 • Left half ANC = 5 • Both whole brain ANC = 5 Microbiome • Species Diversity • Consider two samples – Patient 1 • Proportion Taxa A, B, C = 1/3 • Proportion Taxa D, E, F = 0 – Patient 2 • Proportion Taxa A, B, C = 0 • Proportion Taxa D, E, F = 1/3 • Both have Simpson diversity = 0.33
  • 11. We analyze graphical data the same way as we analyze columns of data Gibbs distribution • Let $G$ be a finite set of graphs and denote the elements of $G$ by $g$. Let $d$ be an arbitrary distance metric on $G$. The Gibbs distribution on the graphs $G$ is denoted by $\mathbb{P}(g; g^*, \tau) = c(g^*, \tau)\exp\left(-\tau\, d(g^*, g)\right), \ \forall g \in G$, with parameters $g^*$, the central or average graph, and $\tau$, a non-negative number that is a measure of the dispersion of the observed connectome data around $g^*$. $c(g^*, \tau)$ is the normalizing constant. $\mathbb{P}(g_i; g^*, \tau)$ is the probability of observing a specific graph $g_i$ given the parameters $(g^*, \tau)$. Statistics on Graphs
  • 12. We analyze graphical data the same way as we analyze columns of data Recursive partitioning • Regress the graphs on covariates • In this example of Parkinson's disease – Y = connectome – X = group, sex, age • RP splits the connectomes into homogeneous groups based on likelihood of Gibbs Statistics on Graphs
  • 13. What else can be analyzed with graphical OODA? IoT Blockchain Cybersecurity
  • 14. What about data which are functional objects? Untargeted Metabolomics • Liquid chromatography and mass spec – LC/MS • RT x m/z plots • Which peaks correspond to metabolites (known or unknown), and which peaks are different in patients who live and die?
  • 15. RT x m/z plots are too complex – let’s simplify Looking for things that look different and then testing them statistically is wrong – P values don’t mean anything in these cases.
  • 16. Why not analyze functions using functional OODA?
  • 17. Why not analyze functions using functional OODA?
  • 18. Projects in object oriented data analysis
      Field: Microbiome
        – Enabling Technology: Next generation sequencing
        – Bioinformatics: Assembly, annotation, chimera checking
        – Exploratory Analysis: Cluster analysis, multidimensional scaling, heatmaps
        – Translational Statistics: Dirichlet-multinomial for taxa counts; Gibbs distribution for taxonomic trees
      Field: Brain Imaging
        – Enabling Technology: Functional MRI (fMRI)
        – Bioinformatics: Image registration, parcellation
        – Exploratory Analysis: Generalized linear models with multiple testing adjustment, graph metrics
        – Translational Statistics: Gibbs distribution for connectome
      Field: Metabolomics
        – Enabling Technology: LC/MS
        – Bioinformatics: Peak detection, centering
        – Exploratory Analysis: Mass univariate testing with multiple testing adjustment
        – Translational Statistics: Functional data analysis, Gibbs distribution, Co-Inertia, and the Exploratory-Validation Model for experimental design
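
The Gibbs distribution from slide 11 can be illustrated on a toy example. The sketch below is not the speaker's implementation. It assumes a tiny graph space (all undirected graphs on 3 labeled nodes), assumes Hamming distance on edge indicators as the metric d, and picks a hypothetical central graph `G_STAR`. The names `hamming`, `gibbs_pmf`, `GRAPHS`, and `G_STAR` are made up for illustration. It computes the normalizing constant c(g*, tau) by direct summation, which is only feasible because the set of graphs is small.

```python
import itertools
import math


def hamming(g1, g2):
    """Number of edge slots on which two graphs disagree (assumed metric d)."""
    return sum(a != b for a, b in zip(g1, g2))


def gibbs_pmf(graphs, g_star, tau, dist=hamming):
    """Return {g: P(g; g_star, tau)} over a finite set of graphs.

    P(g; g*, tau) = c(g*, tau) * exp(-tau * d(g*, g)), where c is chosen
    so that the probabilities sum to 1 over the whole set.
    """
    weights = {g: math.exp(-tau * dist(g_star, g)) for g in graphs}
    c = 1.0 / sum(weights.values())  # normalizing constant c(g*, tau)
    return {g: c * w for g, w in weights.items()}


# All undirected graphs on 3 labeled nodes: one 0/1 indicator per possible
# edge (0-1, 0-2, 1-2), which gives 2**3 = 8 graphs.
GRAPHS = list(itertools.product((0, 1), repeat=3))

# Hypothetical central graph: edges 0-1 and 0-2 present, edge 1-2 absent.
G_STAR = (1, 1, 0)

if __name__ == "__main__":
    for tau in (0.0, 1.0, 3.0):
        pmf = gibbs_pmf(GRAPHS, G_STAR, tau)
        # Probability mass sitting exactly on the central graph.
        print(f"tau={tau}: P(g*) = {pmf[G_STAR]:.4f}")
```

With tau = 0 every graph is equally likely. Under this parameterization, increasing tau puts more of the probability mass on graphs close to g*. For realistic connectomes the graph space is far too large to enumerate, so the normalizing constant would need a different treatment than the brute-force sum used here.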