Cancer Genomics
Data Pipelines
Lynn & Samantha Langit
CSIRO Bioinformatics / Australia
June 2017 - Oslo
3 Billion data points per patient DNA sample
Up to 25% of the population could be sequenced by 2025
Two Perspectives
Bioinformatics Research
• Insight
• Reproducibility
Cloud Architecture
• Speed
• Low Cost
• Simplicity
Cloud Data Pipeline Pattern
Problem
• Define business problem
Data
• Quality
• Quantity
Candidate Technologies
• Ingest
• ETL
• Biz Analytics
• ML
• Visualization
Build MVPs
• Iterate
• Learn
Assemble Pipeline
• Validate each section
• Test at scale
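A minimal sketch of the "assemble pipeline, validate each section" step, in plain Python. The stage names mirror the slide (ingest, ETL, analyze); the `Stage` class, the `run_pipeline` function, and the toy records are hypothetical and only stand in for real cloud services.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Stage:
    """One pipeline section: a transform plus a check on its output."""
    name: str
    transform: Callable[[Any], Any]
    validate: Callable[[Any], bool]


def run_pipeline(stages: List[Stage], data: Any) -> Any:
    """Run stages in order and fail fast if a section's output is invalid."""
    for stage in stages:
        data = stage.transform(data)
        if not stage.validate(data):
            raise ValueError(f"validation failed after stage '{stage.name}'")
    return data


# Hypothetical toy stages: raw text -> rows -> parsed records -> summary.
stages = [
    Stage("ingest",
          lambda text: text.strip().splitlines(),
          lambda rows: len(rows) > 0),
    Stage("etl",
          lambda rows: [r.split(",") for r in rows],
          lambda recs: all(len(r) == 2 for r in recs)),
    Stage("analyze",
          lambda recs: {"samples": len(recs)},
          lambda out: out["samples"] > 0),
]

result = run_pipeline(stages, "sample1,A\nsample2,G\n")
print(result)
```

Because each section carries its own check, a malformed input stops at the stage that produced it rather than surfacing later in the analysis.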
Bioinformatics Data Pipelines built by CSIRO on AWS
Genomic Sequencing Results
CRISPR-Cas9 is a molecular engineering technology that lets researchers edit genomes accurately.
It…
• Relies on pattern-matching unique sequences of DNA
• Creates huge demand for large-scale computation
• Has a time-critical dimension to compute
• Has had a first human clinical trial approved by an NIH advisory panel
• Could revolutionize cancer treatments
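The pattern-matching step can be illustrated with a short Python sketch. It assumes the commonly used SpCas9 rule (a 20-nucleotide protospacer followed by an NGG PAM), scans the forward strand only, and checks uniqueness by exact match. The function names are hypothetical, and this is not GT-Scan2's algorithm.

```python
import re

# SpCas9 recognises a 20-nt protospacer followed by an NGG PAM.
# The lookahead lets overlapping candidate sites all be reported.
TARGET = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_targets(dna):
    """Return (start, protospacer, pam) for each forward-strand candidate site."""
    dna = dna.upper()
    return [(m.start(), m.group(1), m.group(2)) for m in TARGET.finditer(dna)]


def is_unique(protospacer, genome):
    """True if the protospacer occurs exactly once in the genome (exact match only)."""
    pattern = f"(?={re.escape(protospacer.upper())})"
    return len(re.findall(pattern, genome.upper())) == 1


# Hypothetical toy sequence: a 20-nt spacer followed by the PAM "TGG".
seq = "TT" + "ACGT" * 5 + "TGG" + "AA"
for start, spacer, pam in find_targets(seq):
    print(start, spacer, pam, is_unique(spacer, seq))
```

A real scan would also cover the reverse strand and tolerate mismatches when searching for off-target sites, which is where the large-scale compute demand comes from.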
Serverless Lambda Architecture Pattern
[Diagram: Users, API Gateway, Lambda functions 1-3, S3 buckets with objects, DynamoDB]
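As a rough sketch of the first function in such a chain, the Python handler below accepts a request in the API Gateway Lambda proxy format (JSON string in `event["body"]`, response with `statusCode` and `body`). The `sequence` field and the "queued" status are hypothetical. The hand-off to DynamoDB and to the next Lambda function is left as a comment so the example runs without AWS credentials.

```python
import json


def _response(status, body):
    """Build a Lambda proxy integration response."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def lambda_handler(event, context):
    """Validate a scan request arriving through API Gateway."""
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "body must be valid JSON"})
    if not isinstance(payload, dict):
        return _response(400, {"error": "body must be a JSON object"})

    sequence = str(payload.get("sequence", ""))
    if not sequence or set(sequence.upper()) - set("ACGTN"):
        return _response(400, {"error": "sequence must contain only A, C, G, T, N"})

    # In a full pipeline this function would record the job (for example in
    # DynamoDB) and trigger the next Lambda; here it only returns a summary.
    return _response(200, {"length": len(sequence), "status": "queued"})
```

Keeping each function this small is what lets the pattern scale per request: validation, scanning, and result lookup can each run and fail independently.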
CSIRO: Commonwealth Scientific and Industrial Research Organisation
GT-Scan2 Demo
Scale Genomic Analysis
GWAS = genome-wide association studies (here, on sequencing data)
• Analysis on large cohort data or imputed SNP array data
• Clustering on genomic profiles to stratify large-cohort genomic data
• Viewing datasets with millions of features
Cloud Data Pipeline Pattern
Problem
• Define business problem
Data
• Quality
• Quantity
Candidate Technologies
• Ingest
• ETL
• Biz Analytics
• ML
• Visualization
Build MVPs
• Iterate
• Learn
Assemble Pipeline
• Validate each section
• Test at scale
Genomics (ML) Pipeline Pattern
What is CSIRO’s solution?
• For scale at reasonable cost: use Apache Hadoop
• For scale at speed: use Apache Spark on Hadoop
• For usability in bioinformatics: create a domain-specific API (OSS library)
• For global use: leverage Cloud Pipeline Patterns
GWAS Analysis with Variant-Spark
[Diagram: Genomics analysts working against an on-premises Hadoop cluster with Apache Spark in a corporate data center]
What is Apache Spark?
What is variant-spark?
Demo
80% faster than ADAM
90% faster than R
90% faster than Python
VariantSpark
Uses Apache Spark to massively parallelize the generation of random forests to identify disease genes efficiently
• Analyzes 3,000 samples with 80 million features in < 30 minutes
• Enables real-time diagnosis by finding similar patients
• Contributes to motor neuron disease (ALS) research in Australia
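The core idea behind using random forests to rank features can be sketched in plain Python. This is a deliberately simplified stand-in, not VariantSpark's implementation: each "tree" is a depth-1 stump, and the function names, the 0/1/2 genotype coding, and the toy cohort are all assumptions. Since every stump is built independently, the loop over trees is the part a framework such as Spark can spread across a cluster.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_gain(column, labels):
    """Largest impurity decrease over the thresholds of one feature column."""
    parent, n, best = gini(labels), len(labels), 0.0
    for threshold in sorted(set(column))[:-1]:
        left = [y for x, y in zip(column, labels) if x <= threshold]
        right = [y for x, y in zip(column, labels) if x > threshold]
        gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, gain)
    return best


def stump_forest_importance(rows, labels, n_trees=200, mtry=3, seed=0):
    """Sum, per feature, the impurity decrease it wins across random stumps."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    importance = [0.0] * n_features
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]      # bootstrap sample
        sample_y = [labels[i] for i in idx]
        candidates = rng.sample(range(n_features), mtry)    # random feature subset
        gains = {f: best_split_gain([rows[i][f] for i in idx], sample_y)
                 for f in candidates}
        winner = max(gains, key=gains.get)
        importance[winner] += gains[winner]
    return importance


# Hypothetical toy cohort: genotypes coded 0/1/2; only feature 3 drives the label.
data_rng = random.Random(42)
rows = [[data_rng.choice([0, 1, 2]) for _ in range(10)] for _ in range(60)]
labels = [1 if row[3] > 0 else 0 for row in rows]

importance = stump_forest_importance(rows, labels)
top_feature = max(range(len(importance)), key=importance.__getitem__)
print("most important feature:", top_feature)
```

In this toy cohort the label depends only on feature 3, so its accumulated impurity decrease dominates the noise features. With millions of features, ranking by importance is how candidate disease-associated variants would be shortlisted.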
Data Prep
Statistics
Probabilistic Algorithms
Data Viz
Machine Learning…
Spark ML Classification Algorithms
Wide Random Forest: Ensemble of Decision Trees
Logistic Regression
variant-spark other libraries
OSS Library variant-spark for all
Is it…
• usable? performant?
• extendable? (clean code)
• using the best language (Scala)?
• using the ‘best version’ of Spark?
• using a version of wide random forests that is understandable?
How best to Deploy Cloud Hadoop?
• IaaS
  • EC2 instances with Apache Hadoop, Apache Spark, more…
• PaaS
  • Amazon Elastic MapReduce (EMR) Hadoop cluster
• SaaS
  • Vendor-managed, e.g. Databricks with Jupyter Notebooks
What is Databricks?
DEMO: Jupyter Notebooks
Variant-Spark and Databricks Demo
Solving Important Questions…
Cancer Genomics?
DEMO: Who is a Hipster?
• AWS EC2 Spot Instances
GWAS Analysis with Variant-Spark
[Diagram: Genomics analysts; EC2 Hadoop cluster with Apache Spark in one Availability Zone (EC2 Hadoop instances plus Spot EC2 Hadoop worker instances); 1000 Genomes GWAS input]
Cloud Data Pipeline Pattern
Problem -> Data -> Candidate Technologies -> Build MVPs -> Assemble Pipeline
Analyze GWAS -> S3/Hadoop (SaaS)
• Ingest: S3 -> Databricks DBFS
• ETL: Apache Spark
• Analyze: Variant-Spark ML
• Viz: Notebook (SQL, R or Python)
Cloud Data Pipeline Pattern
Problem -> Data -> Candidate Technologies -> Build MVPs -> Assemble Pipeline
1. Scan VCF -> S3/DynamoDB (Serverless)
• Ingest: S3
• ETL: Lambda
• Analyze: Lambda
• Viz: Lambda/API Gateway
2. Analyze GWAS -> S3/Hadoop (SaaS)
• Ingest: S3 -> Databricks DBFS
• ETL: Apache Spark
• Analyze: Variant-Spark ML
• Viz: Notebook (SQL, R or Python)
Modern Big Data Pipelines
• Problem #1 - Scan
• Solution: Serverless Cloud Pipeline
• Problem #2 - Analyze
• Solution: SaaS Cloud ML Pipeline
Cancer Genomics
Data Pipelines
Lynn & Samantha Langit
CSIRO Bioinformatics & variant-spark
June 2017 - Oslo
Editor's Notes

  • #7: http://www.nature.com/news/first-crispr-clinical-trial-gets-green-light-from-us-panel-1.20137
  • #9: http://bioinformatics.csiro.au/ and https://www.csiro.au/en/Locations/NSW/North-Ryde
  • #10: https://www.gt-scan.net/ and AMA with Dr. Bauer: https://www.reddit.com/r/science/comments/5fiicm/science_ama_series_im_denis_bauer_a_team_leader/
  • #11: https://aws.amazon.com/blogs/aws/genome-engineering-applications-early-adopters-of-the-cloud/
  • #18: https://github.com/csirobigdata/variant-spark
  • #22: https://en.wikipedia.org/wiki/Random_forest and https://spark.apache.org/docs/1.6.2/ml-classification-regression.html
  • #26: https://databricks.com/
  • #35: https://aws.amazon.com/blogs/aws/genome-engineering-applications-early-adopters-of-the-cloud/
  • #38: https://github.com/csirobigdata/variant-spark