Pipeline for automated structure-based
classification in the ChEBI ontology
Janna Hastings
Coordinator,
Cheminformatics and Metabolism
www.ebi.ac.uk/chebi
ACS Symposium on Chemical Ontologies,
Taxonomies and Schemas. Dallas, 16 March 2014
Chemical Entities of Biological Interest
Freely available
online, available
for download in full
Low molecular weight,
i.e. no proteins
Definitions,
relationships,
hierarchy
E.g.
metabolites,
drugs,
pesticides
38,215 entries in the last release
What does ChEBI provide?
Chemical structures and
visualisations
caffeine
1,3,7-trimethylxanthine
methyltheobromine
Names and synonyms
Formula: C8H10N4O2
Charge: 0
Mass: 194.19
Chemical data
metabolite
CNS stimulant
trimethylxanthines
Ontology –
classifications
MSDchem: CFF
KEGG DRUG: D00528
PubMed citations
Links to more
information
Chemical Informatics
InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
SMILES CN1C(=O)N(C)c2ncn(C)c2C1=O
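As a worked check of the chemical data on this slide, the mass can be recomputed from the formula. This is a minimal sketch, not ChEBI's own calculation: the helper names `parse_formula` and `molar_mass` are made up for illustration, the atomic weights are standard rounded values, and the parser only handles simple formulae without brackets or charges.

```python
import re

# Standard atomic weights (rounded); only the elements needed for this example.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula: str) -> dict:
    """Parse a simple formula such as 'C8H10N4O2' into element counts.

    Assumption: no brackets, hydrates or charges appear in the formula.
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts


def molar_mass(formula: str) -> float:
    """Sum atomic weight times count over all elements in the formula."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in parse_formula(formula).items())


caffeine = "C8H10N4O2"
print(parse_formula(caffeine))         # element counts for caffeine
print(round(molar_mass(caffeine), 2))  # 194.19, matching the slide
```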
Example ChEBI entry page
Example entry page (continued)
Example entry page (continued)
Structure-based classification in ChEBI
Challenges with manual classification
• May be incomplete
• May be inconsistent
• Difficult to maintain (even with extensive use of
computationally expensive automatic validations)
• Blocks automatic loading of otherwise high-quality
externally annotated chemical data into ChEBI
(as no classification available)
SOCO (SMARTS, OWL)
Leonid Chepelev, Michel Dumontier, collaborators
• Given a training set of classified molecules, examine
structures for consensus features across all (using
fragmentation and feature detection)
• Capture features hierarchically
• Use OWL to classify
Chepelev et al. BMC Bioinformatics 2012 13:3 doi:10.1186/1471-2105-13-3
Limitations of SOCO
• No support for negation
• Only “min” (at least) counting supported, not max or
exact. Thus, dicarboxylic acid is_a monocarboxylic acid
(Every two-legged human is also a one-legged human in the sense
that they have at least one leg…)
• SMARTS is powerful – but not very human-readable.
ChEBI is meant to be read by human biologists and chemists.
E.g. SMARTS for the class of aliphatic amines:
[$([NH2][CX4]),$([NH]([CX4])[CX4]),$([NX3]([CX4])([CX4])[CX4])]
Can we do better at making definitions accessible?
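The counting limitation can be illustrated with a small sketch. The `CountRestriction` class below is hypothetical and is not part of SOCO or ChEBI; it only shows why "min"-only counting makes a dicarboxylic acid satisfy the definition of a monocarboxylic acid, while "exact" counting keeps the two classes apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountRestriction:
    """A cardinality restriction on how often a feature occurs (illustrative only)."""

    kind: str  # "min", "max" or "exact"
    n: int

    def holds(self, observed: int) -> bool:
        if self.kind == "min":
            return observed >= self.n
        if self.kind == "max":
            return observed <= self.n
        if self.kind == "exact":
            return observed == self.n
        raise ValueError(f"unknown restriction kind: {self.kind}")


# Assumed feature count: a dicarboxylic acid has two carboxy groups.
carboxy_groups = 2

mono_min = CountRestriction("min", 1)      # "at least one carboxy group"
mono_exact = CountRestriction("exact", 1)  # "exactly one carboxy group"

print(mono_min.holds(carboxy_groups))    # True: the diacid also counts as a monoacid
print(mono_exact.holds(carboxy_groups))  # False: exact counting separates the classes
```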
A new pipeline for automated structure-based ontology classification in ChEBI
Definitions (OWL)
ChEBI structures
OWL Parser => logical cheminformatics definitions
Novel structure
Matching
Candidate classes
Ranking
Best classes: save is_a relations
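The labels on this slide suggest a flow from a novel structure, through matching against the parsed definitions, to ranked candidate classes. The sketch below is one possible reading of that flow, not the actual ChEBI implementation. The class names, the feature record and the `is_a` hierarchy are toy data, and the ranking criterion (keep the most specific matching classes) is an assumption, since the slide does not say how candidates are ranked.

```python
from typing import Callable, Dict, List, Optional

Features = Dict[str, int]

# Toy definitions: class name -> predicate over precomputed structural features.
DEFINITIONS: Dict[str, Callable[[Features], bool]] = {
    "molecular entity": lambda f: True,
    "cyclic compound": lambda f: f["cycles"] >= 1,
    "monocyclic compound": lambda f: f["cycles"] == 1,
    "bicyclic compound": lambda f: f["cycles"] == 2,
}

# Toy is_a hierarchy: class name -> parent class (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "molecular entity": None,
    "cyclic compound": "molecular entity",
    "monocyclic compound": "cyclic compound",
    "bicyclic compound": "cyclic compound",
}


def ancestors(cls: str) -> List[str]:
    """All classes above cls in the toy hierarchy."""
    out = []
    parent = PARENT[cls]
    while parent is not None:
        out.append(parent)
        parent = PARENT[parent]
    return out


def match(features: Features) -> List[str]:
    """Matching step: every class whose definition the structure satisfies."""
    return [name for name, test in DEFINITIONS.items() if test(features)]


def rank(candidates: List[str]) -> List[str]:
    """Ranking step (assumed criterion): drop any candidate that is an
    ancestor of another candidate, keeping the most specific matches."""
    implied = {a for c in candidates for a in ancestors(c)}
    return [c for c in candidates if c not in implied]


novel_structure = {"cycles": 1}  # hypothetical feature record
candidates = match(novel_structure)
best = rank(candidates)
print(candidates)  # ['molecular entity', 'cyclic compound', 'monocyclic compound']
print(best)        # ['monocyclic compound']
```

Only the classes returned by `rank` would be saved as is_a relations in this reading; the more general classes follow from the hierarchy.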
Human-readable definitions, mapped to
structures in ChEBI knowledgebase
thiadiazoles:
molecular_entity and has_part
some ( 1,2,3-thiadiazole or 1,2,4-thiadiazole
or 1,2,5-thiadiazole or 1,3,4-thiadiazole )
diterpenoid: organic_molecular_entity and
has_part exactly 2 terpenoid
organic ion: organic_molecular_entity and
( has_charge some int[>0] or has_charge some int[<0] )
monocyclic compound: molecular_entity and
has_cycles value "1"^^int
Logical operators
Counts (min, max and exact)
Properties
Parts
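The four definitions on this slide can be read as predicates over a structure's features. The sketch below mirrors them in plain Python to show what each construct (logical operators, counts, properties, parts) checks. The `Molecule` record and its fields are hypothetical, and a real pipeline would derive these features from the chemical structure rather than take them as given.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Molecule:
    """Hypothetical precomputed features of one structure."""

    organic: bool
    charge: int
    cycles: int
    parts: Dict[str, int] = field(default_factory=dict)  # part name -> count


THIADIAZOLE_ISOMERS = (
    "1,2,3-thiadiazole",
    "1,2,4-thiadiazole",
    "1,2,5-thiadiazole",
    "1,3,4-thiadiazole",
)


def is_thiadiazole(m: Molecule) -> bool:
    # has_part some (A or B or C or D): at least one listed part occurs
    return any(m.parts.get(p, 0) >= 1 for p in THIADIAZOLE_ISOMERS)


def is_diterpenoid(m: Molecule) -> bool:
    # organic_molecular_entity and has_part exactly 2 terpenoid
    return m.organic and m.parts.get("terpenoid", 0) == 2


def is_organic_ion(m: Molecule) -> bool:
    # organic_molecular_entity and (has_charge > 0 or has_charge < 0)
    return m.organic and (m.charge > 0 or m.charge < 0)


def is_monocyclic(m: Molecule) -> bool:
    # has_cycles value 1
    return m.cycles == 1


# Made-up example: an organic anion with a single 1,3,4-thiadiazole ring.
example = Molecule(organic=True, charge=-1, cycles=1,
                   parts={"1,3,4-thiadiazole": 1})

print(is_thiadiazole(example))  # True
print(is_diterpenoid(example))  # False
print(is_organic_ion(example))  # True
print(is_monocyclic(example))   # True
```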
Planned integration into ChEBI tools
• ChEBI internal data loader and bulk submissions
• ChEBI online submission tool
Pre-population
of matched
classes
Acknowledgements – Thanks!
ChEBI team:
Christoph Steinbeck
Gareth Owen
Adriano Dekker
Namrata Kale
Steve Turner
Venkatesh Muthukrishnan
Collaborators:
Colin Batchelor, RSC
Lian Duan, ETH
Leonid Chepelev, Ottawa
Michel Dumontier, Stanford
Despoina Magka, Oxford
Ilinca Tudose and John May, EBI
Funding:
BBSRC “Continued
development of ChEBI towards
better usability for the systems
biology and metabolic
modelling communities”
BB/K019783/1
Questions?
Thank you for listening!
chebi-help@ebi.ac.uk

Pipeline for automated structure-based classification in the ChEBI ontology

  • 1. Pipeline for automated structure-based classification in the ChEBI ontology Janna Hastings Coordinator, Cheminformatics and Metabolism www.ebi.ac.uk/chebi ACS Symposium on Chemical Ontologies, Taxonomies and Schemas. Dallas, 16 March 2014
  • 2. Chemical Entities of Biological Interest Freely available online, available for download in full Freely available online, available for download in full Low molecular weight, i.e. no proteins Low molecular weight, i.e. no proteins Definitions, relationships, hierarchy Definitions, relationships, hierarchy E.g. metabolites, drugs, pesticides E.g. metabolites, drugs, pesticides 38,215 entries last release 38,215 entries last release
  • 3. What does ChEBI provide? Chemical structures and visualisations caffeine 1,3,7-trimethylxanthine methyltheobromine Names and synonyms Formula: C8H10N4O2 Charge: 0 Mass: 194.19 Chemical data metabolite CNS stimulant trimethylxanthines Ontology – classifications MSDchem: CFF KEGG DRUG: D00528 PubMed citations Links to more information Chemical Informatics InChI=1/C8H10N4O2/c1-10-4-9-6- 5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3 SMILES CN1C(=O)N(C)c2ncn(C)c2C1=O
  • 5. Example entry page (continued)
  • 6. Example entry page (continued)
  • 8. Challenges with manual classification • May be incomplete • May be inconsistent • Difficult to maintain (even with extensive use of computationally expensive automatic validations) • Blocks automatic loading of otherwise high-quality externally annotated chemical data into ChEBI (as no classification available)
  • 9. SOCO (SMARTS, OWL) Leonid Chepelev, Michel Dumontier, collaborators • Given a training set of classified molecules, examine structures for consensus features across all (using fragmentation and feature detection) • Capture features hierarchically • Use OWL to classify Chepelev et al. BMC Bioinformatics 2012 13:3 doi:10.1186/1471-2105-13-3
  • 10. Limitations of SOCO • No support for negation • Only “min” (at least) counting supported, not max or exact. Thus, dicarboxylic acid is_a monocarboxylic acid (every two-legged human is also a one-legged human, in the sense that they have at least one leg…) • SMARTS is powerful, but not very human-readable. ChEBI is for human biologist and chemist consumption. E.g. SMARTS for the class of aliphatic amines: [$([NH2][CX4]),$([NH]([CX4])[CX4]),$([NX3]([CX4])([CX4])[CX4])] Can we do better at making definitions accessible?
  • 11. A new pipeline for automated structure-based ontology classification in ChEBI • Inputs: Definitions (OWL), ChEBI structures • OWL Parser => logical cheminformatics definitions • Novel structure => Matching => Candidate classes => Ranking => Best classes: save is_a relations
  • 12. Human-readable definitions, mapped to structures in ChEBI knowledgebase • thiadiazoles: molecular_entity and has_part some ( 1,2,3-thiadiazole or 1,2,4-thiadiazole or 1,2,5-thiadiazole or 1,3,4-thiadiazole ) • diterpenoid: organic_molecular_entity and has_part exactly 2 terpenoid • organic ion: organic_molecular_entity and ( has_charge some int[>0] or has_charge some int[<0] ) • monocyclic compound: molecular_entity and has_cycles value "1"^^int • Features illustrated: Logical operators, Counts (min, max and exact), Properties, Parts
  • 13. Planned integration into ChEBI tools • ChEBI internal data loader and bulk submissions • ChEBI online submission tool • Pre-population of matched classes
  • 14. Acknowledgements – Thanks! ChEBI team: Christoph Steinbeck Gareth Owen Adriano Dekker Namrata Kale Steve Turner Venkatesh Muthukrishnan Collaborators: Colin Batchelor, RSC Lian Duan, ETH Leonid Chepelev, Ottawa Michel Dumontier, Stanford Despoina Magka, Oxford Ilinca Tudose and John May, EBI Funding: BBSRC “Continued development of ChEBI towards better usability for the systems biology and metabolic modelling communities” BB/K019783/1
  • 15. Questions? Thank you for listening! [email protected]
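
The counting limitation on slide 10 and the "min, max and exact" counts on slide 12 can be illustrated with a small sketch. This is not the ChEBI or SOCO implementation: the `PartRestriction` class, the part name "carboxy group" and the hand-written part counts are all assumptions made for illustration, standing in for the result of a real substructure search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartRestriction:
    """Hypothetical cardinality restriction on a structural part."""
    part: str   # name of the part, e.g. "carboxy group"
    kind: str   # one of "min", "max", "exactly"
    count: int

    def holds(self, part_counts: dict) -> bool:
        # part_counts maps part names to how often they occur in a molecule
        n = part_counts.get(self.part, 0)
        if self.kind == "min":
            return n >= self.count
        if self.kind == "max":
            return n <= self.count
        if self.kind == "exactly":
            return n == self.count
        raise ValueError(f"unknown restriction kind: {self.kind}")


# Hand-written counts (assumed, not computed from structures):
# oxalic acid has two carboxy groups, acetic acid has one.
oxalic_acid = {"carboxy group": 2}
acetic_acid = {"carboxy group": 1}

# "Monocarboxylic acid" expressed with min-only counting versus exact counting
mono_min = PartRestriction("carboxy group", "min", 1)
mono_exact = PartRestriction("carboxy group", "exactly", 1)

# With min-only counting the dicarboxylic acid is wrongly accepted
print(mono_min.holds(oxalic_acid))
# With exact counting it is rejected, while acetic acid still matches
print(mono_exact.holds(oxalic_acid), mono_exact.holds(acetic_acid))
```

With only "min" available, every dicarboxylic acid also satisfies the monocarboxylic acid definition, which is the misclassification described on slide 10. An "exactly" restriction removes it.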

Editor's Notes

  • #2: ChEBI is a database and ontology of chemical entities of biological interest. As of October 2013, it contains more than 35,000 entries, organised into a structure-based and role-based classification hierarchy. Each entry is extensively annotated with a name, definition and synonyms, other metadata such as cross-references, and chemical structure information where appropriate. In addition to the classification hierarchy, the ontology also contains diverse chemical and ontological relationships. While ChEBI is primarily manually maintained, recent developments have focused on improvements in curation through partial automation of common tasks. We will describe a pipeline we have developed for structure-based classification of chemicals into the ChEBI structural classification. The pipeline connects class-level structural knowledge encoded in Web Ontology Language (OWL) axioms as an extension to the ontology, and structural information specified in standard MOLfiles. We make use of the Chemistry Development Kit, the OWL API and the OWLTools library. Harnessing the pipeline, we are able to suggest the best structural classes for the classification of novel structures within the ChEBI ontology.
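
The matching and ranking steps of the pipeline can be sketched as follows. The slides do not say how candidates are ranked, so this sketch assumes one plausible rule: keep only the most specific matching classes, i.e. those with no other matching class beneath them. The toy hierarchy, the feature dictionary and all function names are hypothetical; the actual pipeline works on OWL axioms and MOLfiles using the Chemistry Development Kit, the OWL API and OWLTools.

```python
# Toy is_a hierarchy (child -> parent); illustrative, not taken from ChEBI.
PARENT = {
    "organic ion": "organic molecular entity",
    "organic molecular entity": "molecular entity",
    "monocyclic compound": "molecular entity",
}

# Each definition is a predicate over precomputed structural features.
# The organic ion and monocyclic compound rules mirror slide 12.
DEFINITIONS = {
    "molecular entity": lambda f: True,
    "organic molecular entity": lambda f: f["has_carbon"],
    "organic ion": lambda f: f["has_carbon"] and f["charge"] != 0,
    "monocyclic compound": lambda f: f["cycles"] == 1,
}


def ancestors(cls):
    """Return all classes above cls in the toy hierarchy."""
    found = set()
    while cls in PARENT:
        cls = PARENT[cls]
        found.add(cls)
    return found


def match(features):
    """Matching step: every class whose definition the structure satisfies."""
    return {name for name, test in DEFINITIONS.items() if test(features)}


def most_specific(candidates):
    """Assumed ranking step: drop candidates implied by a more specific one."""
    redundant = set()
    for cls in candidates:
        redundant |= ancestors(cls)
    return sorted(candidates - redundant)


def classify(features):
    """Return the classes for which is_a relations would be saved."""
    return most_specific(match(features))


# Hypothetical features for a singly charged, single-ring organic cation
novel_structure = {"has_carbon": True, "charge": 1, "cycles": 1}
print(classify(novel_structure))
```

Dropping ancestors of matched classes avoids asserting is_a relations that the hierarchy already implies, so a curator would see only the most informative suggestions.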