Progress in Natural Language Processing of Materials Science Text
Elsa Olivetti, MIT; Gerbrand Ceder, UC Berkeley
Departments of Materials Science & Engineering
Andrew McCallum, UMass Amherst
Department of Computer Science & Engineering
Motivation: data is a key ingredient in machine learning for materials
Text is unstructured or semi-structured data: free-flowing natural language that machines cannot readily interpret
Manual data extraction is expensive, labor-intensive, and error-prone
Scope motivation: modern data-driven and first-principles materials design accelerates the pace of deciding what to make…
Automated body-text extraction of synthesis parameters for materials science, using primarily machine learning approaches
How does this work align with others?
Scientific Domain Progression for NLP
Riedel and McCallum, Empirical Methods in NLP (EMNLP), 2011
ChemicalTagger: Hawizy et al., 2011
ChemSpot: Rocktäschel et al., 2012
Tchoua et al., IEEE eScience, 2017, DOI 10.1109/eScience.2017.23
Biology/medical domain:
Publicly available annotated collections
GENIA corpus
Unified Medical Language System
Chemistry domain:
Domain-specific terminology, numeric phrases
CHEMDNER corpus
Inorganic materials:
Polymer domain:
Continuum of text mining approaches
Nature, 533, 2016
Continuum of text mining approaches
• Collection of rules/dictionaries; rule-matching engine
J Chem Inf Model, 56, 2016
LeadMine: Lowe and Sayle, J Cheminform, 7, 2015
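The rule- and dictionary-based end of this continuum can be illustrated with a minimal sketch. It assumes a tiny, hypothetical name dictionary and an illustrative subset of element symbols; production tools such as ChemDataExtractor and LeadMine rely on far larger curated resources and grammars, and this is not their code.

```python
import re

# Illustrative subset of element symbols (hypothetical; a real tagger would use all 118).
ELEMENTS = {"H", "Li", "O", "Na", "Mn", "Fe", "Co", "Ni", "Ba", "Ar"}

# Hypothetical dictionary of chemical names, matched case-insensitively.
NAME_DICTIONARY = {"argon", "nickel oxide"}

# Rule: candidate tokens are runs of capitalized symbols, each optionally followed by digits.
CANDIDATE = re.compile(r"\b(?:[A-Z][a-z]?\d*)+\b")
SYMBOL = re.compile(r"([A-Z][a-z]?)\d*")


def is_formula(token):
    """Accept a candidate only if every symbol in it is a known element."""
    symbols = SYMBOL.findall(token)
    if not symbols or not all(s in ELEMENTS for s in symbols):
        return False
    # A lone symbol such as "Ar" is left to the dictionary; "O2" and "NiO" count as formulas.
    return len(symbols) > 1 or any(ch.isdigit() for ch in token)


def tag_chemicals(text):
    """Return sorted (start, end, surface, source) tuples from the dictionary and the rule."""
    mentions = []
    for name in NAME_DICTIONARY:
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            mentions.append((m.start(), m.end(), m.group(0), "dictionary"))
    for m in CANDIDATE.finditer(text):
        if is_formula(m.group(0)):
            mentions.append((m.start(), m.end(), m.group(0), "rule"))
    return sorted(mentions)


sentence = "Excess amounts of Na2O, NiO, Co3O4 and Fe2O3 were mixed under argon."
for start, end, surface, source in tag_chemicals(sentence):
    print(surface, source)
```

The element check rejects capitalized tokens such as "MIT" (no element "M"), but plain names like "argon" still depend on the dictionary. Every new naming pattern needs another hand-written rule or entry, which is one reason systems move toward the machine-learning end of the continuum.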
Continuum of text mining approaches
Court and Cole, Scientific Data 2018
Korvigo et al., J Cheminform 2018
CNN: convolutional neural net
HS‐biGRU: half‐stateful bidirectional gated recurrent unit
FCN: fully‐connected network
CEM: chemical entity mention
NLP activities
Entity extraction, Event extraction, Relation extraction, Entity linking
Krallinger et al. Chem. Rev. 2017
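Of the activities listed above, relation extraction links entities to each other, for example a material to a measured property. A minimal sketch, assuming a single hypothetical hand-written pattern in the spirit of the Curie/Néel temperature database of Court and Cole (who used semi-supervised relationship extraction, not this regex):

```python
import re

# Hypothetical surface pattern: "<property> temperature of <material> is <value> K".
RELATION = re.compile(
    r"(?P<property>Curie|Néel) temperature of (?P<material>[A-Z][A-Za-z0-9]*)"
    r" is (?P<value>\d+(?:\.\d+)?) ?(?P<unit>K)\b"
)


def extract_relations(sentence):
    """Return (material, property, value, unit) tuples found in one sentence."""
    return [
        (m["material"], m["property"] + " temperature", float(m["value"]), m["unit"])
        for m in RELATION.finditer(sentence)
    ]


print(extract_relations("The Curie temperature of Fe3O4 is 858 K."))
```

A fixed pattern like this is precise but has low recall, because real sentences vary word order, units, and abbreviations. Entity linking would be a further step that maps the surface string "Fe3O4" to a canonical database entry.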
Chemical Entity Recognition
Krallinger et al. Chem. Rev. 2017
Parts of a text, types of texts
Experimental methods
NaNi1/3Co1/3Fe1/3O2 was synthesized by solid‐state reaction. 
Excess amounts of Na2O, NiO, Co3O4 and Fe2O3 were mixed and 
ball milled for 4 h at 500 rpm rate, and the resulting material 
was collected in the glove box. About 0.5 g of powder was fired 
at 800 °C under O2 for 14 h before it was quenched to room 
temperature and moved to a glove box filled with argon.
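A minimal sketch of pulling numeric synthesis parameters out of the paragraph above with unit-anchored regular expressions. This is an illustrative assumption, not the method of this work, which is primarily machine-learning based; the parameter names are hypothetical.

```python
import re

PARAGRAPH = (
    "NaNi1/3Co1/3Fe1/3O2 was synthesized by solid-state reaction. "
    "Excess amounts of Na2O, NiO, Co3O4 and Fe2O3 were mixed and ball milled for 4 h "
    "at 500 rpm rate, and the resulting material was collected in the glove box. "
    "About 0.5 g of powder was fired at 800 °C under O2 for 14 h before it was "
    "quenched to room temperature and moved to a glove box filled with argon."
)

NUMBER = r"(\d+(?:\.\d+)?)"

# Each hypothetical parameter is recognized by a number followed by its unit.
PATTERNS = {
    "temperature_C": re.compile(NUMBER + r"\s*°C"),
    "time_h": re.compile(NUMBER + r"\s*h\b"),
    "speed_rpm": re.compile(NUMBER + r"\s*rpm\b"),
    "mass_g": re.compile(NUMBER + r"\s*g\b"),
}


def extract_parameters(text):
    """Return every value found for each parameter, in order of appearance."""
    return {name: [float(v) for v in pattern.findall(text)] for name, pattern in PATTERNS.items()}


print(extract_parameters(PARAGRAPH))
```

The regexes recover 800 °C and both durations, but they cannot tell that 4 h belongs to ball milling and 14 h to firing. Attaching values to the right synthesis step is the kind of decision that learned sequence-labeling models are typically used for.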
Challenges
• With NLP for materials:
  • Transferability across materials domains (solid-state, thin-film, and templated synthesis)
  • Off-stoichiometry ("BaxMn1-xO3 for x = 0.9")
  • Lexical ambiguity and evolution
• With the overall approach:
  • Age and quality of sources (optical character recognition, PDF vs. HTML)
  • Linked recipes ("following the method described by….")
  • Negative examples (bias of the literature toward success)
• Example from chemical synthesis (Coley et al., ACS Central Science, 2017): a reaction database supplemented with chemically plausible negative examples
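One way to handle the off-stoichiometry challenge ("BaxMn1-xO3 for x = 0.9") is to substitute the stated value of x into the formula before computing a composition. A minimal sketch, assuming counts are sums of numbers and multiples of a single variable x; it also normalizes the Unicode hyphen (U+2010) common in extracted text. All function names are hypothetical.

```python
import re

# A term is a number, a number times x (e.g. "2x"), or x alone.
TERM = r"(?:\d+(?:\.\d+)?x?|x)"
VALID_COUNT = re.compile(rf"[+-]?{TERM}(?:[+-]{TERM})*")
TERM_PARTS = re.compile(r"([+-]?)(\d+(?:\.\d+)?)?(x?)")
# Element symbol plus optional count; no element symbol has "x" as its second
# letter, so "Bax" parses as Ba with count x.
ELEMENT_AND_COUNT = re.compile(r"([A-Z][a-wyz]?)([0-9.x+-]*)")


def evaluate_count(expr, x):
    """Evaluate a count such as '', '3', 'x', or '1-x' at the given x."""
    if expr == "":
        return 1.0
    if not VALID_COUNT.fullmatch(expr):
        raise ValueError(f"unsupported count expression: {expr!r}")
    total = 0.0
    for sign, coeff, var in TERM_PARTS.findall(expr):
        if not coeff and not var:
            continue  # findall also yields empty matches; skip them
        value = float(coeff) if coeff else 1.0
        if var:
            value *= x
        total += -value if sign == "-" else value
    return total


def resolve_formula(formula, x):
    """Map each element of a variable formula to its amount at the given x."""
    formula = formula.replace("\u2010", "-")  # normalize the Unicode hyphen
    composition = {}
    pos = 0
    while pos < len(formula):
        m = ELEMENT_AND_COUNT.match(formula, pos)
        if m is None:
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        element, count = m.groups()
        composition[element] = composition.get(element, 0.0) + evaluate_count(count, x)
        pos = m.end()
    return composition


def resolve_mention(mention):
    """Resolve a mention of the form '<formula> for x = <value>'."""
    m = re.fullmatch(r"\s*(\S+)\s+for\s+x\s*=\s*(\d+(?:\.\d+)?)\s*", mention)
    if m is None:
        raise ValueError(f"unrecognized mention: {mention!r}")
    return resolve_formula(m.group(1), float(m.group(2)))


composition = resolve_mention("BaxMn1-xO3 for x = 0.9")
print({element: round(amount, 6) for element, amount in composition.items()})
```

Rounding hides floating-point noise such as 1 - 0.9. Fractional counts written with a slash (as in NaNi1/3Co1/3Fe1/3O2) are deliberately rejected with a ValueError rather than guessed at.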
Learning Method Ontologies
Unspecified reaction conditions/amounts
Appropriate amounts
Small amount
Large amount
Several times
Ambient conditions
Constant conditions
Reduced pressure
Vigorous stirring
We can measure improvement in readability
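Before any ontology is learned, the vague expressions listed above can at least be flagged by lexicon lookup. A minimal sketch: the phrases come from the slide, while the category labels and the function name are hypothetical.

```python
import re

# Phrases from the slide; the category labels are hypothetical groupings.
UNSPECIFIED_TERMS = {
    "appropriate amount": "amount",
    "small amount": "amount",
    "large amount": "amount",
    "several times": "repetition",
    "ambient conditions": "condition",
    "constant conditions": "condition",
    "reduced pressure": "condition",
    "vigorous stirring": "condition",
}


def flag_unspecified(sentence):
    """Return (phrase, category) pairs for vague expressions, in order of appearance."""
    hits = []
    for phrase, category in UNSPECIFIED_TERMS.items():
        # An optional trailing "s" lets "appropriate amounts" match "appropriate amount".
        pattern = r"\b" + re.escape(phrase) + r"s?\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            hits.append((m.start(), m.group(0).lower(), category))
    return [(phrase, category) for _, phrase, category in sorted(hits)]


print(flag_unspecified("The powder was washed several times and dried under reduced pressure."))
```

A lookup like this only finds phrases already in the list; learning the ontology from text is what would extend it to unseen paraphrases.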
Some examples of what we’ve done…
Variational autoencoder:
• Loss = reconstruction error + a regularization term f that keeps the latent distribution close to a Gaussian prior
• Also a generative model
Edward Kim et al., npj Computational Materials 2017
Collaborator, Stefanie Jegelka, CSAIL, MIT
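The VAE loss summarized above can be made concrete under common assumptions: squared-error reconstruction, an encoder that outputs a diagonal Gaussian (mu, log_var), and a standard normal prior, in which case f is the closed-form KL divergence. This is a generic VAE sketch, not the model of Kim et al.; the function names are hypothetical.

```python
import math
import random


def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def vae_loss(x, x_reconstructed, mu, log_var):
    """Squared-error reconstruction term plus the Gaussian KL regularizer."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))
    return reconstruction + gaussian_kl(mu, log_var)


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


rng = random.Random(0)
print(vae_loss([1.0, 0.0], [0.9, 0.1], mu=[0.2], log_var=[-0.1]))
# With mu = 0 and log_var = 0 this draws z from the prior itself.
print(sample_latent([0.0, 0.0], [0.0, 0.0], rng))
```

Because the KL term keeps encodings near N(0, I), drawing z from that prior and passing it through a trained decoder (omitted here) produces new samples, which is the sense in which the VAE is also a generative model.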
Exploratory: suggesting synthesis conditions for stabilizing desired materials
Polymorphs of MnO2 overlaid with the most probable alkali ion used in synthesis (intercalation-based phase stability)
Edward Kim et al., npj Computational Materials 2017
[Figure: 10,200 articles; labels: photocatalysts, lithium-ion batteries, molecular sieves, alkaline batteries]
Concluding thoughts on NLP progress
• Natural language processing is young in its application to materials science
• Building an annotation approach and corpus takes effort
• There are domain-specific needs regarding accuracy and ambiguity
• There is a tradeoff between accuracy and degree of generalizability
Bibliography
Audus, D. J.; de Pablo, J. J. "Polymer Informatics: Opportunities and Challenges." 2017, 1078-1082.
Coley, C. W., et al. "Prediction of Organic Reaction Outcomes Using Machine Learning." ACS Cent. Sci. 2017, 3, 434-443.
Court, C. J.; Cole, J. M. "Auto-generated Materials Database of Curie and Néel Temperatures via Semi-supervised Relationship Extraction." Sci. Data 2018, 5, 180111.
Hawizy, L.; Jessop, D. M.; Adams, N.; Murray-Rust, P. "ChemicalTagger: A Tool for Semantic Text-Mining in Chemistry." J. Cheminform. 2011, 3, 1-13.
Kim, E., et al. "A Data-driven Framework for Materials Synthesis Discovery." Chem. Mater. 2017, 29.
Kim, E., et al. "Virtual Screening of Inorganic Materials Synthesis Parameters with Deep Learning." npj Comput. Mater. 2017, 53.
Rocktäschel, T.; Weidlich, M.; Leser, U. "ChemSpot: A Hybrid System for Chemical Named Entity Recognition." Bioinformatics 2012, 28, 1633-1640.
Swain, M. C.; Cole, J. M. "ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature." J. Chem. Inf. Model. 2016, 56, 1894-1904.
Thank you 
Edward Kim, Zachary Jensen, Kevin Huang
Teams at Berkeley and UMass Amherst
olivetti.mit.edu
synthesisproject.org
elsao@mit.edu