Artificial intelligence is emerging as a new paradigm in materials science. This talk describes how physical intuition and (insightful) machine learning can solve the complicated task of structure recognition in materials at the nanoscale.
Going Smart and Deep on Materials at ALCF - Ian Foster
As we acquire large quantities of science data from experiment and simulation, it becomes possible to apply machine learning (ML) to those data to build predictive models and to guide future simulations and experiments. Leadership Computing Facilities need to make it easy to assemble such data collections and to develop, deploy, and run associated ML models.
We describe and demonstrate here how we are realizing such capabilities at the Argonne Leadership Computing Facility. In our demonstration, we use large quantities of time-dependent density functional theory (TDDFT) data on proton stopping power in various materials maintained in the Materials Data Facility (MDF) to build machine learning models, ranging from simple linear models to complex artificial neural networks, that are then employed to manage computations, improving their accuracy and reducing their cost. We highlight the use of new services being prototyped at Argonne to organize and assemble large data collections (MDF in this case), associate ML models with data collections, discover available data and models, work with these data and models in an interactive Jupyter environment, and launch new computations on ALCF resources.
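The "simple linear models" end of the modeling spectrum described above can be illustrated with a minimal sketch. This is ordinary least squares on hypothetical toy numbers, not real MDF stopping-power data or the ALCF services' API:

```python
# Illustrative only: ordinary least squares for y = a*x + b on toy data,
# standing in for the "simple linear models" built on TDDFT stopping-power
# data. The (x, y) values below are hypothetical, not MDF data.
def fit_line(xs, ys):
    """Return slope a and intercept b minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return a, mean_y - a * mean_x

a, b = fit_line([1.0, 2.0, 3.0, 4.0], [2.1, 4.2, 6.1, 8.0])
assert abs(a - 1.96) < 1e-9
assert abs(b - 0.2) < 1e-9
```

A cheap surrogate like this can steer which expensive TDDFT runs are worth launching, which is the cost-reduction loop the abstract describes.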
Evaluating Machine Learning Algorithms for Materials Science using the Matben... - Anubhav Jain
1) The document discusses evaluating machine learning algorithms for materials science using the Matbench protocol.
2) Matbench provides standardized datasets, testing procedures, and an online leaderboard to benchmark and compare machine learning performance.
3) This allows different groups to evaluate algorithms independently and identify best practices for materials science predictions.
TMS workshop on machine learning in materials science: Intro to deep learning... - Brian DeCost
This presentation is intended as a high-level introduction to deep learning and its applications in materials science. The intended audience is materials scientists and engineers.
Disclaimers: the second half of this presentation is intended as a broad overview of deep learning applications in materials science; due to time limitations it is not intended to be comprehensive. As a review of the field, this necessarily includes work that is not my own. If my own name is not included explicitly in the reference at the bottom of a slide, I was not involved in that work.
Any mention of commercial products in this presentation is for information only; it does not imply recommendation or endorsement by NIST.
Classical force fields as physics-based neural networks - aimsnist
1. The document discusses classical interatomic potentials and machine-learning potentials for atomistic simulations. It presents a new physically-informed neural network (PINN) potential that combines the accuracy of neural networks with the transferability of physics-based potentials.
2. As an example, the document develops a PINN potential for aluminum that accurately reproduces density functional theory energies and various aluminum properties beyond the training data.
3. The PINN potential is shown to have better transferability and predictiveness compared to a purely mathematical neural network potential. This makes PINN potentials a promising next-generation approach for efficient and accurate atomistic simulations.
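The core PINN idea above can be sketched in miniature: rather than mapping structure to energy with an unconstrained network, a physically motivated functional form supplies the shape, and learned parameters adapt it locally. Here a Morse pair potential stands in for the physics-based part; the parameter values are hypothetical, and this is not the published PINN formulation:

```python
import math

# Conceptual sketch of the PINN idea: a physics-based form (a Morse pair
# potential here) supplies the functional shape; in a real PINN, a neural
# network would predict parameters like D, alpha, r0 per atomic environment.
# The values below are hypothetical illustrative choices.
def morse_energy(r, D=1.0, alpha=1.5, r0=2.0):
    """Morse pair energy: bounded and physically sensible at all r."""
    x = math.exp(-alpha * (r - r0))
    return D * (x * x - 2.0 * x)

# The minimum sits at the equilibrium distance r0 with depth -D ...
energies = {r: morse_energy(r) for r in (1.5, 2.0, 3.0, 10.0)}
assert min(energies, key=energies.get) == 2.0
assert morse_energy(2.0) == -1.0
# ... and the energy decays to zero at large separation, unlike a purely
# mathematical network, which can behave arbitrarily outside its training set.
assert abs(morse_energy(10.0)) < 1e-4
```

The built-in asymptotics are what give the physics-based form its transferability beyond the training data.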
The document provides an overview of materials informatics and the Materials Genome Initiative. It discusses how materials informatics uses data-driven approaches and techniques from fields like signal processing, machine learning and statistics to generate structure-property-processing linkages from materials science data and improve understanding of materials behavior. This includes extracting features from materials microstructure, using statistical analysis and data mining to discover relationships and create predictive models, and evaluating how knowledge has improved.
The Art and Power of Data-Driven Modeling: Statistical and Machine Learning A... - WithTheBest
This presentation illustrates distinct statistical and machine learning approaches to automated recognition of major brain tissues in 3D brain MRI.
Nataliya Portman, Postdoctoral Fellow, Faculty of Science, UOIT, Oshawa, ON, Canada
PhD in Applied Mathematics, University of Waterloo | Postdoctoral Research on Brain MRI Segmentation, Neuro | Current: Applied Machine Learning in Materials Science, University of Ontario Institute of Technology
The document discusses using artificial intelligence (AI) to accelerate materials innovation for clean energy applications. It outlines six elements needed for a Materials Acceleration Platform: 1) automated experimentation, 2) AI for materials discovery, 3) modular robotics for synthesis and characterization, 4) computational methods for inverse design, 5) bridging simulation length and time scales, and 6) data infrastructure. Examples of opportunities include using AI to bridge simulation scales, assist complex measurements, and enable automated materials design. The document argues that a cohesive infrastructure is needed to make effective use of AI, data, computation, and experiments for materials science.
Overview of DuraMat software tool development - Anubhav Jain
The document discusses software tools being developed by researchers for photovoltaic (PV) applications. It summarizes several software projects funded by DuraMat that address different aspects of PV including: (1) PV system modeling and analysis, (2) operation and degradation modeling, and (3) planning and reducing levelized cost of energy. The tools aim to solve a range of PV problems, are open source, and are developed collaboratively on GitHub to be reusable and sustainable resources for the community.
The Status of ML Algorithms for Structure-property Relationships Using Matb... - Anubhav Jain
The document discusses the development of Matbench, a standardized benchmark for evaluating machine learning algorithms for materials property prediction. Matbench includes 13 standardized datasets covering a variety of materials prediction tasks. It employs a nested cross-validation procedure to evaluate algorithms and ranks submissions on an online leaderboard. This allows for reproducible evaluation and comparison of different algorithms. Matbench has provided insights into which algorithm types work best for certain prediction problems and has helped measure overall progress in the field. Future work aims to expand Matbench with more diverse datasets and evaluation procedures to better represent real-world materials design challenges.
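The nested cross-validation procedure mentioned above can be sketched conceptually. This is not the matbench package API, and the "model" is a deliberately trivial scaled-mean predictor; the point is only that the inner folds tune a hyperparameter while the outer folds measure generalization, so test data never leaks into model selection:

```python
# Conceptual sketch of nested cross-validation (not the matbench API).
def folds(items, k):
    """Yield k (train, test) splits of contiguous chunks of `items`."""
    size = len(items) // k
    for i in range(k):
        test = items[i * size:(i + 1) * size]
        train = [x for x in items if x not in test]
        yield train, test

def mean(xs):
    return sum(xs) / len(xs)

def nested_cv(data, params, k_outer=3, k_inner=2):
    """Mean outer-fold MAE, with the hyperparameter chosen per outer fold
    using only inner folds of that fold's training portion."""
    scores = []
    for outer_train, outer_test in folds(list(range(len(data))), k_outer):
        def inner_error(p):
            total = 0.0
            for inner_train, inner_test in folds(outer_train, k_inner):
                pred = p * mean([data[j] for j in inner_train])  # toy "model"
                total += mean([abs(data[j] - pred) for j in inner_test])
            return total
        best = min(params, key=inner_error)
        pred = best * mean([data[j] for j in outer_train])
        scores.append(mean([abs(data[j] - pred) for j in outer_test]))
    return mean(scores)

score = nested_cv([1.0, 1.1, 0.9, 1.0, 1.2, 0.8], params=[0.5, 1.0, 2.0])
assert 0.0 <= score < 0.5
```

Because every outer test fold is untouched during tuning, the averaged score is an honest estimate of how a fully specified training pipeline would perform on new materials.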
Smart Metrics for High Performance Material Design - aimsnist
This document discusses smart metrics for high-performance material design using density functional theory (DFT), classical force fields (FF), and machine learning (ML). It provides an overview of the JARVIS database and tools containing over 35,000 materials and classical properties calculated using DFT, FF, and ML methods. Metrics discussed include formation energy, exfoliation energy, elastic constants, surface energy, vacancy energy, grain boundary energy, bandgaps, and other electronic and optical properties important for applications like solar cells. ML models are developed to predict these properties with mean absolute errors within chemical accuracy compared to DFT benchmarks.
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu... - Anubhav Jain
- The document describes a computational materials design pipeline that uses theory, optimization, and natural language processing (NLP) to accelerate materials discovery.
- Key components of the pipeline include optimization algorithms like Rocketsled to find best materials solutions with fewer calculations, and NLP tools to extract and analyze knowledge from literature to predict promising new materials and benchmarks.
- The pipeline has shown speedups of 15-30x over random searches and has successfully predicted new thermoelectric materials discoveries 1-2 years before their reporting in literature.
A Machine Learning Framework for Materials Knowledge Systems - aimsnist
- The document describes a machine learning framework for developing artificial intelligence-based materials knowledge systems (MKS) to support accelerated materials discovery and development.
- The MKS would have main functions of diagnosing materials problems, predicting materials behaviors, and recommending materials selections or process adjustments.
- It would utilize a Bayesian statistical approach to curate process-structure-property linkages for all materials classes and length scales, accounting for uncertainty in the knowledge, and allow continuous updates from new information sources.
Atomate: a tool for rapid high-throughput computing and materials discovery - Anubhav Jain
Atomate is a tool for automating materials simulations and high-throughput computations. It provides predefined workflows for common calculations like band structures, elastic tensors, and Raman spectra. Users can customize workflows and simulation parameters. FireWorks executes workflows on supercomputers and detects/recovers from failures. Data is stored in databases for analysis with tools like pymatgen. The goal is to make simulations easy and scalable by automating tedious steps and leveraging past work.
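The failure detection and recovery described above is the heart of a workflow engine. The following is a minimal sketch of that idea, not the FireWorks API; the task names and the retry policy are hypothetical:

```python
# Not the FireWorks API -- a minimal sketch of what such an engine does:
# run tasks in sequence, detect failures, and retry (rerun) them.
def run_workflow(tasks, max_retries=1):
    """Run (name, callable) tasks in order; retry each up to max_retries."""
    log = []
    for name, task in tasks:
        for attempt in range(max_retries + 1):
            try:
                task()
                log.append((name, "COMPLETED"))
                break
            except Exception:
                log.append((name, f"FAILED (attempt {attempt + 1})"))
        else:
            return log  # retries exhausted; stop the workflow here
    return log

# Hypothetical tasks: the second fails once, then succeeds on retry.
state = {"tries": 0}
def flaky():
    state["tries"] += 1
    if state["tries"] == 1:
        raise RuntimeError("transient failure")

log = run_workflow([("relax", lambda: None), ("bands", flaky)])
assert log[-1] == ("bands", "COMPLETED")
```

In a real system the log would live in a database, which is what enables the querying and analysis with tools like pymatgen that the summary mentions.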
Applications of Natural Language Processing to Materials Design - Anubhav Jain
This document discusses using natural language processing (NLP) techniques to extract useful information from unstructured text sources in materials science literature. It describes how NLP models can be trained on large datasets of materials science publications to perform tasks like chemistry-aware search, summarizing material properties, and suggesting synthesis methods. The models are developed using techniques like word embeddings, LSTM networks, and named entity recognition. The goal is to organize materials science knowledge from text into a database called Matscholar to enable new applications of the information.
Automating materials science workflows with pymatgen, FireWorks, and atomate - Anubhav Jain
FireWorks is a workflow management system that allows researchers to define and execute complex computational materials science workflows on local or remote computing resources in an automated manner. It provides features such as error detection and recovery, job scheduling, provenance tracking, and remote file access. The atomate library builds on FireWorks to provide a high-level interface for common materials simulation procedures like structure optimization, band structure calculation, and property prediction using popular codes like VASP. Together, these tools aim to make high-throughput computational materials discovery and design more accessible to researchers.
Overview of DuraMat software tool development (poster version) - Anubhav Jain
This document provides an overview of software tools being developed by the DuraMat project to analyze photovoltaic systems. It summarizes six software tools that serve two main purposes: core functions for PV analysis and modeling operation/degradation, and tools for project planning and reducing levelized cost of energy (LCOE). The core function tools include PVAnalytics for data processing and a PV-Pro preprocessor. Tools for operation/degradation include PV-Pro, PVOps, PVArc, and pv-vision. Tools for project planning and LCOE include a simplified LCOE calculator and VocMax string length calculator. All tools are open source and designed for large PV data sets.
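A simplified LCOE calculation like the one the summary mentions can be sketched as discounted lifetime cost over discounted lifetime energy. The formula and the example numbers here are a generic hypothetical illustration, not the DuraMat calculator's actual model:

```python
# Hypothetical simplified LCOE: discounted lifetime cost divided by
# discounted lifetime energy, assuming constant annual O&M cost and yield.
# This is a generic textbook-style sketch, not the DuraMat tool's model.
def lcoe(capex, opex_per_year, energy_per_year, years=25, discount=0.05):
    """Levelized cost of energy in $/kWh."""
    factor = sum(1.0 / (1.0 + discount) ** t for t in range(1, years + 1))
    cost = capex + opex_per_year * factor
    energy = energy_per_year * factor
    return cost / energy

# Example: $1500/kW system, $20/kW-yr O&M, 1700 kWh/kW-yr yield.
value = lcoe(capex=1500.0, opex_per_year=20.0, energy_per_year=1700.0)
assert 0.05 < value < 0.10
```

Even this toy version makes the key trade-off visible: upfront capital is spread over every discounted kWh the system will ever produce, so degradation that reduces yield raises LCOE directly.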
Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit... - Punit Sharnagat
OSMnx is a Python package to retrieve, model, analyze, and visualize street networks from OpenStreetMap.
OpenStreetMap (OSM) is a collaborative mapping project that provides a free and publicly editable map of the world.
OpenStreetMap provides a valuable crowd-sourced database of raw geospatial data for constructing models of urban street networks for scientific analysis.
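Once a street network is modeled as a graph, pattern statistics reduce to simple graph computations. The sketch below computes average node degree on a toy edge list; it is a conceptual illustration, not the OSMnx data model or API:

```python
# Illustrative graph metric on a toy street network (plain edge list, not
# the OSMnx/OpenStreetMap data model): average node degree, a common
# statistic for comparing road network patterns across city centers.
from collections import defaultdict

def average_degree(edges):
    """Average number of incident street segments per intersection."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sum(degree.values()) / len(degree)

# A simple ring of four intersections: every node has degree 2.
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
assert average_degree(ring) == 2.0
```

Grid-like CBDs tend toward higher average degree (four-way intersections) than organic or dendritic street patterns, which is what makes such metrics useful for comparison.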
A Framework and Infrastructure for Uncertainty Quantification and Management ... - aimsnist
QuesTek Innovations presented a framework to incorporate materials genome initiatives (MGI) and artificial intelligence (AI) into their integrated computational materials engineering (ICME) practice. They discussed three key aspects: (1) MaGICMaT, a materials genome and ICME toolkit to manage data and property-structure-performance linkages, (2) an uncertainty quantification framework for CALPHAD modeling, and (3) a cloud-based platform to enable rapid development and deployment of ICME models with an HPC backend. The presentation provided details on their approaches for each aspect and highlighted opportunities to further enhance ICME with MGI and AI.
Assessing Factors Underpinning PV Degradation through Data Analysis - Anubhav Jain
The document discusses using PVPRO methods and large-scale data analysis to distinguish system and module degradation in PV systems. It involves 3 main tasks: 1) Developing an algorithm to detect off-maximum power point operation and compare it to existing tools. 2) Applying PVPRO to additional datasets to refine methods and perform degradation analysis on 25 large PV systems. 3) Connecting bill-of-materials data to degradation results from accelerated stress tests through data-driven analysis and publishing findings while anonymizing data.
How might machine learning help advance solar PV research? - Anubhav Jain
Machine learning techniques can help optimize solar PV systems in several ways:
1) Clear sky detection algorithms using ML were developed to more accurately classify sky conditions from irradiance data, improving degradation rate calculations.
2) Site-specific modeling of module voltages over time, validated with field data, allows more optimal string sizing compared to traditional worst-case assumptions.
3) ML and data-driven approaches may help optimize other aspects of solar plant design like climate zone definitions and extracting module parameters from production data.
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ... - Ian Foster
This document discusses computing challenges posed by rapidly increasing data scales in scientific applications and high performance computing. It introduces online data analysis and reduction as an alternative to traditional offline analysis for addressing these challenges. The key message is that dramatic changes in HPC system geography, driven by different growth rates of the underlying technologies, are forcing new application structures and computational logistics problems, and presenting exciting new computer science opportunities in online data analysis and reduction.
IRJET - Object Detection using Deep Learning with OpenCV and Python - IRJET Journal
This document summarizes research on object detection techniques using deep learning. It discusses using the YOLO algorithm to identify objects in images using a single neural network that predicts bounding boxes and class probabilities. The document reviews prior research on algorithms like R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN and RetinaNet. It then describes the YOLO loss function and methodology for finding bounding boxes of objects in an image. The document concludes that YOLO is well-suited for real-time object detection applications due to its advantages over other algorithms.
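A building block shared by all the detectors above is intersection-over-union, the overlap measure used to match predicted boxes against ground truth. The sketch below is that generic computation, not the YOLO paper's full loss function:

```python
# Intersection-over-union for axis-aligned boxes given as (x1, y1, x2, y2):
# the core overlap measure behind matching predicted and ground-truth boxes
# in YOLO-style detectors. A generic sketch, not the paper's full loss.
def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

assert iou((0, 0, 2, 2), (1, 1, 3, 3)) == 1 / 7   # partial overlap
assert iou((0, 0, 1, 1), (2, 2, 3, 3)) == 0.0     # disjoint boxes
```

The same quantity drives both training (penalizing poorly localized predictions) and evaluation (a detection typically counts as correct when IoU with ground truth exceeds a threshold such as 0.5).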
Software tools, crystal descriptors, and machine learning applied to material... - Anubhav Jain
This talk introduces several open-source software tools for accelerating materials design efforts:
- Atomate enables high-throughput DFT simulations through automated workflows. It has been used to generate large datasets for the Materials Project.
- Rocketsled uses machine learning to suggest the most informative calculations to optimize a target property faster than random searches.
- Matminer provides features to represent materials for machine learning and connects to data mining tools and databases.
- Automatminer develops machine learning models automatically from raw input-output data without requiring feature engineering by users.
- Robocrystallographer analyzes crystal structures and describes them in an interpretable text format.
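The featurization step that Matminer performs can be sketched in miniature: a composition becomes a numeric vector by aggregating per-element properties. This is a conceptual illustration, not the Matminer API; the aggregation choices are hypothetical, though the atomic masses used are standard values:

```python
# Sketch of composition featurization (not the Matminer API): aggregate
# per-element properties into fixed-length numeric features an ML model can
# consume. Masses are standard values in g/mol; the choice of mean/max
# aggregations here is a hypothetical illustration.
ELEMENT_MASS = {"H": 1.008, "O": 15.999, "Ti": 47.867}

def featurize(composition):
    """Mean and max atomic mass over a {element: count} composition."""
    total = sum(composition.values())
    mean_mass = sum(ELEMENT_MASS[el] * n for el, n in composition.items()) / total
    max_mass = max(ELEMENT_MASS[el] for el in composition)
    return [mean_mass, max_mass]

feats = featurize({"Ti": 1, "O": 2})  # TiO2
assert abs(feats[0] - (47.867 + 2 * 15.999) / 3) < 1e-9
assert feats[1] == 47.867
```

Real descriptor libraries apply many such statistics over many tabulated element properties, yielding hundreds of features per material.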
Software tools for calculating materials properties in high-throughput (pymat... - Anubhav Jain
This document discusses software tools for automating materials simulations. It introduces pymatgen, atomate, and FireWorks which can be used together to define a workflow of calculations, execute the workflow on supercomputers, and recover from errors or failures. The tools allow researchers to focus on designing and analyzing simulations rather than manual setup and execution of jobs. Workflows in atomate can compute many materials properties including elastic tensors, band structures, and transport coefficients. Parameters are customizable but sensible defaults are provided. FireWorks then executes the workflows across multiple supercomputing clusters.
The document introduces two approaches to chemical prediction: quantum simulation based on density functional theory and machine learning based on data. It then discusses using graph-structured neural networks for chemical prediction on datasets like QM9. It presents Neural Fingerprint (NFP) and Gated Graph Neural Network (GGNN) models for predicting molecular properties from graph-structured data. Chainer Chemistry is introduced as a library for chemical and biological machine learning that implements these graph convolutional networks.
1) The document discusses challenges in using machine learning and data analytics for materials science research. Specifically, most materials are irrelevant for a given purpose, so models need to identify statistically exceptional subgroups rather than averaging all data.
2) Two potential methods for identifying promising subgroups are discussed: focusing on materials with small oxygen-carbon-oxygen angles or large carbon-oxygen bond lengths for catalysis applications.
3) The concept of a model's domain of applicability is introduced, wherein models perform best when applied only to similar data they were trained on, rather than all data globally. Identifying these reliable domains is important.
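One simple way to operationalize a domain of applicability is a nearest-neighbor distance check in feature space: a query is flagged as out-of-domain when it lies too far from every training point. This is a generic sketch of the concept, not the specific method in the document; the threshold and feature vectors are hypothetical:

```python
# Illustrative domain-of-applicability check (a generic sketch, not the
# document's specific method): flag a query as out-of-domain when its
# distance to the nearest training point exceeds a threshold.
def in_domain(query, train, threshold):
    """True if `query` lies within `threshold` of some training vector."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(dist(query, t) for t in train) <= threshold

# Hypothetical 2-D feature vectors for the training materials.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
assert in_domain((0.1, 0.1), train, threshold=0.5)       # near training data
assert not in_domain((5.0, 5.0), train, threshold=0.5)   # far: don't trust it
```

Predictions for in-domain queries can then be reported with confidence, while out-of-domain queries are deferred or flagged, which is exactly the reliability distinction the summary describes.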
2D/3D Materials screening and genetic algorithm with ML model - aimsnist
JARVIS-ML provides concise summaries of materials properties using machine learning models trained on the extensive data in the JARVIS repositories. It has developed regression and classification models that can predict formation energies, bandgaps, and other material properties in seconds, much faster than traditional DFT calculations. The models use gradient boosting decision trees and feature importance analysis to provide explanations. JARVIS-ML is available as a public web app and API for rapid screening and discovery of new materials.
Deep learning for molecules, introduction to chainer chemistry - Kenta Oono
1) The document introduces machine learning and deep learning techniques for predicting chemical properties, including rule-based approaches versus learning-based approaches using neural message passing algorithms.
2) It discusses several graph neural network models like NFP, GGNN, WeaveNet and SchNet that can be applied to molecular graphs to predict characteristics. These models update atom representations through message passing and graph convolution operations.
3) Chainer Chemistry is introduced as a deep learning framework that can be used with these graph neural network models for chemical property prediction tasks. Examples of tasks include drug discovery and molecular generation.
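The message-passing update these models share can be sketched in its simplest form: each atom's representation is updated with an aggregate of its neighbors' representations. The toy below uses scalar features and a plain sum, a conceptual sketch rather than the NFP/GGNN/Chainer Chemistry implementations (which use learned weight matrices and nonlinearities):

```python
# Minimal message-passing round on a toy molecular graph (a conceptual
# sketch, not the NFP/GGNN implementations): each atom's feature is updated
# with the sum of its neighbors' features.
def message_pass(features, bonds, steps=1):
    """features: {atom: value}; bonds: list of (atom, atom) pairs."""
    neighbors = {a: [] for a in features}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    for _ in range(steps):
        features = {a: features[a] + sum(features[n] for n in neighbors[a])
                    for a in features}
    return features

# Water: O bonded to two H atoms; scalar features are hypothetical values.
out = message_pass({"O": 8.0, "H1": 1.0, "H2": 1.0},
                   bonds=[("O", "H1"), ("O", "H2")])
assert out == {"O": 10.0, "H1": 9.0, "H2": 9.0}
```

Stacking several such rounds lets information propagate across the whole molecule, after which a readout over atom features yields the molecular property prediction.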
This document discusses deep learning techniques for object detection and recognition. It provides an overview of computer vision tasks like image classification and object detection. It then discusses how crowdsourcing large datasets from the internet and advances in machine learning, specifically deep convolutional neural networks (CNNs), have led to major breakthroughs in object detection. Several state-of-the-art CNN models for object detection are described, including R-CNN, Fast R-CNN, Faster R-CNN, SSD, and YOLO. The document also provides examples of applying these techniques to tasks like face detection and detecting manta rays from aerial videos.
2) It discusses several graph neural network models like NFP, GGNN, WeaveNet and SchNet that can be applied to molecular graphs to predict characteristics. These models update atom representations through message passing and graph convolution operations.
3) Chainer Chemistry is introduced as a deep learning framework that can be used with these graph neural network models for chemical property prediction tasks. Examples of tasks include drug discovery and molecular generation.
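The message-passing idea behind models like NFP and GGNN can be sketched in a few lines of plain Python: each atom updates its representation by aggregating its neighbours' features. The toy graph, feature values, and sum aggregation below are assumptions for illustration; real models use learned transformations at each step.

```python
def message_passing_step(features, adjacency):
    """One round of message passing: new feature = own feature + sum of neighbours'."""
    new = {}
    for node, feat in features.items():
        agg = [0.0] * len(feat)
        for nbr in adjacency[node]:
            for i, v in enumerate(features[nbr]):
                agg[i] += v
        new[node] = [f + a for f, a in zip(feat, agg)]
    return new

# Toy "molecule": a 0-1-2 chain with 2-dimensional atom features
adj = {0: [1], 1: [0, 2], 2: [1]}
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
print(message_passing_step(feats, adj))
```

Stacking several such rounds lets information propagate across the molecular graph before a readout predicts the property.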
This document discusses deep learning techniques for object detection and recognition. It provides an overview of computer vision tasks like image classification and object detection. It then discusses how crowdsourcing large datasets from the internet and advances in machine learning, specifically deep convolutional neural networks (CNNs), have led to major breakthroughs in object detection. Several state-of-the-art CNN models for object detection are described, including R-CNN, Fast R-CNN, Faster R-CNN, SSD, and YOLO. The document also provides examples of applying these techniques to tasks like face detection and detecting manta rays from aerial videos.
AILABS - Lecture Series - Is AI the New Electricity? Topic:- Classification a...AILABS Academy
1. The document discusses classification and estimation using artificial neural networks. It provides examples of classification problems from industries like mining and banking loan approval.
2. It describes the basic components of an artificial neural network including the feedforward architecture with multiple layers of neurons and the backpropagation algorithm for learning network weights.
3. Examples are given to illustrate how neural networks can perform nonlinear classification and estimation through combinations of linear perceptron units in multiple layers with the backpropagation algorithm for training the network weights.
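A tiny feedforward network trained with backpropagation can be written in plain Python; the sketch below fits XOR, a classic nonlinear classification task that a single perceptron cannot solve. The layer sizes, learning rate, and epoch count are assumptions chosen for the toy example.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# XOR: not linearly separable, so a hidden layer is required
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

H = 3                                              # hidden units
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0
lr = 0.5

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b in zip(w1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
    return h, y

def total_loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

loss_before = total_loss()
for _ in range(5000):
    for x, t in data:
        h, y = forward(x)
        dy = (y - t) * y * (1 - y)                 # output-layer delta
        for j in range(H):
            dh = dy * w2[j] * h[j] * (1 - h[j])    # hidden delta (before w2 update)
            w2[j] -= lr * dy * h[j]
            for i in range(2):
                w1[j][i] -= lr * dh * x[i]
            b1[j] -= lr * dh
        b2 -= lr * dy
loss_after = total_loss()
print(loss_before, loss_after)
```

The squared-error loss drops as the hidden layer learns the nonlinear decision boundary.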
This presentation is an analysis of the paper,"SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing"
U-Net is a convolutional neural network (CNN) architecture designed for semantic segmentation tasks, especially in the field of medical image analysis. It was introduced by Olaf Ronneberger, Philipp Fischer, and Thomas Brox in 2015. The name "U-Net" comes from its U-shaped architecture.
Key features of the U-Net architecture:
U-Shaped Design: U-Net consists of a contracting path (downsampling) and an expansive path (upsampling). The architecture resembles the letter "U" when visualized.
Contracting Path (Encoder):
The contracting path involves a series of convolutional and pooling layers.
Each convolutional layer is followed by a rectified linear unit (ReLU) activation function and possibly other normalization or activation functions.
Pooling layers (usually max pooling) reduce spatial dimensions, capturing high-level features.
Expansive Path (Decoder):
The expansive path involves a series of upsampling and convolutional layers.
Upsampling is achieved using transposed convolution (also known as deconvolution or convolutional transpose).
Skip connections are established between corresponding layers in the contracting and expansive paths. These connections help retain fine-grained spatial information during the upsampling process.
Skip Connections:
Skip connections concatenate feature maps from the contracting path to the corresponding layers in the expansive path.
These connections facilitate the fusion of low-level and high-level features, aiding in precise localization.
Final Layer:
The final layer typically uses a convolutional layer with a softmax activation function for multi-class segmentation tasks, providing probability scores for each class.
U-Net's architecture and skip connections help address the challenge of segmenting objects with varying sizes and shapes, which is often encountered in medical image analysis. Its success in this domain has led to its application in other areas of computer vision as well.
The U-Net architecture has also been extended and modified in various ways, leading to improvements like the U-Net++ architecture and variations with attention mechanisms, which further enhance the segmentation performance.
U-Net's intuitive design and effectiveness in semantic segmentation tasks have made it a cornerstone in the field of medical image analysis and an influential architecture for researchers working on segmentation challenges.
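The U-shaped flow above can be traced with a toy 1-D sketch in plain Python: downsample with max pooling, upsample back, and concatenate the saved skip features. Nearest-neighbour repetition stands in for a learned transposed convolution here, and the signal values are invented for illustration.

```python
def max_pool(xs):
    """Halve resolution by taking the max of each non-overlapping pair (stride-2 pooling)."""
    return [max(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]

def upsample(xs):
    """Double resolution by nearest-neighbour repetition (a stand-in for transposed conv)."""
    return [v for x in xs for v in (x, x)]

signal = [1, 4, 2, 8, 5, 7, 3, 6]   # a 1-D "feature map"
skip = signal                        # saved for the skip connection
down = max_pool(signal)              # contracting path
up = upsample(down)                  # expansive path, back to original length
fused = list(zip(skip, up))          # concatenation along the channel axis
print(down, up, fused[:2])
```

The fused pairs show why skip connections matter: each position keeps both its fine-grained original value and the coarser context recovered by upsampling.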
This presentation is Part 2 of my September Lisp NYC presentation on Reinforcement Learning and Artificial Neural Nets. We will continue from where we left off by covering Convolutional Neural Nets (CNN) and Recurrent Neural Nets (RNN) in depth.
Time permitting I also plan on having a few slides on each of the following topics:
1. Generative Adversarial Networks (GANs)
2. Differentiable Neural Computers (DNCs)
3. Deep Reinforcement Learning (DRL)
Some code examples will be provided in Clojure.
After a very brief recap of Part 1 (ANN & RL), we will jump right into CNN and their appropriateness for image recognition. We will start by covering the convolution operator. We will then explain feature maps and pooling operations and then explain the LeNet 5 architecture. The MNIST data will be used to illustrate a fully functioning CNN.
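The convolution operator at the heart of a CNN can be shown in a few lines of plain Python (the talk's own examples are in Clojure); this is a "valid" 2-D cross-correlation, the form convolutional layers actually compute, with a made-up image and kernel.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation of an image with a kernel, both lists of lists."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)] for i in range(oh)]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
edge = [[1, -1]]            # simple horizontal-gradient kernel
print(conv2d(img, edge))    # each output is the difference of neighbouring pixels
```

Sliding one learned kernel over the whole image is what produces a feature map; pooling then summarizes each neighbourhood of that map.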
Next we cover Recurrent Neural Nets in depth and describe how they have been used in Natural Language Processing. We will explain why gated networks and LSTM are used in practice.
Please note that some exposure or familiarity with Gradient Descent and Backpropagation will be assumed. These are covered in the first part of the talk for which both video and slides are available online.
A lot of material will be drawn from the new Deep Learning book by Goodfellow & Bengio as well as Michael Nielsen's online book on Neural Networks and Deep Learning as well several other online resources.
Bio
Pierre de Lacaze has over 20 years industry experience with AI and Lisp based technologies. He holds a Bachelor of Science in Applied Mathematics and a Master’s Degree in Computer Science.
https://www.linkedin.com/in/pierre-de-lacaze-b11026b/
We present the current activities of the German Climate Computing Center (DKRZ) related to the application of machine learning and deep learning in fundamental weather and climate research. We follow the Nature article "Deep learning and process understanding for data-driven Earth system science" (https://www.nature.com/articles/s41586-019-0912-1), elaborate on the hybrid model in the article "Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling" (https://arxiv.org/abs/1710.11431), and explain the recent application of Nvidia image inpainting in the reconstruction of missing temperature data (Kadow et al. (2020), "Artificial Intelligence reconstructs missing Climate Information" (in review)).
IEEE Student Branch Chittagong University arranged a webinar titled "From APECE to ASML: A Semiconductor Journey". Shawn Millat shared his experience working in the semiconductor industry and offered tips on studying in Germany.
This document summarizes image segmentation techniques using deep learning. It begins with an overview of semantic segmentation and instance segmentation. It then discusses several techniques for semantic segmentation, including deconvolution/transposed convolution for learnable upsampling, skip connections to combine predictions from different CNN depths, and dilated convolutions to increase the receptive field without losing resolution. For instance segmentation, it covers proposal-based methods like Mask R-CNN, and single-shot and recurrent approaches as alternatives to proposal-based models.
This document summarizes research on applying convolutional neural networks to natural language processing tasks. It describes how CNNs can be used to classify sentences and longer texts by representing words as vectors or one-hot encodings and applying convolutional and pooling layers. Pre-trained word vectors like GloVe and Word2Vec allow CNNs to capture key phrases for classification tasks. The document also outlines challenges like training CNNs on large datasets using character inputs and advances in libraries and hardware that will further CNN use for NLP.
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineSoma Boubou
Object recognition from RGB-D sensors has recently emerged as a prominent and challenging research topic, and current systems often require large amounts of time to train models and to classify new data. We propose an effective and fast object recognition approach for 3D data acquired from depth sensors such as the Structure or Kinect.
Our contribution in this work is a novel, fast, and effective approach for real-time object recognition from 3D depth data:
- First, we extract simple but effective frame-level features, which we name as differential frames, from the raw depth data.
- Second, we build a recognition system based on Extreme Learning Machine classifier with a Local Receptive Field (ELM-LRF).
Ijri ece-01-02 image enhancement aided denoising using dual tree complex wave...Ijripublishers Ijri
This paper presents a novel way to reduce noise introduced or exacerbated by image enhancement methods, in particular (but not only) algorithms based on the random spray sampling technique. According to the nature of sprays, output images of spray-based methods tend to exhibit noise with unknown statistical distribution. To avoid inappropriate assumptions about the statistical characteristics of that noise, a different assumption is made: the non-enhanced image is considered to be either free of noise or affected by non-perceivable levels of noise. Taking advantage of the higher sensitivity of the human visual system to changes in brightness, the analysis can be limited to the luma channel of both the non-enhanced and enhanced images. Also, given the importance of directional content in human vision, the analysis is performed through the dual-tree complex wavelet transform (DTWCT), a Lanczos interpolator, and edge-preserving smoothing filters. Unlike the discrete wavelet transform, the DTWCT allows for distinction of data directionality in the transform space. For each level of the transform, the standard deviation of the non-enhanced image's coefficients is computed across the six orientations of the DTWCT and then normalized.
Keywords: dual-tree complex wavelet transform (DTWCT), Lanczos interpolator, edge-preserving smoothing filters.
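The per-level statistic described above can be sketched in plain Python: take the standard deviation of the coefficients in each of the six orientations, then normalize the six values. The toy coefficient lists and function name are assumptions for the example; a real DTWCT produces complex coefficients over full subbands.

```python
import math

def orientation_stds(coeffs_by_orientation):
    """Std dev of coefficients per orientation, normalised so the values sum to one."""
    stds = []
    for coeffs in coeffs_by_orientation:
        mean = sum(coeffs) / len(coeffs)
        stds.append(math.sqrt(sum((c - mean) ** 2 for c in coeffs) / len(coeffs)))
    total = sum(stds)
    return [s / total for s in stds] if total else stds

# Six orientations of toy coefficients for one transform level
level = [[1, 3], [2, 2], [0, 4], [1, 1], [5, 1], [2, 0]]
weights = orientation_stds(level)
print(weights, sum(weights))
```

Normalizing across orientations lets the method compare directional activity within a level independently of the level's overall magnitude.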
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
This document discusses image restoration techniques for images degraded by space-variant blurs. It describes running sinusoidal transforms as a method for space-variant image restoration. Running transforms involve applying a short-time orthogonal transform within a moving window, allowing approximately stationary processing. This addresses limitations of methods that assume space-invariance or require coordinate transformations. The chapter presents running discrete sinusoidal transforms as a way to perform the space-variant restoration by modifying orthogonal transform coefficients within the window to estimate pixel values.
Fisheye Omnidirectional View in Autonomous DrivingYu Huang
This document discusses several papers related to using omnidirectional/fisheye camera views for autonomous driving applications. The papers propose methods for tasks like image classification, object detection, scene understanding from 360 degree camera data. Specific approaches discussed include graph-based classification of omnidirectional images, learning spherical convolutions for 360 degree imagery, spherical CNNs, and networks for scene understanding and 3D object detection using around view monitoring camera systems.
Materials Modelling: From theory to solar cells (Lecture 1)cdtpv
This document provides an overview of a mini-module on materials modelling for solar energy applications. It introduces the lecturers and outlines the course structure, which includes lectures on modelling, interfaces, and multi-scale approaches. It also describes a literature review activity where students will present a research paper using materials modelling in photovoltaics. Recommended textbooks are provided on topics like bonding in solids, computational chemistry, and density functional theory for solids.
1) The document discusses curvelet transformation and its application to object tracking. Curvelet transformation is a multiscale directional transform that can efficiently represent objects with curved edges using only a small number of coefficients.
2) It describes the stages of curvelet transformation including sub-band decomposition, smooth partitioning, renormalization, and ridgelet analysis. It also discusses the fast discrete curvelet transform implementation using unequally spaced fast Fourier transforms.
3) The proposed algorithm calculates the curvelet coefficients of frames to track objects based on the difference in curvelet energy between frames. Preliminary results on sample video frames are shown to demonstrate the calculation of curvelet coefficients.
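The energy-difference criterion in step 3 can be sketched in plain Python: sum the squared coefficients of each frame and flag motion when the difference exceeds a threshold. The coefficient values, threshold, and function names are assumptions for illustration; the paper computes these over full curvelet decompositions.

```python
def energy(coeffs):
    """Total energy of a coefficient set: sum of squared magnitudes."""
    return sum(c * c for c in coeffs)

def motion_detected(prev_coeffs, curr_coeffs, threshold):
    """Flag movement when the curvelet-energy difference between frames is large."""
    return abs(energy(curr_coeffs) - energy(prev_coeffs)) > threshold

frame_a = [0.5, 1.0, 0.2]   # toy coefficients for consecutive frames
frame_b = [0.5, 2.0, 0.2]
print(motion_detected(frame_a, frame_b, threshold=1.0))
```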
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...PyData
This document outlines the steps to build your own natural language processing (NLP) system, beginning with creating a streaming consumer, launching a message queue service, creating a data pre-processing service, serving an ML model, and publishing predictions to a messaging app. It discusses separating components for modularity and ease of testing/extensibility. The presenter recommends tools like Anaconda, Docker, Redis, Fast.ai and SpaCy and walks through setting up the environment and each step in a Jupyter notebook. The goal is to experiment with building your own end-to-end NLP system in a modular, reusable way.
Unit testing data with marbles - Jane Stewart Adams, Leif WalshPyData
In the same way that we need to make assertions about how code functions, we need to make assertions about data, and unit testing is a promising framework. In this talk, we'll explore what is unique about unit testing data, and see how Two Sigma's open source library Marbles addresses these unique challenges in several real-world scenarios.
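The flavour of unit-testing data can be sketched without any library: write assertions about a column's invariants the same way you would about a function's output. The helper below and its parameters are illustrative, not the Marbles API.

```python
def check_column(values, *, dtype, min_value=None, unique=False):
    """Assert simple invariants on a data column, unit-test style."""
    assert all(isinstance(v, dtype) for v in values), "wrong dtype"
    if min_value is not None:
        assert all(v >= min_value for v in values), "value below minimum"
    if unique:
        assert len(set(values)) == len(values), "duplicates found"
    return True

ages = [34, 29, 41]
ids = ["u1", "u2", "u3"]
print(check_column(ages, dtype=int, min_value=0),
      check_column(ids, dtype=str, unique=True))
```

Libraries like Marbles build on this idea by attaching rich, annotated failure messages that explain why a data assertion broke.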
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiPyData
TileDB is an open-source storage manager for multi-dimensional sparse and dense array data. It has a novel architecture that addresses some of the pain points in storing array data on “big-data” and “cloud” storage architectures. This talk will highlight TileDB’s design and its ability to integrate with analysis environments relevant to the PyData community such as Python, R, Julia, etc.
Using Embeddings to Understand the Variance and Evolution of Data Science... ...PyData
In this talk I will discuss exponential family embeddings, which are methods that extend the idea behind word embeddings to other data types. I will describe how we used dynamic embeddings to understand how data science skill-sets have transformed over the last 3 years using our large corpus of jobs. The key takeaway is that these models can enrich analysis of specialized datasets.
Deploying Data Science for Distribution of The New York Times - Anne BauerPyData
How many newspapers should be distributed to each store for sale every day? The data science group at The New York Times addresses this optimization problem using custom time series modeling and analytical solutions, while also incorporating qualitative business concerns. I'll describe our modeling and data engineering approaches, written in Python and hosted on Google Cloud Platform.
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaPyData
This document provides an introduction to graph theory concepts and working with graph data in Python. It begins with basic graph definitions and real-world graph examples. Various graph concepts are then demonstrated visually, such as vertices, edges, paths, cycles, and graph properties. Finally, it discusses working with graph data structures and algorithms in the NetworkX library in Python, including graph generation, analysis, and visualization. The overall goal is to introduce readers to graph theory and spark their interest in further exploration.
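A minimal taste of the graph concepts above, using only the standard library rather than NetworkX: represent the graph as an adjacency dict and find a shortest path with breadth-first search. The example graph is invented for illustration.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search over an adjacency-dict graph; returns a vertex list or None."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nbr in graph.get(path[-1], []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(path + [nbr])
    return None

g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
print(shortest_path(g, "a", "e"))  # ['a', 'b', 'd', 'e']
```

NetworkX wraps this kind of traversal (and much more) behind ready-made functions, plus generation and visualization tools.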
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...PyData
To productionize data science work (and have it taken seriously by software engineers, CTOs, clients, or the open source community), you need to write tests! Except… how can you test code that performs nondeterministic tasks like natural language parsing and modeling? This talk presents an approach to testing probabilistic functions in code, illustrated with concrete examples written for Pytest.
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroPyData
Those of us who use TensorFlow often focus on building the model that's most predictive, not the one that's most deployable. So how to put that hard work to work? In this talk, we'll walk through a strategy for taking your machine learning models from Jupyter Notebook into production and beyond.
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...PyData
In September 2017, dockless bikeshare joined the transportation options in the District of Columbia. In March 2018, scooter share followed. During the pilot of these technologies, Python has helped District Department of Transportation answer some critical questions. This talk will discuss how Python was used to answer research questions and how it supported the evaluation of this demonstration.
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottPyData
The document discusses how to avoid bad database surprises through early simulation and scalability testing. It provides examples of web and analytics apps that did not scale due to unanticipated database issues. It recommends using Python classes and JSON schema to define data models and generate synthetic test data. This allows simulating the full system early in development to identify potential performance bottlenecks before real data is involved.
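The schema-driven synthetic-data idea can be sketched in plain Python: read a minimal JSON-schema-like spec and emit reproducible fake records for early load testing. The schema shape and field names below are assumptions for the example.

```python
import random

def synthesize(schema, n, seed=0):
    """Generate n synthetic records from a minimal JSON-schema-like spec."""
    rng = random.Random(seed)       # seeded for reproducible test data
    rows = []
    for _ in range(n):
        row = {}
        for field, spec in schema["properties"].items():
            if spec["type"] == "integer":
                row[field] = rng.randint(spec["minimum"], spec["maximum"])
            elif spec["type"] == "string":
                row[field] = rng.choice(spec["enum"])
        rows.append(row)
    return rows

user_schema = {"properties": {
    "age": {"type": "integer", "minimum": 18, "maximum": 90},
    "plan": {"type": "string", "enum": ["free", "pro"]},
}}
rows = synthesize(user_schema, 3)
print(rows)
```

Scaling `n` up lets you probe database performance long before real data arrives.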
Machine learning often requires us to think spatially and make choices about what it means for two instances to be close or far apart. So which is best - Euclidean? Manhattan? Cosine? It all depends! In this talk, we'll explore open source tools and visual diagnostic strategies for picking good distance metrics when doing machine learning on text.
End-to-End Machine learning pipelines for Python driven organizations - Nick ...PyData
The recent advances in machine learning and artificial intelligence are amazing! Yet, in order to have real value within a company, data scientists must be able to get their models off of their laptops and deployed within a company’s data pipelines and infrastructure. In this session, I'll demonstrate how one-off experiments can be transformed into scalable ML pipelines with minimal effort.
We will use Beautiful Soup to web-scrape the IMDB website and build a function that returns a dictionary of metadata from the IMDB profile for any IMDB ID passed as an argument.
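Since Beautiful Soup may not be available everywhere, here is the same extraction idea sketched with the standard library's `html.parser`: pull the page title and `<meta>` tags into a dict. The sample HTML is invented; a real scraper would fetch the page over HTTP first.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect the page <title> and <meta name=... content=...> pairs."""
    def __init__(self):
        super().__init__()
        self.meta, self._in_title, self.title = {}, False, ""
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")
        elif tag == "title":
            self._in_title = True
    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
    def handle_data(self, data):
        if self._in_title:
            self.title += data

page = ('<html><head><title>Blade Runner (1982)</title>'
        '<meta name="description" content="A blade runner must pursue replicants.">'
        '</head></html>')
p = MetaExtractor()
p.feed(page)
print(p.title, p.meta)
```

Beautiful Soup offers the same capability with a friendlier API (CSS selectors, tag search) on top of such a parser.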
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...PyData
This talk describes an experimental approach to time series modeling using 1D convolution filter layers in a neural network architecture. This approach was developed at System1 for forecasting marketplace value of online advertising categories.
Extending Pandas with Custom Types - Will AydPyData
Pandas v.0.23 brought to life a new extension interface through which you can extend NumPy's type system. This talk will explain what that means in more detail and provide practical examples of how the new interface can be leveraged to drastically improve your reporting.
Machine learning models are increasingly used to make decisions that affect people’s lives. With this power comes a responsibility to ensure that model predictions are fair. In this talk I’ll introduce several common model fairness metrics, discuss their tradeoffs, and finally demonstrate their use with a case study analyzing anonymized data from one of Civis Analytics’s client engagements.
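One of the most common fairness metrics mentioned above, demographic parity, can be computed in a few lines of plain Python: compare positive-prediction rates across groups. The predictions and group labels are toy data for illustration.

```python
def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = {}
    for g in set(groups):
        preds = [p for p, gg in zip(predictions, groups) if gg == g]
        rates[g] = sum(preds) / len(preds)
    vals = sorted(rates.values())
    return vals[-1] - vals[0]

preds  = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_gap(preds, groups))  # 0.75 - 0.25 = 0.5
```

A gap of zero means both groups receive positive predictions at the same rate; the trade-off discussion comes from the fact that equalizing this metric can conflict with equalizing error rates.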
What's the Science in Data Science? - Skipper SeaboldPyData
The gold standard for validating any scientific assumption is to run an experiment. Data science isn’t any different. Unfortunately, it’s not always possible to design the perfect experiment. In this talk, we’ll take a realistic look at measurement using tools from the social sciences to conduct quasi-experiments with observational data.
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...PyData
Forecasting time-series data has applications in many fields, including finance, health, etc. There are potential pitfalls when applying classic statistical and machine learning methods to time-series problems. This talk will give folks the basic toolbox to analyze time-series data and perform forecasting using statistical and machine learning models, as well as interpret and convey the outputs.
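The simplest item in that forecasting toolbox, a rolling simple moving average, can be sketched in plain Python; the series, window, and horizon below are assumptions for the example, and real forecasts would add trend and seasonality handling.

```python
def sma_forecast(series, window, horizon):
    """Forecast `horizon` steps ahead with a rolling simple moving average."""
    history = list(series)
    out = []
    for _ in range(horizon):
        nxt = sum(history[-window:]) / window  # average of the last `window` points
        out.append(nxt)
        history.append(nxt)                    # feed the forecast back in
    return out

print(sma_forecast([10, 12, 14, 16], window=2, horizon=3))
```

One classic pitfall the talk alludes to is visible even here: feeding forecasts back into the history flattens the trend, so multi-step predictions drift toward a constant.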
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardPyData
A historical text may now be unreadable, because its language is unknown, or its script forgotten (or both), or because it was deliberately enciphered. Deciphering needs two steps: Identify the language, then map the unknown script to a familiar one. I’ll present an algorithm to solve a cartoon version of this problem, where the language is known, and the cipher is alphabet rearrangement.
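An even more cartoonish version of the cipher problem can be solved in a few lines: a rotation (Caesar) cipher rather than a full alphabet rearrangement, cracked by scoring all 26 shifts against a tiny word list. The dictionary, message, and function names are assumptions for the example.

```python
COMMON_WORDS = {"the", "and", "a", "to", "of", "map", "then", "language",
                "identify", "unknown", "script", "onto", "familiar", "one"}

def shift(text, k):
    """Shift each lowercase letter k places forward in the alphabet."""
    return "".join(chr((ord(c) - ord("a") + k) % 26 + ord("a")) if c.isalpha() else c
                   for c in text)

def word_score(text):
    """Count tokens that appear in the tiny dictionary."""
    return sum(w in COMMON_WORDS for w in text.split())

def crack_caesar(ciphertext):
    """Try all 26 shifts and keep the most dictionary-like candidate."""
    return max((shift(ciphertext, k) for k in range(26)), key=word_score)

msg = "identify the language then map the unknown script onto a familiar one"
print(crack_caesar(shift(msg, 7)) == msg)
```

General substitution ciphers need a larger search (e.g. hill climbing over key permutations), but the scoring idea is the same.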
Deprecating the state machine: building conversational AI with the Rasa stack...PyData
Rasa NLU & Rasa Core are the leading open source libraries for building machine learning-based chatbots and voice assistants. In this live-coding workshop you will learn the fundamentals of conversational AI and how to build your own using the Rasa Stack.
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxAnoop Ashok
In today's fast-paced retail environment, efficiency is key. Every minute counts, and every penny matters. One tool that can significantly boost your store's efficiency is a well-executed planogram. These visual merchandising blueprints not only enhance store layouts but also save time and money in the process.
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul
Artificial intelligence is changing how businesses operate. Companies are using AI agents to automate tasks, reduce time spent on repetitive work, and focus more on high-value activities. Noah Loul, an AI strategist and entrepreneur, has helped dozens of companies streamline their operations using smart automation. He believes AI agents aren't just tools—they're workers that take on repeatable tasks so your human team can focus on what matters. If you want to reduce time waste and increase output, AI agents are the next move.
How Can I use the AI Hype in my Business Context?Daniel Lehner
Is AI just hype? Or is it the game changer your business needs?
Everyone’s talking about AI but is anyone really using it to create real value?
Most companies want to leverage AI. Few know 𝗵𝗼𝘄.
✅ What exactly should you ask to find real AI opportunities?
✅ Which AI techniques actually fit your business?
✅ Is your data even ready for AI?
If you’re not sure, you’re not alone. This is a condensed version of the slides I presented at a Linkedin webinar for Tecnovy on 28.04.2025.
TrsLabs - Fintech Product & Business ConsultingTrs Labs
Hybrid Growth Mandate Model with TrsLabs
Strategic investments, inorganic growth, and business-model pivoting are critical activities that businesses don't undertake every day. In cases like this, it may benefit your business to engage a temporary external consultant.
An unbiased plan driven by clear-cut deliverables and market dynamics, free from the influence of your internal office politics, empowers business leaders to make the right choices.
Getting things done within a budget and a timeframe is key to growing a business, whether you are a start-up or a big company.
Talk to us & Unlock the competitive advantage
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc
Most consumers believe they’re making informed decisions about their personal data—adjusting privacy settings, blocking trackers, and opting out where they can. However, our new research reveals that while awareness is high, taking meaningful action is still lacking. On the corporate side, many organizations report strong policies for managing third-party data and consumer consent yet fall short when it comes to consistency, accountability and transparency.
This session will explore the research findings from TrustArc’s Privacy Pulse Survey, examining consumer attitudes toward personal data collection and practical suggestions for corporate practices around purchasing third-party data.
Attendees will learn:
- Consumer awareness around data brokers and what consumers are doing to limit data collection
- How businesses assess third-party vendors and their consent management operations
- Where business preparedness needs improvement
- What these trends mean for the future of privacy governance and public trust
This discussion is essential for privacy, risk, and compliance professionals who want to ground their strategies in current data and prepare for what’s next in the privacy landscape.
This is the keynote of the Into the Box conference, highlighting the release of the BoxLang JVM language, its key enhancements, and its vision for the future.
The Face of Nanomaterials: Insightful Classification Using Deep Learning - Angelo Ziletti
1. The Face of Nanomaterials: Insightful Classification Using Deep Learning
Dr. Angelo Ziletti
Deputy Group Leader in Data Science for Materials
Fritz Haber Institute of the Max Planck Society
Berlin, Germany
July 8th, 2018
3. What is a nanomaterial?
● Ruled by the laws of quantum physics
● International Organization for Standardization (ISO): "Material with any external dimension in the nanoscale or having internal structure in a size range from approximately 1 nm to 100 nm."
(For scale: a human hair is approximately 80,000 to 100,000 nanometers wide.)
4. Why are nanomaterials important?
● LEDs: Nobel Prize in Physics 2014 (blue LED)
● Lasers: Nobel Prize in Physics 1964, 1981
● Computers: Nobel Prize in Physics 1956 (transistor)
● Levitating trains: Nobel Prize in Physics 1972 (theory of superconductivity)
● … and many others
5. An example: two-dimensional materials
● Graphene (Nobel Prize in Physics 2010):
– single layer of graphite (carbon), one atom thick
– strongest material ever discovered (tensile strength = 130 GPa)
– lowest known resistivity at room temperature
– better heat conductor than silver and copper
– 97% transparent
[Figure: model, experiment, fabrication]
7. The goal
● Given an atomic arrangement in a nanomaterial, determine the ("most similar") prototype among the following classes: simple cubic, face-centered cubic, body-centered cubic, diamond, hexagonal, rhombohedral, body-centered tetragonal (139), and body-centered tetragonal (141)
8. Structures are quite (very?) similar
[Figure: simple cubic, body-centered cubic, and face-centered cubic unit cells]
9. Structures are quite (very?) similar
[Figure: the same three structures, shown in more detail]
Ref: B. A. Averill and P. Eldredge, Chemistry: Principles, Patterns, and Applications, Prentice Hall (2007)
10. And with atom removals/deformations...
[Figure: defective simple cubic, body-centered cubic, and face-centered cubic structures]
12. Feature engineering for periodic 3D objects
● Nanomaterials are complex, non-rigid, three-dimensional objects with periodically repeated structures (like the bricks of a house)
● A good representation of nanomaterials must be:
– invariant with respect to system size
– stable with respect to deformations and atom removal
[Figure: perfect structure; 25% of atoms removed; random deformation]
13. Feature engineering for periodic 3D objects
● … and ideally:
– the representation is compact
– nanomaterials belonging to the same class have a similar representation
● Could we instead learn the symmetries by data augmentation? For each structure we would need to provide:
– nanomaterials of different sizes
– all (!) distorted configurations
→ a huge amount of data, and no learning guarantee
14. The diffraction fingerprint: intuition
● Rotate the crystal structure by 45° and −45° about the x, y, and z axes
● Calculate the diffraction pattern (~Fourier transform) for each rotation:
– around the x-axis
– around the y-axis
– around the z-axis
● Sum the results into an RGB image
[Figure: crystal structure to classify → simulated radiation → diffraction fingerprint]
Ziletti et al., Nature Communications, in press; arXiv:1709.02298 (2018).
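The rotate/transform/sum recipe above can be sketched in numpy. This is a toy stand-in, not the paper's actual diffraction simulation: it bins a rotated point cloud into a 2D density and uses the squared FFT magnitude as a kinematic diffraction proxy; the function names and grid size are illustrative assumptions.

```python
import numpy as np

def rotation_matrix(axis, angle_deg):
    """Rotation matrix about the x, y, or z axis."""
    t = np.deg2rad(angle_deg)
    c, s = np.cos(t), np.sin(t)
    mats = {
        "x": [[1, 0, 0], [0, c, -s], [0, s, c]],
        "y": [[c, 0, s], [0, 1, 0], [-s, 0, c]],
        "z": [[c, -s, 0], [s, c, 0], [0, 0, 1]],
    }
    return np.array(mats[axis])

def diffraction_fingerprint(positions, grid=64):
    """Toy RGB diffraction fingerprint: one color channel per rotation axis."""
    rgb = np.zeros((grid, grid, 3))
    for channel, axis in enumerate("xyz"):
        for angle in (45.0, -45.0):
            rotated = positions @ rotation_matrix(axis, angle).T
            # Project onto the xy-plane and bin the atoms into a 2D density.
            density, _, _ = np.histogram2d(
                rotated[:, 0], rotated[:, 1], bins=grid,
                range=[[-1, 1], [-1, 1]])
            # Squared FT magnitude ~ kinematic diffraction intensity.
            rgb[:, :, channel] += np.abs(np.fft.fftshift(np.fft.fft2(density)))**2
    return rgb / rgb.max()

# Example: a small simple-cubic point cloud (5x5x5 lattice).
pts = np.array(np.meshgrid(*[np.linspace(-0.8, 0.8, 5)]*3)).reshape(3, -1).T
img = diffraction_fingerprint(pts)
print(img.shape)  # (64, 64, 3)
```

Summing the ±45° rotations per axis into separate color channels gives a single fixed-size image regardless of how many atoms the structure contains, which is the size-invariance property the previous slide asks for.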
15. The diffraction fingerprint: results
[Figure: diffraction fingerprints for simple cubic, face-centered cubic, diamond, body-centered cubic, rhombohedral/hexagonal, body-centered tetragonal (139), and body-centered tetragonal (141)]
Ziletti et al., Nature Communications, in press; arXiv:1709.02298 (2018).
18. Prediction model: neural network
● A standard n-layer neural network applies to the input data a succession of linear and non-linear transformations, f(x) = σ_n(W_n σ_{n−1}(… σ_1(W_1 x + b_1) …) + b_n), where:
– σ_i are non-linear operators: ReLU, sigmoid, max-pooling, softmax
– W_i, b_i are weight matrices and bias vectors
● Neural networks have been extremely successful in a large variety of tasks (computer vision, speech recognition, machine translation, etc.)
● For image recognition: convolutional neural networks (ConvNets) [1]
[1] LeCun et al., Neural Comput. 1, 541 (1989)
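Such an alternating affine/non-linear forward pass can be sketched in a few lines of numpy. The layer sizes and random weights below are illustrative assumptions, not values from the talk:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def forward(x, params):
    """n-layer network: alternate affine maps (W_i x + b_i) with nonlinearities."""
    *hidden, (W_out, b_out) = params
    for W, b in hidden:
        x = relu(W @ x + b)
    return softmax(W_out @ x + b_out)  # class probabilities

# Toy 2-layer network: 3 inputs -> 4 hidden units -> 2 classes.
rng = np.random.default_rng(0)
params = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(2, 4)), np.zeros(2))]
probs = forward(np.array([1.0, -0.5, 2.0]), params)
print(round(probs.sum(), 6))  # 1.0 (softmax outputs sum to one)
```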
19. ConvNets: human analogy
How do we (humans) subconsciously classify an image? By looking for identifiable (pre-learned) features (e.g., for dogs: paws, four legs).
How does a computer classify an image? By looking at low-level features (edges and curves), and then building more abstract concepts through a series of (convolutional) layers.
20. Computing a convolution
● Slide the kernel throughout the image
● For each position in the image:
– element-wise multiplication between the image patch and the kernel
– sum of all elements (within the region)
[Figure: input, kernel, and output of a 2D convolution]
Ref: V. Dumoulin, F. Visin, A guide to convolution arithmetic for deep learning, https://arxiv.org/abs/1603.07285 (2016)
21. Computing a convolution: example
[Figure: step-by-step numerical example of a 2D convolution]
Ref: V. Dumoulin, F. Visin, A guide to convolution arithmetic for deep learning, https://arxiv.org/abs/1603.07285 (2016)
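The slide-multiply-sum procedure above translates directly to code. A minimal sketch (stride 1, no padding, single channel; following ConvNet convention, the kernel is not flipped, i.e. this is cross-correlation):

```python
import numpy as np

def conv2d(image, kernel):
    """2D 'valid' convolution: slide the kernel over the image,
    multiply element-wise with each patch, and sum the products."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])  # a simple diagonal-difference filter
# Each output entry is image[i, j] minus its lower-right neighbor,
# which is -5.0 everywhere for this ramp image.
print(conv2d(image, kernel))
```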
24. Convolutional layer recap
● Convolution is spatial filtering
● Different filters (weights) extract different characteristics of the input → use multiple filters
● The complexity of the filters increases layer by layer
● Filters are learned by minimizing the training error
● Multiple convolutional layers:
– 1st layer: input = image → low-level filters (e.g. curved or straight edges)
– 2nd layer: input = activation maps → higher-level filters (e.g. semicircles: curves + straight edges, squares)
– nth layer: high-level filters (e.g. a face)
25. Pooling layer
● Replaces the output at a certain location with a summary statistic of the nearby outputs
● Makes the representation smaller (downsampling)
● Different pooling operations: e.g. max pooling, average pooling
● It is not crucial and can be avoided
Images from Stanford CS231n: Convolutional Neural Networks for Visual Recognition (http://cs231n.github.io/convolutional-networks/)
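As a concrete illustration of the "summary statistic" idea, here is a minimal non-overlapping max-pooling sketch in numpy (the function name is mine):

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling: keep the maximum of each size x size block."""
    h, w = x.shape
    h, w = h - h % size, w - w % size        # trim so blocks divide evenly
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))           # max within each block

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 5.],
              [0., 1., 7., 2.],
              [2., 6., 3., 4.]])
# The maximum of each 2x2 block: [[4, 5], [6, 7]]
print(max_pool2d(x))
```

Average pooling would replace `blocks.max(...)` with `blocks.mean(...)`; either way the 4x4 input is downsampled to 2x2.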
28. The pristine dataset
● Dataset 1:
– includes ~90 chemical elements
– different nanomaterial sizes
● Dataset numbers:
– 10,517 images; 7 classes
– 90% training, 10% validation (random split)
– ConvNet runtime: training ~80 min; prediction ~70 ms per image
● Results: training accuracy 100.0%, validation accuracy 100.0%
29. The defective dataset (test set)
● Dataset 2: dataset 1 with added defects
– random displacements: up to a standard deviation of 0.06 Å
– random vacancies: up to 25%
– random substitutions (change the type of an atom, e.g. C → H)
● Dataset numbers: 105,170 images; 7 classes
● Results: no training on this set; test accuracy 100.0%
32. Comparison with the materials science state of the art
● Our deep learning-based method outperforms the state-of-the-art approach (spglib)
● "Fairness" note: we cover a smaller number of material classes (so far), and we need correctly labeled (!) training data
Spglib: Grosse-Kunstleve, Acta Crystallographica A 55, 383 (1999); A. Togo, https://atztogo.github.io/spglib/ (2009)
Deep learning-based: Ziletti et al., Nature Communications, in press; arXiv:1709.02298 (2018)
34. Back-projection to image space
● Project feature activations back to the input pixel space
Method: Zeiler and Fergus, European Conf. on Computer Vision, Springer, 2014.
35. "Going backwards" in a convolutional layer
● Forward pass: input to layer → convolution → nonlinearity → pooling → next layer
● Going backwards (reconstruction): layer above reconstruction → unpooling → nonlinearity → fractionally strided convolution → reconstruction
● The fractionally strided convolution is also called:
– transposed convolution
– backward strided convolution
– "deconvolution"
● In TensorFlow: tf.nn.conv2d_transpose
Method: Zeiler and Fergus, European Conf. on Computer Vision, Springer, 2014.
Transposed convolution: Im et al., Generating images with recurrent adversarial networks, arXiv:1602.05110 (2016)
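A stride-1 transposed convolution can be sketched directly in numpy; this is an illustrative stand-in for what tf.nn.conv2d_transpose computes, not the library implementation (single channel, no padding, function name mine):

```python
import numpy as np

def conv2d_transpose(x, kernel):
    """Transposed ('fractionally strided') convolution, stride 1:
    each input value scatters a weighted copy of the kernel into the output,
    and the output grows by (kernel size - 1) along each axis."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.zeros((h + kh - 1, w + kw - 1))
    for i in range(h):
        for j in range(w):
            out[i:i+kh, j:j+kw] += x[i, j] * kernel
    return out

x = np.array([[1., 2.],
              [3., 4.]])
k = np.ones((2, 2))
# Overlapping scattered copies sum up: [[1, 3, 2], [4, 10, 6], [3, 7, 4]]
print(conv2d_transpose(x, k))
```

Note the 2x2 input becomes a 3x3 output: the transposed convolution upsamples, which is exactly what the reconstruction pass needs to get back to image resolution.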
36. Attentive response maps: forward pass
● Forward pass of the image through the network
– for each pooling layer: store the pool switches
– for the convolutional layer of interest (e.g. the last one):
● calculate the filters' activations
● order the filters by activation value
– select the top most-activated filters
[Figure: input image → Conv Layer 1 → Conv Layer 2 → … → Last Conv Layer → FC Layers → classification]
Method: Zeiler and Fergus, European Conf. on Computer Vision, Springer, 2014.
Application to anatomy classification: Kumar et al., IEEE Int. Symp. on Biomed. Imaging, arXiv:1611.06284 (2018)
Application to materials science: Ziletti et al., Nature Communications, in press; arXiv:1709.02298 (2018)
37. Attentive response maps: back-projection
● Back-project the top most-activated filters to image space
– for max-pooling layers → unpooling
– for convolutional layers → fractionally strided convolution
[Figure: back-projection from the Last Conv Layer through Conv Layer 2 and Conv Layer 1 to the input image]
Method: Zeiler and Fergus, European Conf. on Computer Vision, Springer, 2014.
Application to anatomy classification: Kumar et al., IEEE Int. Symp. on Biomed. Imaging, arXiv:1611.06284 (2018)
Application to materials science: Ziletti et al., Nature Communications, in press; arXiv:1709.02298 (2018)
38. Attentive response maps: per-pixel max
● Compute the per-pixel max of the back-projected maps: the individual response maps combine into a single attentive response map
[Figure: individual response maps → per-pixel max → attentive response map]
Method: Zeiler and Fergus, European Conf. on Computer Vision, Springer, 2014.
Application to anatomy classification: Kumar et al., IEEE Int. Symp. on Biomed. Imaging, arXiv:1611.06284 (2018)
Application to materials science: Ziletti et al., Nature Communications, in press; arXiv:1709.02298 (2018)
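The per-pixel max step is a one-liner in numpy; the response-map values below are made-up placeholders for illustration:

```python
import numpy as np

# Hypothetical back-projected response maps for the 3 top filters (2x2 images).
maps = np.array([[[0.1, 0.9], [0.2, 0.0]],
                 [[0.5, 0.1], [0.8, 0.3]],
                 [[0.0, 0.4], [0.1, 0.7]]])

# Per-pixel max across the filter axis combines them into one attentive map.
attentive = maps.max(axis=0)
print(attentive)  # each pixel keeps the strongest response: [[0.5, 0.9], [0.8, 0.7]]
```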
39. Attentive response maps (summary)
● Forward pass of the image
– for each pooling layer: store the pool switches
– for the convolutional layer of interest (e.g. the last one): calculate the filters' activations, order the filters by activation value, and select the top most-activated filters
● Back-project the top most-activated filters to image space
– for max-pooling layers → unpooling
– for convolutional layers → fractionally strided convolution
● Compute the per-pixel max of these back-projected maps
Method: Zeiler and Fergus, European Conf. on Computer Vision, Springer, 2014.
Application to anatomy classification: Kumar et al., IEEE Int. Symp. on Biomed. Imaging, arXiv:1611.06284 (2018)
Application to materials science: Ziletti et al., Nature Communications, in press; arXiv:1709.02298 (2018)
40. Understanding ConvNets
Attentive response maps, with Devinder Kumar (University of Waterloo, Canada)
[Figure: input image and attentive response maps (green/red) for layers 1 through 6]
Ziletti et al., Nature Communications, in press; arXiv:1709.02298 (2018).
42. What did the ConvNet learn?
● Sum of the attentive response maps of the last convolutional layer. Our ConvNet:
– has learned nanomaterial templates automatically from the data
– uses the same landmarks a materials scientist would use, although never explicitly instructed to do so
43. Summary
● The challenge
● How to represent a nanomaterial
● Convolutional networks
● Opening the black box
46. Insightful Classification of Crystal Structures Using Deep Learning
Dr. Angelo Ziletti
Fritz Haber Institute of the Max Planck Society, Berlin, Germany
Ziletti et al., Nature Communications, in press (2018). Online: https://arxiv.org/abs/1709.02298
[email protected]