Hit and Lead Discovery with Explorative RL and
Fragment-based Molecule Generation
Soojung Yang, Doyeong Hwang, Seul Lee,
Seongok Ryu, Sung Ju Hwang
AITRICS and KAIST
Automated small molecule drug discovery
[Figure: hits found in a screening library, shown within chemical space]
• Searching for novel hits is a critical task in drug discovery.
Hits
• Molecules with high therapeutic potential
• High binding affinity to a given protein target
[Figure: small molecule + protein target ➞ protein-ligand complex; ΔG = binding free energy]
2
Automated small molecule drug discovery
• Searching for novel hits is a critical task in drug discovery.
• RL methods can be used for automated hit search.
[Figure: screening library, hits, and an RL agent in chemical space]
3
Automated small molecule drug discovery
• Searching for novel hits is a critical task in drug discovery.
• RL methods can be used for automated hit search.
• Molecular docking simulation estimates protein-ligand binding affinity.
• We can guide RL agents with docking scores as reward.
[Figure: the RL agent explores chemical space; the docking simulation score of each molecule is fed back as reward]
4
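Docking programs such as AutoDock Vina typically report scores in kcal/mol, where more negative means stronger predicted binding, so the raw score has to be flipped before an RL agent can maximize it. The sketch below is a minimal, hypothetical reward transform (negate, then floor at zero); `docking_reward` and the flooring rule are illustrative assumptions, not necessarily the reward used by FREED.

```python
def docking_reward(docking_score: float, floor: float = 0.0) -> float:
    """Turn a docking score into a reward where larger is better.

    Assumes the common docking convention that scores are in kcal/mol and
    more negative means stronger predicted binding. Negating and flooring
    is a hypothetical choice, not necessarily FREED's reward transform.
    """
    reward = -docking_score
    # A positive score means unfavorable predicted binding; never reward it.
    return max(reward, floor)


if __name__ == "__main__":
    for score in (-11.2, -7.5, 0.4):
        print(f"docking score {score:+.1f} -> reward {docking_reward(score):.1f}")
```

Stronger predicted binders receive larger rewards, which is the only property the agent needs; any rescaling or clipping on top of this is a tuning choice.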
Drug molecules should satisfy strong structural constraints
[Figure: a few examples of structural alerts in the PAINS filter]
• Drug molecules should not have toxic or highly reactive substructures.
• Widely used medicinal chemistry filters include several hundred diverse structural alerts.
[Figure: within chemical space, only a subset of hits are high-quality hits]
5
Drug molecules should satisfy strong structural constraints
• Previous molecular generative models guided by implicit methods cannot entirely avoid inappropriate structures.
• e.g., multi-objective optimization where
  total reward = docking score reward + structure penalty reward
➠ An explicit method that constrains the generation space to acceptable molecules is necessary:
“Fragment-based molecular generation”
[Figure: hits and high-quality hits within chemical space]
6
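A toy calculation with made-up numbers shows why the implicit scheme cannot guarantee alert-free output: whenever a molecule's docking reward exceeds a clean competitor's by more than the penalty, the flagged molecule still wins. Modeling the structure penalty reward as a fixed negative amount per alert (`penalty_per_alert`) is an assumption for illustration.

```python
def total_reward(docking_reward: float, n_alerts: int, penalty_per_alert: float = 2.0) -> float:
    """Implicit constraint: total reward = docking score reward + structure penalty reward.

    The structure penalty reward is modeled here as -penalty_per_alert per
    structural alert; both the form and the weight are hypothetical.
    """
    structure_penalty_reward = -penalty_per_alert * n_alerts
    return docking_reward + structure_penalty_reward


# Made-up numbers: A binds strongly but carries one structural alert,
# B is alert-free but binds less strongly.
reward_a = total_reward(docking_reward=12.0, n_alerts=1)  # 12.0 - 2.0 = 10.0
reward_b = total_reward(docking_reward=9.0, n_alerts=0)   # 9.0

# The soft penalty still ranks the flagged molecule first, so an agent that
# maximizes this reward is not prevented from generating it.
print(reward_a > reward_b)  # True
```

Raising the penalty weight only moves the break-even point; an explicit constraint instead removes molecules like A from the generation space altogether.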
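Our fragment-based molecular generation method
[Figure: one generation step. Starting from the current state molecule S_t and its possible attachment sites, the agent takes Action 1, Action 2, and Action 3 using a pharmacochemically acceptable fragment library, attaching the augmented fragment to obtain the next state molecule S_t+1]
7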
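The step in the figure can be sketched on a toy molecular graph in which `*` atoms mark attachment sites. This is a minimal sketch under stated assumptions: the `Graph` class, the `attach` function, and the mapping of Action 1 / 2 / 3 to "site on S_t / fragment / site on the fragment" are our hypothetical reading of the figure, and bond orders are ignored (single bonds only).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Graph:
    """Toy molecular graph: atom symbols plus undirected single bonds.

    '*' marks an attachment site. Hypothetical representation, not FREED's.
    """
    atoms: List[str]
    bonds: List[Tuple[int, int]] = field(default_factory=list)

    def sites(self) -> List[int]:
        return [i for i, a in enumerate(self.atoms) if a == "*"]

    def neighbor(self, i: int) -> int:
        # An attachment site has exactly one neighbor: the atom it hangs off.
        (j,) = [b if a == i else a for a, b in self.bonds if i in (a, b)]
        return j


def attach(mol: Graph, mol_site: int, frag: Graph, frag_site: int) -> Graph:
    """One step: join `frag` to `mol` at the chosen attachment sites.

    Assumed action mapping: Action 1 -> mol_site, Action 2 -> frag,
    Action 3 -> frag_site.
    """
    if mol.atoms[mol_site] != "*" or frag.atoms[frag_site] != "*":
        raise ValueError("both actions must select an attachment site ('*')")
    offset = len(mol.atoms)
    atoms = mol.atoms + frag.atoms
    bonds = mol.bonds + [(a + offset, b + offset) for a, b in frag.bonds]
    # Bond the two real atoms that carried the sites ...
    bonds.append((mol.neighbor(mol_site), frag.neighbor(frag_site) + offset))
    # ... then delete both '*' atoms and renumber the remaining atoms.
    drop = {mol_site, frag_site + offset}
    keep = [i for i in range(len(atoms)) if i not in drop]
    new_index = {old: new for new, old in enumerate(keep)}
    return Graph(
        atoms=[atoms[i] for i in keep],
        bonds=[(new_index[a], new_index[b]) for a, b in bonds
               if a not in drop and b not in drop],
    )


# Usage: ethyl fragment with one site + hydroxyl fragment with one site.
ethyl = Graph(["C", "C", "*"], [(0, 1), (1, 2)])
hydroxyl = Graph(["*", "O"], [(0, 1)])
ethanol = attach(ethyl, 2, hydroxyl, 0)
print(ethanol.atoms, ethanol.bonds)  # ['C', 'C', 'O'] [(0, 1), (1, 2)]
```

Because every action must land on a `*` site, the action space at each step is finite and fully determined by the current molecule and the fragment library.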
Markovian embedding and policy networks make the model suitable for both hit generation and scaffold-based generation
• The embeddings of the molecule and fragments are autoregressively passed to the policy network.
• The state embedding network and the policy network are Markov models.
• Any arbitrary molecule can be the current state.
➠ The model can be used for scaffold-based generation as well as hit generation.
8
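Because the policy conditions only on the current state, an episode can start from any molecule. The generic loop below is a minimal sketch of that idea; `rollout`, the toy policy, and the string-append "attach" step are hypothetical stand-ins (the appended strings are not meant to be chemically meaningful). The benzene seed mirrors the hit-generation case study; the indole scaffold is an arbitrary example.

```python
import random
from typing import Callable, List, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def rollout(
    initial_state: State,
    policy: Callable[[State], Action],
    step: Callable[[State, Action], State],
    max_steps: int = 3,
) -> List[State]:
    """Run one episode with a policy that sees only the current state.

    Nothing depends on how the initial state was produced, so the same loop
    serves hit generation (small seed) and scaffold-based generation
    (a known scaffold as the seed).
    """
    trajectory = [initial_state]
    state = initial_state
    for _ in range(max_steps):
        action = policy(state)  # Markov: conditions on `state` only
        state = step(state, action)
        trajectory.append(state)
    return trajectory


# Hypothetical toy stand-ins: states are SMILES strings, the "policy" picks
# a random one-atom fragment, and "attaching" appends it to the string.
rng = random.Random(0)


def toy_policy(state: str) -> str:
    return rng.choice(["C", "O", "N"])


def toy_step(state: str, fragment: str) -> str:
    return state + fragment


hit_run = rollout("c1ccccc1", toy_policy, toy_step)            # benzene seed
lead_run = rollout("c1ccc2[nH]ccc2c1", toy_policy, toy_step)   # indole scaffold seed
print(hit_run[-1], lead_run[-1])
```

Switching between hit and lead generation changes only the `initial_state` argument; the policy and step function are untouched.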
Connectivity-preserving generation prevents unrealistic bonds
• We preserve the connectivity information of fragmented molecules as explicit attachment sites.
• In this way, our model can avoid generating molecules with unrealistic bonds.
[Figure: extending pyrrole. New bond formation on N gives pyrrolium; new bond formation on C gives 2-methylpyrrole]
9
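One way to see how attachment sites preserve connectivity: when a molecule is cut into fragments, each broken bond leaves a `*` on both former ends, so later assembly can only re-form bonds where a real molecule had one. The sketch below is a toy illustration, not FREED's fragmentation procedure; `cut_bond` and `fragments` are hypothetical helpers on a plain atom/bond-list graph. Cutting the methyl off 2-methylpyrrole leaves the ring's site on carbon, never on nitrogen.

```python
from collections import deque
from typing import List, Tuple

Bond = Tuple[int, int]


def cut_bond(atoms: List[str], bonds: List[Bond], cut: Bond) -> Tuple[List[str], List[Bond]]:
    """Break one bond and cap both former ends with a '*' attachment site."""
    i, j = cut
    if {i, j} not in [set(b) for b in bonds]:
        raise ValueError(f"bond {cut} is not in the molecule")
    kept = [b for b in bonds if set(b) != {i, j}]
    star_i, star_j = len(atoms), len(atoms) + 1
    return atoms + ["*", "*"], kept + [(i, star_i), (j, star_j)]


def fragments(n_atoms: int, bonds: List[Bond]) -> List[List[int]]:
    """Connected components (breadth-first search) = the resulting fragments."""
    adjacency = {k: [] for k in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen, groups = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        seen.add(start)
        group, queue = [], deque([start])
        while queue:
            u = queue.popleft()
            group.append(u)
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        groups.append(sorted(group))
    return groups


# 2-methylpyrrole: ring N0-C1-C2-C3-C4, methyl carbon C5 on C1.
atoms = ["N", "C", "C", "C", "C", "C"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 5)]
atoms, bonds = cut_bond(atoms, bonds, (1, 5))
print(fragments(len(atoms), bonds))  # [[0, 1, 2, 3, 4, 6], [5, 7]]
```

The ring fragment's site (atom 6) hangs off C1, so reattaching a methyl there regenerates 2-methylpyrrole rather than a charged N-substituted ring.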
Explorative RL algorithm improves model performance in the strongly constrained generation space
[Figure: an RL agent in chemical space with several acceptable chemical spaces; good exploration]
• A strong constraint on the generation space makes the search space less smooth ➠ solutions can be trapped in local optima.
• A practical drug discovery model should find as many diverse optimal regions of chemical space as possible.
Our strategies
• Employ soft actor-critic (SAC), an off-policy actor-critic algorithm based on maximum entropy RL, which encourages exploration.
• Devise explorative algorithms based on the prioritized experience replay (PER) method.
10
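The exploration pressure in maximum entropy RL comes from an entropy bonus in the value function. For a discrete action set, the soft state value is V(s) = Σ_a π(a|s) [Q(s,a) − α log π(a|s)], and for fixed Q the policy that maximizes it is softmax(Q/α). The sketch below only illustrates this objective with hypothetical helper names; it does not describe FREED's actual SAC networks or how they handle the three-part action.

```python
import math
from typing import List


def soft_value(q_values: List[float], probs: List[float], alpha: float) -> float:
    """Soft state value for a discrete action set (maximum-entropy RL):

        V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s))

    The -alpha * log pi term is an entropy bonus: spreading probability over
    more actions raises V, which is what pushes the agent to explore.
    """
    return sum(p * (q - alpha * math.log(p)) for q, p in zip(q_values, probs) if p > 0)


def boltzmann_policy(q_values: List[float], alpha: float) -> List[float]:
    """For fixed Q, the policy maximizing soft_value is softmax(Q / alpha)."""
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - q_max) / alpha) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


q = [1.0, 1.0, 1.0]
print(soft_value(q, [1.0, 0.0, 0.0], alpha=0.5))   # greedy: 1.0
print(soft_value(q, [1 / 3] * 3, alpha=0.5))       # uniform: 1 + 0.5 * ln 3
```

With equal Q values, the uniform policy earns an extra α·ln 3 over the greedy one; the temperature α controls how strongly the agent trades reward for this diversity.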
Our PER algorithm encourages the agent to visit novel states
Original PER [Schaul et al.]
• Goal: Sample-efficient exploration
• Priority p_t is a measure of how much additional information we can learn from the transition.
• p_t = TD error of the agent’s value estimate (Q or V function)
[Schaul et al.] Prioritized Experience Replay, ICLR 2016
11
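In the proportional variant of PER from Schaul et al., each stored transition gets priority p_i = |δ_i| + ε, is sampled with probability P(i) = p_i^α / Σ_k p_k^α, and its update is scaled by an importance-sampling weight w_i = (N·P(i))^(−β) / max_j w_j. The sketch below implements those formulas over plain lists; the TD errors are made up and the defaults α = 0.6, β = 0.4 are illustrative.

```python
import random
from typing import List, Sequence


def td_priorities(td_errors: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Proportional PER: p_i = |delta_i| + eps (eps keeps every transition sampleable)."""
    return [abs(d) + eps for d in td_errors]


def sampling_probs(priorities: Sequence[float], alpha: float = 0.6) -> List[float]:
    """P(i) = p_i^alpha / sum_k p_k^alpha; alpha = 0 recovers uniform replay."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs: Sequence[float], beta: float = 0.4) -> List[float]:
    """w_i = (N * P(i))^(-beta), normalized by the max weight, to correct the sampling bias."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    w_max = max(weights)
    return [w / w_max for w in weights]


td_errors = [0.1, -2.0, 0.5, 0.0]          # made-up TD errors for 4 stored transitions
probs = sampling_probs(td_priorities(td_errors))
weights = importance_weights(probs)
batch = random.Random(0).choices(range(len(probs)), weights=probs, k=8)
print([round(p, 3) for p in probs], batch)
```

Schaul et al. store priorities in a sum tree so sampling costs O(log N); the linear scan here is only for readability.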
Our PER algorithm encourages the agent to visit novel states
Original PER [Schaul et al.]
• Goal: Sample-efficient exploration
• Priority p_t is a measure of how much additional information we can learn from the transition.
• p_t = TD error of the agent’s value estimate (Q or V function)
Ours: PER(PE), PER(BU)
• Goal: Encourage sufficient diversity
• Priority p_t is a measure of the novelty of the state.
• PER(PE): p_t = predictive error of the reward estimator
• PER(BU): p_t = Bayesian uncertainty of the reward estimate
[Figure: a reward predictor outputs r̂_t for the observed reward r_t; the predictive error gives the priority p_t = |r̂_t − r_t|]
[Schaul et al.] Prioritized Experience Replay, ICLR 2016
12
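The two novelty-based priorities can be sketched as drop-in replacements for the TD-error priority: a state the reward predictor already handles well gets low priority, while an unfamiliar state gets high priority. In this minimal sketch the rewards and predictions are made up, and the BU priority uses the sample standard deviation across several stochastic predictions (e.g. ensemble members or dropout samples) as a stand-in for Bayesian uncertainty; that choice, and all helper names, are assumptions for illustration.

```python
import statistics
from typing import List, Sequence


def pe_priority(predicted_reward: float, observed_reward: float, eps: float = 1e-6) -> float:
    """PER(PE): priority = magnitude of the reward predictor's error on this state."""
    return abs(predicted_reward - observed_reward) + eps


def bu_priority(reward_samples: Sequence[float], eps: float = 1e-6) -> float:
    """PER(BU) stand-in: spread of several stochastic reward predictions for one
    state (e.g. ensemble members or dropout samples). Using the sample standard
    deviation as the uncertainty is an assumption for illustration.
    """
    return statistics.stdev(reward_samples) + eps


def sampling_probs(priorities: Sequence[float], alpha: float = 0.6) -> List[float]:
    """Same proportional rule as original PER; only the priority signal changes."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


# Made-up rewards (e.g. negated docking scores) for three stored states.
# The predictor is assumed to be familiar with states like 0 and 1; state 2 is novel.
observed = [8.0, 8.1, 11.5]
predicted = [7.9, 8.0, 8.2]
pe = [pe_priority(r_hat, r) for r_hat, r in zip(predicted, observed)]

samples = [[7.9, 8.0, 8.1], [8.0, 8.1, 7.9], [6.0, 9.5, 11.0]]
bu = [bu_priority(s) for s in samples]

print([round(p, 3) for p in sampling_probs(pe)])
print([round(p, 3) for p in sampling_probs(bu)])
```

Under both signals, the novel state dominates the replay distribution, so the agent revisits and learns from the regions it understands least.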
FREED Model Overview and Contributions
13
• Our fragment-based generation method allows our model to leverage prior medicinal chemistry knowledge.
• Our proposed explorative RL method based on PER significantly improves model performance.
FREED: Fragment-based generative RL with Explorative Experience replay for Drug design
Experimental results
Three protein targets
• fa7: protease
• parp1: polymerase
• 5ht1b: G protein-coupled receptor
Evaluation metrics
• Quality score (filter score): ratio of accepted, valid molecules to total generated molecules
• Hit ratio: ratio of unique hit molecules to total generated molecules, where a hit is a molecule whose docking score is higher than the median docking score of known active molecules
• Top 5% score: average docking score of the top 5%-scored generated molecules
* Higher = greater absolute value
Baseline models
• MORLD: atom-wise generative model + MolDQN
• REINVENT: SMILES-based generative model + REINFORCE
14
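The three metrics can be computed from a list of generated molecules once a few details the slide leaves open are fixed. In this minimal sketch those details are our assumptions: hits and the top 5% are taken over valid molecules only, a hit need not pass the structural filters, the top 5% is rounded up to at least one molecule, and docking scores follow the more-negative-is-better convention (so "higher" means greater absolute value). The molecule names "A" to "D" are placeholders for SMILES strings.

```python
import math
import statistics
from typing import List, Sequence, Tuple

# (SMILES, is_valid, passes_filters, docking_score); scores are negative and
# more negative = better, so "higher" in the deck means greater magnitude.
Molecule = Tuple[str, bool, bool, float]


def evaluate(molecules: Sequence[Molecule], active_scores: Sequence[float],
             top_fraction: float = 0.05) -> Tuple[float, float, float]:
    n_total = len(molecules)

    # Quality score: accepted, valid molecules / all generated molecules.
    quality = sum(1 for _, valid, ok, _ in molecules if valid and ok) / n_total

    # Hit ratio: unique hits / all generated molecules, where a hit beats the
    # median docking score of known actives.
    threshold = statistics.median(active_scores)
    hits = {smi for smi, valid, _, score in molecules if valid and score < threshold}
    hit_ratio = len(hits) / n_total

    # Top 5% score: mean docking score of the best-scoring 5% (at least one).
    scores = sorted(score for _, valid, _, score in molecules if valid)
    k = max(1, math.ceil(top_fraction * len(scores)))
    top_score = statistics.mean(scores[:k])
    return quality, hit_ratio, top_score


generated: List[Molecule] = [
    ("A", True, True, -10.0),
    ("A", True, True, -10.0),   # duplicate: counts once as a hit
    ("B", True, False, -9.5),   # hit, but fails the structural filters
    ("C", True, True, -7.0),
    ("D", False, False, 0.0),   # invalid molecule
]
print(evaluate(generated, active_scores=[-8.0, -9.0, -8.5]))  # (0.6, 0.4, -10.0)
```

Note that the duplicate of "A" still counts toward the quality score but only once toward the hit ratio, which is how the "unique hit molecules" wording rewards diversity.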
Result 1 – Explicit constraints are necessary to avoid problematic structures
• Our fragment-based generation model successfully avoids structural alerts.
• Baseline models trained to avoid alerts in a multi-objective optimization scheme show suboptimal results.
15
Result 2 – Our model outperforms the unconstrained models
• Our model outperforms, or performs comparably to, the unconstrained baseline models.
[Figure: hit ratio and top 5% score comparisons; arrows mark the "better" direction]
16
Result 3 – Our PER method shows the best performance
[Figure: hit ratio and top 5% score comparisons; arrows mark the "better" direction]
17
Result 3 – Our PER method shows the best performance
• All explorative algorithms outperform the vanilla SAC and PPO.
• Our PER(PE) and PER(BU) outperform previous algorithms such as PER(TD) and curiosity-driven algorithms.
[Figure: hit ratio and top 5% score comparisons; arrows mark the "better" direction]
18
Result 4 – Hit and scaffold-based generation case study
Hit generation
• Initial structure: a benzene ring
Scaffold-based (lead) generation
• Initial structure: a scaffold extracted from known active molecules
19
Result 4 – Hit and scaffold-based generation case study
20
Summary
• Our model, FREED, is a novel RL framework for real-world drug design that couples a fragment-based molecular generation strategy with a highly explorative RL algorithm.
• FREED can generate pharmacochemically acceptable molecules with high docking scores.
• When many structural alerts must be avoided, explicitly constraining the molecular generation space is more effective than implicit methods.
• By defining priority as the novelty of the state, our PER method encourages exploration, helping the model find many optima in the highly constrained molecular generation space.
21
