
De Novo Molecular Design: A Computational Overview

Updated 9 February 2026
  • De novo molecular design is the algorithmic generation of novel molecular structures from first principles to optimize properties for drug discovery and materials science.
  • It utilizes diverse methods such as SMILES/SELFIES language models, graph-based approaches, latent diffusion, and evolutionary algorithms to ensure high chemical validity and diversity.
  • Recent advances integrate reinforcement learning, curriculum learning, and multi-objective optimization to accelerate convergence on benchmarks like GuacaMol and improve property control.

De novo molecular design refers to the algorithmic generation of novel molecular structures with defined or optimized properties from first principles, without relying on explicit enumeration of existing chemical libraries. The aim is to efficiently traverse the astronomically large chemical space (estimates range from 10^23 to 10^100 synthesizable small molecules) to identify candidates that satisfy multi-parameter objectives relevant to drug discovery or materials science.

1. Formalization and Core Objectives

The central objective of de novo molecular design is to learn a generative model π_θ over molecular representations (typically SMILES strings S = (c_1, ..., c_L) built from a vocabulary V), such that the expected value of a property- or task-specific oracle f_score(S) is maximized:

θ* = argmax_θ E_{S ~ π_θ}[f_score(S)]

Chemical space is too large for exhaustive enumeration (≳ 10^60 valid small molecules under standard drug-likeness constraints), necessitating efficient search, optimization, and exploration strategies (Hou, 2 Apr 2025, Wang et al., 2022).
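This objective can be made concrete with a REINFORCE-style toy sketch: a categorical policy over a tiny molecule pool is pushed toward the highest-scoring candidate. The molecules and oracle scores below are invented placeholders, not drawn from any cited method.

```python
import numpy as np

# Toy sketch: maximize E_{S ~ pi_theta}[f_score(S)] with a REINFORCE-style
# gradient. Molecules and oracle scores are hypothetical placeholders.
rng = np.random.default_rng(0)
molecules = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]  # tiny candidate pool
scores = np.array([0.2, 0.9, 0.5, 0.1])            # stand-in oracle f_score

theta = np.zeros(len(molecules))  # logits of a categorical policy pi_theta

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(500):
    probs = softmax(theta)
    idx = rng.choice(len(molecules), size=32, p=probs)  # sample S ~ pi_theta
    advantage = scores[idx] - scores[idx].mean()        # baseline-subtracted
    grad = np.zeros_like(theta)
    for i, a in zip(idx, advantage):
        onehot = np.zeros_like(theta)
        onehot[i] = 1.0
        grad += a * (onehot - probs)  # grad log pi(S), categorical policy
    theta += 0.1 * grad / len(idx)

best = molecules[int(np.argmax(softmax(theta)))]
print(best)
```

In practice π_θ is a sequence model and the log-likelihood gradient is accumulated over tokens; the categorical case just keeps the estimator visible.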

Key requirements include:

  • High syntactic validity and diversity of generated molecules
  • Control over specified molecular properties (e.g., bioactivity, ADMET features, docking score, synthetic accessibility)
  • Scalability of search and model training to very large chemical spaces
  • Sample efficiency with respect to expensive property calculations (e.g., docking, DFT)

2. Molecular Representations and Generation Paradigms

2.1. SMILES/SELFIES Language Models

String-based models treat molecules as sequences over character or substructure alphabets, enabling autoregressive modeling using RNNs (Yang et al., 2017, Rao, 2023, Olivecrona et al., 2017, Chitsaz et al., 19 Aug 2025) or Transformers (Xu et al., 2023, Wang et al., 2022). Generation involves sampling one token at a time, with probabilities conditioned on previous tokens.
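Token-by-token sampling can be sketched with a hand-written bigram table standing in for a trained RNN/Transformer conditional distribution; the vocabulary and probabilities below are hypothetical.

```python
import numpy as np

# Sketch of autoregressive string generation: one token at a time,
# conditioned on the previous token. The bigram table is a hypothetical
# stand-in for a trained model's conditional distribution.
rng = np.random.default_rng(1)
vocab = ["C", "O", "N", "$"]    # "$" is the end-of-sequence token

bigram = {                      # P(next | prev); each row sums to 1
    "^": [0.6, 0.2, 0.2, 0.0],  # "^" is the start symbol
    "C": [0.4, 0.2, 0.1, 0.3],
    "O": [0.5, 0.1, 0.0, 0.4],
    "N": [0.5, 0.0, 0.0, 0.5],
}

def sample_string(max_len=20):
    tokens, prev = [], "^"
    for _ in range(max_len):
        nxt = vocab[rng.choice(len(vocab), p=bigram[prev])]
        if nxt == "$":
            break
        tokens.append(nxt)
        prev = nxt
    return "".join(tokens)

samples = [sample_string() for _ in range(5)]
print(samples)
```

A real SMILES model must additionally learn ring- and branch-closure constraints, which is why syntactic validity is a first-class metric for string-based generators.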

2.2. Graph-based Models

Graph neural networks and transformer layers process molecular graphs directly, capturing atom and bond relationships. Models such as GANs with graph-transformer generators enable edge-conditioned attention for target-centric design (Ünlü et al., 2023).

2.3. Latent Variable and Diffusion Models

Autoencoders (VAE/AAE) learn a continuous latent space of molecular structures, enabling gradient-based optimization (Blaschke et al., 2017, Lim et al., 2018), while score-based diffusion models operate directly on atomistic 3D coordinates with E(3)-equivariant backbones, generating 3D structures directly (Chen et al., 24 Oct 2025).

2.4. Population-based and Evolutionary Methods

Grammatical evolution (Yoshikawa et al., 2018) and genetic algorithms build molecules as parse trees or as mutation/crossover offspring, evaluated in parallel with fitness functions based on docking or property calculation.
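A minimal mutation/crossover loop is sketched below: length-10 strings over a three-letter alphabet stand in for molecules, and counting 'C' characters stands in for a docking or property fitness. Everything here is a hypothetical toy, not ChemGE itself.

```python
import random

# Toy genetic algorithm mirroring mutation/crossover search in molecular
# space. Fitness (count of 'C') is a hypothetical stand-in for docking.
random.seed(0)
ALPHABET = "CON"

def fitness(s):
    return s.count("C")

def mutate(s):
    i = random.randrange(len(s))
    return s[:i] + random.choice(ALPHABET) + s[i + 1:]

def crossover(a, b):
    cut = random.randrange(1, len(a))   # single-point crossover
    return a[:cut] + b[cut:]

pop = ["".join(random.choice(ALPHABET) for _ in range(10)) for _ in range(20)]
for _ in range(40):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                  # elitist selection: keep fittest half
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(10)]
    pop = parents + children

best = max(pop, key=fitness)
print(best, fitness(best))
```

In real systems the fitness calls (docking, property calculation) dominate runtime, which is why population methods evaluate offspring in parallel.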

2.5. LLM Prompting and Knowledge-Augmented Generation

Zero-shot prompt-based approaches employ LLMs conditioned on technical molecule descriptions to directly output candidate SMILES, utilizing knowledge-augmented retrievals to improve validity, exactness, and semantic alignment (Srinivas et al., 2024).

3. Optimization Algorithms and Training Objectives

3.1. Reinforcement Learning (RL)

Policy-gradient and value-based RL fine-tune generative models with respect to scalar or vector-valued rewards that encode property objectives (Olivecrona et al., 2017, Li et al., 2022, Hou, 2 Apr 2025, Chitsaz et al., 19 Aug 2025, Xu et al., 2023). Augmented likelihood approaches regularize agent policies against a pretrained prior to prevent mode collapse and loss of chemical realism.

RL reward design is flexible:

  • Scalar rewards: direct optimization of affinity, QED, or other properties.
  • Composite rewards: multi-objective balancing via weighted sum or product (e.g., QED, SA, docking score).
  • Uncertainty-aware reward shaping: surrogate property models with predictive uncertainty are used to smooth multi-constraint satisfaction (Chen et al., 24 Oct 2025).
  • Direct Preference Optimization (DPO): contrastive, pairwise-preference loss is used to maximize the log-likelihood difference between higher- and lower-scoring molecules with respect to a reference policy (Hou, 2 Apr 2025).
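The DPO objective in the last bullet reduces to a log-sigmoid of a reference-anchored log-likelihood margin. The numbers below are hypothetical sequence log-probabilities, used only to show the loss ordering.

```python
import numpy as np

# DPO pairwise-preference loss on a (winner, loser) molecule pair:
#   L = -log sigmoid( beta * [(log pi(S_w) - log pi_ref(S_w))
#                           - (log pi(S_l) - log pi_ref(S_l))] )
def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log sigmoid(margin)

# Policy already prefers the winner relative to the reference -> lower loss
low = dpo_loss(logp_w=-10.0, logp_l=-14.0, ref_logp_w=-12.0, ref_logp_l=-12.0)
# Policy prefers the loser -> higher loss
high = dpo_loss(logp_w=-14.0, logp_l=-10.0, ref_logp_w=-12.0, ref_logp_l=-12.0)
print(low, high)
```

Anchoring both terms to a reference policy plays the same regularizing role as augmented likelihood in RL: the agent cannot drift arbitrarily far from the pretrained prior.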

3.2. Curriculum Learning (CL)

Curriculum learning systematically increases the difficulty of optimization tasks during training. For DPO, stages are defined via the minimum score gap between paired molecules:

  • Coarse scaffolds first (large Δ), then finer refinements and functional-group tweaks as training progresses (Hou, 2 Apr 2025). This staging reduces early variance and accelerates convergence.
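A score-gap curriculum of this kind can be sketched as a simple filter on candidate pairs: early stages admit only pairs with a large oracle-score difference, later stages lower the threshold. Molecule names and scores below are hypothetical.

```python
# Sketch of a score-gap curriculum for preference pairs. Early stages keep
# only easy comparisons (large gap); later stages admit finer distinctions.
scored = [("A", 0.95), ("B", 0.80), ("C", 0.55), ("D", 0.20)]

def curriculum_pairs(scored, min_gap):
    """All (winner, loser) pairs whose oracle-score gap is >= min_gap."""
    pairs = []
    for i, (mi, si) in enumerate(scored):
        for mj, sj in scored[i + 1:]:
            if si - sj >= min_gap:
                pairs.append((mi, mj))
            elif sj - si >= min_gap:
                pairs.append((mj, mi))
    return pairs

stage1 = curriculum_pairs(scored, min_gap=0.5)  # coarse stage: big gaps only
stage2 = curriculum_pairs(scored, min_gap=0.1)  # fine stage: most pairs
print(len(stage1), len(stage2))
```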

3.3. Multi-Objective and Pareto-Optimized Design

Multi-objective genetic algorithms (e.g., NSGA-II) explicitly sort populations by Pareto dominance, supporting simultaneous optimization under hard physicochemical, interaction, and synthetic constraints (Daeyaert et al., 2017, Chen et al., 24 Oct 2025). Crowding-distance is used to maintain chemical diversity.
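The core of this selection scheme, Pareto-dominance filtering, is compact enough to sketch directly (crowding distance is omitted for brevity). Each candidate below carries two objectives to maximize, e.g. activity and QED; the values are hypothetical.

```python
# Minimal sketch of Pareto-dominance filtering, the heart of NSGA-II style
# multi-objective selection. Objective values are hypothetical.
def dominates(a, b):
    """a dominates b: >= in every objective and > in at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(points):
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

candidates = [(0.9, 0.2), (0.7, 0.7), (0.2, 0.9), (0.5, 0.5), (0.3, 0.3)]
front = pareto_front(candidates)
print(front)
```

Dominated points like (0.5, 0.5) are discarded; the surviving front spans the trade-off surface, and crowding distance then spreads selection pressure along it to preserve diversity.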

4. Evaluation Benchmarks and Metrics

4.1. Standard Benchmarks

GuacaMol provides a unified suite of benchmark tasks for both unconditional (distribution learning) and goal-directed (property optimization) evaluation (Brown et al., 2018, Hou, 2 Apr 2025). Key metrics include:

  • Validity: fraction of generated molecules that are chemically valid
  • Uniqueness/Novelty: rate of unique and out-of-training-set molecules
  • Distributional metrics: KL divergence and Fréchet ChemNet Distance (FCD) between generated and reference property distributions
  • Top-K task scores: best, top-10, and top-100 property values for each benchmark (Brown et al., 2018)
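The set-based metrics above amount to simple bookkeeping; only the validity check needs chemistry tooling (real pipelines parse each SMILES with a toolkit such as RDKit, stubbed here as a precomputed flag on hypothetical samples).

```python
# Bookkeeping behind validity/uniqueness/novelty. The validity mask is a
# stand-in for per-molecule parse results; all data here is hypothetical.
generated = ["CCO", "CCO", "c1ccccc1", "CCN", "XX"]
valid_mask = [True, True, True, True, False]  # e.g. RDKit parse success
training_set = {"CCO"}

valid = [s for s, ok in zip(generated, valid_mask) if ok]
validity = len(valid) / len(generated)              # valid / generated
unique = set(valid)
uniqueness = len(unique) / len(valid)               # unique / valid
novelty = len(unique - training_set) / len(unique)  # outside training set

print(validity, uniqueness, round(novelty, 3))
```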

4.2. Goal-Directed Tasks

Benchmarks include rediscovery of actives, similarity optimization, isomer enumeration, multi-parameter optimization (MPO), and explicit SMARTS/scaffold hopping tasks. Methods are evaluated for their ability to reach high composite scores and generate high-quality, synthesizable compounds.

4.3. Real-World Validation

Downstream validation includes in silico docking (e.g., with Glide, LeDock), ADMET profiling, and MD simulation (Hou, 2 Apr 2025, Chen et al., 24 Oct 2025). Empirically, in vitro binding and cellular potency data (e.g., IC50) can be used to calibrate and confirm the practical relevance of generated compounds.

5. Methodological Advances and Representative Results

5.1. Direct Preference Optimization and Curriculum Learning

  • DPO enables guided generation without explicit reward engineering, showing ~6× faster convergence than RL baselines and achieving state-of-the-art scores on GuacaMol (e.g., Perindopril MPO: 0.883 vs. 0.810 for MolRL-MGPT). Pairwise preferences aligning the policy with higher-quality chemotypes were found critical to robust property optimization (Hou, 2 Apr 2025).
  • Integration of a curriculum on preference gap size stabilizes training and accelerates global optimization (Hou, 2 Apr 2025).

5.2. Language and Graph Models

  • Large foundation models pretrained on 1.5×10^9 molecules (e.g., NovoMolGen, a Llama-style BPE Transformer) achieve high validity (0.999), diversity (IntDiv = 0.851), and novelty (0.982) (Chitsaz et al., 19 Aug 2025). Fine-tuning via augmented hill-climb or RL achieves new SOTA on the PMO benchmark (AUC-top10 = 16.70 for the 300M-parameter model).
  • Graph-transformer GANs produce target-centric inhibitors that are validated in vitro and through MD (RMSD < 1 Å), with atom-level attention maps for model interpretability (Ünlü et al., 2023).

5.3. Multi-Objective RL and Surrogate Modeling

  • ExMoIRL combines docking, drug-likeness, and system-level phenotypic signatures in a joint RL objective, leading to best-in-class validity (>98%), novelty (>95%), and activity on multiple cancer-relevant targets (Guo et al., 25 Sep 2025).
  • Uncertainty-aware multi-objective RL with PPO and Chemprop-based surrogate property prediction yields improved VUN metrics and higher rates of property satisfaction, with validated ADMET and MD stability for EGFR inhibitors (Chen et al., 24 Oct 2025).

5.4. Diversity and Parallelism

  • Population-based grammatical evolution (ChemGE) maintains high internal diversity and parallelism in offspring evaluation, producing hundreds of unique high-affinity actives exceeding the best known docking scores (Yoshikawa et al., 2018).

5.5. Conditional and Zero-Shot Generation

  • CVAEs enable property-conditional generation and property extrapolation, supporting multi-parameter and out-of-distribution property control (Lim et al., 2018).
  • Zero-shot LLM prompting with scaffold-based knowledge augmentation approaches SOTA on text-to-molecule mapping metrics (exact match 0.641, FCD 0.796) (Srinivas et al., 2024).

6. Challenges and Future Directions

Key challenges and limitations noted across the literature include:

  • Reward model bias and reward hacking, particularly in single-objective and surrogate-driven RL (Xu et al., 2023, Hou, 2 Apr 2025)
  • Trade-off between sampling novelty and preservation of synthesizability/specificity (mode collapse and degeneration in adversarial/naive sampling)
  • Limited 3D or synthetic-chemical awareness in SMILES-based approaches, with ongoing work integrating E(n)-equivariant diffusions, explicit 3D graph backbones, and end-to-end retrosynthesis (Chen et al., 24 Oct 2025, Wang et al., 2022)
  • Scalability to realistic drug-like and macrocyclic structures, especially for RL and diffusion models on large chemical graphs and property spaces
  • Empirical ceiling in proxy metric predictivity for downstream property-task performance, with weak correlation (r = 0.376 between FCD and PMO score in NovoMolGen) (Chitsaz et al., 19 Aug 2025)

Anticipated directions include: meta-learned conditioning on unseen targets (Wang et al., 2022), integrated property reward networks with uncertainty quantification (Chen et al., 24 Oct 2025), explicit Pareto optimization in high-dimensional multi-objective design (Daeyaert et al., 2017), and integration of molecular LLM pretraining with synthetic route planning and reaction prediction (Ivanenkov et al., 2021).

7. Comparative Table of Representative Methods

| Method/Framework | Model Family | Key Optimization | Evaluation Highlights | Reference |
|---|---|---|---|---|
| DPO + Curriculum Learning | Transformer (CL + DPO) | Pairwise preference | GuacaMol SOTA: 0.883 Perindopril MPO | (Hou, 2 Apr 2025) |
| NovoMolGen | Llama-style Transformer, SMILES | Pretraining + RL | Validity 0.999; PMO AUC-top10: 16.70 | (Chitsaz et al., 19 Aug 2025) |
| ChemGE | Grammatical evolution | Mutation + selection | 349 docked actives, high diversity | (Yoshikawa et al., 2018) |
| ExMoIRL (multi-obj RL) | Dual VAE + GRU policy | RL + ranking + prior | Validity >98%, SOTA IC50 on cancer targets | (Guo et al., 25 Sep 2025) |
| DrugGEN | Graph-transformer GAN | WGAN, target pool | Glide docking + MD, in silico/biological validation | (Ünlü et al., 2023) |
| Pareto-GA/Synopsis | Reaction-tree GA | NSGA-II Pareto | Dual-inhibitor hits, OSDA synthesis | (Daeyaert et al., 2017) |
| SMILES-Transformer RL | Decoder-only Transformer | RL (augmented likelihood) | Robust for long SMILES, fast convergence | (Xu et al., 2023) |

Empirically, the convergence of contrastive, pairwise-preference learning and multi-stage curricula with large-scale molecular language pretraining marks a paradigm shift in de novo molecular design, transitioning from rule-based and hand-engineered schemes to scalable, objective-driven, data-intensive workflows. Evaluative standards have been set by the GuacaMol and MOSES benchmarks, and ongoing literature demonstrates fast progress in both algorithmic sophistication and real-world validation throughput.
