Hit-Like Molecule Generation

Updated 20 January 2026

Hit-like molecule generation is the computational synthesis of small molecules that meet defined physicochemical, structural, and bioactivity criteria for hit-to-lead campaigns.
Advanced generative models—such as energy-based, diffusion, and autoregressive networks—integrate explicit chemical constraints to optimize binding affinity and drug-like properties.
Multi-stage evaluation pipelines use metrics like docking scores, synthetic accessibility, and distribution alignment to benchmark the quality and novelty of generated candidates.

Hit-like molecule generation refers to the computational synthesis of small molecules that satisfy the physicochemical, structural, and (often) bioactivity criteria required for entry into hit-to-lead campaigns in early-stage drug discovery. Unlike broad virtual library enumeration or lead optimization, this stage must produce diverse, novel, and synthesizable compounds whose properties and predicted target affinities match those of experimentally validated “hit” compounds. A wide array of generative modeling paradigms has been deployed in this domain, including graph-based energy models, diffusion models, flow-based networks, RNNs, transformers, and reinforcement-learning-driven fragment addition; each leverages explicit chemical constraints and, increasingly, target- and phenotype-specific conditioning.

1. Formal Criteria and Filtering for Hit-Like Molecules

Osman et al. (26 Dec 2025) define “hit-like” molecules as compounds meeting layered physicochemical, structural, and synthetic constraints, operationalized as:

Physicochemical filters: 150 Da ≤ MW ≤ 350 Da, 1 ≤ log P ≤ 3, SAS ≤ 5, 1–4 rings, ring size < 8, no small aromatic/fused rings, allowed elements {C, N, O, F, P, S, Cl, Br, I}.
Structural/bioactivity: pChEMBL ≥ 5 for at least one reported target.
Reactivity/exclusion: Novartis severity ≤ 10, penalizing PAINS, reactive, unstable chemotypes.
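
The three filter layers above can be sketched as a single predicate. This is a minimal illustration, not the authors' implementation: it assumes descriptors have already been computed by a cheminformatics toolkit, the `Descriptors` container and `is_hit_like` function are hypothetical names, hydrogen is assumed to be excluded from the element set, and the "no small aromatic/fused rings" rule is omitted because its exact definition is not given here.

```python
from dataclasses import dataclass

# Allowed elements as listed in the physicochemical filter.
ALLOWED_ELEMENTS = {"C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one molecule (hypothetical container)."""
    mw: float                # molecular weight in Da
    logp: float              # log P
    sas: float               # synthetic accessibility score
    ring_sizes: tuple        # size of each ring, e.g. (6, 5)
    elements: frozenset      # element symbols present (hydrogen omitted)
    max_pchembl: float       # best reported pChEMBL over all targets
    novartis_severity: int   # Novartis filter severity score


def is_hit_like(d: Descriptors) -> bool:
    """Return True if all three filter layers pass."""
    physchem = (
        150.0 <= d.mw <= 350.0
        and 1.0 <= d.logp <= 3.0
        and d.sas <= 5.0
        and 1 <= len(d.ring_sizes) <= 4
        and all(size < 8 for size in d.ring_sizes)
        and d.elements <= ALLOWED_ELEMENTS  # subset test
    )
    bioactive = d.max_pchembl >= 5.0
    clean = d.novartis_severity <= 10
    return physchem and bioactive and clean
```

Keeping descriptor calculation separate from the predicate makes it easy to swap the toolkit that produces the values without touching the thresholds.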

These criteria define a “hit-like” subset of chemical space (2.7% of REINVENT’s ~1M ChEMBL molecules). Models are therefore benchmarked not merely on validity, uniqueness, and novelty (VUN), but also on explicit hit-like filter compliance, docking-based bioactivity, and in vitro validation (Osman et al., 26 Dec 2025). This multi-stage pipeline is now a de facto standard for evaluating the outputs of generative hit-finding models.
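
The validity, uniqueness, and novelty stage of such a pipeline can be sketched as follows. This is an illustrative sketch under stated assumptions: molecules are represented as already-canonicalized strings, `is_valid` is a caller-supplied check (for example a wrapper around a toolkit parser), and the denominators follow one common convention (uniqueness over valid molecules, novelty over unique valid molecules); other benchmarks define them differently.

```python
def vun_metrics(generated, training, is_valid):
    """Compute validity, uniqueness, and novelty fractions.

    generated: list of canonical molecule strings from the model
    training:  iterable of canonical strings seen during training
    is_valid:  callable returning True for a chemically valid string
    """
    if not generated:
        raise ValueError("generated must not be empty")
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated),
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }
```

For example, with a toy validity check that only balances parentheses, `vun_metrics(["CCO", "CCO", "c1ccccc1", "C(", "CCN"], {"CCO"}, lambda s: s.count("(") == s.count(")"))` gives validity 0.8, uniqueness 0.75, and novelty 2/3.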

2. Energy-Based, Flow-Based, and Auto-Regressive Generative Models

Energy-based, flow-based, and auto-regressive generative models form the backbone of many recent approaches.

Energy-Based Models (EBMs), as in TagMol (Li et al., 2022), learn a conditional distribution $p(y|x)$ over ligand graphs $y$ given a protein representation $x$ by introducing a parameterized energy function $E_\theta(x, y)$. This energy acts as a negative binding affinity (lower energy means stronger predicted binding) and induces a Gibbs density $q_\theta(y|x)\propto\exp(-E_\theta(x,y))$. TagMol’s architecture features relational GAT encoders, a conditional WGAN-GP training framework that blends critic, energy, and reward losses, and iterative graph sampling coupled to RDKit-based filtering and docking. It produces candidates whose predicted binding affinities are reported to be statistically indistinguishable from those of experimental ligands in PDBbind (Li et al., 2022).
Normalizing Flow Models—for example, GraphBP (Zhang et al., 2022)—use invertible, autoregressive flows to model $p(M|P)$ (ligand given protein), where each atom’s type and position are generated via invertible mappings conditioned on an up-to-date GNN context, encoding not only static protein structure but also local flexibility via B-factor–weighted message passing. While reporting “high validity” and improved binding energies relative to LiGAN baselines, this method does not incorporate explicit affinity-driven objectives or end-to-end binding predictors (Zhang et al., 2022).
Auto-regressive RNN/LSTM Generators (Bjerrum et al., 2017) operate over SMILES by sequentially predicting the character sequence, with drug-like or fragment-like property distributions inherited from the training set. Explicit filtering for MW, logP, rotatable bonds, etc., prior to training or sampling allows output distributions to match targeted hit-like property regimes. Validity and novelty routinely exceed 90%, with synthetic accessibility validated through automated retrosynthesis (e.g. ChemPlanner) (Bjerrum et al., 2017).
Graph-based auto-regressive models (MolRNN, GraphINVENT) and graph-diffusion models (DiGress) extend this paradigm further. They are benchmarked with multi-stage pipelines that enforce VUN filtering, hit-like filter compliance, and docking-based thresholds for candidate selection (Osman et al., 26 Dec 2025).
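
To make the Gibbs density of the energy-based formulation concrete, the sketch below normalizes $\exp(-E)$ over a finite list of candidate ligands for one protein. This is only an illustration: the true density normalizes over all ligand graphs, which is intractable, and restricting it to a candidate list (as well as the function name `gibbs_weights`) is an assumption made here.

```python
import math


def gibbs_weights(energies):
    """Normalize exp(-E) over a finite list of candidate energies.

    Subtracting the minimum energy before exponentiating avoids overflow
    and does not change the normalized result.
    """
    if not energies:
        raise ValueError("energies must not be empty")
    e_min = min(energies)
    unnormalized = [math.exp(-(e - e_min)) for e in energies]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]
```

Lower energy yields higher probability, and the ratio of two weights equals $\exp(E_2 - E_1)$, so an energy gap of one unit changes relative probability by a factor of $e$.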

3. Diffusion and Dynamic Structural Models

Diffusion probabilistic models, especially those incorporating 3D coordinate generation and protein structure flexibility, represent the current frontier.

SE(3)-equivariant graph diffusion models such as DiffBP (Lin et al., 2022) perform non-autoregressive denoising of both coordinates and atom types for ligand graphs, conditioned on the 3D structure of the binding pocket (constructed as a KNN graph over protein and ligand atoms). Atom-type diffusion is implemented via a masked Markov chain, and a regularizer ensures no ligand atoms penetrate the protein surface. DiffBP achieves high rates of “medium-sized” and drug-like ligand generation, capturing substructure distributions and QED/SA scores close to reference data (Lin et al., 2022).
Apo2Mol (Zheng et al., 18 Nov 2025) extends this by combining ligand and pocket diffusion, modeling conformational transitions between apo and holo states via residue-level interpolation of translations, quaternions, and side chain χ-angles. Generated ligands not only match binding and drug-likeness metrics (QED 0.59 vs 0.51; ~53% high-affinity; median Vina-min –8.03 vs –7.08) but are also calibrated to induced-fit geometry even when starting from an apo structure, correcting a key limitation of rigid-pocket paradigms.
Fragment-guided diffusion (SILVR (Runcie et al., 2023)) enables the conditioning of generic DDPMs on fragment hits by iterative latent variable “pulls” at each denoising step, preserving key 3D geometry of known binding fragments at tunable strength $r_S$. Empirically, RMSD ≈ 1.5–2 Å, ~50–70% non-fragmented samples, and marked improvements in shape complementarity are achieved at optimal $r_S$ settings (Runcie et al., 2023).
Peptide-guided diffusion (Peptide2Mol (He et al., 7 Nov 2025)) extends hit generation to peptide–small-molecule “mimicry,” diffusing only the non-protein subset of a complex and establishing high geometric/chemical similarity to peptide fragments.
Omics/text–guided models such as ToDi (Yuan et al., 14 Jul 2025) combine VAE-encoded omics (gene expression) with text-encoded functional descriptions to guide conditional DDPM generation over aligned SELFIES, achieving 100% validity, >98% uniqueness, and top Fréchet ChemNet Distance and structural similarity metrics.
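
One simple way to read the fragment-guided "pull" described above is as a convex blend, applied after every denoising step, between the current latent and a copy of the reference fragment noised to the same level. The sketch below illustrates that reading only; the blend form, the function names, and the `denoise_step` and `noise_reference` callables are assumptions standing in for a trained model and its noise schedule, not the published implementation.

```python
import random


def pull_towards_reference(latent, noised_ref, r_s):
    """Blend latent coordinates towards the noised reference at strength r_s."""
    if not 0.0 <= r_s <= 1.0:
        raise ValueError("r_s must lie in [0, 1]")
    if len(latent) != len(noised_ref):
        raise ValueError("latent and reference must have equal length")
    return [(1.0 - r_s) * z + r_s * r for z, r in zip(latent, noised_ref)]


def guided_sampling(reference, steps, r_s, denoise_step, noise_reference, rng):
    """Run a reverse diffusion loop with a reference pull after each step.

    denoise_step(latent, t)            -> latent at step t - 1 (hypothetical model call)
    noise_reference(reference, t, rng) -> reference noised to level t (hypothetical)
    """
    latent = [rng.gauss(0.0, 1.0) for _ in reference]  # start from pure noise
    for t in range(steps, 0, -1):
        latent = denoise_step(latent, t)
        ref_t = noise_reference(reference, t - 1, rng)
        latent = pull_towards_reference(latent, ref_t, r_s)
    return latent
```

With `r_s = 0` the loop reduces to unconditional sampling, while `r_s = 1` simply reproduces the (noised) reference, which is why intermediate strengths are the interesting regime.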

4. Latent Optimization, Genetic, and Reinforcement Learning Approaches

Several strategies exploit explicit optimization and evolutionary algorithms in latent or discrete representations.

Latent-gradient methods (LIMO) (Eckmann et al., 2022) use a VAE to encode SELFIES and a property predictor for binding affinity, then perform gradient-based optimization in latent space (backpropagating through the predictor) to maximize predicted affinity (and optionally QED/SA) under discrete substructure constraints. Generated compounds achieve $K_D$ down to $6 \times 10^{-14}$ M, with ABFE-based corroboration. All decoded molecules are guaranteed chemically valid by SELFIES (Eckmann et al., 2022).
Discrete genetic algorithms (DGMM) (Fang et al., 2024) operate in quantized latent space (the “mol-gene”): a discrete VAE encodes molecules as sequences of codebook indices, facilitating crossover and mutation. A composite fitness score balances docking, QED, SA, and Tanimoto similarity/novelty, enabling scaffold hopping and efficient exploration of chemical space while preserving strong binding and drug-like properties (QED ≳ 0.6, SA ≤ 3, docking improved by 2–3 kcal/mol per epoch). Human-level scaffold expansion and lead-hopping are demonstrated in CHK1 benchmark cases (Fang et al., 2024).
Fragment-based reinforcement learning (FREED) (Yang et al., 2021) restricts generative actions to validated fragments and attaches them at chemically permitted sites, maximizing the docking score reward through a Soft Actor-Critic (SAC) policy. Predictive-error prioritized experience replay (PER) is used to address the sparsity and noise of docking rewards. FREED surpasses VAE and RL baselines in top-5% docking, hit ratio, and uniqueness, with chemical quality further ensured by fragment space and filter constraints (Yang et al., 2021).
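
The genetic operators on discrete latent codes described above can be sketched as follows. This is a toy sketch, not the DGMM implementation: the linear form and unit weights of `composite_fitness` are assumptions, and in practice the `fitness` callable would decode each index sequence to a molecule and score it with docking and property predictors, which is stubbed out here.

```python
import random


def crossover(a, b, rng):
    """Single-point crossover of two equal-length codebook-index sequences."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(genes, codebook_size, rate, rng):
    """Resample each index uniformly from the codebook with probability rate."""
    return [rng.randrange(codebook_size) if rng.random() < rate else g
            for g in genes]


def composite_fitness(docking, qed, sa, similarity, weights=(1.0, 1.0, 1.0, 1.0)):
    """Higher is better. Docking (kcal/mol) and SA are lower-is-better, so
    they enter with a negative sign. A negative similarity weight would
    reward novelty instead."""
    w_dock, w_qed, w_sa, w_sim = weights
    return -w_dock * docking + w_qed * qed - w_sa * sa + w_sim * similarity


def evolve(population, fitness, codebook_size, generations, rate, rng, elite=1):
    """Evolve index sequences; the top `elite` individuals survive unchanged."""
    if len(population) < 2 or any(len(g) < 2 for g in population):
        raise ValueError("need at least 2 individuals of length >= 2")
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, len(ranked) // 2)]
        children = [list(g) for g in ranked[:elite]]
        while len(children) < len(population):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), codebook_size, rate, rng))
        population = children
    return max(population, key=fitness)
```

Because the elite individuals are copied forward unchanged, the best fitness in the population never decreases from one generation to the next.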

5. Joint Models, Multimodal Guidance, and Property Prediction

Recent work highlights the synergy of joint generative and predictive training, and the integration of biological and semantic modalities.

Hyformer (Izdebski et al., 23 Apr 2025) introduces a transformer with an alternating attention mechanism: causal (left-to-right) attention for generation and bidirectional attention for property prediction, both on a shared backbone. Pre-training alternates between generation, masked-language modeling, and regression/classification batches. At inference, conditional sampling for a desired $y_0$ (e.g. bioactivity) is achieved by rejection sampling over predicted properties. Hyformer achieves strong AUPRC on OOD and conditional hit sampling and rivals specialized baselines on GuacaMol and MoleculeNet tasks.
VAE-LSTM hybrid models with phenotype conditioning (Gx2Mol (Li et al., 2024)) compress gene expression into a latent embedding and auto-regressively generate SMILES. This design links molecular generation to desired cellular/drug-induced phenotypes, outperforming non-phenotype-conditioned baselines in Tanimoto similarity to known actives, and achieving >88% validity, 83% uniqueness, and QED ≈ 0.65. Case studies demonstrate generation of disease-reversal compounds for cancer and neurodegeneration tasks (Li et al., 2024).
TextOmics/ToDi (Yuan et al., 14 Jul 2025): The joint conditioning of omics and text representations enables zero-shot generation of disease-relevant chemotypes, outperforming omics-only and text-only baselines (validity/uniqueness/novelty: 100%/98.5%/97.3%) and producing more “on-target” candidates as measured by Morgan/MACCS similarity and Levenshtein distance.
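
Conditional sampling by rejection over predicted properties, as described for joint generative and predictive models, can be sketched generically. The function below is an illustration under stated assumptions: `generate`, `predict`, and `accept` are hypothetical callables standing in for the model's sampler, its property head, and the acceptance rule, and the `max_draws` budget is added here to guarantee termination.

```python
def rejection_sample(generate, predict, accept, n_wanted, max_draws):
    """Draw candidates until n_wanted are accepted or the budget runs out.

    generate(): returns one candidate molecule (e.g. a SMILES string)
    predict(candidate): returns the predicted property value
    accept(value): returns True if the prediction meets the target
    Returns (accepted candidates, number of draws used).
    """
    kept = []
    draws = 0
    while len(kept) < n_wanted and draws < max_draws:
        candidate = generate()
        draws += 1
        if accept(predict(candidate)):
            kept.append(candidate)
    return kept, draws
```

The ratio of accepted candidates to draws gives a rough acceptance rate, which is useful for judging how far the desired property lies from the bulk of the unconditional distribution.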

6. Application to Structure-Based and Pocket-Aware Hit Finding

Contemporary structure-based pipelines fuse predicted protein structures (e.g. AlphaFold), highly parametrized property/docking scoring, and deep generative chemistry.

AlphaFold–driven platforms such as Chemistry42 (Ren et al., 2022) combine VAE, transformer, and RL-based molecular generators with 3D pocket-based conditioning, explicitly optimizing MW, cLogP, TPSA, HBA, HBD, QED, and docking scores. Empirically, ISM042-2-048 (Kd = 210 nM) and ISM042-2-001 (Kd = 8.9 μM) for CDK20 were identified in two rounds of synthesis, confirming both speed and hit rate of the ML-based approach. Stringent filters enforce a kinase-like hit profile, and clustering on pharmacophore features ensures output diversity (Ren et al., 2022).
Peptide2Mol (He et al., 7 Nov 2025) and Apo2Mol (Zheng et al., 18 Nov 2025) address the advanced hit generation task of optimizing for peptide mimicry or explicit apo–holo conformational transitions, respectively. These address target classes (e.g. protein–protein and –peptide interfaces) inaccessible to small-molecule–centric generative frameworks. Metrics (QED, SA, “PoseBusters” passing rate) demonstrate competitive performance compared to prior methods, and partial-diffusion mechanisms enable scaffolded lead optimization (He et al., 7 Nov 2025, Zheng et al., 18 Nov 2025).

7. Empirical Benchmarks, Validation, and Open Challenges

Direct comparisons and in vitro validation are increasingly emphasized.

In vitro GSK-3β inhibition assays on three selected hits from generative models yielded one compound with IC50 = 314 nM that is structurally novel relative to known actives (Osman et al., 26 Dec 2025).
Benchmarks now report not only VUN and property statistics, but also “distribution alignment” (FCD, KL divergence on docking distributions), scaffold and fragment similarities, and bioactivity enrichment (Osman et al., 26 Dec 2025, Li et al., 2022).
Leading diffusion and energy-based models achieve negative energy/docking scores comparable to experimental ligands and preserve the distribution of atom/bond features (see TagMol and DiffBP results).
Challenges remain in correlating standard distribution metrics (Fréchet ChemNet Distance, SNN) with measured or expected bioactivity, especially under complex, multi-objective conditions or when conditioning on omics or text (Osman et al., 26 Dec 2025, Yuan et al., 14 Jul 2025).
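
As an illustration of the "distribution alignment" idea mentioned above, the sketch below computes a KL divergence between histograms of docking scores for generated and reference molecules. The binning, the pseudocount smoothing, and the direction KL(generated || reference) are assumptions chosen for this example; the cited benchmarks may use different settings.

```python
import math


def histogram(values, edges):
    """Count values into bins [edges[i], edges[i+1]); the last bin is closed.

    Values outside the outer edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            in_bin = edges[i] <= v < edges[i + 1]
            if in_bin or (i == last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def kl_divergence(p_counts, q_counts, pseudocount=1e-3):
    """KL(P || Q) between two histograms, smoothed to avoid log(0)."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must share the same bins")
    p = [c + pseudocount for c in p_counts]
    q = [c + pseudocount for c in q_counts]
    p_total, q_total = sum(p), sum(q)
    return sum((pi / p_total) * math.log((pi / p_total) / (qi / q_total))
               for pi, qi in zip(p, q))
```

A value near zero indicates that generated molecules dock with a score distribution similar to the reference set; it says nothing by itself about whether either distribution reflects measured activity.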

Summary Table: Methods and Notable Features

Model/Approach	Conditional Signal(s)	Output Format (Main)	Key Empirical Metrics/Results
TagMol (Li et al., 2022)	Protein pocket (coords/seq)	Attributed graph	Fake ligands: affinity ~ real; best FD at xdim=16; GAT > GCN
DiffBP (Lin et al., 2022)	3D pocket (atomic), SE(3)-equiv.	3D atom graphs	~40% high-affinity, QED ~0.44, SA ~6.0, recovers reference rings
Apo2Mol (Zheng et al., 18 Nov 2025)	Apo–holo; residue-level flexibility	Ligand+flex. pocket	QED 0.59/0.51, >53% high-affinity, RMSD ~1.5–2 Å
LIMO (Eckmann et al., 2022)	Docking affinity (via predictor + grad)	SELFIES (VAE)	K_D ~6×10⁻¹⁴ M (ESR1), QED>0.4, SA<5.5, fastest runtime
DGMM (Fang et al., 2024)	Docking, QED, SA, novelty (multi-fitness)	SELFIES (D-VAE+GA)	QED >0.6, SA <3, docking improv., scaffold hopping, 100% validity
Hyformer (Izdebski et al., 23 Apr 2025)	Any property, via multitask transformer	SMILES (joint model)	AUPRC 0.784 (DRD2), AMP hit-sampling ~0.84, GuacaMol FCD matched
ToDi (Yuan et al., 14 Jul 2025)	Omics + text (VAE+BERT+DDPM)	SELFIES	100% valid, 98% unique, best FCD, Morgan, MACCS, zero-shot disease
FREED (Yang et al., 2021)	Fragment library (chemically validated)	Graph (RL)	Hit ratio >26%, top-5% docking –10.4, 100% chemical validity
Peptide2Mol (He et al., 7 Nov 2025)	Peptide + pocket (fixed/fuzzy)	3D graphs	QED 0.51, SA 0.61, PBrate 83.8% (lead optimization; peptides)

Collectively, contemporary hit-like molecule generation frameworks combine medicinal-chemistry constraints, high-fidelity generative architectures, multi-resolution conditional signals, and explicit evaluation protocols. The current frontier is defined by dynamic structure-based models, joint learning of generative and predictive tasks, and direct empirical validation. Key unresolved questions include generalization to low-data targets, probabilistic guarantees of novelty with maintained activity, and the integration of more expressive, experimentally validated fitness oracles (Li et al., 2022, Osman et al., 26 Dec 2025, Izdebski et al., 23 Apr 2025).