ActivityDiff: A diffusion model with Positive and Negative Activity Guidance for De Novo Drug Design (2508.06364v1)

Published 8 Aug 2025 in cs.LG, cs.AI, and q-bio.BM

Abstract: Achieving precise control over a molecule's biological activity, encompassing targeted activation/inhibition, cooperative multi-target modulation, and off-target toxicity mitigation, remains a critical challenge in de novo drug design. However, existing generative methods primarily focus on producing molecules with a single desired activity, lacking integrated mechanisms for the simultaneous management of multiple intended and unintended molecular interactions. Here, we propose ActivityDiff, a generative approach based on the classifier-guidance technique of diffusion models. It leverages separately trained drug-target classifiers for both positive and negative guidance, enabling the model to enhance desired activities while minimizing harmful off-target effects. Experimental results show that ActivityDiff effectively handles essential drug design tasks, including single-/dual-target generation, fragment-constrained dual-target design, selective generation to enhance target specificity, and reduction of off-target effects. These results demonstrate the effectiveness of classifier-guided diffusion in balancing efficacy and safety in molecular design. Overall, our work introduces a novel paradigm for achieving integrated control over molecular activity, and provides ActivityDiff as a versatile and extensible framework.

Summary

The paper introduces a classifier-guided discrete diffusion model that precisely controls drug-target activities and off-target liabilities.
The method pairs a denoising network of 12 attention layers with MPNN classifiers trained on BindingDB data, which guide generation toward or away from specific targets.
Experimental results show improved activity scores across multiple targets and significant off-target risk reduction, with docking pass rates up to 88.4%.

ActivityDiff: Classifier-Guided Diffusion for Activity-Aware De Novo Drug Design

Introduction

The paper introduces ActivityDiff, a discrete denoising diffusion model for molecular generation, augmented with classifier-based positive and negative guidance to enable precise control over drug-target activities and off-target liabilities. The framework addresses a critical limitation in existing generative models for de novo drug design: the inability to simultaneously optimize for multiple pharmacological objectives, such as multi-target efficacy and off-target toxicity mitigation. ActivityDiff leverages separately trained drug-target classifiers to steer the reverse diffusion process, providing a flexible mechanism for both enhancing desired activities and suppressing undesired interactions.

Methodology

Discrete Diffusion Model Architecture

ActivityDiff represents molecules as complete graphs, with nodes encoding atom types and formal charges, and edges encoding bond types. The forward diffusion process independently corrupts each categorical feature (atom or bond) via a categorical transition matrix, following the D3PM framework. The reverse process reconstructs the original molecular graph by progressively denoising the corrupted features, with the denoising network operating on the entire graph.
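A minimal sketch of the forward corruption step, assuming a uniform D3PM transition matrix (this summary does not say which transition matrix ActivityDiff uses). The names `transition_row`, `corrupt`, and `alpha_bar` (the cumulative probability that a feature is not resampled by step t) are illustrative and do not come from the paper.

```python
import random

def transition_row(x0: int, num_classes: int, alpha_bar: float) -> list[float]:
    """q(x_t | x_0) under a uniform transition matrix (an assumption).

    With probability alpha_bar the feature keeps its value; otherwise it is
    resampled uniformly over all classes.
    """
    uniform_mass = (1.0 - alpha_bar) / num_classes
    return [uniform_mass + (alpha_bar if k == x0 else 0.0)
            for k in range(num_classes)]

def corrupt(features: list[int], num_classes: int, alpha_bar: float,
            rng: random.Random) -> list[int]:
    """Corrupt each categorical feature independently of the others."""
    classes = range(num_classes)
    return [rng.choices(classes, weights=transition_row(x, num_classes, alpha_bar))[0]
            for x in features]

# Toy molecule: hypothetical atom-type indices for four atoms.
atoms = [0, 0, 1, 2]
rng = random.Random(0)
# Low noise: each atom keeps its type with probability 0.9 + 0.1 / 4 = 0.925.
print(corrupt(atoms, num_classes=4, alpha_bar=0.9, rng=rng))
# alpha_bar = 0: every atom type is drawn uniformly at random.
print(corrupt(atoms, num_classes=4, alpha_bar=0.0, rng=rng))
```

The same routine would apply to bond types by treating each edge of the complete graph as one categorical feature.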

The denoising network is built from stacked multi-head attention blocks, incorporating FiLM layers for feature-wise modulation and gated residual connections. Noise levels for nodes and edges are embedded via MLPs and used for adaptive normalization. The network contains 12 attention layers with 4 heads each, and hidden dimensions of 128 (nodes), 64 (edges), and 128 (global features).
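For readers unfamiliar with FiLM, the operation is a conditioning-dependent scale and shift of each feature. The sketch below shows only that operation in plain Python; the `linear` helper, the projection weights, and the use of a noise-level embedding as the conditioning vector are assumptions for illustration, and the paper's actual layer layout may differ.

```python
def linear(x, weight, bias):
    """Dense layer: one output per row of `weight`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def film(h, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: gamma(cond) * h + beta(cond)."""
    gamma = linear(cond, w_gamma, b_gamma)  # per-feature scale
    beta = linear(cond, w_beta, b_beta)     # per-feature shift
    return [g * hi + b for g, hi, b in zip(gamma, h, beta)]

# Toy example: two hidden features, one-dimensional conditioning vector.
h = [1.0, 2.0]
cond = [1.0]  # stands in for a noise-level embedding (hypothetical)
out = film(h, cond,
           w_gamma=[[2.0], [2.0]], b_gamma=[0.0, 0.0],
           w_beta=[[1.0], [1.0]], b_beta=[0.0, 0.0])
print(out)  # [3.0, 5.0]
```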

Classifier Guidance

Classifier guidance is implemented by training independent drug-target interaction classifiers (MPNN-based) on BindingDB data. During reverse diffusion, the classifier predicts the likelihood that the intermediate molecular graph satisfies the desired activity condition. Positive guidance encourages generation toward high-affinity molecules for a target, while negative guidance suppresses affinity for off-targets. The conditional reverse process is formulated as:

$p(G_{t-1} | G_t, y) \propto p(G_{t-1} | G_t) \cdot p(y | G_{t-1})$

where $p(y | G_{t-1})$ is approximated via first-order Taylor expansion for gradient-based guidance. This decoupling of generator and classifier enables rapid adaptation to new targets or off-target panels without retraining the generative model.
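One common way to realize this for discrete features, sketched here under assumptions, is to linearize $\log p(y | G_{t-1})$ around $G_t$ and reweight each feature's unconditional categorical distribution by $\exp(\lambda g_k)$, where $g_k$ is the classifier gradient with respect to the one-hot entry for class $k$. The guidance scale $\lambda$ (`scale`), the function name, and the toy numbers are hypothetical; in practice the gradient would come from backpropagation through the classifier.

```python
import math

def guided_distribution(probs, grad_log_py, scale=1.0):
    """Reweight one feature's categorical distribution by exp(scale * gradient).

    probs       -- unconditional reverse-step probabilities (at least one > 0)
    grad_log_py -- gradient of log p(y | G_t) w.r.t. the one-hot entries
    scale       -- guidance strength; a negative value pushes away from y
    """
    logits = [(math.log(p) if p > 0 else -math.inf) + scale * g
              for p, g in zip(probs, grad_log_py)]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

uncond = [0.5, 0.3, 0.2]   # toy unconditional distribution over 3 classes
grad = [0.0, 2.0, -1.0]    # toy classifier gradient (hypothetical values)
positive = guided_distribution(uncond, grad, scale=1.0)
negative = guided_distribution(uncond, grad, scale=-1.0)
print(positive)  # class 1 gains probability mass
print(negative)  # class 1 loses probability mass
```

Because the classifier enters only through `grad_log_py`, swapping in a classifier for a different target changes the guidance without touching the generative model.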

Training Protocol

The denoising network is trained on the GEOM dataset, restricted to 30 atom types and 4 bond types. Classifiers are trained with a 1:10 ratio of active to inactive compounds, using negative sampling to ensure structural dissimilarity and reduce label noise. The classifier loss is a weighted binary cross-entropy, with weights determined by the signal-to-noise ratio at each diffusion step. Early stopping is applied based on AUC on the validation set.
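The weighted classifier loss can be sketched as follows. This summary does not give the weighting function, so `snr_weight` is a hypothetical choice that maps the signal-to-noise ratio into [0, 1] and down-weights heavily corrupted inputs; `alpha_bar` (the fraction of signal retained at a diffusion step) is likewise an assumed parameterization.

```python
import math

def snr_weight(alpha_bar: float) -> float:
    """Hypothetical weight in [0, 1] that grows with the signal-to-noise ratio."""
    if alpha_bar >= 1.0:
        return 1.0  # noise-free input: infinite SNR, full weight
    snr = alpha_bar / (1.0 - alpha_bar)
    return snr / (1.0 + snr)

def weighted_bce(preds, labels, alpha_bars, eps=1e-7):
    """Weighted mean of per-sample binary cross-entropy."""
    total_loss, total_weight = 0.0, 0.0
    for p, y, a in zip(preds, labels, alpha_bars):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        w = snr_weight(a)
        total_loss += w * loss
        total_weight += w
    return total_loss / max(total_weight, eps)

# The second sample is misclassified. It costs less when it comes from a
# noisy step (low alpha_bar) than when it comes from a clean step.
preds, labels = [0.9, 0.1], [1, 1]
print(weighted_bce(preds, labels, alpha_bars=[0.9, 0.1]))
print(weighted_bce(preds, labels, alpha_bars=[0.1, 0.9]))
```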

Experimental Results

Unconditional Generation

In unconditional molecular generation, ActivityDiff achieves the highest ratio of available molecules (0.975) among the compared methods, with validity, uniqueness, and novelty competitive with state-of-the-art baselines (e.g., SyntaLinker, REINVENT 2.0, PGMG, SMILES LSTM). This suggests good coverage of drug-like chemical space.

Activity-Guided Generation

Under positive guidance, ActivityDiff generates a high proportion of molecules with predicted activity scores $Y \geq 0.5$ across eight biological targets (mean 78.8% ± 16.3%), outperforming control groups drawn from GEOM, BindingDB, and unconditional generation. Negative guidance yields molecules with mean activity scores of 0.04 ± 0.09, with 80.4% of samples below 0.1, demonstrating strong bidirectional controllability.

Docking experiments using AutoDock Vina confirm that a substantial fraction of generated molecules achieve docking scores within the range of experimentally active compounds, with pass rates up to 88.4% for certain targets.

Multi-Target and Fragment-Constrained Generation

ActivityDiff supports dual-target generation via joint classifier guidance. For BRAF/MEK dual inhibition, molecules generated under dual guidance retain high activity for both targets (median MEK score 0.903, median BRAF score 0.754), with >90% overlap with single-target guidance distributions. Fragment-constrained generation is enabled by fixing active fragments from one target and guiding toward another, yielding molecules that preserve the fragment and achieve high classifier scores for the second target.
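Fragment fixing can be sketched as an inpainting-style clamp: after each reverse step, the atoms that belong to the fragment are reset to their known types, so only the remaining positions are generated. Whether ActivityDiff clamps at every step or handles fragments differently is not stated in this summary; `clamp_fragment` and the index-to-type mapping are hypothetical.

```python
def clamp_fragment(sampled: list[int], fragment: dict[int, int]) -> list[int]:
    """Overwrite fragment positions with their fixed atom types.

    sampled  -- atom-type indices proposed by one reverse diffusion step
    fragment -- mapping from atom position to the type it must keep
    """
    return [fragment.get(i, atom) for i, atom in enumerate(sampled)]

# Toy example: positions 0 and 1 belong to the fixed fragment.
proposal = [3, 2, 1, 0, 2]
fixed = {0: 0, 1: 1}
print(clamp_fragment(proposal, fixed))  # [0, 1, 1, 0, 2]
```

The same clamp would be applied to the bonds inside the fragment so that its connectivity is preserved.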

Selectivity and Off-Target Suppression

For HER2/EGFR selectivity, combined positive (HER2) and negative (EGFR) guidance reduces the proportion of molecules with EGFR scores >0.5 from 22.7% to 6.4%, while maintaining high HER2 activity. Representative molecules exhibit high HER2 scores and low EGFR scores, with favorable docking profiles.
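The dual-target and selectivity settings can both be read as guidance toward one joint objective. A minimal sketch, assuming the classifier outputs are conditionally independent given the molecule: the objective sums $\log p$ for every target that should be hit and $\log(1 - p)$ for every target that should be avoided. The function name and the example scores are hypothetical, and the paper may combine or weight the terms differently.

```python
import math

def joint_log_score(pos_probs, neg_probs, eps=1e-7):
    """Log-probability that all desired targets are hit and all others avoided.

    pos_probs -- classifier scores for targets to activate or inhibit
    neg_probs -- classifier scores for targets to avoid
    """
    score = 0.0
    for p in pos_probs:
        score += math.log(min(max(p, eps), 1.0))
    for p in neg_probs:
        score += math.log(min(max(1.0 - p, eps), 1.0))
    return score

# Selectivity example: the same desired-target score, two off-target scores.
selective = joint_log_score(pos_probs=[0.9], neg_probs=[0.1])
unselective = joint_log_score(pos_probs=[0.9], neg_probs=[0.6])
print(selective, unselective)  # the selective molecule scores higher
```

The gradient of this score with respect to the intermediate graph would then play the role of the guidance signal, with dual-target generation corresponding to two entries in `pos_probs` and none in `neg_probs`.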

ActivityDiff also reduces broad-spectrum off-target risk. When guided by a joint off-target panel classifier (covering six safety-relevant targets), the proportion of generated molecules predicted to have off-target liabilities is consistently lower than for experimentally active compounds, with improvements up to 65.1% for certain targets.

Implementation Considerations

Computational Requirements: The denoising network contains 6M parameters; each classifier has 0.68M parameters. Training requires access to large-scale molecular datasets (GEOM, BindingDB) and moderate GPU resources.
Adaptability: The decoupled classifier-guidance architecture allows rapid adaptation to new targets or off-target panels by retraining only the classifiers.
Scalability: Multi-target and fragment-constrained generation are supported natively, enabling complex pharmacological design objectives.
Limitations: Balancing affinities across multiple targets and improving predictions for drug metabolism and toxicity remain open challenges. Integration with systems-level biological networks is a promising future direction.

Implications and Future Directions

ActivityDiff establishes a versatile framework for activity-aware molecular generation, enabling integrated control over efficacy and safety profiles. The classifier-guidance paradigm is extensible to arbitrary pharmacological objectives, including multi-target modulation, selectivity enhancement, and off-target suppression. The approach is compatible with fragment-based design and can be integrated with structure-based docking pipelines.

Future research should focus on:

Enhancing multi-target balancing and selectivity in highly homologous protein families.
Incorporating metabolism and toxicity prediction into the guidance framework.
Integrating drug design with systems pharmacology models to account for drug-protein, protein-protein, and drug-drug interactions.
Extending classifier-guidance to continuous property optimization and generative reinforcement learning.

Conclusion

ActivityDiff advances the state of the art in de novo drug design by enabling fine-grained, bidirectional control over molecular activity profiles through classifier-guided discrete diffusion. The framework demonstrates strong performance in unconditional and activity-guided generation, multi-target and fragment-constrained design, selectivity enhancement, and off-target risk reduction. Its modular architecture and adaptability position it as a practical tool for rational drug discovery under complex pharmacological constraints.