
DemoDiff: Demonstration-Conditioned Diffusion Models

Updated 13 October 2025
  • DemoDiff is a demonstration-conditioned diffusion framework that defines new molecular design tasks using molecule–score pairs as demonstrations.
  • It leverages a denoising Transformer with graph tokenization (Node Pair Encoding) to efficiently compress molecular graphs and enable scalable pretraining.
  • The model generalizes across diverse chemical design domains by guiding the reverse diffusion process with context-driven in-context learning to optimize molecular properties.

Demonstration-Conditioned Diffusion Models (DemoDiff) are a class of generative models that extend conditional diffusion frameworks by allowing new tasks to be defined via a small set of demonstration examples rather than explicit, structured task descriptions. This approach underlies a family of molecular foundation models capable of in-context molecular design, where the task context is expressed via molecule–score pairs. DemoDiff leverages a denoising diffusion process guided by demonstration context, enabled by scalable pretraining on millions of molecular contexts and supported by efficient graph tokenization. Its architecture and methodology address limitations in data-scarce property modeling, context representation, and task generalization across diverse chemical design domains (Liu et al., 9 Oct 2025).

1. Task Context Representation via Demonstrations

Traditional conditional diffusion models require explicit condition labels or property vectors, which are often infeasible for applications with extremely high-dimensional or unstructured condition spaces. DemoDiff replaces explicit condition encodings with “demonstrations”: each task is defined by a small set of molecule–score pairs, where each molecule is represented as a graph and each score is a normalized scalar in $[0, 1]$ reflecting proximity to the task property optimum.

This approach offers several architectural and operational advantages:

  • Scalability: Avoids the combinatorial explosion of one-hot vectors or textual task descriptors in settings (e.g., drug discovery) where properties may span hundreds of thousands of unique assays or measurements.
  • Latent Task Concept Induction: The relationship among positive, medium, and negative demonstration examples encodes the relative structural and functional cues required for the model to infer what is being optimized, even when demonstrations are few.
  • Universality: Enables out-of-the-box adaptation to new properties or objectives using only a contextual set of examples, circumventing the need for retraining on each target property.

The context typically contains a mixture of positive, medium, and negative examples, with positive examples assigned scores in $(0.75, 1]$, medium in $(0.5, 0.75]$, and negative in $[0, 0.5]$. These scores provide the model with a graded understanding of proximity to the ideal target during both training and inference.
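
As a concrete illustration, the following minimal Python sketch shows how such a demonstration context could be assembled from scored molecules. The `Demo` and `build_context` names and the SMILES input format are illustrative assumptions, not the paper's API.

```python
from dataclasses import dataclass

@dataclass
class Demo:
    smiles: str   # molecule (tokenized to a graph downstream)
    score: float  # normalized proximity to the task optimum, in [0, 1]

def bucket(score: float) -> str:
    """Map a normalized score to its demonstration bucket."""
    if score > 0.75:
        return "positive"
    if score > 0.5:
        return "medium"
    return "negative"

def build_context(demos: list[Demo]) -> dict[str, list[Demo]]:
    """Group demonstrations so a context mixes positive, medium, and negative examples."""
    context = {"positive": [], "medium": [], "negative": []}
    for d in demos:
        context[bucket(d.score)].append(d)
    return context

ctx = build_context([Demo("CCO", 0.9), Demo("c1ccccc1", 0.6), Demo("CCN", 0.2)])
```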

2. Denoising Transformer Architecture for Conditioning

At the core of DemoDiff is a denoising Transformer, instantiated as a Graph Diffusion Transformer, which orchestrates the conditional reverse diffusion process. The model takes as input:

  • The noisy graph representation of the candidate molecule at diffusion step $t$.
  • The context set of demonstration molecule–score pairs.

Attention layers allow the Transformer to jointly reason over features of both the noisy molecule and the demonstration set, implicitly performing task identification and leveraging the demonstrations to inform the structure and property of the denoised output. This context-driven inference is analogous to in-context learning in LLMs, except that the context is non-textual and lacks explicit task descriptors.
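
The sketch below illustrates the general pattern of this joint attention, concatenating embedded demonstration tokens with noisy-molecule tokens into a single sequence. The layer sizes and the use of a generic PyTorch encoder are assumptions for illustration, not the paper's Graph Diffusion Transformer.

```python
import torch
import torch.nn as nn

d_model = 256  # illustrative embedding width
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=4,
)

batch, n_ctx, n_mol = 2, 48, 16
ctx_tokens = torch.randn(batch, n_ctx, d_model)  # embedded demonstration motifs + scores
mol_tokens = torch.randn(batch, n_mol, d_model)  # embedded noisy molecule at step t

# One joint sequence: self-attention lets molecule tokens attend to demonstrations.
h = encoder(torch.cat([ctx_tokens, mol_tokens], dim=1))
denoised_mol_repr = h[:, n_ctx:]  # slice the molecule positions back out
```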

The training loss is a conditional denoising objective:

$$\mathcal{L}_\text{pretrain} = \mathbb{E}_{q(\mathbf{x}^0)} \, \mathbb{E}_{q(\mathbf{x}^t \mid \mathbf{x}^0)} \left[ -\log p_{\theta}\left(\mathbf{x}^0 \mid \mathbf{x}^t, \mathcal{C}, Q\right) \right],$$

where $\mathcal{C}$ denotes the context (demonstrations) and $Q$ is the query score associated with the target property being optimized (Liu et al., 9 Oct 2025).
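
A hedged sketch of this objective is given below, assuming a masking-style discrete diffusion over motif tokens; the paper's exact forward process may differ, and `model`, the tensor shapes, and `MASK_ID` are illustrative.

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # absorbing "noise" token id (assumption: masking-style discrete diffusion)

def corrupt(x0, t, num_steps):
    """Forward process q(xt | x0): mask each motif token with probability t / num_steps."""
    keep = torch.rand(x0.shape) >= (t.float() / num_steps).view(-1, 1)
    return torch.where(keep, x0, torch.full_like(x0, MASK_ID))

def pretrain_step(model, x0, ctx, q_score, num_steps=500):
    """One Monte Carlo estimate of L_pretrain = E[-log p_theta(x0 | xt, C, Q)]."""
    t = torch.randint(1, num_steps + 1, (x0.size(0),))   # sample a diffusion step
    xt = corrupt(x0, t, num_steps)                       # noisy graph tokens
    logits = model(xt, t, ctx, q_score)                  # (batch, seq_len, vocab)
    return F.cross_entropy(logits.transpose(1, 2), x0)   # token-level -log p
```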

During generation, the model begins from noise and iteratively denoises; at each step it attends to the demonstration context to bias generation toward structures likely to achieve a target score of 1.
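
A companion sketch of this generation loop, under the same masking-diffusion assumption as above, might look as follows; the re-masking schedule is illustrative.

```python
import torch

@torch.no_grad()
def generate(model, ctx, seq_len, num_steps=500, mask_id=0):
    """Start from pure noise (all masks) and iteratively denoise, conditioning
    on the demonstration context `ctx` and a query score Q = 1."""
    xt = torch.full((1, seq_len), mask_id)        # fully masked graph tokens
    q_score = torch.ones(1)                       # ask for an optimal molecule
    for step in reversed(range(1, num_steps + 1)):
        t = torch.full((1,), step)
        logits = model(xt, t, ctx, q_score)       # p_theta(x0 | xt, C, Q)
        x0_hat = logits.argmax(dim=-1)            # point estimate of clean tokens
        # re-mask a shrinking fraction so structure is committed gradually
        keep = torch.rand(1, seq_len) >= (step - 1) / num_steps
        xt = torch.where(keep, x0_hat, torch.full_like(x0_hat, mask_id))
    return xt
```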

3. Motif-Level Molecular Tokenization with Node Pair Encoding

Efficient representation of large molecular graphs and their contexts is critical for both scalability and expressivity. DemoDiff introduces a motif-level tokenizer called Node Pair Encoding (NPE), inspired by subword encoding strategies in NLP (e.g., Byte Pair Encoding).

  • Construction: Begins from a vocabulary of atoms and selected polymerization points. Iteratively merges the most frequent neighboring node pairs (subgraphs) into new motif tokens, subject to syntactic and chemical constraints (e.g., treating rings as indivisible units).
  • Compression: Represents molecules with ≈5.5× fewer nodes on average than atom-level representations.
  • Practical Consequence: For a fixed context window (token budget), more demonstration molecule–score pairs can be incorporated alongside the target molecule, enabling richer and higher-variance task contexts during pretraining and inference.

This motif-level tokenization is essential for allowing the pretraining and inference pipeline to operate with a wider diversity of context examples under hardware and graph size constraints (Liu et al., 9 Oct 2025).
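
The toy sketch below conveys the BPE-like merge loop on graphs; it omits the chemical constraints noted above (e.g., ring indivisibility), and the use of `networkx` is an illustrative choice rather than the authors' implementation.

```python
from collections import Counter

import networkx as nx

def npe_merge_step(graphs: list[nx.Graph]) -> str | None:
    """One NPE-style merge: find the most frequent neighboring label pair
    across the corpus and contract every such pair into a single motif node.
    Chemical validity constraints are omitted in this sketch."""
    pairs = Counter()
    for g in graphs:
        for u, v in g.edges():
            pairs[tuple(sorted((g.nodes[u]["label"], g.nodes[v]["label"])))] += 1
    if not pairs:
        return None
    (a, b), _ = pairs.most_common(1)[0]
    motif = f"({a}.{b})"  # new motif token added to the growing vocabulary
    for g in graphs:
        for u, v in list(g.edges()):
            if g.has_edge(u, v) and {g.nodes[u]["label"], g.nodes[v]["label"]} == {a, b}:
                g.nodes[u]["label"] = motif                        # relabel the survivor
                nx.contracted_nodes(g, u, v, self_loops=False, copy=False)
    return motif
```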

4. Large-Scale Dataset Construction and Pretraining

DemoDiff relies on massive and diverse pretraining data covering a wide spectrum of chemical and materials design tasks:

  • Data Sources: ChEMBL (biological assays, millions of activity records), PolyInfo, and additional polymer property databases (e.g., thermal conductivity, permeability data).
  • Task Sampling: For each molecule–property pair, a task is constructed by the following steps (a sketch follows the list):

    1. Selecting an anchor molecule as the ideal target ($\text{score} = 1$).
    2. Normalizing property values to $[0, 1]$ for all candidate molecules using context-specific schemes (e.g., for pChEMBL values).
    3. Grouping molecules into positive, medium, or negative examples based on normalized scores.
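
A minimal sketch of this recipe follows; the min-max normalization against the anchor's raw value is an illustrative stand-in for the paper's context-specific schemes.

```python
def build_task(anchor: tuple[str, float], records: dict[str, float], lo: float) -> dict:
    """Build one pretraining task. `anchor` = (SMILES, raw value) of the ideal
    molecule, whose value maps to score 1; `lo` is the raw value mapped to 0.
    Min-max normalization here is an assumption, not the paper's exact scheme."""
    _, a_value = anchor

    def normalize(v: float) -> float:
        return max(0.0, min(1.0, (v - lo) / (a_value - lo)))

    task = {"positive": [], "medium": [], "negative": []}
    for smiles, value in records.items():
        s = normalize(value)
        group = "positive" if s > 0.75 else "medium" if s > 0.5 else "negative"
        task[group].append((smiles, s))
    return task
```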

Pretraining involves approximately one million unique molecules and 155,000 property assays, forming ≈1.6 million task contexts.

A 0.7-billion-parameter model was pretrained using these data, requiring ≈140 H100 GPU days. This regime supports generalization over millions of distinct context tasks and enables the downstream in-context learning performance critical for DemoDiff (Liu et al., 9 Oct 2025).

5. Performance Evaluation and Task Generalization

DemoDiff is evaluated over 33 molecular design tasks across six categories:

  • Evaluation Tasks: Drug rediscovery, multi-objective drug design, structure-constrained generation, target-based design, material property maximization, and more.

  • Metrics: Performance is assessed using the harmonic mean of oracle property scores and chemical diversity (a code sketch follows these definitions):

    • Oracle Score: Measures how well generated molecules match the target property (as determined by external evaluators or predictors).
    • Diversity: Computed as

    $$\text{IntDiv}(G) = 1 - \sqrt{ \frac{1}{|G|^2} \sum_{m_1 \neq m_2 \in G} T(m_1, m_2)^2 }$$

    where $T(m_1, m_2)$ is a fingerprint-based molecular similarity.
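
The sketch below computes IntDiv and the harmonic-mean aggregate with RDKit; the Morgan fingerprint settings and the use of Tanimoto similarity are assumptions, since the exact fingerprint choice is not specified here.

```python
from itertools import combinations
from math import sqrt

from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def int_div(smiles_list: list[str]) -> float:
    """IntDiv(G) as defined above, using Tanimoto similarity over Morgan
    fingerprints (fingerprint settings are illustrative assumptions)."""
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048) for m in mols]
    n = len(fps)
    # the formula's ordered pairs (m1 != m2) count each unordered pair twice
    total = sum(DataStructs.TanimotoSimilarity(a, b) ** 2
                for a, b in combinations(fps, 2))
    return 1.0 - sqrt(2.0 * total / (n * n))

def task_score(oracle: float, diversity: float) -> float:
    """Harmonic mean of oracle score and diversity, used to rank methods."""
    return 2 * oracle * diversity / (oracle + diversity)
```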

DemoDiff achieves an average rank of 3.63 across all tasks, surpassing domain-specific baselines (ranks 5.25–10.20) and matching or outperforming in-context LLMs with 100–1000× more parameters. The model demonstrates the ability to generate meaningful candidates even in contexts with only negative examples, by inferring structural directions for property improvement (Liu et al., 9 Oct 2025).

6. Comparisons, Scalability, and Model Implications

  • Comparison with Other Conditional Diffusion Methods: DemoDiff distinguishes itself from methods such as classifier-free guidance or concatenation-based conditional modeling by embedding demonstration signals directly in context and leveraging attention mechanisms, without explicit mapping to structured condition vectors.
  • Scalability: Motif-level tokenization, large-scale context-task pretraining, and a moderate model size jointly enable in-context adaptation at a scale not previously feasible for molecular generative modeling.
  • Generalization: The approach implies that, given ample demonstration-conditioned pretraining, diffusion Transformers can serve as foundation models for a broad class of in-context molecular design problems without the need for per-task retraining.

7. Applications and Broader Impact

DemoDiff’s framework enables task specification for molecular design via demonstration alone, facilitating:

  • Drug Discovery: In-context optimization for activity, selectivity, or multi-objective scenarios using demonstration molecules, especially in data-scarce property regimes.
  • Materials Science: In-context property maximization for novel polymers or materials where explicit property vectors are impractical.
  • Flexible Querying: Users can define tasks by a handful of examples, allowing rapid prototyping and iteration in molecular design loops.
  • Absence of Explicit Descriptor Engineering: The model directly infers the “semantic” property concept from relational context, reducing dependency on engineered or curated feature sets.

A plausible implication is that similar demonstration-conditioned diffusion architectures could be adapted to other modalities where the intrinsic condition space is high-dimensional, latent, or difficult to specify explicitly.


DemoDiff, as formalized in (Liu et al., 9 Oct 2025), marks a substantive advance in foundation models for molecular design, enabling broad, scalable, and context-adaptive task specification and generation. Its methodology, pretraining corpus, tokenization advances, and empirical results form a framework that is extensible to other domains where demonstration-based task definition is desirable.

References
(1) Liu et al., "DemoDiff: Demonstration-Conditioned Diffusion Models," 9 Oct 2025.