Equivariant VFM for Controlled Generation

Updated 14 May 2026

The paper introduces a principled variational flow matching framework that unifies constraint-driven generative modeling with symmetry-aware sampling.
It employs both end-to-end conditional training and post hoc Bayesian inference to integrate external controls and enforce exact group symmetries.
Demonstrated for molecular graph and geometry generation, the method achieves SOTA performance with high validity, uniqueness, and stability under symmetry constraints.

Controlled Generation with Equivariant Variational Flow Matching (cVFM) defines a principled framework that unifies constraint-driven generative modeling and symmetry-aware sampling within the flow matching paradigm. Central to this approach is the recasting of flow matching as a variational inference problem, enabling both direct and post hoc controlled generation as well as the exact enforcement of group symmetries, with practical emphasis on molecular graph and geometry generation. The framework integrates external constraints ("controls") either through end-to-end supervision or by leveraging Bayesian inference at sampling, and introduces precise mathematical conditions and architectural patterns required to guarantee equivariant generation across arbitrary symmetry groups.

1. Variational Flow Matching and the Controlled Generation Objective

Variational Flow Matching (VFM) parameterizes a time-dependent vector field as an expectation under a learned variational posterior. The key formal object is the velocity field

$u_t(x) = \mathbb{E}_{q_t(x_1|x)}[u_t(x|x_1)]$

where $u_t(x|x_1)$ is an analytically specified vector field (typically describing straight-line or Gaussian perturbation between $x$ and $x_1$), and $q_t(x_1|x)$ is a learnable variational posterior.

Controlled generation is incorporated by conditioning the terminal marginal $p_1(x_1)$ on an auxiliary variable $y$, yielding the target distribution $p_1(x_1|y)$. The corresponding controlled velocity field is

$u_t(x|y) = \mathbb{E}_{p_t(x_1|x, y)} [ u_t(x|x_1) ]$

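which guarantees correct transport $p_0 \rightarrow p_1(\cdot|y)$. In practice, the intractable posterior $p_t(x_1|x, y)$ is approximated by a neural posterior $q_t^\theta(x_1|x, y)$, and learning proceeds by minimizing the negative log-likelihood:

$\mathcal{L}(\theta) = -\mathbb{E}_{t,\,(x_1, y),\,x \sim p_t(x|x_1)}\left[\log q_t^\theta(x_1|x, y)\right]$

Up to an additive constant independent of $\theta$, this is equivalent to minimizing the KL divergence between the true and approximate joint distributions over $(x, x_1)$.

For conditional fields $u_t(x|x_1)$ that are linear in $x_1$, the expectation depends on the posterior only through its mean, so matching the component-wise mean of $p_t(x_1|x, y)$ suffices, giving a "mean-field" loss:

$\mathcal{L}_{\mathrm{MF}}(\theta) = -\mathbb{E}_{t,\,(x_1, y),\,x \sim p_t(x|x_1)}\left[\sum_{d=1}^{D} \log q_t^\theta(x_1^d|x, y)\right]$

where $d$ indexes the $D$ components of $x_1$.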

This construction covers both discrete and continuous generative domains.
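As a minimal sketch of the mean-field idea for a continuous component, the snippet below assumes a factorized Gaussian posterior and the straight-line conditional field $u_t(x|x_1) = (x_1 - x)/(1 - t)$. Both function names are hypothetical, and the Gaussian choice is an illustrative assumption rather than the paper's implementation. The first function sums per-coordinate negative log-likelihoods; the second shows that, for a field linear in $x_1$, the marginal velocity needs only the posterior mean.

```python
import math

def gaussian_mean_field_nll(x1, mean, log_var):
    """Negative log-likelihood of endpoint x1 under a factorized Gaussian.

    Coordinate d contributes -log N(x1[d]; mean[d], exp(log_var[d])); the
    mean-field assumption lets the contributions be summed independently.
    """
    total = 0.0
    for target, m, lv in zip(x1, mean, log_var):
        total += 0.5 * (math.log(2.0 * math.pi) + lv
                        + (target - m) ** 2 / math.exp(lv))
    return total

def velocity_from_posterior_mean(x, mean, t):
    """Expected velocity for u_t(x|x1) = (x1 - x) / (1 - t), with t < 1.

    The field is linear in x1, so its expectation under the posterior
    equals the field evaluated at the posterior mean.
    """
    return [(m - xi) / (1.0 - t) for xi, m in zip(x, mean)]

# Toy check: a point halfway along the straight line from x0 to x1.
x0, x1, t = [0.0, 0.0], [1.0, -2.0], 0.5
x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

# A posterior centred exactly on x1 with unit variance (log_var = 0).
nll = gaussian_mean_field_nll(x1, mean=x1, log_var=[0.0, 0.0])
vel = velocity_from_posterior_mean(x_t, mean=x1, t=t)
```

With a perfect posterior mean, `vel` recovers the straight-line displacement `x1 - x0`. With a fixed variance, the Gaussian loss reduces to a squared error on the predicted endpoint, up to constants.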

2. Pathways for Controlled Generation

Two complementary control mechanisms are established for constraint-driven generation within the VFM framework.

A. End-to-End Conditional Training

Here, $q_t^\theta(x_1|x, y)$ is directly parameterized by a network receiving both the partially noised state $x$ at intermediate time $t$ and the control $y$. The mean-field loss above is optimized using supervised pairs $(x_1, y)$. For generation, the ODE

$\frac{dx}{dt} = u_t^\theta(x|y) = \mathbb{E}_{q_t^\theta(x_1|x, y)}\left[u_t(x|x_1)\right]$

is integrated from $t = 0$ to $t = 1$, so that samples are conditioned on the required control $y$ by construction.
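The sampling step can be sketched with a plain Euler integrator. The snippet assumes the straight-line conditional field, so the velocity is (posterior mean minus current state) divided by (1 - t). `toy_posterior_mean` is a hypothetical stand-in for a trained conditional network, not a real model.

```python
def toy_posterior_mean(x, t, y):
    # Hypothetical stand-in for a trained conditional network: it ignores
    # x and t and always predicts the control value y as the endpoint.
    return [y for _ in x]

def sample_conditional(x0, y, posterior_mean, num_steps=100):
    """Euler integration of dx/dt = (mu(x, t, y) - x) / (1 - t) on [0, 1]."""
    x = list(x0)
    h = 1.0 / num_steps
    for k in range(num_steps):
        t = k * h  # the largest t used is 1 - h, so 1 - t stays positive
        mu = posterior_mean(x, t, y)
        x = [xi + h * (m - xi) / (1.0 - t) for xi, m in zip(x, mu)]
    return x

# Start from an arbitrary "noise" point and condition on y = 2.0.
sample = sample_conditional([0.3, -1.2], y=2.0,
                            posterior_mean=toy_posterior_mean)
```

Because the toy network predicts a constant endpoint, the final Euler step (where the step size equals 1 - t) lands on that endpoint up to floating-point error. A real network would update its prediction along the trajectory, and a higher-order solver could replace the Euler loop.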

B. Post Hoc Bayesian Inference

For pretrained, unconditional VFM generators, controlled sampling is performed by reweighting the posterior at inference:

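$p_t(x_1|x, y) \propto p(y|x_1)\, p_t(x_1|x) \approx p(y|x_1)\, q_t^\theta(x_1|x)$

where $q_t^\theta(x_1|x)$ is the pretrained VFM's posterior (often Gaussian with mean $\mu_t^\theta(x)$ and covariance $\Sigma_t$). The mode $x_1^*$ solving

$\nabla_{x_1} \log p(y|x_1^*) - \Sigma_t^{-1}\left(x_1^* - \mu_t^\theta(x)\right) = 0$

can be found via the iterative update:

$x_1^{(k+1)} = \mu_t^\theta(x) + \Sigma_t\, \nabla_{x_1} \log p(y|x_1)\big|_{x_1 = x_1^{(k)}}$

initialized at $x_1^{(0)} = \mu_t^\theta(x)$. The resulting $x_1^*$ replaces the unconditional posterior mean in the velocity field that drives the generation ODE. This approach enables constraint-driven, classifier-guided, or reward-driven generation for arbitrary differentiable $p(y|x_1)$, reusing a single backbone model without retraining.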
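A one-dimensional sketch of such a fixed-point update is given below. It assumes a Gaussian pretrained posterior with mean `mu` and variance `sigma2`, a Gaussian likelihood $p(y|x_1) = \mathcal{N}(y; x_1, \text{noise2})$, and the update $x_1 \leftarrow \mu + \sigma^2 \nabla_{x_1} \log p(y|x_1)$. All names and numbers are illustrative assumptions, not taken from the paper.

```python
def map_fixed_point(mu, sigma2, grad_log_lik, num_iters=50):
    """Iterate x <- mu + sigma2 * grad_log_lik(x), starting from x = mu."""
    x = mu
    for _ in range(num_iters):
        x = mu + sigma2 * grad_log_lik(x)
    return x

# Toy setup (all values are illustrative assumptions).
mu, sigma2 = 0.0, 0.5   # pretrained posterior N(mu, sigma2)
y, noise2 = 3.0, 1.0    # likelihood p(y | x1) = N(y; x1, noise2)

def grad_log_lik(x1):
    # Gradient of log N(y; x1, noise2) with respect to x1.
    return (y - x1) / noise2

x_star = map_fixed_point(mu, sigma2, grad_log_lik)

# For this linear-Gaussian toy the posterior mode is known in closed form.
closed_form = (noise2 * mu + sigma2 * y) / (noise2 + sigma2)
```

In this toy case the iteration contracts because `sigma2 / noise2 < 1`, and `x_star` matches the closed-form mode. For a sharper likelihood or a nonlinear classifier, plain iteration may diverge, and a damped update or a gradient-ascent step on the log-posterior would be a reasonable substitute.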

3. Equivariance: Theory, Implementation, and Guarantees

To ensure that the generative process respects symmetries inherent in the data domain, sufficient and necessary equivariance conditions are imposed. Let $G$ be a symmetry group (e.g., permutations $S_N$, rigid motions SE(3)) acting on configurations $x$. The following must hold:

Prior invariance: $p_0(g \cdot x) = p_0(x)$ for all $g \in G$.
Bi-equivariance of the conditional velocity: $u_t(g \cdot x|g \cdot x_1) = g \cdot u_t(x|x_1)$
Posterior-mean equivariance: $\mu_t^\theta(g \cdot x) = g \cdot \mu_t^\theta(x)$, where $\mu_t^\theta(x) = \mathbb{E}_{q_t^\theta(x_1|x)}[x_1]$

If these are satisfied, the vector field

$u_t^\theta(x) = \mathbb{E}_{q_t^\theta(x_1|x)}\left[u_t(x|x_1)\right]$

generates marginals $p_t(x)$ that are $G$-invariant for all $t \in [0, 1]$. In practice, this is achieved by enforcing $G$-equivariance in the network architecture for $\mu_t^\theta$ (e.g., E(n)-equivariant message-passing networks) and selecting $p_0$ to be invariant (e.g., isotropic Gaussian for continuous, uniform discrete for categorical components).
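The role of posterior-mean equivariance can be checked numerically on a toy case. The sketch below assumes the permutation group acting on a list of scalars, a hypothetical DeepSets-style posterior mean (each output mixes its own input with the global average), and the straight-line conditional field. It verifies that permuting the input and then computing the velocity gives the same result as computing the velocity and then permuting.

```python
import random

def posterior_mean(x, a=0.7, b=0.3):
    # Hypothetical permutation-equivariant layer: each output depends on its
    # own input and on the order-independent average of all inputs.
    mean_x = sum(x) / len(x)
    return [a * xi + b * mean_x for xi in x]

def velocity(x, t):
    # Straight-line conditional field evaluated at the posterior mean.
    mu = posterior_mean(x)
    return [(m - xi) / (1.0 - t) for xi, m in zip(x, mu)]

def permute(values, perm):
    # Group action: position k of the output holds element perm[k].
    return [values[i] for i in perm]

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(5)]
perm = [3, 0, 4, 1, 2]

lhs = velocity(permute(x, perm), t=0.4)   # act, then compute velocity
rhs = permute(velocity(x, t=0.4), perm)   # compute velocity, then act
max_gap = max(abs(l - r) for l, r in zip(lhs, rhs))
```

The two orders agree up to floating-point rounding, so `max_gap` is tiny. A check of this kind is a cheap unit test for an equivariant architecture, though it does not replace the formal argument.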

4. Equivariant VFM for Molecular Generation

The application to molecules requires handling both discrete (atom types $a$, bond types $b$, charges $c$) and continuous (3D coordinates $r$) modalities, with invariance to atom permutation and SE(3) symmetry in spatial coordinates. For $\sigma \in S_N$,

$\sigma \cdot (a, b, c, r) = \left((a_{\sigma^{-1}(i)})_i,\ (b_{\sigma^{-1}(i)\,\sigma^{-1}(j)})_{ij},\ (c_{\sigma^{-1}(i)})_i,\ (r_{\sigma^{-1}(i)})_i\right)$

For $(R, \tau) \in$ SE(3),

$(R, \tau) \cdot r_i = R\, r_i + \tau$

with $(a, b, c)$ invariant. The network producing the mean-field parameters must be permutation-equivariant (e.g., to $S_N$) and SE(3)-equivariant (e.g., based on geometric message passing) for $r$. This guarantees all generated marginals and final samples respect molecular symmetries exactly; no further post hoc symmetrization or data augmentation is required, and both training and sampling are symmetry-consistent by construction.
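To make the two group actions concrete, the sketch below applies a rigid motion and an atom permutation to a toy three-atom geometry. It then checks that pairwise distances, the kind of invariant feature geometric message passing typically relies on, behave as expected. The helper names and coordinates are illustrative assumptions, and only rotations about the z-axis are covered, which is a subset of SE(3).

```python
import math

def se3_act(coords, angle, shift):
    """Rotate each point about the z-axis by `angle`, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for (x, y, z) in coords]

def permute_molecule(atom_types, coords, perm):
    """Relabel atoms: position k of the output holds atom perm[k]."""
    return [atom_types[i] for i in perm], [coords[i] for i in perm]

def pairwise_distances(coords):
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]

# Toy water-like geometry (coordinates are illustrative, not reference data).
atom_types = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

d_ref = pairwise_distances(coords)
d_moved = pairwise_distances(se3_act(coords, angle=0.8, shift=(1.0, -2.0, 0.5)))

perm = [2, 0, 1]
types_p, coords_p = permute_molecule(atom_types, coords, perm)
d_perm = pairwise_distances(coords_p)
```

Rigid motions leave the distance matrix unchanged, while a permutation reorders its rows and columns together with the atom types. Discrete features are untouched by the rigid motion, matching the invariance of the discrete components stated above.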

5. Experimental Results: Molecular Generation and Control

Extensive experiments on molecules demonstrate both state-of-the-art (SOTA) unconditional generation and superior performance in property-conditioned molecular design. Evaluation measures include:

Discrete: Validity %, Uniqueness %, Fréchet ChemNet Distance (FCD)
Continuous: Negative Log-Likelihood (NLL), atom/molecule stability %
Joint: Molecule/atom stability, Validity, Uniqueness, Jensen–Shannon divergence of energy distributions, number of function evaluations (NFE)
Conditional: Property alignment (e.g., polarizability $\alpha$, HOMO/LUMO energies, dipole moment $\mu$, heat capacity $C_v$) via Mean Absolute Error (MAE).
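Two of these measures are simple enough to sketch directly. The snippet assumes uniqueness is the fraction of distinct strings among molecules already judged valid, and that MAE is the average absolute gap between target and predicted property values. Exact definitions vary between papers, so treat these as illustrative rather than as the evaluation protocol used here.

```python
def uniqueness(valid_smiles):
    """Fraction of distinct entries among molecules already judged valid.

    Assumes the strings are canonical; canonicalisation would require a
    chemistry toolkit and is not shown here.
    """
    if not valid_smiles:
        return 0.0
    return len(set(valid_smiles)) / len(valid_smiles)

def mean_absolute_error(predicted, target):
    """Average absolute difference between paired property values."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)

# Made-up values, for illustration only.
u = uniqueness(["CCO", "CCO", "C", "CC"])
mae = mean_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```

In a conditional-generation benchmark, `predicted` would typically come from a property predictor applied to the generated molecules and `target` from the requested control values.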

Key findings (Eijkelboom et al., 23 Jun 2025):

Setting	Validity	Uniqueness	FCD (QM9)	Atom-Stability	Mol-Stability	NLL	MAE (α, Bohr³)	NFE
Uncontrolled	≥99%	>99%	0.47	99.6%	99.5%	−120.7	—	100
E2E Control	—	—	—	—	—	—	2.05	100
Post-hoc VI	—	—	—	—	—	—	2.25	100
Combined	—	—	—	—	—	—	1.98	100

End-to-end training achieves an MAE of 2.05 Bohr³ for $\alpha$, compared to 2.76 (EDM) and 2.41 (EquiFM). Post hoc inference without retraining yields 2.25, and combining the two gives 1.98, approaching specialized diffusion-based models at significantly lower sampling cost.

6. Implications, Generalizations, and Theoretical Impact

Controlled VFM unifies flow-based generative modeling and Bayesian conditioning, enabling a flexible and reusable approach to sampling under arbitrary constraints without retraining, and providing a direct parallel to classifier guidance in diffusion but with exact ODE flows. Posterior-mean equivariance emerges as the critical sufficient property for full symmetry preservation, simplifying the design of symmetry-aware architectures: enforcing equivariance at the neural posterior-mean level guarantees global invariance in the generative process.

The framework extends directly to any domain with combinatorial or geometric symmetries, such as polymer, crystal, or protein generation, and applies to purely discrete, purely continuous, or mixed data. Reusable pretrained VFM backbones facilitate rapid iteration on new controls or property constraints through plug-and-play classifiers or reward models, streamlining discovery pipelines in chemistry, materials science, and structured data domains.

7. Relation to Other Equivariant Flow Matching Approaches

Controlled equivariant VFM is closely related to methods such as EfficientFlow (Chang et al., 1 Dec 2025), PropMolFlow (Zeng et al., 27 May 2025), and ActionFlow (Funk et al., 2024), which apply equivariant flow matching in different contexts (visuomotor policy learning, property-guided molecular design, and spatially symmetric control, respectively). The core principle, enforcing symmetries via isotropic priors and equivariant architectures while using mean-field or surrogate variational objectives, remains consistent. The contributions of (Eijkelboom et al., 23 Jun 2025) are the variational inference interpretation and post hoc control for flexible, constraint-driven generation with symmetry guarantees, achieving SOTA results at reduced computational cost. This positions cVFM as a unifying framework for constraint-satisfying, symmetry-aware generation.