Generative models on phase space

Published 2 Apr 2026 in hep-ph and cs.AI | (2604.02415v1)

Abstract: Deep generative models such as diffusion and flow matching are powerful machine learning tools capable of learning and sampling from high-dimensional distributions. They are particularly useful when the training data appears to be concentrated on a submanifold of the data embedding space. For high-energy physics data, consisting of collections of relativistic energy-momentum 4-vectors, this submanifold can enforce extremely strong physically-motivated priors, such as energy and momentum conservation. If these constraints are learned only approximately, rather than exactly, this can inhibit the interpretability and reliability of such generative models. To remedy this deficiency, we introduce generative models which are, by construction, confined at every step of their sampling trajectory to the manifold of massless N-particle Lorentz-invariant phase space in the center-of-momentum frame. In the case of diffusion models, the "pure noise" forward process endpoint corresponds to the uniform distribution on phase space, which provides a clear starting point from which to identify how correlations among the particles emerge during the reverse (de-noising) process. We demonstrate that our models are able to learn both few-particle and many-particle distributions with various singularity structures, paving the way for future interpretability studies using generative models trained on simulated jet data.

Summary

  • The paper demonstrates a RAMBO-based q-space framework that guarantees exact Lorentz-invariant energy-momentum conservation in generated HEP phase space.
  • It leverages diffusion and flow matching models to accurately reproduce both smooth and singular matrix element distributions, validated with muon decay and e⁺e⁻→q q̄ g events.
  • Comparisons with conventional p-space models reveal that enforcing physical constraints improves interpretability and generation fidelity for high-multiplicity particle jets.

Generative Modeling on Lorentz-Invariant Phase Space: Exact Constraints and Physical Interpretability

Introduction

The paper "Generative models on phase space" (2604.02415) introduces a framework for deep generative modeling of high-energy physics (HEP) data that targets Lorentz-invariant, energy-momentum-conserving N-particle phase space. Unlike conventional generative approaches, which operate in unconstrained ambient spaces and only approximate physical priors, this work enforces exact conservation laws at every step of the sampling trajectory. The method uses the RAMBO algorithm to map an unconstrained auxiliary space (q-space) to physical phase space (p-space), so that generative models, in particular diffusion and flow matching models, produce samples confined exactly to the physical manifold.

Methodological Foundations

RAMBO and the q-Space Representation

The RAMBO algorithm provides an invertible, permutation-invariant map from unconstrained 3-vectors (q-space) to phase space configurations satisfying exact energy-momentum constraints. The key insight is to perform all training and generative modeling in q-space, where the data is unconstrained, and then map generated samples to phase space via conformal transformations. This sidesteps the measure-zero sampling problem that arises when the target manifold is embedded in a higher-dimensional space.
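The map described above can be sketched as follows. This is a minimal illustration that assumes the textbook RAMBO construction (promote each 3-vector to a massless 4-vector, then boost to the rest frame of the total and rescale); the function name `rambo_map` and the `total_energy` parameter are hypothetical, and the paper's exact variant may differ.

```python
import numpy as np

def rambo_map(q_vec, total_energy=1.0):
    """Map N unconstrained 3-vectors to N massless 4-momenta (E, px, py, pz)
    that sum to (total_energy, 0, 0, 0). Assumes the textbook RAMBO step."""
    q_vec = np.asarray(q_vec, dtype=float)       # shape (N, 3)
    q0 = np.linalg.norm(q_vec, axis=-1)          # massless: energy = |q|
    Q0 = q0.sum()                                # total energy in q-space
    Q_vec = q_vec.sum(axis=0)                    # total 3-momentum in q-space
    M = np.sqrt(Q0**2 - Q_vec @ Q_vec)           # invariant mass of the total
    b = -Q_vec / M                               # boost vector to the rest frame
    x = total_energy / M                         # overall rescaling
    gamma = Q0 / M
    a = 1.0 / (1.0 + gamma)
    bq = q_vec @ b                               # b . q_i for each particle
    p0 = x * (gamma * q0 + bq)
    p_vec = x * (q_vec + np.outer(q0, b) + a * np.outer(bq, b))
    return np.column_stack([p0, p_vec])
```

Because the boost and the rescaling both preserve masslessness, any input in q-space (with at least two non-collinear vectors) lands on the constrained manifold, which is why a model can be trained without ever seeing the constraint.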

Diffusion and Flow Matching in q-Space

Diffusion models are trained to denoise from a reference distribution in q-space (Eq. 4), which corresponds under RAMBO to the uniform distribution on phase space. The forward process incorporates the exact score function of the q-space reference measure, ensuring the sampling trajectory remains on the physical manifold: samples produced at any intermediate step respect energy-momentum conservation.
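To make the reference measure and its score concrete, here is a sketch under a stated assumption: that the reference distribution is the standard RAMBO weighting, with isotropic directions and magnitudes |q| distributed as |q| exp(-|q|). The paper's Eq. 4 is not reproduced in this summary, so the density below and the helper names are illustrative rather than the authors' definitions.

```python
import numpy as np

def sample_reference(rng, n_particles):
    """Draw 3-vectors with isotropic directions and |q| ~ Gamma(shape=2, scale=1)."""
    radius = rng.gamma(shape=2.0, scale=1.0, size=n_particles)
    direction = rng.normal(size=(n_particles, 3))
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    return radius[:, None] * direction

def reference_log_density(q):
    """log of exp(-r) / (4 pi r), the density per d^3q implied by the choice above."""
    r = np.linalg.norm(q, axis=-1)
    return -r - np.log(r) - np.log(4.0 * np.pi)

def reference_score(q):
    """Gradient of the log density with respect to q: -(1 + 1/r) * q / r."""
    r = np.linalg.norm(q, axis=-1, keepdims=True)
    return -(1.0 + 1.0 / r) * q / r
```

Under this assumed density the score has a 1/|q| term that grows without bound near the origin, so any numerical scheme built on it needs care for small |q|.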

Flow matching is adapted in the same way but runs into architectural challenges with phase space priors. Empirically, Gaussian priors in q-space train successfully, although the physical interpretability offered by the diffusion models is lost.

Empirical Validation: Low-Dimensional Examples

The framework is validated in low dimensions, focusing on three-particle phase space, which admits comprehensive visualization and statistical analysis.

The distribution for muon decay—a smooth matrix element—is learned with high fidelity, as demonstrated by Dalitz plots: Figure 1

Figure 1: Dalitz plot of 100,000 samples from the muon decay distribution; center and right panels show diffusion model output and uniform phase space, respectively.

Further evaluation uses the distribution of the logarithm of the theoretical Dalitz plot PDF: Figure 2

Figure 2: Distributions of the log PDF for the true muon decay and generated samples, showing strong agreement except in the low-probability tails.

Energy and Rosenblatt-transformed distributions indicate Wasserstein-1 distances qq1 for marginal energies and qq2 for angular distributions, with a trade-off between angular isotropy and energy distribution fidelity contingent on augmentation strategies: Figure 3

Figure 3: Single-particle energy distributions for the muon decay matrix element, affirming the model’s capacity to replicate both energy endpoints and smooth features.

Figure 4

Figure 4: Rosenblatt-transformed distributions of energies; qq3 and qq4 are nearly uniform as expected.
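For readers unfamiliar with the metric, the Wasserstein-1 distance between two one-dimensional samples of equal size reduces to the mean absolute difference of the sorted values. The sketch below illustrates that identity only; the helper name `wasserstein1` is hypothetical and nothing here reproduces the paper's evaluation pipeline.

```python
import numpy as np

def wasserstein1(x, y):
    """Wasserstein-1 distance between two equal-size 1D samples.

    For empirical distributions with the same number of points, the optimal
    transport plan matches sorted values, so W1 is the mean absolute gap.
    """
    x = np.sort(np.asarray(x, dtype=float).ravel())
    y = np.sort(np.asarray(y, dtype=float).ravel())
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))
```

A uniform shift of one sample by a constant c gives a distance of exactly |c|, which is a convenient sanity check when comparing generated and reference energy distributions.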

Angular distributions exhibit residual anisotropy when limited augmentation is used, a consequence of q-space augmentation choices: Figure 5

Figure 5: Angular distributions for the muon decay matrix element; event plane isotropy is well reproduced, individual-particle anisotropies less so.

Nearly-Singular and High-Dimensional Phase Space

The model is further tested on nearly-singular matrix elements, specifically e⁺e⁻→q q̄ g, a QCD process with soft and collinear divergences.

Dalitz plots for varying infrared cutoffs reveal that the diffusion models recover both permutation invariance and singular behaviors away from IR endpoints: Figure 6

Figure 6: Dalitz plots of 500,000 samples from the e⁺e⁻→q q̄ g distribution as generated by MadGraph with different invariant mass cutoffs.
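An invariant mass cutoff of this kind can be illustrated by computing the pairwise invariants s_ij = (p_i + p_j)^2 for an event and rejecting events whose smallest pair falls below a threshold. This is a sketch only: the helper names are hypothetical, and whether the generator-level cut is applied to the mass or the squared mass, and to which pairs, is an assumption here.

```python
import numpy as np

def min_pair_invariant_mass_sq(p):
    """Smallest s_ij = (p_i + p_j)^2 over all distinct pairs in one event.

    p has shape (N, 4) with columns (E, px, py, pz).
    """
    p = np.asarray(p, dtype=float)
    metric = np.array([1.0, -1.0, -1.0, -1.0])       # mostly-minus signature
    total = p[:, None, :] + p[None, :, :]            # (N, N, 4) pair sums
    s = np.einsum("ijk,k,ijk->ij", total, metric, total)
    upper = np.triu_indices(len(p), k=1)             # each pair once, no i == j
    return s[upper].min()

def passes_cutoff(p, s_min):
    """True if every pair of particles has invariant mass squared above s_min."""
    return min_pair_invariant_mass_sq(p) > s_min
```

Soft or collinear configurations drive some s_ij toward zero, so raising `s_min` removes exactly the region where the matrix element is most singular.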

Generated energy and angular distributions mirror the cutoff-sensitive behavior and anisotropies of true QCD events: Figure 7

Figure 7: Energy distributions comparing diffusion samples and ground truth for e⁺e⁻→q q̄ g for multiple cutoffs.

Figure 8

Figure 8: Angular distributions for e⁺e⁻→q q̄ g, highlighting the diffusion model’s ability to reproduce anisotropy and collinearity.

The critical IRC-safe observable pp0 is learned nearly perfectly above the cutoff scale: Figure 9

Figure 9: Distribution of pp1 for pp2 events, comparing model output with analytic predictions away from unphysical singular regions.

High-Dimensional Phase Space: Antenna Pole Structure (APS)

The paper demonstrates compatibility with high-multiplicity events by reproducing distributions from the APS matrix element (pp3), relevant for jet physics.

Energy distributions and q-space magnitudes are distinctly nonuniform, requiring a Gaussian phase for the initial diffusion steps to avoid score divergence. The pp5 distribution across 9 orders of magnitude maintains fidelity to analytic expectations: Figure 10

Figure 10: Energy distributions for APS events across cutoffs; the comparison between p-space and q-space demonstrates the necessity of auxiliary-space modeling.

Figure 11

Figure 11: Distributions of pp8 for ground truth and diffusion-generated APS events, contrasted with uniform phase space and analytic forms.

Comparison with Conventional p-Space Generative Models

Standard diffusion and flow matching models trained directly in p-space fail to enforce strict conservation; energy and momentum violations are on the order of the median particle energy, degrading physical reliability: Figure 12

Figure 12: Per-event normalized conservation violations, showing the inability of p-space models to maintain the constraints.

By contrast, q-space models enforce conservation to machine precision without loss in observable fidelity or generation quality.
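A per-event violation measure of this kind could be computed as in the sketch below. Normalizing by the median particle energy in the event is an assumption suggested by the comparison above, not the paper's stated definition, and the function name `conservation_violation` is hypothetical.

```python
import numpy as np

def conservation_violation(p, total_energy):
    """Return (energy_violation, momentum_violation) for one event.

    p has shape (N, 4) with columns (E, px, py, pz) in the center-of-momentum
    frame. Both violations are divided by the median particle energy
    (an assumed normalization).
    """
    p = np.asarray(p, dtype=float)
    total = p.sum(axis=0)
    scale = np.median(p[:, 0])
    energy_violation = abs(total[0] - total_energy) / scale
    momentum_violation = np.linalg.norm(total[1:]) / scale
    return energy_violation, momentum_violation
```

For an event that conserves energy and momentum exactly, both numbers sit at floating-point round-off; values of order one mean the imbalance is comparable to a typical particle's energy.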

Practical and Theoretical Implications

Practical: The framework provides a drop-in solution for generative modeling in HEP, capable of scaling to qq3-particle jets. Exact conservation improves interpretability for simulation, inference, and downstream tasks.

Theoretical: By separating the enforcement of physics priors from architecture, the approach serves as a tool for studying generative models’ ability to learn latent hierarchical and compositional features. The reverse trajectory in diffusion models becomes interpretable in terms of correlations emergent from uniform phase space.

Speculation on Future Developments

  • Application to more complex manifolds (e.g., jets with variable multiplicity, hadronization effects).
  • Integration of prior distributions in flow matching to combine physical interpretability with computational efficiency.
  • Use as a diagnostic to probe learning of latent symmetries and constraints in general AI, leveraging particle physics as an intermediate between natural and synthetic data.
  • Systematic study of trade-offs in augmentation, score parameterization, and manifold embedding for model performance and interpretability.

Conclusion

Exact enforcement of Lorentz-invariant, energy-momentum-conserving constraints through the RAMBO-based q-space improves the physical reliability and interpretability of deep generative models for HEP data. The methodology achieves strong numerical agreement with known analytic distributions for both smooth and singular matrix elements, at low and high multiplicities, and shows clear advantages over unconstrained p-space generative modeling. The framework is positioned to influence the modeling of particle jets and to inform broader questions in AI interpretability and trustworthiness.
