Reproducing Equilibrium Matching on SiT-B/1 with Representation Alignment

Determine whether Equilibrium Matching (EqM) can be implemented on the SiT-B/1 diffusion transformer architecture augmented with representation alignment so that it matches the originally reported EqM performance. If reproduction is not possible, identify and characterize the specific incompatibilities between EqM and representation alignment, or between EqM and the SiT-B/1 training configuration, that degrade results relative to standard flow matching.

Background

Equilibrium Matching (EqM) is a generative modeling approach that combines energy-based modeling with flow matching. In Speedrunning ImageNet Diffusion, the authors attempted to implement EqM within their SiT-B/1 diffusion transformer setup, which includes representation alignment, to evaluate whether EqM improves training efficiency or generation quality under their configuration.

Despite following EqM’s methodology, the authors report that EqM did not reproduce its published results in their setting and performed significantly worse than standard flow matching. This suggests that EqM may be incompatible with representation alignment or other aspects of the SiT-B/1 training setup, leaving open whether and how EqM can be made to work in this context.
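To make the comparison between the two objectives concrete, the sketch below contrasts the regression targets of standard flow matching and an EqM-style objective on the same noisy sample. It is an illustration under stated assumptions, not the implementation from the cited work: the linear interpolant with data at gamma = 1, the linear decay c(gamma) = 1 - gamma, and all function names (interpolate, flow_matching_target, eqm_target) are assumptions introduced here. The representation alignment term is omitted.

```python
import numpy as np


def _expand(gamma, x):
    # Reshape per-sample gamma so it broadcasts over the feature dimensions of x.
    return gamma.reshape(-1, *([1] * (x.ndim - 1)))


def interpolate(x, eps, gamma):
    # Assumed linear interpolant: gamma = 1 gives data, gamma = 0 gives pure noise.
    g = _expand(gamma, x)
    return g * x + (1.0 - g) * eps


def flow_matching_target(x, eps):
    # Velocity of the interpolant with respect to gamma:
    # d/dgamma [gamma * x + (1 - gamma) * eps] = x - eps.
    return x - eps


def eqm_target(x, eps, gamma):
    # Assumed EqM-style target: the direction (eps - x) scaled by a decay
    # c(gamma) = 1 - gamma, so the target vanishes at the data (gamma = 1).
    g = _expand(gamma, x)
    return (eps - x) * (1.0 - g)


def mse(pred, target):
    # Mean squared error used as the regression loss for either objective.
    return float(np.mean((pred - target) ** 2))


# Toy batch: 4 samples with 8 features each (stand-ins for latent image tokens).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
eps = rng.normal(size=(4, 8))
gamma = rng.uniform(size=4)

x_gamma = interpolate(x, eps, gamma)
dummy_pred = np.zeros_like(x_gamma)  # placeholder for a network output

loss_fm = mse(dummy_pred, flow_matching_target(x, eps))
loss_eqm = mse(dummy_pred, eqm_target(x, eps, gamma))
```

Under these assumptions the two targets differ in sign and in magnitude: the flow matching target has the same scale at every gamma, while the EqM-style target shrinks to zero as the sample approaches the data. Any auxiliary loss tuned alongside the flow matching objective would therefore see a differently scaled training signal, which is one place a reproduction attempt might look for incompatibilities.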

References

Despite following the methodology, we were unable to reproduce their reported results on our SiT-B/1 architecture with representation alignment. Performance was significantly worse than standard flow matching, suggesting potential incompatibilities between EqM and our architectural choices or training setup.

Speedrunning ImageNet Diffusion (2512.12386 - Bhanded, 13 Dec 2025) in Appendix: Additional Ablations, Alternative Training Objectives (Equilibrium Matching)