Assess whether equality of input- and latent-space distances is coincidental in TarFlow’s z-space

Determine whether an empirical observation about TarFlow-trained normalizing flows is merely coincidental or reflects a systematic property of TarFlow’s learned z-space. The observation is that, at the noise levels yielding the best Fréchet Inception Distance, the average pairwise distance between different images under the same noise is approximately equal in input space and in the Gaussian representation space (i.e., d_x ≈ d_z).

Background

The paper analyzes the geometry of the Gaussian representation (z-space) learned by a TarFlow normalizing flow and introduces distance measures d_x and d_z to compare input-space and z-space neighborhoods. Empirically, for the input noise levels η that yield the best Fréchet Inception Distance, the authors observe that the mean distance between different images under the same noise in z-space closely matches the corresponding mean distance in input space.
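To make the comparison concrete, the following is a minimal sketch of how such a pair of mean distances could be computed. It is not the authors' code. It assumes plain Euclidean distance with no normalization, which may differ from the paper's exact definition of d_x and d_z. The names mean_pairwise_distance, compare_distances, and encode are hypothetical, and the orthogonal map in the toy usage is only a stand-in for a trained flow, chosen because it preserves distances exactly.

```python
import numpy as np


def mean_pairwise_distance(points: np.ndarray) -> float:
    """Mean Euclidean distance over all unordered pairs of distinct rows."""
    flat = points.reshape(len(points), -1).astype(np.float64)
    # (N, N, D) array of differences; fine for small batches only.
    diffs = flat[:, None, :] - flat[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    i, j = np.triu_indices(len(flat), k=1)  # each pair counted once
    return float(dists[i, j].mean())


def compare_distances(images, noise, eta, encode):
    """Return (d_x, d_z) for one noise draw shared by every image.

    `encode` is a hypothetical stand-in for the flow's x -> z map.
    """
    noisy = images + eta * noise  # the same noise is added to each image
    d_x = mean_pairwise_distance(noisy)
    d_z = mean_pairwise_distance(encode(noisy))
    return d_x, d_z


# Toy usage: an orthogonal map preserves distances, so d_x and d_z agree.
rng = np.random.default_rng(0)
images = rng.normal(size=(5, 2, 2))  # five tiny "images"
noise = rng.normal(size=(2, 2))      # one shared noise sample
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))


def toy_encode(x):
    return x.reshape(len(x), -1) @ q


d_x, d_z = compare_distances(images, noise, eta=0.3, encode=toy_encode)
print(d_x, d_z)
```

Under this simplified definition the shared noise cancels in every input-space difference, so d_x does not depend on η, while d_z can depend on η through the nonlinear flow. A trained flow is not distance-preserving in general, which is part of why agreement between the two quantities is notable.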

This near-equality (d_x ≈ d_z) was unexpected. The authors explicitly state that they do not know whether it is a coincidence or indicates a deeper structural property of the learned mapping. Resolving the question would improve understanding of how normalizing flows reshape geometry and why NFM training benefits from these couplings.

References

Second, for the values of η which yield best FIDs, as will be seen in the next section, we observe that d_x = d_z. This is unexpected, and we do not know whether it is pure coincidence or not.

The Coupling Within: Flow Matching via Distilled Normalizing Flows (2603.09014 - Berthelot et al., 9 Mar 2026) in Subsection "Normalizing Flows z-space structure" (Section 3), paragraph discussing group 3 distances near Tables 1 and 2