Identify the cause of distributional rather than pairwise matching in NFM distillation

Despite the pairwise regression loss used to distill TarFlow normalizing flows into Flow Matching students, the student samples only loosely match the teacher samples, so the distillation behaves as distributional rather than pairwise matching. Identify the underlying cause of this behavior; specifically, disentangle whether it is driven by architectural inductive-bias mismatches between TarFlow and the Flow Matching student or by noise introduced by the Flow Matching sampling process.

Background

When generating samples with the same random seeds, the authors find that a Normalized Flow Matching (NFM) student does not replicate the teacher’s samples precisely, even though the training objective should encourage pairwise matching. Instead, the outputs are only loosely similar, suggesting the distillation acts more like distributional matching.

The authors explicitly acknowledge they cannot yet explain the cause and list two possible explanations: differences in inductive biases between teacher and student architectures, and stochasticity/noise introduced by the Flow Matching sampling procedure. Pinpointing the cause would inform better distillation strategies and model design.
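One way to put a number on the "loosely similar" observation is to compare the distance between same-seed teacher/student pairs with the distance between randomly re-paired samples. The sketch below is illustrative only and is not the authors' procedure: the function name `pairing_gap` is hypothetical, and toy Gaussian arrays stand in for real teacher and student samples. A ratio near 0 would indicate pairwise matching; a ratio near 1 would indicate that pairing carries no information beyond the shared distribution.

```python
import numpy as np


def pairing_gap(teacher, student, rng):
    """Ratio of matched-pair to shuffled-pair mean squared distance.

    Hypothetical helper: row i of `teacher` and `student` is assumed to
    come from the same seed / initial noise.
    """
    teacher = teacher.reshape(len(teacher), -1)
    student = student.reshape(len(student), -1)
    # Distance between samples that share a seed.
    matched = np.mean(np.sum((teacher - student) ** 2, axis=1))
    # Distance after breaking the pairing with a random permutation.
    perm = rng.permutation(len(student))
    shuffled = np.mean(np.sum((teacher - student[perm]) ** 2, axis=1))
    return matched / shuffled


rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not real model outputs).
teacher = rng.normal(size=(2048, 16))
# "Pairwise" student: reproduces each teacher sample up to small noise.
pairwise_student = teacher + 0.1 * rng.normal(size=teacher.shape)
# "Distributional" student: same distribution, independent samples.
distributional_student = rng.normal(size=teacher.shape)

ratio_pairwise = pairing_gap(teacher, pairwise_student, rng)
ratio_distributional = pairing_gap(teacher, distributional_student, rng)
print(f"pairwise student ratio:       {ratio_pairwise:.3f}")
print(f"distributional student ratio: {ratio_distributional:.3f}")
```

Under these assumptions, tracking such a ratio while varying only the student architecture, or only the sampler settings, could help attribute the gap to one of the two conjectured causes. That experimental design is a suggestion, not something reported in the source.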

References

While we cannot say for sure what causes this phenomenon, multiple conjectures can be considered:

The Coupling Within: Flow Matching via Distilled Normalizing Flows (2603.09014 - Berthelot et al., 9 Mar 2026) in Subsubsection "Pairwise vs distributional distillation" (Section 4, Experiments), paragraph preceding the conjectured explanations