Identify the cause of distributional rather than pairwise matching in NFM distillation
Determine why distilling TarFlow normalizing flows into Flow Matching students with a pairwise regression loss yields student samples that only loosely match the teacher samples, so that the distillation behaves as distributional rather than pairwise matching. Specifically, disentangle whether this behavior is driven by architectural inductive-bias mismatches between TarFlow and the Flow Matching student or by noise introduced by the Flow Matching sampling process.
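One way to make the pairwise versus distributional distinction concrete is to compare the distance between teacher and student samples generated from the same noise against the distance after the pairing is shuffled. The following is a minimal sketch on synthetic Gaussian data, not the paper's evaluation protocol; the function `pairing_gap` and both toy "students" are hypothetical and stand in for real TarFlow and Flow Matching outputs.

```python
import numpy as np


def pairing_gap(teacher, student, rng):
    """Return (matched, shuffled) mean squared distances.

    Assumption: teacher[i] and student[i] were generated from the same noise.
    """
    teacher = teacher.reshape(len(teacher), -1)
    student = student.reshape(len(student), -1)
    # Distance with the original noise-to-sample pairing kept intact.
    matched = np.mean(np.sum((teacher - student) ** 2, axis=1))
    # Distance after breaking the pairing with a random permutation.
    perm = rng.permutation(len(teacher))
    shuffled = np.mean(np.sum((teacher[perm] - student) ** 2, axis=1))
    return matched, shuffled


rng = np.random.default_rng(0)
teacher = rng.normal(size=(2000, 8))  # stand-in for teacher samples

# Toy student that matches pairwise: teacher output plus small noise.
pairwise_student = teacher + 0.1 * rng.normal(size=teacher.shape)
# Toy student that matches only the distribution: an independent draw.
distributional_student = rng.normal(size=teacher.shape)

pair_matched, pair_shuffled = pairing_gap(teacher, pairwise_student, rng)
dist_matched, dist_shuffled = pairing_gap(teacher, distributional_student, rng)

print(f"pairwise student:       matched={pair_matched:.2f} shuffled={pair_shuffled:.2f}")
print(f"distributional student: matched={dist_matched:.2f} shuffled={dist_shuffled:.2f}")
```

A matched distance far below the shuffled one indicates pairwise matching, while nearly equal values indicate that only the distribution is matched. Running the same comparison while varying the student architecture or the sampler settings is one possible way to probe the two conjectured causes separately.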
References
While we cannot say for sure what causes this phenomenon, multiple conjectures can be considered:
— The Coupling Within: Flow Matching via Distilled Normalizing Flows
(2603.09014 - Berthelot et al., 9 Mar 2026) in Subsubsection "Pairwise vs distributional distillation" (Section 4, Experiments), paragraph preceding the conjectured explanations