Simultaneous minimization of all DPA per-dimension losses by a global optimum

Determine whether any global minimizer (e*, d*) of the Distributional Principal Autoencoder (DPA) joint objective that aggregates per-dimension energy-score losses across k = 0, …, p necessarily minimizes each individual per-dimension loss term L_k[e, d] simultaneously. Concretely, for the aggregated objective ∑_{k=0}^p ω_k L_k[e, d] with ω_k ≥ 0 and ∑_k ω_k = 1, determine whether every global optimizer (e*, d*) must also minimize each L_k[e, d] separately, for all k.
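
Equivalently, assuming strictly positive weights ω_k (a term with zero weight imposes no constraint), the question can be stated as an implication:

```latex
% Does every global minimizer of the aggregate minimize each term?
(e^*, d^*) \in \arg\min_{(e,d)} \sum_{k=0}^{p} \omega_k\, L_k[e,d]
  \;\overset{?}{\implies}\;
(e^*, d^*) \in \arg\min_{(e,d)} L_k[e,d] \quad \text{for all } k = 0, \dots, p.
```

Because ∑_k ω_k L_k ≥ ∑_k ω_k min L_k always holds, the implication is valid for every global minimizer exactly when equality is attained at the optimum, i.e. exactly when the terms L_k admit a common minimizer.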

Background

The Distributional Principal Autoencoder (DPA) trains an encoder–decoder pair by minimizing a joint objective that sums energy-score-based reconstruction terms L_k over encoding dimensionalities k = 0, …, p. Optimizing across all k simultaneously gives the learned encoding a principal-components-like ordering and interpretability.
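
For concreteness, the per-dimension terms can be written via the energy score. The sketch below is one plausible form: the truncation of the encoding to its first k coordinates and the noise argument ε of the stochastic decoder are notational assumptions here, not quotations from the paper.

```latex
% Per-dimension energy-score reconstruction loss (a sketch).
% e(X)_{1:k}: first k coordinates of the encoding; X', X'' are i.i.d.
% draws from the stochastic decoder d(e(X)_{1:k}, \varepsilon).
L_k[e,d] \;=\; \mathbb{E}_{X}\Big[\, \mathbb{E}\,\lVert X' - X \rVert^{\beta}
  \;-\; \tfrac{1}{2}\, \mathbb{E}\,\lVert X' - X'' \rVert^{\beta} \Big],
  \qquad \beta \in (0,2).
```

Since the energy score is a proper scoring rule for β ∈ (0, 2), each such term is minimized over the decoder when d(e(X)_{1:k}, ε) matches the conditional distribution of X given e(X)_{1:k}.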

Whether a single encoder that is globally optimal for the aggregated objective must also minimize each constituent term L_k is central to understanding how the model balances performance across dimensions, and it bears on claims about whether extraneous encoding dimensions carry information. The paper flags this as an open issue, originally raised in prior work on DPA, and highlights its relevance for typical manifold settings where p ≫ K.
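
This tension is generic to weighted-sum objectives: the aggregate optimum minimizes every term precisely when the terms share a common minimizer, and otherwise it trades them off. A minimal numerical sketch in Python (the quadratic toy losses and the weights are purely illustrative, not DPA's actual terms):

```python
import numpy as np

# Two toy per-dimension losses over a scalar parameter x.
# Conflicting case: minima at x = 0 and x = 1, no common minimizer.
L0 = lambda x: (x - 0.0) ** 2
L1 = lambda x: (x - 1.0) ** 2
w0, w1 = 0.5, 0.5

xs = np.linspace(-1.0, 2.0, 30001)      # grid contains 0.5 and 1.0 exactly
x_star = xs[np.argmin(w0 * L0(xs) + w1 * L1(xs))]

# The aggregate optimum sits between the two minima and minimizes
# neither term individually (both terms are 0.25 > 0 at x*).
print(x_star, L0(x_star), L1(x_star))   # ~0.5 ~0.25 ~0.25

# Shared-minimizer case: both terms are minimized at x = 1, so the
# aggregate optimum minimizes each term simultaneously.
L1b = lambda x: 2.0 * (x - 1.0) ** 2
x_shared = xs[np.argmin(w0 * L1(xs) + w1 * L1b(xs))]
print(x_shared, L1(x_shared), L1b(x_shared))  # ~1.0 ~0.0 ~0.0
```

The open question is thus whether the structure of the DPA terms, e.g. the encoder and decoder shared across all k, forces such a common minimizer to exist.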

References

As discussed in prior work on DPA, it remains an open question whether an optimal encoder is necessarily one that minimizes all the terms in the loss simultaneously (which is the case for the terms k = K, …, p when the encoder is the K-best-approximating one), so the following argument examines what is likely to happen for parameterizable manifolds as p ≫ K.

Leban (17 Feb 2025), “Distributional Autoencoders Know the Score”, arXiv:2502.11583; Appendix, Subsection “Discussion on Remark” (sec:proof_ind_exact).