Optimal scaling theory for information‑gain controlled updates in VTI

Derive mathematical properties characterizing the optimal scaling of the gradient update applied to the parameters of the model‑weights distribution q_η in Variational Transdimensional Inference when updates are regulated via an information‑gain threshold IG(q_η^(t+1) || q_η^(t)). Specifically, establish conditions, stability criteria, and convergence guarantees for the step‑size scaling of η under the entropy‑based information‑gain constraint that ensure stable joint optimization of (η, φ).

Background

The paper proposes stabilizing the simultaneous optimization of variational flow parameters φ and the model‑weights distribution parameters η by bounding the information gain on the categorical distribution over models. Concretely, they define an information‑gain threshold in terms of entropy differences IG(q_η^(t+1) || q_η^(t)) and scale the gradient step for η (e.g., via bisection) to keep the information gain within the threshold, using Monte Carlo estimation and importance sampling to approximate the entropy.
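The control scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes η parameterizes the categorical model distribution through a softmax, takes the information gain to be the absolute entropy difference between successive distributions (the paper defines the threshold in terms of entropy differences), and uses exact entropies where the paper uses Monte Carlo and importance-sampling estimates. All function names are hypothetical.

```python
import numpy as np

def softmax(eta):
    """Categorical model-weights distribution q_eta (assumed softmax parameterization)."""
    z = np.exp(eta - eta.max())
    return z / z.sum()

def entropy(p):
    """Shannon entropy of a categorical distribution (natural log)."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def info_gain(eta_new, eta_old):
    """Hypothetical IG: absolute entropy difference between q_eta^(t+1) and q_eta^(t).
    The paper instead estimates the entropies via Monte Carlo / importance sampling."""
    return abs(entropy(softmax(eta_new)) - entropy(softmax(eta_old)))

def scaled_step(eta, grad, lr, ig_max, iters=30):
    """Scale the gradient step for eta via bisection on a factor s in (0, 1]
    so that IG(q_{eta - s*lr*grad} || q_eta) stays within the threshold ig_max."""
    full = eta - lr * grad
    if info_gain(full, eta) <= ig_max:
        return full  # full step already satisfies the constraint
    lo, hi = 0.0, 1.0  # s = 0 (no update) is always feasible: IG = 0
    for _ in range(iters):
        s = 0.5 * (lo + hi)
        if info_gain(eta - s * lr * grad, eta) <= ig_max:
            lo = s  # feasible: try a larger step
        else:
            hi = s  # infeasible: shrink the step
    return eta - lo * lr * grad
```

The open question in this problem statement is precisely what can be proved about this scaled update, e.g., whether the bisected step size admits stability or convergence guarantees for the joint (η, φ) optimization.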

While empirical results demonstrate the practicality of this control scheme, the authors provide no theoretical analysis of the optimal scaling strategy. Determining rigorous mathematical properties (e.g., stability and convergence guarantees) for the step‑size scaling under the information‑gain constraint remains unresolved and is explicitly left to future research.

References

We show empirical results for controlling this rate and leave any mathematical properties for the optimal scaling to future research.

Amortized variational transdimensional inference (2506.04749 - Davies et al., 5 Jun 2025) in Appendix, Subsection “Controlling the learning rate via the information gain” (apdx:ig_threshold)