
Gradient descent for computing optimal controls in neural ODEs

Develop detailed, rigorous results on the use of gradient descent to compute optimal controls, i.e., parameter paths (U_s, V_s, b_s) over s ∈ [0,1], for neural ordinary differential equation models that map inputs x^i to outputs y^i via the terminal state x_{s=1}. Such results should include convergence guarantees and a characterization of the controls obtained.
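The object of study can be made concrete with a minimal sketch: a forward-Euler discretization of the controlled dynamics dx/ds = U_s tanh(V_s x + b_s), with the controls (U_k, V_k, b_k) at each time step updated by plain gradient descent via the discrete adjoint (backpropagation through the Euler scheme). The tanh activation, squared terminal loss, single training pair, and all numerical constants are illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 2, 10                        # state dimension, number of time steps
h = 1.0 / K                         # Euler step size on s in [0, 1]

x0 = np.array([1.0, -1.0])          # a single input  x^i (hypothetical data)
y  = np.array([0.5,  2.0])          # its target      y^i (hypothetical data)

# Discretized control: one (U_k, V_k, b_k) per time step.
U = 0.1 * rng.standard_normal((K, d, d))
V = 0.1 * rng.standard_normal((K, d, d))
b = 0.1 * rng.standard_normal((K, d))

def forward(U, V, b):
    """Integrate x_{k+1} = x_k + h U_k tanh(V_k x_k + b_k); return trajectory."""
    xs = [x0]
    for k in range(K):
        z = np.tanh(V[k] @ xs[-1] + b[k])
        xs.append(xs[-1] + h * U[k] @ z)
    return xs

def loss_and_grads(U, V, b):
    """Terminal loss 0.5||x_K - y||^2 and its gradients via the discrete adjoint."""
    xs = forward(U, V, b)
    diff = xs[-1] - y
    L = 0.5 * diff @ diff
    gU, gV, gb = np.zeros_like(U), np.zeros_like(V), np.zeros_like(b)
    p = diff                               # adjoint state p_K = x_K - y
    for k in reversed(range(K)):
        z = np.tanh(V[k] @ xs[k] + b[k])
        gU[k] = h * np.outer(p, z)
        da = (1.0 - z**2) * (h * U[k].T @ p)   # backprop through tanh
        gV[k] = np.outer(da, xs[k])
        gb[k] = da
        p = p + V[k].T @ da                # skip-connection path + dynamics path
    return L, gU, gV, gb

lr, losses = 0.1, []
for _ in range(500):                       # plain gradient descent on the controls
    L, gU, gV, gb = loss_and_grads(U, V, b)
    losses.append(L)
    U -= lr * gU
    V -= lr * gV
    b -= lr * gb

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The open question asks what can be proved about exactly this kind of iteration: whether it converges, at what rate, and which control path it selects among the many that interpolate the data.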


Background

The neural ODE viewpoint treats very deep residual networks as continuous-time controlled dynamical systems, with training formulated as an optimal control problem that interpolates data and labels.
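This viewpoint can be written out as follows (a standard neural ODE parameterization; the activation σ and per-sample loss ℓ are generic placeholders):

```latex
% Controlled dynamics: the residual network in the infinite-depth limit
\frac{\mathrm{d}x_s}{\mathrm{d}s} = U_s\,\sigma\!\left(V_s x_s + b_s\right),
\qquad s \in [0,1], \qquad x_{s=0} = x^i,

% Training as optimal control over the parameter path
\min_{(U_s, V_s, b_s)_{s \in [0,1]}} \; \sum_i \ell\!\left(x^i_{s=1},\, y^i\right).
```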

While control theory provides tools for analyzing such systems, the authors note that the distinctive learning-theoretic goal, computing the control via gradient descent, lacks detailed theoretical results, making this a concrete open direction.

References

However, the specificity of learning theory compared to control theory lies in the goal of computing such a control using gradient descent. To date, no detailed results exist on this.

The Mathematics of Artificial Intelligence (arXiv:2501.10465, Peyré, 15 Jan 2025), Section "Very Deep Networks", Neural Differential Equation paragraph