Trainability and gradient stability for general multi-layer neural oscillators

Develop a rigorous trainability theory for the multi-layer neural oscillator architecture defined by the second-order neural ODEs $\ddot{y}^{\ell}(t) = \sigma\bigl(w^{\ell} \odot y^{\ell}(t) + V^{\ell} y^{\ell-1}(t) + b^{\ell}\bigr)$ for $\ell = 1,\dots,L$, with input $y^{0}(t) = u(t)$ and affine readout $z(t) = A y^{L}(t) + c$, by extending existing results on the mitigation of exploding and vanishing gradients established for CoRNN, UnICORNN, and GraphCON. In particular, analyze the adjoint system associated with this second-order neural ODE to establish gradient stability and derive bounds on backpropagated gradients during training.
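For concreteness, the following is a minimal sketch (not taken from the paper) of the forward dynamics of this architecture, with $\sigma$ fixed to tanh, zero initial conditions, and a semi-implicit (symplectic) Euler discretization; the function name, shapes, and time step are illustrative assumptions.

```python
import torch

def oscillator_forward(u, w, V, b, A, c, dt=0.01):
    """Simulate ddot{y}^l = sigma(w^l * y^l + V^l y^{l-1} + b^l), y^0 = u.

    u    : (T, d0) input signal sampled on a uniform grid with spacing dt
    w, b : lists of (d_l,) element-wise weights and biases, l = 1..L
    V    : list of (d_l, d_{l-1}) connection matrices
    A, c : affine readout z(t) = A y^L(t) + c
    """
    y = [torch.zeros_like(wl) for wl in w]   # positions  y^l(0) = 0
    v = [torch.zeros_like(wl) for wl in w]   # velocities dy^l/dt(0) = 0
    zs = []
    for t in range(u.shape[0]):
        prev = u[t]                          # y^0(t) = u(t)
        for l in range(len(w)):
            acc = torch.tanh(w[l] * y[l] + V[l] @ prev + b[l])
            v[l] = v[l] + dt * acc           # semi-implicit Euler:
            y[l] = y[l] + dt * v[l]          # velocity first, then position
            prev = y[l]
        zs.append(A @ y[-1] + c)
    return torch.stack(zs)                   # (T, m) readout trajectory
```

The velocity-then-position update is in the spirit of the symplectic discretizations used for oscillator networks such as UnICORNN, which is part of why oscillatory dynamics lend themselves to stable gradients.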

Background

While the paper establishes universality of multi-layer neural oscillators for approximating causal and continuous operators, it does not address trainability or generalization. Prior works have proved gradient stability and mitigation of exploding/vanishing gradients for specific oscillator-based architectures such as CoRNN, UnICORNN, and GraphCON. Extending such analysis to the general second-order neural ODE that defines multi-layer neural oscillators would provide theoretical foundations for training dynamics beyond these special cases.
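As a purely empirical companion to those theoretical results, one could probe gradient magnitudes by backpropagating through the unrolled discretization. The snippet below is an illustration only, reusing the hypothetical oscillator_forward sketch above with a placeholder loss; it shows what such a diagnostic might look like, not the analysis carried out in the cited works.

```python
import torch

torch.manual_seed(0)
L_layers, d0, d, m, T = 2, 3, 8, 1, 400
u = torch.randn(T, d0)
w = [torch.randn(d, requires_grad=True) for _ in range(L_layers)]
V = [(0.1 * torch.randn(d, d0 if l == 0 else d)).requires_grad_()
     for l in range(L_layers)]
b = [torch.zeros(d, requires_grad=True) for _ in range(L_layers)]
A = torch.randn(m, d, requires_grad=True)
c = torch.zeros(m, requires_grad=True)

z = oscillator_forward(u, w, V, b, A, c, dt=0.05)
loss = z.pow(2).mean()   # placeholder loss on the readout trajectory
loss.backward()
for l in range(L_layers):
    print(f"layer {l + 1}: |dL/dV| = {V[l].grad.norm():.3e}, "
          f"|dL/dw| = {w[l].grad.norm():.3e}")
```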

The authors suggest that analyzing the adjoint system of the general second-order neural ODE may be a viable approach to derive gradient stability results, but this analysis is not carried out in the paper and is explicitly deferred to future work.
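To make that suggestion concrete, here is one standard way to set up the adjoint (a sketch under the usual neural-ODE adjoint conventions; the first-order splitting and the symbols $a^\ell$, $\lambda^\ell_y$, $\lambda^\ell_v$ are our notation, not the paper's). Writing $v^\ell = \dot{y}^\ell$ and $a^\ell := w^\ell \odot y^\ell + V^\ell y^{\ell-1} + b^\ell$ turns the system into the first-order form $\dot{y}^\ell = v^\ell$, $\dot{v}^\ell = \sigma(a^\ell)$, and for a terminal loss $\mathcal{L}$ the adjoint variables solve, backwards in time,

```latex
\begin{aligned}
\dot{\lambda}^{\ell}_{y} &= -\, w^{\ell} \odot \sigma'(a^{\ell}) \odot \lambda^{\ell}_{v}
  \;-\; (V^{\ell+1})^{\top}\!\left( \sigma'(a^{\ell+1}) \odot \lambda^{\ell+1}_{v} \right), \\
\dot{\lambda}^{\ell}_{v} &= -\, \lambda^{\ell}_{y},
\qquad \ell = 1, \dots, L,
\end{aligned}
```

with the $(V^{\ell+1})^{\top}$ coupling term absent for $\ell = L$, terminal conditions $\lambda^{L}_{y}(T) = A^{\top} \nabla_z \mathcal{L}$ and zero for all other adjoint variables, and parameter gradients recovered as, e.g., $\nabla_{V^\ell}\mathcal{L} = \int_0^T \bigl(\sigma'(a^\ell) \odot \lambda^\ell_v\bigr)\,(y^{\ell-1})^{\top}\,\mathrm{d}t$. A gradient-stability result would then amount to bounding $\lambda^\ell_v$ along such trajectories, uniformly in the time horizon and depth.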

References

We do mention that trainability of oscillatory systems would profit from the fact that oscillatory dynamics is (gradient) stable, and this formed the basis of the proofs of mitigation of the exploding and vanishing gradient problem for CoRNN and UnICORNN, as well as GraphCON. Extending these results to the general second-order neural ODE defining the multi-layer neural oscillator, for instance through an analysis of the associated adjoint system, is left for future work.

Neural Oscillators are Universal (arXiv:2305.08753, Lanthaler et al., 2023), Discussion (Section 4)