Trainability and gradient stability for general multi-layer neural oscillators
Develop a rigorous trainability theory for the multi-layer neural oscillator architecture defined by the second-order neural ODEs ÿ^ℓ(t) = σ(w^ℓ ⊙ y^ℓ(t) + V^ℓ y^{ℓ−1}(t) + b^ℓ) for ℓ = 1,…,L, with y^0(t) = u(t) and affine readout z(t) = A y^L(t) + c, by extending existing results on the mitigation of exploding and vanishing gradients established for CoRNN, UnICORNN, and GraphCON. In particular, analyze the adjoint system associated with this second-order neural ODE to establish gradient stability and derive bounds on backpropagated gradients during training.
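To make the architecture concrete, the following minimal sketch integrates the forward dynamics. The helper name simulate_oscillator, the tanh activation, zero initial conditions, the symplectic-Euler time stepping, and the sequential layer update within each step are illustrative assumptions, not fixed by the problem statement.

```python
import numpy as np

def simulate_oscillator(u, params, A, c, dt=0.01, sigma=np.tanh):
    """Integrate ÿ^ℓ = σ(w^ℓ ⊙ y^ℓ + V^ℓ y^{ℓ−1} + b^ℓ), ℓ = 1,…,L,
    with y^0(t) = u(t), and return the readout z(t) = A y^L(t) + c.

    u      : (T, d_0) array, the input u(t) sampled on a grid with step dt
    params : list of L tuples (w_ℓ, V_ℓ, b_ℓ), V_ℓ of shape (d_ℓ, d_{ℓ−1})
    """
    y = [np.zeros(V.shape[0]) for (_, V, _) in params]  # positions y^ℓ
    v = [np.zeros(V.shape[0]) for (_, V, _) in params]  # velocities ẏ^ℓ
    zs = []
    for t in range(u.shape[0]):
        prev = u[t]                               # y^0(t) = u(t)
        for l, (w, V, b) in enumerate(params):
            acc = sigma(w * y[l] + V @ prev + b)  # right-hand side ÿ^ℓ
            v[l] = v[l] + dt * acc                # symplectic-Euler update
            y[l] = y[l] + dt * v[l]
            prev = y[l]                           # feed layer ℓ into layer ℓ+1
        zs.append(A @ y[-1] + c)                  # affine readout
    return np.array(zs)

# Example: a 2-layer oscillator driven by a scalar sinusoid.
rng = np.random.default_rng(0)
T, d0, d1, d2 = 500, 1, 8, 8
u = np.sin(np.linspace(0.0, 10.0, T))[:, None]
params = [
    (rng.normal(size=d1), rng.normal(size=(d1, d0)) / d0**0.5, np.zeros(d1)),
    (rng.normal(size=d2), rng.normal(size=(d2, d1)) / d1**0.5, np.zeros(d2)),
]
A, c = rng.normal(size=(1, d2)) / d2**0.5, np.zeros(1)
z = simulate_oscillator(u, params, A, c)
print(z.shape)  # (500, 1)
```

The symplectic-Euler step is chosen here only because it preserves the oscillatory character of the dynamics in long rollouts; any consistent integrator would serve for illustration.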
We note that the trainability of oscillatory systems benefits from the fact that oscillatory dynamics are (gradient) stable; this stability formed the basis of the proofs of mitigation of the exploding and vanishing gradient problem for CoRNN, UnICORNN, and GraphCON. Extending these results to the general second-order neural ODE above, for instance through an analysis of the associated adjoint system, is left for future work.
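As a concrete starting point for such an analysis, one can pass to the first-order form ẏ^ℓ = v^ℓ, v̇^ℓ = σ(w^ℓ ⊙ y^ℓ + V^ℓ y^{ℓ−1} + b^ℓ) and write down the standard adjoint ODE ȧ = −(∂F/∂x)^⊤ a. The sketch below records the resulting costate system; the costates (a_y^ℓ, a_v^ℓ) and the shorthand D^ℓ are notation introduced here for illustration, not taken from the sources above.

```latex
% Sketch only: adjoint of the first-order reformulation, with the
% convention that the cross-layer term vanishes for \ell = L.
\[
  D^{\ell} := \operatorname{diag}\!\Big(\sigma'\big(w^{\ell}\odot y^{\ell}(t)
      + V^{\ell} y^{\ell-1}(t) + b^{\ell}\big)\Big),
\qquad
  \begin{aligned}
    \dot{a}_y^{\ell} &= -\operatorname{diag}(w^{\ell})\, D^{\ell}\, a_v^{\ell}
      - \big(V^{\ell+1}\big)^{\!\top} D^{\ell+1}\, a_v^{\ell+1}, \\
    \dot{a}_v^{\ell} &= -\,a_y^{\ell}.
  \end{aligned}
\]
% Eliminating a_y^\ell yields a second-order linear equation for a_v^\ell:
\[
  \ddot{a}_v^{\ell} = \operatorname{diag}(w^{\ell})\, D^{\ell}\, a_v^{\ell}
      + \big(V^{\ell+1}\big)^{\!\top} D^{\ell+1}\, a_v^{\ell+1}.
\]
```

In particular, the adjoint is itself a forced second-order linear system and thus inherits the oscillatory structure of the forward dynamics; this is the kind of structure one would hope to exploit for gradient bounds, in the spirit of the CoRNN and UnICORNN proofs.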