Convergence of the generalized learning dynamics
Prove convergence of the generalized TD-style learning dynamics derived via semi-gradient descent on the cost function $C_{F,I}[Z_t]$ in Appendix "Generalized derivation of learning dynamics," showing that the policy iterates $\pi_t(s'|s) \propto p(s'|s)\,\exp(F[Z_t](s'))$ and the endogenous-cue update approach the optimal state value $V^*$ and its mapped concentration $Z^*$ under appropriate conditions (e.g., via a Lyapunov argument).
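As a starting point for numerical exploration, the sketch below sets up a deliberately minimal instance of this problem, not the paper's actual construction: a linearly solvable (Todorov-style) first-exit chain MDP in which the cue and value are related by the assumed maps $F = \log$ and $I = \exp$, so that $V_t = F[Z_t]$ and the soft policy takes exactly the form $\pi_t(s'|s) \propto p(s'|s)\exp(F[Z_t](s'))$. The stochastic rule used is a Z-learning-style TD update, standing in for the semi-gradient dynamics on $C_{F,I}[Z_t]$, and the monitored quantity $L_t = \lVert F[Z_t] - V^* \rVert_\infty$ is one natural Lyapunov candidate: a proof would need to show that $L_t$ (or a related functional) is non-increasing and vanishes only at the mapped fixed point $Z^*$. All concrete choices here (the chain, the rewards `r`, the learning rate `alpha`) are hypothetical illustrations.

```python
import numpy as np

# Toy first-exit chain MDP (hypothetical; not taken from the paper).
# States 0..N-1; state N-1 is absorbing. Passive dynamics p(s'|s) is an
# unbiased random walk, reflecting at s = 0.
N = 10
r = -0.1 * np.ones(N)  # running reward (negative cost) at interior states
r[-1] = 0.0            # exit reward at the absorbing state

P = np.zeros((N, N))   # passive transition matrix p(s'|s)
for s in range(N - 1):
    P[s, max(s - 1, 0)] += 0.5
    P[s, s + 1] += 0.5
P[-1, -1] = 1.0

# Exact mapped fixed point: the desirability z*(s) = exp(V*(s)) solves the
# linear Bellman equation z* = exp(r) * (P @ z*) with the boundary pinned.
z_star = np.ones(N)
for _ in range(10_000):
    z_new = np.exp(r) * (P @ z_star)
    z_new[-1] = np.exp(r[-1])
    done = np.max(np.abs(z_new - z_star)) < 1e-12
    z_star = z_new
    if done:
        break
V_star = np.log(z_star)

# TD-style stochastic update of the cue (Z-learning-like), assuming
# F = log and I = exp, so V_t = F[Z_t] = log Z_t.
rng = np.random.default_rng(0)
Z = np.ones(N)                    # initial endogenous-cue estimate
alpha = 0.1                       # learning rate (hypothetical)
lyap = []                         # L_t = ||F[Z_t] - V*||_inf

for episode in range(3000):
    s = rng.integers(0, N - 1)            # random interior start state
    while s != N - 1:
        s_next = rng.choice(N, p=P[s])    # sample the passive dynamics
        # Semi-gradient/TD-style step toward exp(r(s)) * Z(s'):
        Z[s] += alpha * (np.exp(r[s]) * Z[s_next] - Z[s])
        s = s_next
    Z[-1] = np.exp(r[-1])                 # keep the boundary condition
    lyap.append(np.max(np.abs(np.log(Z) - V_star)))

# Induced soft policy pi_t(s'|s) ∝ p(s'|s) exp(F[Z_t](s')) = p(s'|s) Z_t(s')
pi = P * Z[None, :]
pi /= pi.sum(axis=1, keepdims=True)

print(f"L_0 = {lyap[0]:.4f}, L_final = {lyap[-1]:.4f}")  # L_t should shrink
```

Under these assumptions $L_t$ should trend toward zero (up to a noise floor set by the constant step size), which is consistent with, but does not prove, the conjectured convergence; the sketch samples the passive dynamics $p(s'|s)$ as in standard Z-learning, whereas the paper's dynamics involve the soft policy $\pi_t$ itself, so an on-policy variant would need importance weighting.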
References
While we do not formally prove convergence of the generalized algorithms, we anticipate that they decrease the deviation between $V_t$ and $V^*$ provided a suitable Lyapunov function exists that vanishes only at the mapped fixed point.
Kato et al., "Optimality theory of stigmergic collective information processing by chemotactic cells," arXiv:2407.15298 (21 Jul 2024), Appendix, Section "Generalized derivation of learning dynamics."