Global Convergence Rate of Deep Equilibrium Models with General Activations (2302.05797v3)
Published 11 Feb 2023 in stat.ML and cs.LG
Abstract: In a recent paper, Ling et al. investigated the over-parametrized Deep Equilibrium Model (DEQ) with ReLU activation and proved that gradient descent converges to a globally optimal solution of the quadratic loss at a linear convergence rate. This paper shows that the same result still holds for DEQs with any general activation that is bounded and has bounded first and second derivatives. Because such activations are generally non-homogeneous, lower-bounding the least eigenvalue of the Gram matrix at the equilibrium point is particularly challenging. To accomplish this, we construct a novel population Gram matrix and develop a new form of dual activation based on a Hermite polynomial expansion.
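The central object in this analysis is the Gram matrix formed by the DEQ's equilibrium (fixed-point) features, whose least eigenvalue must be bounded away from zero. The following is a minimal numerical sketch, not the paper's construction: it assumes a generic single-layer DEQ of the form z* = σ(W z* + U x), with σ = tanh standing in for a bounded, smooth activation, solves for the equilibrium by fixed-point iteration, and inspects the least eigenvalue of the resulting empirical Gram matrix. All dimensions, scalings, and the activation choice here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's regime):
n, d, m = 8, 5, 256          # samples, input dimension, width

X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs

U = rng.standard_normal((m, d)) / np.sqrt(d)
W = rng.standard_normal((m, m))
W *= 0.4 / np.linalg.norm(W, 2)                 # spectral norm 0.4 < 1, so the
                                                # fixed-point map is a contraction
                                                # (since |tanh'| <= 1)

def equilibrium(x, n_iter=100):
    """Solve z = tanh(W z + U x) by fixed-point iteration."""
    z = np.zeros(m)
    for _ in range(n_iter):
        z = np.tanh(W @ z + U @ x)
    return z

# Empirical Gram matrix of the equilibrium features; the convergence analysis
# hinges on lower-bounding its least eigenvalue.
Z = np.stack([equilibrium(x) for x in X])       # shape (n, m)
G = Z @ Z.T / m
print("least eigenvalue of the equilibrium Gram matrix:",
      np.linalg.eigvalsh(G).min())
```

In the over-parametrized regime studied in the paper, this least eigenvalue is controlled through a population Gram matrix and the Hermite expansion of the activation's dual, rather than by direct numerical inspection as above.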
- On exact computation with an infinitely wide neural net. In Neural Information Processing Systems, 2019.
- Deep equilibrium models. arXiv:1909.01377, 2019.
- Multiscale deep equilibrium models. arXiv:2006.08656, 2020.
- Generalization Performance of Support Vector Machines and Other Pattern Classifiers, pp. 43–54. MIT Press, 1999.
- Boosting the margin: a new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651–1686, 1998.
- F. Biggs and B. Guedj. Differentiable PAC-Bayes objectives with partially aggregated neural networks. Entropy, 23, 2021.
- From average case complexity to improper learning complexity. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing (STOC), 2014.
- Toward deeper understanding of neural networks: The power of initialization and a dual view on expressivity. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016.
- Gradient descent finds global minima of deep neural networks. In International Conference on Machine Learning (ICML), 2019.
- Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. In Uncertainty in Artificial Intelligence (UAI), 2017.
- Generalization error bounds via Rényi-, f-divergences and maximal leakage. IEEE Transactions on Information Theory, 67(8):4986–5004, 2021.
- Gradient descent optimizes infinite-depth ReLU implicit networks with linear widths. arXiv:2205.07463, 2022.
- A global convergence theory for deep ReLU implicit networks via over-parameterization. International Conference on Learning Representations (ICLR), 2022.
- Wide neural networks as Gaussian processes: Lessons from deep equilibrium models. Conference on Neural Information Processing Systems (NeurIPS), 2023.
- Deep equilibrium architectures for inverse problems in imaging. IEEE Transactions on Computational Imaging, 7:1123–1133, 2021.
- Matrix Analysis. Cambridge University Press, 1985.
- Generalization error in deep learning. arXiv:1808.01174, 2018.
- Cryptographic limitations on learning Boolean formulae and finite automata. Journal of the ACM, 1994.
- V. Koltchinskii and D. Panchenko. Empirical Margin Distributions and Bounding the Generalization Error of Combined Classifiers. The Annals of Statistics, 30(1):1–50, 2002.
- ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60:84–90, 2012.
- Learning overparameterized neural networks via stochastic gradient descent on structured data. arXiv:1808.01204, 2018.
- Global convergence of over-parameterized deep equilibrium models. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
- Quynh N. Nguyen. On the proof of global convergence of gradient descent for deep ReLU networks with linear widths. arXiv:2101.09612, 2021.
- G. Szegő. Orthogonal Polynomials. American Mathematical Society, 1959.
- T. Tao. Topics in Random Matrix Theory. American Mathematical Society, 2012.
- Lan V. Truong. Generalization error bounds on deep learning with Markov datasets. Thirty-Sixth Annual Conference on Neural Information Processing Systems (NeurIPS), 2022a.
- Lan V. Truong. On Rademacher complexity-based generalization bounds for deep learning. arXiv:2208.04284, 2022b.
- V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
- Optimization induced equilibrium networks: An explicit optimization perspective for understanding equilibrium models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45:3604–3616, 2022.
- A. Xu and M. Raginsky. Information-theoretic analysis of generalization capability of learning algorithms. In Advances in Neural Information Processing Systems (NIPS), 2017.