Implicit bias of optimization in general neural network architectures

Characterize the implicit bias induced by gradient-based optimization methods (such as gradient descent, stochastic gradient descent, and Adam) on general neural network architectures, determining which solutions are selected among multiple empirical risk minimizers.
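
In symbols (our notation, not the paper's): writing $\mathcal{M} = \operatorname{argmin}_\theta L(\theta)$ for the set of empirical risk minimizers, the task is to determine, for each algorithm, the limit point

$$\theta_\infty \;=\; \lim_{t \to \infty} \theta_t \;\in\; \mathcal{M},$$

as a function of the loss geometry, the update rule and its hyperparameters, and the initialization $\theta_0$.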

Background

The authors distinguish static properties of loss landscapes from dynamic properties of optimization trajectories, noting that practical algorithms often converge to specific solutions among many possible minimizers.

They remark that this algorithm-dependent selection mechanism, known as implicit bias, remains poorly understood for general architectures, motivating the need for a precise characterization of the solutions that gradient-based methods favor in neural network training.
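
One setting where this selection is fully understood is linear regression: gradient descent initialized at zero converges to the minimum $\ell_2$-norm interpolator. Below is a minimal NumPy sketch of that classical fact; the toy dimensions and step size are our illustrative choices, not taken from the paper.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): underdetermined linear
# regression, where every interpolating weight vector is an empirical
# risk minimizer.
rng = np.random.default_rng(0)
n, d = 5, 20                      # fewer samples than parameters
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Full-batch gradient descent on the squared loss, initialized at zero.
w = np.zeros(d)
lr = 0.01
for _ in range(50_000):
    w -= lr * X.T @ (X @ w - y) / n

# The minimum l2-norm interpolator, computed via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

print(np.linalg.norm(X @ w - y))       # ~0: w interpolates the data
print(np.linalg.norm(w - w_min_norm))  # ~0: GD selected the min-norm solution
```

Both printed norms are near zero: gradient descent not only minimizes the empirical risk but, among all interpolators, selects the one closest to the initialization. The open problem asks for analogues of this characterization for general neural network architectures, where the minimizers form a nonlinear set and no comparably complete description is known.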

References

The algorithmic biases induced by optimization methods on general neural network architectures, known as implicit bias, are still largely unknown.

Geometry of Polynomial Neural Networks (2402.00949 - Kubjas et al., 1 Feb 2024) in Section 7 (Optimization), opening paragraphs