
Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group (1901.08428v3)

Published 24 Jan 2019 in cs.LG and stat.ML

Abstract: We introduce a novel approach to perform first-order optimization with orthogonal and unitary constraints. This approach is based on a parametrization stemming from Lie group theory through the exponential map. The parametrization transforms the constrained optimization problem into an unconstrained one over a Euclidean space, for which common first-order optimization methods can be used. The theoretical results presented are general enough to cover the special orthogonal group, the unitary group and, in general, any connected compact Lie group. We discuss how this and other parametrizations can be computed efficiently through an implementation trick, making numerically complex parametrizations usable at a negligible runtime cost in neural networks. In particular, we apply our results to RNNs with orthogonal recurrent weights, yielding a new architecture called expRNN. We demonstrate how our method constitutes a more robust approach to optimization with orthogonal constraints, showing faster, accurate, and more stable convergence in several tasks designed to test RNNs.

Citations (181)

Summary

  • The paper introduces a simple parametrization using the exponential map to transform constrained optimization problems in neural networks with orthogonal and unitary constraints into unconstrained problems, enabling the use of standard methods.
  • A key theoretical finding is that this exponential parametrization avoids introducing unwanted local minima or saddle points, leading to more stable and robust convergence in tasks like training orthogonal RNNs.
  • Empirical results show the method achieves faster convergence and better performance on standard RNN tasks than previous techniques, improving neural network optimization strategies.

Overview of "Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group"

This paper discusses an innovative approach to first-order optimization in neural networks with orthogonal and unitary constraints. The authors, Mario Lezcano-Casado and David Martinez-Rubio from the University of Oxford, leverage Lie group theory, particularly the exponential map, to transform constrained optimization problems into unconstrained ones over Euclidean space, facilitating the use of common optimization methods like gradient descent.

Parametrization and Theoretical Insights

The paper introduces a parametrization based on the exponential map that transforms the constrained optimization problem in neural networks into a manageable unconstrained one. This approach applies to any connected compact Lie group, including the special orthogonal group $\mathrm{SO}(n)$ and the unitary group $\mathrm{U}(n)$. The authors also show how the exponential map can be computed efficiently inside a neural network, so the numerically involved parametrization adds only a negligible runtime cost.
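To make the idea concrete, here is a minimal PyTorch sketch of the parametrization (not the authors' implementation; names are illustrative): an unconstrained matrix `a` is skew-symmetrized, which places it in the Lie algebra $\mathfrak{so}(n)$, and the matrix exponential then yields an element of $\mathrm{SO}(n)$, so a plain optimizer on `a` performs the constrained update implicitly.

```python
import torch

def orthogonal_from_unconstrained(a: torch.Tensor) -> torch.Tensor:
    """Map an unconstrained square matrix to SO(n): skew-symmetrize, then take
    the matrix exponential. Gradients flow through matrix_exp, so any
    first-order optimizer on `a` respects the orthogonality constraint."""
    skew = a - a.transpose(-2, -1)           # element of the Lie algebra so(n)
    return torch.linalg.matrix_exp(skew)     # element of the Lie group SO(n)

# Toy usage: recover a random orthogonal target with vanilla SGD on `a`.
n = 4
b = torch.randn(n, n)
target = torch.linalg.matrix_exp(b - b.T)    # a random element of SO(n)
a = torch.zeros(n, n, requires_grad=True)    # unconstrained Euclidean parameter
opt = torch.optim.SGD([a], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    loss = ((orthogonal_from_unconstrained(a) - target) ** 2).sum()
    loss.backward()
    opt.step()
```

Every iterate of `orthogonal_from_unconstrained(a)` is orthogonal by construction, which is exactly what makes the constrained problem amenable to standard first-order methods.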

A key theoretical result is that the exponential parametrization does not introduce additional minima or saddle points, unlike other methods such as those proposed by Helfrich et al. (2018) and Maduranga et al. (2018). This property leads to more stable and robust convergence when training recurrent neural networks (RNNs) with orthogonal recurrent weights.
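In symbols (notation ours), the reparametrization replaces the constrained problem with an unconstrained one over the Lie algebra of skew-symmetric matrices:

$$
\min_{B \in \mathrm{SO}(n)} f(B) \;=\; \min_{A \in \mathfrak{so}(n)} f\big(\exp(A)\big),
\qquad
\mathfrak{so}(n) = \{A \in \mathbb{R}^{n \times n} : A^{\top} = -A\},
$$

and, because the exponential map of a connected compact Lie group is surjective, every feasible matrix is reachable, so the parametrization excludes no solutions.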

Implementation and Numerical Results

The paper details how to implement this parametrization efficiently, notably by approximating the matrix exponential with Padé approximants combined with the scaling-and-squaring trick to maintain numerical precision. The authors provide empirical evidence for their method on standard RNN benchmarks: the copying memory task, pixel-permuted MNIST, and the TIMIT speech dataset. The proposed architecture, termed expRNN, exhibits faster convergence and improved performance compared to other approaches, including unitary RNN variants (uRNN) and Cayley transform-based parametrizations (scoRNN).
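As a rough illustration of the scaling-and-squaring idea, the sketch below uses a truncated Taylor series where the paper and production implementations use Padé approximants; the function name and tolerance choices are ours, not the authors'.

```python
import torch

def expm_scaling_squaring(a: torch.Tensor, order: int = 10) -> torch.Tensor:
    """Illustrative matrix exponential: scale A down until the series converges
    quickly, sum a truncated Taylor series, then square the result back up,
    using exp(A) = (exp(A / 2^k))^(2^k)."""
    n = a.shape[-1]
    norm = torch.linalg.matrix_norm(a, ord=1)
    # Choose k so that the scaled matrix has 1-norm below 0.5.
    k = max(0, int(torch.ceil(torch.log2(norm / 0.5)).item())) if norm > 0 else 0
    a_scaled = a / (2 ** k)
    result = torch.eye(n, dtype=a.dtype, device=a.device)
    term = torch.eye(n, dtype=a.dtype, device=a.device)
    for i in range(1, order + 1):
        term = term @ a_scaled / i           # accumulates A^i / i!
        result = result + term
    for _ in range(k):                       # undo the scaling by repeated squaring
        result = result @ result
    return result
```

A quick sanity check: for a skew-symmetric input `s`, the result should be orthogonal, i.e. `expm_scaling_squaring(s) @ expm_scaling_squaring(s).T` should be close to the identity.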

Broader Implications and Future Directions

This research enhances the landscape of optimization techniques for neural networks, particularly for handling orthogonal constraints. The parametrization can potentially be applied to other architectures, including deep feedforward networks, where orthogonality might act as an implicit regularizer and improve generalization.

Future work could explore coupling the exponential parametrization with LSTM or GRU networks to exploit orthogonal constraints further. Additionally, extending these methods to homogeneous Riemannian manifolds like the Stiefel manifold could open up new avenues in optimizing neural networks with complex architectures.

This paper's contribution lies in providing a theoretically grounded, computationally efficient method for imposing orthogonal constraints, paving the way for enhanced neural network optimization strategies. The algorithm and code are available at the authors' GitHub repository, making these tools accessible for further research and development in the field.