
Generalized BackPropagation, Étude De Cas: Orthogonality (1611.05927v1)

Published 17 Nov 2016 in cs.CV

Abstract: This paper introduces an extension of the backpropagation algorithm that enables us to have layers with constrained weights in a deep network. In particular, we make use of the Riemannian geometry and optimization techniques on matrix manifolds to step outside of normal practice in training deep networks, equipping the network with structures such as orthogonality or positive definiteness. Based on our development, we make another contribution by introducing the Stiefel layer, a layer with orthogonal weights. Among various applications, Stiefel layers can be used to design orthogonal filter banks, perform dimensionality reduction and feature extraction. We demonstrate the benefits of having orthogonality in deep networks through a broad set of experiments, ranging from unsupervised feature learning to fine-grained image classification.

Citations (61)

Summary

  • The paper introduces the Generalized BackPropagation (gBP) algorithm, which leverages Riemannian optimization to train deep neural networks while maintaining weight constraints like orthogonality.
  • It proposes novel Stiefel layers with orthogonal weights and empirically demonstrates their advantages in tasks like unsupervised feature learning and image classification.
  • The use of Stiefel layers is also shown to simplify deep networks through effective low-rank approximations, improving both efficiency and classification accuracy.

Generalized BackPropagation and Orthogonality: Introducing Stiefel Layers

The paper, authored by Mehrtash Harandi and Basura Fernando, presents a significant extension of the traditional backpropagation algorithm: Generalized BackPropagation (gBP). gBP makes it possible to train layers with constrained weights inside a deep network, with a particular focus on orthogonality and positive-definiteness constraints. By applying Riemannian geometry and optimization techniques on matrix manifolds, the authors propose a novel approach to training deep networks that departs from conventional Euclidean methods.

Core Contributions

The primary contributions of the paper include:

  1. Generalized BackPropagation Algorithm: The introduction of gBP, which leverages Riemannian optimization techniques to maintain structural properties, allowing for the inclusion of constraints such as orthogonality within network layers.
  2. Stiefel Layers: The development of Stiefel layers, a type of network layer characterized by orthogonal weights. These layers can be used to design orthogonal filter banks and to perform dimensionality reduction and unsupervised feature extraction (a code sketch of such a layer follows this list).
  3. Empirical Validation: Extensive experiments demonstrate the advantages of incorporating orthogonality into deep networks across multiple tasks, including unsupervised feature learning on face images and fine-grained image classification on datasets such as CIFAR10, CIFAR100, and STL.
  4. Network Simplification: The paper also proposes using Stiefel layers for low-rank approximations to simplify deep networks by reducing the number of parameters, showcasing its effectiveness in terms of computational efficiency and improved classification accuracy.
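
The paper ships no reference implementation, so the following PyTorch-style sketch only illustrates what a Stiefel layer could look like: orthogonal initialization places the weight on the manifold at the start, and a QR-based re-orthonormalization stands in for the retraction applied after each gradient step. The class and method names (StiefelLinear, retract) are illustrative, not the authors' API.

```python
import torch
import torch.nn as nn


class StiefelLinear(nn.Module):
    """Linear layer whose weight lives on the Stiefel manifold
    (orthonormal columns: W^T W = I). Hypothetical sketch, not the
    authors' implementation."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        assert out_features <= in_features, "need a tall weight matrix"
        w = torch.empty(in_features, out_features)
        nn.init.orthogonal_(w)  # start on the manifold
        self.weight = nn.Parameter(w)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features) -> (batch, out_features)
        return x @ self.weight

    @torch.no_grad()
    def retract(self) -> None:
        # QR retraction: map the (possibly drifted) weight back onto
        # the manifold after an optimizer step; the sign fix makes
        # the decomposition deterministic.
        q, r = torch.linalg.qr(self.weight)
        self.weight.copy_(q * torch.sign(torch.diagonal(r)))
```

In a training loop, one would call retract() after every optimizer step (or use the projected update sketched in the next section) so that the orthogonality constraint holds throughout training.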

Technical Overview

The gBP algorithm addresses a key limitation of traditional backpropagation: a plain Euclidean gradient step on a weight matrix does not preserve constraints such as orthogonality. To resolve this, gBP projects the Euclidean gradient onto the tangent space of the constraint manifold and maps the resulting update back onto the manifold, so that the structural constraints are maintained throughout optimization.
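
For orthogonality, the relevant constraint set is the Stiefel manifold (matrices with orthonormal columns), and the tangent-space projection has a short closed form. The sketch below uses the projection associated with the embedded Euclidean metric, xi = G - W sym(WᵀG) with sym(A) = (A + Aᵀ)/2; this is one standard choice, and the exact metric and formulas used in the paper should be taken from the paper itself.

```python
import torch


def stiefel_tangent_projection(w: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    """Project a Euclidean gradient onto the tangent space of the
    Stiefel manifold at w (w has orthonormal columns), using the
    embedded-metric formula: xi = grad - w @ sym(w^T @ grad)."""
    wtg = w.T @ grad
    return grad - w @ (0.5 * (wtg + wtg.T))
```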

The Stiefel manifold, the set of n x p matrices W with orthonormal columns (WᵀW = I), serves as the mathematical framework for gBP in the orthogonal case. The paper details how gBP trains networks whose weights are constrained to this manifold, combining Riemannian gradient descent with retractions that map each update back onto the manifold.
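
Combining the projection above with a retraction yields one gBP-style update step. The QR retraction below is a common, inexpensive choice and not necessarily the one chosen in the paper; the sketch reuses stiefel_tangent_projection from the previous snippet.

```python
def stiefel_sgd_step(w: torch.Tensor, grad: torch.Tensor, lr: float) -> torch.Tensor:
    """One Riemannian SGD step on the Stiefel manifold: project the
    gradient, move in the tangent space, then retract back via QR."""
    xi = stiefel_tangent_projection(w, grad)   # Riemannian gradient
    q, r = torch.linalg.qr(w - lr * xi)        # step, then QR retraction
    return q * torch.sign(torch.diagonal(r))   # sign-fixed, back on manifold
```

After such a step, w.T @ w equals the identity up to numerical error, so orthogonality is preserved exactly rather than approximately enforced by a penalty.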

The concept of momentum in gBP is extended using Riemannian connections: a momentum vector computed at one iterate lives in that iterate's tangent space, so it must be transported to the tangent space of the next iterate before it can be reused. With this coherent transport of tangent vectors across tangent spaces, gBP retains momentum behavior analogous to Euclidean gradient descent with momentum.
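
A cheap and widely used vector transport is simple re-projection onto the new tangent space; the sketch below uses it only to show the structure of a momentum update (the paper's transport comes from the Riemannian connection, so treat the specific transport here as an assumption). It reuses stiefel_tangent_projection and the QR retraction from the snippets above.

```python
def stiefel_momentum_step(w, grad, momentum, lr=0.01, beta=0.9):
    """Riemannian SGD with momentum (sketch): transport the previous
    momentum to the tangent space at w by re-projection, blend it
    with the new Riemannian gradient, step, and retract."""
    xi = stiefel_tangent_projection(w, grad)
    m = beta * stiefel_tangent_projection(w, momentum) + xi  # transport + blend
    q, r = torch.linalg.qr(w - lr * m)         # QR retraction
    return q * torch.sign(torch.diagonal(r)), m
```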

Implications and Future Directions

The introduction of gBP and Stiefel layers suggests novel pathways for enhancing deep learning architectures, especially in constrained settings. These include potential improvements in generalization through a better-constrained parameter space, offering insight into the use of orthogonal structures within neural architectures.

Moreover, the paper opens new avenues for exploring other structural constraints, such as positive definiteness and subspace constraints, which could further diversify the applications and robustness of deep learning models. Future research could investigate such structures within both supervised and unsupervised learning paradigms, potentially leading to richer feature representations and reduced model complexity.

In conclusion, the paper contributes a substantive advance in deep learning through generalized optimization techniques, presenting orthogonality as a valuable property for network layers, and setting the stage for subsequent exploration into manifold-constrained neural network architectures.
