- The paper revisits RBP by introducing two novel variants, CG-RBP and Neumann-RBP, to address longstanding stability issues.
- The paper demonstrates that Neumann-RBP attains performance competitive with BPTT and TBPTT while requiring only constant memory.
- The paper validates its methods through experiments in associative memory, document classification, and hyperparameter optimization.
Reviving and Improving Recurrent Back-Propagation
The paper "Reviving and Improving Recurrent Back-Propagation" revisits the Recurrent Back-Propagation (RBP) algorithm, situating it within the modern deep learning landscape. The authors address the algorithm's instability issues and propose two novel variants: one grounded in conjugate gradients (CG-RBP) and another utilizing Neumann series (Neumann-RBP). Through these innovations, the paper provides insights into gradient approximation for recurrent neural networks (RNNs) and draws comparisons with Back Propagation Through Time (BPTT) and its truncated version (TBPTT).
Key Contributions
The authors' primary contributions lie in the re-examination and enhancement of RBP, offering solutions to the algorithm's long-standing stability challenges. They highlight two novel approaches:
- CG-RBP: This variant applies the conjugate gradient method to the normal equations of the linear system inherent in standard RBP, stabilizing and refining its solution.
- Neumann-RBP: This variant approximates the same solution with a truncated Neumann series. It matches the time complexity of TBPTT while requiring only constant memory, in contrast to TBPTT's memory cost, which grows linearly with the number of truncation steps. (Both variants are sketched in the formulation after this list.)
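For context, both variants target the same quantity: the exact gradient at a converged (steady) state. The formulation below is a standard paraphrase of the fixed-point argument behind RBP rather than the paper's exact notation. With steady state $h^* = F(h^*, w, x)$ and Jacobian $J = \partial F / \partial h$ evaluated at $h^*$,

$$
\frac{\partial L}{\partial w} \;=\; \frac{\partial L}{\partial h^{*}}\,\bigl(I - J\bigr)^{-1}\,\frac{\partial F}{\partial w},
\qquad
\bigl(I - J\bigr)^{-1} \;\approx\; \sum_{k=0}^{K} J^{k}.
$$

CG-RBP solves the linear system implied by the inverse (via conjugate gradient on the normal equations), whereas Neumann-RBP replaces the inverse with the K-term truncated series on the right.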
Moreover, the paper elucidates the relationship between Neumann-RBP and TBPTT, contributing to both theoretical understanding and practical implementation. In doing so, it bridges an existing gap: it offers an efficient, practical alternative to BPTT and TBPTT that avoids their memory costs while sidestepping the stability problems that afflicted the original RBP.
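As a concrete illustration, below is a minimal sketch of the Neumann-RBP gradient computed with repeated vector-Jacobian products in a PyTorch-style autograd. The function and argument names (`neumann_rbp_grad`, `F`, `h_star`, `loss_grad_h`, `K`) are placeholders rather than the authors' implementation; `K` plays the same role as the truncation length in TBPTT.

```python
import torch

def neumann_rbp_grad(F, h_star, params, loss_grad_h, K):
    """K-term Neumann-RBP approximation of dL/dparams at a converged state.

    F           -- update map h -> F(h), with the parameters `params` used inside
    h_star      -- converged hidden state h* (a fixed point of F)
    params      -- iterable of parameter tensors of F (requires_grad=True)
    loss_grad_h -- dL/dh evaluated at h*
    K           -- number of Neumann terms (analogous to the TBPTT truncation length)
    """
    h = h_star.detach().requires_grad_(True)
    f = F(h)  # one application of the update map at the fixed point

    # Accumulate g = sum_{k=0}^{K} (dL/dh*) J^k with vector-Jacobian products;
    # only the running vector is kept, so memory does not grow with K.
    v = loss_grad_h
    g = loss_grad_h.clone()
    for _ in range(K):
        v = torch.autograd.grad(f, h, grad_outputs=v, retain_graph=True)[0]
        g = g + v

    # Push the accumulated vector through one step of F into the parameters:
    # dL/dw ≈ g * dF/dw.
    return torch.autograd.grad(f, tuple(params), grad_outputs=g)
```

Because the loop carries only a single vector between iterations, the memory footprint is constant in K, which is the property the paper contrasts with TBPTT's linear growth.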
Experimental Validation
The efficacy of the proposed methods is demonstrated through various experiments across distinct application domains:
- Associative Memory: On continuous Hopfield networks, Neumann-RBP trains more stably and performs better than BPTT and the original RBP, with gains particularly visible in retrieval accuracy and in the smoothness of training and validation curves.
- Document Classification: On citation networks such as Cora and Pubmed, GNNs trained with Neumann-RBP outperform the same models trained with BPTT and TBPTT, with substantial gains in computational efficiency and predictive accuracy.
- Hyperparameter Optimization: Neumann-RBP tunes hyperparameters of fully connected networks effectively, giving favorable results relative to traditional techniques and significant reductions in both runtime and memory compared with standard BPTT (a rough sketch of this setup follows the list).
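To make the hyperparameter experiment concrete, here is a rough sketch of how the same K-term approximation yields a hypergradient when one gradient-descent step on the training loss is treated as the fixed-point map whose steady state is the trained weights. The names (`inner_loss`, `val_loss`, `lam`, `lr`) are hypothetical and the inner update is deliberately simplified; the paper's actual experimental setup may differ.

```python
import torch

def hypergradient(w_star, lam, inner_loss, val_loss, lr=0.1, K=10):
    """Approximate d val_loss / d lam with a K-term Neumann series, treating
    one gradient-descent step on the training loss as the fixed-point map F.

    w_star -- trained weights (approximate minimizer of inner_loss)
    lam    -- hyperparameter tensor with requires_grad=True
    """
    w = w_star.detach().requires_grad_(True)

    # F(w) = w - lr * d inner_loss(w, lam) / dw; w* is (roughly) a fixed point of F.
    g_inner = torch.autograd.grad(inner_loss(w, lam), w, create_graph=True)[0]
    f = w - lr * g_inner

    # dL_val/dw at w*, then accumulate sum_k v J^k exactly as in Neumann-RBP.
    v = torch.autograd.grad(val_loss(w), w)[0]
    acc = v.clone()
    for _ in range(K):
        v = torch.autograd.grad(f, w, grad_outputs=v, retain_graph=True)[0]
        acc = acc + v

    # Back-propagate the accumulated vector through the lam-dependence of one step of F.
    return torch.autograd.grad(f, lam, grad_outputs=acc)[0]
```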
Implications and Future Work
The advancements presented in the paper carry significant implications for the optimization of recurrent structures in deep learning models. The proposed variants introduce new pathways for stabilizing training, particularly in applications that involve long sequences or complex dynamical systems. Neumann-RBP's constant memory requirement also opens avenues for deploying complex RNNs in resource-constrained environments.
Moving forward, potential areas of exploration include scaling the methods to larger deep neural networks and further investigating their behavior across different neural architectures and application domains. Hybrid models that combine the strengths of multiple variants might provide additional insights and improvements.
Through careful reconsideration and enhancement of RBP, this paper contributes meaningfully to the ongoing discourse on efficient and stable training methodologies for RNNs, setting a foundation for future empirical and theoretical advancements in the field.