Orthogonal Gated Recurrent Unit with Neumann-Cayley Transformation (2208.06496v1)

Published 12 Aug 2022 in cs.LG

Abstract: In recent years, using orthogonal matrices has been shown to be a promising approach to improving the training, stability, and convergence of Recurrent Neural Networks (RNNs), particularly for controlling gradients. While the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) architectures address the vanishing gradient problem with a variety of gates and memory cells, they remain prone to the exploding gradient problem. In this work, we analyze the gradients in the GRU and propose using orthogonal matrices to prevent the exploding gradient problem and enhance long-term memory. We study where orthogonal matrices should be used, and we propose a Neumann series-based Scaled Cayley transformation for training orthogonal matrices in the GRU, which we call the Neumann-Cayley Orthogonal GRU, or simply NC-GRU. We present detailed experiments on several synthetic and real-world tasks, which show that NC-GRU significantly outperforms the GRU as well as several other RNNs.
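
The core idea named in the abstract, a scaled Cayley transform W = (I + A)^{-1}(I - A)D with A skew-symmetric and D a ±1 diagonal matrix, whose matrix inverse is approximated by a truncated Neumann series, can be sketched in a few lines. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the function name `neumann_cayley`, the number of series terms, and the use of NumPy are choices made here for demonstration.

```python
import numpy as np

def neumann_cayley(A_raw, d_signs, terms=6):
    """Sketch of a Neumann-series Scaled Cayley transform.

    Builds an approximately orthogonal matrix
        W ~ (I + A)^{-1} (I - A) D
    where A = A_raw - A_raw^T is skew-symmetric and D = diag(d_signs)
    has entries in {+1, -1}. The inverse is replaced by the truncated
    Neumann series (I + A)^{-1} ~ sum_{k=0}^{terms} (-A)^k, which
    converges when the spectral norm of A is below 1.
    """
    n = A_raw.shape[0]
    A = A_raw - A_raw.T            # skew-symmetric part of the raw parameters
    I = np.eye(n)
    inv_approx = I.copy()          # k = 0 term of the series
    term = I
    for _ in range(terms):
        term = -term @ A           # next (-A)^k term
        inv_approx = inv_approx + term
    return inv_approx @ (I - A) @ np.diag(d_signs)

# Usage: small random parameters keep the norm of A below 1.
rng = np.random.default_rng(0)
A_raw = 0.05 * rng.standard_normal((8, 8))
d_signs = rng.choice([-1.0, 1.0], size=8)
W = neumann_cayley(A_raw, d_signs)
print(np.max(np.abs(W @ W.T - np.eye(8))))  # near 0: W is close to orthogonal
```

Because the inverse is replaced by a few matrix products, the transform stays cheap and differentiable, and the truncation error shrinks geometrically with the spectral norm of A.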

Authors (5)
  1. Edison Mucllari
  2. Vasily Zadorozhnyy
  3. Cole Pospisil
  4. Duc Nguyen
  5. Qiang Ye
Citations (2)
