BP(λ): Online Learning via Synthetic Gradients (2401.07044v1)

Published 13 Jan 2024 in cs.LG

Abstract: Training recurrent neural networks typically relies on backpropagation through time (BPTT). BPTT requires both the forward and backward passes to be completed before loss gradients are available, leaving the network locked to these computations. Recently, Jaderberg et al. proposed synthetic gradients to alleviate the need for full BPTT. In their implementation, synthetic gradients are learned through a mixture of backpropagated gradients and bootstrapped synthetic gradients, analogous to the temporal difference (TD) algorithm in Reinforcement Learning (RL). However, as in TD learning, heavy use of bootstrapping can result in bias, which leads to poor synthetic gradient estimates. Inspired by the accumulate $\mathrm{TD}(\lambda)$ algorithm in RL, we propose a fully online method for learning synthetic gradients that avoids the use of BPTT altogether: accumulate $\mathrm{BP}(\lambda)$. As in accumulate $\mathrm{TD}(\lambda)$, we show analytically that accumulate $\mathrm{BP}(\lambda)$ can control the level of bias by using a mixture of temporal difference errors and recursively defined eligibility traces. We next demonstrate empirically that our model outperforms the original implementation for learning synthetic gradients in a variety of tasks, and is particularly suited for capturing longer timescales. Finally, building on recent work, we reflect on accumulate $\mathrm{BP}(\lambda)$ as a principle for learning in biological circuits. In summary, inspired by RL principles we introduce an algorithm capable of bias-free online learning via synthetic gradients.
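
By analogy with accumulate $\mathrm{TD}(\lambda)$, the learning rule described in the abstract can be pictured as a fully online update that combines a TD-style error on the predicted gradients with an accumulating eligibility trace. The NumPy sketch below is not the authors' implementation: the linear form g(h) = W h, the variable names (W, e, lam, alpha), and the optional discount factor gamma are illustrative assumptions, and the paper's exact error and trace definitions may differ.

```python
import numpy as np

def bp_lambda_step(W, e, h_t, h_next, dL_dh_t, J_next, lam, alpha, gamma=1.0):
    """One fully online, accumulate TD(lambda)-style update of a hypothetical
    linear synthetic-gradient model g(h) = W @ h attached to an RNN hidden state.

    W        -- (d, d) weights of the synthetic-gradient model (assumed linear)
    e        -- (d,) accumulating eligibility trace over visited hidden states
    h_t      -- (d,) RNN hidden state at time t
    h_next   -- (d,) RNN hidden state at time t+1
    dL_dh_t  -- (d,) instantaneous loss gradient w.r.t. h_t (the "reward" analogue)
    J_next   -- (d, d) Jacobian dh_{t+1}/dh_t of one step of the RNN dynamics
    """
    g_t = W @ h_t                       # synthetic gradient predicted at time t
    g_next = W @ h_next                 # bootstrapped prediction at time t+1
    # TD-style error: observed gradient plus the bootstrapped future gradient
    # carried back through one step of the dynamics, minus the current prediction.
    delta = dL_dh_t + gamma * (J_next.T @ g_next) - g_t
    # Accumulating eligibility trace; for a linear model the per-output trace
    # reduces to a running, decayed sum of visited hidden states.
    e = gamma * lam * e + h_t
    # Online weight update: outer product of the TD-style error and the trace.
    W = W + alpha * np.outer(delta, e)
    return W, e
```

The point this sketch illustrates is that nothing in the update requires storing or replaying an unrolled sequence: each step touches only the current transition and the running trace, which is what makes the rule fully online. Setting lam = 0 corresponds to a purely bootstrapped, one-step target, while values of lam closer to 1 lean more on the trace of past states, mirroring how accumulate $\mathrm{TD}(\lambda)$ trades bias against reliance on longer credit-assignment paths.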

References (28)
  1. Distributional coding of associative learning within projection-defined populations of midbrain dopamine neurons. bioRxiv, 2022.
  2. Circuit architecture of VTA dopamine neurons revealed by systematic input-output mapping. Cell, 162(3):622–634, 2015.
  3. Cerebro-cerebellar networks facilitate learning through feedback decoupling. Nature Communications, 14(1):1–18, 2023.
  4. Understanding synthetic gradients and decoupled neural interfaces. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pp. 904–912. JMLR.org, 2017.
  5. Li Deng. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
  6. Eligibility traces and plasticity on behavioral time scales: experimental support of neohebbian three-factor learning rules. Frontiers in Neural Circuits, 12:53, 2018.
  7. Neural turing machines. arXiv preprint arXiv:1410.5401, 2014.
  8. Gradient descent happens in a tiny subspace. arXiv preprint arXiv:1812.04754, 2018.
  9. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, 1997. doi: 10.1162/neco.1997.9.8.1735. URL https://doi.org/10.1162/neco.1997.9.8.1735.
  10. Dopamine neurons report an error in the temporal prediction of reward during learning. Nature Neuroscience, 1(4):304–309, 1998.
  11. Decoupled neural interfaces using synthetic gradients. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pp. 1627–1635. JMLR.org, 2017.
  12. Cerebellar supervised learning revisited: biophysical modeling and degrees-of-freedom control. Current Opinion in Neurobiology, 21(5):791–800, 2011.
  13. 50 years since the Marr, Ito, and Albus models of the cerebellum. Neuroscience, 462:151–174, 2021.
  14. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  15. A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941, 2015.
  16. Intact-brain analyses reveal distinct information carried by SNc dopamine subcircuits. Cell, 162(3):635–647, 2015.
  17. Backpropagation through time and the brain. Current Opinion in Neurobiology, 55:82–89, 2019.
  18. Evaluating biological plausibility of learning algorithms the lazy way. In Real Neurons & Hidden Units: Future directions at the intersection of neuroscience and artificial intelligence @ NeurIPS 2019, 2019. URL https://openreview.net/forum?id=HJgPEXtIUS.
  19. A unified framework of online learning algorithms for training recurrent neural networks. Journal of Machine Learning Research, 21(135):1–34, 2020.
  20. Vijay Mohan K Namboodiri and Garret D Stuber. The learning of prospective and retrospective cognitive maps within neural circuits. Neuron, 109(22):3552–3575, 2021.
  21. Ohmae S and Medina JF. Plasticity of ponto-cerebellar circuits generates a prospective error signal in climbing fiber. Program No. 579.01. Neuroscience 2019 Abstracts. Chicago, IL: Society for Neuroscience, 2019. Online.
  22. Cortico-cerebellar networks as decoupling neural interfaces. Advances in Neural Information Processing Systems, 34, 2021.
  23. Current state and future directions for learning in biological recurrent neural networks: A perspective piece. arXiv preprint arXiv:2105.05382, 2021.
  24. Reinforcement learning: An introduction. MIT press, 2018.
  25. Long range arena: A benchmark for efficient transformers. arXiv preprint arXiv:2011.04006, 2020.
  26. True online temporal-difference learning. The Journal of Machine Learning Research, 17(1):5057–5096, 2016.
  27. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2):270–280, 1989.
  28. A critical time window for dopamine actions on the structural plasticity of dendritic spines. Science, 345(6204):1616–1620, 2014.
