
Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks (2312.13311v1)

Published 20 Dec 2023 in cs.LG and eess.IV

Abstract: Backpropagation (BP) has been a successful optimization technique for deep learning models. However, its limitations, such as backward- and update-locking, prevent layers from being updated concurrently, and its biological implausibility fails to mimic the local learning processes observed in the human brain. To address these issues, recent research has suggested using local error signals to asynchronously train network blocks. However, this approach often requires extensive trial and error to determine the best configuration for local training, including decisions on how to decouple network blocks and which auxiliary networks to use for each block. In our work, we introduce a novel BP-free approach: a block-wise BP-free (BWBPF) neural network that leverages local error signals to optimize distinct sub-networks separately, with the global loss responsible only for updating the output layer. Because the local error signals can be computed in parallel, the weight-update process admits a potential speed-up through parallel implementation. Our experimental results consistently show that this approach identifies transferable decoupled architectures for VGG and ResNet variants, outperforming models trained with end-to-end backpropagation as well as other state-of-the-art block-wise learning techniques on datasets such as CIFAR-10 and Tiny-ImageNet. The code is released at https://github.com/Belis0811/BWBPF.
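The abstract describes the training scheme only at a high level. Below is a minimal PyTorch sketch of the idea, assuming a toy two-block CNN: each block is paired with a small auxiliary classifier whose local loss updates that block alone, activations are detached between blocks so no end-to-end gradients flow, and the global loss updates only the output layer. The `LocalBlock` and `train_step` names, the block splits, the auxiliary-head design, and all hyperparameters are illustrative assumptions, not the paper's released implementation (see the GitHub link above for that).

```python
import torch
import torch.nn as nn

class LocalBlock(nn.Module):
    """A decoupled sub-network trained only by its own local (auxiliary) loss."""
    def __init__(self, in_ch, out_ch, num_classes):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Hypothetical auxiliary classifier supplying the local error signal.
        self.aux = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(out_ch, num_classes),
        )

    def forward(self, x):
        h = self.body(x)
        return h, self.aux(h)

num_classes = 10  # e.g. CIFAR-10
blocks = nn.ModuleList([LocalBlock(3, 32, num_classes),
                        LocalBlock(32, 64, num_classes)])
output_layer = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                             nn.Linear(64, num_classes))
block_opts = [torch.optim.SGD(b.parameters(), lr=0.1) for b in blocks]
out_opt = torch.optim.SGD(output_layer.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

def train_step(x, y):
    h = x
    for block, opt in zip(blocks, block_opts):
        h, local_logits = block(h)
        # Each block is updated only by its own local loss ...
        opt.zero_grad()
        criterion(local_logits, y).backward()
        opt.step()
        # ... and detach() stops any gradient from later blocks or the
        # global loss flowing back into this block (no end-to-end BP).
        h = h.detach()
    # The global loss is responsible only for the output layer.
    out_opt.zero_grad()
    global_loss = criterion(output_layer(h), y)
    global_loss.backward()
    out_opt.step()
    return global_loss.item()

# Toy usage with a CIFAR-10-sized batch.
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, num_classes, (8,))
print(train_step(x, y))
```

Because each block's local gradient depends only on its own detached input and auxiliary head, the per-block updates are mutually independent; the serial loop above shows the decoupling, and a parallel implementation could compute the local error signals concurrently, which is the speed-up the abstract alludes to.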

Authors (8)
  1. Anzhe Cheng (4 papers)
  2. Zhenkun Wang (34 papers)
  3. Chenzhong Yin (9 papers)
  4. Mingxi Cheng (10 papers)
  5. Heng Ping (9 papers)
  6. Xiongye Xiao (16 papers)
  7. Shahin Nazarian (31 papers)
  8. Paul Bogdan (51 papers)
