
The Trifecta: Three simple techniques for training deeper Forward-Forward networks (2311.18130v2)

Published 29 Nov 2023 in cs.LG and cs.CV

Abstract: Modern machine learning models can outperform humans on a variety of non-trivial tasks. However, as model complexity increases, these models consume significant amounts of power and still struggle to generalize effectively to unseen data. Local learning, which updates one subset of a model's parameters at a time, has emerged as a promising technique to address these issues. Recently, a novel local learning algorithm, called Forward-Forward, has received widespread attention due to its innovative approach to learning. Unfortunately, its application has been limited to smaller datasets due to scalability issues. To this end, we propose The Trifecta, a collection of three simple techniques that synergize exceptionally well and drastically improve the Forward-Forward algorithm on deeper networks. Our experiments demonstrate that our models are on par with similarly structured, backpropagation-based models in both training speed and test accuracy on simple datasets. This is achieved by learning representations that are informative locally, on a layer-by-layer basis, and that remain informative when propagated to deeper layers of the architecture. This leads to around 84% accuracy on CIFAR-10, a notable improvement (25%) over the original FF algorithm. These results highlight the potential of Forward-Forward as a genuine competitor to backpropagation and as a promising research avenue.
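For readers unfamiliar with the Forward-Forward setup the abstract builds on, the sketch below illustrates local, layer-by-layer training with a "goodness" objective in PyTorch. It follows Hinton's original formulation rather than the paper's Trifecta modifications; the FFLayer class, the threshold value of 2.0, and the Adam hyperparameters are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of one Forward-Forward layer (assumed hyperparameters; not the Trifecta recipe).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFLayer(nn.Module):
    def __init__(self, in_features, out_features, threshold=2.0, lr=1e-3):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.threshold = threshold  # goodness threshold (assumed value)
        self.opt = torch.optim.Adam(self.linear.parameters(), lr=lr)

    def forward(self, x):
        # Normalize the input so only the direction of the previous layer's
        # activity vector is passed on, as in Hinton's formulation.
        x = F.normalize(x, dim=1)
        return F.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # "Goodness" is the sum of squared activations. Positive samples are
        # pushed above the threshold, negative samples below it.
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        loss = F.softplus(torch.cat([
            self.threshold - g_pos,   # positives: goodness should exceed threshold
            g_neg - self.threshold,   # negatives: goodness should stay below it
        ])).mean()
        self.opt.zero_grad()
        loss.backward()               # gradient stays local to this layer
        self.opt.step()
        # Detach outputs so the next layer trains without gradients crossing
        # the layer boundary.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```

A full network would stack several such layers, feeding each one the detached outputs of the previous layer so that no gradient ever crosses a layer boundary; the three Trifecta techniques modify this basic recipe to keep representations informative in deeper stacks, and their details are given in the paper body rather than the abstract.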
