iMixer: hierarchical Hopfield network implies an invertible, implicit and iterative MLP-Mixer (2304.13061v2)

Published 25 Apr 2023 in cs.LG, cond-mat.dis-nn, cs.CV, and cs.NE

Abstract: In the last few years, the success of Transformers in computer vision has stimulated the discovery of many alternative models that compete with Transformers, such as the MLP-Mixer. Despite their weak inductive bias, these models have achieved performance comparable to well-studied convolutional neural networks. Recent studies on modern Hopfield networks suggest a correspondence between certain energy-based associative memory models and Transformers or the MLP-Mixer, and shed some light on the theoretical background of Transformer-type architecture designs. In this paper, we generalize the correspondence to the recently introduced hierarchical Hopfield network and find iMixer, a novel generalization of the MLP-Mixer model. Unlike ordinary feedforward neural networks, iMixer involves MLP layers that propagate forward from the output side to the input side. We characterize the module as an example of an invertible, implicit, and iterative mixing module. We evaluate the model's performance on image classification tasks with various datasets, and find that iMixer, despite its unique architecture, exhibits stable learning and achieves performance comparable to or better than the baseline vanilla MLP-Mixer. The results imply that the correspondence between Hopfield networks and Mixer models serves as a principle for understanding a broader class of Transformer-like architecture designs.
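The "invertible, implicit, and iterative" mixing described in the abstract can be illustrated with a short sketch. The following minimal PyTorch example is not the authors' code; the module name, layer sizes, and iteration count are assumptions. It shows a token-mixing block whose forward pass is defined implicitly as the inverse of a residual MLP, z = (id + g)^{-1}(y), and is approximated by the fixed-point iteration z_{k+1} = y - g(z_k), in the spirit of invertible residual networks and deep equilibrium models. Convergence assumes g is a contraction (Lipschitz constant below 1), which this sketch does not enforce.

```python
import torch
import torch.nn as nn


class IterativeInverseTokenMixer(nn.Module):
    """Hypothetical sketch of an invertible, implicit, iterative mixing block.

    The forward pass returns an approximation of z = (id + g)^{-1}(y),
    obtained by iterating z_{k+1} = y - g(z_k), so information propagates
    "from the output side to the input side" of the residual MLP g.
    """

    def __init__(self, num_tokens: int, hidden_dim: int, num_iters: int = 5):
        super().__init__()
        # Small MLP acting along the token axis (sizes are illustrative).
        self.g = nn.Sequential(
            nn.Linear(num_tokens, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, num_tokens),
        )
        self.num_iters = num_iters

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y has shape (batch, channels, num_tokens); mix along the last axis.
        z = y
        for _ in range(self.num_iters):
            z = y - self.g(z)  # fixed-point update solving z + g(z) = y
        return z


if __name__ == "__main__":
    # Toy usage: 8 images, 64 tokens, 128 channels.
    x = torch.randn(8, 64, 128)
    mixer = IterativeInverseTokenMixer(num_tokens=64, hidden_dim=256)
    out = mixer(x.transpose(1, 2)).transpose(1, 2)  # back to (batch, tokens, channels)
    print(out.shape)  # torch.Size([8, 64, 128])
```

A fixed number of iterations is used here for simplicity; an implicit-layer treatment would instead iterate to a tolerance and differentiate through the fixed point.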

Authors (2)
  1. Toshihiro Ota (9 papers)
  2. Masato Taki (30 papers)
Citations (2)
