Discrete Key-Value Bottleneck (2207.11240v3)
Abstract: Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant. Challenges emerge with non-stationary training data streams, as in continual learning. One powerful approach to this challenge is to pre-train large encoders on volumes of readily available data and then perform task-specific tuning. Given a new task, however, updating the weights of these encoders is challenging: a large number of weights need to be fine-tuned, and as a result the encoders forget information about previous tasks. In the present work, we propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes. Our paradigm is to encode, process the representation via a discrete bottleneck, and decode. Here, the input is fed to the pre-trained encoder, the output of the encoder is used to select the nearest keys, and the corresponding values are fed to the decoder to solve the current task. The model can only fetch and re-use a sparse subset of these key-value pairs during inference, enabling localized and context-dependent model updates. We theoretically investigate the ability of the discrete key-value bottleneck to minimize the effect of learning under distribution shifts and show that it reduces the complexity of the hypothesis class. We empirically verify the proposed method under challenging class-incremental learning scenarios and show that the proposed model - without any task boundaries - reduces catastrophic forgetting across a wide variety of pre-trained models, outperforming relevant baselines on this task.
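The encode / nearest-key lookup / decode pipeline described in the abstract can be illustrated with a minimal sketch. The module name `DiscreteKeyValueBottleneck`, the single-codebook design, the tensor sizes, and the plain linear decoder below are illustrative assumptions chosen for clarity, not the authors' exact implementation (the paper's full model is richer, e.g. splitting the representation across multiple codebook heads).

```python
# Minimal sketch of the encode -> discrete key-value bottleneck -> decode pipeline
# from the abstract. All names, sizes, and the linear decoder are illustrative
# assumptions, not the authors' exact implementation.
import torch
import torch.nn as nn


class DiscreteKeyValueBottleneck(nn.Module):
    def __init__(self, num_pairs: int = 512, key_dim: int = 64, value_dim: int = 64):
        super().__init__()
        # Keys are compared against encoder features; values are what the decoder sees.
        self.keys = nn.Parameter(torch.randn(num_pairs, key_dim))
        self.values = nn.Parameter(torch.randn(num_pairs, value_dim))

    def forward(self, z: torch.Tensor):
        # z: (batch, key_dim) features from a (frozen) pre-trained encoder.
        dists = torch.cdist(z, self.keys)   # (batch, num_pairs) pairwise distances
        idx = dists.argmin(dim=-1)          # nearest key per input -> sparse selection
        return self.values[idx], idx        # fetched values are passed to the decoder


# Toy usage: frozen-encoder features -> bottleneck -> task decoder.
encoder_dim, num_classes = 64, 10
bottleneck = DiscreteKeyValueBottleneck(key_dim=encoder_dim, value_dim=encoder_dim)
decoder = nn.Linear(encoder_dim, num_classes)
z = torch.randn(8, encoder_dim)             # stand-in for pre-trained encoder output
values, idx = bottleneck(z)
logits = decoder(values)                    # gradients reach only the fetched value rows
```

Because the nearest-key selection is non-differentiable, gradients only reach the small set of values fetched for each input, which is what gives the updates the localized, context-dependent character described in the abstract.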