CoLLEGe: Concept Embedding Generation for Large Language Models (2403.15362v2)
Abstract: Current LLMs are unable to quickly learn new concepts on the fly, often requiring a more involved finetuning process to learn robustly. Prompting in-context is not robust to context distractions and often fails to convey much information about new concepts. Classic methods for few-shot word learning in NLP, which rely on global word vectors, are less applicable to LLMs. In this paper, we introduce a novel approach named CoLLEGe (Concept Learning with Language Embedding Generation) to modernize few-shot concept learning. CoLLEGe is a meta-learning framework capable of generating flexible embeddings for new concepts using a small number of example sentences or definitions. Our primary meta-learning objective is simply to enable an LLM to make next-word predictions in forthcoming sentences, making it compatible with LLM pretraining. We design a series of tasks to test new concept learning in challenging real-world scenarios, including new word acquisition, definition inference, and verbal reasoning, and demonstrate that our method succeeds in each setting without task-specific training. Code and data for our project can be found at https://college-concept-learning.github.io/
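The abstract describes the mechanism only at a high level: a generator network produces an embedding for a new concept from a few support sentences, and the meta-objective is ordinary next-word prediction on a later sentence that uses the concept. Below is a minimal PyTorch-style sketch of what such a setup could look like, written against the HuggingFace causal-LM interface. The `ConceptGenerator` architecture, the mean-pooling of support-sentence encodings, the placeholder-token splicing, and all names here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a CoLLEGe-style meta-training step (assumptions for
# illustration, not the paper's implementation).
import torch
import torch.nn as nn


class ConceptGenerator(nn.Module):
    """Maps encodings of a few support sentences to one new-concept embedding."""

    def __init__(self, enc_dim: int, emb_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(enc_dim, emb_dim), nn.GELU(), nn.Linear(emb_dim, emb_dim)
        )

    def forward(self, support_enc: torch.Tensor) -> torch.Tensor:
        # support_enc: (n_support, enc_dim); mean-pool examples, then project.
        return self.net(support_enc.mean(dim=0))  # (emb_dim,)


def meta_step(llm, generator, support_enc, query_ids, placeholder_id):
    """One meta-learning step matching the abstract's objective: generate an
    embedding for the new concept, splice it into a query sentence at the
    placeholder token's positions, and minimize the frozen LLM's
    next-word-prediction loss so gradients flow only into the generator."""
    concept_emb = generator(support_enc)                 # (emb_dim,)
    base = llm.get_input_embeddings()(query_ids)         # (seq, emb_dim)
    slot = (query_ids == placeholder_id).unsqueeze(-1)   # (seq, 1) bool mask
    inputs = torch.where(slot, concept_emb, base)        # keeps gradient path
    # HF convention: label -100 is ignored, so don't score the slot itself.
    labels = query_ids.masked_fill(query_ids == placeholder_id, -100)
    out = llm(inputs_embeds=inputs.unsqueeze(0), labels=labels.unsqueeze(0))
    return out.loss
```

In a setup like this, the LLM's weights would stay frozen (e.g. via `llm.requires_grad_(False)`) and only the generator would be optimized. The support encodings could come from a pretrained sentence encoder or from the LLM itself; either way, at test time the trained generator can emit an embedding for an unseen concept from a handful of example sentences without any gradient updates, which is the few-shot behavior the abstract describes.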