RecurrentGemma: Moving Past Transformers for Efficient Open Language Models (2404.07839v2)
Published 11 Apr 2024 in cs.LG, cs.AI, and cs.CL
Abstract: We introduce RecurrentGemma, a family of open language models that use Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language tasks. It has a fixed-size state, which reduces memory use and enables efficient inference on long sequences. We provide models at two sizes, with 2B and 9B parameters, and release pre-trained and instruction-tuned variants of both. Our models achieve performance comparable to similarly sized Gemma baselines despite being trained on fewer tokens.
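The key property the abstract highlights is that a gated linear recurrence keeps a fixed-size hidden state, so inference memory does not grow with sequence length the way a transformer's KV cache does. The JAX snippet below is a minimal sketch of that idea, not the paper's RG-LRU or the released RecurrentGemma code: the gate formulas, weight shapes, and the name `gated_linear_recurrence` are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the official RecurrentGemma implementation)
# of a gated linear recurrence with a fixed-size per-layer state, in the
# spirit of Griffin's recurrent block.
import jax
import jax.numpy as jnp


def gated_linear_recurrence(x, w_a, w_i, log_lambda):
    """Scans a gated linear recurrence over a sequence.

    x:          (seq_len, dim) input activations.
    w_a, w_i:   (dim, dim) weights for the recurrence and input gates.
    log_lambda: (dim,) parameter controlling the per-channel decay rate.
    Returns (seq_len, dim) outputs; the hidden state is a fixed (dim,) buffer,
    so memory use does not grow with sequence length.
    """
    def step(h, x_t):
        # Per-step gates (illustrative; the paper's gates differ in detail).
        a_t = jnp.exp(-jax.nn.softplus(log_lambda) * jax.nn.sigmoid(x_t @ w_a))
        i_t = jax.nn.sigmoid(x_t @ w_i)
        # Linear recurrence: the new state mixes the decayed old state with
        # the gated input, always within the same fixed-size buffer.
        h = a_t * h + jnp.sqrt(1.0 - a_t**2) * (i_t * x_t)
        return h, h

    h0 = jnp.zeros(x.shape[-1])
    _, ys = jax.lax.scan(step, h0, x)
    return ys


# Tiny usage example with random weights.
key = jax.random.PRNGKey(0)
dim, seq_len = 8, 16
k1, k2, k3 = jax.random.split(key, 3)
x = jax.random.normal(k1, (seq_len, dim))
w_a = jax.random.normal(k2, (dim, dim)) / jnp.sqrt(dim)
w_i = jax.random.normal(k3, (dim, dim)) / jnp.sqrt(dim)
log_lambda = jnp.zeros(dim)
print(gated_linear_recurrence(x, w_a, w_i, log_lambda).shape)  # (16, 8)
```

In the full Griffin architecture these recurrent blocks are interleaved with local (sliding-window) attention, which likewise bounds per-token state, rather than with global attention.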
- I. Beltagy, M. E. Peters, and A. Cohan. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150, 2020.
- J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, et al. JAX: composable transformations of Python+NumPy programs, 2018.
- S. De, S. L. Smith, A. Fernando, A. Botev, et al. Griffin: Mixing gated linear recurrences with local attention for efficient language models, 2024.
- Gemini Team, Google. Gemini: A family of highly capable multimodal models, 2023.
- Gemma Team, Google DeepMind. Gemma: Open models based on Gemini research and technology, 2024.
- A. Gu, K. Goel, and C. Ré. Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396, 2021.
- A. Q. Jiang et al. Mistral 7B, 2023.
- T. Kudo and J. Richardson. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing, 2018.
- A. Orvieto, S. L. Smith, A. Gu, A. Fernando, C. Gulcehre, R. Pascanu, and S. De. Resurrecting recurrent neural networks for long sequences. arXiv preprint arXiv:2303.06349, 2023.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, 2017.
Authors: Aleksandar Botev, Soham De, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre