Freely Long-Thinking Transformer (FraiLT) (2401.11626v2)
Published 21 Jan 2024 in cs.LG and cs.CL
Abstract: Freely Long-Thinking Transformer (FraiLT) is an improved transformer model designed to enhance processing capabilities without increasing model size. It uses a recursive approach, iterating over a subset of layers multiple times, and introduces iteration encodings to maintain awareness across these cycles. The iteration encodings allow FraiLT to achieve the interpretive depth of larger models in a compact form. When evaluated on a synthetic story dataset, FraiLT outperformed larger models, demonstrating its ability to deliver high-quality performance while reducing memory demands. This model represents a step toward more efficient and accessible LLMs.
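The abstract describes two ideas: reusing a shared subset of transformer layers recursively, and adding an iteration encoding so the model knows which pass it is on. The sketch below is a minimal, hedged illustration of that idea in PyTorch, not the authors' implementation; the class name `FraiLTBlock`, the parameter `num_iterations`, and the choice to inject the iteration encoding by adding a learned embedding to the hidden states are all assumptions for illustration.

```python
# Minimal sketch (assumed, not the authors' code): a shared stack of
# transformer layers is applied several times, and a learned per-iteration
# embedding is added so the model can distinguish the cycles.
import torch
import torch.nn as nn


class FraiLTBlock(nn.Module):  # illustrative name, not from the paper
    def __init__(self, d_model: int, n_heads: int, n_layers: int, num_iterations: int):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        # One small stack of layers, reused on every iteration (weights shared).
        self.shared_layers = nn.TransformerEncoder(layer, num_layers=n_layers)
        # One learned vector per iteration, broadcast over batch and sequence.
        self.iter_embed = nn.Embedding(num_iterations, d_model)
        self.num_iterations = num_iterations

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        for i in range(self.num_iterations):
            iter_code = self.iter_embed(torch.tensor(i, device=x.device))
            # Same layers each pass, but a different iteration encoding.
            x = self.shared_layers(x + iter_code)
        return x


# Usage: 4 shared layers iterated 3 times give 12 layer applications
# while storing only 4 layers' worth of parameters.
block = FraiLTBlock(d_model=256, n_heads=4, n_layers=4, num_iterations=3)
out = block(torch.randn(2, 128, 256))
print(out.shape)  # torch.Size([2, 128, 256])
```

Adding the encoding to the hidden states is only one plausible injection point; the paper's actual mechanism for conditioning on the iteration index may differ.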