Freely Long-Thinking Transformer (FraiLT) (2401.11626v2)

Published 21 Jan 2024 in cs.LG and cs.CL

Abstract: Freely Long-Thinking Transformer (FraiLT) is an improved transformer model designed to enhance processing capabilities without scaling up size. It utilizes a recursive approach, iterating over a subset of layers multiple times, and introduces iteration encodings to maintain awareness across these cycles. Iteration encoding allows FraiLT to achieve the interpretive depth of larger models in a compact form. When evaluated on a synthetic story dataset, FraiLT outperformed larger models, showcasing its ability to deliver high-quality performance while reducing memory demands. This model represents a step forward towards more efficient and accessible LLMs.
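The abstract describes the core mechanism: a small stack of transformer blocks is applied recursively several times, and a learned iteration encoding tells the shared weights which pass is currently running. Below is a minimal sketch of that idea, assuming a PyTorch-style implementation with weight-shared blocks and an additive learned iteration embedding; the class names, hyperparameters, and the additive injection point are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

class SharedBlock(nn.Module):
    """A standard pre-norm transformer block, reused across iterations (sketch)."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x, attn_mask=None):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + a
        return x + self.ff(self.ln2(x))

class RecursiveStack(nn.Module):
    """Iterates a small stack of shared blocks n_iters times,
    adding a learned iteration encoding before each pass (hypothetical sketch)."""
    def __init__(self, d_model: int, n_heads: int, n_blocks: int, n_iters: int):
        super().__init__()
        self.blocks = nn.ModuleList([SharedBlock(d_model, n_heads) for _ in range(n_blocks)])
        # One learned vector per iteration, broadcast over batch and sequence.
        self.iter_emb = nn.Embedding(n_iters, d_model)
        self.n_iters = n_iters

    def forward(self, x, attn_mask=None):
        for i in range(self.n_iters):
            # The iteration encoding keeps the shared weights aware of which cycle this is.
            x = x + self.iter_emb.weight[i]
            for block in self.blocks:
                x = block(x, attn_mask=attn_mask)
        return x

# Example: 4 shared blocks applied 3 times gives depth-12 computation
# with roughly depth-4 parameter cost.
stack = RecursiveStack(d_model=256, n_heads=8, n_blocks=4, n_iters=3)
out = stack(torch.randn(2, 64, 256))
```

In this layout the parameter count stays at that of n_blocks blocks while the effective depth is n_blocks * n_iters, which matches the paper's stated goal of deeper processing without scaling up model size.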

