Transformers Can Do Arithmetic with the Right Embeddings (2405.17399v1)

Published 27 May 2024 in cs.LG and cs.AI

Abstract: The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further. With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that training on only 20 digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100 digit addition problems. Finally, we show that these gains in numeracy also unlock improvements on other multi-step reasoning tasks including sorting and multiplication.

Enhanced Arithmetic Capabilities in Transformers through Abacus Embeddings and Recurrence

The paper investigates why transformer models struggle with arithmetic tasks and proposes a remedy. The primary contributions are Abacus Embeddings, which give each digit an explicit representation of its position within its number, and recurrent layers that strengthen the transformer's multi-step reasoning.

Core Contributions and Methodologies

The authors identify that transformers struggle with arithmetic due to their difficulty in maintaining exact positional information of digits within sequences. To remedy this, they propose Abacus Embeddings, a novel positional embedding technique that encodes digit positions relative to the start of their respective numbers. This approach diverges from traditional positional embeddings by providing identical embeddings for digits of the same significance, hence preserving the positional hierarchy required for arithmetic operations.
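To make the scheme concrete, here is a minimal sketch of how Abacus-style position IDs could be computed for a character-tokenized arithmetic prompt. The function name, the treatment of non-digit tokens, and the optional random offset (which the paper applies during training so the model sees larger positions than the training numbers require) are illustrative assumptions rather than the authors' exact implementation.

```python
import random

DIGITS = set("0123456789")

def abacus_position_ids(tokens, max_offset=0):
    """Give each digit a 1-based position counted from the start of its number.

    With the paper's digit ordering, digits of the same significance share an
    ID, which is the alignment property Abacus Embeddings rely on. A random
    offset (training only) exposes the model to larger IDs, supporting length
    generalization. Non-digit tokens are assigned 0 here purely for illustration.
    """
    offset = random.randint(0, max_offset) if max_offset > 0 else 0
    ids, pos = [], 0
    for tok in tokens:
        if tok in DIGITS:
            pos += 1
            ids.append(pos + offset)
        else:
            pos = 0  # a non-digit token ends the current number
            ids.append(0)
    return ids

# Example with character-level tokenization:
print(abacus_position_ids(list("123+456=")))  # [1, 2, 3, 0, 1, 2, 3, 0]
```

These IDs would then index a learned embedding table whose vectors are added to the token embeddings, in the same way as standard learned positional embeddings.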

Key Insights and Numerical Results

  1. Abacus Embeddings:
    • These embeddings substantially boost transformer performance on arithmetic. Models trained with Abacus Embeddings generalize to addition problems up to 120 digits long, a 6x extrapolation factor relative to the training distribution and a notable improvement over the previous state of the art of 2.5x.
    • Models utilizing Abacus Embeddings reached up to 99% accuracy on 100-digit addition problems.
  2. Architectural Enhancements:
    • Input Injection: Skip connections that propagate the input features into each transformer layer reduce generalization error by 50% when combined with Abacus Embeddings.
    • Recurrent Layers: Looping transformer layers yields further gains on multi-step reasoning; the looped transformer combined with Abacus Embeddings shows near-perfect generalization on long arithmetic problems.
    • Together, these methods improve out-of-distribution accuracy from 92.9% to 99.1%, an 87% reduction in error relative to standard architectures (a sketch of the looped architecture follows this list).
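As referenced above, the following is a minimal PyTorch sketch (not the authors' code) of recurrence with input injection: a small block of transformer layers is reused for several iterations, and the original input embeddings are re-added before every pass. The layer counts, dimensions, encoder-style layers, and class name are assumptions for illustration; the paper uses a decoder-only (causal) model, which here corresponds to passing a causal attention mask.

```python
import torch
import torch.nn as nn

class LoopedTransformerWithInputInjection(nn.Module):
    """Reuse one small block of layers for several iterations ("recurrence")
    and re-add the input embeddings at every iteration ("input injection")."""

    def __init__(self, d_model=256, n_heads=8, layers_per_loop=2, n_loops=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        # The same block (and hence the same weights) is applied n_loops times,
        # giving an effective depth of layers_per_loop * n_loops.
        self.block = nn.TransformerEncoder(layer, num_layers=layers_per_loop)
        self.n_loops = n_loops

    def forward(self, input_embeddings, attn_mask=None):
        state = input_embeddings
        for _ in range(self.n_loops):
            # Input injection: a skip connection from the embeddings into each pass.
            state = self.block(state + input_embeddings, mask=attn_mask)
        return state

# Usage: token plus Abacus position embeddings of shape (batch, seq_len, d_model).
x = torch.randn(2, 32, 256)
causal = nn.Transformer.generate_square_subsequent_mask(32)
out = LoopedTransformerWithInputInjection()(x, attn_mask=causal)
print(out.shape)  # torch.Size([2, 32, 256])
```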

Extended Implications for Algorithmic Reasoning

The success of Abacus Embeddings extends beyond addition to other algorithmic reasoning tasks like multiplication and sorting.

  1. Multiplication:
    • Transformers augmented with Abacus Embeddings achieved near-perfect accuracy when tested on multiplication problems involving operands of up to 15 digits.
    • The performance remains robust even as complexity increases, highlighting the embeddings’ capabilities in handling more intricate arithmetic tasks.
  2. Sorting:
    • The paper also studies sorting, where the model must handle arrays of variable-length numbers. Models equipped with Abacus Embeddings generalize across these diverse settings significantly better than models using other embeddings.
    • Different architectural setups (standard transformer, transformer with input injection, and looped transformer) were tested, showing varied results. Looped transformers excelled at accurately identifying the minimum element in the array during extrapolation tasks.

Future Prospects and Implications

This paper advances the understanding of transformer capabilities in performing arithmetic and algorithmic reasoning tasks. The findings open several avenues for future research:

  1. Integration with General-Purpose Models:
    • Investigating the combination of Abacus Embeddings with embeddings better suited to natural language, such as Rotary Position Embeddings (RoPE) and Functional Interpolation for Relative Position Embeddings (FIRE), shows substantial promise. Such a combination could yield an embedding strategy that maintains high performance on both arithmetic and broader NLP tasks (see the sketch after this list).
  2. Broader Range of Algorithmic Tasks:
    • Extending the current approach to a more diverse set of algorithmic reasoning challenges can help in developing more versatile models and enhance the ability of transformers to generalize in increasingly complex scenarios.
  3. Improved Positional Embedding Strategies:
    • Future research might explore further refinements in positional embeddings, especially those that facilitate better length generalization without significant computational overhead.
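As noted in the first item above, Abacus Embeddings act at the input, so they can in principle coexist with relative schemes such as RoPE or FIRE that act inside attention. The sketch below is a hypothetical illustration of that split, not the paper's implementation; the module name, table size, and the reservation of index 0 for non-digit tokens are assumptions.

```python
import torch
import torch.nn as nn

class AbacusEmbedding(nn.Module):
    """Learned embedding table indexed by Abacus position IDs (a digit's
    position within its own number; 0 reserved for non-digit tokens).
    The result is added to the token embeddings at the input, leaving the
    attention layers free to use a relative scheme such as RoPE or FIRE."""

    def __init__(self, max_digit_position=128, d_model=256):
        super().__init__()
        self.table = nn.Embedding(max_digit_position + 1, d_model)

    def forward(self, token_embeddings, abacus_ids):
        # token_embeddings: (batch, seq_len, d_model); abacus_ids: (batch, seq_len)
        return token_embeddings + self.table(abacus_ids)
```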

In conclusion, the paper presents a noteworthy advance in improving transformer models' performance on arithmetic tasks through the introduction of Abacus Embeddings and recurrent architectures. These techniques not only achieve significant performance gains but also demonstrate promising transferability to other complex algorithmic procedures, paving the way for more practical and theoretically robust applications in AI.

Authors (11)
  1. Sean McLeish
  2. Arpit Bansal
  3. Alex Stein
  4. Neel Jain
  5. John Kirchenbauer
  6. Brian R. Bartoldson
  7. Bhavya Kailkhura
  8. Abhinav Bhatele
  9. Jonas Geiping
  10. Avi Schwarzschild
  11. Tom Goldstein