- The paper shows that LLMs encode numbers using a digit-wise, circular base 10 representation rather than direct numeric values.
- Probing experiments on Llama 3 8B and Mistral 7B show that numeric values can be recovered with high accuracy when each digit is probed individually.
- Causal interventions using mod 10 transformations confirmed that modifying digit representations can effectively alter numeral outputs.
LLMs Encode Numbers Using Digit Representations in Base 10
This paper investigates how LLMs represent numerical data, focusing on whether these models encode numbers in a way that reflects their numeric values. Contrary to the common assumption of a value-based encoding, it finds that LLMs process numbers using a digit-wise circular representation in base 10 rather than a direct encoding of numeric magnitude.
Key Findings
The authors conducted probing experiments and causal interventions on two popular LLMs: Llama 3 8B and Mistral 7B. They found that these models internally represent numbers by encoding each base-10 digit separately in a circular manner. This digit-wise scheme means that errors in numerical reasoning tasks are distributed across individual digits, which helps explain why LLMs often produce answers that are close to the correct one in digit overlap (string similarity) but not necessarily close in numeric value.
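To make the representation concrete, here is a toy sketch (not code from the paper) of what a digit-wise circular base-10 code looks like: each base-10 digit of a number is mapped to a point on the unit circle, one point per digit position.

```python
import math

def circular_digit_code(n: int, num_digits: int = 3) -> list[tuple[float, float]]:
    """Toy illustration of a digit-wise circular base-10 code.

    Each digit d is placed on the unit circle at angle 2*pi*d/10, giving one
    (cos, sin) pair per digit position (least significant first). The actual
    model features live in high-dimensional hidden states, not in 2D.
    """
    code = []
    for _ in range(num_digits):
        d = n % 10                      # current base-10 digit
        angle = 2 * math.pi * d / 10    # position on the "digit circle"
        code.append((math.cos(angle), math.sin(angle)))
        n //= 10
    return code

# 472 -> separate circular codes for 2, 7 and 4 (units, tens, hundreds)
print(circular_digit_code(472))
```

On this circle, 9 and 0 are adjacent, which is what makes the representation circular rather than linear in digit value.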
- Error Distribution Analysis: Errors made by LLMs in numerical tasks are scattered across individual digits rather than clustered around the correct numeric value, contradicting what a value-based representation would predict.
- Probing Experiments: Digit-wise circular probes recover numeric values from LLM hidden states with high accuracy when digits are read out individually in base 10 rather than as whole numbers. The result holds across different layers of both models, indicating that the circular digit representation is robust (see the probe sketch after this list).
- Causal Interventions: The authors manipulated the hidden representations of individual digits. Applying mod 10 transformations reliably changed the numeral produced in generation tasks, further supporting the digit-wise representation hypothesis; the sketch below includes such an intervention.
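The following is a minimal, self-contained sketch of how a digit-wise circular probe and a mod 10 intervention could be implemented. It uses synthetic stand-in hidden states rather than real model activations, and the ridge probe and rotation-based write-back are illustrative assumptions, not the authors' released code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: in the real setting these would be hidden states collected from
# the model at number tokens, paired with each number's units digit.
n, d_model = 2000, 64
digits = rng.integers(0, 10, size=n)
angles = 2 * np.pi * digits / 10
targets = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # circular (cos, sin) targets

W_true = rng.normal(size=(2, d_model))                         # hidden linear image of the code
hidden = targets @ W_true + 0.1 * rng.normal(size=(n, d_model))

# Digit-wise circular probe: ridge regression from hidden states onto (cos, sin).
lam = 1e-2
W_probe = np.linalg.solve(hidden.T @ hidden + lam * np.eye(d_model), hidden.T @ targets)

def decode_digit(h):
    """Read the units digit out of a hidden state via the circular probe."""
    c, s = h @ W_probe
    return int(round((np.arctan2(s, c) % (2 * np.pi)) / (2 * np.pi) * 10)) % 10

acc = np.mean([decode_digit(h) == d for h, d in zip(hidden, digits)])
print(f"probe accuracy: {acc:.2f}")

# Mod 10 intervention: rotate the probed circular code by k steps and write the
# change back into the hidden state along the probe directions.
def add_k_mod_10(h, k):
    c, s = h @ W_probe
    theta = np.arctan2(s, c) + 2 * np.pi * k / 10
    delta = (np.array([np.cos(theta), np.sin(theta)]) - np.array([c, s])) @ np.linalg.pinv(W_probe)
    return h + delta

h0, d0 = hidden[0], digits[0]
print(d0, decode_digit(add_k_mod_10(h0, 3)))                   # digit d0 vs (d0 + 3) % 10
```

The key design choice is that both the probe and the intervention operate on a per-digit (cos, sin) pair, so shifting a digit by k is simply a rotation by 2πk/10 on the digit circle.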
Implications for Numerical Reasoning
These findings suggest a fundamental difference in how LLMs handle arithmetic and numerical reasoning tasks. The digit-wise base 10 encoding lets models absorb minor errors in a digit's representation, which would be harder with a direct value-space encoding. This has practical implications for improving numerical accuracy in arithmetic tasks and could inform model designs that target numerical reasoning.
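As an illustrative toy comparison (not an experiment from the paper), the snippet below shows why a circular digit code tolerates small representational noise while a direct value encoding does not: a perturbed angle still snaps back to the correct digit, whereas the same relative perturbation of a scalar value simply yields a different number.

```python
import math

def snap_to_digit(angle: float) -> int:
    """Decode a (possibly noisy) angle on the digit circle back to a base-10 digit."""
    return round((angle % (2 * math.pi)) / (2 * math.pi) * 10) % 10

digit, noise = 7, 0.25                      # noise is under half a digit step (pi/10 ~ 0.314)
noisy_angle = 2 * math.pi * digit / 10 + noise
print(snap_to_digit(noisy_angle))           # 7: the circular code absorbs the error

# A direct value encoding has no such snapping: the same relative perturbation of a
# scalar "value" representation shows up directly in the number.
value = 4378.0
print(value * (1 + noise / (2 * math.pi)))  # ~4552: a different number
```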
Future Directions
The research opens several avenues for further exploration. Future studies might examine whether similar digit-wise encodings exist for other numerical types, such as fractions or irrational numbers, and how such representations affect broader mathematical reasoning in LLMs. Investigating how these representations influence model training dynamics and task-specific performance could also yield valuable insights.
Conclusion
The paper offers a nuanced understanding of numerical representation in LLMs, suggesting that a fragmented, digit-wise encoding in base 10 underlies both the error patterns and the computational strategies of these models. This challenges the usual assumption of a value-based numeric representation and lays a foundation for future research into how these models carry out mathematical operations.