Injecting Numerical Reasoning Skills into Language Models
The paper "Injecting Numerical Reasoning Skills into Language Models" addresses a significant limitation of large pre-trained language models (LMs): their deficiency in high-level reasoning skills, particularly numerical reasoning. While these LMs are known to encode extensive linguistic information, they struggle with tasks that require manipulating and reasoning over numerical data. To overcome this, the paper introduces a methodology that combines automatic data generation with a multi-task training setup to inject numerical skills into these models, exemplified by their model GenBERT.
Methodology Overview
The authors propose a method of automatic data generation that teaches LMs numerical skills without resorting to specialized architectures. They build GenBERT on a simple encoder-decoder architecture capable of both generative and extractive tasks, eschewing the common approach of adding complex, purpose-built modules for numerical reasoning. The primary methodological contributions of this work are:
- Automatic Data Synthesis: The process involves generating synthetic data that the model uses to learn basic numerical operations. This data generation is twofold:
- Numerical Data (ND): Focuses on arithmetic operations and understanding numeric values intrinsically expressed through tokens.
- Textual Data (TD): Uses templates derived from math word problems to generate context-rich question-passage pairs that require numerical reasoning embedded in textual contexts.
- Multi-task Training Framework: The authors train GenBERT concurrently on the synthetic numerical data and a standard masked language modeling objective to retain linguistic capabilities. This mitigates catastrophic forgetting, a common pitfall when models are trained extensively on task-specific data without regard to maintaining broader linguistic functionality.
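The twofold data synthesis above can be illustrated with a minimal sketch. The function names, templates, and value ranges here are hypothetical, not the paper's actual generation code; the point is simply that ND pairs an arithmetic expression with its answer, while TD embeds sampled numbers in a word-problem-style passage and question.

```python
import random

def make_numerical_example(rng: random.Random) -> tuple[str, str]:
    """Sample a two-operand arithmetic expression and its answer (ND)."""
    a, b = rng.randint(0, 999), rng.randint(0, 999)
    op = rng.choice(["+", "-"])
    answer = a + b if op == "+" else a - b
    return f"{a} {op} {b}", str(answer)

def make_textual_example(rng: random.Random) -> tuple[str, str, str]:
    """Fill a math-word-problem style template with sampled numbers (TD)."""
    a, b = rng.randint(1, 50), rng.randint(1, 50)
    passage = f"The team scored {a} points in the first half and {b} in the second."
    question = "How many points did the team score in total?"
    return passage, question, str(a + b)

# Generate a small synthetic training set of both kinds.
rng = random.Random(0)
nd_examples = [make_numerical_example(rng) for _ in range(3)]
td_examples = [make_textual_example(rng) for _ in range(3)]
```

Because the data is generated programmatically, both the scale of the training set and the difficulty of the sampled problems can be controlled freely, without any human annotation.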
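One simple way to realize the multi-task training described above is to sample, at each step, which objective the next batch comes from, keeping the masked-LM objective in the mix so linguistic knowledge is not forgotten. The sketch below is illustrative only; the task names and mixing weights are assumptions, not values from the paper.

```python
import random

def multitask_schedule(tasks: dict[str, float], steps: int, seed: int = 0) -> list[str]:
    """Sample one task name per training step, in proportion to mixing weights."""
    rng = random.Random(seed)
    names = list(tasks)
    weights = [tasks[n] for n in names]
    return [rng.choices(names, weights=weights)[0] for _ in range(steps)]

# Hypothetical mixing weights: the masked-LM objective stays in the mix
# alongside the synthetic numerical (ND) and textual (TD) tasks.
schedule = multitask_schedule({"mlm": 0.4, "numerical": 0.3, "textual": 0.3}, steps=1000)
```

Each training step would then draw a batch from the scheduled task and apply that task's loss, so no single objective dominates long enough to overwrite the others.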
Results and Impact
GenBERT shows significant improvement on numerical reasoning over text: on the DROP dataset, it raises its F1 score from 49.3 to 72.3. Furthermore, GenBERT generalizes to math word problem datasets, validating the effectiveness of injecting numerical skills through synthetic data.
This methodology has substantial implications:
- Versatility: The framework offers a systematic path for integrating additional complex skills into pre-trained models, potentially extending beyond numerical reasoning to other domains where skills can be automated and data can be synthesized.
- Reduced Dependency on Complex Architectures: Simplifying the architectural design without compromising performance allows for broader adoption in environments where computational resources are limited.
- Enhanced Cross-task Performance: While GenBERT focuses on numerical reasoning, it retains high performance on reading comprehension tasks like SQuAD, indicating that skill injection processes can maintain the model's versatility.
Future Directions
The paper sets the stage for future research on expanding the capabilities of LMs by integrating other types of reasoning or domain-specific skills through automatic data generation. It opens avenues for modeling other specialist tasks in LMs, paving the way for versatile language understanding systems that switch seamlessly between different types of reasoning and comprehension tasks. How best to balance pre-training of specialized skills against retention of general linguistic capabilities also remains an open question for multi-task learning.
In conclusion, the paper offers a comprehensive strategy for augmenting language models with numerical reasoning, providing a detailed framework with substantial empirical backing and positioning itself strategically in the advancing field of natural language processing research.