
A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers (2106.15772v1)

Published 30 Jun 2021 in cs.AI and cs.CL

Abstract: We present ASDiv (Academia Sinica Diverse MWP Dataset), a diverse (in terms of both language patterns and problem types) English math word problem (MWP) corpus for evaluating the capability of various MWP solvers. Existing MWP corpora for studying AI progress remain limited either in language usage patterns or in problem types. We thus present a new English MWP corpus with 2,305 MWPs that cover more text patterns and most problem types taught in elementary school. Each MWP is annotated with its problem type and grade level (for indicating the level of difficulty). Furthermore, we propose a metric to measure the lexicon usage diversity of a given MWP corpus, and demonstrate that ASDiv is more diverse than existing corpora. Experiments show that our proposed corpus reflects the true capability of MWP solvers more faithfully.

Citations (279)

Summary

  • The paper introduces ASDiv, a dataset of 2,305 diverse math word problems that overcomes limitations in existing corpora.
  • The paper demonstrates that MWP solvers score significantly lower on ASDiv than on traditional datasets, indicating that ASDiv poses a more realistic challenge.
  • The paper proposes a novel lexicon usage diversity metric to better evaluate solver performance and stimulate improvements in MWP solving.

A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers

The paper "A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers" by Shen-Yun Miao, Chao-Chun Liang, and Keh-Yih Su introduces the ASDiv dataset, a contribution to NLP and Math Word Problem (MWP) solving. The authors build a diverse corpus intended to address the limitations of existing MWP corpora, particularly in problem type coverage and lexicon diversity.

Motivation and Contributions

Math Word Problems are a more demanding test of AI capabilities than typical NLP tasks because they require mathematical reasoning in addition to text interpretation. However, existing corpora fall short of the complexity and diversity of real-world MWPs, so solvers trained and evaluated on them report inflated performance. The ASDiv corpus aims to address these deficiencies by offering:

  1. A diverse collection of 2,305 MWPs encompassing a variety of linguistic patterns and problem types taught in elementary schools, annotated with problem types and grade levels for difficulty assessment.
  2. A novel lexicon usage diversity metric that measures the diversity of MWPs within a corpus, addressing the skewness problem prevalent in existing datasets.

Key Findings and Experimental Results

The authors present a thorough comparison of ASDiv with existing corpora, demonstrating its superior diversity. The lexicon usage diversity metric reveals that ASDiv avoids the over-representation of similar problems, a common pitfall in datasets like MathQA or AI2. This diversity allows the ASDiv corpus to provide a more accurate reflection of MWP solvers' true capabilities.
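The idea behind a lexicon usage diversity score can be illustrated with a small sketch. This summary does not reproduce the paper's formula, so the code below assumes one plausible form: each problem's diversity is one minus its highest symmetric n-gram overlap with any other problem, and the corpus score is the mean over problems. The `overlap` function is a simplified BLEU-like precision (no brevity penalty), and all function names are hypothetical rather than taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap(candidate, reference, max_n=2):
    """Mean clipped n-gram precision of candidate against reference.

    Assumption: a simplified stand-in for a BLEU-style similarity,
    without a brevity penalty.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        if total == 0:
            continue  # candidate too short to have n-grams of this length
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        precisions.append(matched / total)
    return sum(precisions) / len(precisions) if precisions else 0.0


def problem_diversity(index, problems):
    """One minus the highest symmetric overlap with any other problem."""
    target = problems[index]
    sims = [
        (overlap(target, other) + overlap(other, target)) / 2
        for j, other in enumerate(problems)
        if j != index
    ]
    return 1.0 - max(sims) if sims else 1.0


def corpus_diversity(texts):
    """Mean per-problem diversity over a list of problem texts."""
    problems = [t.lower().split() for t in texts]
    if not problems:
        return 0.0
    scores = [problem_diversity(i, problems) for i in range(len(problems))]
    return sum(scores) / len(scores)


# Toy corpora (invented for illustration, not drawn from ASDiv).
templated = [
    "Tom has 3 apples and buys 2 more .",
    "Tom has 5 apples and buys 4 more .",
]
varied = [
    "Tom has 3 apples and buys 2 more .",
    "A train travels 60 miles in 2 hours . What is its speed ?",
]
print(corpus_diversity(templated), corpus_diversity(varied))
```

Under this assumed definition, a corpus made of near-duplicate templates scores close to 0, while a corpus of lexically distinct problems scores close to 1, which is the kind of skew the metric is meant to expose.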

The experiments evaluate MWP solvers on subsets of MathQA and ASDiv and show that ASDiv's diversity makes it a harder test, one that reflects solvers' true capability more faithfully. Specifically, systems performed significantly worse when evaluated on ASDiv, indicating that results previously obtained on less diverse datasets were exaggerated.
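Because each ASDiv problem carries a grade-level annotation, solver results can be broken down by difficulty. The following is a minimal sketch of such a breakdown; the record fields (`grade`, `gold`, `predicted`) and the toy values are hypothetical and do not reflect the dataset's actual schema or any reported result.

```python
from collections import defaultdict


def accuracy_by_grade(records, tolerance=1e-6):
    """Return {grade: accuracy}, counting a prediction as correct
    when it matches the gold answer within a numeric tolerance."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        grade = record["grade"]
        totals[grade] += 1
        if abs(record["predicted"] - record["gold"]) <= tolerance:
            correct[grade] += 1
    return {grade: correct[grade] / totals[grade] for grade in sorted(totals)}


# Hypothetical solver outputs, invented for illustration only.
records = [
    {"grade": 1, "gold": 5.0, "predicted": 5.0},
    {"grade": 1, "gold": 7.0, "predicted": 9.0},
    {"grade": 2, "gold": 12.0, "predicted": 12.0},
]
print(accuracy_by_grade(records))
```

Reporting accuracy per grade rather than as a single number makes it easier to see whether a solver handles only the simplest problem types.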

Implications and Future Directions

This research has both practical and theoretical implications. Practically, ASDiv provides a robust benchmark for developing and testing MWP solvers, encouraging algorithm designs that handle varied language and problem types. Theoretically, the introduction of a diversity metric invites a reevaluation of corpus quality, urging the community to weigh diversity alongside size.

Future work may extend this framework to other languages or explore deeper integration of MWP solving with broader AI applications. Moreover, advancing solver capabilities on diverse corpora like ASDiv could benefit educational technology, for example intelligent tutoring systems that assist students with problem solving in real time.

In conclusion, ASDiv represents a substantial step forward in evaluating and developing MWP solvers. By prioritizing diversity and providing detailed annotations, this dataset challenges current solvers and pushes the boundaries of what AI can achieve in mathematical reasoning through natural language.