Controlling Equational Reasoning in Large Language Models with Prompt Interventions (2307.09998v4)

Published 19 Jul 2023 in cs.CL and math.HO

Abstract: This paper investigates how hallucination rates in LLMs may be controlled and mitigated via a symbolic data generation framework, and explores a fundamental relationship between the rate of certain mathematical errors and interventions. Specifically, we systematically generate data for a derivation generation task, and apply targeted interventions on prompts to perturb aspects such as the surface forms of symbols, equational tree structures, and mathematical context, and evaluate the effect of prompt interventions across a range of LLMs including fine-tuned T5 models, GPT, and others. Experiments suggest that T5-Large can outperform the few-shot performance of GPT-4 on various evaluation sets generated via the framework, however, an extensive evaluation based on human analysis, template-based error detection, and various text generation metrics reveals fine-tuned model weaknesses beyond what the reference-based metrics singularly describe. We use these results to tie characteristic distributional footprints of interventions to the human evaluation of LLM derivation quality, potentially leading to significant control over fine-grained mathematical capabilities of LLMs with respect to specific types of errors.

Citations (3)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Controlling Equational Reasoning in Large Language Models with Prompt Interventions (2307.09998v4)

Summary

Related Papers