"I'd Like to Have an Argument, Please": Argumentative Reasoning in Large Language Models (2309.16938v2)
Abstract: We evaluate the ability of two LLMs to perform argumentative reasoning. We experiment with argument mining (AM) and argument pair extraction (APE), and evaluate the LLMs' ability to recognize arguments under progressively more abstract input and output (I/O) representations (e.g., arbitrary label sets, graphs). Unlike the well-known evaluation of prompt phrasings, abstraction evaluation retains the prompt's phrasing while testing reasoning capabilities. We find that, score-wise, the LLMs match or surpass the state of the art in AM and APE, and that under certain I/O abstractions they perform well, even beating chain-of-thought; we call this symbolic prompting. However, statistical analysis of the LLMs' outputs under small, yet still human-readable, alterations in the I/O representations (e.g., asking for BIO tags as opposed to line numbers) shows that the models are not actually performing reasoning. This suggests that LLM applications to some tasks, such as data labelling and paper reviewing, must be done with care.
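To make the "small alterations in I/O representations" concrete: the abstract contrasts asking the model for per-line BIO tags versus line-number spans, two representations that carry exactly the same segmentation information. The sketch below (a hypothetical helper, not code from the paper) converts one into the other, illustrating that a reasoner indifferent to surface form should treat them equivalently.

```python
def bio_to_spans(tags):
    """Convert a per-line BIO tag sequence into (start, end) line-number spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":            # a new argument component begins here
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag == "O":          # this line is outside any component
            if start is not None:
                spans.append((start, i - 1))
                start = None
        # tag == "I": the current component continues; nothing to do
    if start is not None:         # close a component that runs to the end
        spans.append((start, len(tags) - 1))
    return spans

# Lines 0-1 and 3-5 are argument components; lines 2 and 6 are not.
tags = ["B", "I", "O", "B", "I", "I", "O"]
print(bio_to_spans(tags))  # [(0, 1), (3, 5)]
```

Since the two formats are mechanically interconvertible, a model whose scores shift significantly when the requested output format changes is reacting to the representation rather than reasoning over the argument structure, which is the paper's diagnostic.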
- Adrian de Wynter
- Tangming Yuan