Getting MoRE out of Mixture of Language Model Reasoning Experts (2305.14628v2)
Abstract: While recent LLMs improve on various question answering (QA) datasets, it remains difficult for a single model to generalize across question types that require distinct reasoning abilities. We provide empirical evidence that state-of-the-art LLMs suffer from poor generalizability on reasoning types beyond those seen in the prompt. To remedy this, we propose a Mixture-of-Reasoning-Experts (MoRE) framework that ensembles diverse specialized LLMs. We specialize the backbone LLM with prompts optimized for different reasoning categories, including factual, multihop, mathematical, and commonsense reasoning. Our key insight is to leverage agreement among the specialized experts to select the best answer for each question, or to abstain from answering. This gives MoRE higher accuracy than any single specialized model on a collection of 12 QA datasets spanning four reasoning types. Beyond generalizability, MoRE improves selective question answering over baselines that do not incorporate inter-expert agreement, and its interpretable design makes its outputs more useful to human consumers of QA systems. Our human study confirms that presenting expert predictions and the answer selection process helps annotators calibrate more accurately when to trust the system's output. We release all code and data to facilitate future work.
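To make the selection mechanism concrete, below is a minimal sketch of MoRE-style answer selection via inter-expert agreement: each reasoning expert is a specialized prompt over the same backbone LLM, and the system returns the majority answer or abstains when agreement is low. The prompt templates, the `call_llm` backend, the normalization, and the abstention threshold are illustrative assumptions, not the authors' released implementation (which additionally trains an answer selector over richer features).

```python
# Sketch of MoRE-style answer selection via inter-expert agreement.
# Prompt wording, run order, and the 0.5 threshold are assumptions for
# illustration only; they are not the paper's exact configuration.
from collections import Counter
from typing import Callable, Optional

# One specialized prompt per reasoning category (factual, multihop,
# mathematical, commonsense), all sharing the same backbone LLM.
EXPERT_PROMPTS = {
    "factual": "Answer the factual question directly.\nQ: {q}\nA:",
    "multihop": "Decompose the question into hops, then answer.\nQ: {q}\nA:",
    "math": "Solve step by step, then state the final answer.\nQ: {q}\nA:",
    "commonsense": "Use commonsense knowledge to answer.\nQ: {q}\nA:",
}

def more_answer(
    question: str,
    call_llm: Callable[[str], str],   # any prompt -> answer backend
    abstain_threshold: float = 0.5,
) -> Optional[str]:
    """Query every specialized expert, then pick by agreement or abstain."""
    answers = [
        call_llm(template.format(q=question)).strip().lower()
        for template in EXPERT_PROMPTS.values()
    ]
    best_answer, votes = Counter(answers).most_common(1)[0]
    agreement = votes / len(answers)      # fraction of experts that agree
    if agreement < abstain_threshold:
        return None                       # abstain: experts disagree too much
    return best_answer
```

With any `call_llm` wrapper around a chat or completion API, `more_answer(question, call_llm)` returns the majority answer, or `None` when fewer than half of the experts agree; exposing the per-expert answers alongside this decision is what the paper's human study uses to help annotators decide when to trust the system.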
Authors: Chenglei Si, Weijia Shi, Chen Zhao, Luke Zettlemoyer, Jordan Boyd-Graber