Selection-Inference: Exploiting LLMs for Interpretable Logical Reasoning
Creswell et al.'s paper presents the Selection-Inference (SI) framework, a method for enabling LLMs to perform interpretable, multi-step logical reasoning. The central challenge the paper addresses is that LLMs, despite strong few-shot generalization, struggle with problems that require chaining several logical steps. The authors' two-stage SI approach mitigates this by using pre-trained LLMs as general-purpose processing modules within a modular structure that decomposes reasoning into causal, interpretable steps.
Key Contributions and Findings
The paper makes several significant contributions:
- Comprehensive Evaluation of LLMs: The authors evaluated LLMs on a comprehensive suite of 50 tasks varying in logical reasoning complexity. They found that while LLMs perform well on simple entailment tasks, they struggle significantly with multi-step inference and complex reasoning problems.
- Introduction of the SI Framework: The proposed SI framework alternates between a selection stage, which isolates the facts relevant to the next reasoning step, and an inference stage, which derives an intermediate conclusion from those facts. This modular approach leverages the few-shot generalization capabilities of LLMs without requiring fine-tuning (a minimal sketch of the loop appears after this list).
- Performance Gains: Using a 7B-parameter LLM within the SI framework, the authors report more than a 100% improvement over a vanilla (direct-answer) baseline on a suite of 10 logical reasoning tasks. Notably, the same 7B model outperformed a much larger 280B-parameter vanilla baseline on the same tasks.
- Causal Reasoning Traces: The SI framework generates natural-language-based reasoning traces for each inference step, enhancing the interpretability and trustworthiness of the model's conclusions.
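To make the alternation concrete, here is a minimal Python sketch of the selection-inference loop described above. It is a simplification under stated assumptions, not the authors' code: the `generate` callable stands in for a few-shot-prompted LLM completion call, and the prompts are illustrative placeholders.

```python
from typing import Callable, List, Tuple

def si_reasoning(
    context: List[str],
    question: str,
    generate: Callable[[str], str],  # assumed few-shot LLM completion call
    n_steps: int = 3,
) -> List[Tuple[str, str]]:
    """Alternate selection and inference; return the reasoning trace."""
    trace: List[Tuple[str, str]] = []
    for _ in range(n_steps):
        # Selection: prompt the LLM (few-shot, no fine-tuning) to pick the
        # context sentences needed for the next reasoning step.
        selection_prompt = (
            "\n".join(context)
            + f"\nQuestion: {question}"
            + "\nSelect the facts needed for the next step:"
        )
        selected = generate(selection_prompt)

        # Inference: the inference prompt sees only the selected facts, not
        # the question, so the new fact must follow from those facts alone.
        new_fact = generate(selected + "\nTherefore:")

        context.append(new_fact)            # grow the context for the next step
        trace.append((selected, new_fact))  # one interpretable reasoning step
    return trace
```

The accumulated (selection, inference) pairs are what give the framework its natural-language reasoning trace.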
Technical Implementation
The SI framework's core strength is its modular design:
- Selection Module: This module picks a small subset of context sentences relevant to a single reasoning step. Because the inference stage sees only these selected facts (and not the question), each intermediate conclusion is causally grounded in the selection rather than in shortcuts from the question. Selection is framed as choosing among sentences already present in the context: the LLM's output probabilities are used to score candidate sentences, and the highest-scoring ones are selected (a simplified sketch of this scoring follows the list).
- Inference Module: This module operates only on the facts provided by the Selection module and produces a new intermediate fact, which is appended to the context for the next step. The selection-inference loop then repeats, with the final inference yielding the answer to the question (in the basic setup the number of steps is fixed in advance; dynamic halting is discussed as future work below).
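As a rough illustration of how selection can be restricted to existing context sentences, the sketch below scores each candidate with a hypothetical `sentence_logprob(prompt, sentence)` helper (the log-probability the LLM assigns to the sentence as a continuation of the prompt) and keeps the top-scoring ones. This is an assumption-laden simplification, not the paper's exact scoring scheme.

```python
from typing import Callable, List

def select_facts(
    context: List[str],
    question: str,
    already_selected: List[str],
    sentence_logprob: Callable[[str, str], float],  # assumed: log p(sentence | prompt)
    k: int = 2,
) -> List[str]:
    """Score remaining context sentences and keep the top-k most relevant."""
    prompt = (
        "\n".join(context)
        + f"\nQuestion: {question}"
        + "\nSelected so far: " + " ".join(already_selected)
        + "\nNext relevant fact:"
    )
    # Restricting candidates to sentences already in the context means the
    # selection step cannot introduce facts that were never given.
    candidates = [s for s in context if s not in already_selected]
    scored = sorted(candidates, key=lambda s: sentence_logprob(prompt, s), reverse=True)
    return scored[:k]
```

Scoring existing sentences rather than generating free-form text is what keeps the selection step constrained and auditable.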
Practical and Theoretical Implications
Practical Implications: The SI framework's interpretability makes it particularly useful for applications in fields where accountability and transparency are crucial, such as legal reasoning, medical diagnosis, and scientific research. The ability to produce a causal reasoning trace is essential for debugging and validating AI systems, ultimately facilitating their adoption in critical sectors.
Theoretical Implications: The modular approach challenges the conventional end-to-end deep learning paradigm in reasoning tasks, aligning more closely with neurosymbolic AI methods. It underscores the importance of decomposing complex problems into manageable sub-tasks, thereby advancing the theoretical understanding of how LLMs can be utilized for structured reasoning.
Speculations on Future Developments
Future developments in AI reasoning might build on the SI framework's methodology. Notable areas for further research include:
- Dynamic Halting Mechanisms: Developing mechanisms that let the model decide autonomously when to stop reasoning could improve efficiency (a hypothetical halting check is sketched after this list).
- Enhanced Selection Techniques: Improving the Selection module, possibly through reinforcement learning, could further refine the accuracy and efficiency of the reasoning process.
- Generalizing to Diverse Domains: Extending the framework to handle unstructured data retrieval, thereby eliminating the requirement for pre-defined context, could broaden the applicability of the SI framework.
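As one speculative illustration of a dynamic halting check (not something the paper implements), the model could be asked after each step whether the accumulated facts already answer the question, comparing the likelihoods of a "yes" versus "no" continuation via a hypothetical `token_logprob` helper.

```python
from typing import Callable, List

def should_halt(
    context: List[str],
    question: str,
    token_logprob: Callable[[str, str], float],  # assumed: log p(token | prompt)
) -> bool:
    """Heuristic halting check: is the question answerable from the facts so far?"""
    prompt = (
        "\n".join(context)
        + f"\nQuestion: {question}"
        + "\nCan the question be answered from the facts above? Answer yes or no:"
    )
    # Compare the model's preference for answering "yes" versus "no".
    return token_logprob(prompt, " yes") > token_logprob(prompt, " no")
```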
Conclusion
Creswell et al.'s SI framework presents a noteworthy advancement in logical reasoning using LLMs, demonstrating substantial performance improvements and providing interpretable, causal reasoning traces. Its impact is multifaceted, offering practical benefits for high-stakes applications while contributing to the theoretical discourse on neurosymbolic AI. The framework sets a promising groundwork for future explorations into modular, interpretable reasoning systems.