SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning (2401.13246v4)
Abstract: Elucidating the reasoning process with structured explanations from question to answer is crucial, as it significantly enhances the interpretability, traceability, and trustworthiness of question-answering (QA) systems. However, structured explanations require models to perform intricate structured reasoning, which poses great challenges. Most existing methods focus on single-step reasoning through supervised learning, ignoring the logical dependencies between steps. Moreover, existing reinforcement learning (RL) based methods overlook the structured relationships, underutilizing the potential of RL for structured reasoning. In this paper, we propose SEER, a novel method that maximizes a structure-based return to facilitate structured reasoning and explanation. Our proposed structure-based return precisely describes the hierarchical and branching structure inherent in structured reasoning, effectively capturing the intricate relationships between different reasoning steps. In addition, we introduce a fine-grained reward function to meticulously delineate diverse reasoning steps. Extensive experiments show that SEER significantly outperforms state-of-the-art methods, achieving an absolute improvement of 6.9% over RL-based methods on EntailmentBank, a 4.4% average improvement on the STREET benchmark, and exhibiting outstanding efficiency and cross-dataset generalization performance. Our code is available at https://github.com/Chen-GX/SEER.
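The abstract contrasts a structure-based return with the linear trajectory returns of standard RL. As a minimal sketch of the idea (not the paper's exact formulation), a step's return can be defined recursively over the entailment tree, summing the discounted returns of the premise steps it branches into; the `Step` class, reward values, and discount scheme below are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Step:
    """One intermediate step in an entailment tree."""
    reward: float                                         # fine-grained step reward
    premises: List["Step"] = field(default_factory=list)  # child steps it builds on

def structured_return(step: Step, gamma: float = 0.9) -> float:
    """Return of a step = its own reward plus the discounted returns of
    its premise steps, so credit propagates along tree edges (hierarchy
    and branching) rather than along a single linear trajectory."""
    return step.reward + gamma * sum(structured_return(p, gamma) for p in step.premises)

# Branching example: a root conclusion supported by two intermediate steps.
leaf_a = Step(reward=1.0)
leaf_b = Step(reward=0.5)
root = Step(reward=1.0, premises=[leaf_a, leaf_b])
print(structured_return(root))  # 1.0 + 0.9 * (1.0 + 0.5) = 2.35
```

With a single premise chain this reduces to the familiar discounted return; the branching sum is what distinguishes the structure-based variant.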
- RL4F: Generating natural language feedback with reinforcement learning for repairing model outputs. In ACL.
- Richard Bellman. 1957. A Markovian decision process. Journal of Mathematics and Mechanics.
- Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge. CoRR.
- Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168.
- Antonia Creswell and Murray Shanahan. 2022. Faithful reasoning using large language models. CoRR.
- Explaining answers with entailment trees. In EMNLP.
- ERASER: A benchmark to evaluate rationalized NLP models. In ACL.
- GLM: General language model pretraining with autoregressive blank infilling. In ACL.
- DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing. In ICLR.
- Training compute-optimal large language models. CoRR.
- METGEN: A module-based entailment tree generation framework for answer explanation. In Findings of NAACL.
- Faithful question answering with Monte-Carlo planning. In ACL.
- Harsh Jhamtani and Peter Clark. 2020. Learning to explain: Datasets and models for identifying valid reasoning chains in multihop question-answering. In EMNLP.
- QASC: A dataset for question answering via sentence composition. In AAAI.
- Levente Kocsis and Csaba Szepesvári. 2006. Bandit based Monte-Carlo planning. In ECML. Springer.
- QED: A framework and dataset for explanations in question answering. Transactions of the Association for Computational Linguistics.
- CodeRL: Mastering code generation through pretrained models and deep reinforcement learning. In NeurIPS.
- Program induction by rationale generation: Learning to solve and explain algebraic word problems. In ACL.
- RLET: A reinforcement learning based approach for explainable QA with entailment trees. In EMNLP.
- One cannot stand for everyone! leveraging multiple user simulators to train task-oriented dialogue systems. In ACL.
- Simpler context-dependent logical forms via model projections. In ACL.
- John McCarthy. 1959. Programs with common sense.
- Can a suit of armor conduct electricity? A new dataset for open book question answering. In EMNLP.
- Entailment tree explanations via iterative retrieval-generation reasoner. In Findings of NAACL.
- OpenAI. 2023. GPT-4 technical report.
- Contrastive reinforcement learning of symbolic reasoning domains. In NeurIPS.
- Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res.
- Explain yourself! leveraging language models for commonsense reasoning. In ACL.
- Is reinforcement learning (not) for natural language processing: Benchmarks, baselines, and building blocks for natural language policy optimization. In ICLR.
- A survey of hallucination in large foundation models. CoRR.
- Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In EMNLP.
- STREET: A multi-task structured reasoning and explanation benchmark. In ICLR.
- Factually consistent summarization via reinforcement learning with textual entailment feedback. In ACL.
- PRover: Proof generation for interpretable reasoning over rules. In EMNLP.
- FaiRR: Faithful and robust deductive reasoning over natural language. In ACL.
- Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- BLEURT: Learning robust metrics for text generation. In ACL.
- Richard S. Sutton. 1988. Learning to predict by the methods of temporal differences. Machine Learning.
- ProofWriter: Generating implications, proofs, and abductive statements over natural language. In Findings of ACL-IJCNLP.
- Entailer: Answering questions with faithful and truthful chains of reasoning. In EMNLP.
- A survey on explainability in machine reading comprehension. CoRR.
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
- Unification-based reconstruction of multi-hop explanations for science questions. In EACL.
- Chain-of-thought prompting elicits reasoning in large language models. In NeurIPS.
- Sarah Wiegreffe and Ana Marasovic. 2021. Teach me to explain: A review of datasets for explainable natural language processing. In NeurIPS.
- lilGym: Natural language visual reasoning with reinforcement learning. In ACL.
- WorldTree v2: A corpus of science-domain structured explanations and inference patterns supporting multi-hop inference. In LREC.
- Are large language models really good logical reasoners? a comprehensive evaluation from deductive, inductive and abductive views. arXiv preprint arXiv:2306.09841.
- Generating natural language proofs with verifier-guided search. In EMNLP.
- Tree of Thoughts: Deliberate problem solving with large language models.
- React: Synergizing reasoning and acting in language models. In ICLR.
- Nature language reasoning, a survey. arXiv preprint arXiv:2303.14725.
- AR-LSAT: Investigating analytical reasoning of text. arXiv preprint arXiv:2104.06598.
- Facilitating multi-turn emotional support conversation with positive emotion elicitation: A reinforcement learning approach. In ACL.