Solving Zebra Puzzles Using Constraint-Guided Multi-Agent Systems
The paper "Solving Zebra Puzzles Using Constraint-Guided Multi-Agent Systems" introduces an approach to enhance the problem-solving capabilities of LLMs on complex logic puzzles, particularly Zebra puzzles. This approach, termed ZPS (Zebra Puzzle Solver), integrates LLMs with a theorem prover, reformulating puzzle solving as a constraint satisfaction problem. The proposed system tames the puzzle's complexity by iterating between logical inference and natural language interpretation, yielding substantial improvements in solution accuracy and completeness across different LLMs.
Problem Context
Zebra puzzles, commonly referred to as logic grid puzzles, present a series of natural language clues from which solvers must deduce the correct associations between multiple entities and their attributes. These puzzles demand advanced reasoning to resolve both implicit and explicit constraints. Mapping natural language clues into a structured logical space poses significant obstacles, owing to the combined difficulty of language interpretation and computational reasoning. Previous efforts have relied on methods such as symbolic representation and human-in-the-loop systems, yet these approaches remain insufficient for large-scale puzzle-solving automation.
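To make the constraint-satisfaction framing concrete, a miniature zebra-style puzzle can be solved by exhaustive search over attribute permutations. The houses, colors, pets, and clues below are invented for illustration and are not drawn from the paper's benchmark:

```python
from itertools import permutations

# Toy puzzle: three houses in a row, each with a unique color and pet.
solutions = []
for colors in permutations(["red", "green", "blue"]):
    for pets in permutations(["dog", "cat", "zebra"]):
        # Clue 1: the red house is immediately left of the green house.
        c1 = any(colors[i] == "red" and colors[i + 1] == "green"
                 for i in range(2))
        # Clue 2: the dog lives in the blue house (and only there).
        c2 = all((colors[i] == "blue") == (pets[i] == "dog")
                 for i in range(3))
        # Clue 3: the zebra lives in the middle house.
        c3 = pets[1] == "zebra"
        # Clue 4: the cat does not live in the leftmost house.
        c4 = pets[0] != "cat"
        if c1 and c2 and c3 and c4:
            solutions.append((colors, pets))

print(solutions)  # exactly one assignment satisfies all four clues
```

Brute force works here only because the grid is tiny; real Zebra puzzles have far larger search spaces, which is precisely why the paper delegates the search to an SMT solver rather than enumerating assignments.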
Methodology
ZPS employs a multi-agent framework that utilizes LLMs and a formal constraint solver to enhance the logical reasoning capabilities required for solving Zebra puzzles. The methodology is structured around several key components:
- Decomposition: The initial step involves breaking down the puzzle into manageable sub-problems via an LLM agent. This facilitates the translation of complex natural language clues into simplified logical formulations.
- Agent Feedback Loop: The core innovation lies in an iterative feedback loop. This loop begins with an LLM agent translating the decomposed puzzle elements into SMT-LIB (Satisfiability Modulo Theories Library) format. The SMT solver, specifically Z3, evaluates these logical constraints to identify feasible assignments. The solver's feedback—particularly regarding any syntactic or semantic errors—drives refinements to the SMT translations, enhancing overall completeness and accuracy.
- Solver Integration: The integration of the theorem prover with LLMs enables the exchange of syntactic and error-based feedback to continually refine puzzle-solving strategies. This tight coupling significantly mitigates the limitations of purely language-based systems by introducing rigorous formal verification.
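The agent feedback loop described above can be sketched as a retry loop in which solver diagnostics are fed back into the translation step. In this sketch both the LLM agent and the Z3 call are stubbed out—`llm_translate` and `smt_check` are hypothetical stand-ins, not the paper's actual interfaces:

```python
def llm_translate(clues, feedback=None):
    """Stub for the LLM translation agent. On the first attempt it emits a
    malformed SMT-LIB-like encoding; given solver feedback, it repairs it.
    A real system would prompt an LLM with the clues and the feedback."""
    if feedback is None:
        return "(assert (= red_house green_house"  # unbalanced parentheses
    return "(assert (= red_house 1)) (assert (= green_house 2))"

def smt_check(encoding):
    """Stub for the SMT solver call: returns (ok, feedback). A real system
    would invoke Z3 and relay its error messages or model."""
    if encoding.count("(") != encoding.count(")"):
        return False, "syntax error: unbalanced parentheses"
    return True, None

def solve_with_feedback(clues, max_retries=3):
    """Iterate translate -> check, routing solver feedback to the agent."""
    feedback = None
    for attempt in range(max_retries):
        encoding = llm_translate(clues, feedback)
        ok, feedback = smt_check(encoding)
        if ok:
            return encoding, attempt + 1
    raise RuntimeError("no valid encoding within retry budget")

encoding, attempts = solve_with_feedback(
    ["the red house is left of the green house"])
print(attempts)  # the stub succeeds on the second attempt
```

The key design point is that the solver's error message becomes part of the agent's next prompt, so each retry is informed rather than blind—this is what distinguishes the loop from simple resampling.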
Results and Analysis
The experiments conducted on 114 Zebra puzzles across three different LLMs—GPT-4, GPT-3.5, and Llama3-8b—demonstrate notable improvements with the ZPS approach. In particular, GPT-4 showed up to a 166% improvement in the number of fully correct solutions when solver feedback was incorporated. The addition of a decomposition agent further enhanced performance. ZPS outperformed the baselines, particularly when SMT feedback was included, as evidenced by increases in both average partial scores and the number of fully correct solutions.
The autograder, another integral component introduced in the paper, was validated through a user study to ensure its reliability in grading puzzle solutions. Its high degree of correlation with human graders confirms the robustness of the autograder in assessing logical assignments within the current framework.
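Since the paper reports average partial scores alongside fully correct solutions, a cell-wise scoring metric is the natural fit. The sketch below is an illustrative formulation—the paper's exact autograder formula is not reproduced here, and the grid representation is an assumption:

```python
def partial_score(predicted, gold):
    """Fraction of (entity, attribute) cells in the solution grid that
    match the gold assignment. Illustrative only: not necessarily the
    paper's exact autograder metric."""
    cells = [(e, a) for e in gold for a in gold[e]]
    correct = sum(
        predicted.get(e, {}).get(a) == gold[e][a] for e, a in cells
    )
    return correct / len(cells)

# Hypothetical two-house example grid.
gold = {
    "house1": {"color": "blue", "pet": "dog"},
    "house2": {"color": "red", "pet": "zebra"},
}
pred = {
    "house1": {"color": "blue", "pet": "cat"},   # pet wrong
    "house2": {"color": "red", "pet": "zebra"},  # both correct
}
print(partial_score(pred, gold))  # 3 of 4 cells match -> 0.75
```

A fully correct solution scores 1.0 under this metric, so "number of fully correct solutions" and "average partial score" measure strict and graded success on the same grid.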
Implications
The ZPS framework introduces a powerful paradigm for integrating LLMs with formal methods, paving the way for more reliable AI systems capable of nuanced logical reasoning. The implications extend beyond solving Zebra puzzles to broader applications that necessitate precise logical deduction from natural language inputs. The research suggests potential for further optimization through enhanced retry mechanisms and expanded context windows, enabling more sophisticated interaction and problem-solving capabilities. Moreover, the proven efficacy of multi-agent frameworks in this context suggests promising avenues for future AI research, particularly in enhancing reasoning and interpretative capacities of LLMs.
Conclusion
This paper makes significant strides in addressing the inherent challenges of puzzle-solving with LLMs by leveraging a constraint-guided multi-agent approach. By effectively integrating formal logic with language-based understanding, the authors demonstrate a compelling solution to the complex task of solving Zebra puzzles, highlighting the potential of agent-based systems and formal reasoning in AI. Future research could extend these methods to more diverse applications and further refine the interaction between LLMs and formal logical frameworks, broadening the utility of AI in complex reasoning tasks.