
ReAct: Synergizing Reasoning and Acting in Language Models (2210.03629v3)

Published 6 Oct 2022 in cs.CL, cs.AI, and cs.LG

Abstract: While LLMs have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples. Project site with code: https://react-lm.github.io

The ReAct Framework: Interleaving Reasoning and Action in LLMs

The ReAct framework proposes a method for enabling LLMs to solve complex tasks by synergistically combining reasoning and acting. Instead of treating reasoning (e.g., chain-of-thought) and acting (e.g., action plan generation) as separate capabilities, ReAct structures the LLM's operation as an interleaved sequence of thought, action, and observation steps. This approach allows the model to dynamically reason about the task, formulate actions to interact with external environments or knowledge sources, and incorporate observations from these interactions to refine its reasoning and subsequent actions. The core idea is that reasoning benefits from grounding in external information obtained via actions, while actions become more targeted and effective when guided by explicit reasoning steps.

Methodology: Thought, Action, Observation Cycle

The ReAct approach operationalizes this synergy through a specific prompting strategy. The LLM is prompted with few-shot examples demonstrating the desired interleaved pattern of thought, action, and observation. For a given task instance, the LLM iteratively generates:

  1. Thought (t_i): A natural language reasoning trace outlining the current understanding of the task, the strategy for the next step, or analysis of previous outcomes. This internal monologue helps the model decompose the problem, track progress, update plans, and handle unexpected situations.
  2. Action (a_i): A specific action intended to interact with an external source, formatted according to a predefined action space relevant to the task. Actions might include searching a knowledge base, querying an API, or interacting with a simulated environment.
  3. Observation (o_i): The feedback received from the external source after executing action a_i. This could be a snippet of text from a Wikipedia page, a result from a calculation, or the state change description from an environment simulator.

This cycle (t_i, a_i, o_i) repeats, with the context for generating the next thought (t_{i+1}) comprising the initial prompt, the task input, and the history of preceding thought-action-observation triplets. The process terminates when an action indicates the final answer is reached or a stopping criterion is met.
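
As a minimal sketch of this accumulation, the loop state can be represented as a list of step triplets that is re-serialized into the prompt on every turn. The ReActStep and build_context names below are illustrative, not from the paper:

from dataclasses import dataclass

@dataclass
class ReActStep:
    thought: str      # t_i: free-form reasoning trace
    action: str       # a_i: e.g. "search[Apple Remote]"
    observation: str  # o_i: feedback returned by the tool or environment

def build_context(prompt_examples, task, steps):
    # Context for generating t_{i+1}: few-shot examples, the task input,
    # and every preceding (t_i, a_i, o_i) triplet in order.
    lines = [prompt_examples, f"Task: {task}"]
    for i, s in enumerate(steps, start=1):
        lines += [f"Thought {i}: {s.thought}",
                  f"Action {i}: {s.action}",
                  f"Observation {i}: {s.observation}"]
    return "\n".join(lines)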

The action space is task-dependent. For knowledge-intensive tasks like question answering (HotpotQA) and fact verification (Fever), the action space typically includes:

  • search[entity]: Queries an external knowledge source (e.g., Wikipedia API) for information about a specific entity.
  • lookup[string]: Looks for a specific string within a retrieved document, useful for finding keywords or sentences related to the reasoning process.
  • finish[answer]: Concludes the process and outputs the final answer.

For interactive decision-making tasks like ALFWorld (text-based game simulation) and WebShop (simulated online shopping), the action space corresponds to the admissible commands within the respective environments (e.g., go to, open, click, search).
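
In the knowledge-intensive setting, search and lookup reduce to thin tool wrappers. The sketch below is a hedged approximation: it queries the public MediaWiki search endpoint, which is not the paper's exact Wikipedia API, and it mimics the paper's stateful lookup (which operates over the last retrieved passage) with crude module-level state; finish needs no wrapper since the control loop handles it directly:

import requests

# Crude module-level state: lookup operates over the last retrieved passage.
_last_passage = ""

def search(entity):
    # Query the public MediaWiki search API and return the top snippet.
    # (Illustrative approximation of the paper's simple Wikipedia API;
    # snippets contain HTML highlight markup a real wrapper would strip.)
    global _last_passage
    resp = requests.get("https://en.wikipedia.org/w/api.php",
                        params={"action": "query", "list": "search",
                                "srsearch": entity, "format": "json"})
    hits = resp.json()["query"]["search"]
    _last_passage = hits[0]["snippet"] if hits else ""
    return _last_passage or f"No results found for '{entity}'."

def lookup(string):
    # Return the first sentence of the last retrieved passage containing `string`.
    # (Simplified: the paper's lookup steps through successive matches.)
    matches = [s for s in _last_passage.split(". ") if string.lower() in s.lower()]
    return matches[0] if matches else f"'{string}' not found in the last passage."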

The prompting relies on few-shot learning, where 1 to 6 examples of successful ReAct trajectories for the specific task are included in the prompt given to the LLM (e.g., PaLM-540B). These examples guide the model to produce the desired interleaved structure and task-specific reasoning patterns.
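
For illustration, an abridged exemplar in this trajectory format is shown below, paraphrased from the paper's running HotpotQA example (the released prompts differ in detail; labels here match the control loop sketched later in this summary):

FEWSHOT_EXAMPLE = """Task: Aside from the Apple Remote, what other device can control the program Apple Remote was originally designed to interact with?
Thought 1: I need to search Apple Remote and find the program it was originally designed to interact with.
Action 1: search[Apple Remote]
Observation 1: The Apple Remote is a remote control ... originally designed to control the Front Row media center program ...
Thought 2: Apple Remote was originally designed to control Front Row. I need to search Front Row and find what other device can control it.
Action 2: search[Front Row (software)]
Observation 2: Front Row is a discontinued media center software ... controlled by an Apple Remote or the keyboard function keys.
Thought 3: So the other device that can control Front Row is the keyboard function keys.
Action 3: finish[keyboard function keys]
"""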

Implementation and Deployment Considerations

Implementing ReAct involves setting up an orchestration loop that interacts with the LLM and the external tools.

  1. LLM Interface: Requires API access to a sufficiently capable LLM that can follow the structured prompting format and generate coherent thoughts and valid actions based on the provided context history.
  2. Tool Integration: Interfaces need to be built for each action type. For Wikipedia-based tasks, this involves a simple API wrapper to search and retrieve page snippets. For ALFWorld and WebShop, it requires interfacing with their respective simulation engines to execute actions and receive state observations.
  3. Prompt Engineering: Crafting effective few-shot prompts is crucial. The examples must clearly demonstrate the desired reasoning process, the correct action formatting, and how observations influence subsequent thoughts and actions.
  4. Parsing and State Management: The control loop must parse the LLM's output to distinguish thoughts from actions, validate actions against the allowed action space, execute valid actions using the appropriate tool, and format the resulting observation before appending the triplet (t_i, a_i, o_i) to the context for the next LLM call.
  5. Error Handling: The system needs to handle potential LLM errors (e.g., generating invalid actions, hallucinating within thoughts) and tool errors (e.g., API failures, environment exceptions). The reasoning capability of ReAct can be leveraged here, allowing the model to potentially recognize and recover from errors based on observations.

Computational requirements depend on the chosen LLM and the complexity/length of the tasks. Each step involves an LLM inference call, and interactions with external tools add latency. The length of the context grows with each turn, potentially hitting context window limits for very long tasks.
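
The control-loop sketch below (from this summary, not the authors' released code) ties these pieces together; the parsing helpers assume the Thought/Action line format demonstrated in the few-shot examples: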

import re

def react_solve(task_description, prompt_examples, LLM, tools):
    """
    Executes the ReAct loop for a given task.

    Args:
        task_description (str): The input query or task definition.
        prompt_examples (str): Few-shot examples demonstrating ReAct trajectories.
        LLM (LLM_Interface): Interface to the LLM.
        tools (dict): Dictionary mapping action types to tool execution functions.
                      e.g., {'search': wikipedia_search, 'lookup': lookup_string, ...}

    Returns:
        str: The final answer or result.
    """
    context = prompt_examples + "\n\nTask: " + task_description + "\n"
    max_steps = 10 # Example limit
    
    for i in range(max_steps):
        # 1. Generate Thought and Action
        response = LLM.generate(context) 
        
        # Simple parsing (actual implementation needs robustness)
        thought = parse_thought(response) 
        action_str = parse_action(response)
        action_type, action_arg = parse_action_details(action_str)

        context += f"Thought {i+1}: {thought}\n"
        context += f"Action {i+1}: {action_str}\n"
        
        print(f"Step {i+1}:")
        print(f"  Thought: {thought}")
        print(f"  Action: {action_str}")

        # 2. Execute Action and Get Observation
        if action_type == "finish":
            final_answer = action_arg
            print(f"  Observation: Reached Final Answer.")
            return final_answer
        
        if action_type in tools:
            try:
                observation = tools[action_type](action_arg)
            except Exception as e:
                observation = f"Error executing action: {e}"
        else:
            observation = f"Error: Unknown action type '{action_type}'."

        context += f"Observation {i+1}: {observation}\n"
        print(f"  Observation: {observation}\n")

    return "Max steps reached without finishing."

def parse_thought(response):
    # Extract the text after "Thought" (optionally numbered), e.g. "Thought 2: ..."
    match = re.search(r"Thought(?:\s*\d+)?:\s*(.*)", response)
    return match.group(1).strip() if match else response.strip()

def parse_action(response):
    # Extract the text after "Action" (optionally numbered), e.g. "Action 2: search[X]"
    match = re.search(r"Action(?:\s*\d+)?:\s*(.*)", response)
    return match.group(1).strip() if match else ""

def parse_action_details(action_str):
    # Parse an action string into type and argument,
    # e.g. "search[Python]" -> ("search", "Python")
    match = re.match(r"\s*(\w+)\[(.*)\]\s*$", action_str)
    return (match.group(1).lower(), match.group(2)) if match else ("", action_str)
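
A minimal wiring test, assuming the tool wrappers and FEWSHOT_EXAMPLE string sketched above; the EchoLLM stub stands in for a real model client exposing a generate method:

class EchoLLM:
    # Stand-in client for wiring tests; a real deployment would call a model API here.
    def generate(self, context):
        return "Thought 1: This is a canned trace.\nAction 1: finish[stub answer]"

tools = {"search": search, "lookup": lookup}
print(react_solve("Aside from the Apple Remote, what other device can control "
                  "the program Apple Remote was originally designed to interact with?",
                  FEWSHOT_EXAMPLE, EchoLLM(), tools))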

Experimental Results and Analysis

ReAct was evaluated against several baselines across different task types using PaLM-540B.

  • Knowledge-Intensive Tasks (HotpotQA, Fever):
    • Baselines included standard few-shot prompting (Standard), Chain-of-Thought prompting (CoT), and an Acting-only variant (Act).
    • On HotpotQA, ReAct achieved a score of 71.2, significantly outperforming CoT (56.8) and Act (51.1). It demonstrated a better ability to retrieve supporting facts through search actions and decompose the question via reasoning, mitigating hallucination issues observed in CoT.
    • On Fever (fact verification), ReAct achieved an accuracy of 87.3, compared to 83.0 for CoT and 85.5 for Act. ReAct trajectories showed explicit steps of searching for evidence related to the claim and then reasoning about its veracity based on the retrieved information. Qualitative analysis highlighted ReAct's ability to recover from initial incorrect searches by reasoning about the lack of relevant information in the observation and formulating a new search query.
  • Interactive Decision-Making Tasks (ALFWorld, WebShop):
    • Baselines included Act (an acting-only LLM) and, for ALFWorld, domain-specific imitation learning (IL, via behavior cloning) and reinforcement learning (RL) methods such as BUTLER.
    • On ALFWorld (commonsense reasoning in simulated household environments), ReAct achieved a success rate of 71%, a substantial improvement over Act (37%) and the prior state-of-the-art IL/RL methods (BUTLER: 37%). ReAct needed only 2 in-context examples compared to the large expert datasets required by IL/RL.
    • On WebShop (goal-oriented web navigation and shopping), ReAct achieved a success rate of 32.0 (averaged over 500 items), compared to 22.0 for Act and 18.8 for a specialized IL method (using HTML inputs). ReAct demonstrated more effective planning and adaptation within the complex state space of the simulated web environment.

Across tasks, ReAct consistently outperformed both reasoning-only (CoT) and acting-only (Act) baselines, supporting the central hypothesis that synergizing the two leads to improved performance. The generated trajectories were also found to be more interpretable, as the thought steps provided explicit insights into the model's reasoning process, making it easier to diagnose failures and understand successes. ReAct effectively uses actions to ground reasoning and mitigate hallucination by fetching external information, while using thoughts to maintain and adapt high-level plans during potentially long action sequences.

Synergy Dynamics

The effectiveness of ReAct stems from the bidirectional benefits between reasoning and acting:

  • Reasoning Enhances Acting:
    • High-level planning: Thoughts allow the model to decompose complex goals into sequences of simpler actions.
    • Strategic exploration: Reasoning helps decide which actions are most promising for information gain or goal progression.
    • Error detection/Correction: Thoughts can identify when an action failed or yielded unexpected results (based on observation), prompting corrective actions or plan adjustments (e.g., "The search for X didn't work, let me try searching for Y instead").
    • Maintaining context: For long trajectories, thoughts help track the overall goal and progress made so far.
  • Acting Enhances Reasoning:
    • Grounding: Actions fetch real-time, external information, preventing the model from relying solely on its potentially outdated or incorrect internal knowledge (mitigating hallucination).
    • Information gathering: Actions provide specific, targeted information needed to answer questions or verify facts, which may not be present in the initial context.
    • Exploring consequences: In interactive environments, actions reveal the results of decisions, allowing the model to reason about cause and effect within the environment dynamics.

Limitations

The paper acknowledges several limitations:

  • Increased Steps/Tokens: ReAct trajectories are often longer and involve more LLM calls and token processing compared to CoT or standard prompting due to the interleaved structure and interactions.
  • Prompt Sensitivity: Performance relies heavily on the quality and relevance of the few-shot examples provided in the prompt.
  • Action Space Design: Defining an appropriate and effective action space is crucial and task-dependent.
  • Potential for Hallucination in Thoughts: While acting mitigates hallucination regarding external facts, the reasoning steps (thoughts) themselves can still contain logical fallacies or internal hallucinations, potentially leading actions astray.

Conclusion

ReAct presents a compelling framework for enhancing LLM capabilities by explicitly interleaving reasoning traces and actions that interact with external sources. Its demonstrated performance improvements on diverse knowledge-intensive and decision-making tasks highlight the benefits of this synergy. By generating interpretable thought-action-observation trajectories, ReAct allows LLMs to dynamically plan, gather information, and adapt to task requirements, overcoming limitations associated with purely reasoning-based or action-based approaches and offering a promising direction for building more capable and trustworthy autonomous agents.

Authors (7)
  1. Shunyu Yao
  2. Jeffrey Zhao
  3. Dian Yu
  4. Nan Du
  5. Izhak Shafran
  6. Karthik Narasimhan
  7. Yuan Cao
Citations (1,812)