Simplifying LLM Agents for Complex Tasks
Recent work by Jiang et al. investigates the architectural complexity of LLM agents for automating real-world tasks, focusing on challenging benchmarks such as SWE-bench. The overarching goal of this research is to assess whether long-context LLMs (LCLMs) offer a viable simplification of agent architectures, removing the need for intricate scaffolding such as multi-step retrieval tools and multi-agent setups, by embedding the entire task environment in the model's context and leveraging prompting strategies.
Research Context and Motivation
LLM agents have increasingly demonstrated the capacity to tackle multifaceted real-world scenarios autonomously. This has naturally led to advanced and complex architectures that integrate components tailored to specific applications, such as LMs invoking APIs for software engineering tasks or orchestrating scientific experiments. These systems traditionally operate under the assumption of partial observability: the agent interacts with the environment iteratively to gather the information needed to build a more complete picture of it.
However, in scenarios where the environment is fully observable, or all relevant information is accessible from the outset, the necessity of such complex scaffolding can be questioned. SWE-bench, a benchmark for repository-level software engineering, is a prototypical scenario of this kind: the full repository is available from the start, which potentially negates the need for traditional agent scaffolding. Jiang et al. propose that leveraging the contextual capabilities of LCLMs could eliminate these intricate scaffoldings and tools, simplifying agent design significantly.
Methodology
The paper introduces two approaches to LM agent design: DirectSolve and SelectSolve. DirectSolve uses zero-shot prompting of an LCLM with the entire repository state embedded in the context, asking the model to analyze the issue and produce a solution in a single pass. The method benefits from prompting strategies such as chain-of-thought and code restatement, which aim to improve the reasoning and consistency of solutions without a multi-stage pipeline.
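A minimal sketch of this single-pass idea is shown below, assuming a generic `call_lclm` wrapper around a long-context model endpoint; the prompt wording, file-selection heuristic, and function names are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of the DirectSolve idea: place the whole repository in the prompt and
# ask an LCLM for a patch in one pass. `call_lclm` and the prompt text are
# illustrative assumptions, not the paper's implementation.
from pathlib import Path

def build_repo_context(repo_root: str, exts=(".py",)) -> str:
    """Concatenate source files into a single context string."""
    parts = []
    for path in sorted(Path(repo_root).rglob("*")):
        if path.is_file() and path.suffix in exts:
            parts.append(f"### FILE: {path}\n{path.read_text(errors='ignore')}")
    return "\n\n".join(parts)

def direct_solve(issue: str, repo_root: str, call_lclm) -> str:
    """Single-pass solve: chain of thought plus code restatement, then a patch."""
    prompt = (
        "You are given an entire repository and a bug report.\n"
        "1. Think step by step about the root cause (chain of thought).\n"
        "2. Restate the relevant code sections before editing them.\n"
        "3. Output a unified diff that fixes the issue.\n\n"
        f"ISSUE:\n{issue}\n\nREPOSITORY:\n{build_repo_context(repo_root)}"
    )
    return call_lclm(prompt)  # e.g. any long-context chat-completion endpoint
```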
SelectSolve, meanwhile, combines the strengths of LCLMs and short-context LLMs (SCLMs). An LCLM first localizes the problem over the full repository; an SCLM then carries out a more focused problem-solving phase over the selected high-relevance files, which fit within its smaller context window. A sketch of this two-stage pipeline follows below.
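The following sketch outlines the two-stage pipeline under the same assumptions, reusing `build_repo_context` from the previous snippet together with hypothetical `call_lclm` and `call_sclm` wrappers; the ranking prompt and `top_k` cutoff are illustrative choices, not the paper's exact settings.

```python
# Sketch of the SelectSolve pipeline: an LCLM ranks files by relevance over the
# full repository, then an SCLM produces the patch from only the top-ranked
# files. Function names and prompts are illustrative assumptions.
from pathlib import Path  # build_repo_context is defined in the previous sketch

def select_solve(issue: str, repo_root: str, call_lclm, call_sclm, top_k: int = 5) -> str:
    repo_context = build_repo_context(repo_root)

    # Stage 1: localization over the full repository with the LCLM.
    loc_prompt = (
        "Given the issue and the repository below, list the file paths most\n"
        "likely to require edits, one per line, most relevant first.\n\n"
        f"ISSUE:\n{issue}\n\nREPOSITORY:\n{repo_context}"
    )
    ranked = [ln.strip() for ln in call_lclm(loc_prompt).splitlines() if ln.strip()]
    selected_paths = [p for p in ranked[:top_k] if Path(p).is_file()]

    # Stage 2: focused patch generation with the SCLM on the selected files only.
    selected = "\n\n".join(
        f"### FILE: {p}\n{Path(p).read_text(errors='ignore')}" for p in selected_paths
    )
    fix_prompt = (
        "Fix the issue below by editing only the provided files.\n"
        "Output a unified diff.\n\n"
        f"ISSUE:\n{issue}\n\nSELECTED FILES:\n{selected}"
    )
    return call_sclm(fix_prompt)
```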
Significant Findings
The paper reports several key findings:
- DirectSolve with LCLMs can outperform traditional agent scaffolding (by up to 6% pass@1 on SWE-bench Verified), hinting at the promising capabilities of LCLMs when properly prompted.
- SelectSolve is competitive and improves on DirectSolve's results, especially when the SCLM is a capable model such as Claude-3.7. This suggests a valuable synergy between the LCLM's comprehensive context assimilation and the SCLM's focused problem solving.
- Importantly, approaches that rely heavily on specialized scaffolding transfer poorly to models other than the ones they were tuned for, which highlights the need for adaptability in agent design, a problem the proposed methods mitigate.
Implications and Future Directions
This work has significant implications for simplifying agent design without compromising performance on tasks whose environment is fully observable from the start. As LCLMs advance toward even longer context windows, they may displace approaches that currently require costly interactive exploration or external retrieval machinery.
However, inference cost and scalability remain the primary open questions. With ongoing reductions in LM inference costs and improvements in context-processing efficiency, the idea of building monolithic LCLM-based agents could gradually shift paradigms in AI-driven task automation.
Furthermore, broader applications beyond SWE-bench could benefit from the proposed paradigm shift. Tasks in domains such as complex query answering or scientific analysis, which traditionally require intricate system designs to handle partial observability, could adopt streamlined approaches built on substantially improved LCLMs. This paper lays the groundwork for such transitions, marking a meaningful step away from scaffold-defined environments toward capability-focused LM applications.