Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?
Abstract: Does a lexical retriever suffice as LLMs become more capable in an agentic loop? This question naturally arises when building deep research systems. We revisit it by pairing BM25 with frontier LLMs that have better reasoning and tool-use abilities. To support researchers asking the same question, we introduce Pi-Serini, a search agent equipped with three tools for retrieving, browsing, and reading documents. Our results show that, on BrowseComp-Plus, a well-configured lexical retriever with sufficient retrieval depth can support effective deep research when paired with more capable LLMs. Specifically, Pi-Serini with gpt-5.5 achieves 83.1% answer accuracy and 94.7% surfaced evidence recall, outperforming released search agents that use dense retrievers. Controlled ablations further show that BM25 tuning improves answer accuracy by 18.0% and surfaced evidence recall by 11.1% over the default BM25 setting, while increasing retrieval depth further improves surfaced evidence recall by 25.3% over the shallow-retrieval setting. Source code is available at https://github.com/justram/pi-serini.
Explain it Like I'm 14
What is this paper about?
This paper asks a simple but important question: When today's AI chatbots get better at thinking and using tools, do we still need fancy, complicated search systems, or can a good "keyword search" be enough? To explore this, the authors build a search helper called Pi-Serini and test whether a classic keyword-based method (BM25) can support deep, step-by-step research when paired with strong AI models.
What questions were they trying to answer?
- If an AI can search, read, and think in steps, is a simple keyword search (called "lexical retrieval") good enough to find the right information?
- Do we need expensive, meaning-based search systems ("dense retrievers"), or can we tune and use keyword search correctly to get similar or better results?
- How can we make deep research both accurate and cost-effective (not too expensive or slow)?
How did they test it?
Pi-Serini: a simple search helper for AIs
Think of Pi-Serini like a smart research assistant that gives an AI three clear tools, so it acts more like a careful detective:
- search: run a keyword search and save the ranked list of results for later.
- read_search_results: browse the saved list in pages, looking at short snippets to decide what's promising.
- read_document: open a specific document and read only the parts that seem relevant.
This setup separates three steps (finding, skimming, and reading) so the AI doesn't stuff everything into its short memory at once. It helps the AI choose what to read and avoid wasting space and time.
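The find, skim, and read split described above can be illustrated with a toy controller. This is a minimal sketch, not Pi-Serini's actual code: the class name `RetrievalController`, the method signatures, the default page sizes, and the term-count scorer (standing in for BM25) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float


class RetrievalController:
    """Toy three-tool interface (hypothetical; a real system would query a BM25 index)."""

    def __init__(self, corpus: dict[str, str]):
        self.corpus = corpus
        self.rankings: dict[int, list[Hit]] = {}  # search_id -> cached ranked list

    def search(self, query: str, k: int = 1000) -> int:
        """Rank documents, cache the top-k list, and return a handle to it."""
        terms = query.lower().split()
        hits = []
        for doc_id, text in self.corpus.items():
            words = text.lower().split()
            # Placeholder scorer: raw count of query-term occurrences.
            score = sum(words.count(t) for t in terms)
            if score > 0:
                hits.append(Hit(doc_id, float(score)))
        hits.sort(key=lambda h: (-h.score, h.doc_id))
        search_id = len(self.rankings)
        self.rankings[search_id] = hits[:k]
        return search_id

    def read_search_results(self, search_id: int, page: int = 0,
                            page_size: int = 2, snippet_chars: int = 40):
        """Browse one page of a cached ranking as (doc_id, snippet) pairs."""
        ranking = self.rankings[search_id]
        start = page * page_size
        return [(h.doc_id, self.corpus[h.doc_id][:snippet_chars])
                for h in ranking[start:start + page_size]]

    def read_document(self, doc_id: str, start_line: int = 0,
                      num_lines: int = 2) -> list[str]:
        """Read a window of lines from one document instead of the full text."""
        lines = self.corpus[doc_id].splitlines()
        return lines[start_line:start_line + num_lines]
```

The point of the design is that `search` returns only a handle: the ranked list stays outside the model's context, and the agent pays for snippets and document lines only when it asks for them.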
Time budgeting
The AI has a fixed time per question (for example, 5 minutes). If 70% of the time passes, it's told to stop searching and write the best answer it can with the evidence it has. This keeps the system realistic and affordable.
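A rough sketch of this time-budget policy follows. It assumes a hypothetical `step(steered)` callable that runs one agent turn and returns an answer string or `None` to keep going; the function name, the return labels, and the injectable `clock` are illustrative choices, not details from the paper.

```python
import time


def run_with_time_budget(step, budget_s=300.0, steer_fraction=0.7,
                         clock=time.monotonic):
    """Run agent turns until an answer appears or the wall-clock budget is spent.

    Once `steer_fraction` of the budget has elapsed, each turn is called with
    steered=True, signalling "stop using tools and answer now".
    """
    start = clock()
    while True:
        elapsed = clock() - start
        if elapsed >= budget_s:
            return None, "timeout"  # hard stop: no answer produced in time
        steered = elapsed >= steer_fraction * budget_s
        answer = step(steered)
        if answer is not None:
            return answer, ("steered" if steered else "normal")
```

Passing the clock as a parameter makes the policy testable without waiting: a fake clock can advance by a fixed amount per turn to confirm that the steer fires before the hard timeout.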
The test set and measurements
They use a standard deep research benchmark called BrowseComp-Plus. It has hundreds of questions and a large collection of long documents. The AI must find the right evidence and produce the correct final answer.
They measure:
- Accuracy: How often the final answer is judged correct.
- Recall: How often the system surfaces the needed evidence in search results and how often the AI actually previews or reads it.
- Cost: How much money it takes to run the full test.
- Tool usage: How many tool calls the AI makes (searches, browses, reads).
They also run "ablation" tests. That means they change one thing at a time (like how many results to fetch, or how to tune BM25) to see what really matters.
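The recall measurements listed above can be sketched for a single question as set overlaps. The function names and the choice to return 0.0 when no evidence is required are assumptions; the tier definitions follow the page's glossary (surfaced by search, previewed while browsing, and "behavior" as the union of opened and cited documents). How the paper aggregates across questions is not shown here.

```python
def recall(found: set[str], needed: set[str]) -> float:
    """Fraction of the needed evidence documents that appear in `found`."""
    return len(found & needed) / len(needed) if needed else 0.0


def evidence_recalls(needed, surfaced, previewed, opened, cited):
    """Per-question recall at each stage of the find / skim / read pipeline."""
    return {
        "surfaced": recall(surfaced, needed),       # returned by the search tool
        "previewed": recall(previewed, needed),     # excerpt shown while browsing
        "behavior": recall(opened | cited, needed), # opened or cited by the agent
    }
```

For example, if four documents are needed, three are surfaced, two are previewed, and two are opened or cited, the result is 0.75, 0.5, and 0.5. A large drop from surfaced to previewed would indicate that the evidence was retrieved but the agent never looked at it.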
What did they find?
- A tuned keyword search worked very well when paired with a strong AI.
- With Pi-Serini and a strong model (GPT-5.5), the system reached about 83% accuracy and about 95% evidence recall. That means it answered correctly most of the time and usually surfaced the right documents.
- It outperformed systems that used fancy meaning-based search ("dense retrievers") in this benchmark.
- Tuning BM25 mattered a lot.
- Changing BM25's settings to better handle long documents boosted accuracy by about 18 percentage points and improved evidence recall by about 11 points compared to the default setup.
- Looking deeper into the search results helped recall.
- Increasing how many ranked results the system retrieved improved surfaced evidence recall by about 25 points compared to shallow search.
- Cost was much lower.
- Pi-Serini made full benchmark runs about 3 to 10 times cheaper than some dense-retriever agents, thanks to:
- Smart time budgeting (don't over-search).
- "Prefix caching" (reusing repeated text to cut token costs).
- Better search that avoids wasted steps.
- Different AI models behave differently.
- Even strong models can chase the wrong lead. GPT-5.5 was better at backing out when a hypothesis looked weak, while another model, Claude Opus, sometimes kept digging into a wrong guess. The better behavior led to better accuracy.
Why is this important?
- It challenges the idea that we must always improve complex search components first. When the AI is good at reasoning and tool use, a well-configured keyword search can be enough for deep research in many cases.
- It makes deep research more practical. Good performance at much lower cost means researchers can run more tests, try more ideas, and improve systems faster.
- It shows where to focus next: teaching the AI to pick, browse, and read the right evidence from a large, saved list is now a key step.
What does this mean for the future?
- Don't overlook simple tools. Classic keyword search (BM25), when tuned and used deeply, can be powerful with modern AIs.
- Build better tool interfaces. Separating "find, skim, read" helps the AI handle long documents and limited memory.
- Spend wisely. Time budgets and caching keep costs low. This makes large experiments and real-world use more affordable.
- Keep improving AI behavior. The AI still needs guidance to avoid "chasing the wrong clue." Better strategies for browsing and verifying evidence will push accuracy even higher.
Key terms explained
- LLM: A powerful AI that can read, write, and reason with text.
- Agentic loop: The AI works in steps (think, use a tool, look at the result, think again), like a detective gathering clues.
- Retriever: A search system that finds relevant documents.
- Lexical retrieval: Finds documents using keywords (matching the exact words).
- Dense retriever: Finds documents using meaning (embeddings that capture similarity in meaning).
- BM25: A popular keyword search method. It scores documents by how well they match the query, adjusting for document length and repeated words.
- Retrieval depth: How far down the ranked list you look (top-5 vs. top-1000).
- Recall: Of all the documents you needed, how many did you find or preview?
- Prefix caching: Reusing repeated parts of prompts or context so you pay less for tokens sent to the AI.
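To make the BM25 entry above concrete, here is a small sketch of one common textbook form of the scoring function. The whitespace tokenizer, the function name, and the Lucene-style IDF are assumptions, and real toolkits differ in details. The parameter `k1` controls how quickly repeated occurrences of a term stop adding score, and `b` controls how strongly long documents are penalized (0 means no length adjustment, 1 means full adjustment). The sketch assumes a non-empty document list.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=0.9, b=0.4):
    """Score each document against the query with a textbook BM25 variant."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Length normalization: equals 1.0 for an average-length document.
        norm = 1 - b + b * len(tokens) / avgdl
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores
```

With `b=0`, two documents that mention the query term once get the same score regardless of length; with `b=1`, the shorter one ranks higher. This is the kind of behavior that tuning changes when the corpus consists of very long documents.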
Knowledge Gaps
Below is a consolidated list of concrete knowledge gaps, limitations, and open questions that remain unresolved and could guide follow-up research:
- External validity beyond BrowseComp-Plus: Does Pi-Serini's BM25-centric approach generalize to open-web, dynamic, or much larger corpora (10^6 to 10^9 docs), where retrieval depth, latency, and index size become more constraining?
- Domain and language generalization: How do results transfer to multilingual, low-resource, and domain-specific settings (e.g., biomedical, legal) where lexical mismatch is more severe?
- Scalability of "depth-1000" at web scale: What retrieval depths are feasible on very large indexes without prohibitive latency/cost, and how does surfaced/previewed/behavior recall degrade with scale?
- Fair interface-controlled comparisons: How do dense and hybrid retrievers perform when paired with the same Pi-Serini tool interface (search/read/browse, cached rankings, pagination) and time-budget policy, controlling for confounds present in released baselines?
- Hybrid retrieval and reranking: What are the gains from adding (1) dense first-stage, (2) sparse-dense hybrids (e.g., SPLADE/uniCOIL), or (3) cross-encoder rerankers atop tuned BM25 within the same agentic loop?
- Passage- vs document-level indexing: Does switching from whole-document to passage-level indexing improve previewed/behavior recall and reduce unnecessary reading calls on long, noisy documents?
- Query expansion strategies: How do classical expansions (e.g., RM3, Rocchio), pseudo-relevance feedback, or agent-learned term selection affect surfaced/previewed/behavior recall and final accuracy?
- Structured lexical querying: Would allowing limited Lucene syntax, fielded queries, phrase/proximity operators, or synonyms/lemmatization/stemming tuning materially improve retrieval for long documents?
- Reading granularity and chunking: Are line-based pagination and fixed defaults optimal? Would semantic chunking, section detection, or table/figure-aware readers increase behavior recall and reduce tool calls?
- Evidence navigation and ranking exploration: How can the large surfaced-previewed gap be narrowed (e.g., result clustering/diversification, learning-to-browse policies, "next-best-evidence" suggestions)?
- Adaptive time budgeting: Can per-query difficulty estimation and adaptive termination (rather than a fixed 0.7T steer) deliver better accuracy-latency-cost trade-offs?
- Confidence calibration and stopping: How should agents calibrate confidence to decide when to stop searching or invest additional retrieval/reading, and how does this interact with accuracy and calibration error?
- Judge reliability and metric robustness: How sensitive are accuracy labels to the chosen LLM judge? What is inter-judge agreement, and how do results change with alternative judging prompts/models or human adjudication?
- Variance and reproducibility: What is the run-to-run variability across random seeds/temperatures, and how stable are gains from BM25 tuning and retrieval depth across repeated trials?
- Cost portability and deployment constraints: How do results change under different pricing models (no prefix cache, on-prem inference), hardware, and latency SLAs typical of production search systems?
- Corpus drift and freshness: How robust is the approach under frequent index updates, temporal drift, and newly emerging entities, especially when lexical priors lag reality?
- Adversarial/noisy corpora: How resilient are agents to misleading content, noise, and near-duplicate saturation, and can retrieval or browsing policies be hardened accordingly?
- Attribution fidelity and citation quality: Beyond document-level recall, how often are citations faithful and minimal (no over-citation), and do cited spans actually support the claimed answer?
- Query-type stratification: For which question categories (multi-hop, fuzzy/semantic, entity disambiguation, list aggregation) does lexical retrieval remain sufficient vs where dense/hybrid methods are necessary?
- Learning the search policy: Can agents be trained (via RL or offline imitation) to issue better lexical queries, browse cached rankings more effectively, and adopt reversible probing to avoid premature branch commitment?
- BM25 tuning robustness: Do k1/b settings tuned on a 100-query subset overfit? How do the optimal parameters shift across domains, document-length distributions, or after index refreshes?
- Pagination defaults and UI affordances: How sensitive are previewed/behavior recalls to default page sizes, snippet lengths, and ordering; would UI-like affordances (e.g., "jump to sections", quick summaries) help?
- Interaction with longer context windows: How do larger windows (or retrieval-augmented summarization) change the optimal balance between retrieval depth, browsing, and reading?
- "To retrieve or not" under a unified framework: In a controlled setup, when does file-system-style local navigation outperform retrieval, and can a single agent switch between the two regimes based on diagnosable conditions?
- Ethical and environmental considerations: What are the carbon and energy implications of deeper retrieval, larger LLMs, and prefix caching, and can cost/energy-aware policies be learned without hurting accuracy?
Practical Applications
Immediate Applications
The following applications can be deployed now by adapting the paper's findings and Pi-Serini's design patterns to real systems.
- Enterprise RAG upgrade: high-recall evidence retrieval with cost control
- Sectors: software, enterprise search, legal, finance, healthcare
- What to do now:
- Replace "retrieve top-k and dump into context" with a three-tool flow: search(k≈1000) → read_search_results (paginate excerpts) → read_document (paginate lines)
- Tune BM25 for long documents (e.g., k1≈16 to 25, b≈1) and index all relevant corpora (Anserini)
- Add time-budget steering (e.g., 300s with submission steer at 0.7T) and enable provider prefix caching
- Tools/products/workflows: Retrieval Controller microservice; "Browse-then-Read" agent plugin for LangChain/LlamaIndex; BM25 index pack for long-doc corpora
- Assumptions/dependencies: Access to an LLM with strong tool-use (frontier class); provider supports prefix caching; corpora are indexable and permissions-cleared; BM25 parameters must be tuned per corpus/domain
- Cost-managed AI research assistants for analysts and journalists
- Sectors: media, consulting, finance, academia
- What to do now: Deploy Pi-Serini-style agents that operate under time budgets and cache-friendly loops to bound expense while surfacing more evidence before answering
- Tools/products/workflows: "Research Mode" in knowledge platforms; evidence preview pane with pagination; cost dashboard tied to time budgets and token cache hit-rate
- Assumptions/dependencies: Stable provider pricing and cache economics; acceptable latency with deeper retrieval; curated or well-scoped corpora
- Evidence provenance auditing and compliance logging
- Sectors: healthcare (literature reviews), legal (e-discovery), finance (compliance), public policy
- What to do now: Use the four-tier evidence log (surfaced/previewed/opened/cited) to audit how answers were formed and to support "show your work" requirements
- Tools/products/workflows: Evidence Log schema; auditor dashboards; per-answer citation bundles
- Assumptions/dependencies: Storage and governance for logs; human review processes; clear policy on citation sufficiency; sensitive-data handling
- Developer documentation and support search assistants
- Sectors: software/SaaS
- What to do now: Index long-form docs and tickets; adopt browse/read pagination to keep tokens low while increasing recall; add time-budget steering to cap support costs
- Tools/products/workflows: IDE/ChatOps bot with "Search→Preview→Open" flow; tuned BM25 index for developer docs; prefix-cache-aware prompts
- Assumptions/dependencies: Up-to-date doc indexing; content chunking strategy for line-based reads; reliable auth to private repos
- Library/education research helpers
- Sectors: education, libraries
- What to do now: Integrate a Pi-Serini-style agent into library portals to teach students "search, preview, then read" with explicit citations
- Tools/products/workflows: LMS/library plugin with rank browsing and reading excerpts; educator-facing analytics on evidence coverage
- Assumptions/dependencies: Licensed access to collections; alignment with academic integrity policies; oversight for answer quality
- Procurement and benchmarking for public-sector AI systems
- Sectors: policy, government IT
- What to do now: Use Pi-Serini's metrics (accuracy vs. surfaced/previewed/behavior recall, tool-call counts, cost) to evaluate vendors and set cost/quality SLAs
- Tools/products/workflows: Standardized test corpora; retrieval-depth and BM25-parameter compliance checklists; mandated time budgets
- Assumptions/dependencies: Representative benchmarks; availability of a neutral LLM judge or human adjudication; procurement frameworks that allow tool-level requirements
- Knowledge-base support and deflection in customer service
- Sectors: customer support, telecom, retail, SaaS
- What to do now: Tune BM25 for long KB articles; paginate previews to triage relevant answers; enforce time budgets to limit escalation costs
- Tools/products/workflows: Support agent plugin with evidence previews; auto-citation in responses; routing when previewed recall is low
- Assumptions/dependencies: Regular index refresh; multilingual considerations if KB is mixed language; escalation policies
Long-Term Applications
These opportunities will benefit from further research, scaling, or productization beyond what the paper directly demonstrates.
- Web-scale agentic search with lexical-first retrieval
- Sectors: consumer search, enterprise web monitoring
- Future direction: Extend the retrieval controller to the open web (crawling/API federation), keeping cached rankings and browse/read decisions; add query reformulation and source quality filters
- Potential products: "Agentic Web Research" browser/extension; enterprise web-intelligence monitors
- Dependencies: Robust, compliant web access; deduplication and freshness; multilingual indexing; anti-scrape and ToS constraints
- Auto-tuning retriever/controller for long documents and tasks
- Sectors: software, MLOps, platform teams
- Future direction: Automated selection of BM25 (k1,b), retrieval depth k, excerpt/line sizes, and submission-steer timing, driven by telemetry and small labeled sets
- Potential products: "Retriever Auto-Tuner" service; adaptive retrieval-depth scheduler
- Dependencies: Telemetry collection; offline evaluation labels or weak supervision; safe exploration policies
- Safety, audit, and regulatory standards for agentic search
- Sectors: public policy, healthcare, finance, legal
- Future direction: Standardize evidence logging (surfaced/previewed/opened/cited) and calibration metrics as audit artifacts; certification regimes for agentic answers
- Potential products: Audit toolkit; compliance reports; third-party attestations
- Dependencies: Cross-vendor adoption; data retention and privacy frameworks; legal clarity on provenance requirements
- Domain-grade assistants (clinical, legal, and compliance research)
- Sectors: healthcare, legal, finance
- Future direction: Combine tuned lexical retrieval with domain ontologies, de-identification, and human-in-the-loop review; explore when lexical suffices vs. hybrid/dense reranking
- Potential products: Clinical guideline scanner; e-discovery triage agent; regulatory change monitor
- Dependencies: Regulatory approval and risk management; high-quality domain corpora; strict provenance and human oversight
- On-prem/edge private research agents using lexical retrieval
- Sectors: defense, highly regulated industries, SMEs with privacy constraints
- Future direction: Deploy Pi-Serini-like stacks on-prem with local LLMs and Anserini; leverage caching and lexical retrieval to minimize compute costs
- Potential products: "Private Research Appliance" with retrieval controller and audit logs
- Dependencies: Sufficient local compute; secure indexing pipelines; private model/tooling support
- Human-agent collaborative UIs for evidence triage
- Sectors: knowledge work across domains
- Future direction: New interfaces that let users steer the cached ranking (expand/contract, tag, backtrack), addressing premature branch commitment and improving trust
- Potential products: Evidence maps; interactive rank browsers; "branch management" panels
- Dependencies: UX research; user training; integration with existing research workflows
- Agent orchestration and cost-aware scheduling
- Sectors: platform engineering, FinOps
- Future direction: Controllers that dynamically switch LLMs, retrieval depth, and browsing aggressiveness under time/cost budgets; escalate only when needed
- Potential products: Time-Budget Middleware; model-switching policy engine; cache-optimization layer
- Dependencies: Multi-model access; reliable cost/latency signals; acceptance of graceful degradation
- Multilingual and cross-modal retrieval extensions
- Sectors: global enterprises, media, academic publishers
- Future direction: Lexical retrieval enhanced with morphological analyzers and query translation; integrate OCR/ASR and image/table extraction with paginated read tools
- Potential products: Multilingual research agent; cross-modal evidence reader
- Dependencies: Language resources; high-quality OCR/ASR; indexing pipelines for non-text assets
- Robust search policies to mitigate premature branch commitment
- Sectors: all agentic systems relying on iterative search
- Future direction: Meta-controllers that detect weak hypotheses, enforce reversible probes, and trigger backtracking; learnable search policies
- Potential products: "Branch Manager" policy module; failure-mode detectors
- Dependencies: Behavioral telemetry; training data for failure modes; evaluation benchmarks beyond fixed corpora
- Hybrid retrieval stacks and plug-and-play evaluation
- Sectors: IR research, industry labs
- Future direction: Compose tuned BM25 with light rerankers or sparse+dense hybrids; use Pi-Serini's logging to measure marginal benefits under cost budgets
- Potential products: Hybrid retrieval SDK; experiment harnesses and leaderboards
- Dependencies: Additional compute for reranking; robust ablation protocols; open benchmarks
Cross-cutting assumptions and dependencies
- Capable LLMs with strong tool-use are central to observed gains; results vary across model families and may degrade with smaller models.
- Prefix caching materially affects cost; requires provider support and stable prompting.
- BM25 parameters and retrieval depth must be tuned to the corpus (especially for long/noisy documents); defaults are often suboptimal.
- BrowseComp-Plus is a fixed-corpus benchmark; real-world web or dynamic corpora introduce freshness, noise, and scale challenges.
- Provenance, privacy, and licensing constraints may limit indexing and logging in regulated settings.
- Automated LLM judging used in the paper should be complemented with human evaluation in high-stakes deployments.
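The cost effect of prefix caching noted in the list above comes from the shape of an agent loop: every turn resends the whole conversation so far. The sketch below is a back-of-the-envelope model only. The function name, the flat per-token price, and the `cached_discount` multiplier are hypothetical, not any provider's actual pricing, and the model ignores output tokens, cache-write fees, and cache misses.

```python
def loop_input_cost(turn_tokens, price_per_token, cached_discount=None):
    """Estimate input-token cost of an agent loop.

    turn_tokens[i] is the number of NEW tokens appended at turn i; each model
    call resends the full history plus the new tokens. If `cached_discount` is
    given, the already-seen history is billed at that fraction of the price.
    """
    total = 0.0
    history = 0
    for new in turn_tokens:
        if cached_discount is None:
            total += (history + new) * price_per_token
        else:
            total += history * price_per_token * cached_discount
            total += new * price_per_token
        history += new
    return total
```

With three turns adding 1000, 500, and 500 tokens at a unit price, the uncached loop bills 4500 token-units, while a hypothetical 0.1 multiplier on cached history bills 2250. The gap widens as the number of turns grows, which is why stable prompts that keep the prefix unchanged matter for long agent runs.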
Glossary
- AgentIR: A reasoning-intensive retriever designed for deep research tasks. "Finally, we report the numbers of AgentIR~\cite{chen2026AgentIR}, a reasoning-intensive retriever trained for deep research."
- Agentic loop: An iterative interaction pattern where an LLM reasons, takes actions, observes feedback, and updates its behavior. "these systems increasingly operate through an agentic loop, where LLMs receive feedback from their environments"
- Anserini: An open-source IR toolkit providing BM25 and other retrieval capabilities. "Documents are indexed using BM25 via Anserini~\cite{10.1145/3239571} over the BrowseComp-Plus corpus."
- Behavior Recall: A recall metric computed over the union of documents the agent opened or cited. "Behavior Recall: recall computed over the union of the document sets $D_{\text{opened}} \cup D_{\text{cited}}$."
- BM25: A classic lexical ranking function for information retrieval based on term frequency and length normalization. "We verify the BM25 parameter settings and the retrieval depth to increase the likelihood that relevant documents remain in retrieved results"
- BM25 tuning: The process of adjusting BM25 parameters (e.g., k1, b) to better suit a corpus or task. "BM25 tuning improves answer accuracy by 18.0\% and surfaced evidence recall by 11.1\% over the default BM25 setting"
- BrowseComp-Plus: A fixed-corpus deep research benchmark used to evaluate search agents. "On BrowseComp-Plus~\cite{chen2025browsecompplusfairtransparentevaluation}, under time-budget steering, Pi-Serini with gpt-5.5 achieves 83.1\% answer accuracy"
- Calibration Error: The discrepancy between a model's stated confidence and its empirical correctness. "Calibration Error: the discrepancy between the model's confidence and empirical correctness."
- Dense retriever: A retrieval model that uses learned dense embeddings to match queries and documents semantically. "outperforming released search agents that use dense retrievers."
- Evidence documents: Documents required to answer a query, not necessarily containing the exact final answer span. "Evidence documents are documents required to answer the query"
- Gold documents: A stricter subset of evidence documents that semantically contain the final answer. "gold documents are a stricter subset that both support answering and semantically contain the final answer."
- Interaction trajectory: The sequence of thoughts, actions, and observations an agent accumulates during its loop. "The agent operates over an interaction trajectory:"
- Length normalization: A BM25 component that adjusts scores to account for document length, important for long documents. "making length normalization and term-frequency saturation matter more than in passage retrieval."
- Lexical retriever: A retriever that matches queries to documents using exact or approximate term overlap rather than learned semantics. "Does a lexical retriever suffice as LLMs become more capable in an agentic loop?"
- LLM judge: An LLM used to evaluate answer correctness by comparing a model's output with a gold answer. "For answer evaluation, we use an LLM judge."
- Long-document evidence search: Retrieval over very long documents where relevant information is embedded in extensive text. "whereas tuned parameters better match long-document evidence search."
- Multi-hop information needs: Queries requiring reasoning over multiple pieces of evidence connected through intermediate steps. "resolving multi-hop information needs"
- Pareto frontier: The set of solutions that optimally trade off two objectives (e.g., accuracy and cost) where improving one worsens the other. "Pareto frontier"
- Prefix caching: Caching repeated input prefixes in an LLM session to reduce cost and latency. "making prefix caching central to its cost efficiency."
- Previewed Recall: A recall metric over documents whose excerpts were shown to the agent when browsing search results. "Previewed Recall: recall computed over the document set $D_{\text{previewed}}$;"
- Rank pagination: Browsing a cached ranking in pages to inspect results without issuing new backend queries. "using rank pagination"
- ReAct loop: An agent framework where reasoning (thought) and acting (tool use) alternate iteratively. "The LLM agent runs a ReAct loop"
- Reasoning-aware retriever: A retriever that incorporates reasoning signals to handle complex queries. "Recent reasoning-aware retrievers further target complex queries by capturing implicit intent and resolving multi-hop information needs"
- Retrieval-Augmented Generation (RAG): A paradigm where generation is grounded in documents retrieved from an external corpus. "information-seeking systems such as Retrieval-Augmented Generation (RAG)~\cite{10.5555/3495724.3496517}"
- Retrieval controller: A component mediating all retrieval access and exposing constrained tool APIs to the agent. "a retrieval controller mediates all access to an Anserini BM25 backend."
- Retrieval depth: How far down the ranked list a system retrieves (e.g., top-1000), affecting recall. "increasing retrieval depth further improves surfaced evidence recall by 25.3\% over the shallow-retrieval setting."
- Reranker: A model that reorders initial retrieval results to improve ranking quality. "dense retrievers, sparse retrievers, and rerankers to improve early-stage ranking accuracy"
- Reranking: The process of reordering retrieved documents, often using more expensive models or features. "studies how reranking and document ordering affect search agents in deep research."
- Shallow retrieval: A setting that retrieves only a small top portion of results, often hurting recall in complex tasks. "Naive search agents often use shallow retrieval settings that insert the full texts of all ranked documents directly into the context window."
- Sparse retriever: A lexical or term-based retriever that relies on sparse representations (e.g., term frequencies). "dense retrievers, sparse retrievers, and rerankers to improve early-stage ranking accuracy"
- Submission steer: A runtime directive prompting the agent to stop using tools and finalize an answer before timeout. "the system injects a submission steer that instructs the agent to stop using tools and produce its best answer from the evidence collected so far."
- Surfaced Recall: A recall metric over documents initially returned by the search tool. "Surfaced Recall: recall computed over the document set $D_{\text{surfaced}}$;"
- Time-budget steering: A policy that guides the agent to complete within a fixed wall-clock time rather than a fixed iteration count. "we use time-budget steering instead of the fixed iteration cap used in prior work"
- Tool call: An agent action invoking an external function (e.g., search or read) during the loop. "the action is a tool call or reasoning"
- Top-k: The number of top-ranked results to return from retrieval, controlled by the parameter k. "returns the top-$k$ search results, with ."
- Wall-clock time budget: A hard real-time limit for completing a query or experiment. "allowing the agent to terminate under a hard wall-clock time budget."