GenericAgent (GA) Architecture

Updated 3 July 2026
  • GenericAgent (GA) is a long-horizon LLM agent architecture that maximizes decision-relevant information density under the Contextual Information Density Maximization (CIDM) principle: every fact needed for the next decision stays in context, and extraneous data is removed.
  • Its system architecture integrates a minimal atomic tool set, hierarchical on-demand memory layers, and a self-evolution mechanism to streamline task execution with dynamic context management.
  • Empirical evaluations show GA achieves superior token efficiency and tool-use success rates compared to baselines, demonstrating practical scalability and long-term memory retention.

GenericAgent (GA) is a long-horizon LLM agent architecture designed to maximize decision-relevant information per token within a bounded context, with the goals of efficient task completion, robust tool use, long-term memory retention, and self-evolution across episodes. GA exemplifies a systems-level approach to LLM-based agents, treating contextual information density maximization as the central optimization criterion for all core components (Liang et al., 18 Apr 2026). Closely related work includes the “GAIA” framework in JoyAgent-JDGenie, which combines multi-agent planning/execution, a hierarchical memory substrate, and schema-consistent tool orchestration to achieve high benchmark scores and adaptability (Liu et al., 1 Oct 2025).

1. Design Principle: Contextual Information Density Maximization

The foundational insight behind GA is that agent performance is a function of how much decision-relevant information is retained in a finite context window, rather than context length alone. The primary objective is to maximize the information density

$$\rho(C) = \frac{I_{\text{dec}}(C)}{|C|}$$

where $C$ is the current active context (comprising the prompt, memory snippets, tool schemas, and past turns), $I_{\text{dec}}(C)$ quantifies the next-decision-relevant information carried by $C$, and $|C|$ is the total number of tokens. Two constraints direct all context manipulation:

  • Completeness: All facts and constraints for the next decision must be present.
  • Conciseness: Any extraneous information must be stripped away.

Practical enforcement is via a character-budget heuristic: if the cumulative context length $C_H$ (the sum over message history lengths) exceeds a budget $B = \alpha W_{\text{tokens}}$ ($\alpha \approx 3$ chars/token), a staged truncation and compression routine is invoked. This staged approach prunes or compresses low-density content, continuously refreshing the context window with high-value material and guarding against prompt bloat, supporting extended, long-horizon interactions and reentrant tasking (Liang et al., 18 Apr 2026).
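The budget check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`char_budget`, `history_chars`, `needs_compression`) are hypothetical, and the history is modeled as a plain list of strings.

```python
def char_budget(window_tokens: int, alpha: float = 3.0) -> int:
    """Character budget B = alpha * W_tokens (alpha ~ 3 chars/token per the text)."""
    return int(alpha * window_tokens)


def history_chars(messages: list) -> int:
    """C_H: cumulative length of the message history, in characters."""
    return sum(len(m) for m in messages)


def needs_compression(messages: list, window_tokens: int, alpha: float = 3.0) -> bool:
    """True when C_H exceeds B, i.e. when staged truncation should run."""
    return history_chars(messages) > char_budget(window_tokens, alpha)


if __name__ == "__main__":
    history = ["a" * 2000, "b" * 1500]
    print(char_budget(1000))                 # budget for a 1000-token window
    print(needs_compression(history, 1000))  # 3500 chars against that budget
```

Counting characters rather than tokens avoids a tokenizer call on every turn, at the cost of an approximate conversion factor.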

2. System Architecture and Core Components

GA is built around four tightly interlocked primitives:

  1. Minimal Atomic Tool Set: Restricts agent capabilities to a small set of irreducible, composable atomic tools. Each tool is declared as a JSON-Schema-style typed function; the LLM emits structured tool-use calls, and a dispatcher executes them and returns typed results.

| Tool Name | Core Capability |
|---------------------------|---------------------------------------------|
| file_read | Fine-grained file access (range, keywords) |
| file_patch | Targeted file patching with strict matching |
| file_write | File output |
| code_run | Single-step script execution, sandboxed |
| web_scan | Live DOM scan, main text isolation |
| web_execute_js | JS execution, diff-based output |
| ask_user | User query |
| update_working_checkpoint | Memory anchor update |
| start_long_term_update | On-demand memory retrieval/commit |

GA exposes only nine tools, in contrast to the >50 typically present in leading baselines, dramatically reducing both prompt size and LLM action space (Liang et al., 18 Apr 2026).
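To make the "typed function plus dispatcher" idea concrete, here is a hedged sketch of one tool declaration and a dispatcher. Only the tool name `file_read` comes from the table above; the schema fields, the parameter names (`path`, `start`, `end`), the call format, and the result envelope are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical JSON-Schema-style declaration; parameter names are illustrative.
FILE_READ_SCHEMA = {
    "name": "file_read",
    "description": "Read an inclusive, 1-based line range from a text file.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string"},
            "start": {"type": "integer"},
            "end": {"type": "integer"},
        },
        "required": ["path"],
    },
}


def file_read(path: str, start: int = 1, end: Optional[int] = None) -> str:
    """Return lines start..end (1-based, inclusive) joined by newlines."""
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()
    return "\n".join(lines[start - 1:end])


TOOLS: Dict[str, Callable[..., Any]] = {"file_read": file_read}


def dispatch(call: Dict[str, Any]) -> Dict[str, Any]:
    """Execute a structured tool call and wrap the outcome in a typed result."""
    name = call.get("name")
    handler = TOOLS.get(name)
    if handler is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": handler(**call.get("arguments", {}))}
    except Exception as exc:  # report failures to the model instead of crashing
        return {"ok": False, "error": str(exc)}
```

Returning errors as data lets the model see a failed call and retry, which matters more when the tool set is small and each tool is used often.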

  2. Hierarchical On-demand Memory: Comprises four layers for context-efficient information recall and retention:
    • L1: Always-on index (pointers to topic/fact/SOP—never contents),
    • L2: Validated, distilled factual knowledge for task-invariant recall,
    • L3: Reusable procedures (SOPs) extracted through trajectory distillation,
    • L4: Raw session logs (audit, not part of prompt).

Memory access is gated: only L1 (the index) and ephemeral working memory are included by default. Deeper facts and procedures require explicit tool-mediated retrieval, which keeps context size under control (Liang et al., 18 Apr 2026).
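The gating rule can be illustrated with a small data structure. The sketch below is an assumption-laden toy, not GA's storage format: the class name `LayeredMemory`, the `"L2:key"` pointer syntax, and the method names are all hypothetical. It shows only the property the text describes, namely that the default context carries pointers while contents are fetched on request.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LayeredMemory:
    index: Dict[str, str] = field(default_factory=dict)  # L1: topic -> pointer
    facts: Dict[str, str] = field(default_factory=dict)  # L2: distilled facts
    sops: Dict[str, str] = field(default_factory=dict)   # L3: procedures
    logs: List[str] = field(default_factory=list)        # L4: raw session logs

    def default_context(self, working: List[str]) -> List[str]:
        """Only L1 pointers and working memory enter the prompt by default."""
        pointers = [f"{topic} -> {ref}" for topic, ref in sorted(self.index.items())]
        return pointers + list(working)

    def retrieve(self, ref: str) -> Optional[str]:
        """Explicit retrieval of L2/L3 content; L4 is never served to the prompt."""
        layer, _, key = ref.partition(":")
        store = {"L2": self.facts, "L3": self.sops}.get(layer)
        return None if store is None else store.get(key)


if __name__ == "__main__":
    mem = LayeredMemory(
        index={"deploy": "L3:deploy_sop"},
        sops={"deploy_sop": "1. build 2. push"},
    )
    print(mem.default_context(["current goal: ship release"]))
    print(mem.retrieve("L3:deploy_sop"))
```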

  3. Self-evolution Mechanism: Each completed task triggers a pipeline that reflects on verified traces (L4), distills successful strategies into bulleted SOPs (L3), and crystallizes these into executable script skeletons. Code is run in a sandbox for validation before entering the asset library; this ratchets agent policy toward increasing efficiency (empirically, 89.6% token reduction and 84.4% fewer LLM calls after nine workflow iterations) (Liang et al., 18 Apr 2026).
  4. Multi-stage Context Truncation and Compression: Enforces information density under the context budget via:
    • Tool output truncation (length caps with head-tail inclusions),
    • Periodic tag-level block compression (e.g., sliding window on reasoning/tool tags),
    • FIFO message eviction (oldest first, while preserving critical anchors),
    • Persistent working-memory anchors (20 turn-summaries + key-info) resistant to eviction.

This processing ensures a context window always centered on high-impact, up-to-date facts and abstractions (Liang et al., 18 Apr 2026).
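Two of the stages listed above, head-tail truncation of tool output and FIFO eviction that spares anchors, are simple enough to sketch. The code below is illustrative only: the function names, the `anchor` flag on messages, and the omission marker are assumptions, and tag-level block compression is not modeled.

```python
from typing import Dict, List


def truncate_head_tail(text: str, head: int, tail: int) -> str:
    """Keep the first `head` and last `tail` characters of an over-long output."""
    if len(text) <= head + tail:
        return text
    omitted = len(text) - head - tail
    return f"{text[:head]}\n[... {omitted} chars omitted ...]\n{text[len(text) - tail:]}"


def evict_fifo(messages: List[Dict], budget_chars: int) -> List[Dict]:
    """Drop the oldest non-anchor messages until the history fits the budget."""
    kept = list(messages)
    i = 0
    while i < len(kept) and sum(len(m["content"]) for m in kept) > budget_chars:
        if kept[i].get("anchor"):
            i += 1       # anchors are resistant to eviction: skip over them
        else:
            del kept[i]  # oldest evictable message goes first
    return kept


if __name__ == "__main__":
    history = [
        {"content": "a" * 50, "anchor": True},
        {"content": "b" * 50},
        {"content": "c" * 50},
        {"content": "d" * 50},
    ]
    print([m["content"][0] for m in evict_fifo(history, 120)])
    print(truncate_head_tail("abcdefghij", 3, 2))
```

If anchors alone exceed the budget, this sketch returns them anyway rather than evicting protected content; a real system would need a separate policy for that case.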

3. Control Flow and Execution

GA operates as a single, tightly orchestrated 92-line logic loop (Liang et al., 18 Apr 2026), with the following steps:

  1. Assemble prompt: inject current task, meta-memory, L1, anchors, and tool schemas.
  2. LLM call: outputs either a textual answer or a structured tool-use directive.
  3. Tool dispatch: executes with live results returned to the LLM.
  4. Context update: appends truncated results, updates working-memory anchors on demand.
  5. Trigger memory consolidation/self-evolution if a task/subgoal completes.
  6. Run context compression/eviction routines as needed.
  7. Advance to the next turn or task.

This routine supports both synchronous “Interact mode” (user-initiated request) and autonomous “Reflect mode” (watcher-triggered by environmental change/timer).
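The steps above can be sketched as a compact loop. This is a simplified illustration, not the 92-line loop from the paper: the reply format (`tool`, `args`, `text`), the function name `run_turns`, and the 200-character result cap are assumptions, and step 5 (memory consolidation and self-evolution) is left out.

```python
from typing import Any, Callable, Dict, List, Optional


def run_turns(
    task: str,
    llm: Callable[[str], Dict[str, Any]],
    tools: Dict[str, Callable[..., Any]],
    index: List[str],
    budget_chars: int,
    max_turns: int = 10,
) -> Optional[str]:
    anchors: List[str] = []   # working-memory anchors (never evicted here)
    history: List[str] = []   # truncated tool results from earlier turns
    for _ in range(max_turns):
        # Step 1: assemble the prompt from task, L1 index, anchors, and history.
        prompt = "\n".join([f"TASK: {task}", *index, *anchors, *history])
        # Step 2: the model returns a textual answer or a tool-use directive.
        reply = llm(prompt)
        if reply.get("tool") is None:
            return reply["text"]
        # Step 3: dispatch the tool call.
        result = tools[reply["tool"]](**reply.get("args", {}))
        # Step 4: append a truncated result to the context.
        history.append(f"{reply['tool']} -> {str(result)[:200]}")
        # Step 6: FIFO eviction while the history exceeds the character budget.
        while history and sum(len(h) for h in history) > budget_chars:
            history.pop(0)
        # Step 7: fall through to the next turn.
    return None  # turn limit reached without a textual answer
```

A scripted stand-in for the model is enough to exercise the loop:

```python
def fake_llm(prompt: str) -> dict:
    # Answer once the tool result is visible in the prompt; otherwise call the tool.
    if "add -> 5" in prompt:
        return {"tool": None, "text": "The sum is 5"}
    return {"tool": "add", "args": {"a": 2, "b": 3}}
```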

4. Empirical Evaluation and Comparative Benchmarks

GA has been benchmarked across five axes—task completion, tool efficiency, memory effectiveness, self-evolution, and web browsing—against Claude Code, OpenClaw, Codex, and others. Highlights (Liang et al., 18 Apr 2026):

  • Task Completion & Token Economy:
    • SOP-Bench: 100% accuracy, 2.08M tokens, efficiency 0.48 Acc/M tokens (vs OpenClaw 0.38).
    • LifelongAgentBench: 100%, 241k tokens, 4.15 efficiency (vs OpenClaw 0.48).
    • GA always matches/exceeds baseline task success using ≤27.7% of tokens.
  • Tool-use Efficiency:
    • 9 GA tools vs Claude Code’s 53; on five long-horizon tasks, 100% success at 12.8 calls/188k tokens (Claude: 22.6 calls/537k tokens).
    • Tool-call distributions tightly focused on 4 primitives in GA.
  • Memory Effectiveness:
    • On repeated tasks, tokens drop ≈200k→100k (runtime 102→66s), with baselines flat.
    • Condensed memory achieves top task success rates at only 165 tokens (vs. 575 for raw SOPs).
    • GA outperforms on multi-hop, temporal, open-domain factual retention benchmarks.
    • “Context explosion” is mitigated: after 20 installed skills, empty prompt = 2,298 tokens (vs. 23–43k in other agents).
  • Self-evolution:
    • On a GitHub PR research task, tokens drop by 89.6% and LLM calls by 84.4% through nine runs.
    • On web benchmark tasks, subsequent runs consume 61–92% fewer tokens; baseline agents do not converge.
  • Web Browsing:
    • WebCanvas: GA 0.834 score @0.18M tokens, outperforming OpenClaw.
    • Real-world web scenarios: GA 0.577 @0.26M vs OpenClaw 0.5 @0.76M.

Across all five axes, GA simultaneously increases task success and reduces token and interaction costs, validating the CIDM principle as a systems-level foundation.
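The efficiency figures quoted for SOP-Bench and LifelongAgentBench are consistent with a simple ratio, assuming accuracy is expressed as a fraction (100% = 1.0) and tokens in millions. That reading is an inference from the reported numbers, not a definition taken from the paper; the helper name `efficiency` is hypothetical.

```python
def efficiency(accuracy: float, tokens: float) -> float:
    """Accuracy (as a fraction in [0, 1]) per million tokens consumed."""
    return accuracy / (tokens / 1_000_000)


if __name__ == "__main__":
    # SOP-Bench: 100% accuracy at 2.08M tokens
    print(round(efficiency(1.0, 2_080_000), 2))  # 0.48
    # LifelongAgentBench: 100% at 241k tokens
    print(round(efficiency(1.0, 241_000), 2))    # 4.15
```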

5. Relation to GAIA (JoyAgent-JDGenie)

GAIA, as presented in JoyAgent-JDGenie (Liu et al., 1 Oct 2025), generalizes many core themes found in GA under a collective multi-agent umbrella:

  • Multi-Agent Ensemble: Fuses Plan–Execute and ReAct paradigms, using posterior (weighted) voting and critic aggregation for robustness. System-level formula:

$$S(a) = \sum_{i=1}^{N} w_i\,\mathbf{1}[a_i = a]\,p_i\,,\quad a^* = \arg\max_a S(a)$$

  • Three-Tier Memory: Working ($M^{(1)}$), Semantic ($M^{(2)}$), and Procedural ($M^{(3)}$).
  • Curated Tool Suite: Three families (search, code execution, multimodal parsing) exposed as schema-consistent APIs for agent use.
  • Achieves Pass@1 75.2/Pass@3 82.4 on GAIA validation tasks, surpassing all open-source baselines—approaching proprietary system performance.
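The voting rule listed under the multi-agent ensemble can be written out directly. In this sketch each agent $i$ contributes its answer $a_i$, a weight $w_i$, and a score $p_i$; treating $p_i$ as the agent's own confidence is an assumption, and the function name `weighted_vote` is hypothetical. Critic aggregation is not modeled.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def weighted_vote(
    proposals: Iterable[Tuple[Hashable, float, float]],
) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Compute S(a) = sum_i w_i * 1[a_i = a] * p_i and return (argmax, scores).

    `proposals` holds (answer, weight, confidence) triples, one per agent.
    Raises ValueError if no proposals are given.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for answer, weight, confidence in proposals:
        scores[answer] += weight * confidence
    best = max(scores, key=scores.get)  # ties resolve to the first answer seen
    return best, dict(scores)


if __name__ == "__main__":
    votes = [("A", 0.5, 0.9), ("B", 0.3, 0.8), ("A", 0.2, 0.4)]
    print(weighted_vote(votes))
```

With these example votes, answer "A" accumulates 0.5·0.9 + 0.2·0.4 and answer "B" accumulates 0.3·0.8, so "A" is selected.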

GAIA explicitly separates out the memory function, planning-versus-action trade-offs, and the effect of ensemble voting. GA and GAIA share hierarchical memory, tool minimization, and procedural abstraction, though GAIA does not implement self-evolution via explicit task trace distillation in the way GA does.

6. Limitations and Future Directions

Currently, GA does not optimize weighting or procedural prompts end-to-end via reinforcement learning; settings are hand-tuned based on validation. Autonomous tool evolution, multi-agent scaling, and cross-domain transfer using generalized planners (as in OWL’s WORKFORCE) are open challenges (Liu et al., 1 Oct 2025). Both GA and GAIA point toward integration of RL-based adaptation, test-time scaling, and more formal mechanisms for agent self-refinement as key research frontiers. A plausible implication is that further efficiency and robustness gains will come from learning to adjust agent ensemble weights, procedure prompts, and tool suites dynamically.

7. Significance and Implications

GA and closely related architectures establish CIDM as a unifying systems principle for LLM agents, enabling persistent, low-token, high-success performance on long-horizon, decision-intensive tasks. The layered memory, minimal tool set, autonomous procedural evolution, and formal control of context present a pathway to scalable, adaptive, domain-general agent systems that narrow the gap with closed-source systems and provide a systematic empirical foundation for continued innovation in LLM agent research (Liang et al., 18 Apr 2026, Liu et al., 1 Oct 2025).
