Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis

Published 12 Apr 2026 in cs.AI | (2604.10513v1)

Abstract: AI agent development relies heavily on natural language prompting to define agents' tasks, knowledge, and goals. These prompts are interpreted by LLMs, which govern agent behavior. Consequently, agentic performance is susceptible to variability arising from imprecise or ambiguous prompt formulations. Identifying and correcting such issues requires examining not only the agent's code, but also the internal system prompts generated throughout its execution lifecycle, as reflected in execution logs. In this work, we introduce an analytics pipeline implemented as part of the Agent Mentor open-source library that monitors and incrementally adapts the system prompts defining another agent's behavior. The pipeline improves performance by systematically injecting corrective instructions into the agent's knowledge. We describe its underlying mechanism, with particular emphasis on identifying semantic features associated with undesired behaviors and using them to derive corrective statements. We evaluate the proposed pipeline across three exemplar agent configurations and benchmark tasks using repeated execution runs to assess effectiveness. These experiments provide an initial exploration of automating such a mentoring pipeline within future agentic governance frameworks. Overall, the approach demonstrates consistent and measurable accuracy improvements across diverse configurations, particularly in settings dominated by specification ambiguity. For reproducibility, we released our code as open source under the Agent Mentor library.

Abstract PDF Upgrade to Chat

Authors (7)

Summary

The paper introduces Agent Mentor, a pipeline that uses semantic trajectory mining to enable closed-loop prompt refinement for correcting ambiguous LLM behavior.
It leverages process mining and semantic feature extraction to generate precise corrective statements, enhancing agent reliability across various benchmarks.
Empirical results demonstrate notable accuracy improvements, with increases up to 37 percentage points in applications like AccessControl and HolidayFinder.

Agent Mentor: A Systematic Pipeline for Specification-Guided Correction in LLM-based Agents

Motivation and Problem Setting

The adoption of LLM-driven agentic architectures has foregrounded persistent obstacles in specification clarity, prompt engineering, and explainable failure modes. Despite major advances in LLM controller architectures and prompt optimization, agent trajectories remain susceptible to stochastic execution induced by natural language ambiguity and underspecification at the prompt or system specification level. This leads to non-deterministic behaviors where identical task inputs can result in semantically divergent trajectories and outcomes—an acute challenge for productionization of agentic AI systems with SLA constraints and risk thresholds.

A core limitation of prevailing techniques is their shallow coupling between runtime observability and specification-level repair: most efforts focus on outcome aggregation, local patching, or post-hoc logging rather than closing the loop between observed behavioral variance and persistent correction in the generative specification of the agent. "Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis" (2604.10513) introduces a systematic analytics pipeline, Agent Mentor, for behavioral improvement and specification maintenance in LLM-based agents through semantic trajectory mining and prompt correction.

Behavioral Improvement Lifecycle

The Agent Mentor Analytics Pipeline (AMAP) integrates runtime observability, semantic feature elicitation, and automated prompt correction into a coherent developmental and production feedback loop. Agent Mentor systematically instruments agentic applications to collect fine-grained execution logs—capturing each LLM invocation, tool call, and intermediate system prompt as a process event log. Across $k$ -multiple runs of a single user prompt, this results in an ensemble of trajectories reflective of executional variance.

Figure 1: Behavioral improvement lifecycle using semantic feature analysis, depicting the closed feedback loop from trajectory mining to specification correction and redeployment.

Workflow Mining and Feature Representation

By employing process mining techniques, execution traces are mapped onto workflow graphs, where nodes represent action segments (e.g., orchestrator, sub-agent, tool) and edges reflect sequencing and control flow. Each node's outputs are then embedded and clustered in a semantic space (e.g., using SBERT), partitioning the node's outcomes into semantically coherent clusters which typically align with distinct modes of success and failure.

Figure 2: Workflow view illustrating the structure of mined agent execution trajectories and identification of analysis targets.

A central innovation is the distillation of output textual artifacts at each node into a clean, semantically compressed representation, using LLM-based summarization. Semantic feature elicitation follows, with an augmented SVO approach extracting granular components such as subject, action, object, politeness tone, permission status, list status, and condition expressions from each instance. These features are aggregated and clustered, then rendered into a binary feature matrix with instance-to-value encoding.

Figure 3: Representation of each agentic trajectory as a vector of semantically distilled features, serving as input for importance analysis.

Feature Importance and Decision Analysis

A supervised learning phase leverages decision trees (using Gini–based importance metrics) to evaluate which semantic feature values are most predictive of success or failure, given ground-truth cluster assignments of execution paths. The result is a ranked set of diagnosis features per node, identifying specification elements most responsible for behavioral divergence.

Figure 4: Example decision tree illustrating discriminative feature splits for partitioning answer node trajectories by success/failure.

Corrective Statement Generation

Using the distilled feature importance profile and annotated clusters, Agent Mentor formulates corrective natural language statements via LLM, serving as specification-level repairs to be injected into the agent's system prompts. For example, in access control scenarios with ambiguous logic (“reject users that are unauthorized and untrusted”) AMAP-derived corrections made the condition explicit (“If either condition is true, refuse the request”), directly reducing downstream failure.

Empirical Results and Analysis

Experiments span IBM’s CUGA (multi-agent decomposition agent) on AppWorld benchmarks, as well as proprietary applications (“AccessControl” and “HolidayFinder”). Agents are run pre- and post-AMAP intervention, measuring accuracy as the proportion of correct outcomes across 100 full task runs per configuration. Notably:

CUGA tasks at the easy level show gains of +6 to +11 percentage points post-AMAP over a strong OSS LLM backbone (e.g., 69%→75%, 26%→37%).
AccessControl shows a dramatic accuracy elevation from 50% to 87% (+37 points) after specification correction.
HolidayFinder records an improvement from 7% to 15% accuracy (+8 points) when ambiguity in articulation is algorithmically resolved.

Robustness checks demonstrate consistent improvement irrespective of the underlying LLM (GPT-4o, LLaMA-4 Maverick, Mistral Medium, OSS), with corrected AccessControl performance at 86–99% across all tested models. The largest improvements accrue for scenarios where instruction ambiguity, rather than model reasoning flaws, is the principal failure mode; complex compositional reasoning tasks see moderate but not equivalent impact.

Theoretical and Practical Implications

The Agent Mentor framework provides a generalizable mechanism for closed-loop prompt repair, tightly binding runtime observability to persistent specification evolution. Practical implications include:

Automated detection and correction of ambiguous or underspecified agent instructions, supporting operational SLA compliance.
Sustainable alignment between design-time intent and observed behavior, underpinning risk governance and explainability mandates.
Model-agnostic deployment; improvement is conferred at the specification layer, decoupled from LLM pretraining or fine-tuning cycles.

From a theoretical perspective, the work positions agent specification as a dynamic, empirical object amenable to process mining, semantic abstraction, and interpretable feature attribution. It complements orthogonal trends in outcome-aggregation (self-consistency, ToT), prompt optimization, and reflection-based adjustment by structurally targeting persistent specification issues rather than transient sampling or local adaptation [schulhoff2024prompt_survey, (Xi et al., 2023)].

Open Challenges and Future Directions

Although Agent Mentor bridges key gaps in agent specification maintenance, several open questions persist:

Prompt engineering for LLM–driven feature elicitation and corrective statement generation may itself be sensitive to formulation—raising bootstrapping and stability issues.
Diminishing returns with specification expansion: as base prompts increase, injected corrections demonstrate lower marginal value versus increased inference cost.
Full automation is limited if all observed executions are unsuccessful (i.e., no positive cluster exemplars).
Integration of more granular root-cause localization and automatic ablation for minimization of corrective edits.

Promising extensions include embedding the pipeline as an internalized, on-policy self-repairing agent; exploring unsupervised success criteria (e.g., behavioral consistency) for overcoming the need for explicit supervision; and developing abstraction mechanisms for aggregation of corrections across task or agent classes.

Conclusion

Agent Mentor operationalizes systematic specification refinement for LLM-based agents, leveraging multi-run trajectory mining, semantic feature extraction, and decision-theoretic analysis to derive effective, model-agnostic corrective statements. It demonstrably enhances agent reliability and clarity on ambiguous tasks, functioning as an interface layer between human intent and agentic behavior. This work delineates a blueprint for empirical, analytics-driven prompt verification, with implications for the engineering, governance, and continuous improvement of agentic AI systems.