JAUNT: QoE-Centric LLM Tool Routing
- JAUNT is a framework that aligns user intent and real-time network state using dual-view semantic embeddings to enhance LLM tool routing.
- It employs continuous network profiling and user-sensitive routing to address challenges like latency and server availability.
- Empirical evaluations show that JAUNT improves QoE by effectively balancing the trade-off between delay and accuracy across diverse user profiles.
JAUNT (Joint Alignment of User Intent and Network State for QoE-centric LLM Tool Routing) is a framework designed to optimize tool selection for LLM agents by aligning both user intent and network state within a unified semantic embedding space. This joint alignment approach aims to maximize user-centric Quality of Experience (QoE), overcoming the limitations of conventional tool routing strategies that consider only semantic or functional matching and neglect external factors such as latency and server availability. JAUNT introduces a dual-view alignment architecture, continuous network profiling, user-profile–sensitive routing, and a comprehensive benchmark for QoE evaluation, demonstrating significant improvements over prior baselines (Li et al., 21 Oct 2025).
1. Dual-View Alignment Strategy
JAUNT formulates tool routing as an alignment problem between two embedding spaces: user intent and network state. The intent embedding maps the user's raw query $q$ and long-term profile $p$ (parameterized by sensitivity weights $\alpha$ for delay and $\beta$ for task satisfaction) to a $d$-dimensional semantic vector:
$$e_u = f_\theta(q, p) \in \mathbb{R}^d.$$
Similarly, the network-state embedding projects a real-time network metric vector $n$ (e.g., latency, throughput, availability) into the same semantic space:
$$e_n = g_\psi(n) \in \mathbb{R}^d.$$
Given a fixed pretrained embedding $e_t$ of a candidate tool description $t$, JAUNT jointly aligns these vectors by minimizing a weighted objective:
$$\mathcal{L} = \lambda_{\text{sem}}\,\mathcal{L}_{\text{sem}} + \lambda_{\text{net}}\,\mathcal{L}_{\text{net}},$$
where $\mathcal{L}_{\text{sem}}$ measures semantic alignment between intent and ground-truth tool, and $\mathcal{L}_{\text{net}}$ aligns network profiles with expected tool performance. End-to-end fine-tuning of $f_\theta$ and $g_\psi$ ensures candidate tools are both task-relevant and network-efficient.
2. Construction of Network Profiles and Embedding
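The network profile for each Model Context Protocol (MCP) server is constructed via lightweight, regularly scheduled probes. For server $s$, three core metrics are collected:
- Round-trip latency $\ell_s$ (ms): request-response time.
- Throughput $r_s$ (Mbps): measured by fixed-size dummy downloads.
- Availability $a_s \in [0, 1]$: fraction of successful, error-free probes.
Raw observations are smoothed using Exponentially Weighted Moving Averages (EWMAs):
$$\hat{\ell}_s^{(k)} = \gamma_\ell\,\ell_s^{(k)} + (1 - \gamma_\ell)\,\hat{\ell}_s^{(k-1)}$$
(same for $r_s$ and $a_s$), with smoothing constants $\gamma \in (0, 1)$ tuned per metric.
The instantaneous network vector $n_s = (\hat{\ell}_s, \hat{r}_s, \hat{a}_s)$ is normalized elementwise:
$$\tilde{n}_s = \frac{n_s - \mu}{\sigma},$$
then embedded using an affine map and elementwise nonlinearity:
$$e_{n,s} = \phi(W \tilde{n}_s + b),$$
where $W \in \mathbb{R}^{d \times 3}$, $b \in \mathbb{R}^d$ are trainable, and $\mu, \sigma$ are empirical population moments (mean and standard deviation). An RBF kernel is optional but not empirically required.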
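The probe-smoothing, normalization, and embedding steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the probe readings, smoothing constants, population moments, projection weights, the choice of tanh, and the convention of weighting the newest observation by the smoothing constant are all assumptions made for the example.

```python
import math

def ewma_update(prev, obs, gamma):
    """One EWMA step; `gamma` is the weight on the newest observation (assumed convention)."""
    return gamma * obs + (1.0 - gamma) * prev

def normalize(vec, mean, std):
    """Z-score each metric using population mean and standard deviation."""
    return [(x - m) / s for x, m, s in zip(vec, mean, std)]

def embed(vec, weights, bias):
    """Affine map followed by an elementwise tanh nonlinearity."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]

# Hypothetical probes for one server: (latency ms, throughput Mbps, probe success 0/1).
probes = [(120.0, 40.0, 1.0), (180.0, 35.0, 1.0), (150.0, 38.0, 0.0)]
gammas = (0.3, 0.3, 0.1)  # assumed per-metric smoothing constants

state = list(probes[0])  # initialise the running averages with the first probe
for probe in probes[1:]:
    state = [ewma_update(s, o, g) for s, o, g in zip(state, probe, gammas)]

# Assumed population moments and a toy 2-dimensional projection.
mean, std = (150.0, 40.0, 0.9), (50.0, 10.0, 0.1)
W = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.6]]
b = [0.0, 0.1]

embedding = embed(normalize(state, mean, std), W, b)
print(state, embedding)
```

In a trained system the projection weights and bias would come from fine-tuning rather than being fixed by hand, and the moments would be estimated from the probed server population.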
3. Semantic Embedding of Performance Indicators
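To generalize beyond basic metrics, JAUNT supports network vectors $n \in \mathbb{R}^m$ aggregating features such as latency, jitter, and packet loss. Embedding is performed via:
$$e_n = \phi(W n + b)$$
for learned $W \in \mathbb{R}^{d \times m}$, $b \in \mathbb{R}^d$ and pointwise nonlinearity $\phi$ (e.g., ReLU/tanh).
All network, intent, and tool embeddings reside in $\mathbb{R}^d$. Alignment can be evaluated using cosine similarity:
$$\cos(u, v) = \frac{u^\top v}{\lVert u \rVert\,\lVert v \rVert}.$$
Integration of the two views may use additive fusion of the intent and network embeddings, $e_u + e_n$, for downstream ranking.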
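As a rough illustration of cosine scoring with additive fusion, the sketch below sums an intent vector with each server's network vector and ranks tools by similarity to their description embedding. The function `rank_tools`, the tool names, and all vectors are hypothetical; the source does not specify how the fused vector is scored, so comparing it against the tool embedding is an assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_tools(intent_emb, network_embs, tool_embs):
    """Rank tool names by cosine(intent + network, tool), best first (assumed scoring rule)."""
    scores = {}
    for name, tool_emb in tool_embs.items():
        fused = [i + n for i, n in zip(intent_emb, network_embs[name])]
        scores[name] = cosine(fused, tool_emb)
    return sorted(scores, key=scores.get, reverse=True)

# Toy 2-dimensional embeddings: both tools match the query equally well,
# but their servers' network embeddings differ.
intent = [0.9, 0.3]
tool_embs = {"fast_search": [0.8, 0.6], "slow_search": [0.8, 0.6]}
network_embs = {"fast_search": [0.1, 0.5], "slow_search": [0.1, -0.5]}

ranking = rank_tools(intent, network_embs, tool_embs)
print(ranking)
```

Because the two tools share the same description embedding here, the ordering is decided entirely by the network view, which is the situation semantic-only routing cannot distinguish.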
4. User-Centric QoE and Benchmarking
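JAUNT refines QoE computation by incorporating the Weber–Fechner law, under which perceived intensity grows logarithmically with the stimulus, to model user-perceived latency. A basic binary success variable $s \in \{0, 1\}$ is extended with a delay penalty, and the two are combined into the final conditional QoE.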
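To make the idea of a logarithmic delay penalty concrete, here is a small sketch. The functional form, the reference delay `ref_ms`, the cap `max_ms`, and the way the delay weight discounts a successful call are all assumptions chosen for illustration; they are not the formulas from the paper.

```python
import math

def delay_penalty(delay_ms, ref_ms=100.0, max_ms=5000.0):
    """Log-shaped penalty in [0, 1]; ref_ms and max_ms are assumed constants."""
    clipped = min(max(delay_ms, 0.0), max_ms)
    return math.log1p(clipped / ref_ms) / math.log1p(max_ms / ref_ms)

def qoe(success, delay_ms, delay_weight):
    """Zero on failure; on success, 1 minus the delay penalty scaled by the user's delay weight."""
    if not success:
        return 0.0
    return 1.0 - delay_weight * delay_penalty(delay_ms)

# Same successful 230 ms call, scored for two hypothetical user profiles.
speed_first = qoe(True, 230.0, delay_weight=0.9)
accuracy_first = qoe(True, 230.0, delay_weight=0.1)
print(speed_first, accuracy_first)
```

Under this assumed form, doubling the delay from 100 ms to 200 ms costs more than doubling it from 1000 ms to 2000 ms would in relative perception terms only through the logarithm, and a delay-sensitive profile scores the same call lower than a delay-tolerant one.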
The TRIP benchmark evaluates methods across nine prototypical user profiles defined over the delay and task-satisfaction sensitivity weights, including "speed-first", "accuracy-first", "balanced", and "special". It incorporates ambiguous queries, tone-injected variants, and simulated network scenarios (smooth vs. random with prototypical jitter or stability patterns).
5. Empirical Results and Comparative Analysis
JAUNT and three baselines are benchmarked on TRIP + NetMCP:
- Direct-Routing (DirRout): BM25 matching of the user query to tool descriptions.
- Prediction-Routing (PreRout): LLM-classified category followed by BM25 within that category.
- JAUNT-Greedy: Semantic/latency joint selection, greedy for lowest-latency candidate.
- JAUNT: Full dual-alignment with LLM-based joint optimization.
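For readers unfamiliar with the lexical baseline, the following sketch shows Okapi BM25 scoring of a query against tool descriptions, roughly what a Direct-Routing style matcher would do. The tool names and descriptions are invented, tokenization is naive whitespace splitting, and the IDF uses a common non-negative variant; none of these details are taken from the paper.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

# Hypothetical tool catalogue.
tools = {
    "weather_lookup": "get current weather forecast for a city",
    "web_search": "search the web for pages matching a query",
    "currency_convert": "convert an amount between two currencies",
}
names = list(tools)
scores = bm25_scores("weather forecast in Paris", list(tools.values()))
best = names[max(range(len(names)), key=scores.__getitem__)]
print(best, scores)
```

A matcher like this sees only term overlap, so two servers exposing identically described tools receive identical scores regardless of their latency or availability.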
Across the evaluated user/query/network conditions, JAUNT improves QoE by 8.9% over JAUNT-Greedy (paired $t$-test). Results indicate that "impulsive" users (high delay sensitivity, low task-satisfaction sensitivity) benefit most from latency-sensitive routing, while "meticulous" users (low delay sensitivity, high task-satisfaction sensitivity) are better served by accuracy alignment. The user-profile update module further reduces QoE variance, especially under network randomness. JAUNT achieves a mean success rate of 0.94 at 230 ms latency; in contrast, JAUNT-Greedy attains 0.64 at 160 ms, highlighting the system's ability to navigate the accuracy–delay trade-off.
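A paired comparison of two routers evaluated on the same conditions can be computed as sketched below. The per-condition QoE values are synthetic placeholders made up for the example and are not results from the paper; the helper `paired_t_statistic` is likewise hypothetical and returns only the test statistic and degrees of freedom.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample standard deviation.
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_err, len(diffs) - 1

# Synthetic QoE scores for the same five conditions under two routers.
router_a = [0.70, 0.65, 0.80, 0.75, 0.72]
router_b = [0.64, 0.60, 0.73, 0.70, 0.66]

t_stat, dof = paired_t_statistic(router_a, router_b)
print(t_stat, dof)
```

Pairing by condition removes the variance shared by both routers (for example, an unusually hard query or a congested network trace), which is why it suits benchmarks where every method is run on identical scenarios.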
6. Limitations and Future Directions
JAUNT's mapping from raw network metrics to embeddings is learned offline, so it can degrade under nonstationary network patterns; online continual tuning is a potential mitigation. The framework assumes static sensitivity weights for each user, though these parameters may drift or be task-dependent, so a more dynamic user-profile module is a relevant extension. Additionally, the LLM-based routing introduces extra inference latency, and high-throughput scenarios may require lightweight approximations (such as neural routers).
The modular architecture, which separates semantic, network, and joint routing layers, facilitates eventual augmentation with multi-agent coordination, cross-platform orchestration, or more sophisticated user adaptation mechanisms. The reported results indicate that aligning tool selection with both semantic and operational considerations via shared embeddings yields statistically significant QoE gains for LLM-based service orchestration (Li et al., 21 Oct 2025).