JAUNT: QoE-Centric LLM Tool Routing

Updated 13 April 2026
  • JAUNT is a framework that aligns user intent and real-time network state using dual-view semantic embeddings to enhance LLM tool routing.
  • It employs continuous network profiling and user-sensitive routing to address challenges like latency and server availability.
  • Empirical evaluations show that JAUNT improves QoE by effectively balancing the trade-off between delay and accuracy across diverse user profiles.

JAUNT (Joint Alignment of User Intent and Network State for QoE-centric LLM Tool Routing) is a framework designed to optimize tool selection for LLM agents by aligning both user intent and network state within a unified semantic embedding space. This joint alignment approach aims to maximize user-centric Quality of Experience (QoE), overcoming the limitations of conventional tool routing strategies that consider only semantic or functional matching and neglect external factors such as latency and server availability. JAUNT introduces a dual-view alignment architecture, continuous network profiling, user-profile–sensitive routing, and a comprehensive benchmark for QoE evaluation, demonstrating significant improvements over prior baselines (Li et al., 21 Oct 2025).

1. Dual-View Alignment Strategy

JAUNT formulates tool routing as an alignment problem between two embedding spaces: user intent and network state. The intent embedding $\phi_{\rm intent}$ maps the user's raw query $q$ and long-term profile $u$ (parameterized by sensitivity weights $w_1$ for delay and $w_2$ for task satisfaction) to a $d$-dimensional semantic vector:

$$\phi_{\rm intent}: (q, u) \mapsto e_I \in \mathbb{R}^d$$

Similarly, the network-state embedding $\phi_{\rm net}$ projects a real-time network metric vector $x \in \mathbb{R}^n$ (e.g., latency, throughput, availability) into the same semantic space:

$$\phi_{\rm net}: x \mapsto e_N \in \mathbb{R}^d$$

Given a fixed pretrained embedding of each candidate tool description, JAUNT jointly aligns these vectors by minimizing a weighted objective with two terms:

[objective and loss-term equations not recoverable from source]

The first term measures semantic alignment between the intent embedding and the ground-truth tool, and the second aligns network profiles with expected tool performance. End-to-end fine-tuning of $\phi_{\rm intent}$ and $\phi_{\rm net}$ ensures that candidate tools are both task-relevant and network-efficient.

2. Construction of Network Profiles and Embedding

The network profile for each Model Context Protocol (MCP) server is constructed via lightweight, regularly scheduled probes. For each server, three core metrics are collected:

  • Round-trip latency (ms): request-response time.
  • Throughput (Mbps): measured by fixed-size dummy downloads.
  • Availability: fraction of successful, error-free probes.

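Raw observations are smoothed using Exponentially Weighted Moving Averages (EWMAs). For latency, writing $\ell_t$ for the raw observation at probe $t$ and $\bar{\ell}_t$ for its smoothed value, the standard EWMA update is:

$$\bar{\ell}_t = \alpha\,\ell_t + (1 - \alpha)\,\bar{\ell}_{t-1}$$

(the same update applies to throughput and availability), with smoothing constants $\alpha$ tuned per metric.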
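The smoothing step can be sketched as a small stateful helper. This is a minimal illustration, not the authors' implementation: the class name `Ewma`, the convention that `alpha` weights the newest observation, and the choice to seed the average with the first probe are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ewma:
    """Exponentially weighted moving average of a single probe metric."""
    alpha: float                   # weight on the newest observation (assumed convention)
    value: Optional[float] = None  # current smoothed estimate, None before the first probe

    def update(self, observation: float) -> float:
        if self.value is None:
            # First probe: seed the average with the raw observation (assumption).
            self.value = observation
        else:
            self.value = self.alpha * observation + (1.0 - self.alpha) * self.value
        return self.value


# Hypothetical latency probes in milliseconds.
latency = Ewma(alpha=0.5)
for sample_ms in [100.0, 200.0, 100.0]:
    smoothed = latency.update(sample_ms)
```

One `Ewma` instance per metric and per server keeps the per-metric constants independent, which matches the statement that they are tuned separately.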

The instantaneous network vector $x$ is normalized elementwise:

$$\hat{x} = \frac{x - \mu}{\sigma}$$

then embedded using an affine map and elementwise nonlinearity $g$:

$$e_N = g(W\hat{x} + b)$$

where $W$, $b$ are trainable, and $\mu$, $\sigma$ are empirical population moments. An RBF kernel can optionally be applied but was not found to be necessary empirically.
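A minimal sketch of the normalize-then-embed step in plain Python follows. The function names, the `eps` guard against zero variance, the choice of tanh as the nonlinearity, and all numeric values are illustrative assumptions; in the framework the weights would be learned rather than hand-set.

```python
import math
from typing import List, Sequence


def normalize(x: Sequence[float], mean: Sequence[float],
              std: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Elementwise z-score; eps avoids division by zero (assumption)."""
    return [(xi - m) / (s + eps) for xi, m, s in zip(x, mean, std)]


def embed(x_hat: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Affine map followed by elementwise tanh; weights has one row per output dimension."""
    return [
        math.tanh(sum(w * v for w, v in zip(row, x_hat)) + b)
        for row, b in zip(weights, bias)
    ]


# Hypothetical probe vector: latency (ms), throughput (Mbps), availability.
x = [120.0, 50.0, 0.9]
mean = [100.0, 40.0, 0.95]
std = [20.0, 10.0, 0.05]

# Hand-set 2x3 weight matrix and bias, standing in for trained parameters.
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 1.0]]
bias = [0.0, 0.0]

x_hat = normalize(x, mean, std)
e_net = embed(x_hat, weights, bias)
```

Normalizing before the affine map keeps metrics with very different units (milliseconds, Mbps, fractions) on a comparable scale, so no single metric dominates the embedding by magnitude alone.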

3. Semantic Embedding of Performance Indicators

To generalize beyond basic metrics, JAUNT supports network vectors $x \in \mathbb{R}^n$ aggregating features such as latency, jitter, packet loss, etc. Embedding is performed via:

$$e_N = g(Wx + b)$$

for learned $W \in \mathbb{R}^{d \times n}$, $b \in \mathbb{R}^d$ and pointwise nonlinearity $g$ (e.g., ReLU/tanh).

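All network, intent, and tool embeddings reside in $\mathbb{R}^d$. Alignment can be evaluated using cosine similarity, which for any two embeddings $a, b \in \mathbb{R}^d$ is:

$$\cos(a, b) = \frac{a^\top b}{\|a\|\,\|b\|}$$

Integration of the two views may use additive fusion of the intent and network embeddings for downstream ranking.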
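The sketch below shows one way cosine similarity and additive fusion could be combined to rank tools. It is an assumption-laden illustration: the function `rank_tools`, the idea of adding each tool's server embedding to the intent embedding before scoring against the tool embedding, and the toy 2-dimensional vectors are hypothetical and not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_tools(e_intent: Sequence[float],
               e_net_by_tool: Dict[str, Sequence[float]],
               e_tool_by_name: Dict[str, Sequence[float]]) -> List[str]:
    """Rank tools by cosine(e_intent + e_net, e_tool), best first (assumed scoring rule)."""
    scores = {}
    for name, e_tool in e_tool_by_name.items():
        fused = [i + n for i, n in zip(e_intent, e_net_by_tool[name])]
        scores[name] = cosine(fused, e_tool)
    return sorted(scores, key=scores.get, reverse=True)


# Two semantically identical tools hosted on servers with different network embeddings.
e_intent = [1.0, 0.0]
e_tool_by_name = {"tool_a": [1.0, 0.0], "tool_b": [1.0, 0.0]}
e_net_by_tool = {"tool_a": [1.0, 0.0], "tool_b": [0.0, 1.0]}

ranking = rank_tools(e_intent, e_net_by_tool, e_tool_by_name)
```

In this toy case both tools match the intent equally well, so the network view alone breaks the tie in favor of `tool_a`.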

4. User-Centric QoE and Benchmarking

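JAUNT refines QoE computation by incorporating the Weber–Fechner law, under which perceived intensity grows with the logarithm of the physical stimulus, to model user-perceived latency. A basic binary success variable is extended by three expressions: an extended success term, a delay penalty, and the final conditional QoE:

[QoE equations not recoverable from source]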
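Because the exact QoE expressions are not available here, the following is only a hypothetical sketch of how a Weber–Fechner-style logarithmic delay penalty could enter a conditional QoE score. The functional form, the function names, and the parameters `ref_ms` and `max_ms` are all assumptions made for illustration and should not be read as the paper's formula.

```python
import math


def delay_penalty(delay_ms: float, ref_ms: float = 100.0,
                  max_ms: float = 5000.0) -> float:
    """Logarithmic penalty in [0, 1]: equal delay ratios matter more than equal differences."""
    clipped = min(max(delay_ms, 0.0), max_ms)
    return math.log1p(clipped / ref_ms) / math.log1p(max_ms / ref_ms)


def qoe(success: bool, delay_ms: float, w_delay: float, w_task: float) -> float:
    """Hypothetical conditional QoE: zero on failure, else task reward minus weighted penalty."""
    if not success:
        return 0.0
    return max(0.0, w_task - w_delay * delay_penalty(delay_ms))


# The same delays scored for a delay-sensitive and a delay-tolerant profile.
fast_sensitive = qoe(True, 160.0, w_delay=0.8, w_task=1.0)
slow_sensitive = qoe(True, 230.0, w_delay=0.8, w_task=1.0)
slow_tolerant = qoe(True, 230.0, w_delay=0.1, w_task=1.0)
```

The logarithm makes the first 100 ms of delay cost more than the next 100 ms, which is the qualitative behavior the Weber–Fechner law predicts for perceived waiting time.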

The TRIP benchmark evaluates methods across nine prototypical user profiles, each a setting of the sensitivity weights, including "speed-first", "accuracy-first", "balanced", and "special". It incorporates ambiguous queries, tone-injected variants, and simulated network scenarios (smooth vs. random with prototypical jitter or stability patterns).

5. Empirical Results and Comparative Analysis

JAUNT and three baselines are benchmarked on TRIP + NetMCP:

  • Direct-Routing (DirRout): BM25 matching of the query $q$ to tool descriptions.
  • Prediction-Routing (PreRout): LLM-classified category followed by BM25 within that category.
  • JAUNT-Greedy: Semantic/latency joint selection, greedy for lowest-latency candidate.
  • JAUNT: Full dual-alignment with LLM-based joint optimization.
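For readers unfamiliar with the lexical baseline, the sketch below shows BM25 ranking of tool descriptions against a query, in the spirit of DirRout. It is a minimal illustration under stated assumptions: whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant `log(1 + (N - df + 0.5) / (df + 0.5))`, and the toy tool catalogue are not taken from the paper, and the function expects a non-empty catalogue.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_rank(query: str, docs: Dict[str, str],
              k1: float = 1.5, b: float = 0.75) -> List[str]:
    """Return tool names sorted by BM25 score against the query, best first."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs

    # Document frequency: number of descriptions containing each term.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in tokenized.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log(1.0 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1.0) / (term_freq[term] + length_norm)
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical tool catalogue.
tools = {
    "weather": "get current weather forecast for a city",
    "search": "search the web for pages",
    "calc": "evaluate a math expression",
}
best_first = bm25_rank("weather forecast city", tools)
```

A purely lexical ranker like this has no view of latency or availability, which is the gap the network-aware methods in the list are meant to close.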

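Across the evaluated user/query/network conditions (the condition count and the per-method scores are not recoverable from the source):

  • DirRout: [result missing in source]
  • PreRout: [result missing in source]
  • JAUNT-Greedy: [result missing in source]
  • JAUNT: [result missing in source]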

JAUNT improves QoE by 8.9% over JAUNT-Greedy (paired $t$-test; the significance level is not preserved in the source). Results indicate that "impulsive" users (high $w_1$, low $w_2$) benefit most from latency-sensitive routing, while "meticulous" users (low $w_1$, high $w_2$) are better served by accuracy alignment. The user-profile update module further reduces QoE variance, especially under network randomness. JAUNT achieves a mean success rate of 0.94 at 230 ms latency, whereas JAUNT-Greedy attains 0.64 at 160 ms: JAUNT accepts roughly 70 ms of additional latency in exchange for a 0.30 gain in success rate, which illustrates how it navigates the accuracy–delay trade-off.

6. Limitations and Future Directions

JAUNT's mapping $\phi_{\rm net}$ from raw metrics to embeddings is learned offline, so its quality can degrade under nonstationary network patterns; online continual tuning is a potential mitigation. The framework assumes static weights $(w_1, w_2)$ for each user, although these parameters may drift or depend on the task, so a more dynamic user-profile module is a relevant extension. Additionally, LLM-based routing introduces extra inference latency, and high-throughput scenarios may require lightweight approximations (such as neural routers).

The modular architecture, which separates semantic, network, and joint routing layers, facilitates later extension with multi-agent coordination, cross-platform orchestration, or more sophisticated user adaptation mechanisms. The reported evaluation supports the central claim: aligning tool selection with both semantic and operational considerations via shared embeddings yields statistically significant QoE gains for LLM-based service orchestration (Li et al., 21 Oct 2025).

References (1)