CARROT: Cost-Aware Rate-Optimal Router

Updated 11 November 2025

CARROT is a modular framework for cost-aware, rate-optimal routing and scheduling that employs duality, thresholding, and water-filling techniques to optimize system utility.
It is instantiated in diverse systems, including Cloud-RAN, SD-WAN controllers, and LLM orchestrators, balancing computational cost against performance.
Empirical results demonstrate significant cost reductions with minimal utility loss, backed by theoretical guarantees and scalable, real-time implementations.

CARROT (Cost AwaRe Rate Optimal rouTer) denotes a family of routing and scheduling methodologies with formal rate-optimality and explicit cost-awareness constraints, as instantiated across cloud radio access networks, discrete energy-aware network routing, SD-WAN policy controllers, opportunistic wireless multirate protocols, and learned LLM orchestration systems. While the domain contexts span physical-layer communications to modular software orchestration, all share a common mathematical formulation: maximizing utility (e.g., sum-rate, accuracy, throughput) subject to a computational, cost, or budget constraint. This formulation is typically solved with duality-based, thresholding, water-filling, or information-theoretic techniques.

1. Mathematical Formulations and Common Structure

Across all instantiations, CARROT systems solve a constrained optimization problem: $\begin{aligned} &\max_{\{r_k\}} \sum_{k} u_k(r_k) \qquad \text{s.t. } \sum_k c_k(r_k) \leq C_\max \,,\quad r_k \in \mathcal{R}_k \end{aligned}$ where $u_k$ is per-flow utility (e.g., achievable rate, model accuracy), $c_k$ is cost (e.g., computational cycles, LLM call price, energy), $C_\max$ is the global resource budget, and $\mathcal{R}_k$ the feasible set (coding rates, routing paths, model selections).
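
As a brief worked step (a sketch assuming a single budget constraint with multiplier $\lambda \geq 0$, which is notation introduced here for illustration), the Lagrangian of this problem is

$\mathcal{L}(\{r_k\}, \lambda) = \sum_k \left[ u_k(r_k) - \lambda\, c_k(r_k) \right] + \lambda\, C_\max$

For fixed $\lambda$ the sum separates across flows, so each $r_k$ can be chosen independently by maximizing $u_k(r_k) - \lambda\, c_k(r_k)$ over $\mathcal{R}_k$. What remains is a one-dimensional search for the value of $\lambda$ at which the budget is met, which is why per-flow rules combined with a scalar search recur across the instantiations.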

This structure underpins systems such as:

Cloud-RAN scheduler optimizing sum-rate subject to decoder complexity (Rost et al., 2015),
SD-WAN controller minimizing congestion and cost under SLA constraints (Quang et al., 2022),
LLM router minimizing expected cost under a target quality loss (Ding et al., 2024),
Web agent router penalizing both redundant prompt encoding and LLM invocation cost (Li et al., 13 Oct 2025),
RL-based modular tool orchestrators trading success rate vs. price (Qian et al., 9 Oct 2025),
Discrete-rate network routers minimizing total power under combinatorial routing (Wang et al., 2013).

2. Algorithmic Realizations: Duality, Water-Filling, and Thresholding

CARROT algorithms instantiate several key solution paradigms:

Lagrangian duality and water-filling: In Cloud-RAN scheduling, the KKT conditions yield the water-filling allocation

$r_k = \frac{1}{2\alpha_k}\left(\frac{1}{\lambda} - \beta_k\right)^+\quad \text{subject to } \sum_k (\alpha_k r_k^2 + \beta_k r_k) = C_\max$

The dual multiplier $\lambda$ is efficiently found via bisection.
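
The bisection step can be sketched as follows. This is a minimal illustration, not the scheduler from the cited paper: it assumes a sum-rate utility, strictly positive $\alpha_k$ and $\beta_k$, and a budget small enough to be binding. The function name `water_fill` and the numeric inputs are hypothetical.

```python
def water_fill(alpha, beta, c_max, tol=1e-10, max_iter=200):
    """Maximize sum(r_k) s.t. sum(alpha_k*r_k^2 + beta_k*r_k) <= c_max, r_k >= 0.

    Assumes alpha_k > 0 and beta_k > 0 for every flow (illustrative assumption).
    Returns (rates, lam), where lam is the dual multiplier found by bisection.
    """
    def rates(lam):
        # Water-filling allocation: r_k = (1/lam - beta_k)^+ / (2*alpha_k)
        return [max(0.0, (1.0 / lam - b) / (2.0 * a)) for a, b in zip(alpha, beta)]

    def cost(r):
        return sum(a * x * x + b * x for a, b, x in zip(alpha, beta, r))

    # Total cost decreases as lam grows. At lam = 1/min(beta) every rate is
    # zero (always feasible); a tiny lam is over budget for a binding budget.
    lo, hi = 1e-12, 1.0 / min(beta)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if cost(rates(mid)) > c_max:
            lo = mid  # over budget: increase the price of cost
        else:
            hi = mid  # within budget: try a lower price
        if hi - lo < tol:
            break
    return rates(hi), hi


# Hypothetical two-flow example.
example_rates, example_lam = water_fill([1.0, 2.0], [0.5, 1.0], c_max=3.0)
```

Returning the allocation at the upper end of the bracket keeps the result within budget even when the search stops early. Flows whose $\beta_k$ exceeds $1/\lambda$ receive zero rate, which is the usual water-filling cutoff.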

Threshold rules for routing: In hybrid LLM routing, the optimal action for query $x$ is

$\pi^*(x; \lambda) = \begin{cases} \text{small model} & \text{if } c_s + \lambda \ell_s(x) \leq c_L + \lambda \ell_L(x) \\ \text{large model} & \text{otherwise} \end{cases}$

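which, for $\lambda > 0$, reduces to a threshold test on query difficulty: taking $d(x) = \ell_s(x) - \ell_L(x)$ (the quality gap between the two models), queries with $d(x) \leq \tau(\lambda) = (c_L - c_s)/\lambda$ go to the small model and harder queries go to the large model.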
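
As an illustrative sketch of the rule above, the cost comparison can be coded directly and checked against its rearranged form, a test of the loss gap $\ell_s(x) - \ell_L(x)$ against $(c_L - c_s)/\lambda$. The function names and all numeric values are hypothetical, and in practice the per-query losses would be predicted by a learned router rather than known in advance.

```python
def route(c_small, c_large, loss_small, loss_large, lam):
    """Pick the model with the lower Lagrangian score cost + lam * loss."""
    if c_small + lam * loss_small <= c_large + lam * loss_large:
        return "small"
    return "large"


def route_by_gap(c_small, c_large, loss_small, loss_large, lam):
    """Equivalent form for lam > 0: compare the loss gap against a threshold."""
    threshold = (c_large - c_small) / lam
    return "small" if (loss_small - loss_large) <= threshold else "large"


# Hypothetical costs and losses: an easy query (small gap) and a hard one.
easy = route(c_small=1.0, c_large=10.0, loss_small=0.30, loss_large=0.25, lam=20.0)
hard = route(c_small=1.0, c_large=10.0, loss_small=0.90, loss_large=0.20, lam=20.0)
```

With these numbers the threshold is 0.45, so the easy query (gap 0.05) stays on the small model and the hard query (gap 0.70) is escalated. Raising $\lambda$ lowers the threshold and sends more traffic to the large model.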

Relaxation and rounding: For network routing with discrete energy costs, a non-convex integer program is relaxed by approximating step cost functions with convex surrogates, then randomized rounding extracts an unsplittable path for each commodity. The cost-optimal fractional solution is rounded with probabilistic guarantees (constant factor for uniform demands, $O(\log^{\beta-1} d)$ for arbitrary demands) (Wang et al., 2013).
Variational bottleneck with cost penalty: In cost-sensitive web agents, CARROT employs a variational information bottleneck objective penalizing both information rate and expected operational cost:

$L = E_{z \sim q_\phi(z|x)} [-\log p_\theta(y|z)] + \beta~I_\phi(z;x) + \lambda~E_{q_\phi(z|x)}[c(y|x)]$

with the router learning compressed representations trading routing accuracy versus cost (Li et al., 13 Oct 2025).
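
To make the three terms of this objective concrete, the following is a single-sample sketch under loudly stated assumptions. It swaps in a diagonal Gaussian encoder with a standard normal prior, so that the rate term is replaced by the closed-form KL divergence (a standard variational upper bound on $I(z;x)$ in expectation over $x$), and it approximates the cost term by the expected cost under the router's output distribution. None of these choices are taken from the cited system, and `vib_router_loss` and `toy_decoder` are hypothetical names.

```python
import math
import random


def vib_router_loss(mu, logvar, decoder, target, costs, beta, lam, rng):
    """Single-sample estimate of NLL + beta * KL + lam * expected cost.

    mu, logvar : encoder outputs for one input x (diagonal Gaussian q(z|x))
    decoder    : callable mapping a latent z to one logit per candidate model
    target     : index of the correct candidate for this input
    costs      : per-candidate invocation cost (hypothetical units)
    """
    # Reparameterized sample z ~ N(mu, diag(exp(logvar))).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

    # Numerically stable softmax over the decoder logits.
    logits = decoder(z)
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    nll = -math.log(probs[target])
    # KL(N(mu, sigma^2) || N(0, I)), used here in place of the rate term.
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))
    expected_cost = sum(p * c for p, c in zip(probs, costs))
    return nll + beta * kl + lam * expected_cost


def toy_decoder(z):
    """Hypothetical linear decoder over two candidate models."""
    return [z[0] - z[1], z[1] - z[0]]


loss = vib_router_loss([0.5, -0.5], [-1.0, -1.0], toy_decoder, target=0,
                       costs=[1.0, 10.0], beta=0.01, lam=0.1,
                       rng=random.Random(0))
```

Larger $\beta$ pushes the encoder toward the prior (more compression), while larger $\lambda$ shifts probability mass toward cheaper candidates.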

3. Implementation Paradigms and Deployment Architectures

CARROT instantiations are characterized by real-time, scalable control logic, often split into hierarchical or two-stage loops:

Centralized controller: In SD-WAN deployments, a global controller runs SPR (Smart Policy Routing) periodically (about every 50 s) to update ingress-router policies for path selection, followed by a QoS allocation loop (about every 10 s) that assigns shaper rates to each flow group. Edge agents enforce next-hop assignments and traffic shaping independently (Quang et al., 2022).
Modular orchestration pipeline: For LLM orchestration, the router emits tool-calling actions via structured OpenAI-style function calls; an external microservice orchestrates model invocations and returns outputs, separating routing policy from implementation details (Qian et al., 9 Oct 2025).
Efficient per-query routing: Hybrid-LLM CARROT routers apply a monotonic score function per query, comparing against a calibrated threshold; the router overhead is negligible compared to model inference (0.036s per query vs. 0.46–15s for large models) (Ding et al., 2024).
Information-theoretic encoding: WebRouter's pipeline tokenizes the full prompt (goal, history, state), encodes via mDeBERTaV3-base, learns stochastic binary masks, and routes to candidate LLMs with soft cost weighting (Li et al., 13 Oct 2025).
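
The threshold calibration mentioned for per-query routing can be sketched as a scan over held-out data. This is an assumption-laden illustration rather than the procedure of any cited paper: it assumes a higher router score means the small model is more likely adequate, that queries scoring at or above the threshold go to the small model, and that a per-query quality drop is available on the held-out set. The name `calibrate_threshold` and the data are hypothetical.

```python
def calibrate_threshold(scores, quality_drop, max_avg_drop):
    """Find the lowest threshold whose routing keeps the average quality drop
    (over all held-out queries) within max_avg_drop.

    Returns (threshold, fraction_routed_to_small). If no threshold is
    feasible, returns (inf, 0.0), i.e. every query goes to the large model.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    best_threshold, best_fraction = float("inf"), 0.0
    cumulative_drop = 0.0
    for k, i in enumerate(order, start=1):
        cumulative_drop += quality_drop[i]
        # A threshold can only separate distinct score values, so tied
        # scores must be routed together.
        at_boundary = k == n or scores[order[k]] < scores[i]
        if at_boundary and cumulative_drop / n <= max_avg_drop:
            best_threshold, best_fraction = scores[i], k / n
    return best_threshold, best_fraction


# Hypothetical held-out scores and per-query quality drops.
val_scores = [0.9, 0.8, 0.6, 0.4, 0.2]
val_drops = [0.0, 0.01, 0.02, 0.10, 0.30]
threshold, small_fraction = calibrate_threshold(val_scores, val_drops, max_avg_drop=0.01)
```

In this toy data the three highest-scoring queries can be served by the small model within the 0.01 average-drop budget, so the threshold lands at 0.6 and 60% of queries avoid the large model.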

4. Empirical Results and Performance Trade-offs

Quantitative analyses across use cases demonstrate strong cost-rate-optimal trade-offs:

System (Paper)	Cost Reduction	Utility Loss	Coverage
Cloud-RAN (Rost et al., 2015)	50% reduction in 90th percentile computational load (ε=10%)	<0.3% avg sum-rate loss	Outage-free
Hybrid LLM (Ding et al., 2024)	up to 40% fewer large-model calls	~3% drop in BART-score	No quality drop at 20% cut
SD-WAN (Quang et al., 2022)	26% cost savings (MLU-QoS vs. baseline)	SLA exceeded (≥95%*)	80% delay-reduction (95th)
WebRouter (Li et al., 13 Oct 2025)	87.8% cost reduction vs. GPT-4o	3.8% accuracy loss	5 real-world sites
RL-Orchestrator (Qian et al., 9 Oct 2025)	88% cost reduction at fixed accuracy (math/code)	10–20% performance gap vs. proprietary APIs	Adaptivity over model catalog

All variants report the same qualitative behavior: performance stays close to the unconstrained optimum until the cost or complexity budget becomes tightly binding, after which it degrades gracefully. For instance, in Cloud-RAN, computational outages are fully avoided with sub-1% throughput loss; in hybrid LLM routing, thresholds calibrated on held-out data generalize well and achieve nearly zero quality loss at moderate cost savings.

5. Theoretical Guarantees and Limitations

CARROT-based methods provide formal guarantees under their respective mathematical frameworks:

Optimality via duality: Threshold-routing and water-filling schedules are provably optimal solutions of their Lagrangian relaxations for the given quality budgets or resource constraints (Rost et al., 2015; Ding et al., 2024).
Approximation ratios: Relaxation–rounding yields constant-factor cost approximations for uniform demands and $O(\log^{\beta-1} d)$ for arbitrary demands in discrete-rate routing (Wang et al., 2013).
Pareto-optimality: Information bottleneck objectives trace the theoretical Pareto frontier in rate–distortion–cost space (Li et al., 13 Oct 2025).
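
The rounding half of the relaxation-and-rounding approach can be illustrated with a short sketch. It assumes the fractional solution has already been computed and decomposed into per-commodity path weights that sum to 1. The data layout and the name `round_paths` are hypothetical, and the sketch shows only the sampling step, not the analysis behind the approximation ratios.

```python
import random


def round_paths(fractional, rng):
    """Sample one unsplittable path per commodity.

    fractional : {commodity: {path: weight}}, weights summing to 1 per commodity
    rng        : random.Random instance (seeded for reproducibility)
    Each path is chosen with probability equal to its fractional weight.
    """
    chosen = {}
    for commodity, weights in fractional.items():
        paths = list(weights)
        probs = [weights[p] for p in paths]
        chosen[commodity] = rng.choices(paths, weights=probs, k=1)[0]
    return chosen


# Hypothetical fractional flow: paths are tuples of node names.
fractional_flow = {
    "c1": {("s", "a", "t"): 0.7, ("s", "b", "t"): 0.3},
    "c2": {("s", "b", "t"): 1.0},
}
assignment = round_paths(fractional_flow, random.Random(42))
```

Because each commodity keeps its fractional weights as selection probabilities, the expected load on every link matches the fractional solution, which is the property the rounding analysis builds on.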

Typical limitations:

Continuous rate allocations or latent encodings must be discretized in practice,
Quality gap estimation must be accurate for optimal cost-saving in model routing,
RL-based routers require stratified, difficulty-balanced training sets,
Dual-decomposition and convex-approximation are required for distributed or scalable implementation in large networks.

6. Extensions and Contextual Adaptations

CARROT’s rate-optimal, cost-constrained logic has been generalized to:

Multi-user and multi-antenna wireless scenarios (by per-stream scheduling),
Routing for discrete energy-aware links,
Modular inference systems spanning generic LLM catalogs,
Policy optimization for application-aware SD-WAN overlays,
Information-theoretic compression frameworks for web-agent prompt routing.

Key generalizations include multi-model extension (for $N \gg 2$ experts), joint handling of fronthaul capacity and computation, weighted-fairness objectives ( $\sum_k w_k r_k$ ), and dynamic adaptation via real-time controller loops.
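
One plausible reading of the multi-model extension, sketched here as an assumption rather than a documented algorithm, is to apply the same Lagrangian score to every candidate and pick the minimizer. The function name `route_multi` and the cost and loss figures are hypothetical.

```python
def route_multi(costs, losses, lam):
    """Return the index of the model minimizing costs[i] + lam * losses[i].

    costs  : per-model invocation cost
    losses : estimated per-model quality loss for the current query
    lam    : price of quality loss relative to cost (lam >= 0)
    """
    return min(range(len(costs)), key=lambda i: costs[i] + lam * losses[i])


# Hypothetical catalog of three models, cheapest first.
catalog_costs = [1.0, 5.0, 20.0]
query_losses = [0.5, 0.2, 0.05]
picks = [route_multi(catalog_costs, query_losses, lam) for lam in (0.0, 20.0, 1000.0)]
```

As $\lambda$ grows, the selection moves from the cheapest model to the most accurate one, tracing the cost-quality trade-off across the catalog. With two models this reduces to the pairwise comparison used in hybrid routing.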

Underlying all variants is the interplay of cost (tokens, energy, complexity) and rate (throughput, accuracy, utility), mediated via dual variables, explicit penalty terms, or soft information bottlenecks, yielding principled and implementable solutions for cost-sensitive systems.

In summary, CARROT defines a general methodology for cost-aware, rate-optimal scheduling and routing, applicable to diverse networking, inference, and orchestration contexts. Its core principles—constrained optimization, duality-based resource allocation, and empirical rate–cost–utility trade-offs—are recurrent across wireless, SD-WAN, energy-aware, and LLM-routing systems, underpinning robust, scalable, and theoretically justified deployment architectures.