CARROT: Cost-Aware Rate-Optimal Router
- CARROT is a modular framework of cost-aware, rate-optimal routing and scheduling that employs duality, thresholding, and water-filling techniques to optimize system utility.
- It is instantiated in diverse systems, including Cloud-RAN, SD-WAN controllers, and LLM orchestrators, balancing computational cost against performance.
- Empirical results demonstrate significant cost reductions with minimal utility loss, backed by theoretical guarantees and scalable, real-time implementations.
CARROT (Cost AwaRe Rate Optimal rouTer) denotes a family of routing and scheduling methodologies with formal rate-optimality and explicit cost-awareness constraints, as instantiated across cloud radio access networks, discrete energy-aware network routing, SD-WAN policy controllers, opportunistic wireless multirate protocols, and learned LLM orchestration systems. While the domain contexts span physical-layer communications to modular software orchestration, all share a common mathematical formulation: maximizing utility (e.g., sum-rate, accuracy, throughput) subject to a computational, cost, or budget constraint, often realized through duality-based, thresholding, water-filling, or information-theoretic techniques.
1. Mathematical Formulations and Common Structure
Across all instantiations, CARROT systems solve a constrained optimization problem: $\begin{aligned} &\max_{\{r_k\}} \sum_{k} u_k(r_k) \qquad \text{s.t. } \sum_k c_k(r_k) \leq C_\max \,,\quad r_k \in \mathcal{R}_k \end{aligned}$ where $u_k(r_k)$ is the per-flow utility (e.g., achievable rate, model accuracy), $c_k(r_k)$ is the cost (e.g., computational cycles, LLM call price, energy), $C_\max$ is the global resource budget, and $\mathcal{R}_k$ is the feasible set (coding rates, routing paths, model selections).
This structure underpins systems such as:
- Cloud-RAN scheduler optimizing sum-rate subject to decoder complexity (Rost et al., 2015),
- SD-WAN controller minimizing congestion and cost under SLA constraints (Quang et al., 2022),
- LLM router minimizing expected cost under a target quality loss (Ding et al., 22 Apr 2024),
- Web agent router penalizing both redundant prompt encoding and LLM invocation cost (Li et al., 13 Oct 2025),
- RL-based modular tool orchestrators trading success rate vs. price (Qian et al., 9 Oct 2025),
- Discrete-rate network routers minimizing total power under combinatorial routing (Wang et al., 2013).
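The constrained problem at the start of this section can be made concrete with a toy instance. The following is a minimal sketch, assuming each feasible set $\mathcal{R}_k$ is a short list of (utility, cost) options and that exhaustive search is affordable; the function name `solve_budgeted_selection` and all numbers are hypothetical, and the systems cited above use far more scalable methods than enumeration.

```python
from itertools import product


def solve_budgeted_selection(options, budget):
    """Exhaustively pick one (utility, cost) option per flow.

    options: list over flows; each entry is a list of (utility, cost) pairs,
             standing in for a small discrete feasible set R_k (made-up data).
    budget:  the global budget C_max.
    Returns (best_total_utility, chosen_indices), or (None, None) if no
    combination fits within the budget.
    """
    best_utility, best_choice = None, None
    # Enumerate every combination of one option index per flow.
    for choice in product(*(range(len(opts)) for opts in options)):
        total_cost = sum(options[k][i][1] for k, i in enumerate(choice))
        if total_cost > budget:
            continue  # violates the budget constraint
        total_utility = sum(options[k][i][0] for k, i in enumerate(choice))
        if best_utility is None or total_utility > best_utility:
            best_utility, best_choice = total_utility, choice
    return best_utility, best_choice


# Two flows, each with a cheap and an expensive option (illustrative numbers).
options = [
    [(1.0, 1.0), (3.0, 4.0)],  # flow 0: (utility, cost)
    [(2.0, 2.0), (2.5, 5.0)],  # flow 1: (utility, cost)
]
best_utility, best_choice = solve_budgeted_selection(options, budget=6.0)
print(best_utility, best_choice)  # 5.0 (1, 0)
```

With a budget of 6, the search upgrades flow 0 and keeps flow 1 on its cheap option, because that upgrade buys more utility per unit of cost.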
2. Algorithmic Realizations: Duality, Water-Filling, and Thresholding
CARROT algorithms instantiate several key solution paradigms:
- Lagrangian duality and water-filling: In Cloud-RAN scheduling, the KKT conditions yield the water-filling allocation
$r_k = \frac{1}{2\alpha_k}\left(\frac{1}{\lambda} - \beta_k\right)^+\quad \text{subject to } \sum_k (\alpha_k r_k^2 + \beta_k r_k) = C_\max$
The dual multiplier $\lambda$ is efficiently found via bisection, since the total cost of the allocation decreases monotonically as $\lambda$ increases.
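- Threshold rules for routing: In hybrid LLM routing, the optimal action for each query is a threshold rule on the router's score, which reduces to sending all queries whose estimated difficulty falls below a calibrated threshold to the small model and the rest to the large model.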
- Relaxation and rounding: For network routing with discrete energy costs, a non-convex integer program is relaxed by approximating step cost functions with convex surrogates, then randomized rounding extracts an unsplittable path for each commodity. The cost-optimal fractional solution is rounded with probabilistic guarantees: a constant-factor approximation for uniform demands, and a separate bound for arbitrary demands (Wang et al., 2013).
- Variational bottleneck with cost penalty: In cost-sensitive web agents, CARROT employs a variational information bottleneck objective that combines the routing loss with penalties on both the information rate of the learned representation and the expected operational cost, so that the router learns compressed representations that trade routing accuracy against cost (Li et al., 13 Oct 2025).
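The water-filling allocation and the bisection search for $\lambda$ described above can be sketched as follows. This is a minimal illustration, assuming positive coefficients $\alpha_k, \beta_k$ and the quadratic cost $\alpha_k r_k^2 + \beta_k r_k$ from the formula; the function name `water_filling`, the tolerance, and the example coefficients are hypothetical and not taken from the cited Cloud-RAN work.

```python
def water_filling(alpha, beta, c_max, tol=1e-10, max_iter=200):
    """Return (rates, lam) with r_k = (1/lam - beta_k)^+ / (2 * alpha_k).

    Assumes alpha_k > 0, beta_k > 0 and c_max > 0 (illustrative assumptions).
    """

    def rates_for(lam):
        return [max(0.0, (1.0 / lam - b) / (2.0 * a)) for a, b in zip(alpha, beta)]

    def total_cost(r):
        return sum(a * x * x + b * x for a, b, x in zip(alpha, beta, r))

    # At lam = 1 / min(beta) every rate is zero, so the cost is zero (feasible).
    hi = 1.0 / min(beta)
    lo = hi
    # Decrease lam until the allocation meets or exceeds the budget.
    while total_cost(rates_for(lo)) < c_max:
        lo /= 2.0
    # Invariant: cost at lo >= c_max >= cost at hi. Cost decreases in lam.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if total_cost(rates_for(mid)) > c_max:
            lo = mid  # over budget: lam must grow
        else:
            hi = mid  # within budget: try a smaller lam
        if hi - lo < tol:
            break
    return rates_for(hi), hi  # hi always yields a budget-feasible allocation


rates, lam = water_filling([1.0, 2.0], [0.5, 1.0], c_max=3.0)
```

Returning the allocation at the upper end of the bracket keeps the result within budget even when the search stops early.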
3. Implementation Paradigms and Deployment Architectures
CARROT instantiations are characterized by real-time, scalable control logic, often split into hierarchical or two-stage loops:
- Centralized Controller: In SD-WAN deployments, a global controller executes SPR (Smart Policy Routing) periodically (every ~50s), updating ingress-router policies for path selection, followed by a QoS allocation loop (every ~10s) assigning shaper rates to each flow group. Edge agents enforce next-hop assignments and traffic shaping independently (Quang et al., 2022).
- Modular orchestration pipeline: For LLM orchestration, the router emits tool-calling actions via structured OpenAI-style function calls; an external microservice orchestrates model invocations and returns outputs, separating routing policy from implementation details (Qian et al., 9 Oct 2025).
- Efficient per-query routing: Hybrid-LLM CARROT routers apply a monotonic score function per query, comparing against a calibrated threshold; the router overhead is negligible compared to model inference (0.036s per query vs. 0.46–15s for large models) (Ding et al., 22 Apr 2024).
- Information-theoretic encoding: WebRouter's pipeline tokenizes the full prompt (goal, history, state), encodes via mDeBERTaV3-base, learns stochastic binary masks, and routes to candidate LLMs with soft cost weighting (Li et al., 13 Oct 2025).
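The per-query threshold routing described in this section can be sketched in a few lines. This is an illustrative sketch only: it assumes a router score where higher values mean the small model is more likely to suffice, and it calibrates the threshold to a target share of small-model traffic on held-out scores. The names `calibrate_threshold` and `route`, the scoring convention, and the calibration criterion are assumptions rather than the cited papers' procedure.

```python
def calibrate_threshold(held_out_scores, small_fraction):
    """Choose a threshold so about `small_fraction` of held-out queries
    score at or above it (ties may push the realized share slightly higher)."""
    ranked = sorted(held_out_scores, reverse=True)
    n_small = int(round(small_fraction * len(ranked)))
    if n_small == 0:
        return float("inf")  # route nothing to the small model
    return ranked[n_small - 1]


def route(score, threshold):
    """Send a query to the small model when its score clears the threshold."""
    return "small" if score >= threshold else "large"


# Hypothetical held-out router scores.
held_out = [0.9, 0.2, 0.75, 0.4, 0.6]
threshold = calibrate_threshold(held_out, small_fraction=0.4)
print(threshold)              # 0.75
print(route(0.8, threshold))  # small
print(route(0.5, threshold))  # large
```

Because the decision is a single comparison per query, the routing overhead is dominated by computing the score itself.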
4. Empirical Results and Performance Trade-offs
Quantitative analyses across use cases demonstrate strong cost-rate-optimal trade-offs:
| System (Paper) | Cost Reduction | Utility Loss | Additional Observations |
|---|---|---|---|
| Cloud-RAN (Rost et al., 2015) | 50% reduction in 90th percentile computational load (ε=10%) | <0.3% avg sum-rate loss | Outage-free |
| Hybrid LLM (Ding et al., 22 Apr 2024) | up to 40% fewer large-model calls | ~3% drop in BART-score | No quality drop at 20% cut |
| SD-WAN (Quang et al., 2022) | 26% cost savings (MLU-QoS vs. baseline) | SLA exceeded (≥95%*) | 80% delay reduction (95th percentile) |
| WebRouter (Li et al., 13 Oct 2025) | 87.8% cost reduction vs. GPT-4o | 3.8% accuracy loss | 5 real-world sites |
| RL-Orchestrator (Qian et al., 9 Oct 2025) | 88% cost reduction at fixed accuracy (math/code) | 10–20% performance gap vs. proprietary APIs | Adaptivity over model catalog |
Across variants, the empirical results are consistent with rate-optimality: given a finite cost or complexity budget, CARROT approaches closely match unconstrained performance until the budget becomes tightly binding, beyond which performance degrades gracefully. For instance, in Cloud-RAN, computational outages are fully avoided with sub-1% throughput loss; in hybrid LLM routing, thresholds calibrated on held-out data generalize robustly and achieve nearly zero loss at moderate cost savings.
5. Theoretical Guarantees and Limitations
CARROT-based methods provide formal guarantees under their respective mathematical frameworks:
- Optimality via duality: Threshold-routing and water-filling schedules are provably cost-minimal solutions of their Lagrangian relaxations for given quality budgets or resource constraints (Rost et al., 2015, Ding et al., 22 Apr 2024).
- Approximation ratios: Relaxation–rounding yields a constant-factor cost approximation for uniform demands, and a separate bound for arbitrary demands, in discrete-rate routing (Wang et al., 2013).
- Pareto-optimality: Information bottleneck objectives trace the theoretical Pareto frontier in rate–distortion–cost space (Li et al., 13 Oct 2025).
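As a worked check of the duality argument, the water-filling rule from Section 2 follows from stationarity of the Lagrangian. The derivation assumes a sum-rate utility $u_k(r_k) = r_k$ and the quadratic cost $c_k(r_k) = \alpha_k r_k^2 + \beta_k r_k$ with $\alpha_k > 0$; the utility choice is an assumption inferred from the form of the allocation, not something the text states explicitly.

$\begin{aligned} \mathcal{L}(r,\lambda) &= \sum_k r_k - \lambda\Big(\sum_k (\alpha_k r_k^2 + \beta_k r_k) - C_\max\Big), \\ \frac{\partial \mathcal{L}}{\partial r_k} &= 1 - \lambda\,(2\alpha_k r_k + \beta_k) = 0 \;\Longrightarrow\; r_k = \frac{1}{2\alpha_k}\Big(\frac{1}{\lambda} - \beta_k\Big). \end{aligned}$

Projecting onto $r_k \geq 0$ gives the $(\cdot)^+$ in the allocation, and $\lambda$ is fixed by requiring the budget constraint to hold with equality. Under these assumptions the objective is linear and the feasible set is convex, so the KKT conditions are sufficient for optimality.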
Typical limitations:
- Continuous rate allocations or latent encodings must be discretized in practice,
- Quality gap estimation must be accurate for optimal cost-saving in model routing,
- RL-based routers require stratified, difficulty-balanced training sets,
- Dual-decomposition and convex-approximation are required for distributed or scalable implementation in large networks.
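The first limitation, discretizing continuous rate allocations, can be illustrated with a simple rounding step. This sketch assumes that cost is nondecreasing in rate, so rounding each rate down to a supported value cannot violate the budget, and that a rate of zero (flow not served) is always allowed; the function name `round_down_to_grid` and the rate grid are hypothetical.

```python
from bisect import bisect_right


def round_down_to_grid(rates, grid):
    """Map each continuous rate to the largest supported rate not exceeding it.

    Rates below the smallest grid point are mapped to 0.0 (flow not served).
    """
    grid = sorted(grid)
    discrete = []
    for r in rates:
        idx = bisect_right(grid, r)  # number of grid points <= r
        discrete.append(grid[idx - 1] if idx > 0 else 0.0)
    return discrete


discrete = round_down_to_grid([1.2, 0.47, 0.05], grid=[0.25, 0.5, 1.0, 1.5])
print(discrete)  # [1.0, 0.25, 0.0]
```

Rounding down preserves feasibility at the price of some utility, which is one source of the gap between the continuous optimum and what is deployed.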
6. Extensions and Contextual Adaptations
CARROT’s rate-optimal, cost-constrained logic has been generalized to:
- Multi-user and multi-antenna wireless scenarios (by per-stream scheduling),
- Routing for discrete energy-aware links,
- Modular inference systems spanning generic LLM catalogs,
- Policy optimization for application-aware SD-WAN overlays,
- Information-theoretic compression frameworks for web-agent prompt routing.
Key generalizations include multi-model extension (routing among several experts), joint handling of fronthaul capacity and computation, weighted-fairness objectives, and dynamic adaptation via real-time controller loops.
Underlying all variants is the interplay of cost (tokens, energy, complexity) and rate (throughput, accuracy, utility), mediated via dual variables, explicit penalty terms, or soft information bottlenecks, yielding highly principled, implementable solutions for modern cost-sensitive systems.
In summary, CARROT defines a general methodology for cost-aware, rate-optimal scheduling and routing, applicable to diverse networking, inference, and orchestration contexts. Its core principles—constrained optimization, duality-based resource allocation, and empirical rate–cost–utility trade-offs—are recurrent across wireless, SD-WAN, energy-aware, and LLM-routing systems, underpinning robust, scalable, and theoretically justified deployment architectures.