Fuel-Constrained UAV Routing Problem

Updated 2 December 2025

FCURP is a routing challenge in operations research where UAVs must plan routes under strict fuel constraints with scheduled refueling stops.
Solution approaches include MILP formulations, heuristic algorithms, and reinforcement learning methods that optimize energy use, mission time, and route feasibility.
Recent research extends FCURP with cooperative multi-agent frameworks, stochastic modeling, and online adaptive planning to improve real-world applicability.

The Fuel-Constrained UAV Routing Problem (FCURP) is a fundamental problem in operations research and robotics. It arises in mission planning for aerial vehicles with limited onboard energy that must visit a collection of spatially distributed targets. The problem imposes hard fuel or energy capacity constraints and often permits mid-mission refueling or recharging at designated depots, at stationary sites, or, crucially, via rendezvous with mobile ground vehicles acting as refuelers or rechargers. Contemporary formulations address single- and multi-agent variants and deterministic and stochastic energy consumption, and increasingly combine classical optimization with machine learning methods.

1. Core Problem Definition and Mathematical Models

The canonical FCURP is defined on a node set $V = T \cup D$, where $T = \{t_1, \ldots, t_n\}$ is the set of targets and $D = \{d_0, d_1, \ldots, d_k\}$ is the set of refueling stations, with $d_0$ typically the main depot. A set of $m$ homogeneous UAVs, each with fuel capacity $F$, is initially stationed at the depots. The objective is to construct feasible routes such that:

Every target is visited at least once by some UAV.
No UAV ever runs out of fuel en route; each contiguous path between refueling opportunities satisfies a segment-wise capacity constraint.
The total cost (typically distance or time) is minimized.
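The fuel requirement in the list above can be checked for a given route with a few lines of code. The following is a minimal sketch, not taken from the cited papers; it assumes planar coordinates, fuel burn proportional to Euclidean distance (the hypothetical `rate` parameter), a full tank at departure, and a full refuel at every depot visit. The function name and the example data are illustrative.

```python
from math import hypot

def route_is_fuel_feasible(route, coords, depots, capacity, rate=1.0):
    """Return True if fuel burned between consecutive refuels never exceeds capacity."""
    used = 0.0  # fuel burned since the last depot visit
    for a, b in zip(route, route[1:]):
        (xa, ya), (xb, yb) = coords[a], coords[b]
        used += rate * hypot(xb - xa, yb - ya)
        if used > capacity:
            return False  # the UAV would run dry on leg a -> b
        if b in depots:
            used = 0.0  # assumed: full refuel at every depot
    return True

# Illustrative instance: two depots, two targets.
coords = {"d0": (0, 0), "t1": (3, 0), "t2": (3, 4), "d1": (6, 4)}
depots = {"d0", "d1"}
route = ["d0", "t1", "t2", "d1", "d0"]

print(route_is_fuel_feasible(route, coords, depots, capacity=12.0))
print(route_is_fuel_feasible(route, coords, depots, capacity=9.0))
```

The segment d0, t1, t2, d1 burns 3 + 4 + 3 = 10 units before the refuel at d1, so the route is feasible with a capacity of 12 and infeasible with a capacity of 9.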

The problem is most often formalized as a Mixed-Integer Linear Program (MILP). For a single vehicle and multiple depots, the path $P$ must connect all targets, with the sum of fuel requirements along any depot-to-depot or depot-to-target-to-depot subpath not exceeding $F$ (Sundar et al., 2016, Sundar et al., 2013). In multivehicle settings, assignment variables indicate route segments per agent, and segment-wise or node-potential flow constructs are leveraged to enforce both connectivity and energy feasibility (Sundar et al., 2015, Venkatachalam et al., 2017, Venkatachalam et al., 2020).

For cooperative routing (i.e., with a mobile ground station such as a UGV), the locations and timings of rendezvous points become decision variables as well, yielding a richer combinatorial structure (Mondal et al., 2023, Maini et al., 2018, Mondal et al., 2023).

2. Deterministic MILP and Heuristic Algorithms

Early approaches focus on deterministic, single- or multi-UAV versions. Arc-based MILP formulations introduce edge variables $x_{ij} \in \{0,1\}$, node-based flow or potential variables (such as $u_i$ for fuel-on-arrival at node $i$), and enforce segment-level $F$-bounds via either global constraints or lifted Miller–Tucker–Zemlin (MTZ) inequalities (Sundar et al., 2016, Sundar et al., 2015). For example, in (Sundar et al., 2016):

$\sum_{j} z_{ij} - \sum_{j} z_{ji} = \sum_{j} f_{ij} x_{ij}, \quad 0 \le z_{ij} \le F x_{ij}$
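One way to read this constraint, offered here as an interpretation rather than as the cited paper's own exposition, is that $z_{ij}$ carries the cumulative fuel burned since the last depot when arc $(i,j)$ is traversed: at each target the outgoing flow exceeds the incoming flow by exactly $f_{ij}$, and the bound $z_{ij} \le F x_{ij}$ caps the running total at the tank capacity. The sketch below computes these values along a fixed route under that assumption; the function name and the data are hypothetical.

```python
def arc_flows(route, fuel, depots):
    """Cumulative fuel z on each traversed arc, reset whenever the arc leaves a depot."""
    z, carried = {}, 0.0
    for i, j in zip(route, route[1:]):
        if i in depots:
            carried = 0.0  # assumed: the running total restarts at a depot
        carried += fuel[(i, j)]
        z[(i, j)] = carried
    return z

F = 10.0  # tank capacity
depots = {"d0", "d1"}
route = ["d0", "t1", "t2", "d1", "t3", "d0"]
fuel = {("d0", "t1"): 3, ("t1", "t2"): 4, ("t2", "d1"): 3,
        ("d1", "t3"): 5, ("t3", "d0"): 2}

z = arc_flows(route, fuel, depots)
# Conservation at target t1: outgoing z minus incoming z equals the fuel on the outgoing arc.
print(z[("t1", "t2")] - z[("d0", "t1")] == fuel[("t1", "t2")])
# Capacity bound 0 <= z_ij <= F on every used arc.
print(all(0 <= value <= F for value in z.values()))
```

In this instance the running total reaches exactly 10 on arrival at d1, so the route is tight against $F$ but still satisfies the bound.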

Various LP relaxations, dominance relations (arc- vs. node-based), and branch-and-cut algorithms are developed, demonstrating that tightened arc-flow models (F2 in (Sundar et al., 2016)) and their multi-depot analogs (F2 in (Sundar et al., 2015)) outperform node-based forms in both solution quality and scalability.

For the single-UAV, multiple-depot regime, efficient approximation algorithms are available. The approach in (Sundar et al., 2013) computes shortest “fuel-feasible” paths between targets (augmented with intermediate refuel stops if required), constructs a covering cycle, and repairs any stranded subpaths greedily. Local-improvement heuristics (e.g., $k$-opt, depot-exchange) yield practical solutions within 1–2% of optimality on 25-target, 5-depot instances in seconds, far better than the theoretical logarithmic-factor approximation guarantee would suggest.

3. Hierarchical and Cooperative Multi-Agent Frameworks

Recent research has elevated the FCURP to multi-agent, cooperative domains, notably integrating Unmanned Ground Vehicles (UGVs) as mobile refueling stations. A sequential, bilevel framework is now standard for such settings:

Outer Level: Refueling Site or Rendezvous Selection

A minimum set cover (MSC) is solved to select a minimal subset of candidate refuel points or rendezvous sites, guaranteeing that every target is in range of at least one such site, i.e., reachable using at most $0.5 F$ of fuel one way so that the round trip is feasible (Mondal et al., 2023, Maini et al., 2018, Mondal et al., 2023).
Practical solvers include greedy heuristics with $O(n^2)$ runtime, or constraint-programming (CP-SAT/OR-Tools) formulations for provable minimality.
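The greedy option can be sketched as follows. This is an illustrative implementation of the textbook greedy set-cover heuristic, not the cited authors' code; it assumes planar coordinates and a coverage radius already converted from $0.5F$ into distance units, and all names are hypothetical. Greedy selection does not guarantee a minimum cover, which is why exact CP formulations are used when minimality must be proven.

```python
from math import hypot

def greedy_cover(targets, sites, radius):
    """Pick sites one at a time, always taking the one that covers most uncovered targets."""
    covers = {
        s: {t for t, (tx, ty) in targets.items() if hypot(tx - sx, ty - sy) <= radius}
        for s, (sx, sy) in sites.items()
    }
    uncovered, chosen = set(targets), []
    while uncovered:
        # Ties are broken alphabetically because max() keeps the first maximum it sees.
        best = max(sorted(covers), key=lambda s: len(covers[s] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            raise ValueError(f"targets out of range of every site: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen

targets = {"t1": (1, 0), "t2": (2, 2), "t3": (9, 0), "t4": (10, 1)}
sites = {"s1": (0, 0), "s2": (10, 0), "s3": (5, 0)}
chosen = greedy_cover(targets, sites, radius=5.0)  # radius stands in for 0.5 * F
print(chosen)
```

Here s3 covers three targets and is picked first, after which s2 is needed for t4, giving a cover of size two.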

Middle Level: UGV Routing

The sequence of rendezvous sites is ordered by solving a small Traveling Salesman Problem (TSP) on the chosen subset, with explicit time computations for UGV arrival at each site (Mondal et al., 2023).
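Because the chosen subset is small, the ordering step can be illustrated by brute force. The sketch below enumerates all visiting orders and accumulates UGV arrival times at a constant speed; it assumes an open tour from a given start position, straight-line travel, and no dwell time at sites, none of which is specified in the source. The cited work may use a dedicated TSP solver and road-network distances instead.

```python
from itertools import permutations
from math import hypot

def ugv_schedule(start, sites, speed):
    """Return (best visiting order, arrival time at each site) for a small site set."""
    def dist(a, b):
        return hypot(b[0] - a[0], b[1] - a[1])

    def tour_length(order):
        pts = [start] + [sites[s] for s in order]
        return sum(dist(a, b) for a, b in zip(pts, pts[1:]))

    best = min(permutations(sorted(sites)), key=tour_length)
    arrival, clock, prev = {}, 0.0, start
    for s in best:
        clock += dist(prev, sites[s]) / speed  # travel time on this leg
        arrival[s] = clock
        prev = sites[s]
    return list(best), arrival

sites = {"r1": (3, 4), "r2": (3, 0), "r3": (6, 4)}
order, arrival = ugv_schedule(start=(0, 0), sites=sites, speed=2.0)
print(order, arrival)
```

The arrival times produced here are the quantities that would feed the rendezvous time windows of the UAV subproblems.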

Inner Level: Energy-Constrained VRP for UAV

The UAV’s mission is decomposed across subproblems, one per pair of consecutive refuel stops, leading to multiple Energy-Constrained Vehicle Routing Problems with Time Windows (E-VRPTW).
Each subproblem is formulated as a MILP over assignment point sets with time and fuel coupling, where fuel updates follow

$f^a_j \le f^a_i - P^a(v^a)\, t_{ij} x_{ij} + L_1(1 - x_{ij})$

subject to arrival and rendezvous window constraints (Mondal et al., 2023, Mondal et al., 2023).
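The role of the constant $L_1$ in the fuel update is easiest to see numerically. In the sketch below, $P^a(v^a)$ is read as the power draw of UAV $a$ at speed $v^a$ and $L_1$ as a big-M constant that switches the constraint off when arc $(i,j)$ is unused; both readings are assumptions based on the form of the inequality, and the numbers are made up.

```python
def fuel_update_holds(f_i, f_j, power, t_ij, x_ij, big_m):
    """Check f_j <= f_i - power * t_ij * x_ij + big_m * (1 - x_ij)."""
    return f_j <= f_i - power * t_ij * x_ij + big_m * (1 - x_ij)

CAPACITY = 100.0  # choosing big_m equal to the tank capacity is enough to relax the bound

# Arc used (x_ij = 1): fuel at j can be at most 100 - 2 * 10 = 80.
print(fuel_update_holds(f_i=100.0, f_j=80.0, power=2.0, t_ij=10.0, x_ij=1, big_m=CAPACITY))
print(fuel_update_holds(f_i=100.0, f_j=85.0, power=2.0, t_ij=10.0, x_ij=1, big_m=CAPACITY))

# Arc unused (x_ij = 0): the constraint places no effective limit on f_j.
print(fuel_update_holds(f_i=20.0, f_j=100.0, power=2.0, t_ij=10.0, x_ij=0, big_m=CAPACITY))
```

With the arc active, the inequality forces the fuel level to drop by the energy spent on the leg; with the arc inactive, the $L_1$ term makes the right-hand side at least as large as any admissible fuel level.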

This sequential decomposition circumvents the scalability issues of monolithic VRPs, enabling exact or near-exact solutions on instances with up to 100 nodes within minutes (Mondal et al., 2023).

4. Learning-Based and Stochastic Techniques

Modern solution strategies increasingly incorporate deep reinforcement learning (DRL), stochastic optimization, and online replanning.

Deep RL for Rendezvous and Routing

Transformer-based RL policies, trained via policy-gradient methods (REINFORCE, PPO), now select recharge locations or make agent assignment decisions directly (Mondal et al., 2023, Mondal et al., 29 Apr 2025, Mishra et al., 9 Apr 2024). The RL policy operates over featurized node and agent state embeddings and masks actions that are infeasible under the fuel constraints, which improves decision quality on large and dynamic scenarios. RL systems are evaluated against genetic algorithms or classical heuristics, demonstrating:

23–32% reduction in mission times,
15% energy savings,
3× reduction in idle time,
Robust performance and fast inference on large-scale multi-agent deployments (Mondal et al., 2023, Mondal et al., 29 Apr 2025).
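The action masking mentioned above can be illustrated without any learning framework. The sketch below applies a fuel mask to raw policy scores before a softmax, so infeasible actions receive probability zero; it is a generic masking pattern, not the cited architecture. The `fuel_needed` values are assumed to already include whatever reserve is needed to reach a refuel point after the action.

```python
import math

def masked_policy(logits, fuel_needed, fuel_left):
    """Softmax over logits with fuel-infeasible actions forced to probability zero."""
    masked = [
        score if need <= fuel_left else -math.inf
        for score, need in zip(logits, fuel_needed)
    ]
    if all(m == -math.inf for m in masked):
        raise ValueError("no fuel-feasible action")
    top = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(m - top) for m in masked]
    total = sum(exps)
    return [e / total for e in exps]

# The third action has the highest score but needs more fuel than remains.
probs = masked_policy(logits=[1.0, 1.0, 5.0], fuel_needed=[3.0, 4.0, 9.0], fuel_left=5.0)
print(probs)
```

Masking at the logit level keeps the remaining probabilities normalized, so sampling from the policy can never select a move that strands the vehicle.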

Two-Stage Stochastic Approaches

Uncertainty in energy use (stochastic fuel consumption) is modeled with two-stage stochastic programming, integrating scenario-based recourse into nominal MILP routes (Venkatachalam et al., 2017). Sample Average Approximation (SAA) and tabu-search heuristics yield solutions outperforming expected-value (EV) deterministic plans by up to 30%, with superior tractability on medium-to-large instances.
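The idea behind SAA can be shown on a toy instance. The sketch below scores two hypothetical first-stage plans by their average cost over a handful of fuel-consumption scenarios, charging a fixed recourse penalty whenever a depot-to-depot segment would exceed capacity. The plans, the multiplicative scenario factors, and the penalty are invented for illustration and do not come from the cited study, which optimizes over routes rather than choosing from a fixed list.

```python
def sample_average_costs(plans, scenarios, capacity, recourse_cost):
    """Average first-stage cost plus recourse penalties over the sampled scenarios."""
    costs = {}
    for name, (first_stage_cost, segments) in plans.items():
        total = 0.0
        for factor in scenarios:
            # Second stage: each segment whose realized fuel exceeds capacity needs a repair.
            overruns = sum(1 for nominal in segments if nominal * factor > capacity)
            total += first_stage_cost + recourse_cost * overruns
        costs[name] = total / len(scenarios)
    return costs

# Each plan: (route cost, nominal fuel use per depot-to-depot segment).
plans = {"A": (20.0, [9.0, 9.0]), "B": (24.0, [7.0, 7.0, 7.0])}
scenarios = [0.9, 1.0, 1.2, 1.3]  # sampled multipliers on fuel consumption

costs = sample_average_costs(plans, scenarios, capacity=10.0, recourse_cost=10.0)
best = min(costs, key=costs.get)
print(costs, best)
```

At the mean factor of 1.1, plan A is feasible and cheaper, so an expected-value planner would pick it; averaged over the scenarios, however, A costs 30 against 24 for B. The difference of 6 is the value of the stochastic solution in this toy case.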

Chance-constrained MDPs (CCMDP) are used to bound risk of fuel depletion. LP-based occupancy-measure formulations for CCMDPs ensure that the risk (failure probability) of UAV energy outage remains below prescribed thresholds, balancing expected completion time against risk tolerance (Shi et al., 2022).

Online and Adaptive Planning

Dynamic replanning in response to unknown per-target fuel requirements or environment changes is addressed via online algorithms that backtrack rendezvous points in real time as new energy data arrives during execution. Experiments in simulated (Gazebo) environments verify feasibility and 20–37% improvement over naive, static baselines (Agarwal et al., 25 Jun 2025).

5. Algorithmic and Practical Performance

The various algorithmic paradigms for the FCURP yield complementary performance characteristics, summarized as follows:

Method	Feasibility Guarantee	Optimality/Gap	Scalability	Reference
MILP (tightened arc-based)	Yes	Exact (~99%)	up to 40–100 targets	(Sundar et al., 2016, Sundar et al., 2015)
Greedy MSC + TSP + VRPTW heuristic	Yes	5–20% gap	up to 100 nodes, minutes	(Mondal et al., 2023, Maini et al., 2018)
RL-based rendezvous + CP routing	Yes (action-mask)	5–15% gap	100–200 nodes, seconds	(Mondal et al., 2023, Mondal et al., 29 Apr 2025)
SAA two-stage stochastic	Probabilistic	Matches recourse	10–30 targets, long runs	(Venkatachalam et al., 2017)
Tabu/repair heuristics	No (empirical)	1–2% gap	20–50 targets, seconds	(Sundar et al., 2013, Venkatachalam et al., 2017)
Online replanning	Yes (execution time)	Empirical	10–100 nodes, real-time	(Agarwal et al., 25 Jun 2025)

Key insights:

CP-based and MILP-based pipelines yield provable and high-quality solutions, though with scaling limits on very large (e.g., >100) node sets.
RL-based policies, leveraging Transformer encoders and effective action masking, offer rapid inference and generalize well to larger or dynamically changing problems, provided suitable training curricula.
Greedy and heuristic methods, especially when hybridized with strong local search or tabu strategies, approach optimality rapidly on practical-sized problems.

6. Extensions, Limitations, and Research Directions

Important extensions now actively researched include:

Multi-agent, heterogeneous teams: Multiple UAVs and/or UGVs, possibly of differing speeds and capacities, coordinated via centralized or distributed policies (Mondal et al., 29 Apr 2025, Venkatachalam et al., 2020).
Stochasticity: Explicit treatment of energy consumption uncertainty, agent availability, and real-time disturbances using risk-aware models (Venkatachalam et al., 2017, Shi et al., 2022, Venkatachalam et al., 2020).
Integrated mission constraints: Time windows, heterogeneous priority or value of targets, energy costs with non-linear or time-varying rates, and operational restrictions such as no-fly zones or communication/sensing constraints (Mondal et al., 2023).
Online and mixed offline–online planning: Replanning mid-mission, accommodating new tasks, vehicle failures, dynamic environments, and real-time fuel/energy feedback (Agarwal et al., 25 Jun 2025).
Scalability: Application of attention-based graph networks and other neural architectures to maintain performance on very large, distributed or persistent surveillance instances (n > 100–200) (Mondal et al., 29 Apr 2025, Mishra et al., 9 Apr 2024).

Current limitations noted in the literature include the computational burden of exact methods, RL scalability to very large task graphs, and the integration of uncertainty or non-stationarity in online deployments.

7. Representative Experimental Results and Practice

Empirical validations span large-scale simulations (16–40 km², 30–100 points) and hardware-in-the-loop or field trials (Mondal et al., 2023, Mondal et al., 2023). Comparative studies demonstrate, for example:

Greedy heuristics achieve 19–40% makespan reduction and up to 58% energy savings over UGV-only baselines on small- and medium-sized problems, with only minor losses or gains at the largest scales due to UGV speed bottlenecks (Mondal et al., 2023).
RL-based policies consistently outperform genetic and greedy heuristics on all completion time, energy, and utilization metrics, requiring fewer rendezvous events and less idle time (Mondal et al., 2023).
Online replanning frameworks reduce mission time by ~20% and rendezvous requirements by 35% relative to static plans in scenarios with unknown per-target processing costs (Agarwal et al., 25 Jun 2025).
Stochastic planning yields up to 30% value-of-stochastic-solution (VSS) benefit, particularly in regimes of low agent availability or high fuel uncertainty (Venkatachalam et al., 2017, Venkatachalam et al., 2020).

These results underscore the critical interplay between problem decomposition, refueling site selection, modern combinatorial optimization, and scalable learning methods in advancing the tractability and practicality of the FCURP.