
TravelUAV Benchmark: UAV Navigation & Planning

Updated 17 November 2025
  • TravelUAV Benchmark is a comprehensive suite integrating datasets, simulators, models, and evaluation protocols for UAV-based perception, navigation, and task assignment.
  • It features extensive vision-language navigation tasks with 12,149 flight episodes across diverse, photorealistic urban and semi-urban environments.
  • The benchmark provides rigorous evaluation metrics and baseline algorithms for UAV navigation, object detection, tracking, multi-agent planning, and video coding tasks.

The TravelUAV Benchmark denotes a set of datasets, simulators, mathematical models, and evaluation protocols developed to advance algorithmic research for UAV-based perception, navigation, task assignment, and data analysis. It provides standardized platforms covering vision-language navigation, object detection and tracking, multi-agent planning, learned video compression, and embodied intelligence evaluation. TravelUAV enables reproducible comparison of baseline and state-of-the-art algorithms across navigation, vision, control, and assignment tasks under realistic aerial scenarios, diverse sensory modalities, and operational constraints.

1. Dataset Composition and Modalities

TravelUAV, as used in recent vision-language navigation research (Lin et al., 9 Nov 2025), is instantiated as a large-scale AirSim-based simulator corpus. It consists of 12,149 flight episodes across 20 photorealistic outdoor environments (urban and semi-urban), rendered with physics-based UAV dynamics and low-level continuous control. Each trajectory is annotated with a navigation goal: a target object chosen from 76 classes and a natural-language instruction (e.g., “Fly to the red water tower behind the tree”). Modalities include:

  • RGB first-person camera images per decision step
  • 3D agent state (position $\{x, y, z\}$ and orientation $\{\theta, \phi, \psi\}$)
  • Dense ground-truth waypoints (3D coordinates + orientation)
  • Language instructions

The benchmark is partitioned into canonical splits:

| Split Name | Trajectories | Environments | Objects |
|---|---|---|---|
| Train | 9,152 | 20 “seen” maps | 76 classes |
| Test-Seen | 1,410 | 20 (same as train) | 76 classes |
| Test-Unseen-Map | 958 | 2 new maps | 76 classes |
| Test-Unseen-Object | 629 | 20 maps | “unseen” objects |

Episodes are classified as “Easy” (<250 m) or “Hard” (≥250 m) by path length. To evaluate data efficiency, a constrained regime trains on only 25% of the trajectories, sampled per scene under a fixed seed.
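The difficulty rule and the 25% regime can be sketched as follows. This is a minimal illustration, not the benchmark's own tooling: the episode fields (`id`, `scene`, `path_length_m`), the seed value, and the rounding rule are assumptions made for the example.

```python
import random
from collections import defaultdict

HARD_THRESHOLD_M = 250.0  # from the text: Hard episodes have path length >= 250 m


def difficulty(path_length_m):
    """Label an episode Easy or Hard by its path length in meters."""
    return "Hard" if path_length_m >= HARD_THRESHOLD_M else "Easy"


def subsample_per_scene(episodes, fraction=0.25, seed=0):
    """Keep roughly `fraction` of the episodes in every scene, reproducibly.

    `episodes` is a list of dicts with hypothetical keys 'id' and 'scene'.
    """
    rng = random.Random(seed)  # fixed seed makes the subset repeatable
    by_scene = defaultdict(list)
    for ep in episodes:
        by_scene[ep["scene"]].append(ep)

    kept = []
    for scene in sorted(by_scene):  # stable scene order
        group = sorted(by_scene[scene], key=lambda e: e["id"])  # stable episode order
        k = max(1, round(fraction * len(group)))  # assumed rounding rule
        kept.extend(rng.sample(group, k))
    return kept
```

Sorting scenes and episodes before sampling keeps the selected subset independent of the order in which episodes were loaded, so the same seed always yields the same training set.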

2. Benchmark Task Definitions

TravelUAV covers a range of task formalizations:

Vision-Language Navigation for UAVs

The main navigation objective is learning a policy $\pi_\theta(a_t \mid s_t, I)$ that maps:

  • Current UAV state $s_t \in \mathcal{S} = \{x, y, z, \theta, \phi, \psi\}$
  • Natural-language instruction $I$

to a continuous action $a_t$ (translation/heading adjustment).

Episode termination occurs on reaching a spatial threshold (e.g., within 3 m of the goal) or the max horizon ($T=200$ steps).
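A termination check along these lines might look like the sketch below. The 3 m radius is the example value quoted above and the horizon is $T=200$; the function name and return convention are illustrative assumptions, not part of the benchmark API.

```python
import math

SUCCESS_RADIUS_M = 3.0  # example threshold quoted in the text
MAX_STEPS = 200         # horizon T from the text


def episode_done(position, goal, step):
    """Return (done, reached_goal) for a 3D position, a 3D goal, and a step count."""
    reached = math.dist(position, goal) <= SUCCESS_RADIUS_M
    timed_out = step >= MAX_STEPS
    return reached or timed_out, reached
```

Returning the `reached` flag separately lets a caller distinguish a successful stop from a horizon timeout.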

Planning adopts a two-stage hierarchy:

  1. Waypoint predictor/action decoder: proposes 3D local waypoints
  2. Continuous-control engine (VLN-CE): interpolates low-level controls between waypoints

Hard episodes (≥250 m) specifically challenge long-horizon trajectory synthesis and temporal credit assignment.
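To make the second stage of the hierarchy concrete, the sketch below densifies a list of predicted 3D waypoints into evenly spaced setpoints by linear interpolation. This is only a stand-in for the continuous-control engine: the real engine tracks waypoints under UAV dynamics, and the function name and `step_m` parameter are assumptions for illustration.

```python
import math


def interpolate_waypoints(waypoints, step_m=1.0):
    """Densify a 3D polyline so consecutive setpoints are at most step_m apart."""
    if not waypoints:
        return []
    setpoints = [tuple(waypoints[0])]
    for a, b in zip(waypoints, waypoints[1:]):
        segment = math.dist(a, b)
        n = max(1, math.ceil(segment / step_m))  # number of sub-steps on this segment
        for i in range(1, n + 1):
            t = i / n
            setpoints.append(tuple(pa + t * (pb - pa) for pa, pb in zip(a, b)))
    return setpoints
```

For long (Hard) episodes the number of setpoints grows with path length, which is one way to see why errors in early waypoints compound over the horizon.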

Multi-UAV Task Assignment

TravelUAV also includes benchmarks for multi-UAV task allocation under the Extended Team Orienteering Problem (ETOP) (Xiao et al., 2020). Mathematically:

  • Graph $G=(V,A)$, depot node $0$, fleet $K$, location nodes $N = V\setminus\{0\}$
  • Distances $d_{ij}$, rewards $r_i$, service times $t_i$, UAV speeds $s_k$, time budget $T_\text{max}$
  • Decision variables: $y_{ik}$ (UAV $k$ visits node $i$), $x_{ijk}$ (UAV $k$ traverses arc $(i,j)$)

Objective:

$$\max \sum_{i \in V} r_i \sum_{k \in K} y_{ik}$$

Subject to node, route, and time constraints (flow, binary integrality, subtour elimination).

Scales: Small ($|K|=5$, $n=30$), Medium ($|K|=10$, $n=60$), Large ($|K|=15$, $n=90$).
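A candidate assignment can be scored against this formulation as sketched below. The sketch assumes Euclidean distances, that each route starts and ends at depot node 0, that route time is travel distance divided by UAV speed plus service times, and that each location is visited at most once; these are plausible readings of the constraints named above, not a transcription of the original model.

```python
import math


def route_time(route, coords, service, speed):
    """Time for one UAV: depot -> route nodes -> depot, plus service times."""
    path = [0] + list(route) + [0]
    travel = sum(math.dist(coords[i], coords[j]) for i, j in zip(path, path[1:]))
    return travel / speed + sum(service[i] for i in route)


def evaluate(routes, coords, reward, service, speeds, t_max):
    """Return total collected reward, or None if the assignment is infeasible.

    routes[k] lists the location nodes visited by UAV k (depot excluded).
    """
    visited = set()
    for k, route in enumerate(routes):
        if route_time(route, coords, service, speeds[k]) > t_max:
            return None  # time budget violated
        for i in route:
            if i in visited:
                return None  # node served twice
            visited.add(i)
    return sum(reward[i] for i in visited)
```

A metaheuristic would call a function like `evaluate` as its fitness, typically repairing or penalizing infeasible routes instead of discarding them.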

3. Evaluation Protocols and Metrics

TravelUAV prescribes multiple rigorous metrics for fair comparison of algorithms:

  • Success Rate (SR):

$$SR = \frac{1}{N} \sum_{i=1}^N \mathbf{1} [ \mathrm{dist}(\mathrm{final}_i, \mathrm{goal}_i) \leq \tau ]$$

  • Oracle Success Rate (OSR):

$$OSR = \frac{1}{N} \sum_{i=1}^N \max_t \mathbf{1} [ \mathrm{dist}(s_t^{(i)}, \mathrm{goal}_i) \leq \tau ]$$

  • Success weighted by Path Length (SPL):

$$SPL = \frac{1}{N} \sum_{i=1}^N S_i \cdot \frac{l_i^*}{\max(l_i, l_i^*)}$$

where $\tau$ is the success distance threshold, $S_i$ indicates success, $l_i$ is the actual path length, and $l_i^*$ is the shortest path length.

  • Normalized Error (NE):

$$NE = \frac{1}{N} \sum_{i=1}^N \frac{\mathrm{dist}(\mathrm{final}_i, \mathrm{goal}_i)}{\mathrm{dist}(\mathrm{start}_i, \mathrm{goal}_i)}$$

  • Object Detection: Precision, Recall, Average Precision (AP @ IoU=0.7)
  • Single-Object Tracking: Success (AUC), Precision (center-error ≤ 20 px)
  • Multiple-Object Tracking (MOTA):

$$MOTA = 1 - \frac{\sum_t (\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t)}{\sum_t \mathrm{GT}_t}$$

with MOTP, IDF1, MT/ML, FP/FN/IDSW/FM as secondary metrics.

  • PSNR/bpp curves
  • BD-Rate ($\Delta$BD-rate):

$$\Delta\text{BD-rate} = 100 \times \frac{ \int_{R_{low}}^{R_{high}} [D_\mathrm{codec}(R) - D_\mathrm{anchor}(R)] \, dR }{ D_\mathrm{anchor}(R_{high}) - D_\mathrm{anchor}(R_{low}) }$$

for rate-distortion comparison.
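Several of the metrics listed above reduce to a few lines of arithmetic. The sketch below computes SR, OSR, SPL, and NE from logged trajectories and MOTA from per-frame error counts, following the formulas as written in this section. The input layout (lists of 3D points, per-frame count lists) and the function names are assumptions for illustration; `tau` must be supplied by the caller because no single threshold is fixed in the definitions above.

```python
import math


def navigation_metrics(trajectories, goals, shortest_lengths, tau):
    """Average SR, OSR, SPL, and NE over episodes.

    trajectories[i] is the list of visited 3D positions of episode i,
    goals[i] its goal position, shortest_lengths[i] the shortest path length.
    Assumes every episode starts away from its goal (non-zero start distance).
    """
    sr = osr = spl = ne = 0.0
    for traj, goal, l_star in zip(trajectories, goals, shortest_lengths):
        dists = [math.dist(p, goal) for p in traj]
        path_len = sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))
        success = 1.0 if dists[-1] <= tau else 0.0
        sr += success
        osr += 1.0 if min(dists) <= tau else 0.0  # closest approach counts
        spl += success * l_star / max(path_len, l_star)
        ne += dists[-1] / dists[0]
    n = len(trajectories)
    return {"SR": sr / n, "OSR": osr / n, "SPL": spl / n, "NE": ne / n}


def mota(fn, fp, idsw, gt):
    """MOTA from per-frame false negatives, false positives, ID switches, and GT counts."""
    return 1.0 - (sum(fn) + sum(fp) + sum(idsw)) / sum(gt)
```

An agent that passes within `tau` of the goal but keeps flying raises OSR without raising SR, which is why the two are reported together.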

Task Assignment Metrics

  • Total Reward Collected
  • CPU Time (wall-clock)
  • Variance, statistical significance over multiple runs

4. Baseline Algorithms and Experimental Configuration

Standardized baselines are provided for each task domain:

Vision-Language Navigation

  • Random policy: Uniform control sampling
  • Fixed policy: Hand-coded waypoint sequence
  • Cross-Modal Attention (CMA): Bi-LSTM with attention over image and text
  • TravelUAV (state-of-the-art UAV-VLN): Hierarchical planning, rule-based policy

Training uses 4× NVIDIA RTX 4090 GPUs, the Adam optimizer (learning rate $10^{-4}$), batch size 1, and an RL horizon of $T=200$ steps.

Three assistance levels:

  • L1: Instructions + Helper waypoints
  • L2: Helper waypoints only
  • L3: No assistance (autonomous)

Multi-UAV Task Assignment

  • Genetic Algorithm (GA): Permutation + split-vector encoding, fitness by collected reward, elitist survival
  • Ant Colony Optimization (ACO): 2D pheromone table, heuristic $\eta_{ijk} = \frac{s_k r_j}{d_{ij} t_j}$, evaporation/update by reward
  • Particle Swarm Optimization (PSO): Particle encodes route split, velocity updates, sorting-based decoding, local swarms
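Two pieces of the metaheuristic baselines above are easy to illustrate: decoding a permutation plus split vector into per-UAV routes, and the ACO heuristic $\eta_{ijk}$. The sketch below shows one plausible reading of the encoding, in which the split vector holds cut indices into the permutation; the original implementation may represent splits differently, and the function names are hypothetical.

```python
def decode(permutation, split_points):
    """Cut a node permutation into one route per UAV at the given cut indices.

    With |K| UAVs, split_points holds |K| - 1 indices into the permutation.
    """
    bounds = [0] + sorted(split_points) + [len(permutation)]
    return [permutation[a:b] for a, b in zip(bounds, bounds[1:])]


def aco_heuristic(speed_k, reward_j, dist_ij, service_j):
    """Desirability of UAV k moving from node i to node j: s_k * r_j / (d_ij * t_j)."""
    return (speed_k * reward_j) / (dist_ij * service_j)


# Five location nodes split across three UAVs
routes = decode([3, 1, 4, 2, 5], [2, 3])
```

The heuristic favors fast UAVs and high-reward nodes that are close and quick to service, which matches the reward-per-time intuition behind the objective.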

Reference configurations and open-source implementations are available for reproducibility.

Object Detection and Tracking

  • Detectors: Faster R-CNN, R-FCN, SSD, RON
  • Trackers: MDNet, ECO, GOTURN, SORT, DSORT, MDP

Video Coding

  • HEVC-SCC (anchor)
  • OpenDVC
  • MPAI EEV (enhanced OpenDVC): Two-stage residual modeling, in-loop restoration

5. Reward Shaping, Ablations, and Analysis

TravelUAV incorporates advanced reward shaping for RL-based navigation:

  • Value-model reward:

$$R^{V} = \sum_{t=1}^{n} \gamma^{t} \cdot [V_\rho(s_t) - V_\rho(s_{t+1})]$$

encourages monotonic state value progression

  • Cosine-similarity reward:

$$r^V_t = \begin{cases} r_\mathrm{level}, & \text{if } \frac{1}{1 - \mathrm{Sim}(F_{s_t}, F_{w^*})} \ge r_\mathrm{level} \\ \frac{1}{1 - \mathrm{Sim}(F_{s_t}, F_{w^*})}, & \text{otherwise} \end{cases}$$

where $F_{s_t}$, $F_{w^*}$ are multimodal embeddings

Reward caps ($r_\mathrm{level} \in \{1.0, 3.0, 5.0, \infty\}$) govern gradient stability. Ablation studies indicate $r_\mathrm{level}=5.0$ achieves optimal gradient informativeness.
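Both reward terms can be written directly from the formulas above. In the sketch below, embeddings are plain Python lists and the value estimates are passed in as a precomputed list; these data layouts and the function names are assumptions for illustration, since the section does not specify how $V_\rho$ or the embeddings are produced.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def capped_similarity_reward(f_state, f_waypoint, r_level):
    """1 / (1 - Sim), clipped from above at r_level (r_level may be math.inf)."""
    sim = cosine_similarity(f_state, f_waypoint)
    if sim >= 1.0:  # identical direction: the uncapped term diverges
        return r_level
    return min(1.0 / (1.0 - sim), r_level)


def value_model_reward(values, gamma):
    """Sum over t = 1..n of gamma^t * (V(s_t) - V(s_{t+1})).

    values = [V(s_1), ..., V(s_{n+1})], so len(values) = n + 1.
    """
    return sum(gamma ** t * (values[t - 1] - values[t]) for t in range(1, len(values)))
```

Because $1/(1-\mathrm{Sim})$ grows without bound as the embeddings align, the cap is what keeps individual reward values, and hence gradient magnitudes, bounded.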

Key quantitative results under the 25% data regime:

  • OpenVLN vs. TravelUAV baseline: NE drops 132.59→125.97; SR +2.80pp (11.59%→14.39%); OSR +3.53pp (24.50%→28.03%); SPL +2.49pp (10.45%→12.94%)
  • Hardest case (Hard/Test-Seen/L3): SR=0.00% (TravelUAV) vs. 0.47% (OpenVLN)

6. Difficulty Factors, Limitations, and Extensions

The TravelUAV Benchmark reveals unique challenges for UAV-centric vision and planning:

  • High object density: Mean ≈10.5 objects/frame; endemic ID switches and FP in MOT
  • Small/tiny object scale: Medium/high altitude imagery severely degrades detection/tracking
  • Rapid camera motion/viewpoint change: Requires robust feature updating and domain-agnostic MC
  • Adverse conditions: Diverse weather, occlusion, out-of-view rates stress model generalization
  • Long-horizon navigation: Credit assignment over 250–400 m, sparse reward landscape

Video coding benchmarks indicate learned codecs excel in outdoor, high-motion regimes versus block-based HEVC, but lag on indoor/fisheye scenes without UAV-specific domain adaptation.

Suggested research directions include domain adaptation for video codecs, multi-scale feature fusion for tiny-object detection, real-time model pruning/quantization, semantic-aware compression, and robust physical simulation (wind, sensor perturbations).

The multi-UAV assignment benchmark provides a complete recipe for problem instance generation, permutation/split encoding, and comparative evaluation—including downloadable open-source code.

7. Significance and Research Utility

TravelUAV establishes a unified, extensible suite of protocols for evaluating algorithms within the UAV domain. Integration with platforms such as AirSim and Unreal and with open-source solvers, together with prescribed evaluation metrics, supports comparability and reproducibility. Its heterogeneous task set (vision-language navigation, detection/tracking, multi-agent planning, data-efficient RL) offers rich ground for algorithmic development and benchmarking under aerial robotics constraints.

OpenVLN’s demonstrated improvements (up to 4.34% SR, 6.19% OSR, 4.07% SPL over baseline methods (Lin et al., 9 Nov 2025)) exemplify the benchmark’s capacity to empirically validate advances in data-efficient, language-guided UAV navigation. The TravelUAV corpus continues to motivate research in long-horizon planning, robustness to adverse conditions, domain adaptation, compute-energy-accuracy trade-offs, and the systematic evaluation of embodied UAV agents.
