TravelUAV Benchmark: UAV Navigation & Planning
- TravelUAV Benchmark is a comprehensive suite integrating datasets, simulators, models, and evaluation protocols for UAV-based perception, navigation, and task assignment.
- It features extensive vision-language navigation tasks with 12,149 flight episodes across diverse, photorealistic urban and semi-urban environments.
- The benchmark provides rigorous evaluation metrics and baseline algorithms for UAV navigation, object detection, tracking, multi-agent planning, and video coding tasks.
The TravelUAV Benchmark denotes a set of datasets, simulators, mathematical models, and evaluation protocols developed to advance algorithmic research for UAV-based perception, navigation, task assignment, and data analysis. It provides standardized platforms covering vision-language navigation, object detection and tracking, multi-agent planning, learned video compression, and embodied intelligence evaluation. TravelUAV enables reproducible comparison of baseline and state-of-the-art algorithms across navigation, vision, control, and assignment tasks under realistic aerial scenarios, diverse sensory modalities, and operational constraints.
1. Dataset Composition and Modalities
TravelUAV, as used in recent vision-language navigation research (Lin et al., 9 Nov 2025), is instantiated as a large-scale AirSim-based simulator corpus. It consists of 12,149 flight episodes across 20 photorealistic outdoor environments (urban and semi-urban), rendered with physics-based UAV dynamics and low-level continuous control. Each trajectory is annotated with a navigation goal: a target object chosen from 76 classes and a natural-language instruction (e.g., “Fly to the red water tower behind the tree”). Modalities include:
- RGB first-person camera images per decision step
- 3D agent state (position and orientation)
- Dense ground-truth waypoints (3D coordinates + orientation)
- Language instructions
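For illustration, a single episode record could be represented roughly as follows; the field names are hypothetical and may differ from the benchmark's actual schema:

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record mirroring the modalities listed above; field names are
# illustrative and not the benchmark's actual on-disk schema.
@dataclass
class UAVEpisode:
    instruction: str            # natural-language goal, e.g. "Fly to the red water tower behind the tree"
    target_class: str           # one of the 76 target object classes
    environment: str            # one of the 20 photorealistic maps
    rgb_frames: List[str]       # first-person RGB image paths, one per decision step
    agent_states: List[Tuple[float, float, float, float]]   # (x, y, z, yaw) per step
    gt_waypoints: List[Tuple[float, float, float, float]]   # dense ground-truth waypoints (x, y, z, yaw)
```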
The benchmark is partitioned into canonical splits:
| Split Name | Trajectories | Environments | Objects |
|---|---|---|---|
| Train | 9,152 | 20 “seen” maps | 76 classes |
| Test-Seen | 1,410 | 20 (same as train) | 76 classes |
| Test-Unseen-Map | 958 | 2 new maps | 76 classes |
| Test-Unseen-Object | 629 | 20 maps | “unseen” objects |
Episodes are classified “Easy” (<250 m) or “Hard” (≥250 m) by path length. To evaluate data efficiency, a constrained regime uses only 25% of trajectories for training (sampled per scene under fixed seed).
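A minimal sketch of the Easy/Hard labeling and the constrained 25% regime, assuming episodes are stored as dictionaries with an `environment` key (the actual benchmark tooling may differ):

```python
import random
from collections import defaultdict

def difficulty(path_length_m: float) -> str:
    """Label an episode by ground-truth path length: Easy (<250 m) or Hard (>=250 m)."""
    return "Easy" if path_length_m < 250.0 else "Hard"

def sample_constrained_split(episodes, fraction=0.25, seed=0):
    """Sample a fixed fraction of training trajectories per scene under a fixed seed,
    mimicking the constrained 25% data regime. Episodes are assumed to be dicts
    with an 'environment' key; the exact sampling code is not reproduced here."""
    rng = random.Random(seed)
    by_scene = defaultdict(list)
    for ep in episodes:
        by_scene[ep["environment"]].append(ep)
    subset = []
    for scene in sorted(by_scene):
        eps = by_scene[scene]
        k = max(1, int(len(eps) * fraction))
        subset.extend(rng.sample(eps, k))
    return subset
```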
2. Benchmark Task Definitions
TravelUAV covers a range of task formalizations:
Vision-Language Navigation for UAVs
The main navigation objective is to learn a policy that maps the current UAV state and the natural-language instruction to a continuous action (translation and heading adjustment). An episode terminates when the agent comes within a spatial success threshold of the goal (e.g., 3 m) or when the maximum episode horizon is exhausted.
Planning adopts a two-stage hierarchy:
- Waypoint predictor/action decoder: proposes 3D local waypoints
- Continuous-control engine (VLN-CE): interpolates low-level controls between waypoints
Hard episodes (≥250 m) specifically challenge long-horizon trajectory synthesis and temporal credit assignment.
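The two-stage rollout can be sketched as below; `env`, `waypoint_policy`, and `controller` are hypothetical interfaces standing in for the simulator, the waypoint predictor, and the VLN-CE controller, and the horizon constant is a placeholder rather than the benchmark's actual value:

```python
import numpy as np

SUCCESS_RADIUS_M = 3.0   # spatial success threshold (example value from the text)
MAX_STEPS = 500          # placeholder horizon; the benchmark's exact value is not shown here

def run_episode(env, waypoint_policy, controller, goal_xyz):
    """Schematic two-stage rollout: a high-level policy proposes local 3D waypoints,
    and a continuous-control engine (VLN-CE style) tracks them with low-level commands.
    All interfaces here are hypothetical."""
    obs = env.reset()
    for step in range(MAX_STEPS):
        # Stage 1: waypoint predictor / action decoder proposes a local 3D waypoint + heading
        waypoint, heading = waypoint_policy(obs["rgb"], obs["state"], obs["instruction"])
        # Stage 2: continuous-control engine interpolates low-level controls toward it
        obs = controller.fly_to(waypoint, heading)
        if np.linalg.norm(np.asarray(obs["state"][:3]) - goal_xyz) <= SUCCESS_RADIUS_M:
            return True, step   # success: within the spatial threshold of the goal
    return False, MAX_STEPS     # failure: maximum horizon reached
```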
Multi-UAV Task Assignment
TravelUAV also includes benchmarks for multi-UAV task allocation under the Extended Team Orienteering Problem (ETOP) (Xiao et al., 2020). Mathematically:
- Graph $G=(V,A)$ with depot node $0$, UAV fleet $K$, and location nodes $V\setminus\{0\}$
- Travel distances $d_{ij}$, node rewards $r_i$, service times $s_i$, UAV speeds $v_k$, and mission time budget $T_{\max}$
- Decision variables: $y_{ik}\in\{0,1\}$ (UAV $k$ visits node $i$), $x_{ijk}\in\{0,1\}$ (UAV $k$ traverses arc $(i,j)$)
Objective:
$$\max \; \sum_{k\in K}\sum_{i\in V\setminus\{0\}} r_i\, y_{ik}$$
Subject to node, route, and time constraints (flow, binary integrality, subtour elimination).
Small, Medium, and Large instance scales are provided, differing in the number of location nodes and UAVs per instance.
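A candidate ETOP solution can be scored with a simple feasibility-and-reward check; the sketch below assumes routes are given per UAV as ordered node lists and is not the reference implementation:

```python
import math

def evaluate_assignment(routes, coords, rewards, service_times, speeds, time_budget):
    """Evaluate a candidate ETOP solution: routes[k] is the ordered list of location
    nodes visited by UAV k (depot 0 implicit at both ends). Returns the total
    collected reward, or -inf if a node is visited twice or any route exceeds the
    time budget. Argument names are illustrative."""
    visited = set()
    total_reward = 0.0
    for k, route in enumerate(routes):
        t, prev = 0.0, 0  # start at the depot
        for node in route:
            if node in visited:
                return float("-inf")  # each location node may be serviced at most once
            visited.add(node)
            t += math.dist(coords[prev], coords[node]) / speeds[k] + service_times[node]
            total_reward += rewards[node]
            prev = node
        t += math.dist(coords[prev], coords[0]) / speeds[k]  # return to the depot
        if t > time_budget:
            return float("-inf")
    return total_reward
```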
3. Evaluation Protocols and Metrics
TravelUAV prescribes multiple rigorous metrics for fair comparison of algorithms:
Navigation Metrics
- Success Rate (SR): fraction of episodes in which the agent stops within the success radius of the goal, $\mathrm{SR}=\frac{1}{N}\sum_{i=1}^{N}S_i$ with $S_i\in\{0,1\}$
- Oracle Success Rate (OSR): fraction of episodes in which any point of the executed trajectory comes within the success radius of the goal
- Success weighted by Path Length (SPL): $\mathrm{SPL}=\frac{1}{N}\sum_{i=1}^{N}S_i\,\frac{\ell_i}{\max(p_i,\ell_i)}$, where $S_i$ indicates success, $p_i$ is the executed path length, and $\ell_i$ the shortest-path length
- Navigation Error (NE): mean Euclidean distance between the agent's final position and the goal
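These navigation metrics can be computed directly from rollout logs; the sketch below assumes each episode provides its trajectory, goal position, and shortest-path length under hypothetical key names:

```python
import numpy as np

def navigation_metrics(episodes, success_radius=3.0):
    """Compute SR, OSR, SPL, and NE from rollout logs. Each episode is assumed to be
    a dict with 'trajectory' (T x 3 positions), 'goal' (3,), and
    'shortest_path_length' (metres); the keys are illustrative."""
    sr, osr, spl, ne = [], [], [], []
    for ep in episodes:
        traj = np.asarray(ep["trajectory"], dtype=float)
        goal = np.asarray(ep["goal"], dtype=float)
        final_err = np.linalg.norm(traj[-1] - goal)
        min_err = np.linalg.norm(traj - goal, axis=1).min()
        success = final_err <= success_radius
        p = np.linalg.norm(np.diff(traj, axis=0), axis=1).sum()  # executed path length
        l = float(ep["shortest_path_length"])
        sr.append(success)
        osr.append(min_err <= success_radius)       # oracle: closest point ever reached
        spl.append(success * l / max(p, l, 1e-8))   # success weighted by path length
        ne.append(final_err)
    return {"SR": np.mean(sr), "OSR": np.mean(osr), "SPL": np.mean(spl), "NE": np.mean(ne)}
```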
Detection and Tracking Metrics (Du et al., 2018)
- Object Detection: Precision, Recall, Average Precision (AP @ IoU=0.7)
- Single-Object Tracking: Success (AUC), Precision (center-error ≤ 20 px)
- Multiple-Object Tracking Accuracy (MOTA): $\mathrm{MOTA}=1-\frac{\sum_t\left(\mathrm{FN}_t+\mathrm{FP}_t+\mathrm{IDSW}_t\right)}{\sum_t \mathrm{GT}_t}$
with MOTP, IDF1, MT/ML, FP/FN/IDSW/FM as secondary metrics.
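A compact way to compute MOTA from per-frame counts (the input format here is illustrative):

```python
def mota(per_frame_stats):
    """CLEAR-MOT accuracy from per-frame counts. per_frame_stats is an iterable of
    (false_negatives, false_positives, id_switches, num_ground_truth) tuples."""
    fn = sum(s[0] for s in per_frame_stats)
    fp = sum(s[1] for s in per_frame_stats)
    idsw = sum(s[2] for s in per_frame_stats)
    gt = sum(s[3] for s in per_frame_stats)
    return 1.0 - (fn + fp + idsw) / max(gt, 1)
```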
Video Coding Metrics (Jia et al., 2023)
- PSNR/bpp curves
- BD-Rate: $\mathrm{BD\text{-}rate}=\Bigl(10^{\frac{1}{D_H-D_L}\int_{D_L}^{D_H}\left(\log_{10}R_2(D)-\log_{10}R_1(D)\right)dD}-1\Bigr)\times 100\%$, the average bitrate difference between two rate-distortion curves over their overlapping quality range, used for rate-distortion comparison
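BD-rate can be estimated by interpolating log-rate as a function of PSNR over the overlapping quality range, as in this minimal sketch (production implementations add further numerical safeguards):

```python
import numpy as np
import scipy.interpolate as si

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate (%): average bitrate change of the test codec relative
    to the anchor at equal PSNR, using PCHIP interpolation of log-rate vs. PSNR.
    Assumes each curve has distinct, monotonically varying PSNR points."""
    lr_a, lr_t = np.log10(rate_anchor), np.log10(rate_test)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    p_a = si.PchipInterpolator(np.sort(psnr_anchor), lr_a[np.argsort(psnr_anchor)])
    p_t = si.PchipInterpolator(np.sort(psnr_test), lr_t[np.argsort(psnr_test)])
    grid = np.linspace(lo, hi, 100)
    avg_diff = np.trapz(p_t(grid) - p_a(grid), grid) / (hi - lo)
    return (10 ** avg_diff - 1) * 100.0
```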
Task Assignment Metrics
- Total Reward Collected
- CPU Time (wall-clock)
- Variance, statistical significance over multiple runs
4. Baseline Algorithms and Experimental Configuration
Standardized baselines are provided for each task domain:
Vision-Language Navigation
- Random policy: Uniform control sampling
- Fixed policy: Hand-coded waypoint sequence
- Cross-Modal Attention (CMA): Bi-LSTM with attention over image and text
- TravelUAV (state-of-the-art UAV-VLN): Hierarchical planning, rule-based policy
Training uses 4× NVIDIA RTX 4090 GPUs with the Adam optimizer and a batch size of 1.
Three assistance levels:
- L1: Instructions + Helper waypoints
- L2: Helper waypoints only
- L3: No assistance (autonomous)
Multi-UAV Task Assignment (Xiao et al., 2020)
- Genetic Algorithm (GA): Permutation + split-vector encoding, fitness by collected reward, elitist survival
- Ant Colony Optimization (ACO): 2D pheromone table combined with a heuristic visibility term, with pheromone evaporation and reward-based updates
- Particle Swarm Optimization (PSO): Particle encodes route split, velocity updates, sorting-based decoding, local swarms
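As a rough illustration of the permutation + split-vector encoding used by the GA baseline, a chromosome can be decoded into per-UAV routes as follows; the exact encoding of Xiao et al. (2020) may differ in detail:

```python
import random

def decode(permutation, split_points, num_uavs):
    """Decode a GA chromosome into per-UAV routes: `permutation` orders all location
    nodes, and `split_points` are sorted cut indices partitioning it among the UAVs.
    This decoding scheme is illustrative."""
    bounds = [0] + sorted(split_points) + [len(permutation)]
    return [permutation[bounds[k]:bounds[k + 1]] for k in range(num_uavs)]

# Example: 8 location nodes split across 3 UAVs
perm = random.sample(range(1, 9), 8)          # random visiting order of nodes 1..8
routes = decode(perm, split_points=[3, 6], num_uavs=3)
print(routes)  # e.g. [[5, 2, 7], [1, 8, 4], [6, 3]]
```

Decoded routes can then be scored with a feasibility-and-reward check such as the ETOP evaluation sketched in Section 2, which serves as the fitness signal for GA, ACO, and PSO alike.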
Reference configurations and open-source implementations are available for reproducibility.
Object Detection and Tracking (Du et al., 2018)
- Detectors: Faster R-CNN, R-FCN, SSD, RON
- Trackers: MDNet, ECO, GOTURN, SORT, DSORT, MDP
Learned Video Coding (Jia et al., 2023)
- HEVC-SCC (anchor)
- OpenDVC
- MPAI EEV (enhanced OpenDVC): Two-stage residual modeling, in-loop restoration
5. Reward Shaping, Ablations, and Analysis
TravelUAV incorporates advanced reward shaping for RL-based navigation:
- Value-model reward: rewards increases in the predicted state value between consecutive steps, encouraging monotonic state-value progression along the trajectory
- Cosine-similarity reward: the cosine similarity $\frac{e_v\cdot e_\ell}{\|e_v\|\,\|e_\ell\|}$ between the multimodal (visual and instruction) embeddings $e_v$ and $e_\ell$
Reward caps govern gradient stability; ablation studies identify the cap value at which gradients are most informative.
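A minimal sketch of the cosine-similarity shaping reward with an optional cap, assuming precomputed embeddings (the cap value is a benchmark-specific hyperparameter not reproduced here):

```python
import numpy as np

def cosine_reward(visual_emb, text_emb, cap=None):
    """Cosine-similarity shaping reward between the visual and instruction embeddings,
    optionally clipped to a reward cap for gradient stability."""
    v = np.asarray(visual_emb, dtype=float)
    t = np.asarray(text_emb, dtype=float)
    r = float(v @ t / (np.linalg.norm(v) * np.linalg.norm(t) + 1e-8))
    if cap is not None:
        r = float(np.clip(r, -cap, cap))  # cap value is benchmark-specific
    return r
```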
Key quantitative results under the 25% data regime:
- OpenVLN vs. TravelUAV baseline: NE drops 132.59→125.97; SR +2.80pp (11.59%→14.39%); OSR +3.53pp (24.50%→28.03%); SPL +2.49pp (10.45%→12.94%)
- Hardest case (Hard/Test-Seen/L3): SR=0.00% (TravelUAV) vs. 0.47% (OpenVLN)
6. Difficulty Factors, Limitations, and Extensions
The TravelUAV Benchmark reveals unique challenges for UAV-centric vision and planning:
- High object density: mean ≈10.5 objects per frame, leading to frequent ID switches and false positives in multi-object tracking
- Small/tiny object scale: Medium/high altitude imagery severely degrades detection/tracking
- Rapid camera motion/viewpoint change: requires robust feature updating and domain-agnostic motion compensation
- Adverse conditions: Diverse weather, occlusion, out-of-view rates stress model generalization
- Long-horizon navigation: Credit assignment over 250–400 m, sparse reward landscape
Video coding benchmarks indicate learned codecs excel in outdoor, high-motion regimes versus block-based HEVC, but lag on indoor/fisheye scenes without UAV-specific domain adaptation.
Suggested research directions include domain adaptation for video codecs, multi-scale feature fusion for tiny-object detection, real-time model pruning/quantization, semantic-aware compression, and robust physical simulation (wind, sensor perturbations).
The multi-UAV assignment benchmark provides a complete recipe for problem instance generation, permutation/split encoding, and comparative evaluation—including downloadable open-source code.
7. Significance and Research Utility
TravelUAV establishes a unified, extensible suite of protocols for evaluating algorithms within the UAV domain. Integration with platforms such as AirSim and Unreal Engine, with open-source solvers, and with prescriptive evaluation metrics ensures comparability and reproducibility. Its heterogeneous task set (vision-language navigation, detection/tracking, multi-agent planning, data-efficient RL) offers rich ground for algorithmic development and benchmarking under aerial robotics constraints.
OpenVLN’s demonstrated improvements (up to 4.34% SR, 6.19% OSR, 4.07% SPL over baseline methods (Lin et al., 9 Nov 2025)) exemplify the benchmark’s capacity to empirically validate advances in data-efficient, language-guided UAV navigation. The TravelUAV corpus continues to motivate research in long-horizon planning, robustness to adversarial conditions, domain adaptation, compute-energy-accuracy trade-offs, and the systematic evaluation of embodied UAV agents.