Performance-Guided Refinement for Visual Aerial Navigation using Editable Gaussian Splatting in FalconGym 2.0

Published 2 Oct 2025 in cs.RO | (2510.02248v1)

Abstract: Visual policy design is crucial for aerial navigation. However, state-of-the-art visual policies often overfit to a single track and their performance degrades when track geometry changes. We develop FalconGym 2.0, a photorealistic simulation framework built on Gaussian Splatting (GSplat) with an Edit API that programmatically generates diverse static and dynamic tracks in milliseconds. Leveraging FalconGym 2.0's editability, we propose a Performance-Guided Refinement (PGR) algorithm, which concentrates the visual policy's training on challenging tracks while iteratively improving its performance. Across two case studies (fixed-wing UAVs and quadrotors) with distinct dynamics and environments, we show that a single visual policy trained with PGR in FalconGym 2.0 outperforms state-of-the-art baselines in generalization and robustness: it generalizes to three unseen tracks with 100% success without per-track retraining and maintains higher success rates under gate-pose perturbations. Finally, we demonstrate that the visual policy trained with PGR in FalconGym 2.0 can be transferred zero-shot from simulation to quadrotor hardware, achieving a 98.6% success rate (69 / 70 gates) over 30 trials spanning two three-gate tracks and a moving-gate track.

Summary

The paper introduces the Performance-Guided Refinement algorithm to focus training on challenging tracks, thereby enhancing policy robustness.
It utilizes editable Gaussian Splatting with a novel Edit API to rapidly generate diverse static and dynamic navigation environments.
The trained policy achieves 100% success on unseen tracks in simulation for both fixed-wing UAVs and quadrotors, and transfers zero-shot to quadrotor hardware with a 98.6% success rate.

Introduction and Motivation

The paper presents FalconGym 2.0, a photorealistic simulation framework for visual aerial navigation that combines Gaussian Splatting (GSplat) with a novel Edit API for rapid, programmatic generation of diverse static and dynamic tracks. The central contribution is the Performance-Guided Refinement (PGR) algorithm, which iteratively focuses policy training on challenging tracks, thereby improving generalization and robustness. The framework is validated on two aerial platforms, fixed-wing UAVs and quadrotors, and shows better generalization to unseen tracks than the baselines as well as robust sim-to-real transfer.

FalconGym 2.0: Editable GSplat Simulation

FalconGym 2.0 advances prior NeRF-based simulators by adopting GSplat for efficient photorealistic rendering and introducing a world-frame Edit API. The GSplat scene is parameterized as a set of $N$ anisotropic Gaussians, each with mean $\mu_j$, covariance $\Sigma_j$, color $c_j$, and opacity $\alpha_j$. The Edit API exposes seven composable operations (add, translate, rotate, scale, duplicate, delete, lighting), enabling rapid synthesis of new tracks and dynamic environments.
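To make the effect of a world-frame edit on the Gaussian parameters concrete, here is a minimal NumPy sketch of two of the listed operations (translate and rotate). It is not the FalconGym 2.0 Edit API: the function names, array layout, and the restriction to rotation about the world z-axis are assumptions made for illustration. The underlying math is standard: a rigid rotation $R$ about a pivot $p$ maps $\mu_j \mapsto R(\mu_j - p) + p$ and $\Sigma_j \mapsto R \Sigma_j R^\top$. Any view-dependent color terms are ignored here.

```python
import numpy as np

def translate(means, offset):
    """Shift Gaussian means (N, 3) by a world-frame offset; covariances are unaffected."""
    return means + np.asarray(offset, dtype=float)

def rotate_z(means, covs, angle_rad, pivot):
    """Rotate Gaussians about a vertical axis through `pivot` (hypothetical helper).

    means: (N, 3) array of mu_j, covs: (N, 3, 3) array of Sigma_j.
    """
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    pivot = np.asarray(pivot, dtype=float)
    new_means = (means - pivot) @ R.T + pivot  # mu' = R (mu - p) + p
    new_covs = R @ covs @ R.T                  # Sigma' = R Sigma R^T, broadcast over N
    return new_means, new_covs

# Example: move one elongated Gaussian 2 m along y, then yaw it by 90 degrees.
means = np.array([[1.0, 0.0, 0.0]])
covs = np.array([np.diag([4.0, 1.0, 1.0])])
moved = translate(means, [0.0, 2.0, 0.0])
rot_means, rot_covs = rotate_z(means, covs, np.pi / 2, pivot=[0.0, 0.0, 0.0])
```

Because both operations are pure array expressions over all selected Gaussians at once, they vectorize naturally, which is consistent with the tensorized, millisecond-scale edits the summary reports.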

Figure 1: The Edit API in FalconGym 2.0 enables programmatic world-frame placement and transformation of gates, supporting rapid generation of diverse static and dynamic tracks.

This editability is critical for scalable policy training, allowing for domain randomization and curriculum learning strategies without additional real-world data collection.

Closed-Loop Visual Policy Architecture

The closed-loop system in FalconGym 2.0 comprises a photorealistic renderer, platform-specific dynamics (a Dubins airplane model for the fixed-wing UAV and a 12-state model for the quadrotor), a perception module, and a controller. The perception module is a U-Net trained to segment gate masks from onboard RGB images, with ground-truth masks generated analytically via 3D-to-2D projection. The controller consumes the predicted mask and a short history of past actions to output the next control command.
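The analytic ground-truth masks mentioned above can be illustrated with a pinhole projection of a gate's known 3D corners followed by rasterization of the resulting quadrilateral. This is a sketch under stated assumptions, not the paper's implementation: the helper names, the world-to-camera convention, and the simplification of the gate to a single filled convex quadrilateral are all hypothetical.

```python
import numpy as np

def project_points(points_w, R_cw, t_cw, K):
    """Project world points (N, 3) to pixel coordinates (N, 2).

    Assumed convention: p_cam = R_cw @ p_world + t_cw, camera looks along +z.
    """
    p_c = points_w @ R_cw.T + t_cw
    if np.any(p_c[:, 2] <= 0):
        raise ValueError("all points must be in front of the camera")
    uv = p_c @ K.T
    return uv[:, :2] / uv[:, 2:3]

def convex_polygon_mask(corners_uv, height, width):
    """Boolean mask of pixels whose centers lie inside a convex polygon."""
    us, vs = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    sides = []
    n = len(corners_uv)
    for i in range(n):
        a, b = corners_uv[i], corners_uv[(i + 1) % n]
        # Sign of the 2D cross product tells which side of edge a->b a pixel is on.
        sides.append((b[0] - a[0]) * (vs - a[1]) - (b[1] - a[1]) * (us - a[0]))
    sides = np.stack(sides)
    # Inside means on the same side of every edge, for either corner ordering.
    return np.all(sides >= 0, axis=0) | np.all(sides <= 0, axis=0)

# Example: a 2 m x 2 m gate, 5 m in front of a 64 x 64 camera.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0, 0.0, 1.0]])
gate = np.array([[-1.0, -1.0, 5.0], [1.0, -1.0, 5.0],
                 [1.0, 1.0, 5.0], [-1.0, 1.0, 5.0]])
corners = project_points(gate, np.eye(3), np.zeros(3), K)
mask = convex_polygon_mask(corners, 64, 64)
```

Since the gate pose is known exactly after every edit, labels of this kind require no manual annotation.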

Figure 2: FalconGym 2.0 closed-loop system: dynamics propagate state, GSplat renderer generates RGB, perception predicts gate mask, controller outputs next action; PGR focuses training on challenging tracks.

This modular architecture mitigates overfitting observed in prior end-to-end ViT-based policies, which tended to memorize background features and failed to generalize to new tracks. By decoupling perception and control, the system supports onboard deployment and improved generalization.
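The decoupled loop described in this section can be sketched as follows. The renderer, perception module, and controller are passed in as plain callables, and the dynamics use a kinematic Dubins airplane with an Euler step. Everything here is an illustrative assumption: the control inputs (turn rate and climb angle at constant speed), the history length, and all function names are hypothetical, and the actual FalconGym 2.0 interfaces may differ.

```python
import math

def dubins_airplane_step(state, speed, turn_rate, climb_angle, dt):
    """One Euler step of a kinematic Dubins airplane; state = (x, y, z, heading)."""
    x, y, z, heading = state
    x += speed * math.cos(heading) * math.cos(climb_angle) * dt
    y += speed * math.sin(heading) * math.cos(climb_angle) * dt
    z += speed * math.sin(climb_angle) * dt
    heading += turn_rate * dt
    return (x, y, z, heading)

def rollout(state, render, perceive, control, speed, dt, steps, history_len=3):
    """Closed loop: render RGB -> predict gate mask -> choose action -> propagate state."""
    history = [(0.0, 0.0)] * history_len   # short history of past actions
    trajectory = [state]
    for _ in range(steps):
        image = render(state)              # stand-in for the GSplat renderer
        mask = perceive(image)             # stand-in for the U-Net perception module
        action = control(mask, history)    # (turn_rate, climb_angle)
        history = history[1:] + [action]
        state = dubins_airplane_step(state, speed, action[0], action[1], dt)
        trajectory.append(state)
    return trajectory

# Example with stub modules: fly straight and level for one second at 10 m/s.
trajectory = rollout(
    state=(0.0, 0.0, 0.0, 0.0),
    render=lambda s: None,
    perceive=lambda img: None,
    control=lambda mask, hist: (0.0, 0.0),
    speed=10.0, dt=0.1, steps=10,
)
```

The controller only ever sees the mask and its own action history, never the raw image, which is the property the summary credits for reduced background memorization.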

Figure 3: UAV trajectories in FalconGym 2.0 across three unseen tracks; red overlays indicate predicted gate masks from the perception module.

Figure 4: Quadrotor trajectories in FalconGym 2.0 across three unseen tracks, demonstrating generalization of the visual policy.

Performance-Guided Refinement (PGR)

PGR formulates policy training as a min-max optimization problem:

$\min_{\theta} \max_{g \in G} \mathbb{E}_{\tau \sim \pi_\theta} [\mathcal{L}(\tau; g)]$

where $\theta$ denotes the policy parameters, $g$ parameterizes a two-gate track, $G$ is the feasible/observable gate space, and $\mathcal{L}$ penalizes collisions, timeouts, and gate-crossing error. The algorithm partitions $G$ into grid cells, evaluates policy performance per cell, and adaptively samples new training tracks from cells with high loss, mixing in uniform sampling to avoid mode collapse.
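The sampling step described above can be sketched in a few lines of NumPy. The summary does not give the exact weighting rule, so this sketch assumes sampling probabilities proportional to per-cell loss, blended with a uniform distribution. The `uniform_mix` value, the function names, and the two-dimensional gate parameterization in the example are hypothetical.

```python
import numpy as np

def pgr_sampling_weights(cell_losses, uniform_mix=0.2):
    """Blend loss-proportional and uniform probabilities over grid cells."""
    losses = np.asarray(cell_losses, dtype=float)
    if np.any(losses < 0):
        raise ValueError("losses must be non-negative")
    uniform = np.full(losses.size, 1.0 / losses.size)
    total = losses.sum()
    focused = losses / total if total > 0 else uniform
    return (1.0 - uniform_mix) * focused + uniform_mix * uniform

def sample_tracks(cell_lows, cell_highs, cell_losses, num_tracks, rng, uniform_mix=0.2):
    """Pick cells by weight, then draw gate parameters uniformly inside each chosen cell."""
    probs = pgr_sampling_weights(cell_losses, uniform_mix)
    cells = rng.choice(probs.size, size=num_tracks, p=probs)
    return rng.uniform(cell_lows[cells], cell_highs[cells]), cells

# Example: four cells over a 2D gate parameter (e.g. lateral offset and yaw).
weights = pgr_sampling_weights([0.0, 0.0, 3.0, 1.0], uniform_mix=0.2)
lows = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
highs = lows + 1.0
rng = np.random.default_rng(0)
tracks, cells = sample_tracks(lows, highs, [0.0, 0.0, 3.0, 1.0], 100, rng)
```

In this example the weights come out as 0.05, 0.05, 0.65, and 0.25: cells where the policy already succeeds keep a small nonzero probability, which is the role of the uniform mixing. A full refinement round would re-evaluate the per-cell losses after retraining on the sampled tracks and repeat.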

This approach concentrates data collection on policy failure modes, efficiently improving robustness and generalization. PGR is applied to both perception and controller modules in the fixed-wing UAV case study, and to perception only in the quadrotor case due to onboard compute constraints.

Experimental Results

Fixed-Wing UAV Case Study

Three tracks (Spatial-S, Random, Moving) are synthesized using the Edit API. Five policies are evaluated: a state-based expert, two visual baselines (retrained in FalconGym 2.0), and two variants of the proposed method (uniform sampling and PGR). The baselines achieve a high success rate (SR) on the training track but fail to generalize to unseen tracks (0–50% SR). In contrast, both proposed variants generalize with 100% SR across all tracks, with PGR reducing mean gate error (MGE) compared to uniform sampling.

Figure 5: Robustness to gate-pose perturbations for UAV policies; PGR maintains higher success rates under increasing perturbation compared to baselines.

Quadrotor Case Study

Three tracks (Left-Turn, Random, Moving) are evaluated both in simulation and on real hardware. Baselines overfit to the training track and degrade on unseen tracks. The proposed methods maintain 100% SR in simulation and ≥93% SR in real-world zero-shot transfer, with PGR providing modest improvements in MGE. For safety, the Moving track is evaluated with lateral tracking only.

Figure 6: Real hardware quadrotor trajectories; visual policy trained in FalconGym 2.0 transfers zero-shot to hardware with high success rate.

Figure 7: Gate perturbation robustness for quadrotor policies; PGR and uniform sampling outperform baselines as perturbation increases.

Implementation and Deployment Considerations

GSplat Rendering: Training and inference use the NeRFStudio Splatfacto pipeline; Edit API operations are tensorized for rapid execution ($\sim$4 ms per operation on an RTX 4090).
Perception Module: A U-Net segmentation network trained on synthetic image-mask pairs generated via the Edit API; ground-truth masks are computed analytically.
Controller: Trained by imitation learning from the state-based expert; for hardware deployment, a classical controller replaces the neural controller to meet real-time constraints.
PGR Algorithm: Grid-based sampling, adaptive weighting, and uniform mixing to avoid mode collapse; scalable to high-dimensional gate spaces.
Sim-to-Real Transfer: Policies trained in FalconGym 2.0 are deployed zero-shot to a ModalAI Starling 2 quadrotor with onboard inference at 8 Hz.

Implications and Future Directions

The results demonstrate that editable photorealistic simulation, combined with performance-guided refinement, enables training of visual policies that generalize across diverse tracks and transfer robustly to real hardware. The modular architecture and Edit API facilitate scalable data generation and efficient policy improvement. The reported numbers (100% SR on unseen tracks in simulation and 98.6% SR, or 69 of 70 gates, on real hardware) support the efficacy of the approach.

Future work should address:

Relaxing observability constraints to handle occluded gates and more complex environments.
Scaling to higher-speed flight regimes and more realistic UAV dynamics.
Distilling the full perception-control stack for onboard deployment at higher frequencies.
Extending the Edit API to broader robotics domains (e.g., ground robots, manipulation).

Conclusion

FalconGym 2.0, with its editable GSplat simulation and PGR algorithm, establishes a robust framework for visual aerial navigation policy training. The approach achieves strong generalization and sim-to-real transfer, outperforming state-of-the-art baselines in both robustness and success rate. The modular design and efficient data generation pipeline position FalconGym 2.0 as a practical tool for scalable, real-world deployment of visual navigation policies.


Authors (6)
