
RoboTracer: Spatial Tracing & UAV Localization

Updated 31 January 2026
  • RoboTracer refers to two systems: a vision-language model for metric 3D spatial tracing and a UAV platform for radio-tagged animal localization.
  • The VLM leverages multi-modal sensing, including depth maps and point clouds, alongside metric-scale supervision to ensure precise real-world measurements.
  • The UAV system employs Bayesian filtering and information-theoretic POMDP planning for rapid, accurate localization, despite current endurance and 3D tracking limitations.

RoboTracer is a term that refers to two concurrent advances in robotic spatial tracing and real-time localization: (1) a vision-LLM (VLM) for metric-grounded spatial reasoning and waypoint generation in diverse robotic scenarios, and (2) an autonomous UAV system for radio-tagged animal localization via information-theoretic planning. Both instantiations embody the capacity to perceive, interpret, and generate actionable representations of spatial traces, enabling precise control, measurement, and interaction in real-world environments (Zhou et al., 15 Dec 2025, Nguyen et al., 2017).

1. System Architecture and Perception Modules

The vision-LLM RoboTracer employs a universal spatial encoder based on the MapAnything backbone. This encoder, which is frozen during training, absorbs heterogeneous geometric inputs—including camera intrinsics, raw depth maps, and point clouds—transforming them into a tokenized representation for the language backbone. Lightweight linear projectors reformat hidden activations to match the downstream VLM’s embedding space. The architecture is explicitly designed to operate under variable sensor regimes, gracefully degrading to RGB-only inputs with uncertainty notification, yet fully leveraging available geometric annotations for enhanced 3D awareness (Zhou et al., 15 Dec 2025).
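
As a concrete illustration, the following minimal PyTorch sketch shows how features from a frozen spatial encoder might be projected into a VLM's token space. The class name and dimensions are hypothetical; the paper does not publish this interface.

```python
import torch
import torch.nn as nn

class SpatialTokenProjector(nn.Module):
    """Map frozen spatial-encoder features into the VLM embedding space.

    Dimensions are illustrative; the actual MapAnything feature size and
    VLM hidden size may differ.
    """
    def __init__(self, enc_dim: int = 1024, vlm_dim: int = 4096):
        super().__init__()
        # Lightweight linear projector, per the description above.
        self.proj = nn.Linear(enc_dim, vlm_dim)

    def forward(self, enc_feats: torch.Tensor) -> torch.Tensor:
        # enc_feats: (batch, num_tokens, enc_dim) from the frozen encoder.
        return self.proj(enc_feats)  # (batch, num_tokens, vlm_dim)

# The encoder stays frozen; only the projector (and VLM) receive gradients.
features = torch.randn(2, 256, 1024)        # stand-in for encoder output
tokens = SpatialTokenProjector()(features)  # ready for the language backbone
```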

In the real-time localization context, RoboTracer describes the merging of a lightweight software-defined radio (SDR)-based RSSI sensor, a Bayesian particle filter tracker, and an information-driven POMDP planner onboard a UAV platform (3DR IRIS+ quadrotor). Custom folded 2-element Yagi VHF antennas, a HackRF One SDR (1 MHz–6 GHz range), and Intel Edison embedded computers enable rapid scanning, peak detection, and RSSI computation. The platform supports GPS-based location, heading control, and payloads under 2 kg for operational flexibility. Telemetry is managed via MAVLink (915 MHz) and REST (2.4 GHz) links (Nguyen et al., 2017).
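
The paper does not publish the onboard signal-processing code; the sketch below shows one plausible way to compute an RSSI value from a block of SDR IQ samples via FFT peak detection. All parameters are illustrative.

```python
import numpy as np

def rssi_from_iq(iq: np.ndarray, fs: float, f_offset: float, bw: float = 2e3) -> float:
    """Estimate an (uncalibrated) RSSI in dB for one tag from IQ samples.

    f_offset is the tag's frequency offset from the SDR center frequency
    and must lie within +/- fs/2. A fielded pipeline would calibrate
    against a reference power and average over several tag pulses.
    """
    windowed = iq * np.hanning(len(iq))                  # reduce spectral leakage
    spectrum = np.fft.fftshift(np.fft.fft(windowed))
    freqs = np.fft.fftshift(np.fft.fftfreq(len(iq), d=1.0 / fs))
    in_band = np.abs(freqs - f_offset) <= bw / 2         # window around the tag
    peak_power = np.max(np.abs(spectrum[in_band]) ** 2)  # peak detection
    return 10.0 * np.log10(peak_power + 1e-12)           # power in dB
```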

2. Measurement, Representation, and Scale Supervision

RoboTracer’s VLM advances metric scale-awareness through a regression-supervised “scale decoder.” A <SCALE> token routes through an MLP head to regress the real-world scale factor \hat{s}, supplementing the standard next-token cross-entropy loss during supervised fine-tuning (SFT):

\mathcal{L}_{\rm scale} = 0.1\,\|\log \hat{s} - \log s^{*}\|_2^2

where s^{*} is the ground-truth scene scale. The aggregate SFT objective couples this with textual prediction, ensuring geometric “unit” values map correctly to meters in the physical environment. This formulation endows RoboTracer with a robust internal sense of scale for spatial referencing and accurate metric measurement (Zhou et al., 15 Dec 2025).
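
In code, the scale loss is a one-liner; the sketch below assumes batched scalar scale predictions and mean reduction, which the paper does not specify.

```python
import torch

def scale_loss(s_hat: torch.Tensor, s_star: torch.Tensor) -> torch.Tensor:
    # L_scale = 0.1 * || log s_hat - log s* ||_2^2, averaged over the batch
    # (the batch reduction is an assumption; the paper gives the per-sample form).
    return 0.1 * (torch.log(s_hat) - torch.log(s_star)).pow(2).mean()

# Example: predicted vs. ground-truth scene scales for two scenes.
print(scale_loss(torch.tensor([1.2, 0.8]), torch.tensor([1.0, 1.0])))
```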

UAV-based RoboTracer relies on two log-distance path-loss models (Log-path and Multi-path), with Gaussian noise assumptions, for RSSI-based distance inference:

  • Log-Path: RSSI(d) = P_0 - 10n\log_{10}(d/d_0) + G_r(x,u) + \epsilon_P
  • Multi-Path: RSSI(d) = P_0 - 10n\log_{10}(d/d_0) + G_r(x,u) + 10n\log_{10}\left|1 + \Gamma(\psi)e^{-j\Delta\phi}\right| + \epsilon_P

State vectors encode 3D positions, with random walk models for animal motion and known UAV control. Measurement likelihoods are Gaussian with environment-specific \sigma_P (Nguyen et al., 2017).
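
A minimal sketch of the two measurement models follows, with illustrative parameter values (the paper's calibrated constants are not reproduced here); the reflection coefficient \Gamma(\psi) is passed in precomputed.

```python
import numpy as np

def rssi_log_path(d, P0=-40.0, n=2.0, d0=1.0, G_r=0.0, sigma_P=3.0, rng=None):
    """Log-Path: RSSI(d) = P0 - 10 n log10(d/d0) + G_r + eps_P."""
    rng = rng or np.random.default_rng()
    eps = rng.normal(0.0, sigma_P)  # Gaussian noise assumption
    return P0 - 10.0 * n * np.log10(d / d0) + G_r + eps

def rssi_multi_path(d, gamma, dphi, **kw):
    """Multi-Path adds the ground-reflection term 10 n log10 |1 + Gamma e^{-j dphi}|."""
    n = kw.get("n", 2.0)
    reflection = 10.0 * n * np.log10(abs(1.0 + gamma * np.exp(-1j * dphi)))
    return rssi_log_path(d, **kw) + reflection
```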

3. Learning and Reasoning: SFT, RFT, and POMDP Planning

Metric-grounded spatial reasoning in RoboTracer VLM is achieved through a two-stage learning paradigm:

  • Supervised Fine-tuning (SFT): Trains for 3D referring and measuring on the TraceSpatial dataset (30M QA pairs across 4.5M scenes), encompassing absolute metrics, hierarchical object descriptions, and reasoning chains up to 9 steps deep. Initial “metric alignment” updates only spatial and scale heads on RGB+\mathcal{G} inputs; further “metric enhancement” ensures general VQA capability remains intact (Zhou et al., 15 Dec 2025).
  • Reinforcement Fine-tuning (RFT): Leverages group relative policy optimization (GRPO), with composite rewards formulated to promote not only answer correctness but also explicit demonstration of reasoning steps:

r_i = R_{\rm OF}(a_i) + R_P(a_i) + \tfrac{1}{4}\left[R_{\rm PF}(a_i) + R_{\rm Acc}(a_i)\right]

RFT introduces rewards for correct outcome formatting, point alignment (via start/end waypoints and dynamic time warping), stepwise process labeling, and accuracy within prescribed metric bounds.
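
A minimal sketch of the composite reward is below; each scorer's internals (format checks, DTW-based point alignment, process labels, metric-bound accuracy) are assumptions abstracted behind callables.

```python
def composite_reward(a_i, r_of, r_p, r_pf, r_acc):
    """r_i = R_OF(a_i) + R_P(a_i) + (1/4) * [R_PF(a_i) + R_Acc(a_i)]."""
    return r_of(a_i) + r_p(a_i) + 0.25 * (r_pf(a_i) + r_acc(a_i))

# Toy usage with stand-in scorers returning values in [0, 1].
r = composite_reward(
    "model rollout",
    r_of=lambda a: 1.0,   # outcome formatting correct
    r_p=lambda a: 0.7,    # point alignment (e.g., DTW over waypoints)
    r_pf=lambda a: 1.0,   # stepwise process labeling present
    r_acc=lambda a: 0.9,  # answer within prescribed metric bounds
)
```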

For UAV RoboTracer, target search and localization are formulated as a POMDP. States represent joint configurations of animal and UAV positions; actions are discrete rotations and steps, prioritized by directional antenna gain. Planning uses Rényi divergence to maximize expected information gain over a fixed horizon H:

R^{(i)}_{k+H}(a_k) = \frac{1}{\alpha-1}\,\log \int \left[p(x_{k+H}\mid z_{1:k})\right]^{\alpha} \left[p(x_{k+H}\mid z_{1:k},\, z^{(i)}_{k+1:k+H}(a_k))\right]^{1-\alpha} dx

UAV path planning selects actions approximately maximizing the discounted expectation of future rewards across sampled traces (Nguyen et al., 2017).
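
For intuition, a particle-based estimate of this divergence follows. Writing the anticipated posterior as q proportional to p times a simulated measurement likelihood L, the integral reduces to a weighted sum; this estimator is a sketch and may differ in detail from the paper's implementation.

```python
import numpy as np

def renyi_divergence(weights, likelihoods, alpha=0.5):
    """Particle estimate of R_alpha between prior and anticipated posterior.

    With q proportional to p * L, the integral of p^alpha * q^(1-alpha)
    becomes sum_i w_i * (L_i / Z)^(1-alpha), where Z = sum_j w_j L_j.
    """
    Z = np.sum(weights * likelihoods)
    integrand = np.sum(weights * (likelihoods / Z) ** (1.0 - alpha))
    return np.log(integrand) / (alpha - 1.0)
```

Each candidate action's utility is then averaged over Monte Carlo-sampled future measurement sequences, and the planner executes the approximate argmax.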

4. Dataset Construction and Benchmarking

The TraceSpatial dataset is the largest QA corpus for 3D spatial reasoning, enabling both SFT and RFT for the RoboTracer VLM. Approximately 48% of queries stress absolute-scale metrics, far exceeding the roughly 3% typical of prior datasets. Scenes span indoor, outdoor, and manipulation videos, annotated with multi-step reasoning chains and hierarchical object references.

To benchmark spatial trace competence, TraceSpatial-Bench provides 100 curated, cluttered 3D scenes, with each prompt requiring 3–8 reasoning steps and including pick/place and push/pull tasks with collision constraints. RoboTracer achieves a 79.1% average success rate on spatial understanding, referring, and measuring tasks, and on TraceSpatial-Bench records 45% success after RFT (vs. 3% for Gemini-2.5-Pro), representing an absolute 36% accuracy improvement (Zhou et al., 15 Dec 2025).

5. Robotic Control and Real-World Integration

RoboTracer’s output sequences of 3D waypoints (u_t, v_t, d_t) are directly compatible with open-loop motion controllers on diverse robot morphologies such as UR5 manipulators and G1 humanoids. For UR5 integration, the model runs at 1.5 Hz, with endpoint pixel locations fed to segmentation (SAM), point cloud extraction (RealSense), and grasp pose determination (AnyGrasp). Mid-air object manipulation and dynamic trace re-planning are supported.
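
The sketch below shows the shape of such an open-loop control cycle. All interfaces here (predict_waypoints, segment, depth_at, move_to) are hypothetical stand-ins for the SAM, RealSense, and AnyGrasp stages, not real library calls.

```python
import time

def follow_trace(model, robot, camera, rate_hz=1.5):
    """Open-loop trace following at ~1.5 Hz, mirroring the UR5 setup above.

    model/robot/camera are hypothetical wrappers; this is a shape sketch,
    not a reference implementation.
    """
    period = 1.0 / rate_hz
    for (u, v, d) in model.predict_waypoints(camera.rgb()):
        mask = camera.segment(u, v)     # SAM-style endpoint segmentation
        xyz = camera.depth_at(mask, d)  # back-project via the point cloud
        robot.move_to(xyz)              # open-loop motion command
        time.sleep(period)              # hold the planning cadence
```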

On the G1 humanoid, head-camera-based segmentation and point cloud extraction enable sequential waypoint following for tasks such as watering flowers with prescribed hover distances. This demonstrates RoboTracer’s capacity for accurate 3D spatial reasoning, collision avoidance, and metric-constrained manipulation in continually evolving real-world scenes (Zhou et al., 15 Dec 2025).

6. Information-Driven Localization and Search Termination

The "RoboTracer" UAV system employs an integrated Bayesian particle filter and POMDP path planner for animal localization. Real-time RSSI streams update particle filters for state estimation, while POMDP-driven heading selection maximizes information-theoretic utility, as computed by Rényi divergence between prior and anticipated beliefs. Search termination is governed by target uncertainty thresholds (covariance determinant) and battery constraints to ensure mission safety and efficiency.

Field demonstrations with up to five mobile radio tags report sub-30 m localization error within minutes of flight; simulation yields 12.2 m RMS error over 1.93 km. The system’s strengths are low payload mass, rapid multi-channel scanning, and principled uncertainty reduction. Limitations include 2D tracking (fixed altitude), ∼10 min flight endurance, and strict Gaussian noise assumptions. Identified directions for improvement include full 3D tracking, extended SDR range, and disturbance-aware planning for wildlife safety (Nguyen et al., 2017).

7. Quantitative Results and Limitations

RoboTracer (VLM) advances the field by attaining state-of-the-art spatial understanding and trace generation accuracy—exceeding established models in both absolute and relative terms (79.1% average success rate, +36% over Gemini-2.5-Pro), and demonstrating robust transfer to physical robotic controllers (Zhou et al., 15 Dec 2025).

The UAV RoboTracer system achieves rapid localization of multiple radio-tagged targets in both simulation and field conditions, with flight RMS error consistently under 30 m and sub-3-minute localization cycles. Optimizations for real-time operation include a reduced discrete action set, periodic planning at coarse intervals, Monte Carlo simulation of future measurement traces, and particle count management (Nguyen et al., 2017).

Current limitations include degradation under pure RGB input, reliance on frozen spatial encoders, restriction to 2D localization, and battery-constrained mission duration. Proposed future directions include sensor fusion for true 3D spatial reasoning, improved hardware for endurance, and adaptive planning for dynamic/disturbance-rich ecological settings.


Both instantiations of RoboTracer epitomize the modern approach to embodied spatial measurement and reasoning in robotics, advancing multi-modal perception, information-driven planning, and integration with diverse robotic platforms for real-world deployment (Zhou et al., 15 Dec 2025, Nguyen et al., 2017).
