FreeStreamGS: Online Feed-forward 3D Gaussian Splatting from Unposed Streaming Inputs

Published 2 Jun 2026 in cs.CV | (2606.03254v1)

Abstract: Feed-forward 3D Gaussian Splatting (3DGS) allows efficient and high-fidelity novel view synthesis (NVS) from an offline recorded image sequence. However, achieving online NVS from streaming and unposed image inputs remains challenging. Although online feed-forward geometric estimation methods have been proposed for streaming depth and point cloud recovery, they cannot be adapted to NVS due to severe rendering artifacts. This is because NVS demands stricter multi-view consistency in Gaussian scales and pose-geometry alignment; even minor deviations would accumulate over time and visibly degrade rendering quality. To this end, we propose FreeStreamGS, a robust online feed-forward framework for efficient and high-quality NVS. We introduce two key mechanisms: a Decoupled Intrinsic Recovery Head that removes cumulative camera intrinsic bias and prevents scene scale jitter during long-term streaming, and a Dynamic Point Refinement Offset strategy that relaxes rigid unprojection to correct coupled pose-depth drift. Extensive experiments show that FreeStreamGS achieves rendering quality competitive with state-of-the-art offline feed-forward 3DGS methods, despite operating without access to future frames.


Authors (6)

Summary

The paper presents a feed-forward 3D Gaussian splatting framework that reconstructs high-fidelity scenes from unposed, streaming image inputs.
It introduces two key mechanisms, a DIR-Head for stable intrinsic recovery and DPR-Offsets for dynamic geometric adjustment, which together maintain temporal consistency.
Empirical results show rendering quality competitive with offline feed-forward 3DGS methods, lower per-frame latency than optimization-based online methods, and stable scaling to dense inputs where offline methods run out of memory.


Introduction and Motivation

FreeStreamGS addresses a critical challenge in 3D reconstruction and novel view synthesis (NVS): online and efficient 3D Gaussian Splatting (3DGS) directly from streaming, unposed image sequences, without access to future frames or precomputed camera parameters. Recent feed-forward 3DGS methods enable high-fidelity offline reconstruction but remain unsuitable for real-time or latency-sensitive applications such as AR/VR, video stabilization, and navigation, where immediate scene understanding from a live, unconstrained camera is required. FreeStreamGS introduces a feed-forward architecture that incrementally reconstructs high-quality 3DGS under causal, streaming conditions while maintaining rendering fidelity and temporal stability.

Figure 1: FreeStreamGS is a feed-forward framework that incrementally reconstructs high-quality 3D Gaussians from unposed streaming image sequences, enabling low-latency and high-quality online novel view synthesis.

Methodological Advances and System Architecture

The core technical contributions are two synergistic mechanisms for robust online geometry and camera recovery in the absence of future observations: (1) the Decoupled Intrinsic Recovery Head (DIR-Head) and (2) Dynamic Point Refinement Offsets (DPR-Offsets).

DIR-Head tackles recursive drift in camera intrinsics estimation. Rather than predicting a per-frame focal length, DIR-Head anchors global camera intrinsics using a normalized prediction from the first streaming frame’s feature embedding, mitigating the scale inconsistencies and alignment failures seen in prior approaches. This addresses one principal source of geometric degeneration in online 3DGS: global scale inconsistency driven by drifting intrinsics.
DPR-Offsets address the cumulative spatial misalignment caused by rigid per-pixel unprojection of 3D Gaussian primitives, especially under pose-depth coupling errors endemic to causal, non-optimized settings. The module predicts per-pixel 3D residuals, enabling flexible primitive placement that adapts to historical geometric drift.
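
The difference between rigid unprojection and offset-relaxed placement can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names, the convention that the normalized focal length is expressed as a fraction of image width, the centered principal point, and the choice to add residuals in world space are all assumptions made for the example.

```python
import numpy as np

def intrinsics_from_normalized_focal(f_norm, width, height):
    """Pinhole K from a single normalized focal length.

    Assumption: f_norm = focal / image width, principal point at the image center.
    """
    f = f_norm * width
    return np.array([[f, 0.0, width / 2.0],
                     [0.0, f, height / 2.0],
                     [0.0, 0.0, 1.0]])

def unproject_with_offsets(depth, K, cam_to_world, offsets=None):
    """Lift a depth map (H, W) to world-space points (H, W, 3).

    offsets: optional (H, W, 3) residuals (assumed to live in world space).
    With offsets=None every point stays on its viewing ray (rigid unprojection).
    """
    h, w = depth.shape
    # Pixel centers in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)
    # Camera-space rays with z = 1, then scale by depth.
    rays = pix @ np.linalg.inv(K).T
    pts_cam = rays * depth[..., None]
    # Rigid transform into the world frame.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    pts_world = pts_cam @ R.T + t
    if offsets is not None:
        # Relaxation: the point may leave its viewing ray.
        pts_world = pts_world + offsets
    return pts_world
```

Calling `unproject_with_offsets(depth, K, pose)` places each Gaussian center strictly on its viewing ray, so any error in `pose` or `depth` moves the point along or with that ray. Passing a predicted `offsets` array gives each primitive three extra degrees of freedom to absorb such error.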

Both mechanisms integrate into a causal, transformer-based extractor with key-value caching, which produces temporally aware features over the observed frame history. Separate heads infer camera extrinsics and DIR-Head intrinsics; a convolutional decoder predicts per-pixel depth, orientation, scale, opacity, and color, which instantiate the Gaussian primitives once DPR-Offsets are applied. These primitives are recursively fused into a global, memory-efficient scene representation via online voxelized aggregation with confidence weighting.
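
One way to picture confidence-weighted voxel aggregation is a hash map keyed by voxel index that keeps running weighted sums. The sketch below is an assumption-laden illustration: the class name `VoxelFusion`, the weighted-mean update rule, and the restriction to centers and colors are choices made for the example, and the paper's fusion also has to handle scale, orientation, and opacity.

```python
import numpy as np

class VoxelFusion:
    """Running confidence-weighted average of Gaussian centers and colors per voxel."""

    def __init__(self, voxel_size):
        self.voxel_size = voxel_size
        # voxel index (i, j, k) -> [weight_sum, weighted_xyz_sum, weighted_rgb_sum]
        self.cells = {}

    def integrate(self, xyz, rgb, conf):
        """Fold one frame of primitives, shapes (N, 3), (N, 3), (N,), into the map."""
        keys = np.floor(xyz / self.voxel_size).astype(np.int64).tolist()
        for key, p, c, w in zip(map(tuple, keys), xyz, rgb, conf):
            if w <= 0:
                continue  # ignore primitives with no confidence
            cell = self.cells.setdefault(key, [0.0, np.zeros(3), np.zeros(3)])
            cell[0] += w
            cell[1] += w * p
            cell[2] += w * c

    def primitives(self):
        """Return fused centers, colors, and accumulated weights, one row per voxel."""
        cells = list(self.cells.values())
        weight = np.array([c[0] for c in cells])
        xyz = np.array([c[1] / c[0] for c in cells])
        rgb = np.array([c[2] / c[0] for c in cells])
        return xyz, rgb, weight
```

In this sketch, memory grows with the number of occupied voxels rather than with the number of frames, which is the property that lets a recursive scheme keep accepting input views without storing every per-pixel primitive.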

Figure 3: Overview of FreeStreamGS: causal temporal feature extraction; decoupled heads for camera recovery and Gaussian decoding; DPR-Offsets for geometric drift correction; online recursive primitive fusion anchors consistency and efficiency.

Analysis of Online Geometric Challenges

A detailed geometric analysis identifies two failure modes in previous online 3DGS: intrinsic drift and rigid viewing-ray constraints.

Intrinsic drift introduces global scale variation, leading to inconsistent geometry across frames.
Rigid constraints from viewing-ray unprojection compound pose and depth error, progressively distorting the spatial alignment of accumulated Gaussians.

Figure 2: (Left) Intrinsic drift induces scale inconsistency; (Right) Rigid viewing-ray constraint leads to primitive distortions under coupled pose-depth error.

FreeStreamGS’s design specifically targets these pathologies via its decoupled, physically grounded intrinsic prediction and dynamic per-pixel geometric relaxation.
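
The effect of focal-length drift on unprojected geometry follows directly from the pinhole model. The notation here is introduced for illustration and is not taken from the paper: pixel $(u, v)$, depth $d$, focal length $f$, principal point $(c_x, c_y)$, and a per-frame relative focal error $\epsilon_t$.

```latex
\begin{aligned}
\mathbf{X} &= d\,K^{-1}\begin{bmatrix}u\\ v\\ 1\end{bmatrix}
  = \begin{bmatrix}(u-c_x)\,d/f\\ (v-c_y)\,d/f\\ d\end{bmatrix}
  = \begin{bmatrix}X\\ Y\\ Z\end{bmatrix},\\[4pt]
\hat f_t &= (1+\epsilon_t)\,f
  \;\Longrightarrow\;
  \hat{\mathbf{X}}_t = \begin{bmatrix}X/(1+\epsilon_t)\\ Y/(1+\epsilon_t)\\ Z\end{bmatrix}.
\end{aligned}
```

For example, with $\epsilon_t = 0.05$ a point 1 m off the optical axis is placed at about 0.952 m, a shift of roughly 4.8 cm, while its depth is unchanged. If $\epsilon_t$ varies from frame to frame, the same surface lands at a different lateral scale in each frame, which is the kind of inconsistency that a single anchored focal length avoids.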

Training Paradigm and Loss Construction

Because ground-truth camera information is unavailable, FreeStreamGS relies on pretrained teacher models for both focal length (DIR-Head initialization with a distillation loss) and geometry (Depth Anything V3 pseudo-ground-truth with alignment-invariant supervision). The training objective combines:

Intrinsic distillation loss for stable intrinsics recovery.
Scale-shift-invariant geometric loss aligning predicted depth to depth priors.
A photometric reconstruction loss (MSE+SSIM+LPIPS) with novel-view-weighted supervision (NV-Sup) to enforce multi-view consistency and reduce overfitting to context views.
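
The scale-shift-invariant geometric term in the list above can be sketched as follows. This uses the common least-squares alignment formulation (solve for a scale and shift that best map the prediction onto the prior, then penalize the remaining error); whether the paper uses exactly this form, or an L1 residual, is an assumption, and the function name is hypothetical.

```python
import numpy as np

def scale_shift_invariant_loss(pred, target, mask=None):
    """Mean absolute depth error after optimal scale/shift alignment.

    pred, target: arrays of the same shape (predicted depth, pseudo-ground-truth depth).
    mask: optional boolean array marking valid pixels.
    Returns (loss, scale, shift).
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    if mask is None:
        mask = np.ones(pred.shape, dtype=bool)
    else:
        mask = np.asarray(mask, dtype=bool).ravel()
    p, g = pred[mask], target[mask]
    # Closed-form least squares for s, t minimizing ||s * p + t - g||^2.
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    loss = np.mean(np.abs(s * p + t - g))
    return float(loss), float(s), float(t)
```

Because the alignment is solved per image, a prediction that differs from the prior only by a global scale and offset incurs no penalty, so the depth prior supervises relative structure without fixing the absolute scale.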

This strictly causal objective enables online, temporally stable 3DGS with high rendering fidelity.

Experimental Evaluation

Rendering and Latency Performance

Quantitative and qualitative comparison against state-of-the-art offline and online (optimization-based) 3DGS demonstrates the efficacy of FreeStreamGS:

Sparse Input Setting: With 5 input views, FreeStreamGS achieves top or near-top PSNR, SSIM, and LPIPS on both the DL3DV (ling2024dl3dv) and RealEstate10K (zhou2018stereo) benchmarks, matching or outperforming offline approaches.
Dense Input Scalability: Unlike offline methods that suffer Out-Of-Memory failure at 64 input views, FreeStreamGS scales with stable performance due to its recursive online fusion strategy.
Latency: Per-frame inference time (250 ms, RTX 5880 Ada) undercuts optimization-based online methods (450 ms), matching or surpassing state-of-the-art feed-forward baselines.

Figure 4: FreeStreamGS (ours) produces high-fidelity novel views with sharp spatial features compared to offline and online baselines under sparse 5-view settings.

Generalization and Robustness

Zero-shot generalization is evaluated on the out-of-domain NYU Depth V2 dataset (silberman2012nyuv2) without any additional fine-tuning. FreeStreamGS outperforms or matches the best offline state-of-the-art methods despite a significant shift in appearance and structure, and qualitative results show preserved boundaries and robust reconstruction.

Figure 5: FreeStreamGS maintains cross-dataset generalization on the unseen NYU Depth V2 dataset (silberman2012nyuv2), highlighting robustness to domain shift.

Ablations

Ablation studies on DIR-Head, DPR-Offsets, and NV-Sup validate each architectural innovation:

Without DIR-Head: Catastrophic scale drift and misalignment, confirming the necessity of temporal intrinsics anchoring.
Without DPR-Offsets: Increased ghosting and geometric blur, indicating that rigid per-pixel projection is inadequate in online settings.
Without NV-Sup: Over-smoothing in unseen view synthesis, evidencing overfitting to context views.

Figure 7: Removing DIR-Head or DPR-Offsets degrades spatial alignment and structural integrity across test datasets.

Application: Video Stabilization

FreeStreamGS enables real-time, online full-frame video stabilization. Smoothing the estimated camera trajectory and re-rendering via reconstructed 3DGS effectively suppresses high-frequency temporal jitter, enabling stabilized visualization for long sequences.
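
A causal trajectory smoother of the kind this application needs can be as simple as an exponential moving average over camera centers and orientations. The source does not specify the smoothing filter, so the filter choice, the function name, the `alpha` value, and the (w, x, y, z) quaternion convention below are all assumptions for illustration.

```python
import numpy as np

def smooth_trajectory_causal(positions, quaternions, alpha=0.2):
    """Exponential moving average over camera centers and unit quaternions.

    positions: (N, 3) camera centers; quaternions: (N, 4) in (w, x, y, z) order.
    Only past and current frames are used, so the filter is causal.
    """
    positions = np.asarray(positions, dtype=np.float64)
    quaternions = np.asarray(quaternions, dtype=np.float64)
    smooth_p = np.empty_like(positions)
    smooth_q = np.empty_like(quaternions)
    smooth_p[0] = positions[0]
    smooth_q[0] = quaternions[0] / np.linalg.norm(quaternions[0])
    for i in range(1, len(positions)):
        smooth_p[i] = (1.0 - alpha) * smooth_p[i - 1] + alpha * positions[i]
        q = quaternions[i]
        # q and -q encode the same rotation; keep both in one hemisphere.
        if np.dot(q, smooth_q[i - 1]) < 0.0:
            q = -q
        # Normalized linear blend, adequate for small frame-to-frame rotations.
        blended = (1.0 - alpha) * smooth_q[i - 1] + alpha * q
        smooth_q[i] = blended / np.linalg.norm(blended)
    return smooth_p, smooth_q
```

Each smoothed pose would then be used as the virtual camera for re-rendering the reconstructed Gaussians. A smaller `alpha` suppresses more jitter but makes the virtual camera lag further behind intentional motion, which is the usual trade-off for any causal filter.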

Figure 6: Temporal X-T profiles before and after FreeStreamGS stabilization demonstrate effective causal jitter suppression across 150-frame sequences.

Failure Cases

Typical failure scenarios include transparent/reflective surfaces (depth estimation ambiguity), thin and high-frequency geometric structures (insufficient detail in depth priors and fusion), and highly dynamic scenes (lack of explicit temporal segmentation).

Figure 9: Failure cases on transparent structures and thin elements, indicating current limitations in geometric prior modeling and fusion robustness.

Theoretical and Practical Implications

FreeStreamGS advances the state of online feed-forward 3DGS reconstruction by restoring geometric stability previously reliant on global offline context or iterative optimization. Its design demonstrates that with decoupled temporal supervision and dynamic geometric flexibility, high-fidelity, scalable 3DGS is attainable under strict causal constraints. This paradigm shift enables practical deployment for real-time robotics, AR/VR, and live scene understanding, with direct implications for further reducing computational overhead and extending scene scale.

Theoretically, the study redefines the geometric requirements for reliable online 3D representation, emphasizing the importance of proper temporally anchored camera calibration and adaptive geometric placement beyond rigid projective mappings.

Future Directions

While FreeStreamGS establishes a strong foundation for online 3DGS, persistent limitations include scaling to very large or highly dynamic scenes, and the difficulty in modeling complex light transport phenomena (transparency, reflection). Future research will likely explore:

Explicit dynamic scene segmentation and representation.
Memory-efficient fusion and pruning to enable persistent lifelong 3D mapping.
Integrating physically based priors or augmenting depth/geometry estimation with additional sensory cues.
Online self-supervised or uncertainty-calibrated geometry learning for uncontrolled, dynamic environments.

Conclusion

FreeStreamGS offers a technically robust solution for online 3DGS reconstruction from unposed streaming image inputs, effectively countering the cumulative geometric drift endemic to the online causal setting via principled temporal and geometric decoupling. Experimental results demonstrate competitive reconstruction fidelity, low-latency operation, and strong cross-domain generalization, providing a viable route for scalable, real-time scene understanding and novel view synthesis in streaming, unconstrained scenarios.