TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks (2408.10739v1)

Published 20 Aug 2024 in cs.CV

Abstract: Neural radiance fields (NeRFs) generally require many images with accurate poses for accurate novel view synthesis, which does not reflect realistic setups where views can be sparse and poses can be noisy. Previous solutions for learning NeRFs with sparse views and noisy poses only consider local geometry consistency with pairs of views. Closely following bundle adjustment in Structure-from-Motion (SfM), we introduce TrackNeRF for more globally consistent geometry reconstruction and more accurate pose optimization. TrackNeRF introduces feature tracks, i.e., connected pixel trajectories across all visible views that correspond to the same 3D points. By enforcing reprojection consistency among feature tracks, TrackNeRF encourages holistic 3D consistency explicitly. Through extensive experiments, TrackNeRF sets a new benchmark in noisy and sparse view reconstruction. In particular, TrackNeRF shows significant improvements over the state-of-the-art BARF and SPARF by ~8 and ~1 in terms of PSNR on DTU under various sparse and noisy view setups. The code is available at https://tracknerf.github.io/.

Summary

The paper introduces feature tracks as a novel method to enforce global 3D consistency in NeRF reconstruction.
It achieves approximately a 50% reduction in pose errors compared to state-of-the-art methods like SPARF and BARF.
The approach demonstrates robust performance on challenging DTU and LLFF datasets, enabling high-quality view synthesis from sparse, noisy views.

The paper "TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks" proposes a method that lets Neural Radiance Fields (NeRF) work effectively with sparse views and noisy camera poses. NeRF, a prominent technique for novel view synthesis, generally requires many images with accurate poses to achieve high accuracy and photorealism. In practice, acquiring such data is often infeasible, so this paper focuses on the more realistic setup of sparse and noisy views.

Core Contributions

TrackNeRF leverages feature tracks, i.e., connected pixel trajectories across all visible views that correspond to the same 3D points, to enforce reprojection consistency. This approach yields significant improvements over state-of-the-art methods that address similar challenges, such as BARF and SPARF.

Feature Track Utilization: The central innovation in TrackNeRF is the incorporation of feature tracks for enforcing more holistic 3D consistency. Unlike previous methods utilizing pairwise reprojection constraints, TrackNeRF employs a global reprojection loss by connecting pixel trajectories across all views, reminiscent of bundle adjustment (BA) used in Structure-from-Motion (SfM).
Enhanced Pose Optimization: By enforcing global geometry consistency through feature tracks, TrackNeRF improves pose estimation accuracy. The experiments show roughly 50% lower pose errors than SPARF on various benchmarks.
Robustness and Convergence: TrackNeRF is robust to higher levels of pose noise and converges more efficiently than previous approaches. It still reconstructs geometry and camera poses under heavy noise, a regime in which SPARF fails.
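The difference between pairwise constraints and feature tracks comes down to how matches are grouped. A standard SfM recipe, sketched below under the assumption that pairwise keypoint matches are already available, treats every (view, keypoint) pair as a node, merges matched nodes with a union-find structure, and reads each connected component off as one track. The match format, the hypothetical `build_tracks` helper, and the rule that discards tracks visiting the same view twice are illustrative assumptions, not TrackNeRF's released code.

```python
from collections import defaultdict

# A node is (view_id, keypoint_id); a match links two nodes in different views.
Node = tuple[int, int]


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self) -> None:
        self.parent: dict[Node, Node] = {}
        self.size: dict[Node, int] = {}

    def find(self, x: Node) -> Node:
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: Node, b: Node) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def build_tracks(matches: list[tuple[Node, Node]], min_len: int = 2) -> list[list[Node]]:
    """Merge pairwise matches into multi-view feature tracks (hypothetical helper).

    A track that contains two keypoints from the same view is discarded, since
    one 3D point cannot project to two different pixels of a single image.
    """
    uf = UnionFind()
    for a, b in matches:
        uf.union(a, b)
    groups: dict[Node, list[Node]] = defaultdict(list)
    for node in uf.parent:
        groups[uf.find(node)].append(node)
    tracks = []
    for nodes in groups.values():
        views = [view for view, _ in nodes]
        if len(nodes) >= min_len and len(set(views)) == len(views):
            tracks.append(sorted(nodes))
    return sorted(tracks)


matches = [
    ((0, 5), (1, 7)),  # view 0 kp 5 <-> view 1 kp 7
    ((1, 7), (2, 3)),  # view 1 kp 7 <-> view 2 kp 3: chains into a 3-view track
    ((0, 9), (2, 4)),  # an independent 2-view track
]
print(build_tracks(matches))  # -> [[(0, 5), (1, 7), (2, 3)], [(0, 9), (2, 4)]]
```

Because a track links every view that observes the point, a loss defined on it couples all of those views at once, whereas a pairwise loss only ever sees two of them.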

Experimental Results

Through extensive experiments on datasets such as DTU and LLFF, TrackNeRF establishes new benchmarks:

DTU with Noisy Views: TrackNeRF surpasses competing methods in PSNR, SSIM, and depth error metrics. For instance, on DTU under the 3-view setup with noisy poses, TrackNeRF achieves a PSNR about 0.8 dB higher than SPARF and significantly better pose accuracy.
DTU with Ground Truth Poses: Even with accurate camera poses, TrackNeRF achieves superior novel view synthesis quality compared to state-of-the-art methods, including those using diffusion and regularization techniques.
LLFF Dataset: Although performance gains are modest due to the simpler camera motion in LLFF, TrackNeRF still edges out SPARF in PSNR and pose accuracy, demonstrating the method's consistency across different datasets.
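The results above are reported as PSNR gains and pose errors. As a reference for how such metrics are commonly computed, the sketch below implements PSNR for images scaled to [0, 1] and the geodesic rotation error between an estimated and a ground-truth camera rotation. It is a generic illustration under those assumptions, not the paper's evaluation script; in particular, pose benchmarks usually align the estimated cameras to the ground truth (e.g. with a similarity transform) before measuring errors, a step omitted here. The `rot_z` helper is hypothetical and exists only for the example.

```python
import math

Matrix = list[list[float]]


def psnr(pred: list[float], target: list[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio (dB) between two flattened images of equal size."""
    if not pred or len(pred) != len(target):
        raise ValueError("images must be non-empty and equally sized")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def rotation_error_deg(r_est: Matrix, r_gt: Matrix) -> float:
    """Geodesic angle (degrees) between two 3x3 rotation matrices.

    Uses trace(R_est^T R_gt) = 1 + 2 cos(theta); the trace of A^T B equals the
    sum of the element-wise products of A and B.
    """
    tr = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp against round-off
    return math.degrees(math.acos(cos_theta))


def rot_z(deg: float) -> Matrix:
    """Rotation by `deg` degrees about the z axis (hypothetical example helper)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Every pixel off by 0.1 -> MSE = 0.01 -> PSNR = 10 * log10(1 / 0.01) = 20 dB.
print(round(psnr([0.1] * 4, [0.0] * 4), 6))  # -> 20.0
print(round(rotation_error_deg(rot_z(5.0), rot_z(0.0)), 6))  # -> 5.0
```

Because PSNR is logarithmic in the mean squared error, a 1 dB gain corresponds to an MSE about 21% lower (10^-0.1 ≈ 0.794), which helps put gains of roughly 0.8 to 1 dB into perspective.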

Methodological Insights

The paper meticulously details several critical components:

Track Adjustment: This involves extracting and refining feature tracks using keypoint adjustment to optimize multiview consistency, enhancing the reliability of the tracks for subsequent NeRF training.
Track Reprojection Loss: By minimizing the reprojection error across all views for each feature track, this loss ensures that corresponding pixels maintain alignment with the same 3D landmarks across all views, promoting global geometric consistency.
Depth Regularization: To counteract depth ambiguities and floaters, a depth regularization loss aligns depth gradients with image gradients, thereby improving the internal geometry representation within the NeRF.
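The track reprojection loss and the depth regularizer can be made concrete with a small pinhole-camera sketch. It assumes one plausible formulation of the track loss: each keypoint in a track is back-projected with the depth the NeRF renders at that pixel (here simply passed in as a number), reprojected into every other view of the same track, and compared with the keypoint observed there; the loss is the mean pixel distance over all ordered view pairs. The depth term is written as the edge-aware smoothness penalty common in self-supervised depth estimation, which discourages depth changes except where the image itself has strong gradients. The `Camera` class, the function names, and both formulations are illustrative assumptions; the paper's exact weighting and robust-loss choices may differ.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]
Matrix = list[list[float]]


@dataclass
class Camera:
    """Pinhole camera with world-to-camera transform x_cam = R X + t (hypothetical helper)."""
    fx: float
    fy: float
    cx: float
    cy: float
    R: Matrix
    t: Vec3

    def backproject(self, u: float, v: float, depth: float) -> Vec3:
        # Pixel plus z-depth -> camera coordinates.
        xc = ((u - self.cx) / self.fx * depth, (v - self.cy) / self.fy * depth, depth)
        d = [xc[k] - self.t[k] for k in range(3)]
        # Camera -> world: X = R^T (x_cam - t).
        return tuple(sum(self.R[k][i] * d[k] for k in range(3)) for i in range(3))

    def project(self, X: Vec3) -> tuple[float, float]:
        # World -> camera -> pixel; assumes the point lies in front of the camera.
        x, y, z = (sum(self.R[i][k] * X[k] for k in range(3)) + self.t[i] for i in range(3))
        return self.fx * x / z + self.cx, self.fy * y / z + self.cy


def track_reprojection_loss(track: list[tuple[int, float, float]],
                            depths: list[float], cams: list[Camera]) -> float:
    """Mean pixel error over all ordered view pairs of one track.

    track[n] = (view_id, u, v); depths[n] = depth rendered at that keypoint.
    """
    errors = []
    for a, (view_a, ua, va) in enumerate(track):
        X = cams[view_a].backproject(ua, va, depths[a])
        for b, (view_b, ub, vb) in enumerate(track):
            if a != b:
                pu, pv = cams[view_b].project(X)
                errors.append(math.hypot(pu - ub, pv - vb))
    return sum(errors) / len(errors) if errors else 0.0


def edge_aware_smoothness(depth: Matrix, gray: Matrix) -> float:
    """Mean |depth gradient| weighted by exp(-|image gradient|) over right/down neighbours."""
    h, w = len(depth), len(depth[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):
                y2, x2 = y + dy, x + dx
                if y2 < h and x2 < w:
                    d_grad = abs(depth[y2][x2] - depth[y][x])
                    i_grad = abs(gray[y2][x2] - gray[y][x])
                    total += d_grad * math.exp(-i_grad)
                    count += 1
    return total / count if count else 0.0


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cams = [Camera(100, 100, 50, 50, I3, (0.0, 0.0, 0.0)),
        Camera(100, 100, 50, 50, I3, (-1.0, 0.0, 0.0))]  # second camera centred at x = +1
# The world point (0.5, 0.2, 4.0) projects to (62.5, 55) in view 0 and (37.5, 55) in view 1.
track = [(0, 62.5, 55.0), (1, 37.5, 55.0)]
print(round(track_reprojection_loss(track, [4.0, 4.0], cams), 6))  # -> 0.0
print(round(track_reprojection_loss(track, [4.4, 4.0], cams), 3))  # -> 1.136
```

Setting one depth 10% too large leaves the reverse direction exact but shifts the forward reprojection by about 2.27 px, so the mean over the two ordered pairs is about 1.14 px. In a longer track, the same wrong depth would be penalized against every other observation, which is the global coupling the paper attributes to feature tracks.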

Implications and Future Work

TrackNeRF's methodological advancements in handling sparse and noisy views significantly enhance the practical applicability of NeRF-based systems in real-world scenarios. The implications are twofold:

Practical Applications: In domains such as AR/VR and 3D reconstruction where data acquisition is often constrained, TrackNeRF offers a viable solution for achieving high-quality view synthesis with minimal and imprecise data.
Research Directions: Future work could integrate TrackNeRF with more advanced feature matchers or with faster scene representations such as 3D Gaussian Splatting or Instant-NGP to further improve efficiency and robustness.

Conclusion

TrackNeRF represents an important step toward extending Neural Radiance Fields to more practical and challenging scenarios. By adapting ideas from traditional bundle adjustment to NeRF, the paper demonstrates significant improvements in both pose accuracy and synthesized view quality under sparse and noisy conditions. The research lays a strong foundation for further work on high-fidelity 3D scene reconstruction and view synthesis from limited data.
