FlowR: Advancements in 3D Reconstructions from Sparse to Dense Viewpoints
The paper "FlowR: Flowing from Sparse to Dense 3D Reconstructions" addresses the persistent challenge of achieving high-quality 3D reconstructions and novel view synthesis (NVS) with sparse input data. Current methods, such as 3D Gaussian splatting (3DGS) and neural radiance fields (NeRF), perform optimally with extensive, dense input view datasets. This presents a labor-intensive and costly process for applications like Virtual Reality (VR), where high fidelity is paramount. The authors propose FlowR, a novel approach integrating multi-view flow matching to enhance reconstruction quality in both sparse and dense scenarios.
Core Contributions
FlowR is characterized by two main components:
- A Robust Initial Reconstruction Pipeline: FlowR combines 3DGS with learned feature matching and tracking to build an initial semi-dense 3D reconstruction, using co-visibility graphs and track triangulation to remain robust across varied input-view distributions.
- A Flow Matching Model: The model improves reconstruction quality by drawing on generated novel views. Rather than starting from noise, it uses flow matching to transport inaccurate renderings of the initial sparse reconstruction toward images consistent with an ideal dense reconstruction (a minimal sketch of this objective follows the list).
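At its core, the flow matching objective regresses the velocity of a straight path between a degraded render and its ground-truth image. Below is a minimal PyTorch sketch of that objective, not the authors' implementation; `model` and `cond` (camera/context conditioning) are placeholder names.

```python
# Minimal sketch of a conditional flow matching objective, assuming an
# image-to-image coupling (degraded render -> ground truth); not the
# authors' code. `model` and `cond` are placeholders.
import torch

def flow_matching_loss(model, x_render, x_gt, cond):
    """x_render is the inaccurate render (x0); x_gt the ground truth (x1).

    The network regresses the constant velocity (x1 - x0) of the straight
    interpolation path x_t = (1 - t) * x0 + t * x1.
    """
    b = x_render.shape[0]
    t = torch.rand(b, device=x_render.device).view(b, 1, 1, 1)  # per-sample time
    x_t = (1.0 - t) * x_render + t * x_gt                       # point on the path
    v_target = x_gt - x_render                                  # path velocity
    v_pred = model(x_t, t.flatten(), cond)                      # predicted velocity
    return torch.mean((v_pred - v_target) ** 2)
```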
Methodology
FlowR begins with a state-of-the-art initialization pipeline that selects keyframes and combines learning-based matching with traditional structure-from-motion. This minimizes redundancy and keeps the pipeline scalable. Key to the approach is exploiting multi-view constraints, which strengthens the feature matching and triangulation steps.
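Track triangulation in such pipelines typically follows the standard direct linear transform (DLT): every 2D observation of a track contributes two linear constraints on the 3D point, solved in least squares via an SVD. The sketch below shows this textbook routine; the paper's exact pipeline may differ in details such as robust weighting or outlier filtering.

```python
# Standard multi-view DLT triangulation of one feature track; illustrative,
# not the paper's exact implementation.
import numpy as np

def triangulate_track(projections, points_2d):
    """projections: 3x4 camera matrices P_i = K_i [R_i | t_i];
    points_2d: matching pixel observations (u_i, v_i) of one track.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])  # two constraints per observation
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)      # null-space direction of A
    X = vt[-1]
    return X[:3] / X[3]              # homogeneous -> Euclidean
```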
The paper introduces a dataset of 3.6 million image pairs compiled from 10,300 sequences, drawing on large-scale sources such as DL3DV10K and ScanNet++. Each pair couples an inaccurate render of an initial reconstruction with its densely reconstructed ground truth, and the flow matching network, built on a multi-view diffusion transformer architecture, is trained to map the former to the latter.
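The summary above implies pairs of (inaccurate render, ground-truth image); a natural way to build them is to fit a reconstruction on a sparse subset of each sequence and render it at held-out poses, pairing each render with the captured image. The sketch below assumes this construction; `fit_and_render` is a hypothetical stand-in for a 3DGS fit plus rasterization, not an API from the paper.

```python
# Hedged sketch of training-pair construction under the assumption described
# above; fit_and_render is a hypothetical callable, not the authors' API.
def build_training_pairs(images, poses, n_input, fit_and_render):
    """Fit on a sparse subset of a sequence, render the held-out poses.

    fit_and_render(input_images, input_poses, target_poses) -> renders.
    Renders from too few inputs are the inaccurate x0; the held-out
    captured images are the dense-quality ground truth x1.
    """
    input_images, targets = images[:n_input], images[n_input:]
    input_poses, target_poses = poses[:n_input], poses[n_input:]
    renders = fit_and_render(input_images, input_poses, target_poses)
    return list(zip(renders, targets))  # (x0, x1) pairs for flow matching
```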
Results
The authors present compelling empirical results across several benchmarks, including DL3DV140, ScanNet++, and Nerfbusters. FlowR consistently outperforms existing methods in both sparse-view and dense-view setups, improving PSNR and SSIM while lowering LPIPS. Of particular significance is the method's ability to improve perceptual quality, as measured by LPIPS, even under heavily constrained inputs.
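For orientation, PSNR and SSIM are fidelity metrics where higher is better, while LPIPS is a learned perceptual distance where lower is better. The sketch below shows a typical evaluation using skimage and the lpips package; it is not the paper's exact protocol (resolutions, crops, and color handling may differ).

```python
# Typical NVS metric computation; illustrative, not the paper's protocol.
import numpy as np
import torch
import lpips
from skimage.metrics import structural_similarity

lpips_fn = lpips.LPIPS(net="alex")  # learned perceptual metric, lower is better

def psnr(pred, gt, max_val=1.0):
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(max_val**2 / mse)  # higher is better

def evaluate(pred, gt):
    """pred, gt: HxWx3 float arrays in [0, 1]."""
    ssim = structural_similarity(pred, gt, channel_axis=-1, data_range=1.0)
    # lpips expects NCHW tensors scaled to [-1, 1]
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    lp = lpips_fn(to_t(pred), to_t(gt)).item()
    return {"PSNR": psnr(pred, gt), "SSIM": ssim, "LPIPS": lp}
```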
In dense-view settings, camera conditioning and joint multi-view modeling let FlowR retain high fidelity across novel views, avoiding the inconsistencies of previous generative approaches. Starting the flow from the initial render rather than from Gaussian noise proves decisive, yielding markedly cleaner, artifact-free renderings.
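Concretely, inference integrates the learned velocity field from the initial render at t = 0 toward a refined image at t = 1, for example with a few Euler steps. A minimal sketch, with `model` again a placeholder for the trained network:

```python
# Minimal Euler integration of the learned velocity field at inference time,
# starting from the initial render instead of Gaussian noise; `model` is a
# placeholder for the trained network.
import torch

@torch.no_grad()
def refine_render(model, x_render, cond, n_steps=20):
    """Transport the degraded render (t = 0) to a refined image (t = 1)."""
    x = x_render.clone()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * model(x, t, cond)  # Euler step along predicted velocity
    return x.clamp(0.0, 1.0)
```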
Implications and Future Work
FlowR's pathway from sparse to dense reconstruction reduces the exhaustive data-acquisition effort these applications traditionally demanded. Its scalability and performance in challenging, constrained-view scenarios promise a broad range of uses, from VR to real-time simulation environments where data scarcity is an issue.
Further research could integrate uncertainty quantification and active view selection to decide which camera poses to target during refinement. Additionally, extending flow matching to fully unseen regions could leverage generative models to infer plausible content in areas never observed.
In summary, FlowR presents a well-engineered combination of machine learning and 3D domain modeling, strengthening the bridge between sparse inputs and the high-quality 3D reconstructions required by advanced visual computing applications.