
3D Block Inpainting Techniques

Updated 27 October 2025
  • 3D block inpainting is the process of restoring missing regions in 3D models by leveraging intrinsic geometry, graph-based methods, and generative diffusion techniques.
  • Key methodologies include variational energy minimization on conformal maps, quadratic optimization in graph domains, and multi-view, attention-guided inpainting strategies.
  • Applications span digital restoration, AR/VR content creation, and interactive 3D editing while addressing challenges like high-frequency detail recovery and view-consistency.

3D block inpainting refers to the process of filling in missing, occluded, or deliberately removed 3D regions—referred to as “blocks” or “holes”—in surfaces, point clouds, or scene representations. The goal is to restore or complete 3D geometry and appearance such that the result is geometrically plausible, photorealistic, and consistent across multiple views or modalities. The term arises in various contexts including shape completion, digital restoration, scene editing, and immersive content creation, with the methodological details and representations driven by the nature of the input data (meshes, point clouds, multi-view images, volumetric grids, implicit fields) and the intended application.

1. Intrinsic Geometry Approaches

Several early works in 3D block inpainting leveraged intrinsic geometry, emphasizing the differential structure of surfaces. A representative example is the conformal approach for surface inpainting on Riemann surfaces, where the geometry is encoded via the conformal factor $\lambda$ and the mean curvature $H$ (Lui et al., 2012). Here, the 3D surface inpainting problem is reduced to an image inpainting task on the conformal parameter domain for the computed $\lambda$ and $H$ maps. Specifically, the missing regions correspond to domains $\Omega$ where $\lambda$ and $H$ are incomplete or non-smooth. A variational energy, e.g.

$$E(\lambda) = \int_{\Omega} |\Delta \lambda|^2,$$

is minimized subject to the known values on the boundary. Once $\lambda$ and $H$ are inpainted over the domain, the 3D surface is reconstructed by solving the Gauss–Codazzi system with these intrinsic quantities.

This method guarantees that the inpainted regions are consistent with the underlying surface’s conformal structure and preserves critical geometric features (such as sharp edges or curvature extrema), although it requires accurate parameterization and is hindered by excessively large or highly irregular holes.
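To make the variational step concrete, the following NumPy/SciPy sketch inpaints a scalar map such as $\lambda$ by minimizing the discrete analogue of $E(\lambda)$ (a squared graph Laplacian) with known pixels held fixed. This is an illustrative reconstruction, not the authors' implementation.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def inpaint_biharmonic(values, mask):
    """Fill masked entries of a 2D map (e.g., the conformal factor)
    by minimizing sum |Laplacian|^2 with known values held fixed."""
    h, w = values.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)

    # Assemble a 5-point discrete Laplacian (shorter stencils at borders).
    L = sp.lil_matrix((n, n))
    for i in range(h):
        for j in range(w):
            k = idx[i, j]
            nbrs = [idx[x, y] for x, y in
                    ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= x < h and 0 <= y < w]
            L[k, k] = -len(nbrs)
            for m in nbrs:
                L[k, m] = 1.0
    L = L.tocsr()
    B = (L.T @ L).tocsr()                  # biharmonic operator L^T L

    free = np.flatnonzero(mask.ravel())    # missing entries
    fixed = np.flatnonzero(~mask.ravel())  # known entries
    x = values.ravel().astype(float).copy()
    # Normal equations restricted to the hole: B_ff x_f = -B_fk x_k
    rhs = -(B[free][:, fixed] @ x[fixed])
    x[free] = spla.spsolve(B[free][:, free].tocsc(), rhs)
    return x.reshape(h, w)
```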

2. Graph- and Signal-Based Methods

For non-parametric representations such as point clouds (especially dynamic or unstructured ones), inpainting can be formulated using graph signal processing (Fu et al., 2019). In this context, a point cloud is regarded as a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, W)$, with each point a vertex and edges reflecting spatial or attribute proximity. The missing 3D block (or cube) is treated as a signal on an incomplete domain, and self-similar regions both within the current frame (intra-frame) and temporally adjacent frames (inter-frame) serve as references.

The block inpainting problem is thereby cast as a quadratic optimization:

$$\min_{c_r}\; \| \overline{\Omega} c_r - \overline{\Omega} c_t \|_2^2 + \alpha \| \Omega c_r - \Omega \hat{c}_s \|_2^2 + \gamma\, c_r^T L c_r + \beta \| c_r - W_{f-1} \hat{c}_t^{(f-1)} \|_2^2 + \beta \| c_r - W_{f+1} \hat{c}_t^{(f+1)} \|_2^2,$$

where $L$ is the graph Laplacian, $\Omega$ selects missing points, and the solution is closed-form. This framework exploits both spatial smoothness and spatiotemporal consistency, leading to reductions in artifacts and improved geometric fidelity, particularly in dynamic contexts.
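A single-frame sketch of this formulation, with the temporal warping terms $W_{f\pm1}$ omitted for brevity (all names here are our own), reduces to solving one sparse linear system over a k-NN graph:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
from scipy.spatial import cKDTree

def graph_inpaint(points, signal, missing, k=8, gamma=0.1):
    """Solve min ||S(c - c_t)||^2 + gamma * c^T L c, where S selects
    known points and L is the graph Laplacian of a k-NN graph."""
    n = len(points)
    dist, nbr = cKDTree(points).query(points, k=k + 1)  # column 0 is self
    sigma = dist[:, 1:].mean() + 1e-12
    rows = np.repeat(np.arange(n), k)
    cols = nbr[:, 1:].ravel()
    w = np.exp(-(dist[:, 1:].ravel() / sigma) ** 2)
    W = sp.coo_matrix((w, (rows, cols)), shape=(n, n))
    W = 0.5 * (W + W.T)                      # symmetrize edge weights
    L = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W

    S = sp.diags((~missing).astype(float))   # data term on known points only
    A = (S + gamma * L).tocsc()
    return spla.spsolve(A, S @ signal)
```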

3. Decomposition and Frequency Domain Transformations

A robust strategy for irregular, non-manifold data such as terrain point clouds is to decompose the signal into low- and high-frequency components (Xie et al., 4 Apr 2024). The smooth background is modelled as a B-spline surface $L$ (minimizing the sum of squared Euclidean distances to all points), while residual details (relative heights) $H$ are defined as signed distances from each point to $L$. The 3D block inpainting task is then split: (1) fit $L$ via iterative least-squares and Newton updates, and (2) rasterize $H$ onto a 2D parameter domain, locate missing regions by coverage, and solve a Poisson inpainting equation (using patch-match–guided gradients).

This decomposition allows inpainting despite ill-defined or complex block boundaries and retains both global undulations and fine local details—outperforming mesh-based and deep-learned methods sensitive to boundary precision.
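The Poisson step admits a compact sketch. Assuming the rasterized hole does not touch the raster border and that `gx`, `gy` hold the patch-match–guided target gradients (our illustrative names, not the paper's code), the interior heights solve a standard 5-point Poisson system with Dirichlet boundary values taken from the known height map:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def poisson_inpaint(height, mask, gx, gy):
    """Solve div(grad h) = div(g) inside the hole (mask == True),
    with Dirichlet boundary values from the known height map."""
    ys, xs = np.nonzero(mask)
    idx = -np.ones(height.shape, int)
    idx[ys, xs] = np.arange(len(ys))
    A = sp.lil_matrix((len(ys), len(ys)))
    b = np.zeros(len(ys))
    for p, (i, j) in enumerate(zip(ys, xs)):
        A[p, p] = 4.0
        # -div(g) at (i, j), with g stored as forward differences
        b[p] = -(gx[i, j] - gx[i, j - 1] + gy[i, j] - gy[i - 1, j])
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            y, x = i + di, j + dj
            if mask[y, x]:
                A[p, idx[y, x]] = -1.0
            else:
                b[p] += height[y, x]        # known Dirichlet neighbor
    out = height.copy()
    out[ys, xs] = spla.spsolve(A.tocsc(), b)
    return out
```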

4. Multi-View and Generative Diffusion-Based Frameworks

State-of-the-art 3D block inpainting leverages multi-view consistency and deep generative priors. Modern pipelines, such as those based on 3D Gaussian Splatting (3DGS), NeRFs, and implicit fields, achieve high photorealism and geometric alignment by integrating inpainting into learned 3D scene representations.

a) Diffusion Priors Propagated into 3D

Frameworks like Inpaint3D (Prabhu et al., 2023) distill the generative prior of a 2D diffusion inpainting model into a 3D NeRF via Score Distillation Sampling (SDS) guided by masked multi-view images and robust depth prediction. During optimization, the rendered NeRF is constrained to match the distribution of plausible outputs from the 2D inpainting network in the masked region, while unmasked regions are supervised for photometric and depth consistency. The joint loss can be summarized as

$$\mathcal{L}_{\text{total}} = w_1 \mathcal{L}_{\text{recon,unmasked}} + w_2 \mathcal{L}_{\text{SDS,masked}} + w_3 \mathcal{L}_{\text{distortion}} + w_4 \mathcal{L}_{\text{interlevel}} + w_5 \mathcal{L}_{\text{depth smoothness}},$$

where each term governs a different aspect of geometric and appearance fidelity across views.
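The masked SDS term can be sketched schematically in PyTorch. Here `encode`, `unet`, `noise_sched`, and `y_emb` are hypothetical stand-ins for a latent encoder, a frozen 2D inpainting U-Net, its noise schedule, and a conditioning embedding; this illustrates the general SDS mechanism, not the Inpaint3D code.

```python
import torch

def sds_masked_loss(render, mask, encode, unet, noise_sched, y_emb):
    """Score Distillation Sampling on the masked region: noise the
    rendered view, query a frozen inpainting prior, and feed the
    (stop-gradient) residual back to the 3D representation."""
    z = encode(render)                                 # latent of the render
    t = torch.randint(20, 980, (1,), device=z.device)  # random timestep
    eps = torch.randn_like(z)
    a = noise_sched.alphas_cumprod[t].view(-1, 1, 1, 1)
    z_t = a.sqrt() * z + (1 - a).sqrt() * eps          # forward diffusion
    with torch.no_grad():
        eps_hat = unet(z_t, t, y_emb, mask)            # frozen 2D prior
    grad = (1 - a) * (eps_hat - eps) * mask            # restrict to the hole
    # Surrogate whose gradient w.r.t. z equals `grad` (U-Net not differentiated)
    return (grad.detach() * z).sum()
```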

b) Multiview Inpainting with Explicit Consistency Priors

Approaches such as NeRFiller (Weber et al., 2023) and Instant3Dit (Barda et al., 30 Nov 2024) exploit the “grid prior” of Stable Diffusion–like models: arranging multiple (e.g., four) views in a tiled grid and jointly inpainting enables the model’s receptive field to enforce cross-view consistency. This is generalized in NeRFiller by joint multi-view inpainting (averaging noise predictions over many random grid groupings) to distill consistency into the entire dataset during iterative updates.
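The grid trick itself is a pure tensor rearrangement; a minimal sketch (assuming four equally sized views) is:

```python
import torch

def tile_views(views):
    """Arrange four (C, H, W) views into a 2x2 grid so a 2D inpainting
    model denoises them jointly, enforcing cross-view consistency."""
    top = torch.cat([views[0], views[1]], dim=-1)
    bottom = torch.cat([views[2], views[3]], dim=-1)
    return torch.cat([top, bottom], dim=-2)

def untile_views(grid, h, w):
    """Inverse of tile_views: recover the four views after denoising."""
    return torch.stack([grid[..., :h, :w], grid[..., :h, w:],
                        grid[..., h:, :w], grid[..., h:, w:]])
```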

In Instant3Dit, three types of masks (coarse, mesh sculpting, surface) are employed, matching likely editing behaviors. A single feedforward inference pass enables inpainting in seconds, in contrast to optimization-driven SDS methods.

c) Geometry- and Attention-Guided Propagation

Recent works leverage geometry-aware fusion and attention-based propagation to further address inconsistencies. Geometry-aware diffusion models (Salimi et al., 18 Feb 2025) condition the diffusion process on explicit geometric cues (depth, mesh masks), fusing appearance and geometry in latent space and supporting few-view inpainting. DiGA3D combines multiple reference views (via K-means clustering), attention feature propagation (AFP) across clusters, and a texture-geometry SDS loss for robust, fine-level consistency and geometric fidelity (Pan et al., 1 Jul 2025).

The AFP module weights attention from reference to non-reference views, while the TG-SDS loss leverages warped Canny edges and depth maps as conditioning, with the full gradient expressed as

$$\nabla_\theta \mathcal{L}_{\text{TG-SDS}} = \mathbb{E}_{t,\epsilon} \left[ w(t)\, \bigl(\epsilon_\phi(I_{i,t}; m_i, y, t, C'_i, D'_i) - \epsilon_i\bigr)\, \frac{\partial I_i}{\partial \theta} \right].$$
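Reference-view selection by clustering can be sketched simply; the following is our illustration in the spirit of DiGA3D's K-means step, picking the camera nearest each centroid as that cluster's reference:

```python
import numpy as np
from sklearn.cluster import KMeans

def pick_reference_views(cam_centers, k=4):
    """Cluster camera centers and return, per cluster, the index of
    the view closest to the centroid, plus the cluster labels."""
    km = KMeans(n_clusters=k, n_init=10).fit(cam_centers)
    refs = [int(np.argmin(np.linalg.norm(cam_centers - c, axis=1)))
            for c in km.cluster_centers_]
    return refs, km.labels_
```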

5. Reference- and Depth-Guided Explicit Inpainting

Reference-based approaches use a high-quality inpainted reference view to guide completion in all other views and align 3D geometry accordingly. RePaintGS (Seo et al., 11 Jul 2025) weights each inpainted view by its LPIPS similarity to the reference after warping, using this confidence to guide the optimization and Poisson blending of novel (reference-guided) structures into the global 3DGS scene. SplatFill (Dahaghin et al., 9 Sep 2025) and “High-fidelity 3D Gaussian Inpainting” (Zhou et al., 24 Jul 2025) employ depth-guided Gaussian placement, object-aware contrastive loss for feature consistency, and iterative refinement focused on consistency-aware problematic regions (using error gradient analysis and local re-inpainting).

A common mathematical structure is the alpha-blending rendering equation

$$C(p) = \sum_{i} c_i\, \alpha'_i \prod_{j=1}^{i-1} (1 - \alpha'_j),$$

together with region-wise weighted uncertainty and object-aware regularization to balance global smoothness and local detail.
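For a single ray with depth-sorted samples, the rendering equation above corresponds to front-to-back compositing:

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha blending: colors (N, 3), alphas (N,),
    samples sorted near to far; returns the blended pixel color."""
    T = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # transmittance
    weights = alphas * T
    return (weights[:, None] * colors).sum(axis=0)
```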

6. Mask Generation and Localization Strategies

Accurate inpainting is contingent on precise localization of “holes” or blocks in 3D. Mask strategies have evolved from global object removal masks—prone to over-inpainting—to geometry-driven methods and block-wise mask refinement. Examples include:

  • In IMFine (Shi et al., 6 Mar 2025), dilated object masks are mapped to a pruned 3D scene, optimized (with a learnable per-Gaussian parameter) to minimize mask overlap error, and then refined with segmentation models to ensure that only never-before-seen regions are inpainted.
  • High-fidelity 3D Gaussian Inpainting (Zhou et al., 24 Jul 2025) implements Gaussian scene filtering: projecting each Gaussian kernel across key views and removing those lying within any mask, followed by repeated local smoothing and expansion to extract accurate inpainting regions (a minimal sketch of this projection-and-filter step is shown below).

These strategies improve output quality in unconstrained scenes, facilitate more efficient inpainting, and reduce unnecessary synthesis in already-visible regions.
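As promised above, here is a minimal sketch of the projection-and-filter step, assuming 4x4 world-to-camera matrices and a shared 3x3 intrinsics matrix (our illustration, not the paper's code):

```python
import numpy as np

def filter_gaussians(means, masks, K, poses):
    """Drop Gaussians whose centers project inside any 2D inpainting
    mask across the key views. means: (N, 3); masks: list of (H, W)
    binary arrays; poses: list of 4x4 world-to-camera matrices."""
    keep = np.ones(len(means), bool)
    pts_h = np.concatenate([means, np.ones((len(means), 1))], axis=1)
    for mask, P in zip(masks, poses):
        cam = (P @ pts_h.T).T[:, :3]           # points in camera frame
        uvz = (K @ cam.T).T                    # perspective projection
        z = np.clip(uvz[:, 2], 1e-9, None)
        u = np.round(uvz[:, 0] / z).astype(int)
        v = np.round(uvz[:, 1] / z).astype(int)
        h, w = mask.shape
        vis = (cam[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(means), bool)
        hit[vis] = mask[v[vis], u[vis]] > 0    # center falls inside a mask
        keep &= ~hit
    return keep
```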

7. Performance Metrics, Empirical Validation, and Practical Applications

Performance is rigorously evaluated both quantitatively and qualitatively:

| Metric | Description | Example Results |
| --- | --- | --- |
| PSNR, SSIM | Pixel- or patch-wise photometric fidelity | ObjFiller-3D achieves PSNR 26.6 vs. NeRFiller's 15.9 (Feng et al., 25 Aug 2025) |
| LPIPS, FID | Perceptual/semantic consistency | SplatFill shows LPIPS < 0.20 and improved FID over GScream (Dahaghin et al., 9 Sep 2025) |
| LoFTR matches | Multi-view geometric matching | DiGA3D reports increased novel-view feature correspondences (Pan et al., 1 Jul 2025) |
| Runtime | Time per edit | Instant3Dit ≈ 3 s; InstaInpaint ≈ 0.4 s (Barda et al., 30 Nov 2024; You et al., 12 Jun 2025) |
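For reference, the PSNR figures above follow the standard definition; a one-liner suffices:

```python
import numpy as np

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio between images with values in [0, peak]."""
    mse = np.mean((a - b) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```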

These techniques are now integral in computer graphics (interactive 3D editing, asset creation), privacy filtering, data augmentation, digital heritage restoration, robotic perception, AR/VR, and real-time scene editing tasks. The increasingly modular, explicit, and geometry-aware designs enable interactive rates, generalization to unconstrained views, and robust downstream integration.

8. Ongoing Challenges and Future Directions

Despite substantial progress, open technical challenges persist in 3D block inpainting:

  • High-frequency detail preservation and view-consistent sharpness, especially far from training views, remain elusive for purely grid/diffusion-driven or unregularized frameworks (Weber et al., 2023, Barda et al., 30 Nov 2024).
  • Reliance on accurate depth estimation or camera calibration is a limitation, especially in scenes with ambiguous or weakly observable geometry (Zhao et al., 2022, Seo et al., 11 Jul 2025).
  • Some failure cases persist in scenes with severe occlusion, highly non-Lambertian appearance, or background ambiguities.
  • Efficient adaptation to dynamic, real-time, or few-shot contexts—especially for unconstrained, open-world scenarios—remains under-explored (Shi et al., 6 Mar 2025, Cao et al., 15 Aug 2024).
  • The trade-off between global multi-view consistency and preservation of local texture variance is still an open problem, with techniques such as region-wise uncertainty weighting and attention-based feature grouping proposed as partial solutions (Zhou et al., 24 Jul 2025, Cao et al., 15 Aug 2024, Pan et al., 1 Jul 2025).

Future work is expected to focus on more sophisticated geometry and attention-driven regularization, scalable fusion of multi-modal cues, more accurate semantic/instance mask adaptation, adaptation for non-static or multi-object scenes, and seamless integration into real-time VR/AR pipelines.


In summary, 3D block inpainting has evolved from intrinsic geometry-based surface approaches and graph-signal optimization to sophisticated, generative, multi-view and geometry-aware diffusion pipelines. The field encompasses a variety of representations and mask strategies, and is moving toward fully interactive, robust, and high-fidelity 3D scene completion and editing solutions.
