- The paper presents a novel partial-reference IQA method that bridges patch-similarity and multi-view consistency in diffusion-based novel view synthesis.
- It introduces a two-stage process where partial quality maps are generated and then completed using cross-attention in an encoder-decoder architecture.
- Experiments show that PR-IQA substantially improves sparse-view 3D reconstruction by filtering diffusion artifacts and boosting reconstruction metrics.
Partial-Reference Image Quality Assessment for Diffusion-Based Novel View Synthesis
Sparse-view novel view synthesis (NVS) via diffusion models advances 3D scene reconstruction, but the utility of generated pseudo-ground-truth (GT) images is limited by photometric and geometric artifacts. These errors propagate to 3D Gaussian Splatting (3DGS) when used unfiltered, degrading the final geometry and texture. Existing image quality assessment (IQA) paradigms, namely full-reference (FR) metrics and no-reference (NR) predictors, are inadequate: FR metrics require pose-aligned GTs, generally unavailable in NVS, while NR methods are unable to capture subtle, multi-view-consistent artifacts common in diffusion-generated imagery. Cross-reference (CR) IQA [wang2024crossscore, hermann2025puzzle, asim2025met3r] leverages unaligned reference views but is constrained to overlapping regions and fails to estimate quality outside mutual coverage.
PR-IQA: Methodological Innovations
PR-IQA bridges patch-similarity and multi-view consistency analysis via partial-reference completion. The approach operates in two stages:
- Partial Quality Map Generation: For a query image $I_q$ and reference $I_r$, geometric correspondences are established via VGGT, and DINOv2 features are aligned and compared to generate a cosine-similarity map in overlapping areas only. This represents a locally reliable set of quality scores.
- Quality Map Completion: A three-stream encoder-decoder network ingests the query image, reference image, and partial quality map. Cross-attention injects reference view evidence, propagating quality signals from observed regions to the unseen parts of the query. Decoupling channel attention from spatial attention separates feature selection from spatial propagation, enhancing structure consistency and geometric robustness.
This architecture achieves FR-level accuracy, despite GT absence, by exploiting cross-view geometric and semantic cues (Figure 1).
Figure 1: PR-IQA uses diffusion-generated views and novel cross-reference completion to create dense quality maps that closely correlate with FR-IQA metrics, enabling robust filtering for 3DGS training.
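The partial quality map stage described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the dense feature maps stand in for DINOv2 features, the integer correspondence field and overlap mask stand in for what VGGT-derived geometry would provide, and the function name `partial_quality_map` is hypothetical rather than taken from the paper.

```python
import numpy as np

def partial_quality_map(feat_q, feat_r, corr, valid, eps=1e-8):
    """Cosine similarity between query features and warped reference features.

    feat_q, feat_r: (H, W, C) dense feature maps (stand-ins for DINOv2 features).
    corr: (H, W, 2) integer (row, col) location in the reference for each
          query pixel (stand-in for geometry-derived correspondences).
    valid: (H, W) bool, True where the query pixel is covered by the reference.
    Returns an (H, W) map that is NaN outside the overlap.
    """
    rows = np.clip(corr[..., 0], 0, feat_r.shape[0] - 1)
    cols = np.clip(corr[..., 1], 0, feat_r.shape[1] - 1)
    warped = feat_r[rows, cols]  # reference features gathered per query pixel
    dot = np.sum(feat_q * warped, axis=-1)
    norm = np.linalg.norm(feat_q, axis=-1) * np.linalg.norm(warped, axis=-1)
    cos = dot / (norm + eps)
    # Scores are only trusted where a correspondence exists.
    return np.where(valid, cos, np.nan)
```

The NaN region is exactly what the completion network is asked to fill in: the partial map carries reliable scores inside the overlap and no information elsewhere.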
The network adopts a U-Net-like encoder-decoder topology with a DINOv2 backbone, multi-scale skip connections, dual-gated attention blocks (CBAM), and reference-conditioned fusion at each pyramid stage (Figure 2). Each branch (query, reference, partial map) processes its input at four scales; cross-attention in the query and partial-map branches ensures reference-guided propagation.
Training optimizes a composite objective combining pixel-wise $L_1$, Jensen-Shannon Divergence (JSD), and Pearson Linear Correlation Coefficient (PLCC) losses. The PLCC loss enforces linear correlation to GT maps, $L_{\text{JSD}}$ penalizes uniformity and mode collapse, and $L_1$ secures local fidelity. Ablations demonstrate a catastrophic performance drop when either auxiliary loss is removed.
Figure 2: The encoder-decoder architecture, showing cross-/self-attention modules, query fusion, and mask-aware pixel-shuffle downsampling, along with stage-wise component counts and attention heads.
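To make the composite training objective concrete, the sketch below combines the three terms in NumPy. Several details are assumptions, since the summary does not specify them: the loss weights `w_l1`, `w_jsd`, `w_plcc` are hypothetical, the PLCC term is written as one minus the correlation, and the JSD term treats each map as a spatial distribution obtained by clipping to non-negative values and normalising to sum to one. Actual training would use an autodiff framework rather than NumPy.

```python
import numpy as np

def l1_loss(pred, target):
    return float(np.mean(np.abs(pred - target)))

def jsd_loss(pred, target, eps=1e-8):
    # Assumption: maps become distributions via clipping and sum-normalisation.
    p = np.clip(pred, 0, None).ravel() + eps
    q = np.clip(target, 0, None).ravel() + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log(p / m))
    kl_qm = np.sum(q * np.log(q / m))
    return float(0.5 * kl_pm + 0.5 * kl_qm)

def plcc_loss(pred, target, eps=1e-8):
    # One minus the Pearson correlation; a constant prediction scores 1.
    p = pred.ravel() - pred.mean()
    t = target.ravel() - target.mean()
    plcc = np.sum(p * t) / (np.linalg.norm(p) * np.linalg.norm(t) + eps)
    return float(1.0 - plcc)

def composite_loss(pred, target, w_l1=1.0, w_jsd=1.0, w_plcc=1.0):
    # Hypothetical equal weights; the paper's values are not given here.
    return (w_l1 * l1_loss(pred, target)
            + w_jsd * jsd_loss(pred, target)
            + w_plcc * plcc_loss(pred, target))
```

Under this formulation, a collapsed prediction (a constant map) can still have a modest $L_1$ error, but the correlation term stays at its maximum, which is one way to see why the auxiliary losses matter.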
PR-IQA-Guided 3DGS Pipeline
In the downstream sparse-view 3DGS pipeline, PR-IQA scores provide both image-level ranking for robust pseudo-GT selection and pixel-level masking to restrict optimization loss to high-confidence regions. Max-fusion of multiple reference maps per candidate allows optimistic inclusion of regions validated by any reference. The binary confidence mask is defined on top-$\tau$ percentile pixels; ablation shows $\tau=50$ yields optimal balance between artifact suppression and data retention.
Figure 3: IQA-guided 3DGS reconstructions are markedly cleaner, avoiding blurring, artifacts, and misaligned Gaussians present in non-guided and baseline IQA approaches.
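The fusion, ranking, and masking steps of the guided pipeline can be illustrated as follows. This is a minimal sketch with assumptions labeled: the image-level score is taken to be the mean of the fused map, and "top-$\tau$ percentile" is read as keeping the $\tau$ percent of pixels with the highest quality. All function names are hypothetical.

```python
import numpy as np

def fuse_max(quality_maps):
    """Pixel-wise maximum over quality maps predicted from different references."""
    return np.max(np.stack(quality_maps, axis=0), axis=0)

def image_score(quality_map):
    # Assumption: image-level ranking uses the mean of the dense map.
    return float(np.mean(quality_map))

def top_percent_mask(quality_map, tau=50.0):
    """Binary mask keeping roughly the top `tau` percent of pixels by quality."""
    threshold = np.percentile(quality_map, 100.0 - tau)
    return quality_map >= threshold

# Two quality maps for the same candidate, from two different references.
maps = [np.array([[0.1, 0.9], [0.4, 0.2]]),
        np.array([[0.8, 0.3], [0.5, 0.6]])]
fused = fuse_max(maps)                  # [[0.8, 0.9], [0.5, 0.6]]
mask = top_percent_mask(fused, tau=50)  # keeps the two highest-scoring pixels
```

Max-fusion is optimistic by construction: a pixel is kept at high confidence if any single reference supports it, which matches the motivation given in the text.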
Experimental Analysis and Numerical Results
Extensive experiments across Tanks and Temples, Mip-NeRF 360, and RealEstate10K validate PR-IQA's superiority quantitatively (PLCC/SRCC) and qualitatively against FR-IQA, NR-IQA, and CR-IQA baselines. PR-IQA matches or exceeds FR metrics (LPIPS, SSIM, DINOv2-SIM) without pose-aligned GT, outperforming CrossScore, PuzzleSim, and MEt3R on both map correlation and downstream 3DGS metrics (PSNR, SSIM, LPIPS). The method demonstrates robust generalization to unseen generators (GEN3C, SEVA), resilience to geometric noise, and low false positive rates in hallucinated regions. PR-IQA's quality map completion is robust even under low-overlap conditions, preserving meaningful structure beyond the shared region.
Figure 4: PR-IQA's estimated quality maps recover object boundaries and fine structures, closely matching GT and outperforming blocky or noisy baseline methods.
Strong numeric results: PR-IQA achieves PLCC/SRCC scores exceeding 0.55 for DINOv2-SIM targets in challenging cross-view contexts; downstream, it reaches 3DGS PSNR of 16.7 to 17.7, SSIM of 0.493 to 0.632, and LPIPS of 0.327 to 0.414 across all benchmarks, surpassing all non-FR baselines.
Figure 5: PR-IQA correlations scale optimally with the number of reference images; even a single reference yields best-in-class alignment among learned metrics.
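For readers unfamiliar with the two correlation metrics reported above, the sketch below computes PLCC and SRCC between a predicted and a target quality map. It is a simplified illustration: the rank step breaks ties by position rather than averaging them, so a library routine such as `scipy.stats.spearmanr` is preferable when ties matter.

```python
import numpy as np

def plcc(pred, target):
    """Pearson linear correlation between two maps, flattened."""
    return float(np.corrcoef(pred.ravel(), target.ravel())[0, 1])

def _ranks(x):
    # Simplification: ties are broken by position, not averaged.
    return np.argsort(np.argsort(x.ravel())).astype(float)

def srcc(pred, target):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return plcc(_ranks(pred), _ranks(target))
```

PLCC rewards predictions that are linearly related to the target, while SRCC only asks that the ordering of pixels agrees, so a monotone but nonlinear prediction can have perfect SRCC and lower PLCC.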
Quality Map Fusion and Masking Strategy Ablations
Empirical analysis identifies max-fusion of quality maps as the best-performing aggregation operator among those tested, with performance improving as references are added. Both binary and soft masking strategies offer robust guidance during 3DGS optimization, confirming the reliability of PR-IQA maps irrespective of downstream selection mechanics.
Figure 6: Max fusion consistently yields higher PLCC and SRCC as reference images increase; median/mean/min strategies lag substantially.
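The difference between binary and soft masking can be shown with a single weighted photometric loss. The summary does not define soft masking, so this sketch assumes it means using the quality map itself as per-pixel weights; it also uses a plain L1 term for simplicity, whereas the standard 3DGS objective additionally includes a D-SSIM term. The function name `masked_l1` is hypothetical.

```python
import numpy as np

def masked_l1(render, pseudo_gt, weight, eps=1e-8):
    """Weighted L1 between a render and a pseudo-GT image.

    render, pseudo_gt: (H, W, 3) images.
    weight: (H, W) per-pixel weights. A boolean mask gives binary masking;
            a float quality map in [0, 1] gives soft masking (assumption).
    """
    w = weight.astype(np.float64)[..., None]
    err = np.abs(render - pseudo_gt)
    # Normalise by total weight so the loss scale is independent of mask size.
    return float(np.sum(w * err) / (np.sum(w) * err.shape[-1] + eps))
```

With a binary mask, an artifact pixel excluded from the mask contributes nothing to the loss; with soft weights, it still contributes, but in proportion to its estimated quality.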
Robustness and Generalization
PR-IQA is robust to various geometric perturbations, including high confidence thresholds on depth point filtering and significant noise in camera pose parameters. Low-overlap and hallucinated region analyses further validate the conservativeness and specificity of the quality map completion: PR-IQA maintains low false positive rates and coherent responses in unseen regions, outperforming patch-level CR-IQA baselines.
Figure 7: PR-IQA remains stable and coherent under low-overlap conditions, preserving the perceptual structure in highly partial correspondence scenarios.
Figure 8: Quality estimation in hallucinated regions suppresses false positives on unsupported content, enhancing conservativeness compared to baselines.
Qualitative and Computational Analysis
Additional qualitative results (Figures 10, 11) on multiple datasets show that PR-IQA's dense, fine-grained quality maps closely track GT, and reconstructions guided by PR-IQA maps exhibit sharper geometry and more accurate textures (Figure 9). Computationally, the added overhead (approximately 1 s per image) is negligible relative to 3DGS optimization.
Figure 10: PR-IQA yields quality maps with high fidelity to GT DINOv2-SIM across multiple datasets.
Figure 11: SSIM-targeted PR-IQA maintains consistent quality estimation, outperforming CrossScore in both textured and smooth regions.
Figure 9: PR-IQA-Guided 3DGS produces sharper geometry and precise textures by focusing optimization on high-quality regions.
Implications and Future Directions
PR-IQA enables high-fidelity reconstruction and robust NVS from diffusion-generated images where pose-aligned GTs are unavailable. The approach achieves FR-level quality assessment by leveraging partial geometric overlap and cross-attention quality completion, substantially elevating 3DGS performance and filtering out unreliable regions. The framework generalizes to unseen generative pipelines and is resilient to geometric and photometric noise.
Theoretical implications: PR-IQA reframes CR-IQA by treating quality estimation outside the overlap as a context-aware completion problem that integrates geometric and semantic cues. Practically, it enables plug-and-play quality-aware filtering for any diffusion-based reconstruction pipeline, paving the way for scalable sparse-view 3D modeling. Future work may extend PR-IQA with human perceptual supervision, broader generator evaluation, and end-to-end geometric-quality joint learning.
Conclusion
PR-IQA advances cross-reference image quality estimation by introducing partial map completion, attaining metrics and reconstructions previously reserved for full-reference settings. Integrated into sparse-view 3DGS, it reliably filters diffusion-generated artifacts, improving reconstruction fidelity and robustness. PR-IQA establishes a new baseline for quality-aware supervision in generative sparse-view 3D reconstruction pipelines (2604.04576).