ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS (2505.23734v2)

Published 29 May 2025 in cs.CV

Abstract: Feed-forward 3D Gaussian Splatting (3DGS) models have recently emerged as a promising solution for novel view synthesis, enabling one-pass inference without the need for per-scene 3DGS optimization. However, their scalability is fundamentally constrained by the limited capacity of their encoders, leading to degraded performance or excessive memory consumption as the number of input views increases. In this work, we analyze feed-forward 3DGS frameworks through the lens of the Information Bottleneck principle and introduce ZPressor, a lightweight architecture-agnostic module that enables efficient compression of multi-view inputs into a compact latent state $Z$ that retains essential scene information while discarding redundancy. Concretely, ZPressor enables existing feed-forward 3DGS models to scale to over 100 input views at 480P resolution on an 80GB GPU, by partitioning the views into anchor and support sets and using cross attention to compress the information from the support views into anchor views, forming the compressed latent state $Z$. We show that integrating ZPressor into several state-of-the-art feed-forward 3DGS models consistently improves performance under moderate input views and enhances robustness under dense view settings on two large-scale benchmarks DL3DV-10K and RealEstate10K. The video results, code and trained models are available on our project page: https://lhmd.top/zpressor.

Summary

ZPressor: Enhancing Scalability in Feed-Forward 3D Gaussian Splatting

The paper "ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS" introduces a compression module that improves the scalability of feed-forward 3D Gaussian Splatting (3DGS) models for novel view synthesis (NVS). These models reconstruct a scene in a single forward pass, but as the number of input views grows, the limited capacity of their encoders leads to either degraded performance or excessive memory consumption.

Key Contributions and Methodology

Information Bottleneck Analysis: The authors analyze feed-forward 3DGS frameworks through the Information Bottleneck (IB) principle, which frames the task as compressing multi-view inputs into a compact latent state $Z$ that preserves essential scene information and discards redundancy. The paper hypothesizes that such a latent state can improve the scalability of 3DGS models.
ZPressor Module: ZPressor is a lightweight, architecture-agnostic module that compresses multi-view inputs. It partitions the input views into anchor and support sets, then uses cross-attention to compress information from the support views into the anchor views, which form the compressed latent state $Z$. This enables existing models to handle over 100 input views at 480P resolution on an 80GB GPU.
Integration and Benchmarking: ZPressor is integrated into several state-of-the-art feed-forward 3DGS models, namely DepthSplat, MVSplat, and pixelSplat, and evaluated on two large-scale benchmarks, DL3DV-10K and RealEstate10K. The module consistently improves performance with a moderate number of input views and enhances robustness in dense view settings.
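The anchor/support compression described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The summary does not say how views are partitioned, so the sketch assumes farthest point sampling over camera centers to pick anchors and nearest-anchor assignment for support views. The attention is single-head with identity projections and a residual connection, where a real module would use learned projections. All function names (`farthest_point_sampling`, `partition_views`, `cross_attend`, `compress`) are hypothetical.

```python
import math

def farthest_point_sampling(points, k):
    """Pick k well-spread indices, starting from index 0 (assumed strategy)."""
    chosen = [0]
    # Distance from every point to its closest chosen point so far.
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

def partition_views(cam_centers, num_anchors):
    """Split view indices into anchors and one support group per anchor."""
    anchors = farthest_point_sampling(cam_centers, num_anchors)
    groups = {a: [] for a in anchors}
    for i, center in enumerate(cam_centers):
        if i in groups:
            continue  # anchors are not their own support views
        nearest = min(anchors, key=lambda a: math.dist(center, cam_centers[a]))
        groups[nearest].append(i)
    return anchors, groups

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys_values):
    """Single-head cross-attention (identity projections) with a residual."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, kv)) / math.sqrt(d)
                  for kv in keys_values]
        weights = softmax(scores)
        attended = [sum(w * kv[c] for w, kv in zip(weights, keys_values))
                    for c in range(d)]
        out.append([qc + ac for qc, ac in zip(q, attended)])
    return out

def compress(features, cam_centers, num_anchors):
    """features[i] holds the token vectors of view i; returns Z keyed by anchor."""
    anchors, groups = partition_views(cam_centers, num_anchors)
    z = {}
    for a in anchors:
        # Anchor tokens are the queries; all tokens of its support views
        # serve as keys and values.
        support_tokens = [tok for s in groups[a] for tok in features[s]]
        z[a] = cross_attend(features[a], support_tokens) if support_tokens else features[a]
    return z

# Toy input: six cameras on a line, two 4-dimensional tokens per view.
cams = [(float(i), 0.0, 0.0) for i in range(6)]
feats = [[[0.1 * i, 1.0, 0.0, 0.5], [0.0, 0.2 * i, 1.0, 0.5]] for i in range(6)]
z = compress(feats, cams, num_anchors=2)
print(len(feats) * 2, "tokens in ->", sum(len(t) for t in z.values()), "tokens out")
```

In this toy setup the size of the latent state depends only on the number of anchors, not on the total number of input views, which is the property that lets the downstream model scale to dense inputs.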

Numerical Results

The paper reports improvements in PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity Index Measure), and LPIPS (Learned Perceptual Image Patch Similarity) across various view settings. For example, DepthSplat's PSNR increased by up to 4.65 dB with 36 input views when augmented with ZPressor. Models equipped with ZPressor also showed reduced memory usage and inference time, reflecting the efficiency gains from the compression strategy.
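To put the PSNR figure in perspective, recall the standard definition PSNR = 10 log10(MAX^2 / MSE), so a gain of d decibels corresponds to the mean squared error shrinking by a factor of 10^(d/10). The short sketch below works through that arithmetic for the 4.65 gain quoted above, assuming it is expressed in decibels. The baseline error value is hypothetical and chosen only for illustration; it is not a number from the paper.

```python
import math

def psnr(mse, max_val=1.0):
    """Standard PSNR in dB for images with pixel values in [0, max_val]."""
    return 10.0 * math.log10(max_val ** 2 / mse)

def mse_ratio_for_gain(delta_db):
    """Factor by which MSE shrinks when PSNR rises by delta_db decibels."""
    return 10.0 ** (delta_db / 10.0)

baseline_mse = 0.01  # hypothetical baseline error, not taken from the paper
improved_mse = baseline_mse / mse_ratio_for_gain(4.65)

print(f"MSE reduction factor: {mse_ratio_for_gain(4.65):.2f}")
print(f"PSNR gain: {psnr(improved_mse) - psnr(baseline_mse):.2f} dB")
```

Under these assumptions, a 4.65 dB gain means the mean squared error is roughly 2.9 times lower than the baseline.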

Implications

From a practical perspective, ZPressor's gains in scalability and efficiency make feed-forward 3DGS models more viable for real-world NVS applications such as augmented reality (AR) and virtual reality (VR). From a theoretical perspective, building IB principles into model design suggests a direction for research on efficient representation learning, with potential relevance to other domains where large-scale data processing and model scalability are critical.

Future Directions

Although ZPressor offers substantial scalability improvements, its efficacy may be limited for extremely dense inputs because of the computational cost of handling large numbers of Gaussian primitives. Future research could combine it with Gaussian merging or memory-efficient rendering techniques to extend the range of dense input views that can be handled.

In conclusion, the paper pairs a theoretical motivation, the IB analysis, with a practical module that can be added to existing feed-forward 3DGS architectures. The result is more scalable scene reconstruction for novel view synthesis as the number of input views grows.