
ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining (2408.10906v1)

Published 20 Aug 2024 in cs.CV

Abstract: 3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for the 3D understanding directly in this representation space. To facilitate the research in this direction, we first build a large-scale dataset of 3DGS using the commonly used ShapeNet and ModelNet datasets. Our dataset ShapeSplat consists of 65K objects from 87 unique categories, whose labels are in accordance with the respective datasets. The creation of this dataset utilized the compute equivalent of 2 GPU years on a TITAN XP GPU. We utilize our dataset for unsupervised pretraining and supervised finetuning for classification and segmentation tasks. To this end, we introduce \textbf{\textit{Gaussian-MAE}}, which highlights the unique benefits of representation learning from Gaussian parameters. Through exhaustive experiments, we provide several valuable insights. In particular, we show that (1) the distribution of the optimized GS centroids significantly differs from the uniformly sampled point cloud (used for initialization) counterpart; (2) this change in distribution results in degradation in classification but improvement in segmentation tasks when using only the centroids; (3) to leverage additional Gaussian parameters, we propose Gaussian feature grouping in a normalized feature space, along with splats pooling layer, offering a tailored solution to effectively group and embed similar Gaussians, which leads to notable improvement in finetuning tasks.

Citations (4)

Summary

  • The paper introduces ShapeSplat, a large-scale 3D dataset built from over 65K CAD objects using Gaussian splatting for rapid, high-fidelity scene representation.
  • It proposes Gaussian-MAE, a masked autoencoder that reconstructs Gaussian parameters to enhance self-supervised learning and improve classification and segmentation tasks.
  • Extensive experiments demonstrate up to 95.37% accuracy on ModelNet10 and improved segmentation mIoU on ShapeNet-Part, validating the method's superior performance.

A Formal Analysis of ShapeSplat and Gaussian-MAE

The paper introduces ShapeSplat, a large-scale dataset generated to aid the study of 3D representations through 3D Gaussian Splatting (3DGS). The dataset and associated methods are positioned to facilitate advances in representation learning, targeting tasks such as classification and segmentation. 3DGS represents 3D scenes with Gaussian primitives, which offers several benefits: rapid rendering, high fidelity, differentiability, and extensive editability. The paper advances the application of 3DGS by producing ShapeSplat and introducing the Gaussian-MAE model, and delivers a comprehensive examination of their utility in self-supervised learning.

Dataset Generation

ShapeSplat is derived from the ShapeNet and ModelNet datasets, boasting $65K$ objects from $87$ categories. The dataset construction required the compute equivalent of 2 GPU years on a TITAN XP GPU, highlighting the intensive computational effort invested. The dataset is created by rendering 2D images of CAD models from uniform camera angles, followed by the training of Gaussian splat parameters through differentiable rasterization and Gaussian pruning techniques.
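The per-splat parameters optimized in this pipeline (centroids, opacity, anisotropic scale, rotation, and spherical-harmonic colors) can be pictured as a small per-object record. The sketch below is illustrative only: field names, array layout, and the SH degree are assumptions, not the released file format.

```python
import numpy as np

# Hypothetical per-object splat record; the released ShapeSplat format
# may differ in naming, layout, and SH degree.
def make_splat_record(num_gaussians: int, sh_degree: int = 3) -> dict:
    """Allocate arrays for one object's optimized Gaussian parameters."""
    num_sh = (sh_degree + 1) ** 2  # 16 coefficients per channel at degree 3
    return {
        "xyz":      np.zeros((num_gaussians, 3)),         # centroids
        "opacity":  np.zeros((num_gaussians, 1)),         # per-splat alpha
        "scale":    np.zeros((num_gaussians, 3)),         # anisotropic scales
        "rotation": np.zeros((num_gaussians, 4)),         # unit quaternions
        "sh":       np.zeros((num_gaussians, num_sh, 3)), # SH color coefficients
    }

rec = make_splat_record(20_000)  # roughly the average splat count reported
```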

The statistical metrics, including PSNR, JSD, and MMD, reveal substantial distributional differences between the optimized Gaussian splat centroids and the initial point cloud data. The average Gaussian number exceeds $20K$, significantly higher than in traditional point cloud methods. The dataset construction approach ensures high-quality renderings while minimizing computational redundancy.
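One simple way to quantify such a distributional shift between the optimized centroids and the uniformly sampled point cloud is a Jensen-Shannon divergence over voxel-occupancy histograms. The sketch below is an illustrative approximation of that idea, not the paper's exact metric pipeline.

```python
import numpy as np

def jsd(p, q):
    """Jensen-Shannon divergence (base 2, in [0, 1]) between two
    probability vectors."""
    m = 0.5 * (p + q)
    def kl(x, y):
        nz = x > 0  # skip empty bins; m > 0 wherever x > 0
        return np.sum(x[nz] * np.log2(x[nz] / y[nz]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def voxel_jsd(points_a, points_b, bins=16):
    """Compare two 3D point sets as normalized occupancy histograms on a
    shared voxel grid -- a stand-in for the paper's distribution comparison."""
    both = np.vstack([points_a, points_b])
    edges = [np.linspace(both[:, d].min(), both[:, d].max(), bins + 1)
             for d in range(3)]
    ha, _ = np.histogramdd(points_a, bins=edges)
    hb, _ = np.histogramdd(points_b, bins=edges)
    return jsd(ha.ravel() / ha.sum(), hb.ravel() / hb.sum())

rng = np.random.default_rng(0)
uniform = rng.uniform(-1, 1, (4096, 3))    # stand-in for sampled point cloud
clustered = rng.normal(0, 0.3, (4096, 3))  # stand-in for optimized centroids
print(voxel_jsd(uniform, clustered))       # nonzero: the distributions differ
```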

Gaussian-MAE Model

Gaussian-MAE is introduced as a masked autoencoder model aimed at learning representations directly from Gaussian parameters. This method employs two primary features: the embedding feature $E$ and the grouping feature $G$. These features allow Gaussian parameters to be processed in tailored representation spaces that capture the intrinsic properties of the Gaussian splatting technique.

Gaussian-MAE effectively reconstructs Gaussian parameters during unsupervised pretraining and significantly improves downstream task performance through finetuning. The paper meticulously analyzes the impact of different Gaussian attributes such as opacity, scale, rotation, and spherical harmonics on representation learning, revealing that these parameters contribute differently to classification and segmentation tasks.
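At a high level, the masked-autoencoder recipe hides a large fraction of Gaussian groups and trains the network to reconstruct their parameters. The NumPy sketch below shows only the masking step and a masked-only reconstruction loss; plain MSE and the 60% mask ratio are illustrative choices, not the paper's exact per-attribute losses.

```python
import numpy as np

def random_group_mask(num_groups, mask_ratio=0.6, seed=None):
    """Boolean mask over Gaussian groups: True = masked (to be reconstructed)."""
    rng = np.random.default_rng(seed)
    num_masked = int(num_groups * mask_ratio)
    mask = np.zeros(num_groups, dtype=bool)
    mask[rng.permutation(num_groups)[:num_masked]] = True
    return mask

def mae_recon_loss(pred, target, mask):
    """Mean squared error over masked groups only -- a simple stand-in for
    the per-attribute reconstruction objectives in Gaussian-MAE."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))
```

In a full model, `pred` would come from a decoder applied to the visible groups plus learned mask tokens; here it is just any array with the same shape as `target`.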

Numerical Results and Experimental Analysis

The extensive experimental results indicate substantial improvements in various benchmarks:

  • On ModelNet10, pretraining with Gaussian-MAE and leveraging all Gaussian parameters results in a classification accuracy of up to $95.37\%$, outperforming all other methods.
  • Notably, on ScanObjectNN, Gaussian-MAE pretrained on Gaussian centroids achieves competitive accuracy compared to the point cloud pretraining method Point-MAE.
  • In segmentation tasks on the ShapeNet-Part dataset, pretraining on Gaussian centroids yields higher class mIoU, demonstrating the effectiveness of pretraining on Gaussian parameters.

The inclusion of Gaussian feature grouping and splats pooling layers further enhances performance, allowing for effective grouping and embedding of similar Gaussians. These additions lead to meaningful improvements in pretraining and finetuning performance, particularly for denser splat inputs that capture finer color and geometric details.
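The grouping idea can be sketched as z-scoring the heterogeneous Gaussian attributes into a comparable normalized feature space, gathering the k nearest Gaussians around each group center, and pooling within each group. The function names and specific choices below (z-scoring, brute-force KNN, max pooling) are illustrative assumptions, not the paper's layer definitions.

```python
import numpy as np

def normalize_features(feats):
    """Per-attribute z-scoring so heterogeneous Gaussian parameters
    (opacity, scale, rotation, SH) are comparable before grouping."""
    mu, sigma = feats.mean(0), feats.std(0) + 1e-8
    return (feats - mu) / sigma

def group_and_pool(feats, centers_idx, k=8):
    """Assign the k nearest Gaussians (in feature space) to each group
    center, then max-pool their features -- a minimal stand-in for a
    splats pooling layer."""
    # Pairwise distances from each center to all Gaussians: (num_groups, N)
    d = np.linalg.norm(feats[None, :, :] - feats[centers_idx, None, :], axis=-1)
    knn = np.argsort(d, axis=1)[:, :k]  # (num_groups, k) neighbor indices
    return feats[knn].max(axis=1)       # (num_groups, feat_dim) pooled features

rng = np.random.default_rng(0)
feats = normalize_features(rng.normal(size=(100, 7)))  # toy Gaussian features
pooled = group_and_pool(feats, np.array([0, 10, 20]))  # 3 groups -> (3, 7)
```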

Implications and Future Directions

The research presented opens up several compelling avenues for future work:

  1. Enhanced Gaussian Representation: Improving the representation of Gaussian parameters without downsampling could significantly enhance reconstruction quality and subsequent downstream task performance.
  2. Integration with LLMs: Exploring the integration of Gaussian splats with LLMs can unlock new possibilities in scene understanding and open-world semantic segmentation.
  3. Cross-Modality Learning: Leveraging foundations from 2D models to enrich 3D Gaussian representation learning would be a valuable direction for the community.
  4. Real-world Applications: Further extending the dataset to include real-world scans and more complex scenes would facilitate the application of these methods in more diverse and practical scenarios.

Conclusion

The ShapeSplat dataset and Gaussian-MAE bring significant advancements to the field of 3D representation learning. The methodical construction of ShapeSplat and the innovative introduction of Gaussian-MAE collectively set a new benchmark for self-supervised learning approaches, paving the way for further developments in 3D vision tasks. By making their dataset and model publicly available, the authors have equipped the research community with the tools to explore and expand the vast potential of Gaussian splatting for various applications in computer vision.
