3D-PU Dataset: Benchmarking Point Cloud Upsampling

Updated 22 July 2025
  • 3D-PU datasets are benchmark collections of paired low-resolution and high-resolution point clouds designed to validate upsampling techniques.
  • They are constructed from 3D surface meshes using methods like Poisson disk sampling to generate uniform dense ground truths alongside sparse inputs.
  • They enable standardized benchmarking by applying metrics such as Chamfer and Hausdorff distances, driving improvements in geometric deep learning.

The 3D-PU Dataset refers to a class of benchmark datasets used for evaluating point cloud upsampling techniques, particularly those aimed at reconstructing denser, more uniform point sets from sparse or non-uniform inputs. While there is no single canonical dataset named "3D-PU" in the reviewed literature, the term is commonly used informally to denote datasets with these characteristics, especially in the context of methods such as PU-GAN and its successors, which have set de facto standards in this area. Datasets of this type play a central role in training, validating, and benchmarking contemporary deep-learning-based upsampling models in 3D vision.

1. Definition and Scope

A "3D-PU dataset" (Editor’s term) is characterized by collections of 3D point clouds, typically constructed as pairs of low-resolution (sparse or downsampled) and ground-truth high-resolution (dense and uniform) samples. The purpose is to facilitate the supervised learning and objective benchmarking of point cloud upsampling algorithms, which transform sparse 3D scans into denser and more uniformly distributed representations.

Such datasets are primarily derived from 3D surface meshes, from which uniform point samples are generated using spatial sampling methods such as Poisson disk sampling. Typically, these datasets include synthetic patches cropped from canonical 3D models (e.g., models of common objects or scenes), though recent datasets also incorporate real-scanned data (e.g., LiDAR scans) to address practical noise and non-uniformity inherent in real-world acquisition.
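The uniform-sampling step can be sketched in a few lines of NumPy. The function below performs area-weighted random sampling of a triangle mesh via barycentric coordinates; this is a simplified stand-in for Poisson disk sampling (true Poisson disk sampling additionally enforces a minimum spacing between points, and is typically done with library routines such as those in Open3D or trimesh). The function name and signature are illustrative, not from any cited codebase:

```python
import numpy as np

def sample_mesh_surface(vertices, faces, n_points, seed=0):
    """Area-weighted random sampling of a triangle mesh surface.

    Simplified stand-in for Poisson disk sampling: points are drawn
    i.i.d. uniformly over the surface, with no minimum-spacing guarantee.
    vertices: (V, 3) float array; faces: (F, 3) int array of vertex indices.
    Returns an (n_points, 3) array of surface points.
    """
    rng = np.random.default_rng(seed)
    tris = vertices[faces]                                  # (F, 3, 3)
    # Triangle areas from the cross product of two edge vectors.
    cross = np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    # Choose a triangle for each point, proportional to its area.
    idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Uniform barycentric coordinates inside each chosen triangle.
    u, v = rng.random(n_points), rng.random(n_points)
    flip = u + v > 1.0
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    t = tris[idx]
    return (t[:, 0]
            + u[:, None] * (t[:, 1] - t[:, 0])
            + v[:, None] * (t[:, 2] - t[:, 0]))
```

Area weighting ensures that large triangles receive proportionally more points, which is what makes the resulting density uniform over the surface rather than over the face list.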

2. Construction Methodologies

Datasets typically associated with the 3D-PU category are constructed via a multi-stage process:

  • Mesh Selection: A set of 3D surface meshes is curated to provide a diverse coverage of object shapes and geometric features.
  • Uniform Sampling: For each mesh, a dense and uniform set of surface points is generated, often via Poisson disk sampling, to serve as the ground truth.
  • Patch Extraction: Local patches are cropped from 3D shapes (e.g., via random sampling or patch centers) to encourage models to generalize beyond global context and focus on local surface details.
  • Non-Uniform/Noisy Input Generation: Sparse or non-uniform versions of each patch are created by randomly subsampling or introducing perturbations to the ground-truth points, simulating real-world conditions.
  • Input–Output Pairing: Each data sample is represented as a pair: a low-resolution input patch and its corresponding high-resolution, uniform ground truth.
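The last two steps above can be sketched as follows. Given a dense, roughly uniform ground-truth patch, a sparse input is produced by random subsampling, optionally with Gaussian jitter to mimic sensor noise. The helper below is an illustrative sketch, not code from PU-GAN or any other cited work:

```python
import numpy as np

def make_pair(gt_patch, ratio=4, noise_std=0.0, seed=0):
    """Build one (sparse input, dense ground truth) training pair.

    gt_patch: (N, 3) dense, roughly uniform ground-truth points.
    ratio: upsampling factor r, so the input keeps N // r points.
    noise_std: optional Gaussian jitter to mimic acquisition noise.
    """
    rng = np.random.default_rng(seed)
    n_in = gt_patch.shape[0] // ratio
    # Random (hence non-uniform) subsampling of the ground truth.
    keep = rng.choice(gt_patch.shape[0], size=n_in, replace=False)
    sparse = gt_patch[keep].copy()
    if noise_std > 0:
        sparse += rng.normal(scale=noise_std, size=sparse.shape)
    return sparse, gt_patch
```

Because the subsampling is random rather than farthest-point based, the input's local density varies, which is precisely the non-uniformity the upsampling network is trained to correct.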

For instance, in the PU-GAN framework, the dataset comprises 147 3D meshes, with 120 used for training and 27 for evaluation. Training data are generated by extracting 24,000 input–output patch pairs using Poisson disk sampling for uniformity, with additional random downsampling to synthesize the corresponding low-density input sets (Li et al., 2019).

3. Evaluation Metrics

Standardized evaluation protocols for 3D-PU datasets employ a collection of geometric and statistical metrics to assess the fidelity, uniformity, and surface proximity of upsampled point clouds:

  • Uniform Loss (𝓛₍uni₎): Quantifies point distribution uniformity, incorporating both local point density balance (via chi-squared imbalances) and neighbor separation consistency (relative to an expected uniform pattern).
  • Chamfer Distance (CD): Measures the average nearest-neighbor distance between points in the predicted and ground-truth sets, serving as a symmetric surface similarity index.
  • Hausdorff Distance (HD): Reports the maximum deviation across nearest-neighbor correspondences, indicating worst-case geometric error.
  • Point-to-Surface (P2F) Distance: Calculates the distance of generated points to the underlying mesh surface, providing a direct assessment of geometric fidelity and surface adherence.
  • Earth Mover’s Distance (EMD): Used as a reconstruction loss during training, EMD captures bijective set-to-set correspondence error in 3D point locations.
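The two most widely reported metrics above, Chamfer and Hausdorff distance, can be sketched in a few lines of NumPy. Note that conventions vary across papers (some use squared distances for CD, or average rather than sum the two directional terms), so this sketch fixes one common convention rather than reproducing any specific benchmark script:

```python
import numpy as np

def pairwise_dists(a, b):
    """Euclidean distance matrix between point sets a (N, 3) and b (M, 3)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance: sum of the mean nearest-neighbor
    distances in both directions (some papers average or square them)."""
    d = pairwise_dists(pred, gt)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def hausdorff_distance(pred, gt):
    """Symmetric Hausdorff distance: worst-case nearest-neighbor deviation."""
    d = pairwise_dists(pred, gt)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

EMD, by contrast, requires solving an optimal one-to-one assignment between the two sets (e.g., the Hungarian algorithm or an approximation), which is why it is usually reserved for equal-size sets and training losses rather than large-scale evaluation.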

Quantitative baselines are established by evaluating these metrics on consistent test splits of the dataset and comparing the outputs of different upsampling algorithms (Li et al., 2019; Qian et al., 2019; Lee et al., 2022).

4. Benchmarking and Impact

The widespread availability and usage of datasets in the 3D-PU class have enabled rigorous comparative benchmarking of upsampling methods such as PU-GAN, PU-GCN, and PU-MFA. Benchmark results reported in the literature demonstrate substantial improvements in output point cloud quality over classical and early learning-based methods:

Method    CD (×10⁻³)    HD (×10⁻³)    P2F (×10⁻³)    Uniform Loss (↓)
PU-GAN    0.300         2.097         2.972          lower
PU-GCN    ~0.258        1.885         2.721          lower
PU-MFA    0.2326        1.094         2.545          lower

Improvements in these metrics translate to better preservation of fine-grained features, reduced noise, and superior surface reconstruction—validated both quantitatively and qualitatively. Furthermore, the datasets enable ablation studies to isolate the effects of architectural components (e.g., self-attention, multi-scale features) in network designs.

5. Extensions and Real-World Applicability

Recent works have extended the utility of 3D-PU-style datasets to include real-world LiDAR data (e.g., the KITTI dataset), enhancing the relevance and robustness of benchmarking. High-quality synthetic sets, such as those in PU-GAN, provide controlled noise-free baselines, while real-scanned sets introduce practical challenges such as occlusions, sensor noise, and variable point density.

While methods such as PU-MFA have been evaluated primarily on PU-GAN and KITTI datasets, their methodological advances (U-Net-style multi-scale features, adaptive attention) have demonstrated generalization to broader 3D-PU-style data, indicating applicability across various input domains (Lee et al., 2022). This suggests that robust upsampling methods can be developed and validated using 3D-PU datasets for downstream tasks such as mesh reconstruction, object recognition, and 3D scene understanding.

6. Relationship to Other 3D and Multi-Modal Datasets

Although 3D-PU datasets are focused on upsampling and geometric fidelity, related initiatives such as UniG3D offer a multi-modal perspective by integrating mesh, image, point cloud, and text modalities (Sun et al., 2023). While UniG3D is not explicitly constructed for upsampling, its point cloud component—derived from dense 2D renderings and voxel-based uniform sampling—shares underlying objectives of surface fidelity and distribution uniformity, thus providing a broader resource for generative and reconstruction tasks.

Unlike spatially grounded datasets such as SURPRISE3D, which target language-guided segmentation and spatial reasoning (Huang et al., 10 Jul 2025), the principal focus of 3D-PU datasets lies in geometric detail, uniformity, and localized surface patch accuracy, rather than multi-modal or semantic reasoning.

7. Significance for Research and Practice

3D-PU datasets have played a pivotal role in advancing both methodological innovation and standardized benchmarking in point cloud upsampling. By providing high-quality, paired examples of sparse and dense point cloud patches, they have enabled the development of specialized neural network architectures that utilize adversarial training, graph convolutions, and attention mechanisms to achieve state-of-the-art upsampling performance. These datasets constitute a cornerstone of empirical validation for geometric deep learning in 3D vision, bridging the gap between synthetic data generation and real-world application constraints, and facilitating reproducibility across competing methods in the field.
