
MVPNet Dataset: Real-World 3D Object Reconstructions

Updated 2 March 2026
  • MVPNet is a real-world 3D point cloud dataset featuring dense, colored reconstructions and rich geometric complexity from everyday objects.
  • It employs a multi-stage reconstruction pipeline—combining SfM, MVS, back-projection, and point cloud fusion—to deliver high-fidelity geometric data.
  • MVPNet supports 3D object classification and transfer learning; in the reported benchmarks, pretraining on MVPNet improves PointNet++ accuracy on ScanObjectNN over training from scratch.

MVPNet is a large-scale, real-world 3D point cloud dataset designed to advance object-centric 3D understanding within the computer vision community. Derived from MVImgNet—an extensive multi-view image dataset—MVPNet provides dense, colored 3D reconstructions for a wide variety of everyday objects, offering rich class diversity and real-scanned geometric complexity. It is specifically constructed to address limitations of synthetic benchmarks by supplying high-fidelity object point clouds captured under practical, unconstrained conditions (Yu et al., 2023).

1. Construction Pipeline

The MVPNet dataset is generated via a systematic multi-stage reconstruction pipeline based on the MVImgNet video corpus. Each input video comprises a sequence of RGB frames accompanied by camera intrinsics ($K$), extrinsics ($R$, $t$), and foreground segmentation masks ($M$). The reconstruction procedure involves:

  1. Sparse Structure-from-Motion (SfM): COLMAP SfM is applied to selected frames, yielding estimated camera poses $\{R_i, t_i\}$ and intrinsics $K_i$ for every view $i$.
  2. Dense Multi-View Stereo (MVS): COLMAP's PatchMatch MVS algorithm produces dense per-pixel depth maps $D_i(u,v)$ and surface normals $N_i(u,v)$ for each observed view.
  3. Segmentation and Back-projection: Binary masks $M_i(u,v)$ restrict reconstruction to object pixels. Each valid image pixel is back-projected into world coordinates as

$$P_i(u,v) = R_i^\top \left[ K_i^{-1}(u,v,1)^\top \cdot D_i(u,v) \right] - R_i^\top t_i$$

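where $(u,v)$ is the 2D image coordinate and $(R_i, t_i)$ is the world-to-camera transform, so the expression inverts $X_{\text{cam}} = R_i X + t_i$.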

  4. Point Cloud Fusion: The back-projected points $P_i(u,v)$ are aggregated across all frames, with optional view-angle weighting $w_i(u,v) = \max(0, N_i \cdot v_i)$, where $v_i$ denotes the camera's viewing direction.
  5. Manual Cleaning: Outlier pruning eliminates reconstructions with excessive noise or insufficient points; residual background is removed manually.
  6. Final Output: One dense, colored point cloud (including surface normals and RGB values) is generated for each source video.

This approach ensures high-fidelity geometric capture directly from actual image data, providing object-centric point clouds with realistic variation in appearance and geometry.
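
Steps 3 and 4 can be sketched in NumPy as follows. This is a minimal illustration and not the authors' code: the function names (`backproject_view`, `view_angle_weights`, `fuse_views`) are hypothetical, $(R_i, t_i)$ is assumed to be a world-to-camera transform as the back-projection formula implies, and pixels with non-positive depth are assumed to be invalid.

```python
import numpy as np

def backproject_view(depth, mask, K, R, t):
    """Back-project the masked pixels of one view into world coordinates.

    depth: (H, W) depth map D_i; mask: (H, W) boolean object mask M_i;
    K: (3, 3) intrinsics; R: (3, 3) and t: (3,) world-to-camera extrinsics.
    Returns an (N, 3) array of world points P_i(u, v).
    """
    valid = mask & (depth > 0)            # object pixels that have a depth value
    v, u = np.nonzero(valid)              # row index is v, column index is u
    pix = np.stack([u, v, np.ones_like(u)], axis=0).astype(np.float64)  # (3, N)
    cam = (np.linalg.inv(K) @ pix) * depth[v, u]   # K^{-1} (u, v, 1)^T * D(u, v)
    world = R.T @ cam - (R.T @ t)[:, None]         # R^T [...] - R^T t
    return world.T

def view_angle_weights(normals, view_dir):
    """Optional weights w = max(0, N . v), with v used exactly as passed in."""
    return np.maximum(0.0, normals @ view_dir)

def fuse_views(views):
    """Concatenate the back-projected points of all views into one cloud.

    views: iterable of (depth, mask, K, R, t) tuples, one per frame.
    """
    return np.concatenate([backproject_view(*view) for view in views], axis=0)
```

The sign convention of the viewing direction $v_i$ is not specified in the text, so `view_angle_weights` applies the formula literally and leaves that choice to the caller.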

2. Dataset Statistics

MVPNet comprises $87,200$ object-centric point cloud samples across $150$ real-world, human-centric object classes. Each category contains approximately $581$ point clouds on average (range: $100$–$1500$ per class).

The benchmark split totals $80,000$ point clouds, fewer than the $87,200$ samples reported above:

  • Training: $64,000$ point clouds ($80\%$ of the split)
  • Testing: $16,000$ point clouds ($20\%$ of the split)
  • Users may optionally carve out a validation set from the training samples.

Each point cloud features hundreds of thousands of points, color attributes, and geometric normals, capturing the diversity inherent to unconstrained multi-view capture (Yu et al., 2023).
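
The optional validation carve-out mentioned above could be done per class, so that rare categories stay represented in both subsets. The sketch below is one way to do it under stated assumptions: the function name `carve_validation`, the 10% fraction, and the fixed seed are illustrative choices, not part of the dataset release.

```python
import random
from collections import defaultdict

def carve_validation(labels, val_fraction=0.1, seed=0):
    """Split training indices into (train, val) lists, stratified by class.

    labels: sequence of integer class IDs, one per training sample.
    val_fraction: share of each class moved to validation (assumed value).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)             # fixed seed keeps the split reproducible
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_val = int(round(len(members) * val_fraction))
        val_idx.extend(members[:n_val])
        train_idx.extend(members[n_val:])
    return sorted(train_idx), sorted(val_idx)
```

Because per-class counts range from 100 to 1500, a stratified split avoids the case where a purely random draw leaves a small class with almost no validation samples.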

3. Annotation Format and Directory Structure

All MVPNet samples are distributed in the PLY format (ASCII or binary), each file representing a unique object instance. The data organization and file annotation are as follows:

  • Point Attributes:
    • $x, y, z$: 3D coordinates (float32)
    • $r, g, b$: Vertex colors (uint8), directly inherited from source images
    • $n_x, n_y, n_z$: Surface normals (float32) estimated by MVS
  • Class label:
    • Each point cloud is assigned an integer class ID in $[0, 149]$, stored in the PLY header or accompanying metadata.
  • Directory Structure:

MVPNet/
  ├── train/
  │   ├── class_000/
  │   │   ├── obj0001.ply
  │   │   ├── obj0002.ply
  │   │   └── ...
  │   └── class_001/
  └── test/ (similarly structured)

This structured arrangement facilitates efficient parsing by machine learning libraries and supports scalable benchmarking.
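
As an illustration of how this layout might be parsed, the sketch below indexes one split and reads an ASCII PLY file. It rests on assumptions: the class ID is taken from the `class_XXX` folder name, the helper names `index_split` and `read_ascii_ply` are hypothetical, and the PLY property names are whatever the file header declares (binary PLY files are not handled here).

```python
import os
import numpy as np

def index_split(root, split):
    """Return sorted (ply_path, class_id) pairs for one split directory."""
    samples = []
    split_dir = os.path.join(root, split)
    for class_dir in sorted(os.listdir(split_dir)):
        if not class_dir.startswith("class_"):
            continue
        class_id = int(class_dir[len("class_"):])    # "class_007" -> 7
        folder = os.path.join(split_dir, class_dir)
        for name in sorted(os.listdir(folder)):
            if name.endswith(".ply"):
                samples.append((os.path.join(folder, name), class_id))
    return samples

def read_ascii_ply(path):
    """Read the vertex element of an ASCII PLY file as {property: 1-D array}."""
    with open(path, "r", errors="replace") as f:
        lines = f.read().splitlines()

    names, n_vertex, in_vertex, body_start = [], 0, False, len(lines)
    for i, line in enumerate(lines):
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "format" and tokens[1] != "ascii":
            raise ValueError("only ASCII PLY is handled in this sketch")
        if tokens[0] == "element":
            in_vertex = tokens[1] == "vertex"
            if in_vertex:
                n_vertex = int(tokens[2])
        elif tokens[0] == "property" and in_vertex:
            names.append(tokens[-1])                 # last token is the property name
        elif tokens[0] == "end_header":
            body_start = i + 1
            break

    rows = [[float(x) for x in ln.split()]
            for ln in lines[body_start:body_start + n_vertex]]
    data = np.array(rows, dtype=np.float64).reshape(n_vertex, len(names))
    return {name: data[:, k] for k, name in enumerate(names)}
```

For example, `index_split("MVPNet", "train")` would return one entry per `.ply` file under `MVPNet/train/`, paired with the integer parsed from its class folder.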

4. Preprocessing and Normalization Procedures

To standardize MVPNet point clouds for algorithmic consumption, the following preprocessing sequence is recommended:

  1. Centering: Subtract the point centroid to align each object at the coordinate origin:

$$P' = P - \bar{P}, \quad \bar{P} = \frac{1}{N}\sum_{j=1}^{N} P_j$$

  2. Scaling: Normalize all points to reside within the unit sphere:

$$s = \max_j \|P'_j\|_2, \quad P'' = P'/s$$

  3. Downsampling: Optionally subsample or voxel-grid filter points to a fixed budget (e.g., $1,024$ or $2,048$ points per cloud).
  4. Data Augmentation: Random rotation, additive Gaussian noise (jitter, $\sigma \approx 0.01$), and point dropout are suggested during training.

These steps ensure geometric invariance, mitigate overfitting, and permit compatibility with existing 3D deep learning pipelines (Yu et al., 2023).
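
The four steps above can be sketched in NumPy as follows. This is an illustration under assumptions rather than a reference implementation: the function names are hypothetical, rotation is restricted to the $y$ axis (assumed here to be the up axis), and dropout overwrites dropped points with the first point so the array size stays fixed, a convention borrowed from common point cloud training code.

```python
import numpy as np

def normalize_unit_sphere(points, eps=1e-12):
    """Center at the centroid, then scale so the farthest point has norm 1."""
    centered = points - points.mean(axis=0)               # P' = P - mean(P)
    scale = np.linalg.norm(centered, axis=1).max()        # s = max_j ||P'_j||
    return centered / max(scale, eps)                     # P'' = P' / s

def random_subsample(points, n_points, rng):
    """Pick a fixed budget of points, sampling with replacement only if needed."""
    replace = points.shape[0] < n_points
    idx = rng.choice(points.shape[0], size=n_points, replace=replace)
    return points[idx]

def augment(points, rng, sigma=0.01, max_dropout=0.5):
    """Random rotation about y, Gaussian jitter, and random point dropout."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    out = points @ rot.T + rng.normal(0.0, sigma, size=points.shape)
    drop_ratio = rng.uniform(0.0, max_dropout)            # per-cloud dropout rate
    drop = rng.random(points.shape[0]) < drop_ratio
    out[drop] = out[0]                                    # keep the array shape fixed
    return out
```

A typical call order would be `normalize_unit_sphere`, then `random_subsample` with a budget such as 1,024, then `augment` during training only.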

5. Baseline Benchmarks and Protocols

MVPNet supports both in-dataset benchmarking and transfer learning evaluation:

A. 150-way Object Classification on MVPNet

  • Training/testing split: $64,000$/$16,000$
  • Metrics: Overall Accuracy (OA), Mean Class Accuracy (mAcc)
  • Representative results:
| Method | OA (%) | mAcc (%) |
| --- | --- | --- |
| PointNet | 70.72 | 54.46 |
| PointNet++ | 79.15 | 58.24 |
| DGCNN | 86.49 | 63.98 |
| PointMLP | 88.89 | 73.64 |
| CurveNet | 88.88 | 75.37 |
| GDANet | 89.54 | 68.41 |
| PAConv | 83.35 | 59.13 |
| PCT (Transformer) | 91.49 | 75.41 |
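
The two metrics differ in how they weight classes, which helps explain why mAcc sits well below OA for every method listed. The sketch below uses the standard definitions (OA as the fraction of correct predictions, mAcc as the unweighted mean of per-class accuracy); the function name is hypothetical and classes absent from the ground truth are skipped by assumption.

```python
import numpy as np

def overall_and_mean_class_accuracy(y_true, y_pred, num_classes):
    """Return (OA, mAcc) as fractions in [0, 1]."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    correct = y_true == y_pred
    oa = correct.mean()                        # every sample counts equally

    per_class = []
    for c in range(num_classes):
        members = y_true == c
        if members.any():                      # skip classes with no test samples
            per_class.append(correct[members].mean())
    macc = np.mean(per_class)                  # every class counts equally
    return float(oa), float(macc)
```

With class sizes ranging from 100 to 1500, a model that does well on large classes and poorly on small ones scores a high OA but a lower mAcc.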

B. Transfer to ScanObjectNN

Pretraining on MVPNet yields measurable gains on external real-world benchmarks. For instance, PointNet++ trained from scratch on the ScanObjectNN PB_T50_RS split yields $76.50\%$ OA and $73.42\%$ mAcc, while pretraining on MVPNet increases these metrics to $78.76\%$ OA and $76.54\%$ mAcc, a gain of $2.26$ and $3.12$ percentage points respectively.

This substantiates MVPNet's utility for developing transferable 3D representations and real-scan robustness (Yu et al., 2023).

Applications:

  • Real-world 3D object classification and retrieval
  • Self-supervised pretraining for partial-scan completion or understanding
  • Single-view 3D reconstruction (learning geometric priors)
  • Robotics tasks such as grasping and pose estimation

Limitations:

  • Objects are captured with only $180^\circ$ multi-view coverage per instance, resulting in incomplete back surfaces.
  • The taxonomy is human-centric (everyday objects), with under-representation of natural or biological categories.
  • Point density and noise levels vary, reflecting MVS artifacts and real-world imaging imperfections.
  • No scene-level or contextual data; all samples are strictly object-centric (Yu et al., 2023).

Recommended Practices:

  • Consistently apply centering and unit-sphere normalization.
  • Employ aggressive rotation and jitter augmentation when training from scratch.
  • Use a small learning rate for pretrained layers in transfer learning.
  • Validate both on MVPNet's internal test split and on external real-scan datasets (e.g., ScanObjectNN) to assess generalization.
  • Fuse MVPNet pretraining with synthetic datasets such as ShapeNet and ModelNet to increase geometric coverage.

A plausible implication is that MVPNet, as a real-data benchmark, complements synthetic CAD datasets by enabling the development and evaluation of robust, transferable 3D recognition models.
