Mip-NeRF360 Dataset
- Mip-NeRF 360 dataset is a curated collection of nine unbounded 360° scenes featuring both indoor and outdoor environments with complex, multiscale geometry.
- It provides rigorous test cases with fixed photometric conditions, comprehensive calibration, and precise pose estimation to support novel view synthesis methods.
- The dataset employs advanced undistortion, scene contraction, and disparity-based ray sampling to evaluate anti-aliased, unbounded neural radiance field algorithms.
The Mip-NeRF 360 dataset is a curated collection of nine real-world, unbounded 360° scenes developed to support and evaluate anti-aliased neural radiance field (NeRF) methods capable of synthesizing novel views in large, cluttered environments with both near and distant geometry. Designed alongside the Mip-NeRF 360 framework, this dataset uniquely addresses the shortcomings of bounded NeRF benchmarks, providing challenging multimodal scenes to test the efficacy of unbounded neural scene representations (Barron et al., 2021).
1. Dataset Composition and Characteristics
The Mip-NeRF 360 dataset consists solely of real-world image sequences, explicitly excluding synthetic data. The nine captured scenes are divided into five outdoor and four indoor environments:
- Outdoor Scenes: “bicycle,” “flowers,” “garden,” “stump,” and “treehill.” Each features a prominent central object region (e.g., a bicycle or tree stump) surrounded by complex, detailed backgrounds such as foliage, hills, or sky. These scenes exhibit fine, high-frequency geometry both in the near and far field, including features such as leaf meshes and rock textures.
- Indoor Scenes: “room,” “counter,” “kitchen,” and “bonsai.” These present tabletop or countertop arrangements with clutter (tools, plants, small appliances) in front of surfaces that span the scene edges. High-frequency geometry, such as wire frames and leaf veins, is present under diffuse lighting, with both near and far surfaces included.
All scenes were constructed to present highly challenging conditions: geometry spanning a wide range of spatial scales, thin structures, and the simultaneous presence of nearby and distant objects. All images were acquired under fixed lighting to minimize photometric variance (Barron et al., 2021).
2. Data Acquisition Protocol
Image sequences were captured with consistent photometric and intrinsic parameters:
- Cameras:
  - Outdoor scenes: Captured with a Sony NEX-C3 mirrorless camera and 18–55 mm zoom lens set to 18 mm (35 mm-equiv. ≈27 mm); native image resolution ~4256×2848.
  - Indoor scenes: Acquired with a Fujifilm X100V, fixed 22 mm lens (≈35 mm-equiv.) at 6240×4160 native resolution.
- All camera settings—exposure, ISO, aperture, white balance, and focus—were locked at the first reference frame for photometric consistency. Outdoor captures were performed under uniformly overcast skies to avoid harsh shadows, while large, diffuse window illumination dominated indoor environments.
- The camera was moved in a circular trajectory around a fixed object or central region, approximately level, covering 360° horizontally with minor elevation change (±10°), thus providing full azimuthal coverage and modest vertical parallax.
- Each scene comprises 100–330 images, depending on the complexity and extent of the environment.
- Final working resolutions, following undistortion and downsampling with ImageMagick, are either 1280×800 (~1.0 MP) or 1536×1024 (~1.6 MP) depending on the scene, balancing detail with training feasibility (Barron et al., 2021).
3. Data Preprocessing and Calibration
- Lens Undistortion and Calibration: Structure-from-motion (SfM) was performed using COLMAP, employing the OpenCV “plumb-bob” radial distortion model to estimate focal length, principal point, and two radial coefficients; a shared model was used for all images in each scene.
- Image Undistortion: COLMAP was used to export undistorted, cropped images, which were subsequently downsampled to final resolution.
- Pose Estimation: Camera intrinsics and extrinsics (R, t) were estimated per image via COLMAP’s SfM pipeline.
- Coordinate Normalization: Camera centers were recentered to their centroid, and principal component analysis (PCA) of the centered positions identified the world 'vertical' Y axis as the direction of smallest variance (the smallest principal component). Camera coordinates were then uniformly scaled to fit within [−1,1]³. After this normalization the contracted parameterization covers all scene content: contracted Gaussians lie inside a bounding sphere of radius 2, and content inside the unit sphere is unaffected by contraction (Barron et al., 2021).
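The normalization step can be illustrated with a minimal NumPy sketch. It is not the released preprocessing code: the function name `normalize_camera_centers`, the choice of which in-plane principal axes become X and Z, and the handedness fix are assumptions made for illustration. The sign of the recovered vertical axis is ambiguous under PCA and would need a separate convention.

```python
import numpy as np

def normalize_camera_centers(centers):
    """Recenter, align the least-variance axis with world Y, scale into [-1, 1]^3.

    centers: (N, 3) array of camera centers in the SfM world frame.
    Returns the normalized centers and the rotation, centroid, and scale applied.
    """
    centers = np.asarray(centers, dtype=np.float64)
    centroid = centers.mean(axis=0)
    centered = centers - centroid

    # PCA via the 3x3 covariance matrix; eigh returns eigenvalues in ascending order,
    # so column 0 is the direction of smallest variance (taken here as "vertical").
    _, eigvecs = np.linalg.eigh(np.cov(centered.T))
    up, x_axis, z_axis = eigvecs[:, 0], eigvecs[:, 2], eigvecs[:, 1]

    # Rows of `rotation` are the new X, Y, Z axes expressed in the old frame.
    rotation = np.stack([x_axis, up, z_axis])
    if np.linalg.det(rotation) < 0:
        rotation[2] *= -1.0  # flip one axis to keep a right-handed frame

    rotated = centered @ rotation.T
    scale = 1.0 / np.abs(rotated).max()  # largest coordinate lands exactly on +/-1
    return rotated * scale, rotation, centroid, scale
```

The same rotation, centroid, and scale would also have to be applied to the camera extrinsics so that poses and scene content stay consistent.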
4. Dataset Splits and Usage Protocol
- Train/Test Split: For each scene, 1 in every 8 images (≈12.5%) was uniformly sampled as a test set, ensuring even coverage across the 360° sweep. The remaining ≈87.5% images constituted the training set.
- No validation set was defined; hyperparameters were fixed on one scene, then held constant for all nine.
- The evaluation protocol employs a true novel-view approach: all test images are considered unseen views. No ground-truth geometry is supplied; only RGB images and estimated camera poses are provided (Barron et al., 2021).
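The 1-in-8 split described above can be sketched in a few lines of Python. The helper name `split_every_nth` is hypothetical, and two details are assumptions rather than facts from the text: that images are ordered by filename, and that the held-out images start at index 0.

```python
def split_every_nth(image_names, n=8):
    """Hold out every n-th image (in sorted order) as a test view."""
    names = sorted(image_names)
    test = names[::n]  # indices 0, n, 2n, ...
    train = [name for i, name in enumerate(names) if i % n != 0]
    return train, test
```

Because the held-out views are spaced evenly through the capture order, they are spread around the full 360° sweep rather than clustered in one arc.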
5. Scene Parameterization and Ray Sampling
Mip-NeRF 360 introduces a series of domain-specific parameterizations for unbounded scenes—applied here to all dataset content:
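- Scene Contraction: A smooth, non-linear mapping is used for 3D point contraction:
$$\operatorname{contract}(\mathbf{x}) = \begin{cases} \mathbf{x}, & \|\mathbf{x}\| \le 1 \\ \left(2 - \dfrac{1}{\|\mathbf{x}\|}\right)\dfrac{\mathbf{x}}{\|\mathbf{x}\|}, & \|\mathbf{x}\| > 1 \end{cases}$$
For a Gaussian with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$, the contraction is approximated using a first-order Taylor expansion:
$$\operatorname{contract}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \approx \left(\operatorname{contract}(\boldsymbol{\mu}),\ \mathbf{J}(\boldsymbol{\mu})\,\boldsymbol{\Sigma}\,\mathbf{J}(\boldsymbol{\mu})^{\top}\right)$$
where $\mathbf{J}(\boldsymbol{\mu})$ is the Jacobian of the contraction evaluated at $\boldsymbol{\mu}$.
- Disparity-Parameterized Ray Sampling: Rays are sampled uniformly in a normalized disparity coordinate $s \in [0, 1]$ by defining $g(t) = 1/t$ and, for near and far distances $t_n$ and $t_f$:
$$s = \frac{g(t) - g(t_n)}{g(t_f) - g(t_n)}$$
with inverse:
$$t = g^{-1}\big(s\,g(t_f) + (1 - s)\,g(t_n)\big)$$
Thus, samples are uniformly distributed in disparity (inverse depth).
- Distortion-Based Regularizer: To penalize volumetric artifacts (floaters) and background collapse, the following regularizer is applied to the ray intervals, with endpoints $s_i$ and weights $w_i$:
$$\mathcal{L}_{\text{dist}}(\mathbf{s}, \mathbf{w}) = \sum_{i,j} w_i w_j \left| \frac{s_i + s_{i+1}}{2} - \frac{s_j + s_{j+1}}{2} \right| + \frac{1}{3} \sum_{i} w_i^2 \,(s_{i+1} - s_i)$$
For $N$ intervals per ray (typically 32–64), this cost is computed in $\mathcal{O}(N^2)$ time but remains tractable (Barron et al., 2021).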
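The three components above can be sketched in NumPy under stated assumptions. The contraction is assumed to be the identity inside the unit ball and to map x to (2 − 1/‖x‖)(x/‖x‖) outside it. The disparity mapping is assumed to use g(t) = 1/t. The regularizer is assumed to be the pairwise distance between interval midpoints, weighted by w_i·w_j, plus one third of the sum of w_i² times the interval width. Function names are illustrative, and the Jacobian-based contraction of Gaussian covariances is omitted.

```python
import numpy as np

def contract(x, eps=1e-12):
    """Identity inside the unit ball; points outside are squashed into the radius-2 ball."""
    x = np.asarray(x, dtype=np.float64)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    safe = np.maximum(norm, eps)  # avoid division by zero at the origin
    squashed = (2.0 - 1.0 / safe) * (x / safe)
    return np.where(norm <= 1.0, x, squashed)

def disparity_samples(t_near, t_far, num_samples):
    """Distances whose inverse depths are evenly spaced between 1/t_near and 1/t_far."""
    s = np.linspace(0.0, 1.0, num_samples)          # normalized coordinate in [0, 1]
    return 1.0 / (s / t_far + (1.0 - s) / t_near)   # t = g^{-1}(s g(t_f) + (1 - s) g(t_n))

def distortion_loss(s, w):
    """Distortion regularizer for a single ray.

    s: (N + 1,) sorted normalized interval endpoints; w: (N,) per-interval weights.
    """
    s = np.asarray(s, dtype=np.float64)
    w = np.asarray(w, dtype=np.float64)
    mid = 0.5 * (s[:-1] + s[1:])
    # O(N^2) pairwise term: weighted distances between interval midpoints.
    pairwise = np.sum(w[:, None] * w[None, :] * np.abs(mid[:, None] - mid[None, :]))
    # Within-interval term: penalizes weight spread over wide intervals.
    within = np.sum(w ** 2 * (s[1:] - s[:-1])) / 3.0
    return pairwise + within
```

With this form, concentrating all weight in one narrow interval gives a lower loss than spreading it along the ray, which is the behavior the regularizer is meant to encourage.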
6. Dataset Format, Distribution, and Statistics
- Structure: Each scene’s directory includes:
  - `images/`: Undistorted, downsampled images in PNG or JPG format (either 1280×800 or 1536×1024).
  - `poses.txt`: Per-image COLMAP poses (R, t) and intrinsics.
  - `sparse/`: Optional COLMAP sparse point cloud.
  - `train.txt`, `test.txt`: Lists of images for training and test splits.
  - Optionally, COLMAP text-format metadata (`images.txt`, `cameras.txt`) and a compact 9×17 `poses.txt` (fx, fy, cx, cy, k1, k2, r11…r33, tx, ty, tz).
- Distribution: All nine scenes, along with preprocessing, training, and evaluation code, are publicly provided in the project repository. For convenience, each scene is also distributed as a single NPZ archive bundling images, poses, bounding data, and split lists (Barron et al., 2021).
Summary statistics per scene appear below:
| Scene | Total Images | Resolution | Bounding Radius | Training Frames | Test Frames |
|---|---|---|---|---|---|
| bicycle | 222 | 1536×1024 | ≈1.00 | 194 | 28 |
| flowers | 147 | 1280×800 | ≈0.95 | 128 | 19 |
| garden | 330 | 1536×1024 | ≈1.05 | 288 | 42 |
| stump | 303 | 1536×1024 | ≈1.00 | 264 | 39 |
| treehill | 115 | 1280×800 | ≈0.90 | 100 | 15 |
| room | 184 | 1280×800 | ≈0.85 | 160 | 24 |
| counter | 131 | 1280×800 | ≈0.88 | 114 | 17 |
| kitchen | 145 | 1280×800 | ≈0.92 | 127 | 18 |
| bonsai | 121 | 1280×800 | ≈0.82 | 106 | 15 |
7. Significance and Applications
The Mip-NeRF 360 dataset provides a benchmark for advancing unbounded, anti-aliased neural volume rendering, specifically targeting scenarios with complex, high-frequency, multiscale geometry, and both near-field and far-field elements in real-world conditions. Its design—spanning challenging lighting, clutter, and scale—enables rigorous evaluation of NeRF-based and alternative novel-view synthesis algorithms. The dataset, in conjunction with the non-linear parameterization, disparity-based ray sampling, and regularization strategies detailed above, offers a reproducible standard for replicating and extending the results demonstrated in the Mip-NeRF 360 framework (Barron et al., 2021).