Mip-NeRF360 Dataset

Updated 10 June 2026
  • The Mip-NeRF 360 dataset is a curated collection of nine unbounded 360° scenes, covering indoor and outdoor environments with complex, multiscale geometry.
  • It provides challenging test cases captured under fixed photometric conditions, with COLMAP calibration and pose estimates, for evaluating novel view synthesis methods.
  • Its undistorted images serve as a benchmark for anti-aliased, unbounded neural radiance field algorithms, such as Mip-NeRF 360 with its scene contraction and disparity-based ray sampling.

The Mip-NeRF 360 dataset is a curated collection of nine real-world, unbounded 360° scenes developed to support and evaluate anti-aliased neural radiance field (NeRF) methods that synthesize novel views of large, cluttered environments containing both near and distant geometry. Designed alongside the Mip-NeRF 360 framework, the dataset addresses the limitations of benchmarks restricted to bounded scenes, providing challenging multiscale scenes for testing unbounded neural scene representations (Barron et al., 2021).

1. Dataset Composition and Characteristics

The Mip-NeRF 360 dataset consists solely of real-world image sequences, explicitly excluding synthetic data. The nine captured scenes are divided into five outdoor and four indoor environments:

  • Outdoor Scenes: “bicycle,” “flowers,” “garden,” “stump,” and “treehill.” Each features a prominent central object or region (e.g., a bicycle or tree stump) surrounded by complex, detailed backgrounds such as foliage, hills, or sky. These scenes exhibit fine, high-frequency geometry in both the near and far field, such as dense foliage and rock textures.
  • Indoor Scenes: “room,” “counter,” “kitchen,” and “bonsai.” These present tabletop or countertop arrangements with clutter (tools, plants, small appliances) in front of walls and surfaces that extend to the edges of the frame. High-frequency geometry, such as wire frames and leaf veins, appears under diffuse lighting, and both near and far surfaces are visible.

All scenes were chosen to present highly challenging conditions: geometry spanning a wide range of depths and spatial scales, thin structures, and the simultaneous presence of nearby and distant objects. All images were acquired under fixed lighting to minimize photometric variance (Barron et al., 2021).

2. Data Acquisition Protocol

Image sequences were captured with consistent photometric and intrinsic parameters:

  • Cameras:
    • Outdoor scenes: Captured with a Sony NEX-C3 mirrorless camera and 18–55 mm zoom lens set to 18 mm (35 mm-equiv. ≈27 mm); native image resolution ~4256×2848.
    • Indoor scenes: Acquired with a Fujifilm X100V and its fixed 23 mm lens (≈35 mm-equiv.) at 6240×4160 native resolution.
  • All camera settings—exposure, ISO, aperture, white balance, and focus—were locked at the first reference frame for photometric consistency. Outdoor captures were performed under uniformly overcast skies to avoid harsh shadows, while large, diffuse window illumination dominated indoor environments.
  • The camera was moved in a circular trajectory around a fixed object or central region, approximately level, covering 360° horizontally with minor elevation change (±10°), thus providing full azimuthal coverage and modest vertical parallax.
  • Each scene comprises 100–330 images, depending on the complexity and extent of the environment.
  • Final working resolutions, after undistortion and downsampling with ImageMagick, are either 1280×800 (~1.0 MP) or 1536×1024 (~1.6 MP), balancing detail against training cost (Barron et al., 2021).
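The focal-length and resolution figures above can be checked with a few lines of arithmetic. The sketch below assumes a crop factor of 1.5 for the APS-C sensors and a 36 mm full-frame sensor width; the helper names (`equiv_focal_mm`, `horizontal_fov_deg`, `scale_intrinsics`) are hypothetical. It also shows how pinhole intrinsics scale when an image is downsampled to a working resolution, ignoring half-pixel center conventions.

```python
import math

# Assumption: both camera bodies use APS-C sensors with a crop factor of ~1.5.
APS_C_CROP = 1.5
FULL_FRAME_WIDTH_MM = 36.0  # width of a 35 mm-format sensor


def equiv_focal_mm(focal_mm: float, crop: float = APS_C_CROP) -> float:
    """35 mm-equivalent focal length: physical focal length times crop factor."""
    return focal_mm * crop


def horizontal_fov_deg(equiv_focal: float) -> float:
    """Horizontal field of view implied by a 35 mm-equivalent focal length."""
    return math.degrees(2.0 * math.atan(FULL_FRAME_WIDTH_MM / (2.0 * equiv_focal)))


def scale_intrinsics(fx, fy, cx, cy, factor):
    """Pinhole intrinsics (pixels) after downsampling by `factor`.

    Simplification: ignores the half-pixel offset some conventions apply.
    """
    return fx / factor, fy / factor, cx / factor, cy / factor


f_eq = equiv_focal_mm(18.0)  # Sony zoom set to 18 mm
print(f_eq, round(horizontal_fov_deg(f_eq), 1))
# Megapixels of the two working resolutions.
print(round(1280 * 800 / 1e6, 2), round(1536 * 1024 / 1e6, 2))
```

Dividing every intrinsic by the same factor keeps the projection consistent with the resized image, which is why poses estimated at native resolution can be reused after downsampling.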

3. Data Preprocessing and Calibration

  • Lens Undistortion and Calibration: Structure-from-motion (SfM) was performed using COLMAP, employing the OpenCV “plumb-bob” radial distortion model to estimate focal length, principal point, and two radial coefficients; a shared model was used for all images in each scene.
  • Image Undistortion: COLMAP was used to export undistorted, cropped images, which were subsequently downsampled to final resolution.
  • Pose Estimation: Per-image camera extrinsics (R, t), together with the shared per-scene intrinsics, were estimated with COLMAP’s SfM pipeline.
  • Coordinate Normalization: Camera centers were recentered on their centroid, and principal component analysis (PCA) of the centered positions identified the vertical world Y axis as the smallest principal component, since cameras orbiting at roughly constant height vary least along the vertical. Camera coordinates were then uniformly scaled to fit within [−1, 1]³. This normalization keeps the central content in the unit ball, which contraction leaves unchanged, while all contracted Gaussians lie within a bounding sphere of radius 2 (Barron et al., 2021).
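A minimal NumPy sketch of the normalization step, assuming the camera centers are available as an (N, 3) array. The function name and return convention are illustrative, not the released preprocessing code. The sign of the recovered up axis is arbitrary here; a real pipeline might flip it using the cameras' own up vectors.

```python
import numpy as np


def normalize_cameras(centers: np.ndarray):
    """Recenter, align the smallest principal axis with Y, scale into [-1, 1]^3.

    centers: (N, 3) camera centers in the SfM world frame.
    Returns (normalized, R, t, scale) with normalized = scale * R @ (c - t).
    """
    t = centers.mean(axis=0)
    centered = centers - t
    cov = centered.T @ centered / len(centers)
    # eigh returns eigenvalues in ascending order, eigenvectors as columns.
    _, eigvecs = np.linalg.eigh(cov)
    up = eigvecs[:, 0]  # least variance: normal to the orbit ring
    # Rows of R are the new X (largest), Y (up), and Z (middle) axes.
    R = np.stack([eigvecs[:, 2], up, eigvecs[:, 1]])
    if np.linalg.det(R) < 0:  # enforce a right-handed frame
        R[2] = -R[2]
    rotated = centered @ R.T
    # Uniform scale so the farthest coordinate lands on the cube boundary.
    scale = 1.0 / np.abs(rotated).max()
    return rotated * scale, R, t, scale
```

Using a single uniform scale, rather than per-axis scaling, preserves angles and relative distances, so the normalized scene is only a similarity transform of the SfM reconstruction.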

4. Dataset Splits and Usage Protocol

  • Train/Test Split: For each scene, every 8th image in sequence (≈12.5%) was held out as the test set, giving even coverage of the 360° sweep. The remaining ≈87.5% of images form the training set.
  • No validation set was defined; hyperparameters were fixed on one scene, then held constant for all nine.
  • The evaluation protocol employs a true novel-view approach: all test images are considered unseen views. No ground-truth geometry is supplied; only RGB images and estimated camera poses are provided (Barron et al., 2021).
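The hold-out rule can be written in a few lines. This sketch assumes the common convention of taking indices 0, 8, 16, and so on as test views; the starting offset and the function name `split_indices` are assumptions.

```python
def split_indices(num_images: int, hold_every: int = 8):
    """Return (train, test) index lists; every `hold_every`-th image is test.

    Assumption: test indices start at 0 (0, 8, 16, ...).
    """
    test = list(range(0, num_images, hold_every))
    train = [i for i in range(num_images) if i % hold_every != 0]
    return train, test


train, test = split_indices(222)
print(len(train), len(test))
```

Because the held-out views are evenly spaced along the capture trajectory, every part of the orbit contributes test images, and each test view has training neighbors on either side.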

5. Scene Parameterization and Ray Sampling

Mip-NeRF 360 introduces several techniques for unbounded scenes, which are applied to all dataset content:

  • Scene Contraction: A smooth, non-linear mapping contracts unbounded 3D points into a ball of radius 2, leaving the unit ball unchanged:

$$\operatorname{contract}(\mathbf x) = \begin{cases} \mathbf x & \|\mathbf x\| \le 1 \\ \left(2 - \dfrac{1}{\|\mathbf x\|}\right) \dfrac{\mathbf x}{\|\mathbf x\|} & \|\mathbf x\| > 1 \end{cases}$$

For a Gaussian with mean μ and covariance Σ, the contraction f = contract is approximated by a first-order Taylor expansion around μ, where J_f(μ) is the Jacobian of f evaluated at μ:

$$(\mu, \Sigma) \;\mapsto\; \left(f(\mu),\; J_f(\mu)\, \Sigma\, J_f(\mu)^\top\right)$$
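A NumPy sketch of the contraction and its linearized application to a Gaussian, using a hand-derived Jacobian: outside the unit ball, directions tangential to the sphere scale by (2‖x‖ − 1)/‖x‖² and the radial direction by 1/‖x‖². The function names are hypothetical, and the reference implementation may obtain the Jacobian through automatic differentiation instead.

```python
import numpy as np


def contract(x: np.ndarray) -> np.ndarray:
    """Apply contract() to a single 3-vector."""
    r = np.linalg.norm(x)
    if r <= 1.0:
        return x.copy()
    return (2.0 - 1.0 / r) * (x / r)


def contract_jacobian(x: np.ndarray) -> np.ndarray:
    """Analytic Jacobian of contract() at x (identity inside the unit ball)."""
    r = np.linalg.norm(x)
    if r <= 1.0:
        return np.eye(3)
    # (2r - 1)/r^2 on every direction, plus a correction along x x^T
    # so that the radial direction ends up scaled by 1/r^2.
    return ((2.0 * r - 1.0) / r**2) * np.eye(3) + (2.0 * (1.0 - r) / r**4) * np.outer(x, x)


def contract_gaussian(mean: np.ndarray, cov: np.ndarray):
    """First-order push-forward of a Gaussian (mean, cov) through contract()."""
    J = contract_jacobian(mean)
    return contract(mean), J @ cov @ J.T
```

The radial factor 1/‖x‖² shrinks distant Gaussians along the viewing direction much faster than tangentially, which is what lets the bounded radius-2 ball represent content out to infinity.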

  • Disparity-Parameterized Ray Sampling: Ray distances t ∈ [t_n, t_f], between the near and far bounds, are mapped to a normalized coordinate s ∈ [0, 1] using g(t) = 1/t, and samples are drawn uniformly in s:

$$s = \frac{g(t) - g(t_n)}{g(t_f) - g(t_n)}$$

with inverse:

$$t = g^{-1}\big((1 - s)\, g(t_n) + s\, g(t_f)\big)$$

Thus, samples are uniformly spaced in disparity (inverse depth): they are dense near the camera and grow sparser toward distant content.
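The forward and inverse mappings translate directly into code. This is a sketch assuming scalar or NumPy-array inputs and finite bounds 0 < t_n < t_f; the names `t_to_s` and `s_to_t` are illustrative.

```python
import numpy as np


def g(t):
    """Disparity transform g(t) = 1/t."""
    return 1.0 / t


def g_inv(y):
    """Inverse of g (g is its own inverse)."""
    return 1.0 / y


def t_to_s(t, t_near, t_far):
    """Metric ray distance -> normalized coordinate s in [0, 1]."""
    return (g(t) - g(t_near)) / (g(t_far) - g(t_near))


def s_to_t(s, t_near, t_far):
    """Normalized coordinate s -> metric ray distance t."""
    return g_inv((1.0 - s) * g(t_near) + s * g(t_far))


# Five samples uniformly spaced in s between t_n = 1 and t_f = 100.
s = np.linspace(0.0, 1.0, 5)
t = s_to_t(s, t_near=1.0, t_far=100.0)
print(t)
```

The printed distances bunch up near t_n and stretch out toward t_f, while their reciprocals 1/t are evenly spaced.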

  • Distortion-Based Regularizer: To suppress volumetric artifacts (floaters) and background collapse, the following regularizer is applied to each ray, where s holds the normalized interval endpoints and w the corresponding rendering weights:

$$L(\mathbf s, \mathbf w) = \sum_{i,j} w_i\, w_j \left| \frac{s_i + s_{i+1}}{2} - \frac{s_j + s_{j+1}}{2} \right| + \frac{1}{3} \sum_i w_i^2\, (s_{i+1} - s_i)$$

For N intervals per ray (typically 32–64), evaluating the pairwise term directly costs O(N²) time, which remains tractable (Barron et al., 2021).
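A direct, quadratic-time NumPy sketch of the regularizer for a single ray, assuming `s` holds N + 1 sorted endpoints and `w` holds N weights. Batching across rays and any faster formulation are left out, and the function name is hypothetical.

```python
import numpy as np


def distortion_loss(s: np.ndarray, w: np.ndarray) -> float:
    """Distortion regularizer for one ray, computed with O(N^2) pairs.

    s: (N+1,) sorted interval endpoints in normalized distance.
    w: (N,) volume-rendering weights of the N intervals.
    """
    mid = 0.5 * (s[:-1] + s[1:])
    # Pairwise term: sum_{i,j} w_i w_j |mid_i - mid_j| via an N x N broadcast.
    pairwise = np.sum(w[:, None] * w[None, :] * np.abs(mid[:, None] - mid[None, :]))
    # Self term: penalizes weight spread across the width of each interval.
    self_term = np.sum(w**2 * np.diff(s)) / 3.0
    return float(pairwise + self_term)


s = np.array([0.0, 0.1, 1.0])
print(distortion_loss(s, np.array([1.0, 0.0])))  # weight on one short interval
print(distortion_loss(s, np.array([0.5, 0.5])))  # weight split across the ray
```

The first call yields the smaller value: concentrating weight in a single narrow interval is cheap, while spreading it along the ray is penalized, which is how the loss discourages floaters.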

6. Dataset Format, Distribution, and Statistics

  • Structure: Each scene’s directory includes:
    • images/ — Undistorted, downsampled images in PNG or JPG format (either 1280×800 or 1536×1024).
    • poses.txt — Per-image COLMAP poses (R, t) and intrinsics.
    • sparse/ — Optional COLMAP sparse point cloud.
    • train.txt, test.txt — Lists of images for training and test splits.
    • Optionally, COLMAP text-format metadata (images.txt, cameras.txt) and a compact poses.txt with one row of 18 values per image (fx, fy, cx, cy, k1, k2, r11…r33, tx, ty, tz).
  • Distribution: All nine scenes, along with preprocessing, training, and evaluation code, are publicly provided in the project repository. For convenience, each scene is also distributed as a single NPZ archive bundling images, poses, bounding data, and split lists (Barron et al., 2021).
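Assuming each row of the compact poses.txt holds the 18 fields listed above (fx through tz), whitespace-separated, with the rotation in row-major order, a row can be parsed as below. The delimiter, the row-major ordering, and the function name `parse_pose_row` are assumptions, not a documented format. The camera center follows COLMAP's world-to-camera convention, x_cam = R·x_world + t, so C = −Rᵀt.

```python
import numpy as np


def parse_pose_row(line: str):
    """Parse one hypothetical poses.txt row into intrinsics, distortion, and pose.

    Assumed field order: fx fy cx cy k1 k2 r11..r33 (row-major) tx ty tz.
    """
    vals = np.array([float(v) for v in line.split()])
    if vals.size != 18:
        raise ValueError(f"expected 18 values, got {vals.size}")
    fx, fy, cx, cy, k1, k2 = vals[:6]
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])
    R = vals[6:15].reshape(3, 3)   # world-to-camera rotation
    t = vals[15:18]                # world-to-camera translation
    center = -R.T @ t              # camera center in world coordinates
    return K, (float(k1), float(k2)), R, t, center


K, dist, R, t, C = parse_pose_row("1000 1000 640 400 0.01 -0.002 1 0 0 0 1 0 0 0 1 0 0 -5")
print(C)
```

Recovering the camera centers this way provides the input needed for the recentering and PCA normalization described in the preprocessing section.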

Summary statistics per scene appear below:

Scene | Total images | Resolution (px) | Bounding radius (normalized units) | Training frames | Test frames
bicycle | 222 | 1536×1024 | ≈1.00 | 194 | 28
flowers | 147 | 1280×800 | ≈0.95 | 128 | 19
garden | 330 | 1536×1024 | ≈1.05 | 288 | 42
stump | 303 | 1536×1024 | ≈1.00 | 264 | 39
treehill | 115 | 1280×800 | ≈0.90 | 100 | 15
room | 184 | 1280×800 | ≈0.85 | 160 | 24
counter | 131 | 1280×800 | ≈0.88 | 114 | 17
kitchen | 145 | 1280×800 | ≈0.92 | 127 | 18
bonsai | 121 | 1280×800 | ≈0.82 | 106 | 15

7. Significance and Applications

The Mip-NeRF 360 dataset provides a benchmark for advancing unbounded, anti-aliased neural volume rendering, specifically targeting real-world scenes with complex, high-frequency, multiscale geometry and both near-field and far-field elements. Its design, spanning clutter, thin structures, and wide ranges of depth and scale under controlled lighting, enables rigorous evaluation of NeRF-based and alternative novel-view synthesis algorithms. Together with the non-linear parameterization, disparity-based ray sampling, and regularization strategies detailed above, the dataset offers a reproducible standard for replicating and extending the results demonstrated in the Mip-NeRF 360 framework (Barron et al., 2021).

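References
Barron, J. T., Mildenhall, B., Verbin, D., Srinivasan, P. P., and Hedman, P. Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields. CVPR 2022 (arXiv preprint, 2021).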
