
Synthetic Dynamic Multiview Dataset

Updated 18 December 2025
  • The Synthetic Dynamic Multiview (SynDM) dataset is a synthetic resource featuring photorealistic simulation, dynamic object motion, and synchronized multiview capture for benchmarking advanced perception tasks.
  • It relies on data generation pipelines built on platforms such as GTA V, CARLA, and Unity to create diverse, annotated data including RGB images, masks, depth maps, and keypoints.
  • The dataset supports rigorous evaluation protocols using metrics such as PSNR, LPIPS, and FID to quantify performance in novel view synthesis, dynamic reconstruction, and multi-modal fusion.

The Synthetic Dynamic Multiview (SynDM) dataset designates a class of synthetic resources whose primary attributes are dynamic scene content and explicit multiview capture. SynDM datasets combine photorealistic simulation, controlled dynamic object trajectories, multi-camera rigs, and extensive annotation to support benchmarking and model development in tasks including novel view synthesis, dynamic reconstruction, multi-view segmentation, and cross-domain adaptation. Pioneering instances such as the SynDM collection of "Broadening View Synthesis of Dynamic Scenes from Constrained Monocular Videos" (Jiang et al., 16 Dec 2025), the SEED4D generator and dataset (Kästingschäfer et al., 1 Dec 2024), and the SynPlay human dataset (Yim et al., 21 Aug 2024) define critical technical standards and evaluation protocols now central to research in 4D perception, NeRFs, and dynamic scene understanding.

1. Foundational Principles and Dataset Taxonomy

SynDM resources are distinguished by several defining properties:

  • Synthetic Content: Data generated in physically realistic simulation environments (e.g., GTA V, CARLA, Unity).
  • Dynamic Scenes: Foreground objects (humans, vehicles, animals) exhibit temporally coherent motion driven by simulation engines or motion capture (MoCap).
  • Multiview Capture: Synchronous acquisition from N > 2 viewpoints per time frame, generally covering wide-baseline rotations in azimuth and elevation as well as large spatial offsets between cameras.
  • Explicit Supervision: For each frame, side-view or exocentric ground-truth images (RGB, mask, depth) are provided—key for evaluation of novel-view rendering and multi-modal learning.

Typical SynDM datasets are either designed for reconstructing held-out views (novel view synthesis), for evaluating instance-level segmentation across dynamic scenes, or for supporting multi-modal scene understanding (depth, flow, semantic instance, LiDAR).
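
Concretely, the per-frame, per-view supervision described above can be thought of as a simple record. The following dataclass is an illustrative sketch; the field names and shapes are assumptions rather than a published schema:

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class MultiviewFrame:
    """One synchronized time step from a hypothetical SynDM-style dataset."""
    timestamp: float                 # simulation time in seconds
    rgb: np.ndarray                  # (V, H, W, 3) uint8, one image per view
    instance_mask: np.ndarray        # (V, H, W) int32 instance IDs
    depth: Optional[np.ndarray]      # (V, H, W) float32 metric depth, if provided
    intrinsics: np.ndarray           # (V, 3, 3) pinhole camera matrices
    extrinsics: np.ndarray           # (V, 4, 4) world-to-camera transforms

    @property
    def num_views(self) -> int:
        return self.rgb.shape[0]
```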

2. Data Generation Pipelines and Scene Composition

SynDM datasets utilize advanced simulation pipelines, often extending existing game engines for full multiview support and dynamic object scripting.

  • GTA V–based Pipeline (SynDM/ExpanDyNeRF) (Jiang et al., 16 Dec 2025):
    • Modified GTAV-TeFS plugin enables arbitrary camera count per physical frame (swap latency: ~0.2 ms/view).
    • 9 diverse scenes, spanning human, animal, and vehicle object classes.
    • AI-driven actor trajectories (path-following, waypoint navigation), with realistic occlusion and urban/rural backgrounds.
  • CARLA-based Pipeline (SEED4D) (Kästingschäfer et al., 1 Dec 2024):
    • Python/C++ orchestration for scenario definition via JSON/YAML configs; supports dynamic agent spawning, weather presets, and traffic modulation.
    • Modular sensor suite: up to 7 ego-vehicle cameras, 100 exocentric views for static scenes, 10 exocentric for dynamic.
    • Uniform camera distribution on half-spheres via a spherical Fibonacci lattice (see the sketch after this list), explicit world-to-camera extrinsics, and NeRFStudio compatibility.
  • Unity-based Pipeline (SynPlay) (Yim et al., 21 Aug 2024):
    • Rule-guided scenario graphs designed around six Korean folk games, imposing coarse rules on agent behavior, diversified by sampling 257 unique MoCap-augmented animation clips.
    • Seven-view rig per scenario: UAVs (rotary airborne at multiple altitudes), static CCTVs, and mobile ground vehicles.
    • Realistic motion blending and animation layering amplify the combinatorial diversity of motion states per frame.
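
The half-sphere camera layout mentioned for SEED4D can be reproduced with a spherical Fibonacci lattice restricted to the upper hemisphere. The snippet below is an illustrative re-implementation of that sampling scheme, not the SEED4D generator's own code:

```python
import numpy as np


def fibonacci_half_sphere(n_cameras: int, radius: float = 1.0) -> np.ndarray:
    """Return (n_cameras, 3) points spread roughly uniformly over the upper half-sphere.

    Successive points advance by the golden angle in azimuth while z descends
    uniformly from the pole (z = 1) to the equator (z = 0).
    """
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))     # ~2.39996 rad
    i = np.arange(n_cameras)
    z = 1.0 - i / max(n_cameras - 1, 1)             # pole -> equator
    r = np.sqrt(np.clip(1.0 - z * z, 0.0, 1.0))     # radius of horizontal circle at height z
    theta = golden_angle * i                        # azimuth angle per camera
    x, y = r * np.cos(theta), r * np.sin(theta)
    return radius * np.stack([x, y, z], axis=-1)


# Example: 100 exocentric camera positions on a hypothetical 50 m half-sphere.
camera_positions = fibonacci_half_sphere(100, radius=50.0)
```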

3. Multiview Capture Architectures and Annotation Formats

Capture architectures in SynDM datasets are built to maximize spatial and angular variability for each dynamic event:

| Dataset | Camera Views/Frame | View Geometry | Dynamic Objects | Key Modalities |
|---|---|---|---|---|
| SynDM | 22 | Sphere (azimuth –45° to +45°; elevation –45°, 0°, +45°) | Human, Animal, Vehicle | RGB, mask, depth |
| SEED4D-3D | 107 | 7 ego + 100 exocentric cameras, half-sphere distribution | Vehicle/road agents | RGB, depth, flow, LiDAR |
| SynPlay | 7 | 3 UAV (airborne), 3 CCTV (static), 1 UGV (mobile) | Human | RGB, mask, keypoints |

Annotations universally contain:

  • RGB images (PNG, sRGB)
  • Per-pixel instance segmentation masks
  • Camera intrinsic and extrinsic matrices in OpenGL or Blender convention (see the projection sketch after this list)
  • Depth maps (optional per dataset)
  • 2D keypoints (COCO format in SynPlay), semantic/instance masks, point clouds (SEED4D), and metadata (parsed in JSON: scene, agent IDs, sensor info).
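
To illustrate how these annotations are typically consumed, the sketch below loads per-view camera parameters and projects a world point into pixel coordinates. The metadata layout and field names are hypothetical, and the sign of the camera-frame z axis depends on whether OpenGL/Blender or OpenCV conventions are used:

```python
import json

import numpy as np


def project_point(point_world: np.ndarray, K: np.ndarray, T_w2c: np.ndarray) -> np.ndarray:
    """Project a 3D world point into pixel coordinates for one pinhole camera.

    K     : (3, 3) intrinsic matrix
    T_w2c : (4, 4) world-to-camera extrinsic matrix
    Note: OpenGL/Blender cameras look down -z; flip the z sign for OpenCV-style projection.
    """
    p_h = np.append(point_world, 1.0)       # homogeneous world coordinates
    p_cam = (T_w2c @ p_h)[:3]               # camera-frame coordinates
    uv = K @ p_cam
    return uv[:2] / uv[2]                   # perspective division


# Hypothetical metadata layout: one entry per camera in metadata.json (assumed schema).
with open("scene_001/metadata.json") as f:
    meta = json.load(f)
cam = meta["cameras"][0]
K = np.asarray(cam["intrinsics"], dtype=np.float64).reshape(3, 3)
T = np.asarray(cam["extrinsics"], dtype=np.float64).reshape(4, 4)
print(project_point(np.array([2.0, 0.5, 10.0]), K, T))
```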

4. Data Organization, Benchmark Protocols, and Evaluation Metrics

SynDM datasets have standardized splits and evaluation methods:

  • Directory Structure:
    • Scene-major subfolders, with nested frames, camera views, RGB/mask/depth files, and comprehensive metadata.json.
    • Dedicated loader scripts (e.g., syn_dm_loader.py for SynDM) provide DataLoader integration for mainstream frameworks (a minimal loader sketch appears after this list).
  • Benchmark Protocols:
    • Train on monocular primary sequences (reference view), test on held-out side views for novel view synthesis.
    • For dynamic multi-agent scenes (SynPlay), evaluation on multiview human detection, segmentation, and pose estimation.
    • SEED4D defines static and dynamic (temporal) splits to support both single-frame and trajectory-based tasks.
  • Evaluation Metrics:
    • PSNR: $\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right)$ (a reference implementation follows the benchmark table below)
    • LPIPS: perceptual similarity measure between rendered and ground-truth images
    • FID: Fréchet inception distance for distribution-level image realism
    • Mean IoU for instance/semantic segmentation
    • Depth RMSE, SSIM, 3D-bbox IoU, and Chamfer Distance for geometry tasks
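
A minimal PyTorch loader in the spirit of the scripts above might look as follows. The directory layout and file names are assumptions for illustration, not the interface of the published syn_dm_loader.py:

```python
import json
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset


class SynDMStyleDataset(Dataset):
    """Loads (RGB, mask) pairs from an assumed SynDM-like layout:
        root/<scene>/<frame>/<view>/rgb.png
        root/<scene>/<frame>/<view>/mask.png
        root/<scene>/metadata.json
    """

    def __init__(self, root: str):
        self.samples = sorted(Path(root).glob("*/*/*/rgb.png"))

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> dict:
        rgb_path = self.samples[idx]
        rgb = np.array(Image.open(rgb_path).convert("RGB"))
        mask = np.array(Image.open(rgb_path.with_name("mask.png")))
        meta = json.loads((rgb_path.parents[2] / "metadata.json").read_text())
        return {
            "rgb": torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0,  # CHW in [0, 1]
            "mask": torch.from_numpy(mask).long(),
            "scene": meta.get("scene", rgb_path.parents[2].name),
        }


loader = DataLoader(SynDMStyleDataset("data/syn_dm"), batch_size=4, num_workers=2)
```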

Table: SynDM Example Benchmark Scores (from Jiang et al., 16 Dec 2025):

| Category | Metric | SynDM (Ours) | Best Prior |
|---|---|---|---|
| Human | PSNR | 21.71 | 21.27 |
| Human | LPIPS | 0.182 | 0.305 |
| Animal | FID | 155.8 | 262.0 |
| Vehicle | PSNR | 17.21 | 17.07 |
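
Scores such as the PSNR values above follow directly from the formula in the metrics list; below is a minimal, framework-agnostic implementation, assuming images normalized to [0, 1] so that MAX_I = 1:

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """PSNR = 10 * log10(MAX_I^2 / MSE), in dB, for float images in [0, max_val]."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                 # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)


# Example: compare a rendered novel view against its ground-truth side view.
rendered = np.random.rand(256, 256, 3)
ground_truth = np.clip(rendered + 0.01 * np.random.randn(256, 256, 3), 0.0, 1.0)
print(f"PSNR: {psnr(rendered, ground_truth):.2f} dB")
```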

5. Use Cases, Integration Strategies, and Limitations

SynDM datasets are integral to benchmarking and training models for novel view synthesis, dynamic scene reconstruction, multi-view detection and segmentation, and synthetic-to-real domain adaptation.

Integration protocols typically involve:

  • Pretrain–finetune cycles: warm up detection/segmentation models on the full synthetic training set, then finetune on real or held-out data (a toy sketch follows this list).
  • Progressive transformation learning: leverage increasing levels of synthetic realism to regularize training or mitigate label scarcity.
  • Viewpoint-matching subsets: select a subset of cameras for specific downstream tasks or additional regularization.
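
A schematic pretrain–finetune cycle of the kind listed above could be organized as follows. The model, data, and hyperparameters are toy placeholders rather than values reported by any of the cited papers:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def run_phase(model: nn.Module, loader: DataLoader, lr: float, epochs: int) -> None:
    """One generic training phase; the same loop serves pretraining and finetuning."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()


# Toy stand-ins for a detector/segmenter, a large synthetic split, and a small real split.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 10))
synthetic_set = TensorDataset(torch.rand(256, 3, 64, 64), torch.randint(0, 10, (256,)))
real_set = TensorDataset(torch.rand(32, 3, 64, 64), torch.randint(0, 10, (32,)))

# Phase 1: warm up on the synthetic training set; Phase 2: finetune on scarce real labels
# with a lower learning rate, mirroring the pretrain-finetune cycle described above.
run_phase(model, DataLoader(synthetic_set, batch_size=16, shuffle=True), lr=1e-3, epochs=3)
run_phase(model, DataLoader(real_set, batch_size=16, shuffle=True), lr=1e-4, epochs=1)
```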

Limitations highlighted include:

  • Photorealism gaps (e.g., SEED4D's CARLA imagery falls short of full photorealism).
  • Simplified physics (e.g., steering models and dynamic interactions).
  • Domain specificity (most datasets are urban/driving or agent-centric; limited indoor, static, or non-agent domains).

6. Comparative Context and Implications

SynDM datasets address significant gaps in multi-view dynamic scene supervision left by prior work. Existing dynamic NeRF datasets either lack explicit side-view ground truth (NVIDIA), are limited to static backgrounds or rotational views (DyNeRF), or do not provide unconstrained scene composition. By synchronizing moving primary cameras with dense viewpoint supervision, SynDM supports rigorous quantification of novel view synthesis, robustness to large-angle rotation, and data-driven simulation of real-world complexity in controlled synthetic settings.

A plausible implication is more reliable synthetic-to-real generalization, consistent with SynPlay's performance in data-scarce regimes: six- to eight-fold gains in detection AP when SynDM-style sources are combined with minimal real labels. SynDM datasets also provide compatible formats and loader scripts (PyTorch integration, NeRFStudio-style JSONs), facilitating rapid experiment prototyping and reproducible downstream benchmarking.

7. Future Directions and Research Opportunities

Future research in SynDM environments will likely expand in several directions:

  • Photorealistic rendering to bridge sim-to-real gaps, e.g., through style transfer or advanced graphics.
  • Expanded coverage of indoor, heterogeneous, and non-agent-centric scenes.
  • Long-term 4D dynamic forecasting and full-scene interpolation tasks as proposed in SEED4D for autonomous driving.
  • Sensor fusion tasks leveraging multi-modal ground-truth (LiDAR, optical flow, 3D bounding boxes).
  • More comprehensive annotation standards, camera configuration flexibility, and support for emerging vision modalities.

The progressive integration of SynDM datasets into academic benchmarks sets a precedent for scalable, realistic, and richly annotated synthetic resources in machine perception and dynamic scene understanding.
