AutoSplat: 3D Gaussian Splatting for Autonomous Driving

Updated 25 April 2026

AutoSplat is a framework that leverages 3D Gaussian splatting for real-time scene reconstruction and multi-view consistent view synthesis in autonomous driving contexts.
It employs a two-phase background reconstruction with semantic segmentation and geometric flattening, alongside symmetry-regularized foreground object modeling.
The method achieves state-of-the-art performance on benchmark datasets, rendering in real time with improved fidelity for dynamic urban scenes.

AutoSplat is a scene reconstruction and view synthesis framework for autonomous driving environments, leveraging 3D Gaussian splatting under geometric and appearance constraints to address challenges such as complex backgrounds, dynamic objects, and sparse viewpoints. It produces an explicit 3D representation that enables multi-view consistent, real-time simulation and novel trajectory creation (e.g., virtual lane changes) in urban driving scenarios, and achieves state-of-the-art performance on benchmark datasets (Khan et al., 2024).

1. Pipeline Architecture

AutoSplat ingests sequences of $N$ calibrated RGB images $\{I_i, K_i, E_i\}_{i=1}^N$ (image, camera intrinsics, and camera extrinsics), synchronized LiDAR sweeps $\{L_i\}$, and tracked 3D bounding boxes $\{T_i^k\}$ for $K$ dynamic foreground objects. The processing pipeline consists of:

Background Reconstruction: Conducted in two phases.
- Phase 1 segments each image into “road,” “sky,” and “other” with Mask2Former. Gaussians are class-labeled by backprojecting LiDAR points, with geometric flattening constraints imposed on road/sky Gaussians.
- Phase 2 jointly optimizes all background Gaussians, masking foreground regions.
Foreground (Object) Reconstruction:
- Each object is initialized from a 3D template mesh (e.g., NFI-inverted shapes), placed along tracked trajectories.
- Reflected-Gaussian consistency supervises both observed and unobserved sides by leveraging object symmetry.
- Appearance dynamics are captured via per-Gaussian, per-timestep residuals in spherical harmonics.
Scene-Level Fusion: All Gaussians are fine-tuned together, including object trajectory corrections, on the full image.
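
The class-labeling step in Phase 1 can be illustrated with a minimal sketch: LiDAR points are projected into a segmentation mask and each point takes the class of the pixel it lands on. This is not the authors' code. The function name `label_points`, the integer class ids, and the convention that $E_i$ is a 4x4 world-to-camera matrix are all assumptions made for illustration.

```python
import numpy as np

def label_points(points_world, K, E, seg_mask, unlabeled=-1):
    """Give each 3D point the semantic class of the pixel it projects to.

    Assumptions (not from the source): E is a 4x4 world-to-camera matrix,
    K is a 3x3 pinhole intrinsics matrix, seg_mask is an (H, W) integer array.
    """
    n = points_world.shape[0]
    homog = np.hstack([points_world, np.ones((n, 1))])
    cam = (E @ homog.T).T[:, :3]                 # points in the camera frame
    labels = np.full(n, unlabeled, dtype=int)

    in_front = cam[:, 2] > 1e-6                  # drop points behind the camera
    pix = (K @ cam[in_front].T).T
    u = np.floor(pix[:, 0] / pix[:, 2]).astype(int)
    v = np.floor(pix[:, 1] / pix[:, 2]).astype(int)

    h, w = seg_mask.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(in_front)[inside]       # indices of points that hit the image
    labels[idx] = seg_mask[v[inside], u[inside]]
    return labels

# Toy example: 3x3 mask, focal length 1, principal point (1, 1).
K = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
E = np.eye(4)
mask = np.array([[0, 0, 0], [0, 1, 0], [2, 2, 2]])
points = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, -1.0], [5.0, 0.0, 1.0]])
labels = label_points(points, K, E, mask)
```

Points behind the camera or outside the image keep the `unlabeled` value. A real pipeline would aggregate labels over several frames.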

The final representation supports high-fidelity rendering from arbitrary views and trajectory edits (such as simulated lane changes).

2. Mathematical Structure of 3D Gaussian Splatting

The core of AutoSplat’s representation is a set of parameterized 3D Gaussians, each described by:

Center: $\mu \in \mathbb{R}^3$
Covariance: $\Sigma \in \mathbb{R}^{3 \times 3}$, factorized as $\Sigma = R S S^T R^T$, where $R$ is a rotation matrix and $S = \operatorname{diag}(s_x, s_y, s_z)$ is a scaling matrix.
Opacity: $\alpha \in [0, 1]$
Color: Encoded in spherical harmonic coefficients $f$

A Gaussian’s spatial density at $x \in \mathbb{R}^3$ is:

$G(x) = \exp\left(-\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$
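
As a minimal sketch of this parameterization, the NumPy code below builds $\Sigma = R S S^T R^T$ from a quaternion and per-axis scales, then evaluates the standard unnormalized Gaussian density $\exp(-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu))$. Storing the rotation as a `(w, x, y, z)` quaternion is an assumption borrowed from common 3D Gaussian splatting implementations. The function names are illustrative.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)           # normalize to a unit quaternion
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def covariance(q, scales):
    """Sigma = R S S^T R^T, symmetric positive semi-definite by construction."""
    R = quat_to_rotmat(q)
    S = np.diag(np.asarray(scales, dtype=float))
    return R @ S @ S.T @ R.T

def density(x, mu, Sigma):
    """Unnormalized Gaussian density at point x."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)))

# A Gaussian rotated 90 degrees about z: the x and y variances swap.
q_z90 = [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]
Sigma = covariance(q_z90, [1.0, 2.0, 3.0])
```

Factorizing through $R$ and $S$ keeps $\Sigma$ valid under unconstrained gradient updates, which is the usual motivation for this parameterization.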

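For rasterization, spatial Gaussians are projected as 2D Gaussians $G'$ in the image plane. Pixel color $C$ is composited by alpha blending:

$C = \sum_{i \in \mathcal{N}} c_i \, \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)$

where $\mathcal{N}$ is the depth-sorted set of Gaussians overlapping the pixel, $c_i$ is the color of the $i$-th Gaussian evaluated from its spherical harmonics, and $\alpha_i$ is its opacity weighted by the projected Gaussian $G'$ at the pixel.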
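
The compositing rule can be sketched for a single pixel as standard front-to-back alpha blending, where each Gaussian's contribution is scaled by the transmittance left over from those in front of it. The inputs are assumed to be sorted near to far already, and the function name `composite` is illustrative.

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha blending for one pixel.

    colors: (n, 3) per-Gaussian RGB, sorted near to far (n >= 1).
    alphas: (n,) per-Gaussian effective opacity at this pixel.
    """
    colors = np.asarray(colors, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    # Transmittance before Gaussian i: T_i = prod_{j < i} (1 - alpha_j), with T_0 = 1.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance
    return weights @ colors

# A half-transparent red Gaussian in front of a half-transparent green one.
pixel = composite([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.5, 0.5])
```

The green Gaussian contributes only 0.25 because half of the light is already absorbed by the red one in front of it.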

3. Constraints for Road and Sky Representation

To ensure rendering consistency and realism across multi-view and trajectory perturbations, dedicated geometric constraints are imposed for Gaussians assigned to “road” and “sky”:

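Flatness Constraint: Forcing zero roll $\phi$, zero pitch $\theta$, and minimal vertical extent $s_z$:

$\mathcal{L}_{\text{const}} = |\phi| + |\theta| + |s_z|$

This term is added to a combined loss for each semantic region $g \in \{\text{road}, \text{sky}\}$ during background training:

$\mathcal{L}_g = \mathcal{L}_{\text{photo}}(\hat{I}_g, I_g) + \beta \, \mathcal{L}_{\text{const}}$

where $\mathcal{L}_{\text{photo}}$ is the photometric loss between the rendered region $\hat{I}_g$ and the masked image $I_g$, and $\beta$ weights the constraint.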

This prevents floating artifacts and preserves plausible parallax under camera or vehicle motion.
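
A flatness penalty of this kind can be sketched as follows. Several choices here are assumptions and are not stated in the source: roll and pitch are extracted under a ZYX (yaw, pitch, roll) Euler convention, the world z-axis is treated as vertical, and each quantity is penalized with an absolute value.

```python
import numpy as np

def flatness_loss(rotations, scales):
    """Mean of |roll| + |pitch| + |s_z| over a batch of Gaussians.

    rotations: (n, 3, 3) rotation matrices, assumed R = Rz(yaw) Ry(pitch) Rx(roll).
    scales:    (n, 3) per-axis scales (s_x, s_y, s_z), with z assumed vertical.
    """
    rotations = np.asarray(rotations, dtype=float)
    scales = np.asarray(scales, dtype=float)
    pitch = -np.arcsin(np.clip(rotations[:, 2, 0], -1.0, 1.0))
    roll = np.arctan2(rotations[:, 2, 1], rotations[:, 2, 2])
    return float(np.mean(np.abs(roll) + np.abs(pitch) + np.abs(scales[:, 2])))

def rot_x(a):
    """Rotation by angle a (radians) about the x-axis, i.e. pure roll."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

flat = flatness_loss(np.eye(3)[None], [[1.0, 1.0, 0.1]])      # only s_z contributes
tilted = flatness_loss(rot_x(0.3)[None], [[1.0, 1.0, 0.1]])   # roll of 0.3 rad is penalized
```

Yaw is left unconstrained, so a road Gaussian can still rotate within the ground plane.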

4. Reflected-Gaussian Consistency for Object Supervision

AutoSplat enforces appearance and geometric consistency for foreground objects’ unobserved sides by reflecting each Gaussian across the canonical plane of symmetry:

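The symmetry plane has unit normal $n$, with the reflection matrix $M = I - 2 n n^T$.
Each Gaussian’s center $\mu$, rotation $R$, and spherical harmonic coefficients $f$ are reflected as:

$\tilde{\mu} = M \mu, \quad \tilde{R} = M R, \quad \tilde{f} = D_M f$

where $D_M$ is the Wigner-D matrix for spherical harmonics.

Rendered images of both original and reflected Gaussians are jointly supervised:

$\mathcal{L}_{\text{ref}} = \mathcal{L}_{\text{photo}}(\hat{I}_{\text{fg}}, I_{\text{fg}}) + \mathcal{L}_{\text{photo}}(\tilde{I}_{\text{fg}}, I_{\text{fg}})$

where $\hat{I}_{\text{fg}}$ and $\tilde{I}_{\text{fg}}$ are the foreground renderings from the original and reflected Gaussians, and $I_{\text{fg}}$ is the masked ground-truth image.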

This approach eliminates geometric “boxiness” and color artifacts on previously unobserved surfaces.
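
The geometric part of the reflection can be sketched with a Householder matrix applied to Gaussian centers and rotations. The sketch assumes the symmetry plane passes through the origin of the object's canonical frame. It omits the spherical harmonic transform, which needs a Wigner-D (or equivalent) rotation of the coefficients. The function names are illustrative.

```python
import numpy as np

def reflection_matrix(normal):
    """Householder reflection across the plane through the origin with this normal."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    return np.eye(3) - 2.0 * np.outer(n, n)

def reflect_gaussians(centers, rotations, normal):
    """Reflect centers (n, 3) and rotations (n, 3, 3) across the symmetry plane."""
    M = reflection_matrix(normal)
    reflected_centers = np.asarray(centers, dtype=float) @ M.T
    reflected_rotations = M @ np.asarray(rotations, dtype=float)  # broadcasts over n
    return reflected_centers, reflected_rotations

# Mirror one Gaussian across the plane y = 0 (e.g. a car's left/right symmetry).
centers = np.array([[1.0, 2.0, 3.0]])
rotations = np.eye(3)[None]
mirrored_centers, mirrored_rotations = reflect_gaussians(centers, rotations, [0.0, 1.0, 0.0])
```

The reflected matrix $M R$ has determinant $-1$, so it is no longer a proper rotation. The covariance $(M R) S S^T (M R)^T = M \Sigma M^T$ is still a valid symmetric matrix, which is all the rasterizer needs.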

5. Dynamic Appearance via Residual Spherical Harmonics

To accurately reconstruct transient appearance changes (e.g., brake lights, indicators, time-varying shadows), per-Gaussian, per-timestep residuals are learned through a small MLP operating on a temporal embedding:

Given time-step embedding $\gamma(t)$, position $\mu$, and static spherical harmonic coefficients $f$,

$\Delta f_t = \operatorname{MLP}(\gamma(t), \mu, f)$

$f_t = f + \Delta f_t$

A sparsity regularization $\|\Delta f_t\|_1$ (with weight $\lambda$) is applied to suppress nonphysical flicker, enabling efficient modeling of appearance dynamics.
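
The forward pass of such a residual model can be sketched in NumPy. Everything architectural here is an assumption and is not taken from the source: the sinusoidal time embedding, the single hidden layer, the layer sizes, and the zero-initialized output layer are placeholders for illustration.

```python
import numpy as np

def time_embedding(t, num_freqs=4):
    """Sinusoidal embedding of a scalar time step (an assumed choice)."""
    freqs = 2.0 ** np.arange(num_freqs)
    return np.concatenate([np.sin(freqs * t), np.cos(freqs * t)])

def init_mlp(in_dim, hidden_dim, out_dim, rng):
    """One hidden layer; the output layer starts at zero so residuals start at zero."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": np.zeros((hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def residual_sh(params, t_emb, positions, static_sh):
    """Predict a per-Gaussian SH residual from (time embedding, position, static SH)."""
    n = positions.shape[0]
    flat_sh = static_sh.reshape(n, -1)
    feats = np.concatenate([np.tile(t_emb, (n, 1)), positions, flat_sh], axis=1)
    hidden = np.maximum(feats @ params["W1"] + params["b1"], 0.0)  # ReLU
    delta = hidden @ params["W2"] + params["b2"]
    return delta.reshape(static_sh.shape)

def sparsity_penalty(delta, weight):
    """Weighted mean absolute residual (an L1-style sparsity term)."""
    return weight * float(np.abs(delta).mean())

rng = np.random.default_rng(0)
positions = rng.normal(size=(5, 3))
static_sh = rng.normal(size=(5, 4, 3))            # 4 SH coefficients (degree 1) x RGB
t_emb = time_embedding(0.25)
params = init_mlp(t_emb.size + 3 + 12, 16, 12, rng)
delta = residual_sh(params, t_emb, positions, static_sh)
dynamic_sh = static_sh + delta                    # per-timestep coefficients
```

With the zero-initialized output layer, `dynamic_sh` equals `static_sh` at the start of training, so the model only departs from the static appearance where the data requires it.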

6. Training Objective and Optimization

The full learning objective consists of:

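Background loss: $\mathcal{L}_{\text{bg}}$, optimized for 15K iterations in each of two background phases.
Foreground loss:

$\mathcal{L}_{\text{fg}} = \mathcal{L}_{\text{photo}}(\hat{I}_{\text{fg}}, I_{\text{fg}}) + \mathcal{L}_{\text{photo}}(\tilde{I}_{\text{fg}}, I_{\text{fg}}) + \lambda \|\Delta f_t\|_1$

where $\hat{I}_{\text{fg}}$ and $\tilde{I}_{\text{fg}}$ are the renderings from the original and reflected Gaussians, $\Delta f_t$ is the residual spherical harmonic term, and $\lambda$ is the sparsity weight.

Fusion: The final objective optimizes both sets: $\mathcal{L} = \mathcal{L}_{\text{bg}} + \mathcal{L}_{\text{fg}}$.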

Training schedules use 30K iterations for background (in two phases), 5K for foreground, and 10K for joint fusion, typically on a single NVIDIA V100 GPU.
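
The schedule can be written down as a small configuration, which also makes the total iteration budget explicit. The stage names are hypothetical labels. Only the iteration counts come from the text above.

```python
# Stage names are illustrative; iteration counts follow the schedule described above.
schedule = [
    ("background_phase_1", 15_000),
    ("background_phase_2", 15_000),
    ("foreground", 5_000),
    ("fusion", 10_000),
]

total_iterations = sum(n for _, n in schedule)
background_iterations = sum(n for name, n in schedule if name.startswith("background"))
print(total_iterations, background_iterations)
```

Background reconstruction takes 30K of the 45K total iterations, two thirds of the training budget.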

7. Experimental Protocol and Benchmarking

Datasets

Pandaset: Ten challenging urban sequences (80 frames each), spanning day/night conditions with multiple dynamic vehicles.
KITTI: Standard driving sequences, with varying camera coverage for novel view evaluation (25%, 50%, 75% frames held out).

Metrics

Novel view synthesis: PSNR (↑), SSIM (↑), LPIPS (↓) on held-out, nearby novel views.
Lateral-shift simulation: FID (↓) for 1–3 m simulated lane changes.
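
Of these metrics, PSNR is simple enough to sketch directly: it is $10 \log_{10}(\text{MAX}^2 / \text{MSE})$ between a rendered and a reference image. The sketch assumes images scaled to $[0, 1]$. SSIM, LPIPS, and FID need dedicated implementations and are not shown.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")                       # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. 20 dB.
reference = np.zeros((4, 4, 3))
rendered = np.full((4, 4, 3), 0.1)
score = psnr(rendered, reference)
```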

Results Table

Dataset/Task	AutoSplat	EmerNeRF	SUDS	MARS	NSG	NeRF
Pandaset, test-view PSNR	27.84	27.73	25.13	23.66	22.79	—
Pandaset, test-view SSIM	0.906	0.801	0.843	0.832	0.802	—
Pandaset, test-view LPIPS	0.291	0.394	0.426	0.502	0.578	—
Pandaset, FPS	26	—	—	—	—	—
Lateral FID 1/2/3 m	54.7/68.7/83.0	68.2/90.4/102.8	95.4/122.7/150.8	—	—	—
KITTI, 75% held PSNR	26.59	—	22.77	24.23	21.53	18.56
KITTI, 75% held SSIM	0.913	—	0.797	0.845	0.673	0.557
KITTI, 75% held LPIPS	0.204	—	0.171	0.160	0.254	0.554

AutoSplat matches or exceeds the previous best reported PSNR and SSIM on both datasets, although on KITTI its LPIPS (0.204) trails SUDS (0.171) and MARS (0.160). It combines real-time rendering (26 FPS on Pandaset), low FID under simulated trajectory shifts, and improved geometric and appearance fidelity, particularly for previously unobserved object sides and dynamic lighting phenomena.

8. Position Within Scene Reconstruction Research

AutoSplat extends conventional 3D Gaussian splatting by: (i) imposing geometric flatness for road/sky, (ii) template-based, symmetry-regularized foreground object initialization, and (iii) efficient dynamic appearance modeling via residual spherical harmonics (Khan et al., 2024). Comparative experiments on view synthesis, trajectory perturbation, and dynamic rendering demonstrate improved accuracy and realism compared to NeRF-based and prior dynamic-scene representations, notably EmerNeRF, SUDS, MARS, NSG, and baseline NeRF.

The framework’s explicit, fast, and physically informed representation contributes new capabilities for multi-view-consistent, real-time simulation environments relevant to the development and evaluation of autonomous vehicles.


References

Khan et al. (2024). AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction.
