SuperQuadricOcc: Real-Time 3D Scene Modeling
- SuperQuadricOcc is a self-supervised framework that uses superquadric primitives to compactly represent 3D scenes and estimate dense occupancy for automated driving.
- It approximates each superquadric with a multilayer Gaussian shell for differentiable 2D supervision, leading to improved memory efficiency, speed, and mIoU compared to Gaussian baselines.
- Its real-time inference via direct voxelization delivers notable performance gains, achieving 21.5 FPS and a 5.9% mIoU improvement on the Occ3D benchmark.
SuperQuadricOcc is a self-supervised semantic occupancy estimation framework designed to provide real-time, dense spatial and semantic understanding of 3D scenes, with particular emphasis on automated driving applications. It replaces the large numbers of 3D Gaussian primitives commonly used in previous occupancy networks with a more compact and expressive set of superquadric primitives. During training, each superquadric is approximated by a multilayer icosphere-tessellated shell of Gaussians, allowing for differentiable rasterization and effective 2D supervision. Inference leverages direct superquadric voxelization, yielding substantial gains in memory efficiency, inference speed, and mean Intersection-over-Union (mIoU) compared to Gaussian-based baselines on the Occ3D benchmark, and constitutes the first real-time self-supervised occupancy model with competitive accuracy (Hayes et al., 21 Nov 2025).
1. Superquadric Scene Representation
SuperQuadricOcc models a 3D scene as a collection of superquadric occupancy fields. Each superquadric is parameterized by a center $\mu$, axis scales $s$, a rotation $q$ (quaternion parameterization), an opacity $\alpha$, semantic logits $c$, and shape exponents $\epsilon_1, \epsilon_2$, which define the degree of “squareness” along the principal axes.
The inside–outside function in the local frame is given by
$$F(\mathbf{x}) = \left( \left|\tfrac{x}{s_x}\right|^{2/\epsilon_2} + \left|\tfrac{y}{s_y}\right|^{2/\epsilon_2} \right)^{\epsilon_2/\epsilon_1} + \left|\tfrac{z}{s_z}\right|^{2/\epsilon_1},$$
with $F(\mathbf{x}) < 1$ inside the primitive and $F(\mathbf{x}) > 1$ outside; the induced occupancy-probability field is obtained by mapping $F$ through a monotonically decreasing activation scaled by the opacity $\alpha$. The shape exponents $\epsilon_1, \epsilon_2$ interpolate between ellipsoids, cylinders, cuboids, and intermediary forms. The model achieves substantial compactness: SuperQuadricOcc uses 1,600 superquadrics, compared to the 10,000 Gaussians required by GaussianFlowOcc for similar scene coverage, an 84% reduction in primitive count.
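The following minimal NumPy sketch illustrates the standard superquadric inside–outside function and a soft occupancy mapping of the kind described above; the function names, the sigmoid-based mapping, and the `sharpness` parameter are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def inside_outside(x_local, scales, eps1, eps2):
    """Standard superquadric inside-outside function F(x) in the primitive's local frame.

    F < 1 inside the surface, F = 1 on it, F > 1 outside.
    x_local: (N, 3) points already translated and rotated into the local frame.
    scales:  (3,) axis scales.
    eps1, eps2: shape exponents controlling the "squareness" along the principal axes.
    """
    x, y, z = np.abs(x_local / scales).T
    xy_term = (x ** (2.0 / eps2) + y ** (2.0 / eps2)) ** (eps2 / eps1)
    return xy_term + z ** (2.0 / eps1)

def soft_occupancy(x_local, scales, eps1, eps2, opacity, sharpness=10.0):
    """Illustrative soft occupancy: opacity-scaled sigmoid of (1 - F).

    Only demonstrates how F can induce a differentiable occupancy probability;
    the paper's exact decay function is not reproduced here.
    """
    F = inside_outside(x_local, scales, eps1, eps2)
    return opacity / (1.0 + np.exp(-sharpness * (1.0 - F)))

# Example: a near-cuboid primitive (small exponents flatten the faces).
pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
print(soft_occupancy(pts, scales=np.array([1.0, 0.5, 0.5]),
                     eps1=0.3, eps2=0.3, opacity=0.9))   # ~[0.9, 0.0]
```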
2. Multi-Layer Gaussian Approximation and Supervision
To enable supervision via 2D images, SuperQuadricOcc approximates each superquadric by a multi-layer shell of Gaussian primitives during training. This facilitates efficient Gaussian rasterization and enables loss computation against 2D pseudo-labels.
A set of nine positive scale factors scales the superquadric, yielding nine nested shells. For each shell, an icosphere tessellation (subdivided icosahedron) with 80 faces produces 80 surface points, to which Gaussians are anchored. The mean, anisotropic covariance, and per-Gaussian opacity are set such that each Gaussian matches the peak density of the underlying superquadric at its center. This construction approximates each superquadric by 720 Gaussians with spatially varying footprints, capturing both curved and planar geometry.
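A rough sketch of the shell construction is given below, assuming nine shell scale factors and 80 surface points per shell as stated above; the Fibonacci-sphere sampling stands in for the icosphere tessellation, and the isotropic footprint heuristic simplifies the paper's anisotropic covariance and density matching.

```python
import numpy as np

def fibonacci_sphere(n):
    """Roughly uniform unit-sphere directions (stand-in for the icosphere tessellation)."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1.0 - 2.0 * i / n)            # polar angle
    theta = np.pi * (1.0 + 5 ** 0.5) * i          # golden-angle azimuth
    return np.stack([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)], axis=-1)

def superquadric_surface(dirs, scales, eps1, eps2):
    """Map unit-sphere directions to the superquadric surface (standard parametric form)."""
    eta = np.arcsin(np.clip(dirs[:, 2], -1.0, 1.0))    # latitude
    omega = np.arctan2(dirs[:, 1], dirs[:, 0])          # longitude
    spow = lambda a, e: np.sign(a) * np.abs(a) ** e     # signed power
    x = scales[0] * spow(np.cos(eta), eps1) * spow(np.cos(omega), eps2)
    y = scales[1] * spow(np.cos(eta), eps1) * spow(np.sin(omega), eps2)
    z = scales[2] * spow(np.sin(eta), eps1)
    return np.stack([x, y, z], axis=-1)

def gaussian_shells(scales, eps1, eps2, opacity, num_shells=9, points_per_shell=80):
    """Anchor one Gaussian per surface point on each of `num_shells` nested shells.

    Returns means (S*P, 3), isotropic stddevs (S*P,), and per-Gaussian opacities.
    The covariance/opacity matching to the superquadric density is simplified here.
    """
    scales = np.asarray(scales, dtype=float)
    dirs = fibonacci_sphere(points_per_shell)
    means, stds, alphas = [], [], []
    for k in np.linspace(0.2, 1.0, num_shells):         # shell scale factors (assumed range)
        pts = superquadric_surface(dirs, k * scales, eps1, eps2)
        spacing = k * scales.mean() * np.sqrt(4.0 * np.pi / points_per_shell)
        means.append(pts)
        stds.append(np.full(points_per_shell, 0.5 * spacing))   # heuristic footprint
        alphas.append(np.full(points_per_shell, opacity / num_shells))
    return np.concatenate(means), np.concatenate(stds), np.concatenate(alphas)

# Example: 720 Gaussians (9 shells x 80 points) for one near-cuboid superquadric.
means, stds, alphas = gaussian_shells([1.0, 0.5, 0.5], eps1=0.3, eps2=0.3, opacity=0.9)
```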
3. Differentiable Gaussian Rasterization and Training Loss
During training, the shell Gaussians are projected into each camera’s image plane using the corresponding camera projection matrix, yielding an elliptical 2D footprint per Gaussian. Alpha-composited volumetric rendering is then performed by sorting Gaussians by depth and compositing their class probabilities and depth values. The rendered semantic and depth maps are supervised with 2D pseudo-labels from Grounded-SAM and Metric3Dv2, respectively.
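A minimal sketch of the per-pixel compositing step, assuming per-Gaussian effective opacities and depths have already been obtained from the 2D projection; the function and variable names are illustrative.

```python
import numpy as np

def composite_pixel(alphas, depths, class_probs):
    """Front-to-back alpha compositing of per-Gaussian contributions at one pixel.

    alphas:      (G,) effective opacity of each projected Gaussian at this pixel
                 (primitive opacity times its 2D footprint weight).
    depths:      (G,) camera-space depth of each Gaussian.
    class_probs: (G, C) per-Gaussian semantic class probabilities.
    Returns the rendered class distribution (C,) and the expected depth (scalar).
    """
    order = np.argsort(depths)                        # sort front to back
    a, d, p = alphas[order], depths[order], class_probs[order]
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - a)[:-1]])
    weights = transmittance * a                       # contribution of each Gaussian
    sem = weights @ p                                 # rendered semantics
    depth = weights @ d                               # rendered (expected) depth
    return sem, depth
```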
The training loss combines a cross-entropy loss on the rendered semantics with a regression loss on the rendered depth, $\mathcal{L} = \mathcal{L}_{\text{sem}} + \lambda\, \mathcal{L}_{\text{depth}}$, where $\lambda$ weights the depth term. No temporal or flow-based labels are required, unlike in GaussianFlowOcc, which simplifies training while maintaining full self-supervision.
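A hedged PyTorch sketch of the combined 2D loss built on the rendering described above; the depth loss type and the weighting coefficient `depth_weight` are placeholders, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def occupancy_2d_loss(sem_logits, depth_pred, sem_pseudo, depth_pseudo, depth_weight=0.1):
    """Semantic cross-entropy plus a depth regression term on the rendered outputs.

    sem_logits:   (B, C, H, W) rendered semantic logits.
    depth_pred:   (B, H, W)    rendered depth.
    sem_pseudo:   (B, H, W)    long tensor of pseudo-labels from the 2D segmenter.
    depth_pseudo: (B, H, W)    pseudo-depth from the monocular depth model.
    """
    l_sem = F.cross_entropy(sem_logits, sem_pseudo)
    l_depth = F.l1_loss(depth_pred, depth_pseudo)   # placeholder regression loss
    return l_sem + depth_weight * l_depth
```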
4. Model Architecture and Training Regimen
The SuperQuadricOcc backbone processes six surround-view RGB images at each time step, using a ResNet-50 encoder to extract multi-scale features. An initial set of trainable superquadric feature vectors and mean positions undergoes iterative refinement through three Transformer layers, combining deformable cross-attention over image features with self-attention among the primitive “slots.” Five lightweight MLP heads then predict the per-primitive parameters: axis scales, rotation, opacity, semantic logits, and shape exponents.
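A simplified PyTorch sketch of this refinement stage; standard multi-head attention stands in for deformable cross-attention, and the feature dimension, class count, and head layout are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrimitiveRefiner(nn.Module):
    """Simplified sketch of the primitive refinement stage."""
    def __init__(self, num_prims=1600, dim=128, num_classes=17, num_layers=3):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_prims, dim))   # primitive "slots"
        self.positions = nn.Parameter(torch.rand(num_prims, 3))    # initial mean positions
        self.cross_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, 8, batch_first=True) for _ in range(num_layers)])
        self.self_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, 8, batch_first=True) for _ in range(num_layers)])
        head = lambda out: nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, out))
        self.scale_head, self.rot_head = head(3), head(4)
        self.opacity_head, self.sem_head, self.shape_head = head(1), head(num_classes), head(2)

    def forward(self, img_feats):                          # img_feats: (B, N_tokens, dim)
        b = img_feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        for cross, selfa in zip(self.cross_attn, self.self_attn):
            q = q + cross(q, img_feats, img_feats)[0]      # attend to image features
            q = q + selfa(q, q, q)[0]                      # interaction among primitives
        return {
            "center": self.positions.unsqueeze(0).expand(b, -1, -1),  # refinement omitted here
            "scales": self.scale_head(q).exp(),                       # positive axis scales
            "rotation": F.normalize(self.rot_head(q), dim=-1),        # unit quaternion
            "opacity": self.opacity_head(q).sigmoid(),
            "semantics": self.sem_head(q),                            # class logits
            "shape": self.shape_head(q).sigmoid() * 2.0,              # exponents in (0, 2)
        }
```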
Training uses a batch size of 6 over 18 epochs on four A100 GPUs, with surround-view input images at a fixed resolution. The superquadric-to-Gaussian shell module, which generates 9 shells with 80 faces each (720 Gaussians per superquadric), operates only during training. No explicit temporal information is modeled.
5. Efficient Voxelization and Real-Time Inference
During inference, the conversion to Gaussian shells is omitted and the superquadric primitives are voxelized directly onto a 3D grid. For each voxel center, occupancy and semantic contributions are aggregated from the superquadrics within a local neighborhood of radius 5. Voxels whose aggregated occupancy falls below a threshold are marked empty; the remaining voxels take the argmax over the aggregated semantic logits. Computation is localized to these neighborhoods to avoid a full pass over all primitive–voxel pairs, and rotation matrices are precomputed. The implementation reaches 21.5 FPS on an NVIDIA A100 at a peak memory usage of 0.70 GB.
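A localized voxelization sketch, under the assumptions that the radius is measured in voxel cells and that occupancy decays with the inside–outside function as in Section 1; the grid extents, threshold, and rotation convention are illustrative.

```python
import numpy as np

def _F(x_local, scales, eps1, eps2):
    """Superquadric inside-outside function (same form as in Section 1)."""
    x, y, z = np.abs(x_local / scales).T
    return (x ** (2 / eps2) + y ** (2 / eps2)) ** (eps2 / eps1) + z ** (2 / eps1)

def voxelize(primitives, grid_shape=(200, 200, 16), voxel_size=0.4,
             origin=(-40.0, -40.0, -1.0), radius=5, occ_threshold=0.5):
    """Localized voxelization: each primitive writes only to voxels within
    `radius` cells of its center, so no full pass over the grid is needed.

    `primitives` is a list of dicts with keys:
      center (3,), R (3x3 local-to-world rotation, precomputed),
      scales (3,), eps (eps1, eps2), opacity (float), logits (C,).
    """
    X, Y, Z = grid_shape
    C = len(primitives[0]["logits"])
    occ = np.zeros(grid_shape)
    sem = np.zeros(grid_shape + (C,))
    origin = np.asarray(origin)

    for p in primitives:
        c_idx = np.floor((np.asarray(p["center"]) - origin) / voxel_size).astype(int)
        lo = np.maximum(c_idx - radius, 0)
        hi = np.minimum(c_idx + radius + 1, [X, Y, Z])
        ii, jj, kk = np.meshgrid(*[np.arange(l, h) for l, h in zip(lo, hi)], indexing="ij")
        centers = (np.stack([ii, jj, kk], -1) + 0.5) * voxel_size + origin   # world coords
        local = (centers - p["center"]) @ p["R"]                             # world -> local frame
        F = _F(local.reshape(-1, 3), p["scales"], *p["eps"]).reshape(ii.shape)
        w = p["opacity"] / (1.0 + np.exp(-10.0 * (1.0 - F)))                 # soft occupancy
        occ[ii, jj, kk] += w
        sem[ii, jj, kk] += w[..., None] * np.asarray(p["logits"])

    labels = np.argmax(sem, axis=-1)
    labels[occ < occ_threshold] = -1     # below-threshold voxels marked empty
    return labels
```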
6. Benchmark Results and Comparative Analysis
On the Occ3D dataset, SuperQuadricOcc achieves a mean Intersection-over-Union (mIoU) of 12.69, an IoU of 33.67, 0.70 GB memory usage, and 21.5 FPS, using 1,600 superquadrics. By comparison:
| Method | mIoU | IoU | Memory (GB) | FPS | Primitive Count |
|---|---|---|---|---|---|
| SuperQuadricOcc | 12.69 | 33.67 | 0.70 | 21.5 | 1,600 superq. |
| GaussianFlowOcc | 11.98 | 35.85 | 2.85 | 9.6 | 10,000 Gauss. |
| GaussianFlowOcc* | 9.98 | 35.40 | 0.62 | 20.4 | 1,600 Gauss. |

*GaussianFlowOcc variant with a primitive budget matched to SuperQuadricOcc (1,600 Gaussians).
Relative to the 10,000-Gaussian baseline, SuperQuadricOcc attains +5.9% mIoU, –75% memory usage, +124% inference speed, and –84% primitive count. The expressiveness of superquadrics enables a drastic reduction in model size with no loss of semantic accuracy and only a modest decrease in geometric (binary) IoU. Competitive or superior 3D occupancy estimation is achieved entirely under self-supervision and without temporal labels.
7. Limitations and Future Research Directions
Limitations include a slightly lower binary IoU (free vs. occupied) than Gaussian-based approaches, attributed to the mismatch between the Gaussian-shell rendering used for supervision and the direct superquadric evaluation used at inference. In addition, extremely irregular or concave geometry remains challenging to represent with a single superquadric, and dynamic scene motion is not modeled.
Future work includes investigation of end-to-end differentiable superquadric rendering (removing the need for Gaussian shells), adaptive shell scale and tessellation learning, incorporation of temporal flow or velocity labels for dynamic scenes, extension to multimodal inputs (e.g., LiDAR, radar), and optimization of the primitive count via sparsity priors.
SuperQuadricOcc demonstrates the viability of compact superquadric representations, combined with Gaussian surrogates for differentiable supervision, as an efficient solution for real-time, self-supervised occupancy modeling with state-of-the-art performance (Hayes et al., 21 Nov 2025).