
SuperQuadricOcc: Real-Time 3D Scene Modeling

Updated 28 November 2025
  • SuperQuadricOcc is a self-supervised framework that uses superquadric primitives to compactly represent 3D scenes and estimate dense occupancy for automated driving.
  • It approximates each superquadric with a multilayer Gaussian shell for differentiable 2D supervision, leading to improved memory efficiency, speed, and mIoU compared to Gaussian baselines.
  • Inference voxelizes the superquadrics directly and runs in real time: 21.5 FPS with a 5.9% relative mIoU improvement over the 10,000-Gaussian GaussianFlowOcc baseline on the Occ3D benchmark.

SuperQuadricOcc is a self-supervised semantic occupancy estimation framework designed to provide real-time, dense spatial and semantic understanding of 3D scenes, with particular emphasis on automated driving applications. It replaces the large numbers of 3D Gaussian primitives commonly used in previous occupancy networks with a more compact and expressive set of superquadric primitives. During training, each superquadric is approximated by a multilayer icosphere-tessellated shell of Gaussians, allowing for differentiable rasterization and effective 2D supervision. Inference uses direct superquadric voxelization, yielding substantial gains in memory efficiency, inference speed, and mean Intersection-over-Union (mIoU) over Gaussian-based baselines on the Occ3D benchmark. It is reported to be the first real-time self-supervised occupancy model with competitive accuracy (Hayes et al., 21 Nov 2025).

1. Superquadric Scene Representation

SuperQuadricOcc models a 3D scene as a collection of superquadric occupancy fields. Each superquadric $S$ is parameterized by a center $\mu \in \mathbb{R}^3$, axis scales $s = (s_x, s_y, s_z) \in \mathbb{R}_+^3$, a rotation $R \in SO(3)$ (quaternion parameterization), an opacity $\sigma \in \mathbb{R}_+$, semantic logits $c \in \mathbb{R}^C$, and shape exponents $\epsilon_1, \epsilon_2 > 0$, which define the degree of "squareness" along the principal axes.
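The parameter set above can be collected in a small container. The following is a minimal sketch, not the authors' code: the class name `Superquadric`, its field names, and the `(w, x, y, z)` quaternion ordering are assumptions made for illustration.

```python
from dataclasses import dataclass

import numpy as np


def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    The quaternion is normalized first, so any non-zero input is accepted.
    """
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


@dataclass
class Superquadric:
    """Hypothetical container for one primitive's parameters."""
    mu: np.ndarray      # center, shape (3,)
    scale: np.ndarray   # (s_x, s_y, s_z), all positive
    quat: np.ndarray    # rotation as a quaternion (w, x, y, z)
    opacity: float      # sigma > 0
    logits: np.ndarray  # semantic logits, shape (C,)
    eps1: float         # shape exponent epsilon_1 > 0
    eps2: float         # shape exponent epsilon_2 > 0

    def rotation(self):
        """Rotation matrix R in SO(3) derived from the stored quaternion."""
        return quat_to_rotmat(self.quat)
```

Storing the quaternion and deriving $R$ on demand keeps the rotation valid under unconstrained optimization, since normalization maps any non-zero 4-vector to a proper rotation.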

The inside-outside function in the local frame $x_S = R(x - \mu)$, with components $x_S = (x, y, z)$, is given by

$$f(x_S) = \left( \left|\frac{x}{s_x}\right|^{2/\epsilon_2} + \left|\frac{y}{s_y}\right|^{2/\epsilon_2} \right)^{\epsilon_2/\epsilon_1} + \left|\frac{z}{s_z}\right|^{2/\epsilon_1}$$

and the induced occupancy-probability field is $p_o(x) = \exp(-f(x_S))$. The shape exponents interpolate between ellipsoids, cylinders, cuboids, and intermediate forms. The model achieves substantial compactness: SuperQuadricOcc uses $N = 1{,}600$ superquadrics, compared to the $10{,}000$ Gaussians required in GaussianFlowOcc for similar scene coverage, an 84% reduction in primitive count.
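To make the role of the shape exponents concrete, here is a small NumPy sketch of the inside-outside function and the occupancy field $\exp(-f)$. The function names and argument layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def inside_outside(x_local, scale, eps1, eps2):
    """Superquadric inside-outside value f for points given in the local frame.

    x_local has shape (..., 3); f < 1 is inside, f == 1 on the surface, f > 1 outside.
    """
    u = np.abs(np.asarray(x_local, dtype=float) / scale)
    xy = (u[..., 0] ** (2 / eps2) + u[..., 1] ** (2 / eps2)) ** (eps2 / eps1)
    return xy + u[..., 2] ** (2 / eps1)


def occupancy(x_world, mu, R, scale, eps1, eps2):
    """Occupancy probability exp(-f) at world-frame points of shape (..., 3)."""
    # Row-vector form of x_S = R (x - mu).
    x_local = (np.asarray(x_world, dtype=float) - mu) @ R.T
    return np.exp(-inside_outside(x_local, scale, eps1, eps2))


# The same corner-ish point is outside a unit sphere but inside a box-like shape.
point = np.array([0.9, 0.9, 0.9])
unit = np.ones(3)
f_sphere = inside_outside(point, unit, eps1=1.0, eps2=1.0)  # 3 * 0.81 = 2.43
f_box = inside_outside(point, unit, eps1=0.1, eps2=0.1)     # 3 * 0.9**20, below 1
```

With both exponents equal to 1 the level set $f = 1$ is an ellipsoid; as they shrink toward 0 the same level set approaches a cuboid, which is why the point near the corner switches from outside to inside.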

2. Multi-Layer Gaussian Approximation and Supervision

To enable supervision via 2D images, SuperQuadricOcc approximates each superquadric by a multi-layer shell of Gaussian primitives during training. This facilitates efficient Gaussian rasterization and enables loss computation against 2D pseudo-labels.

A set of $L$ positive scale factors $K = \{k_1, \ldots, k_L\} \subset \mathbb{R}_+$ scales the superquadric, yielding $L$ shells. For each shell, an icosahedron tessellation with $F$ faces produces $F$ surface points, giving $L \cdot F$ points in total, and one Gaussian is anchored at each point. The mean, anisotropic covariance, and per-Gaussian opacity $\sigma_j = \sigma \exp(-f(x_{S,j}))$, where $x_{S,j}$ is the center of the $j$-th Gaussian in the local frame, are set such that each Gaussian matches the peak density of the underlying superquadric at its center. Each superquadric is thus approximated by $L \cdot F$ Gaussians with spatially varying footprints that capture both curved and planar geometry.
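The shell construction can be sketched as follows. This is a rough illustration under several assumptions that the text does not confirm: anchor points are taken as face centroids of a once-subdivided icosahedron, they are projected radially onto the superquadric surface, the opacity is the base opacity weighted by the occupancy field $\exp(-f)$, the scale factors `ks` are arbitrary placeholder values, and covariances are omitted. Everything is computed in the primitive's local frame.

```python
import itertools

import numpy as np


def icosphere_face_directions():
    """Unit directions through the 80 face centroids of a once-subdivided icosahedron."""
    phi = (1 + 5 ** 0.5) / 2
    verts = []
    for a, b in itertools.product((-1.0, 1.0), (-phi, phi)):
        verts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    verts = np.array(verts)
    # Faces are exactly the vertex triples that are pairwise one edge (length 2) apart.
    faces = [t for t in itertools.combinations(range(12), 3)
             if all(abs(np.linalg.norm(verts[i] - verts[j]) - 2.0) < 1e-9
                    for i, j in itertools.combinations(t, 2))]
    unit = verts / np.linalg.norm(verts, axis=1, keepdims=True)

    def norm(v):
        return v / np.linalg.norm(v)

    dirs = []
    for i, j, k in faces:
        a, b, c = unit[i], unit[j], unit[k]
        ab, bc, ca = norm(a + b), norm(b + c), norm(c + a)
        # Split each triangle into four and keep the centroid direction of each.
        for tri in ((a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)):
            dirs.append(norm(np.sum(tri, axis=0)))
    return np.array(dirs)


def inside_outside(x_local, scale, eps1, eps2):
    """Superquadric inside-outside value f for local-frame points of shape (..., 3)."""
    u = np.abs(x_local / scale)
    xy = (u[..., 0] ** (2 / eps2) + u[..., 1] ** (2 / eps2)) ** (eps2 / eps1)
    return xy + u[..., 2] ** (2 / eps1)


def shell_gaussians(scale, eps1, eps2, sigma, ks):
    """Return (means, opacities) for len(ks) shells of 80 Gaussians each."""
    d = icosphere_face_directions()
    # f is homogeneous of degree 2/eps1, so t = f(d)^(-eps1/2) places t*d on f = 1.
    t = inside_outside(d, scale, eps1, eps2) ** (-eps1 / 2)
    surface = d * t[:, None]
    means = (np.asarray(ks, dtype=float)[:, None, None] * surface[None]).reshape(-1, 3)
    opacity = sigma * np.exp(-inside_outside(means, scale, eps1, eps2))
    return means, opacity
```

Because $f$ scales as $k^{2/\epsilon_1}$ along any ray from the center, every Gaussian on the shell with factor $k$ receives the same opacity $\sigma \exp(-k^{2/\epsilon_1})$ in this sketch.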

3. Differentiable Gaussian Rasterization and Training Loss

During training, the shell Gaussians are projected into each camera's image plane using the projection matrix $\Pi_i$, yielding an elliptical 2D footprint per Gaussian. Alpha-composite volumetric rendering is performed by sorting Gaussians by depth and compositing their class probabilities and depth values. Semantic and depth maps rendered from the Gaussian shells are supervised using 2D pseudo-labels from Grounded-SAM and Metric3Dv2, respectively.
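For a single pixel, the depth-sorted compositing step can be sketched as standard front-to-back alpha blending. The function name and inputs are assumptions: `alphas` stands for each Gaussian's opacity already multiplied by its 2D footprint value at the pixel, which a real rasterizer would compute per tile on the GPU.

```python
import numpy as np


def composite_pixel(alphas, depths, class_probs):
    """Front-to-back alpha compositing of the Gaussians covering one pixel.

    alphas:      (G,)   per-Gaussian alpha at this pixel, in [0, 1]
    depths:      (G,)   per-Gaussian depth along the camera ray
    class_probs: (G, C) per-Gaussian class probabilities
    Returns (rendered class scores, rendered depth, accumulated alpha).
    """
    order = np.argsort(depths)  # nearest Gaussian first
    a = alphas[order]
    # Transmittance reaching each Gaussian: product of (1 - alpha) of those in front.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - a)[:-1]))
    weights = transmittance * a
    return weights @ class_probs[order], weights @ depths[order], weights.sum()


semantics, depth, coverage = composite_pixel(
    alphas=np.array([0.5, 0.5]),
    depths=np.array([2.0, 1.0]),
    class_probs=np.array([[1.0, 0.0], [0.0, 1.0]]),
)
```

In this example the nearer Gaussian (depth 1.0) contributes with weight 0.5 and the farther one with weight 0.25, so the second class dominates the rendered pixel.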

The training loss is the sum of a cross-entropy loss for semantic rendering and an $L_1$ loss for depth, combined as $L = L_\mathrm{sem} + \lambda L_\mathrm{depth}$, with $\lambda \approx 0.1$. No temporal or flow-based labels are required, unlike GaussianFlowOcc, which simplifies training and maintains self-supervision.
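A minimal NumPy sketch of the combined objective follows. It assumes per-pixel mean reduction for both terms and treats the rendered semantic map as unnormalized class scores; neither detail is stated in the text, and a training implementation would use an autodiff framework rather than NumPy.

```python
import numpy as np


def cross_entropy(scores, labels):
    """Mean cross-entropy over pixels. scores: (P, C) class scores, labels: (P,) ints."""
    z = scores - scores.max(axis=1, keepdims=True)  # stabilize the softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def total_loss(sem_scores, sem_labels, depth_pred, depth_target, lam=0.1):
    """L = L_sem + lam * L_depth, with an L1 depth term."""
    l_sem = cross_entropy(sem_scores, sem_labels)
    l_depth = np.abs(depth_pred - depth_target).mean()
    return l_sem + lam * l_depth


loss = total_loss(
    sem_scores=np.zeros((2, 2)),       # uniform predictions over two classes
    sem_labels=np.array([0, 1]),
    depth_pred=np.array([1.0, 2.0]),
    depth_target=np.array([2.0, 4.0]),
)
```

Here the semantic term equals $\ln 2$ (uniform prediction over two classes) and the depth term is $0.1 \times 1.5$.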

4. Model Architecture and Training Regimen

The SuperQuadricOcc backbone processes six surround-view RGB images at each time step, using a ResNet-50 encoder to extract multi-scale features. An initial set of trainable superquadric feature vectors and mean positions is iteratively refined by three Transformer layers (deformable cross-attention over image features and self-attention among the primitive "slots"). Five lightweight MLP heads predict the remaining per-primitive parameters: axis scales, rotation, opacity, semantic logits, and shape exponents.

Training uses a batch size of 6 for 18 epochs on 4 A100 GPUs, with images at a resolution of $256 \times 704$. The superquadric-to-Gaussian shell module, which generates 9 shells with 80 faces per shell (720 Gaussians per superquadric), operates only during training. No explicit temporal information is modeled.
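The shell arithmetic can be checked in a few lines. The sketch assumes the 80 faces come from a single subdivision of an icosahedron (an icosahedron subdivided $n$ times has $20 \cdot 4^n$ faces) and uses the count of 1,600 superquadrics stated in Section 1.

```python
def icosphere_faces(subdivisions):
    """Number of triangular faces of an icosahedron after n midpoint subdivisions."""
    return 20 * 4 ** subdivisions


shells = 9
faces = icosphere_faces(1)                 # one subdivision
per_superquadric = shells * faces          # Gaussians standing in for one primitive
total_training_gaussians = 1600 * per_superquadric
```

Under these assumptions the rasterizer handles on the order of a million Gaussians per scene during training, which gives a sense of why skipping the shell conversion at inference matters.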

5. Efficient Voxelization and Real-Time Inference

During inference, the conversion to Gaussian shells is omitted. Superquadric primitives are directly voxelized onto a 3D grid. For each voxel center $p \in \mathbb{Z}^3$, occupancy and semantic fields are aggregated from nearby superquadrics within a neighborhood $\mathcal{N}(p)$ of radius 5:

$$v_o(p) = \sum_{i \in \mathcal{N}(p)} \exp\big(-f_i(R_i(p - \mu_i))\big) \cdot \sigma_i$$

$$v_c(p) = \sum_{i \in \mathcal{N}(p)} \exp\big(-f_i(R_i(p - \mu_i))\big) \cdot c_i$$

Voxels with $v_o < \tau$, where $\tau = 0.01$, are marked empty; all others take the argmax over the logits in $v_c$. Computation is localized to avoid a full $N \times V$ pass, and rotation matrices are precomputed. The implementation yields 21.5 FPS on an NVIDIA A100 at a peak memory usage of 0.70 GB.
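The localized aggregation can be sketched on the CPU as a scatter from each superquadric into the voxels around it. This is an illustrative sketch only: it assumes centers are given in voxel coordinates, interprets "radius 5" as a cubic window of ±5 voxels around the nearest voxel, and uses `-1` as a hypothetical "empty" label. The reported implementation runs on a GPU and is not reproduced here.

```python
import numpy as np


def voxelize(mu, R, scale, eps1, eps2, sigma, logits, grid_shape, radius=5, tau=0.01):
    """Scatter N superquadrics into a voxel grid.

    mu: (N, 3) centers in voxel coordinates, R: (N, 3, 3) precomputed rotations,
    scale: (N, 3), eps1/eps2/sigma: (N,), logits: (N, C).
    Returns (occupancy grid v_o, label grid with -1 for empty voxels).
    """
    grid_shape = tuple(grid_shape)
    occ = np.zeros(grid_shape)
    sem = np.zeros(grid_shape + (logits.shape[1],))
    offs = np.arange(-radius, radius + 1)
    offsets = np.stack(np.meshgrid(offs, offs, offs, indexing="ij"), axis=-1).reshape(-1, 3)
    for i in range(len(mu)):
        # Candidate voxels: a cubic window around the voxel nearest to the center.
        p = np.round(mu[i]).astype(int) + offsets
        p = p[np.all((p >= 0) & (p < np.array(grid_shape)), axis=1)]
        u = np.abs(((p - mu[i]) @ R[i].T) / scale[i])  # local frame, scaled
        f = (u[:, 0] ** (2 / eps2[i]) + u[:, 1] ** (2 / eps2[i])) ** (eps2[i] / eps1[i])
        f = f + u[:, 2] ** (2 / eps1[i])
        w = np.exp(-f)
        idx = tuple(p.T)
        np.add.at(occ, idx, w * sigma[i])           # v_o accumulation
        np.add.at(sem, idx, w[:, None] * logits[i])  # v_c accumulation
    labels = sem.argmax(axis=-1)
    labels[occ < tau] = -1  # below threshold: empty
    return occ, labels
```

Each primitive touches at most $(2 \cdot 5 + 1)^3 = 1{,}331$ voxels in this sketch, so the cost grows with $N$ times the window size rather than with $N \times V$.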

6. Benchmark Results and Comparative Analysis

On the Occ3D dataset, SuperQuadricOcc achieves a mean Intersection-over-Union (mIoU) of 12.69, an IoU of 33.67, 0.70 GB memory usage, and 21.5 FPS, using 1,600 superquadrics. By comparison:

| Method | mIoU | IoU | Memory (GB) | FPS | Primitive count |
|---|---|---|---|---|---|
| SuperQuadricOcc | 12.69 | 33.67 | 0.70 | 21.5 | 1,600 superquadrics |
| GaussianFlowOcc | 11.98 | 35.85 | 2.85 | 9.6 | 10,000 Gaussians |
| GaussianFlowOcc* | 9.98 | 35.40 | 0.62 | 20.4 | 1,600 Gaussians |

Relative to the 10,000-Gaussian baseline, SuperQuadricOcc attains a 5.9% relative gain in mIoU (12.69 vs. 11.98), 75% lower memory usage, 124% higher inference speed, and an 84% lower primitive count. The expressiveness of superquadrics enables a drastic reduction in model size without degrading semantic accuracy (mIoU), although binary IoU is slightly lower (33.67 vs. 35.85). Competitive or superior 3D occupancy estimation is achieved entirely under self-supervision and without temporal labels.
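The relative figures follow directly from the table rows for SuperQuadricOcc and the 10,000-Gaussian GaussianFlowOcc baseline. A short check, with dictionary keys chosen here for illustration:

```python
def relative_change(new, old):
    """Percentage change of `new` relative to `old`."""
    return 100.0 * (new - old) / old


ours = {"mIoU": 12.69, "memory_gb": 0.70, "fps": 21.5, "primitives": 1600}
baseline = {"mIoU": 11.98, "memory_gb": 2.85, "fps": 9.6, "primitives": 10000}

changes = {key: round(relative_change(ours[key], baseline[key]), 1) for key in ours}
```

Note that the mIoU gain is relative: the absolute difference is 0.71 mIoU points.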

7. Limitations and Future Research Directions

Limitations include slightly lower binary IoU (free/occupied) compared to Gaussian approaches, attributed to a mismatch between the Gaussian-based loss used for supervision and the superquadric evaluation used at inference. Additionally, modeling extremely irregular or concave geometry remains challenging for single superquadrics, and dynamic scene motion is not modeled.

Future work includes investigation of end-to-end differentiable superquadric rendering (removing the need for Gaussian shells), adaptive shell scale and tessellation learning, incorporation of temporal flow or velocity labels for dynamic scenes, extension to multimodal inputs (e.g., LiDAR, radar), and optimization of the primitive count via sparsity priors.

SuperQuadricOcc demonstrates the viability of compact superquadric representations, combined with Gaussian surrogates for differentiable supervision, as an efficient solution for real-time, self-supervised occupancy modeling with state-of-the-art performance (Hayes et al., 21 Nov 2025).
