Superquadric Scene Representation
- Superquadric-based scene representation is a method that encodes 3D environments using parametric volumetric primitives spanning ellipsoids, cuboids, and intermediate forms.
- It leverages the mathematically defined superquadric surface equation and robust estimation pipelines—both probabilistic and deep learning-based—to extract shape and pose from sensory data.
- This approach reduces memory and computation compared to voxel and mesh models, enabling efficient real-time applications in SLAM, mapping, and robotic manipulation.
A superquadric-based scene representation encodes 3D environments as compositions of parametric volumetric primitives—superquadrics—that span a continuous family of shapes including ellipsoids, cuboids, cylinders, and various intermediate or pinched forms. This approach centers on the superquadric surface equation, parameterized by scale, shape exponents, and pose, as a compact yet expressive representation of rigid and amorphous scene components. By fitting or predicting these primitives from sensory input, superquadric-based scene representations offer an interpretable, analytically tractable, and memory-efficient alternative to dense voxels, point clouds, or mesh-based models, enabling a spectrum of downstream tasks from SLAM and semantic mapping to robotic manipulation and generative modeling.
1. Mathematical Formulation of Superquadric Primitives
A superquadric in its canonical frame is defined by the implicit inside-outside equation

$$F(x, y, z) = \left( \left|\frac{x}{a_x}\right|^{2/\epsilon_2} + \left|\frac{y}{a_y}\right|^{2/\epsilon_2} \right)^{\epsilon_2/\epsilon_1} + \left|\frac{z}{a_z}\right|^{2/\epsilon_1} = 1,$$

where $a_x, a_y, a_z > 0$ are scale parameters along each axis, and $\epsilon_1, \epsilon_2 > 0$ are roundness or squareness exponents. The parametric form is

$$\mathbf{r}(\eta, \omega) = \begin{pmatrix} a_x \cos^{\epsilon_1}\eta \, \cos^{\epsilon_2}\omega \\ a_y \cos^{\epsilon_1}\eta \, \sin^{\epsilon_2}\omega \\ a_z \sin^{\epsilon_1}\eta \end{pmatrix},$$

with angular parameters $\eta \in [-\pi/2, \pi/2]$, $\omega \in [-\pi, \pi]$ (Fedele et al., 1 Apr 2025).
The full world-frame embedding attaches a rigid transformation (rotation $\mathbf{R} \in SO(3)$, translation $\mathbf{t} \in \mathbb{R}^3$), yielding an 11-parameter state vector per primitive (3 scales, 2 exponents, 6 pose). The exponents control geometric transitions: for example, $\epsilon_1 = \epsilon_2 = 1$ yields ellipsoids, while exponents approaching $0$ produce boxy forms and values above $2$ produce pinched, star-like forms; intermediate settings interpolate between spherical, cylindrical, and cuboidal shapes (Tschopp et al., 2021, Fedele et al., 1 Apr 2025).
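The canonical-frame formulation reduces to a few lines of code. The snippet below is a minimal NumPy sketch (function names are illustrative) that evaluates the inside-outside function and samples a point on the parametric surface, using the signed-power convention standard in the superquadric literature:

```python
import numpy as np

def superquadric_implicit(p, scale, eps):
    """Inside-outside function F(p): F < 1 inside, F = 1 on the surface,
    F > 1 outside, for a superquadric in its canonical frame."""
    ax, ay, az = scale
    e1, e2 = eps
    x, y, z = p
    xy = (abs(x / ax) ** (2.0 / e2) + abs(y / ay) ** (2.0 / e2)) ** (e2 / e1)
    return xy + abs(z / az) ** (2.0 / e1)

def superquadric_surface(scale, eps, eta, omega):
    """Parametric surface point for eta in [-pi/2, pi/2], omega in [-pi, pi]."""
    ax, ay, az = scale
    e1, e2 = eps
    def spow(v, e):  # signed power keeps the surface continuous across octants
        return np.sign(v) * np.abs(v) ** e
    x = ax * spow(np.cos(eta), e1) * spow(np.cos(omega), e2)
    y = ay * spow(np.cos(eta), e1) * spow(np.sin(omega), e2)
    z = az * spow(np.sin(eta), e1)
    return np.array([x, y, z])
```

By construction, any point returned by the parametric form satisfies the implicit equation exactly, which is a convenient self-check when implementing fitting residuals.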
2. Primitive Recovery: Fitting and Estimation Pipelines
Superquadric recovery from sensory data (point clouds, depth images, image masks) has been advanced by both probabilistic and deep learning-based approaches.
- Probabilistic Fitting: The Gaussian–uniform mixture model (GUM) from the EMS algorithm (Liu et al., 2021) frames superquadric recovery as maximum-likelihood estimation under explicit modeling of inlier noise and outliers. The fit alternates between computing soft inlier weights and optimizing shape/pose parameters with a trust-region reflective optimizer, augmented by symmetry-based parameter switching to mitigate local minima.
- Feed-forward Neural Estimation: Methods such as SuperDec (Fedele et al., 1 Apr 2025) and Mask R-CNN + CNN regressors (Šircelj et al., 2020) predict superquadric parameters from point clouds or segmented depth images using deep architectures, often incorporating transformers for query–point interaction and further optimization via differentiable bidirectional distance fitting (Levenberg–Marquardt refinement).
- Multi-view and Image-only Fitting: For monocular SLAM, superquadric shape and pose are retrieved from multi-view 2D semantic masks using multi-stage numeric and analytic optimization, with centroid triangulation, PCA-based orientation estimation, and silhouette- or point-to-surface losses, often followed by bundle adjustment (Tschopp et al., 2021, Han et al., 2022).
These systems demonstrate robustness to noise, partial occlusion, and outliers, and enable efficient decomposition of complex scenes into a sparse set of interpretable primitives (Liu et al., 2021, Fedele et al., 1 Apr 2025).
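The core optimization step these pipelines share can be sketched as follows. This is a simplified illustration, not the EMS algorithm itself: it fits only scale and shape exponents of a canonical-frame superquadric with SciPy's trust-region reflective solver, omitting pose estimation, the Gaussian–uniform inlier/outlier weighting, and symmetry-based parameter switching; the algebraic residual $F^{\epsilon_1/2} - 1$ is a common but assumed choice.

```python
import numpy as np
from scipy.optimize import least_squares

def implicit(points, ax, ay, az, e1, e2):
    """Vectorized inside-outside function F for an (N, 3) point array."""
    x, y, z = points.T
    return (np.abs(x / ax) ** (2 / e2) + np.abs(y / ay) ** (2 / e2)) ** (e2 / e1) \
        + np.abs(z / az) ** (2 / e1)

def fit_superquadric(points):
    """Fit (ax, ay, az, e1, e2) by minimizing the algebraic residual
    F^(e1/2) - 1 over all points, with bounds keeping exponents valid."""
    def residuals(theta):
        ax, ay, az, e1, e2 = theta
        return implicit(points, ax, ay, az, e1, e2) ** (e1 / 2.0) - 1.0
    x0 = np.ones(5)                      # unit sphere as initial guess
    lb = [1e-3, 1e-3, 1e-3, 0.1, 0.1]    # lower bounds on scales/exponents
    ub = [10.0, 10.0, 10.0, 2.0, 2.0]
    return least_squares(residuals, x0, bounds=(lb, ub), method="trf").x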
3. Occupancy Modeling and Probabilistic Scene Fields
Superquadric primitives can parameterize dense semantic and geometric fields using probabilistic models.
- Probabilistic Mixture: QuadricFormer (Zuo et al., 12 Jun 2025) models each superquadric as a local occupancy probability field $o_i(\mathbf{x})$, with semantics obtained via mixture weighting across primitives:

$$\hat{\mathbf{s}}(\mathbf{x}) = \sum_i \frac{\alpha_i\, o_i(\mathbf{x})}{\sum_j \alpha_j\, o_j(\mathbf{x})}\, \mathbf{c}_i,$$

where $\alpha_i$ is opacity and $\mathbf{c}_i$ a semantic class vector. The differentiability of this construction enables direct integration with neural scene networks.
- Real-Time Occupancy (Gaussian Approximations): SuperQuadricOcc (Hayes et al., 21 Nov 2025) enables GPU-accelerated occupancy prediction by approximating each superquadric with multi-layer icosphere-tessellated Gaussian mixtures. This approach allows rapid rasterization for both self-supervised training (projection to 2D views) and real-time 3D inference using an efficient CUDA voxelization kernel.
Superquadric fields reduce the number of required primitives by over 80% relative to Gaussians (Hayes et al., 21 Nov 2025, Zuo et al., 12 Jun 2025), with corresponding reductions in memory footprint and inference time.
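The occupancy-weighted semantic mixture can be sketched concretely. In the snippet below, the sigmoid falloff, the `sharpness` parameter, and the exact weighting are illustrative assumptions rather than any paper's published kernel; the structure, opacity-weighted soft occupancy per primitive normalized across the mixture, follows the formulation above.

```python
import numpy as np

def local_occupancy(x, center, scale, eps, sharpness=10.0):
    """Soft occupancy of one axis-aligned superquadric: near 1 inside
    (F < 1), decaying outside. Sigmoid falloff is an assumed kernel."""
    d = np.asarray(x) - np.asarray(center)
    e1, e2 = eps
    f = (np.abs(d[0] / scale[0]) ** (2 / e2)
         + np.abs(d[1] / scale[1]) ** (2 / e2)) ** (e2 / e1) \
        + np.abs(d[2] / scale[2]) ** (2 / e1)
    return 1.0 / (1.0 + np.exp(np.clip(sharpness * (f - 1.0), -50.0, 50.0)))

def mixture_semantics(x, prims):
    """Occupancy-weighted semantic mixture over primitives, where each
    entry of prims is (center, scale, (eps1, eps2), alpha, class_probs)."""
    num, den = 0.0, 1e-12  # denominator guard for queries in free space
    for center, scale, eps, alpha, c in prims:
        w = alpha * local_occupancy(x, center, scale, eps)
        num = num + w * np.asarray(c)
        den += w
    return num / den
```

A query inside one primitive recovers that primitive's class vector almost exactly, while queries between overlapping primitives blend semantics smoothly, which is what makes the construction differentiable end to end.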
4. Integration into SLAM, Mapping, and Robotics
Superquadric-based scene representations have been adopted for both SLAM and robotic manipulation:
- Semantic/Object-centric SLAM: Both optimization-based (Tschopp et al., 2021, Han et al., 2022) and monocular/deep methods (Fedele et al., 1 Apr 2025) leverage superquadric landmarks for improved object-level anchoring, efficient bundle adjustment, and enhanced 3D IoU over quadrics or cuboids. Data association across frames is facilitated by geometric, semantic, and feature-based matching strategies. Instances are typically parameterized minimally (position, yaw, scale, exponents), yielding high real-time tracking speed.
- Manipulation and Grasp Synthesis: Superquadrics enable closed-form computation of surface curvature, antipodal contact points, and approach vectors for grasp planning. For bin picking and cluttered scenes, frameworks such as RGBSQGrasp infer superquadric primitives and guide grasp pose sampling, filtering candidates by collision checking and surface fit quality, and achieve up to 92% grasp success rates in real-world experiments (Xu et al., 4 Mar 2025). Mirror-based completion is leveraged for partial-view data (Makhal et al., 2017).
- Path Planning and Collision Checking: Superquadric primitives serve as analytic collision bodies in motion planning. SuperDec demonstrates path-planning feasibility and memory efficiency against point cloud and cuboid models, enabling practical deployment in robotics (Fedele et al., 1 Apr 2025).
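The analytic collision-body usage above can be sketched in a few lines: transform a query point into the primitive's canonical frame and evaluate the inside-outside function. The function name and `margin` parameter are illustrative; an inflated margin (> 1) gives a conservative clearance test for planning.

```python
import numpy as np

def point_in_superquadric(p_world, R, t, scale, eps, margin=1.0):
    """Analytic point-collision test against one posed superquadric:
    map p_world into the canonical frame via the inverse rigid transform
    (R, t), then check the inside-outside function F < margin."""
    p = R.T @ (np.asarray(p_world) - np.asarray(t))
    e1, e2 = eps
    f = (np.abs(p[0] / scale[0]) ** (2 / e2)
         + np.abs(p[1] / scale[1]) ** (2 / e2)) ** (e2 / e1) \
        + np.abs(p[2] / scale[2]) ** (2 / e1)
    return f < margin
```

Because each primitive is a single closed-form test, checking a robot's sampled trajectory against a scene of a few dozen superquadrics is far cheaper than querying a dense point cloud or mesh.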
5. Expressiveness, Efficiency, and Limitations
Superquadric-based scene representations offer key advantages:
| Representation | Parameter Count | Shape Diversity | Scene Modularity | Typical Primitives (per room) |
|---|---|---|---|---|
| Voxels/Meshes | High | Arbitrary | Low | 10⁶–10⁸ |
| Gaussians | Moderate–High | Ellipsoidal | Low–Medium | 10³–10⁴ |
| Cuboids | Low | Rectilinear | Moderate | 10–100 |
| Superquadrics | Low | Very High | High | 10–150 |
Superquadrics interpolate between spheres, cylinders, ellipsoids, cuboids, wedges, and star-shaped forms, enabling high scene fidelity with very sparse representations (Zuo et al., 12 Jun 2025, Fedele et al., 1 Apr 2025).
However, limitations include difficulties modeling extreme non-convexities, thin-walled or multiply-connected geometries, and highly articulated objects. Mixtures or hierarchies of superquadrics, or hybridization with explicit mesh- or point-based models, are indicated for such regimes (Liu et al., 2021, Makhal et al., 2017).
6. Downstream Applications and Qualitative Behaviors
Applications enabled by superquadric-based scene representations include:
- Controllable Scene Generation: Editing underlying superquadric graphs propagates to generative 2D models (e.g., depth-conditioned ControlNets with Stable Diffusion), facilitating geometric scene editing and semantic style transfer without retraining (Fedele et al., 1 Apr 2025).
- Differentiable Rendering and Simulation: Approaches such as Differentiable Blocks World fit textured superquadric meshes directly from multi-view image collections using differentiable rasterization, supporting interpretable, actionable, and editable scene graphs suitable for downstream physics or graphics tasks (Monnier et al., 2023).
- Occupancy Prediction for Automated Driving: SuperQuadricOcc and QuadricFormer establish new state-of-the-art semantic 3D occupancy results on nuScenes and Occ3D with a small number of primitives and dramatically reduced runtime—mIoU of 20.04–20.12% with 1600 superquadrics vs. 18.73% for Gaussian-based methods at higher compute cost (Hayes et al., 21 Nov 2025, Zuo et al., 12 Jun 2025).
- Robustness to Noise and Occlusion: Probabilistic and learning-based superquadric fitting methods systematically outperform least-squares or radial-only approaches under high noise, outlier ratios, and partial observation, recording lower surface distance error and higher inlier recall (Liu et al., 2021, Šircelj et al., 2020).
7. Comparative Evaluation and Future Directions
Recent quantitative studies report that superquadric-based models reduce Chamfer distance and primitive count versus state-of-the-art cuboid and Gaussian splatting representations (e.g., SuperDec achieves L₂ Chamfer ≈ 0.05 × 10⁻³ with 5.8 primitives per object, vs. SQ’s 0.28 with 10 primitives) (Fedele et al., 1 Apr 2025). In robotics, real-world grasping and path-planning tasks confirm practical gains in accuracy, speed, and generalization to unseen object instances (Xu et al., 4 Mar 2025, Fedele et al., 1 Apr 2025).
Ongoing research explores the integration of superquadric primitives in temporally consistent, multi-view frameworks; hybrid representations for fine-grained geometry; and neural implicit or photometric optimization in scene decomposition (Hayes et al., 21 Nov 2025, Monnier et al., 2023). A persistent technical challenge is extending the expressive power to complex topologies while maintaining the interpretability and efficiency that distinguish superquadric-based scene representations.