
REArtGS++: 3D Articulated Object Reconstruction

Updated 28 November 2025
  • REArtGS++ is a 3D articulated object reconstruction framework that integrates planar Gaussian splatting, decoupled screw motions, and temporal geometry constraints to achieve accurate part-level reconstructions.
  • It employs a differentiable rendering pipeline with temporal consistency and motion blending to ensure physically plausible animations from sparse, two-state, multi-view RGB inputs.
  • The framework supports general screw motions without joint-type priors and seamlessly integrates with mesh-based workflows, enabling interactive editing and high-performance ray tracing.

REArtGS++ is a 3D articulated object reconstruction and rendering framework that integrates planar Gaussian splatting, decoupled screw motion models, and temporal geometry constraints. As an evolution of REArtGS and a parallel development to REdiSplats, REArtGS++ advances the ability to reconstruct, animate, and render part-aware articulated objects from minimal (two-state) multi-view RGB supervision while supporting physically plausible motion, temporal consistency, and integration with mesh-based graphics workflows (Wu et al., 21 Nov 2025).

1. Problem Formulation and Conceptual Foundations

REArtGS++ addresses the articulated surface reconstruction and part-level kinematic estimation problem given two collections of multi-view RGB images $\{I_0^v\}_{v=1}^V$ and $\{I_1^v\}_{v=1}^V$ of an object in two different states ($t=0$, $t=1$). The primary objectives are: (a) reconstruct the geometry $\{S_j\}_{j=1}^k$ of each rigid part, and (b) estimate the joint-level screw motion parameters $\{\omega_j\}_{j=1}^k$.

The framework seeks parameters that minimize the aggregate photometric loss across both states and views via a differentiable rendering operator $\Pi_v$:

$$\bigl\{\hat S_j,\ \hat\omega_j\bigr\} = \arg\min_{S_j,\,\omega_j} \sum_{t\in\{0,1\}}\sum_{v=1}^V \mathcal{L}_{\rm photo}\Bigl(I_t^v,\ \Pi_v\bigl(\{S_j\},\{\omega_j\},t\bigr)\Bigr)$$

REArtGS++ overcomes two limitations of REArtGS (Wu et al., 9 Mar 2025): (1) it supports general screw motions (no joint-type prior; not just revolute or prismatic), and (2) it enforces geometry consistency across the continuous motion path ($t \in [0,1]$) via temporal regularization, preventing "drift" in unseen configurations (Wu et al., 21 Nov 2025).

2. Decoupled Screw Motion and Planar Gaussian Parameterization

REArtGS++ models each rigid part's motion by a decoupled screw transformation in $\mathrm{SE}(3)$. Each joint's kinematic parameters $\omega_j$ comprise a unit axis $\mathbf{u}_j$, its moment $\mathbf{m}_j$, a rotation angle $\theta_j$, and a translation $d_j$. The $t$-dependent transform is:

$$\begin{aligned} R_j(t) &= \exp\bigl([\mathbf{u}_j]_\times\,\theta_j(t)\bigr) \\ p_j(t) &= \bigl(I - R_j(t)\bigr)(\mathbf{u}_j \times \mathbf{m}_j) + \mathbf{u}_j\, d_j(t) \end{aligned}$$

with $\theta_j(t) = (t - 0.5)\,\hat{\theta}_j / 0.5$ and $d_j(t) = (t - 0.5)\,\hat{d}_j / 0.5$.
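As a minimal sketch of this transform (using Rodrigues' formula for the matrix exponential; function and argument names are illustrative, not the paper's API):

```python
import numpy as np

def screw_transform(u, m, theta_hat, d_hat, t):
    """Decoupled screw motion at time t: rotation R_j(t) about the unit
    axis `u` with moment `m`, plus prismatic sliding along the axis.
    `theta_hat`/`d_hat` are the total rotation/translation between states."""
    # Linear time parameterization centered at the canonical pose t = 0.5.
    theta = (t - 0.5) * theta_hat / 0.5
    d = (t - 0.5) * d_hat / 0.5
    # Rodrigues' formula for R = exp([u]_x * theta).
    K = np.array([[0.0, -u[2], u[1]],
                  [u[2], 0.0, -u[0]],
                  [-u[1], u[0], 0.0]])
    R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    # Translation induced by off-axis rotation plus sliding along the axis.
    p = (np.eye(3) - R) @ np.cross(u, m) + u * d
    return R, p
```

Note that at $t = 0.5$ the transform reduces to the identity, so the canonical pose sits halfway between the two observed states.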

The object geometry is encoded as a set of $N$ anisotropic Gaussians $\mathcal{G}_i = \{\mu_i, \Sigma_i, w_i, c_i\}$, typically with $\Sigma_i = R_i S_i^2 R_i^T$ and $S_i = \mathrm{diag}(\varepsilon, s_{i,2}, s_{i,3})$, where $\varepsilon \ll 1$ enforces the planar constraint. Each Gaussian is also assigned a probability $m_{i,j}$ of belonging to part $j$.

The planar nature is enforced by minimizing the smallest eigenvalue of each covariance:

$$\mathcal{L}_{\rm planar} = \sum_{i=1}^N \min\bigl(\lambda_1(\Sigma_i), \lambda_2(\Sigma_i), \lambda_3(\Sigma_i)\bigr)$$

ensuring each Gaussian represents a small, oriented surface patch suitable for high-fidelity normal and depth recovery (Wu et al., 21 Nov 2025).
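The loss can be sketched in a few lines, since the smallest eigenvalue of a symmetric covariance is directly available (this is an illustrative implementation, not the paper's code):

```python
import numpy as np

def planar_loss(covariances):
    """L_planar: sum of each covariance's smallest eigenvalue.
    Driving this toward zero flattens each Gaussian into an oriented
    surface patch. `covariances` has shape (N, 3, 3), symmetric PSD."""
    eigvals = np.linalg.eigvalsh(covariances)  # ascending order per matrix
    return eigvals[:, 0].sum()
```

For a covariance $\Sigma = R S^2 R^T$ with $S = \mathrm{diag}(\varepsilon, s_2, s_3)$, the smallest eigenvalue is exactly $\varepsilon^2$, so the loss directly penalizes thickness along the normal direction.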

3. Differentiable Rendering and Temporal Consistency

Rendering utilizes volume alpha-blending along rays using the transparency and color contributed by all intersected Gaussians:

$$C(\rho) = \sum_{i=1}^{N_\rho} c_i\,\alpha_i \prod_{j<i}(1-\alpha_j), \qquad \alpha_i = 1 - \exp\bigl(-w_i\, T_i(\rho)\bigr)$$

where $T_i(\rho)$ is the ray integral over the $i$-th Gaussian (Wu et al., 21 Nov 2025, Wu et al., 9 Mar 2025).
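The compositing sum has a simple front-to-back accumulation form. A minimal sketch, assuming the per-Gaussian opacities $\alpha_i$ have already been evaluated from $w_i$ and $T_i(\rho)$:

```python
import numpy as np

def composite_ray(colors, opacities):
    """Front-to-back alpha compositing over the Gaussians a ray hits,
    in depth order. `colors` is (N, 3); `opacities` holds the
    alpha_i = 1 - exp(-w_i * T_i(rho)) values for the same Gaussians."""
    transmittance = 1.0          # product of (1 - alpha_j) for j < i
    out = np.zeros(3)
    for c, a in zip(colors, opacities):
        out += transmittance * a * np.asarray(c, dtype=float)
        transmittance *= (1.0 - a)
    return out
```

An opaque first Gaussian ($\alpha = 1$) fully occludes everything behind it, matching the transmittance product in the formula.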

Temporal geometry consistency is enforced using a first-order Taylor expansion. For rendered depth $D(\rho, t)$ and normal $N(\rho, t)$ at pixel $\rho$ and time $t$, the normal at arbitrary $t$ is approximated as:

$$N(\rho, t) \approx N(\rho, t_0) + \left.\frac{\partial N(\rho, t)}{\partial t}\right|_{t_0}(t - t_0)$$

with the time derivative numerically estimated by finite differences around the canonical pose $t^* = 0.5$. The temporal geometry loss is:

$$\mathcal{L}_{\rm geo} = \sum_{t_0 \in \{0,1\}} \bigl(1-\|\nabla I(\rho)\|\bigr) \Bigl\| \bigl[\bar N(\rho, t_0) - N(\rho, t_0)\bigr] + \bigl[\nabla_t \bar N - \nabla_t N\bigr] \Bigr\|_1$$

which encourages consistent normal directions and motion across time, especially at surface boundaries (Wu et al., 21 Nov 2025).
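The finite-difference estimate of the temporal derivative can be sketched as follows; `render_normal` is a stand-in for the differentiable normal renderer (an assumed interface, not the paper's API), and the step size `h` is an illustrative choice:

```python
import numpy as np

def temporal_normal_derivative(render_normal, rho, t_star=0.5, h=1e-2):
    """Central finite-difference estimate of dN/dt at the canonical
    pose t* = 0.5, feeding the first-order Taylor consistency term.
    `render_normal(rho, t)` returns the rendered normal at pixel rho."""
    return (render_normal(rho, t_star + h)
            - render_normal(rho, t_star - h)) / (2.0 * h)
```

Centering the difference at $t^* = 0.5$ keeps both evaluation points inside the supervised motion range $[0, 1]$.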

4. Optimization Objective, Motion Blending, and Implementation

The global objective aggregates multiple loss terms:

$$\mathcal{L} = \lambda_{\rm render}\,\mathcal{L}_{\rm render} + \lambda_{\rm planar}\,\mathcal{L}_{\rm planar} + \lambda_{\rm geo}\,\mathcal{L}_{\rm geo} + \lambda_{\rm vote}\,\mathcal{L}_{\rm vote} + \lambda_{\rm center}\,\mathcal{L}_{\rm center}$$

where $\mathcal{L}_{\rm render}$ is an L$_1$ + D-SSIM photometric loss, $\mathcal{L}_{\rm vote}$ regularizes the soft segmentation $m_{i,j}$, and $\mathcal{L}_{\rm center}$ penalizes deviations of part means (Wu et al., 21 Nov 2025).

Motion blending of Gaussians is realized as:

$$\mu_i(t) = \sum_{j=1}^k m_{i,j} \bigl(R_j(t)(\mu_i - o_j) + o_j + p_j(t)\bigr)$$

ensuring Gaussians remain spatially coupled to part motions throughout optimization and during inference.
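The blend is a membership-weighted sum of per-part rigid transforms applied to each center. A vectorized sketch (array names and shapes are assumptions for illustration):

```python
import numpy as np

def blend_gaussian_centers(mu, memberships, rotations, pivots, translations):
    """Soft motion blending of Gaussian centers.
    mu: (N, 3) centers; memberships: (N, k) soft part probabilities m_ij;
    rotations: (k, 3, 3) R_j(t); pivots: (k, 3) part origins o_j;
    translations: (k, 3) p_j(t)."""
    out = np.zeros_like(mu)
    for j in range(memberships.shape[1]):
        # Rigid transform of all centers under part j's motion.
        moved = (mu - pivots[j]) @ rotations[j].T + pivots[j] + translations[j]
        out += memberships[:, [j]] * moved  # weight by m_ij, broadcast over xyz
    return out
```

With hard (one-hot) memberships this reduces to rigid per-part motion; soft memberships let gradients flow into the part assignment during optimization.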

Implementation employs an initialization phase fitting static/dynamic Gaussians from the two input states, K-means clustering for part identification, and roughly 30K Adam iterations. At test time, joint parameters and part-Gaussian associations enable rendering at arbitrary $t \in [0, 1]$ and mesh extraction via TSDF fusion (Wu et al., 21 Nov 2025).

5. Ray-Tracing and Editability in REArtGS++

REArtGS++ as implemented in the REdiSplats paradigm further leverages flat Gaussian primitives as explicit triangle meshes, enabling high-performance ray tracing, mesh-based interactions, and full compatibility with 3D editing tools (Byrski et al., 15 Mar 2025). Each planar Gaussian is represented both as a covariance function and as a mesh polygon (typically $n=8$ sides) with vertices:

$$P_{i,k} = R_i S_i\, v_{i,k}^{(\rm local)} + m_i$$

This enables efficient triangle intersection tests and subsequent probabilistic blending for physically realistic rendering, including shadows and light transport.
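The Gaussian-to-polygon conversion above can be sketched as follows. The local vertices lie on a circle in the plane spanned by the two non-degenerate scale axes; the `scale` factor (how many standard deviations the polygon covers) is an assumed choice, not taken from the paper:

```python
import numpy as np

def gaussian_polygon(mean, R, S, n=8, scale=2.0):
    """Vertices P_k = R S v_k + mean of the n-gon approximating a flat
    Gaussian's footprint. S = diag(eps, s2, s3) with eps << 1, so the
    local circle lives in the (y, z) plane orthogonal to the normal."""
    angles = 2.0 * np.pi * np.arange(n) / n
    # Local polygon vertices on a circle of radius `scale` (in std-devs).
    v_local = scale * np.stack(
        [np.zeros(n), np.cos(angles), np.sin(angles)], axis=1)  # (n, 3)
    return v_local @ (R @ S).T + mean
```

Because the vertices are an affine function of $(m_i, R_i, S_i)$, the mapping is invertible in the least-squares sense, which is what makes vertex-level mesh edits recoverable as Gaussian parameter updates.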

Editing is performed by directly manipulating mesh vertices and solving for the updated $(m_i', R_i', S_i')$ from the warped vertex locations, guaranteeing that mesh edits produce consistent updates to the Gaussian parameters (Byrski et al., 15 Mar 2025).

6. Empirical Results and Comparative Performance

REArtGS++ demonstrates superior performance on standard articulated reconstruction benchmarks:

| Dataset | CD-w ↓ | Joint error (deg / cm) | Part-motion error | Relative drop vs. REArtGS |
|---|---|---|---|---|
| Synthetic/Real | 20–30% lower | <1° / <1 cm | 0.2° / 0.01 m | Yes |

Especially notable are results on screw-joint and multi-part objects, where prior methods degrade severely. Average angular errors and axis position errors remain under 1° and 1 cm, respectively. Qualitatively, sharper boundaries and temporally smooth interpolations are observed (Wu et al., 21 Nov 2025).

In the REdiSplats regime, REArtGS++ matches or exceeds state-of-the-art SSIM/PSNR/LPIPS for novel-view synthesis (see Table 1 in (Byrski et al., 15 Mar 2025)), with $n=8$ polygon sides providing a balance between accuracy and rendering speed.

7. Contributions, Limitations, and Future Directions

REArtGS++ advances articulated object reconstruction through:

  1. A decoupled screw-motion SE(3) formulation with no joint-type prior, handling general revolute, prismatic, and screw motions.
  2. Part-aware planar Gaussian splats, conferring clean surface normals and robust per-part geometry.
  3. Temporal geometric regularization by Taylor expansion, ensuring motion-consistent reconstructions.
  4. Local voting regularizers for part assignment robustness.
  5. Integration of physically accurate ray tracing and mesh editability, enabling algorithmic and interactive manipulation (Wu et al., 21 Nov 2025, Byrski et al., 15 Mar 2025).

Limitations include reduced performance on highly transparent/refractive surfaces and reliance on accurate camera-pose alignment between states. Prospective directions involve learned depth priors, extension to multi-frame video and deformable linkages, and expanding differentiable rendering for complex material properties.

REArtGS++ thus establishes a comprehensive, generalizable methodology for articulated object reconstruction and part-level motion estimation from sparse multiview supervision, providing both state-of-the-art accuracy and practical integration with modern 3D graphics pipelines.
