MoE-GS for Dynamic Gaussian Splatting
- The paper presents MoE-GS, a framework that integrates multiple specialized 3D Gaussian Splatting models using adaptive routing to improve dynamic scene reconstruction fidelity.
- It employs a novel Volume-aware Pixel Router to project Gaussian-level gating weights into pixel space, ensuring coherent expert blending across spatial and temporal dimensions.
- Efficiency strategies like single-pass rendering, gate-aware pruning, and knowledge distillation optimize computational cost while maintaining high visual quality.
Mixture of Experts for Dynamic Gaussian Splatting (MoE-GS) is an advanced framework designed to address the limitations of traditional dynamic scene reconstruction by integrating multiple specialized 3D Gaussian Splatting (3DGS) models via a novel adaptive routing mechanism. In the context of dynamic 3D scene synthesis and rendering, MoE-GS improves reconstruction fidelity and robustness to diverse scene characteristics, dynamically blending the strengths of several expert models to accommodate spatial and temporal variability. The core technical contribution is the Volume-aware Pixel Router, which projects learned Gaussian-level gating weights into pixel space using differentiable weight splatting, enabling spatially and temporally coherent expert blending. The framework further introduces single-pass multi-expert rendering, gate-aware Gaussian pruning, and a distillation strategy, ensuring competitive efficiency despite the increased capacity of MoE architectures (Jin et al., 22 Oct 2025).
1. Motivation and Limitations of Previous Approaches
Dynamic Gaussian Splatting methods have become central for real-time, high-fidelity dynamic 3D scene reconstruction, but prior models exhibit three primary deficiencies: lack of consistent performance across heterogeneous scenes; spatial inconsistency, where differing regions require different modeling strengths; and inadequate handling of temporal fluctuation, as frame-by-frame dynamics vary substantially. Existing methods typically rely on a single 3DGS model or statically partition scene components, which leads to either underfitting (in complex, rapidly changing zones) or model redundancy (in more static regions) and consequent inefficiency in both training and inference (Guo et al., 18 Mar 2024, Lee et al., 21 Oct 2024).
MoE-GS addresses these issues by leveraging multiple experts—each tailored to specific deformation, motion profiles, or appearance regimes—and a dynamic, volumetric-to-pixel gating process to blend their outputs contextually.
2. MoE-GS Architecture and Volume-aware Pixel Router
The principal innovation of MoE-GS is the Volume-aware Pixel Router, a differentiable mechanism for mapping per-Gaussian routing weights into pixel space. Each expert is a separately optimized 3DGS model specializing in particular spatial–temporal features (e.g., rapid non-rigid motion, static background, fine texture detail).
Per-Gaussian weights encode both intrinsic properties (position, rotation, scale, opacity) and contextual dependencies (viewing direction, time signal):

$$w_i = f_\theta\big(\mu_i,\, q_i,\, s_i,\, \sigma_i;\ \mathbf{d},\, t\big),$$

where $\mu_i$, $q_i$, $s_i$, and $\sigma_i$ denote the $i$-th Gaussian's position, rotation, scale, and opacity. These weights are rasterized onto the image plane using differentiable splatting, and the resulting pixel-level weight map $w(p)$ is refined via a lightweight MLP with directional and temporal embeddings:

$$\tilde{w}(p) = \mathrm{MLP}\big(w(p),\ \gamma(\mathbf{d}),\ \gamma(t)\big),$$

where $\mathbf{d}$ denotes the viewing direction and $\gamma(\cdot)$ the embedding function. The final expert selection probabilities at each pixel $p$ are computed as a softmax over the refined weights:

$$\pi_e(p) = \frac{\exp\big(\tilde{w}_e(p)\big)}{\sum_{e'=1}^{E} \exp\big(\tilde{w}_{e'}(p)\big)}.$$

The MoE-GS output at pixel $p$ is then

$$C(p) = \sum_{e=1}^{E} \pi_e(p)\, C_e(p),$$

where $C_e(p)$ is the color rendered by expert $e$.
This design ensures that the gating decision is informed by volumetric properties and projected onto the appropriate image pixels, adapting to both spatial region and dynamic context.
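To make the routing pipeline concrete, the following PyTorch sketch implements the refine-and-softmax stage under stated assumptions: the differentiable weight splatting is presumed to have already produced a per-pixel weight map, the embeddings $\gamma(\cdot)$ are folded into a raw-input MLP for brevity, and all names and tensor shapes (`VolumeAwarePixelRouter`, `splatted_w`, etc.) are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn

class VolumeAwarePixelRouter(nn.Module):
    """Sketch: refine splatted per-pixel expert weights with view/time
    context, then normalize into expert selection probabilities."""

    def __init__(self, num_experts: int, hidden_dim: int = 16):
        super().__init__()
        # Lightweight MLP over (splatted weights, view direction, time).
        self.mlp = nn.Sequential(
            nn.Linear(num_experts + 3 + 1, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_experts),
        )

    def forward(self, splatted_w, view_dir, t):
        # splatted_w: (H, W, E) weights from differentiable weight splatting
        # view_dir:   (H, W, 3) per-pixel viewing directions
        # t:          scalar timestamp, broadcast to every pixel
        H, W, _ = splatted_w.shape
        t_map = splatted_w.new_full((H, W, 1), float(t))
        logits = self.mlp(torch.cat([splatted_w, view_dir, t_map], dim=-1))
        return torch.softmax(logits, dim=-1)   # pi_e(p), shape (H, W, E)

# Blend per-expert renders C_e(p) into the final image C(p).
router = VolumeAwarePixelRouter(num_experts=3)
pi = router(torch.rand(64, 64, 3), torch.rand(64, 64, 3), t=0.5)
expert_rgb = torch.rand(64, 64, 3, 3)                   # (H, W, E, RGB)
final_rgb = (pi.unsqueeze(-1) * expert_rgb).sum(dim=2)  # (H, W, RGB)
```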
3. Efficiency Strategies: Single-Pass Rendering and Gate-aware Pruning
Since MoE architectures typically incur greater computational cost than single-expert models, MoE-GS incorporates two optimizations.
Single-Pass Multi-Expert Rendering: All Gaussians are processed in one rasterization pass, each tagged with a one-hot expert identity, so the rendering equation avoids repeated per-expert rasterization:

$$C_e(p) = \sum_{i \in \mathcal{N}(p)} \mathbb{1}[e_i = e]\; c_i\, \alpha_i\, T_i,$$

where $T_i$ is the cumulative transmittance, $\alpha_i$ the opacity, $c_i$ the color, and $\mathbb{1}[e_i = e]$ identifies expert $e$'s Gaussians among those overlapping pixel $p$. This approach computes all expert outputs in parallel, separating them only at the alpha blending stage.
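A simplified per-pixel illustration of the single-pass scheme, assuming Gaussians arrive pre-sorted front to back; transmittance is tracked per expert here so that each accumulator matches what that expert would render alone, which is an implementation choice the text above does not pin down:

```python
import torch

def composite_single_pass(colors, alphas, expert_ids, num_experts):
    """Front-to-back alpha compositing at one pixel, accumulating every
    expert's color C_e(p) in a single pass over the tagged Gaussians.

    colors:     (N, 3) Gaussian colors, sorted front to back
    alphas:     (N,)   projected per-Gaussian opacities
    expert_ids: (N,)   integer expert tag e_i per Gaussian
    """
    C = torch.zeros(num_experts, 3)  # per-expert accumulated color
    T = torch.ones(num_experts)      # per-expert cumulative transmittance
    for c, a, e in zip(colors, alphas, expert_ids):
        C[e] += T[e] * a * c         # 1[e_i = e] * c_i * alpha_i * T_i
        T[e] *= 1.0 - a              # only expert e's Gaussians occlude C_e
    return C                         # separated only at alpha blending
```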
Gate-aware Gaussian Pruning: To mitigate model redundancy, the router accumulates the gradient magnitude of the gating weights with respect to each Gaussian's parameters across the dataset:

$$I_i = \sum_{(\mathbf{d},\, t) \in \mathcal{D}} \left\| \frac{\partial w_i(\mathbf{d}, t)}{\partial \theta_i} \right\|,$$

where $\theta_i$ collects the $i$-th Gaussian's parameters and $\mathcal{D}$ ranges over training views and timestamps. Gaussians whose accumulated importance $I_i$ falls below a threshold are pruned during training, preserving fidelity while reducing memory and compute.
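A minimal sketch of the pruning bookkeeping, assuming importance is the accumulated gradient magnitude of the gating weights with respect to each Gaussian's parameters; the function names and the reduction over parameter dimensions are assumptions:

```python
import torch

def accumulate_gate_importance(gaussian_params, gate_weights, importance):
    """Accumulate |d w / d theta_i| per Gaussian across training views.

    gaussian_params: (N, P) per-Gaussian parameters, requires_grad=True
    gate_weights:    (N, E) gating weights computed from those parameters
    importance:      (N,)   running importance buffer I_i, updated in place
    """
    grads = torch.autograd.grad(
        gate_weights.sum(), gaussian_params, retain_graph=True
    )[0]                                            # (N, P) gradient
    importance += grads.abs().sum(dim=-1).detach()  # reduce over parameters
    return importance

def prune_mask(importance, threshold):
    # Gaussians whose accumulated importance stays below the threshold
    # are dropped; the rest are kept for subsequent training.
    return importance >= threshold
```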
4. Knowledge Distillation for Lightweight Deployment
MoE-GS employs a distillation procedure that transfers the fused model's performance to individual experts, supporting lightweight inference without changes to expert architectures. After MoE optimization, the MoE-rendered image serves as pseudo-ground truth for each expert, with the router's confidence weights acting as per-pixel attention in the supervised loss. The distillation loss for expert $e$ is

$$\mathcal{L}^{(e)}_{\text{distill}} = \sum_{p} \pi_e(p)\, \big\| C_e(p) - C_{\text{MoE}}(p) \big\|_1,$$

where $C_{\text{MoE}}$ denotes the frozen MoE render. This formulation encourages each expert to match the MoE fusion in high-confidence regions while relying on its own output elsewhere, allowing real-time deployment of a single expert with minimal fidelity loss.
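A short sketch of this confidence-weighted objective, assuming an L1 photometric term and treating the MoE render as a frozen target; names and shapes are illustrative:

```python
import torch

def distillation_loss(expert_rgb, moe_rgb, confidence):
    """Confidence-weighted distillation of the frozen MoE render into
    a single expert, emphasizing pixels the router assigns to it.

    expert_rgb: (H, W, 3) expert e's own render C_e(p)
    moe_rgb:    (H, W, 3) MoE-fused render used as pseudo-ground truth
    confidence: (H, W)    router probability pi_e(p) for this expert
    """
    per_pixel = (expert_rgb - moe_rgb.detach()).abs().sum(dim=-1)  # L1
    return (confidence * per_pixel).mean()
```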
5. Experimental Evaluation and Benchmark Performance
MoE-GS is validated on standard dynamic scene datasets (N3V, Technicolor), demonstrating consistent improvements over individual expert models and previous state-of-the-art 3DGS frameworks. Quantitative metrics include PSNR, SSIM, and LPIPS:
- On N3V, MoE-GS configurations (2/3/4 experts) achieve higher PSNR than STG, Ex4DGS, and 4DGaussians baselines.
- On Technicolor, 3-expert MoE-GS ranks highest in PSNR, SSIM, and LPIPS across multiple scenes.
- Efficiency optimizations (single-pass rendering, pruning) improve FPS and memory usage while preserving or increasing visual fidelity.
- Qualitative results show sharper reconstructions and robust temporal consistency, attributed to adaptive expert blending via the volume-aware pixel router.
6. Conceptual Links to Broader Research and Future Directions
MoE-GS draws on and extends previous mixture-of-experts work in implicit neural representations (Ben-Shabat et al., 29 Oct 2024), uncertainty-aware motion enhancement (Guo et al., 18 Mar 2024), explicit static/dynamic separation and interpolation strategies (Lee et al., 21 Oct 2024), and hybrid expert routing/gating mechanisms. The router’s volumetric-to-pixel mapping is distinct from prior per-Gaussian or per-pixel gating.
A plausible implication is that gating strategies incorporating optical flow cues, region complexity, or frequency analysis (Guo et al., 18 Mar 2024, Zhou et al., 7 Aug 2025) could further refine MoE expert specialization. Ongoing research may address challenges of expert transition smoothness, routing stability, and extension to higher-dimensional dynamics.
The modular framework is extensible—new expert models can be added or trained in parallel, and advanced gating functions could leverage learned, scene-dependent features. The distillation strategy presents a pathway to real-time, resource-constrained deployment with MoE-quality reconstructions.
7. Objective Assessment and Open Questions
MoE-GS represents the first mixture-of-experts formulation optimized for dynamic Gaussian splatting. The adaptive expert blending and routing provide robustness to scene and temporal variation that single-expert 3DGS models have not achieved. The larger memory footprint and lower rendering speed inherent to MoE designs remain limitations, though the single-pass rendering, pruning, and distillation strategies substantially mitigate them.
Open questions include optimizing the router for minimal expert boundary artifacts, scaling to larger expert ensembles with hierarchical gating, and rigorous exploration of the trade-off between routing complexity and reconstruction gain. The generalizability of MoE-GS to extreme dynamic scenes, occluded geometry, or sparse data remains an active research direction.