Domain-Guided Spatial Routing (DSR)

Updated 17 November 2025

Domain-Guided Spatial Routing (DSR) is a mechanism that combines local 3D spatial features with global domain embeddings to guide expert selection in Mixture-of-Experts architectures.
It integrates a dual-branch design with a frozen representation encoder and a sparsely gated domain-aware branch to enable conditional expert routing based on spatial and domain cues.
Coupled with Entropy-Controlled Dynamic Allocation, DSR dynamically adjusts the number of activated experts per token, improving semantic segmentation performance and load balancing.

Domain-Guided Spatial Routing (DSR) is a spatially- and domain-conditioned expert-selection strategy introduced in the DoReMi Mixture-of-Experts (MoE) architecture for 3D point-cloud understanding. DSR enables context-aware selection of specialized expert networks by leveraging both local geometric information and global domain cues. It is the principal mechanism by which DoReMi realizes dynamic, adaptive expert allocation, giving fine-grained control over conditional computation in multi-domain scenarios. In the reported ablations, adding DSR to the representation branch raises semantic segmentation mIoU from 78.8% to 79.5% on ScanNet and from 76.0% to 76.4% on S3DIS, and expert-utilization analyses show domain-specific routing patterns.

1. Architectural Placement of DSR in DoReMi

In the DoReMi framework, 3D point-cloud inputs are processed by a dual-branch encoder:

Representation (Re) Branch: A frozen, pretrained foundation expert trained on large-scale, multi-attribute self-supervised objectives. This branch encodes cross-domain geometric and structural priors; its output, $f^{Re}$, is computed as $E_{Re}(f)$ for input features $f$.
Domain-aware (Do) Branch: Implements a sparsely gated MoE composed of $K$ experts ($E_1,\ldots,E_K$). For each input token, only a selected subset of experts is activated. DSR is responsible for computing per-token expert routing logits using spatially convolved features and learnable domain embeddings.

The final output of each encoder layer is a feature fusion:

$f^{o} = f^{Re} + f^{Do}$

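where $f^{Do} = \sum_{j=1}^K w_{:,j} \odot E_j(f)$, and $w_{:,j}$ collects the routing weights of expert $j$ across all tokens (entry $w_{i,j}$ is the weight of expert $j$ for token $i$, broadcast over the feature dimension).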
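The fusion can be illustrated with a minimal pure-Python sketch. The function name `fuse_branches`, the toy experts, and the list-of-lists layout are assumptions made for illustration and are not taken from DoReMi, where both branches are neural networks operating on tensors.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def fuse_branches(
    f: Sequence[Vector],
    f_re: Sequence[Vector],
    experts: Sequence[Callable[[Vector], Vector]],
    w: Sequence[Sequence[float]],
) -> List[Vector]:
    """Compute f^o = f^Re + sum_j w[:, j] * E_j(f), token by token.

    f       : N input tokens, each a D-dimensional vector
    f_re    : N outputs of the frozen representation branch
    experts : K callables standing in for E_1..E_K (hypothetical toy experts)
    w       : N x K routing weights; a zero weight means the expert is skipped
    """
    fused = []
    for token, re_out, weights in zip(f, f_re, w):
        out = list(re_out)  # start from the Re-branch feature
        for expert, weight in zip(experts, weights):
            if weight == 0.0:
                continue  # inactive expert: no computation for this token
            expert_out = expert(token)
            out = [o + weight * e for o, e in zip(out, expert_out)]
        fused.append(out)
    return fused


# Toy usage: two experts that scale the token by 2 and by -1.
experts = [lambda x: [2.0 * v for v in x], lambda x: [-1.0 * v for v in x]]
f = [[1.0, 2.0], [3.0, 4.0]]
f_re = [[0.5, 0.5], [0.0, 0.0]]
w = [[1.0, 0.0], [0.25, 0.75]]
f_out = fuse_branches(f, f_re, experts, w)
print(f_out)  # [[2.5, 4.5], [-0.75, -1.0]]
```

Skipping experts whose weight is zero is what makes the Do branch sparse: the first token above evaluates only one of the two experts.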

2. Formal Definition and Mathematical Workflow

DSR decomposes routing into spatial conditioning, domain conditioning, and gated expert selection:

Spatial Feature Extraction: Input features $f \in \mathbb{R}^{N \times D}$ (with $N$ tokens, each $D$-dimensional) are reshaped into a sparse 3D tensor approximating the XYZ voxel structure. A learnable 3D convolution followed by normalization yields spatially aware features:

$f' = \mathrm{SpatialConv3D}(f)$

Domain Embedding: Each domain identifier $d$ (e.g., ScanNet, S3DIS, Structured3D) is mapped via a learnable embedding table, written $\mathrm{MLP_{domain}}$ below, to a domain vector:

$e_d = \mathrm{MLP_{domain}}(d) \in \mathbb{R}^D$

Feature Fusion for Gating: The domain embedding is broadcast-added to the spatial features for each token:

$z = f' + e_d \in \mathbb{R}^{N \times D}$

Gating Network and Logit Computation: A small MLP gating network $G$ produces expert selection logits:

$g = G(z) \in \mathbb{R}^{N \times K}$

Softmax Probabilities: Routing probabilities are assigned for each token $i$ over experts $j$:

$p_{i,j} = \frac{\exp(g_{i,j})}{\sum_{j'=1}^K \exp(g_{i,j'})}$
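The fusion, gating, and softmax steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the gating network $G$ is reduced to a single bias-free linear layer (`gate_weight`), the spatial features $f'$ are passed in precomputed, and the name `routing_probabilities` is hypothetical.

```python
import math
from typing import List, Sequence


def routing_probabilities(
    f_spatial: Sequence[Sequence[float]],
    e_d: Sequence[float],
    gate_weight: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return p (N x K) from spatial features, a domain vector, and a gate.

    f_spatial   : N x D spatially convolved features f'
    e_d         : D-dimensional domain embedding
    gate_weight : D x K matrix standing in for the gating MLP G
                  (assumption: one linear layer, no bias)
    """
    num_experts = len(gate_weight[0])
    probs = []
    for token in f_spatial:
        # z_i = f'_i + e_d (the domain embedding is broadcast to every token)
        z = [x + e for x, e in zip(token, e_d)]
        # g_i = z_i @ W gives one logit per expert
        logits = [
            sum(z[d] * gate_weight[d][j] for d in range(len(z)))
            for j in range(num_experts)
        ]
        # Numerically stable softmax: subtract the row maximum before exp.
        top = max(logits)
        exps = [math.exp(g - top) for g in logits]
        total = sum(exps)
        probs.append([v / total for v in exps])
    return probs


# Toy usage with N=2 tokens, D=2 features, K=3 experts.
f_spatial = [[1.0, 0.0], [0.0, 1.0]]
e_d = [1.0, -1.0]
gate_weight = [[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]]
p = routing_probabilities(f_spatial, e_d, gate_weight)
```

Because the same $e_d$ is added to every token of a scene, changing the domain shifts all logits of that scene in a consistent direction, while $f'$ still lets individual tokens deviate.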

3. Entropy-Controlled Dynamic Allocation Coupling

DSR is tightly coupled with Entropy-Controlled Dynamic Allocation (EDA), which modulates routing sparsity based on token-level uncertainty:

Token Entropy Computation:

$H_i = -\sum_{j=1}^K p_{i,j} \cdot \log p_{i,j}$

with $H_{\text{max}} = \log K$.

Dynamic Expert Count Assignment:

The number of experts $k_i$ to activate per token is adjusted by normalized entropy:

$k_i = \left\lceil \frac{H_i}{H_{\text{max}}} \cdot (K-1) + 1 \right\rceil,\quad k_{\text{min}} = 1,\ k_{\text{max}}=K$
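As a worked illustration of this rule, the sketch below computes the entropy of one token's routing distribution and the resulting expert count. The function names `token_entropy` and `expert_count` are illustrative and not taken from DoReMi; the final clamp to $[1, K]$ also guards against floating-point overshoot.

```python
import math
from typing import Sequence


def token_entropy(p: Sequence[float]) -> float:
    """Shannon entropy H_i = -sum_j p_j log p_j, treating 0 * log 0 as 0."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def expert_count(p: Sequence[float]) -> int:
    """k_i = ceil(H_i / log K * (K - 1) + 1), clamped to [1, K]."""
    num_experts = len(p)
    if num_experts == 1:
        return 1  # log K would be zero; a single expert is always active
    ratio = token_entropy(p) / math.log(num_experts)
    k = math.ceil(ratio * (num_experts - 1) + 1)
    return max(1, min(num_experts, k))


confident = expert_count([1.0, 0.0, 0.0, 0.0])      # zero entropy
split = expert_count([0.5, 0.5, 0.0, 0.0])          # H = log 2, half of H_max
uncertain = expert_count([0.25, 0.25, 0.25, 0.25])  # maximum entropy
print(confident, split, uncertain)  # 1 3 4
```

One consequence of the ceiling is that any token with strictly positive entropy activates at least two experts; only a fully one-hot routing distribution yields $k_i = 1$.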

Top-K Selection and Weight Masking:

For each token $i$, only the $k_i$ most probable experts are selected; this set of active experts is denoted $E_i^{\text{act}}$. The weight mask is:

$w_{i,j} = \begin{cases} p_{i,j} & \text{if } j \in E_i^{\text{act}} \\ 0 & \text{otherwise} \end{cases}$
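A minimal sketch of the masking step for a single token follows. The name `mask_top_k` and the tie-breaking rule (lower expert index wins) are assumptions, since the source does not specify how ties are handled. The kept weights are not renormalized, which matches the case definition above.

```python
from typing import List, Sequence


def mask_top_k(p: Sequence[float], k: int) -> List[float]:
    """Keep the k largest probabilities of one token and zero the rest."""
    # Sort expert indices by descending probability, then by index for ties.
    order = sorted(range(len(p)), key=lambda j: (-p[j], j))
    active = set(order[:k])  # plays the role of E_i^act
    return [p[j] if j in active else 0.0 for j in range(len(p))]


w_i = mask_top_k([0.1, 0.6, 0.05, 0.25], k=2)
print(w_i)  # [0.0, 0.6, 0.0, 0.25]
```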

Load-Balancing Loss:

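To encourage even expert utilization, an auxiliary loss combines the fraction of tokens that activate expert $j$, denoted $c_j$, with the mean routing probability assigned to it, denoted $r_j$: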

$c_j = \frac{1}{N} \sum_{i=1}^N \mathbb{1}[j \in E_i^{\text{act}}],\quad r_j = \frac{1}{N} \sum_{i=1}^N p_{i,j}$

$L_{\text{balance}} = K \cdot \sum_{j=1}^K c_j \cdot r_j$
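The two statistics and the loss can be computed directly from the routing probabilities and the active sets. The sketch below is a plain-Python illustration; the function name `load_balance_loss` and the representation of $E_i^{\text{act}}$ as Python sets are assumptions.

```python
from typing import Sequence, Set


def load_balance_loss(
    p: Sequence[Sequence[float]],
    active: Sequence[Set[int]],
) -> float:
    """L_balance = K * sum_j c_j * r_j.

    p      : N x K routing probabilities
    active : for each token, the set of activated expert indices (E_i^act)
    """
    num_tokens = len(p)
    num_experts = len(p[0])
    # c_j: fraction of tokens for which expert j is active
    c = [sum(1 for a in active if j in a) / num_tokens for j in range(num_experts)]
    # r_j: mean routing probability assigned to expert j
    r = [sum(row[j] for row in p) / num_tokens for j in range(num_experts)]
    return num_experts * sum(cj * rj for cj, rj in zip(c, r))


# Two tokens, two experts, one active expert per token.
balanced = load_balance_loss([[0.75, 0.25], [0.25, 0.75]], [{0}, {1}])
collapsed = load_balance_loss([[0.75, 0.25], [0.75, 0.25]], [{0}, {0}])
print(balanced, collapsed)  # 1.0 1.5
```

In this toy case the loss is lower when the two tokens use different experts than when both collapse onto the same one, which is the behavior the penalty is meant to reward.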

4. Algorithmic Workflow

The DSR mechanism can be expressed as the following pseudocode:

Input: point tokens f ∈ ℝ^{N×D}, domain id d
Hyperparams: K experts, k_min=1, k_max=K

1. f′ ← SpatialConv3D(f)
2. e_d ← MLP_domain(d)
3. z ← f′ + expand(e_d)
4. g ← MLP_gate(z)
5. p ← Softmax(g)
6. H ← −sum_over_j p·log(p)
7. For i in 1…N:
      k_i ← clamp( ceil( (H_i / log K)·(K−1) + 1 ), k_min, k_max )
      E_i^act ← top-k_i indices of p_{i,:}
      For j in 1…K:
         w_{i,j} ← p_{i,j} if j ∈ E_i^act else 0
8. f^Do ← [ For i: ∑_{j} w_{i,j} · E_j(f_i) ]
9. f^o ← f^Re + f^Do

5. Empirical Analysis of DSR Performance

DSR’s contribution to semantic segmentation and cross-domain generalization is substantiated through ablation and load-balance studies:

Configuration	ScanNet mIoU (%)	S3DIS mIoU (%)	ScanNet gain over previous row (points)
Baseline (neither Re/DSR/EDA)	77.5	75.7	-
+ Re branch only	78.8	76.0	+1.3
+ DSR (Re+DSR, no EDA)	79.5	76.4	+0.7
+ EDA (full Re+DSR+EDA)	80.1	77.2	+0.6

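Component ablations (Table 7) attribute a direct gain of $+0.7$ mIoU points on ScanNet to DSR alone (Re+DSR vs. Re only, 78.8% to 79.5%). The corresponding gain on S3DIS is smaller, $+0.4$ points (76.0% to 76.4%). Full coupling with EDA adds a further $+0.6$ points on ScanNet and $+0.8$ points on S3DIS, giving 80.1% and 77.2% mIoU, which are reported as state-of-the-art results.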

Analysis in Table 8 suggests optimal placement is at the end of each encoder stage (five stages total) with $K=8$ experts. Expert-utilization plots (Figure 6) demonstrate that domain-specific routing is achieved, as different domains systematically route to distinct expert subsets. Load-balance improvements are evidenced by a decrease in the normalized-std of expert loads from $0.941$ to $0.894$ upon EDA activation (Table 9).
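To make the load-balance metric concrete, the sketch below computes a normalized standard deviation of expert loads. This is an assumption about the metric: it uses the coefficient of variation (population standard deviation divided by the mean), which is one common definition, while the source does not state the exact formula behind the reported values. The loads are made-up illustrative numbers, not data from the paper.

```python
import statistics
from typing import Sequence


def normalized_std(loads: Sequence[float]) -> float:
    """Population standard deviation of expert loads divided by their mean.

    Assumed definition (coefficient of variation); lower means more even use.
    """
    mean = statistics.fmean(loads)
    if mean == 0:
        raise ValueError("expert loads must not all be zero")
    return statistics.pstdev(loads) / mean


# Hypothetical token counts per expert.
even_use = normalized_std([10, 10, 10, 10])  # every expert equally used
one_hot_use = normalized_std([40, 0, 0, 0])  # one expert takes every token
```

Under this definition a perfectly even load gives 0, and the value grows as tokens concentrate on fewer experts.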

6. Functional Significance and Interaction with EDA

DSR’s core operation is additive fusion of local 3D spatial features with global domain embeddings, followed by token-specific gating. This design ensures that each input point token’s expert routing considers not only geometric context but also the global, dataset-level domain statistics. EDA subsequently refines these routing probabilities by introducing entropy-driven sparsity, thus controlling the number of active experts adaptively and facilitating conditional computation that balances representational diversity and computational efficiency.

The interaction of DSR and EDA ensures:

Contextual expert specialization (via spatial features and domain conditioning)
Dynamic computational allocation in response to token-level uncertainty
Stable and efficient utilization of expert resources across domains

7. Summary and Implications

Domain-Guided Spatial Routing is a key enabler of DoReMi’s competitive performance for multi-domain 3D point-cloud understanding. By grounding expert selection in both geometric context and domain identity, DSR improves segmentation accuracy in the reported ablations ($+0.7$ mIoU points on ScanNet, $+0.4$ on S3DIS). Its coupling with EDA adds adaptive computation and improves load balance, with the normalized-std of expert loads dropping from $0.941$ to $0.894$. Expert-utilization analyses show that different domains route to distinct expert subsets. A plausible implication is that similar spatial- and domain-fused routing designs could be extended to other multi-domain, high-dimensional modalities.
