Hierarchical Bi-directional State Scan (HiBiSS)

Updated 31 May 2026
  • The paper’s main contribution is the introduction of HiBiSS, a bi-directional SSM scan that overcomes unidirectional limitations by enforcing axis-aligned smoothing.
  • HiBiSS employs four coupled directional recurrences within each HiSS block to fuse self-attention outputs with state-space mixers, ensuring multi-view consistency.
  • Ablations on 3D head regression (FFHQ-C, 512×512) show that HiBiSS improves both FID (3.94 vs. 4.78) and the multi-view error metric MEt3R (0.2620 vs. 0.2873) relative to a unidirectional scan variant.

Hierarchical Bi-directional State Scan (HiBiSS) is a specialized State Space Model (SSM) scan architecture introduced in the context of single-shot 3D Gaussian head avatar regression, as developed in the MVCHead framework for multi-view-consistent 3D generative modeling without the use of multi-view supervision or intermediate view synthesis (Chharia et al., 24 May 2026). HiBiSS constitutes the principal architectural innovation within each Hierarchical State Space (HiSS) block, systematically addressing both the spatial anisotropies and directional dependencies associated with multi-view consistency.

1. Motivation and Core Principles

HiBiSS was designed to resolve the limitations of unidirectional recurrent scans, such as those originally adopted in the Mamba SSM architecture, which are restricted to causal left-to-right propagation. In the domain of 3D head regression, this restriction inhibits the communication of information along the vertical axis, resulting in insufficient integration of global context and suboptimal handling of multi-view inconsistencies, particularly yaw-induced horizontal drift and pitch-induced vertical drift. By introducing coupled, bi-directional 2D recurrences explicitly aligned with these axes (rightward, leftward, downward, upward), HiBiSS enforces axis-aligned smoothing and cross-row/column coherence to directly attenuate the principal directions of view-dependent drift.
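The directional restriction can be illustrated with a toy 1D example. The sketch below is not taken from MVCHead: it uses a scalar linear recurrence with hypothetical coefficients `a`, `b`, `c`, `d` and a single impulse input, and shows that a forward-only scan never carries the impulse to earlier positions, whereas adding the reversed scan covers both sides.

```python
import numpy as np

def scan_1d(x, a=0.5, b=1.0, c=1.0, d=1.0, reverse=False):
    """Scalar recurrence h[t+1] = a*h[t] + b*x[t], output y[t] = c*h[t] + d*x[t].

    The coefficients are illustrative placeholders, not values from the paper.
    """
    seq = x[::-1] if reverse else x
    h = 0.0
    out = np.empty_like(seq, dtype=float)
    for t, x_t in enumerate(seq):
        out[t] = c * h + d * x_t   # output reads the state *before* the update
        h = a * h + b * x_t        # state update
    return out[::-1] if reverse else out

x = np.zeros(7)
x[3] = 1.0                          # single impulse in the middle

forward = scan_1d(x)                # information flows only to the right
backward = scan_1d(x, reverse=True) # information flows only to the left
both = forward + backward           # bi-directional coverage

print(forward)   # zeros before index 3
print(backward)  # zeros after index 3
print(both)      # nonzero everywhere
```

With these coefficients the forward response is `[0, 0, 0, 1, 1, 0.5, 0.25]`: positions to the left of the impulse receive nothing, which is the gap the reverse scan fills.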

2. HiSS Block Architecture and HiBiSS Integration

Each HiSS block operates at a specific resolution level $l$ in a coarse-to-fine hierarchy, processing a feature grid $F \in \mathbb{R}^{H \times W \times d}$. Two parallel feature mixers, Self-Attention + MLP and the State-Space Mixer (HiBiSS), process the input in tandem. After separate processing, the outputs are fused, typically via summation or via concatenation followed by a linear transformation. Per-attribute MLP heads then regress Gaussian parameter offsets from coarser-level anchors. The integrated pipeline for each HiSS block is as follows:

$F \;\to\; \{\text{Self-Attention + MLP},\ \text{HiBiSS}\} \;\to\; \text{fusion} \;\to\; \text{per-attribute MLP heads} \;\to\; \text{Gaussian parameter offsets}$
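The two-branch structure described above can be sketched in NumPy. This is a minimal illustration under several assumptions, not the MVCHead implementation: the attention branch is single-head with identity projections and omits the MLP, the state-space branch is passed in as a callable, and the names `hiss_block_sketch`, `state_space_mixer`, and `w_fuse` are hypothetical.

```python
import numpy as np

def self_attention(tokens):
    """Single-head scaled dot-product attention with identity projections (a simplification)."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

def hiss_block_sketch(grid, state_space_mixer, fuse="sum", w_fuse=None):
    """Run the two mixers in parallel on an (H, W, d) grid and fuse their outputs."""
    H, W, d = grid.shape
    attn_out = self_attention(grid.reshape(H * W, d)).reshape(H, W, d)
    ssm_out = state_space_mixer(grid)          # stand-in for the HiBiSS branch
    if fuse == "sum":
        return attn_out + ssm_out
    if w_fuse is None:
        raise ValueError("concatenation fusion needs a (2d, d) weight matrix")
    return np.concatenate([attn_out, ssm_out], axis=-1) @ w_fuse

rng = np.random.default_rng(0)
grid = rng.standard_normal((4, 4, 8))
placeholder_mixer = lambda g: g                # hypothetical placeholder, not a real SSM
w_fuse = rng.standard_normal((16, 8))

out_sum = hiss_block_sketch(grid, placeholder_mixer, fuse="sum")
out_cat = hiss_block_sketch(grid, placeholder_mixer, fuse="concat", w_fuse=w_fuse)
print(out_sum.shape, out_cat.shape)
```

Both fusion modes return a grid of the input shape, so downstream per-attribute heads can consume either without change.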

3. Hierarchical Bi-directional Scan Algorithm and Mathematical Formulation

HiBiSS executes four coupled SSM scans per block—one for each direction (→, ←, ↓, ↑)—by maintaining separate hidden state tensors. The update and output equations per direction are as follows:

Horizontal Forward (→):

  • Hidden state: $h^{\rightarrow}_{i,j+1} = A_h\,h^{\rightarrow}_{i,j} + B_h\,F_{i,j}$
  • Output: $\tilde F^{\mathrm{hor}}_{i,j} = C_h\,h^{\rightarrow}_{i,j} + D_h\,F_{i,j}$

Vertical Forward (↓):

  • Hidden state: $h^{\downarrow}_{i+1,j} = A_v\,h^{\downarrow}_{i,j} + B_v\,F_{i,j}$
  • Output: $\tilde F^{\mathrm{ver}}_{i,j} = C_v\,h^{\downarrow}_{i,j} + D_v\,F_{i,j}$
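Only the two forward scans are written out here. One natural reading, offered as an assumption rather than as the paper's definition, is that the leftward and upward scans mirror them with the index step reversed. Reusing $A_h, B_h, C_h, D_h$ and $A_v, B_v, C_v, D_v$ for the reverse directions is also an assumption, and the output superscripts follow the direction index used in the fusion formula:

```latex
\begin{aligned}
h^{\leftarrow}_{i,j-1} &= A_h\,h^{\leftarrow}_{i,j} + B_h\,F_{i,j}, &
\tilde F^{\leftarrow}_{i,j} &= C_h\,h^{\leftarrow}_{i,j} + D_h\,F_{i,j},\\
h^{\uparrow}_{i-1,j} &= A_v\,h^{\uparrow}_{i,j} + B_v\,F_{i,j}, &
\tilde F^{\uparrow}_{i,j} &= C_v\,h^{\uparrow}_{i,j} + D_v\,F_{i,j}.
\end{aligned}
```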

Directional Output Fusion:

  • Aggregate the outputs:

$\tilde F_{i,j} = \sum_{d \in \{\rightarrow, \leftarrow, \downarrow, \uparrow\}} W_{d}\,\tilde F^{d}_{i,j}$

All directions are scanned and fused for each spatial location, ensuring full axis-aligned context propagation.
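The scans and their fusion can be sketched as follows. This is a simplified illustration, not the released code: the SSM parameters are treated as scalars or per-channel (diagonal) values, the reverse directions reuse the forward parameters, the fusion weights $W_d$ are scalars, and the function names are hypothetical.

```python
import numpy as np

def scan_along_width(F, A, B, C, D):
    """Rightward scan over axis 1: h[:, j+1] = A*h[:, j] + B*F[:, j]; out[:, j] = C*h[:, j] + D*F[:, j]."""
    H, W, d = F.shape
    h = np.zeros((H, d))
    out = np.empty_like(F)
    for j in range(W):
        out[:, j] = C * h + D * F[:, j]
        h = A * h + B * F[:, j]
    return out

def hibiss_scan(F, horiz, vert, fusion_weights):
    """Four directional scans on an (H, W, d) grid, fused by a weighted sum."""
    Ft = F.transpose(1, 0, 2)  # (W, H, d): the row index becomes the scanned axis
    outs = {
        "right": scan_along_width(F, *horiz),
        "left": scan_along_width(F[:, ::-1], *horiz)[:, ::-1],
        "down": scan_along_width(Ft, *vert).transpose(1, 0, 2),
        "up": scan_along_width(Ft[:, ::-1], *vert)[:, ::-1].transpose(1, 0, 2),
    }
    return sum(fusion_weights[k] * outs[k] for k in outs)

# Impulse at the grid centre; D = 0 so the output shows only propagated state.
F = np.zeros((5, 5, 1))
F[2, 2, 0] = 1.0
params = (0.5, 1.0, 1.0, 0.0)                  # illustrative (A, B, C, D)
weights = {"right": 1.0, "left": 1.0, "down": 1.0, "up": 1.0}
fused = hibiss_scan(F, params, params, weights)
print(fused[:, :, 0])
```

The printed response forms a cross through the centre cell: the impulse reaches every cell in its row and column with geometric decay, while off-axis cells stay at zero. In a full block, axis-aligned context from the scans is combined with the parallel self-attention branch.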

4. Data-Flow and Implementation Workflow

The data flow through each HiBiSS block is structured as follows:

  1. Grid Projection: A linear projection maps the input token matrix $X \in \mathbb{R}^{(H \cdot W) \times d}$ to the grid $F \in \mathbb{R}^{H \times W \times d}$.
  2. Directional SSM Scans: The four directional scans (HiBiSS) are performed on $F$, computing the directional outputs $\tilde F^{d}$ for $d \in \{\rightarrow, \leftarrow, \downarrow, \uparrow\}$.
  3. Directional Fusion: The outputs are fused pointwise to form $\tilde F$.
  4. Token Layout Restoration: $\tilde F$ is projected back to the token layout, residual-added to $X$, and layer-normalized.
  5. Feed-forward Output: The normalized result is fed to downstream MLP, attention heads, or subsequent HiSS blocks.
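The steps above can be sketched end to end in NumPy. This is a hedged illustration only: the projection matrices `W_in` and `W_out`, the scalar scan parameters, the uniform fusion weights, and the choice of the input tokens as the residual target are all assumptions made for the example.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each token over its feature dimension (no learned scale or shift)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def directional_scan(F, axis, reverse, a, b, c, dd):
    """Scalar-parameter linear scan along one grid axis, optionally reversed."""
    G = np.moveaxis(F, axis, 0)
    if reverse:
        G = G[::-1]
    h = np.zeros_like(G[0])
    out = np.empty_like(G)
    for t in range(G.shape[0]):
        out[t] = c * h + dd * G[t]
        h = a * h + b * G[t]
    if reverse:
        out = out[::-1]
    return np.moveaxis(out, 0, axis)

def hibiss_data_flow(X, H, W, W_in, W_out, params=(0.9, 1.0, 1.0, 1.0)):
    d = X.shape[1]
    F = (X @ W_in).reshape(H, W, d)                    # 1. grid projection
    scans = [directional_scan(F, axis, rev, *params)   # 2. right, left, down, up
             for axis in (1, 0) for rev in (False, True)]
    fused = sum(scans) / 4.0                           # 3. pointwise fusion (uniform weights assumed)
    tokens = fused.reshape(H * W, d) @ W_out           # 4. back to the token layout
    return layer_norm(X + tokens)                      #    residual add + layer norm

rng = np.random.default_rng(0)
H, W, d = 4, 4, 8
X = rng.standard_normal((H * W, d))
W_in = rng.standard_normal((d, d)) / np.sqrt(d)
W_out = rng.standard_normal((d, d)) / np.sqrt(d)
Y = hibiss_data_flow(X, H, W, W_in, W_out)
print(Y.shape)
```

The result keeps the `(H*W, d)` token layout, so it can be passed directly to the downstream components listed in step 5.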

5. Training Paradigm and Interoperation with MVCHead

HiBiSS operates at each resolution level of the HiSS hierarchy on fixed spatial grids (e.g., 32×32 or 64×64). The hidden dimension $d$ typically ranges from 256 to 512. The structured SSM kernels employ diagonal plus low-rank “HiPPO” parameterizations for computational efficiency. Running all four directional scans approximately quadruples the cost relative to a single SSM scan, resulting in a reported runtime of 1–2 ms per block on an H100 GPU.
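The efficiency argument for a diagonal plus low-rank state matrix can be made concrete with a small sketch. It assumes a generic form $A = \mathrm{diag}(\lambda) + P Q^{\top}$ with hypothetical sizes `N` and `r`. It is not the specific kernel used in MVCHead, only an illustration of why such a matrix can be applied without forming it densely.

```python
import numpy as np

def dplr_matvec(lam, P, Q, h):
    """Apply A = diag(lam) + P @ Q.T to h in O(N*r) instead of O(N^2)."""
    return lam * h + P @ (Q.T @ h)

rng = np.random.default_rng(0)
N, r = 64, 1                                   # state size and rank (illustrative)
lam = -np.abs(rng.standard_normal(N))          # diagonal part
P = rng.standard_normal((N, r))                # low-rank factors
Q = rng.standard_normal((N, r))
h = rng.standard_normal(N)

fast = dplr_matvec(lam, P, Q, h)
dense = (np.diag(lam) + P @ Q.T) @ h           # reference: explicit dense matrix
print(np.allclose(fast, dense))
```

Because the structured product never materializes the `N × N` matrix, each recurrence step stays cheap even when four directional scans are run per block.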

HiBiSS is jointly trained with the SE(3) Multi-view Critic, which evaluates the rendered consistency of regressed 3D Gaussians across sampled transformations. The overall objective blends the resulting multi-view consistency loss with an adversarial loss, a KNN shape loss, and additional contrastive regularization.

During inference, only HiBiSS forward passes are executed, ensuring real-time single-shot 3D Gaussian regression.

6. Comparative Evaluation and Empirical Results

Ablation studies underscore HiBiSS’s efficacy in enforcing multi-view geometric consistency and image realism. On the FFHQ-C 512×512 benchmark, the following Fréchet Inception Distance (FID) and Multi-View Error in 3D Reconstruction (MEt3R) values were observed (lower is better for both):

| Model Variant | FID | MEt3R |
|---|---|---|
| MVCHead (HiBiSS) | 3.94 | 0.2620 |
| – w/o HiBiSS (unidirectional scan) | 4.78 | 0.2873 |
| – w/o entire HiSS state-space (no SS2D) | 5.28 | 0.2948 |

These results indicate that the axis-aligned, bidirectional recurrence imposed by HiBiSS confers measurable advantages in both visual fidelity and multi-view geometric consistency over strictly unidirectional or SSM-absent variants.
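To put the table in relative terms, the short script below computes the percentage reduction of each metric for the full model against the two ablated variants. It uses only the reported numbers; the helper name `relative_reduction` is introduced here for illustration.

```python
# (FID, MEt3R) as reported in the ablation table; lower is better for both.
results = {
    "MVCHead (HiBiSS)": (3.94, 0.2620),
    "w/o HiBiSS (unidirectional scan)": (4.78, 0.2873),
    "w/o entire HiSS state-space (no SS2D)": (5.28, 0.2948),
}

def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline

full_fid, full_met3r = results["MVCHead (HiBiSS)"]
reductions = {}
for name, (fid, met3r) in results.items():
    if name == "MVCHead (HiBiSS)":
        continue
    reductions[name] = (relative_reduction(fid, full_fid),
                        relative_reduction(met3r, full_met3r))
    print(f"vs. {name}: FID -{reductions[name][0]:.1f}%, MEt3R -{reductions[name][1]:.1f}%")
```

Relative to the unidirectional scan, this gives roughly a 17.6% lower FID and an 8.8% lower MEt3R. Relative to the variant without the state-space branch, the reductions are about 25.4% and 11.1%.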

7. Architectural Significance and Interactions

HiBiSS distinguishes itself from the original Mamba SSM scan—which is limited to left-to-right (causal) recurrence—by guaranteeing both horizontal and vertical propagation through four coupled recurrences with shared parameter structures. The state-space machinery coexists with parallel self-attention mixers within the HiSS block, each targeting complementary modeling objectives: HiBiSS enforces axis-aligned smoothness and pose-aware feature fusion, whereas self-attention captures global facial semantics. Gradients from the SE(3) Multi-view Critic propagate through the renderer, regressed Gaussian parameters, and the HiBiSS blocks, biasing SSM kernel parameters toward axis-aligned smoothness that reduces cross-view inconsistencies. A plausible implication is that this synergy yields robust pose-aware anisotropic smoothing without requiring explicit multi-view supervision or intermediate 2D view generation.


For implementation details, explicit pseudocode, ablation protocols, and released datasets, see the original MVCHead publication (Chharia et al., 24 May 2026).
