
Hyperbolic State Space Hallucination

Updated 9 January 2026
  • Hyperbolic State Space Hallucination is a framework for fine-grained domain generalization that uses state space hallucination and hyperbolic manifold consistency to achieve style-invariant representations.
  • The SSH module stochastically perturbs channel-wise statistics (means and standard deviations) to simulate diverse style variations, while the HMC module employs hyperbolic geometry to enforce embedding consistency.
  • Empirical evaluations demonstrate significant accuracy improvements over benchmarks, validating the effectiveness of multi-scale style enrichment in maintaining fine-grained separability.

Hyperbolic State Space Hallucination (HSSH) is a framework for fine-grained domain generalization (FGDG) designed to produce representations that are robust to unseen domain-specific style variations, particularly those that threaten the discernment of subtle category-defining patterns. HSSH advances the state-of-the-art by integrating two modules—State Space Hallucination (SSH) and Hyperbolic Manifold Consistency (HMC)—within a Vision Mamba backbone, achieving style-invariant fine-grained separability via style extrapolation and hyperbolic embedding consistency. This approach addresses the fragility of fine-grained recognition tasks (e.g., subtle plumage differences in bird species) under strong cross-domain style shifts such as illumination and color, achieving superior accuracy on multiple FGDG benchmarks (Bi et al., 10 Apr 2025).

1. Architectural Overview

HSSH builds on a four-stage Vision Mamba encoder, introducing SSH and HMC in each block to handle style perturbations and exploit hyperbolic geometry for discriminative embedding. At each encoder block $i$, the batch state tensor $s^i = \mathbf{F}^i \in \mathbb{R}^{B\times C_i\times H_i\times W_i}$ undergoes style hallucination via SSH to yield $\hat s^i = \widetilde{\mathbf{F}^i}$. Both $s^i$ and $\hat s^i$ are mapped to hyperbolic embeddings $z^i = \phi(s^i),\; \hat z^i = \phi(\hat s^i)$ via the exponential map, and HMC enforces style invariance by minimizing their geodesic distance $\sum_{i=1}^4 d_\mathbb{H}(z^i,\hat z^i)$. The model's classifier is shared between final-layer original and hallucinated states to enforce fine-grained categorization consistent across styles.

2. State Space Hallucination (SSH)

SSH operates blockwise, quantifying channel-wise "style" for each sample via mean and standard deviation over spatial locations:

$$\mu^i_{j,k} = \frac{1}{H_iW_i}\sum_{h,w}F^i_{j,k,h,w},\qquad \sigma^i_{j,k} = \sqrt{\frac{1}{H_iW_i}\sum_{h,w}\left(F^i_{j,k,h,w}-\mu^i_{j,k}\right)^2}$$

for $j\in[1..B]$, $k\in[1..C_i]$. A least-squares fit of $\sigma = \gamma \mu + b$ extracts a slope $\gamma^i$; the range $[\min\gamma^i,\max\gamma^i]$ is extrapolated to $[2\min\gamma^i-\max\gamma^i,\,2\max\gamma^i-\min\gamma^i]$, from which a randomized $\widetilde{\gamma}^i$ is sampled. This defines a hallucinated style pair $(\widetilde{\mu}^i, \widetilde{\sigma}^i)$ along the extrapolated line. The feature tensor is re-normalized:

$$\hat F^i = \widetilde{\sigma}^i\,\frac{F^i-\mu^i}{\sigma^i}+\widetilde{\mu}^i$$

producing $\hat s^i$, the style-hallucinated state. Performing SSH at all four encoder stages enhances multi-scale style enrichment, as verified by empirical ablation.
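The statistics, slope extrapolation, and re-normalization above can be sketched in NumPy. This is an illustrative reading of SSH, not the authors' implementation: the per-sample slope fit, the exact form of the hallucinated pair $(\widetilde{\mu}, \widetilde{\sigma})$ (here, rescaling the stds while keeping the means), and the name `ssh_hallucinate` are assumptions made for the sketch.

```python
import numpy as np

def ssh_hallucinate(F, rng=None):
    """Sketch of SSH for a feature tensor F of shape (B, C, H, W).

    Per-sample least-squares fits of sigma = gamma*mu + b over channels
    yield a batch of slopes; the slope range is extrapolated outward and
    a random slope drawn from it defines the hallucinated style.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    B, C, H, W = F.shape
    mu = F.mean(axis=(2, 3))            # (B, C) channel-wise means
    sigma = F.std(axis=(2, 3)) + 1e-6   # (B, C) channel-wise stds

    # Slope of the least-squares line sigma = gamma*mu + b, per sample
    gammas = np.array([np.polyfit(mu[j], sigma[j], 1)[0] for j in range(B)])

    # Extrapolate [min, max] to [2*min - max, 2*max - min], then sample
    g_lo, g_hi = gammas.min(), gammas.max()
    g_new = rng.uniform(2 * g_lo - g_hi, 2 * g_hi - g_lo)

    # One plausible hallucinated pair along the extrapolated line:
    # rescale each sample's stds toward the sampled slope, keep the means
    sigma_new = sigma * (g_new / gammas[:, None])
    mu_new = mu

    # Re-normalize: F_hat = sigma_new * (F - mu) / sigma + mu_new
    Fn = (F - mu[:, :, None, None]) / sigma[:, :, None, None]
    return sigma_new[:, :, None, None] * Fn + mu_new[:, :, None, None]
```

Because the hallucination only touches first- and second-order channel statistics, spatial content (and thus fine-grained structure) passes through unchanged.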

3. Hyperbolic Manifold Consistency (HMC)

To ensure style perturbations via SSH do not alter fine-grained identity, HMC maps both $s^i$ and $\hat s^i$ into a Poincaré ball $\mathcal{B}_c^n = \{x\in\mathbb{R}^n : c\|x\|^2 < 1\}$ of constant negative curvature $-c$ using the exponential map at the origin:

$$\phi(x) = \exp_0^c(x) = \tanh\!\left(\sqrt{c}\,\|x\|\right)\frac{x}{\sqrt{c}\,\|x\|}$$

The hyperbolic distance is given by

$$d_{\mathbb{H}}(u,v) = \frac{2}{\sqrt{c}}\tanh^{-1}\!\left(\sqrt{c}\,\|(-u)\oplus_c v\|\right)$$

where Möbius addition is

$$u\oplus_c v = \frac{\left(1+2c\langle u,v\rangle +c\|v\|^2\right)u + \left(1-c\|u\|^2\right)v}{1+2c\langle u,v\rangle +c^2\|u\|^2\|v\|^2}$$

Minimizing the hyperbolic geodesic distance between $z^i$ and $\hat z^i$ encourages style-invariant fine-grained separation, leveraging the ball's exponential distance scaling for semantic amplification.
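The three hyperbolic operations above translate directly into code. The sketch below implements the exponential map at the origin, Möbius addition, and the geodesic distance exactly as defined; the function names are illustrative, and production code would typically use a manifold library such as geoopt instead.

```python
import numpy as np

def mobius_add(u, v, c=1.0):
    """Möbius addition of u and v on the Poincaré ball with curvature -c."""
    uv, u2, v2 = np.dot(u, v), np.dot(u, u), np.dot(v, v)
    num = (1 + 2 * c * uv + c * v2) * u + (1 - c * u2) * v
    den = 1 + 2 * c * uv + c ** 2 * u2 * v2
    return num / den

def exp0(x, c=1.0):
    """Exponential map at the origin: Euclidean vector -> point in the ball."""
    n = np.linalg.norm(x)
    if n == 0:
        return x
    return np.tanh(np.sqrt(c) * n) * x / (np.sqrt(c) * n)

def dist_h(u, v, c=1.0):
    """Geodesic distance d_H(u, v) on the Poincaré ball."""
    return (2 / np.sqrt(c)) * np.arctanh(
        np.sqrt(c) * np.linalg.norm(mobius_add(-u, v, c)))
```

Note that `exp0` always lands strictly inside the ball (since $\tanh < 1$), so the `arctanh` in `dist_h` stays well defined for mapped embeddings.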

4. Training Objectives and Optimization

At the final block ($i=4$), $s^4$ and $\hat s^4$ are input to a shared linear classifier $\phi_\mathrm{cls}$ with output dimension $C_\mathrm{fine}$. Cross-entropy classification losses are incurred for both states:

$$\mathcal{L}_\mathrm{cls} = -\frac{1}{B}\sum_{j=1}^B \sum_{k=1}^{C_\mathrm{fine}} y_{j,k}\log\left[\phi_\mathrm{cls}(s^4)_{j,k}\right]$$

$$\widetilde{\mathcal{L}}_\mathrm{cls} = -\frac{1}{B}\sum_{j=1}^B \sum_{k=1}^{C_\mathrm{fine}} y_{j,k}\log\left[\phi_\mathrm{cls}(\hat s^4)_{j,k}\right]$$

HMC imposes the style-invariance constraint:

$$\mathcal{L}_\mathrm{HMC} = \sum_{i=1}^4 \sum_{j=1}^B d_\mathbb{H}\left(z^i_j, \hat z^i_j\right)$$

The overall loss is:

$$\mathcal{L} = \mathcal{L}_\mathrm{cls} + \widetilde{\mathcal{L}}_\mathrm{cls} + \lambda\,\mathcal{L}_\mathrm{HMC},\qquad \lambda=0.5$$

Optimization is performed using Adam for 100 epochs.
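The overall objective can be sketched as follows, assuming the classifier logits for the original and hallucinated final states and the precomputed per-block, per-sample hyperbolic distances are available; the names `hssh_loss` and `hmc_dists` are illustrative.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, y_onehot):
    """Mean cross-entropy over the batch, matching L_cls above."""
    p = softmax(logits)
    return float(-np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1)))

def hssh_loss(logits, logits_hat, y_onehot, hmc_dists, lam=0.5):
    """Total objective: CE on original and hallucinated states plus
    lambda times the summed hyperbolic consistency distances."""
    return (cross_entropy(logits, y_onehot)
            + cross_entropy(logits_hat, y_onehot)
            + lam * float(np.sum(hmc_dists)))
```

When SSH leaves the embeddings unchanged (all distances zero), the objective reduces to twice the standard classification loss, which makes the consistency term an additive, tunable regularizer.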

5. Role of Hyperbolic Geometry in Fine-grained Discrimination

Hyperbolic space encodes hierarchical and high-order statistical relations, enabling exponential expansion of distance near the ball boundary. This property allows the model to represent many closely related fine-grained classes without crowding, facilitating the discernment of minute semantic differences—such as subtle texture or shape cues—between categories. Projecting state embeddings to a negatively curved manifold amplifies these subtle cues, while suppressing linear style variations arising from SSH, thereby ensuring fine-grained separability even under substantial appearance shifts.

6. Empirical Evaluation in Fine-grained Domain Generalization

HSSH achieves state-of-the-art results on three FGDG benchmarks:

  • CUB ↔ Paintings (birds): VMamba baseline: 63.47% avg.; FSDG (prior best): 55.10% avg.; HSSH: 66.03% avg. (↑2.56% over VMamba, ↑10.93% over FSDG)
  • RS-FGDG (remote-sensing scenes): VMamba: 66.85% avg.; HSSH: 69.65% avg. (↑2.80%)
  • Birds-31 (natural-image domains): VMamba: 88.24% avg.; FSDG: 82.31% avg.; HSSH: 90.69% avg. (↑2.45% over VMamba, ↑8.38% over FSDG)

Ablation studies on CUB–Paintings show SSH alone raises accuracy from 63.47% to 64.86%, with further gains to 66.03% when HMC is added. Hallucinating all four stages of the encoder produces greater performance than any subset, highlighting the importance of multi-scale style enrichment and hyperbolic alignment. These results support the effectiveness of HSSH in maintaining discriminative fine-grained recognition across substantial domain-induced style variation.

7. Contextual Significance and Implications

The introduction of Hyperbolic State Space Hallucination represents a shift in FGDG methodology, combining linear-time style extrapolation with hyperbolic manifold consistency to safeguard fine-grained feature separability under diverse unseen conditions. This design effectively mitigates the collapse of subtle semantic cues under cross-domain style shifts, as evidenced by substantial improvements over prior methods. A plausible implication is that negative curvature embedding may find broader applicability in other problem domains requiring robust hierarchical or fine-grained structure under distributional drift (Bi et al., 10 Apr 2025).

