Hyperbolic State Space Hallucination
- Hyperbolic State Space Hallucination is a framework for fine-grained domain generalization that uses state space hallucination and hyperbolic manifold consistency to achieve style-invariant representations.
- The SSH module quantitatively perturbs channel-wise statistics to simulate diverse style variations, while the HMC module employs hyperbolic geometry to enforce embedding consistency.
- Empirical evaluations demonstrate significant accuracy improvements over benchmarks, validating the effectiveness of multi-scale style enrichment in maintaining fine-grained separability.
Hyperbolic State Space Hallucination (HSSH) is a framework for fine-grained domain generalization (FGDG) designed to produce representations that are robust to unseen domain-specific style variations, particularly those that threaten the discernment of subtle category-defining patterns. HSSH advances the state-of-the-art by integrating two modules—State Space Hallucination (SSH) and Hyperbolic Manifold Consistency (HMC)—within a Vision Mamba backbone, achieving style-invariant fine-grained separability via style extrapolation and hyperbolic embedding consistency. This approach addresses the fragility of fine-grained recognition tasks (e.g., subtle plumage differences in bird species) under strong cross-domain style shifts such as illumination and color, achieving superior accuracy on multiple FGDG benchmarks (Bi et al., 10 Apr 2025).
1. Architectural Overview
HSSH builds on a four-stage Vision Mamba encoder, introducing SSH and HMC in each block to handle style perturbations and exploit hyperbolic geometry for discriminative embedding. At each encoder block $\ell$, the batch state tensor $x^{(\ell)}$ undergoes style hallucination via SSH to yield $\hat{x}^{(\ell)}$. Both $x^{(\ell)}$ and $\hat{x}^{(\ell)}$ are mapped to hyperbolic embeddings via the exponential map, and HMC enforces style invariance by minimizing their geodesic distance $d_c\big(\exp_0^c(x^{(\ell)}), \exp_0^c(\hat{x}^{(\ell)})\big)$. The model’s classifier is shared between final-layer original and hallucinated states to enforce fine-grained categorization consistent across styles.
2. State Space Hallucination (SSH)
SSH operates blockwise, quantifying channel-wise "style" for each sample via mean and standard deviation over spatial locations:

$$\mu_c = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W} x_{c,h,w}, \qquad \sigma_c = \sqrt{\frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\big(x_{c,h,w} - \mu_c\big)^2},$$

for $c = 1, \dots, C$. A least-squares fit of the statistics $\{(\mu_c, \sigma_c)\}_{c=1}^{C}$ extracts a slope $k$ (with intercept $b$); the observed range $[\mu_{\min}, \mu_{\max}]$ is extrapolated to a wider interval, from which a randomized $\hat{\mu}_c$ is sampled. This defines a hallucinated style pair $(\hat{\mu}_c, \hat{\sigma}_c)$, with $\hat{\sigma}_c = k\hat{\mu}_c + b$ lying along the extrapolated line. The feature tensor is re-normalized:

$$\hat{x}_{c,h,w} = \hat{\sigma}_c\,\frac{x_{c,h,w} - \mu_c}{\sigma_c} + \hat{\mu}_c,$$

producing $\hat{x}$, the style-hallucinated state. Performing SSH at all four encoder stages enhances multi-scale style enrichment, as verified by empirical ablation.
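The fit–extrapolate–renormalize steps above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the function name, the `extrapolation` parameter, and the uniform sampling of the hallucinated mean are assumptions.

```python
import numpy as np

def ssh_hallucinate(x, extrapolation=0.5, rng=None):
    """Sketch of State Space Hallucination on one feature map x of shape (C, H, W).

    Per-channel means/stds are fitted with a least-squares line sigma = k*mu + b,
    the observed mu-range is widened by `extrapolation`, and a hallucinated
    (mu_hat, sigma_hat) pair on that line re-normalizes the features (AdaIN-style).
    """
    if rng is None:
        rng = np.random.default_rng()
    C = x.shape[0]
    mu = x.reshape(C, -1).mean(axis=1)            # per-channel mean
    sigma = x.reshape(C, -1).std(axis=1) + 1e-6   # per-channel std (stabilized)

    # Least-squares fit sigma = k*mu + b across channels.
    k, b = np.polyfit(mu, sigma, deg=1)

    # Extrapolate the observed mu-range and sample hallucinated means (assumed uniform).
    lo, hi = mu.min(), mu.max()
    span = hi - lo
    mu_hat = rng.uniform(lo - extrapolation * span, hi + extrapolation * span, size=C)
    sigma_hat = np.clip(k * mu_hat + b, 1e-6, None)  # stay on the fitted line

    # Re-normalize: x_hat = sigma_hat * (x - mu) / sigma + mu_hat
    x_hat = (sigma_hat[:, None, None] * (x - mu[:, None, None])
             / sigma[:, None, None] + mu_hat[:, None, None])
    return x_hat
```

By construction, each channel of the output has mean `mu_hat[c]` and (approximately) standard deviation `sigma_hat[c]`, i.e. a new style on the extrapolated line with content preserved.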
3. Hyperbolic Manifold Consistency (HMC)
To ensure style perturbations via SSH do not alter fine-grained identity, HMC maps both $x$ and $\hat{x}$ into a Poincaré ball of constant negative curvature $-c$ using the exponential map at the origin:

$$z = \exp_0^c(x) = \tanh\!\big(\sqrt{c}\,\lVert x\rVert\big)\,\frac{x}{\sqrt{c}\,\lVert x\rVert}.$$

The hyperbolic distance is given by

$$d_c(z, \hat{z}) = \frac{2}{\sqrt{c}}\,\operatorname{artanh}\!\big(\sqrt{c}\,\lVert(-z)\oplus_c \hat{z}\rVert\big),$$

where Möbius addition $\oplus_c$ is

$$u \oplus_c v = \frac{\big(1 + 2c\langle u, v\rangle + c\lVert v\rVert^2\big)\,u + \big(1 - c\lVert u\rVert^2\big)\,v}{1 + 2c\langle u, v\rangle + c^2\lVert u\rVert^2\lVert v\rVert^2}.$$

Minimizing the hyperbolic geodesic between $z$ and $\hat{z}$ encourages style-invariant fine-grained separation, leveraging the ball’s exponential distance scaling for semantic amplification.
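The three formulas above translate directly into code. The following is a minimal NumPy sketch for single vectors (a production implementation would batch these and guard numerics more carefully):

```python
import numpy as np

def mobius_add(u, v, c=1.0):
    """Mobius addition u (+)_c v in the Poincare ball of curvature -c."""
    uv = np.dot(u, v)
    u2, v2 = np.dot(u, u), np.dot(v, v)
    num = (1 + 2 * c * uv + c * v2) * u + (1 - c * u2) * v
    den = 1 + 2 * c * uv + c**2 * u2 * v2
    return num / den

def exp0(v, c=1.0):
    """Exponential map at the origin: Euclidean vector -> point in the ball."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros_like(v)
    return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def hyp_dist(u, v, c=1.0):
    """Geodesic distance d_c(u, v) = (2/sqrt(c)) artanh(sqrt(c) ||(-u) (+)_c v||)."""
    n = np.linalg.norm(mobius_add(-u, v, c))
    return 2.0 / np.sqrt(c) * np.arctanh(np.clip(np.sqrt(c) * n, 0.0, 1 - 1e-12))
```

A useful sanity check: for $c = 1$, the distance from the origin to $\exp_0(v)$ recovers $2\lVert v\rVert$, since $\operatorname{artanh}(\tanh(\lVert v\rVert)) = \lVert v\rVert$.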
4. Training Objectives and Optimization
At the final block ($\ell = 4$), $x^{(4)}$ and $\hat{x}^{(4)}$ are input to a shared linear classifier $f$. Cross-entropy classification losses are incurred for both states:

$$\mathcal{L}_{ce} = \ell_{ce}\big(f(x^{(4)}),\, y\big), \qquad \hat{\mathcal{L}}_{ce} = \ell_{ce}\big(f(\hat{x}^{(4)}),\, y\big).$$

HMC imposes the style-invariance constraint:

$$\mathcal{L}_{hmc} = \sum_{\ell} d_c\big(\exp_0^c(x^{(\ell)}),\, \exp_0^c(\hat{x}^{(\ell)})\big).$$

The overall loss is:

$$\mathcal{L} = \mathcal{L}_{ce} + \hat{\mathcal{L}}_{ce} + \mathcal{L}_{hmc}.$$
Optimization is performed using Adam for 100 epochs, following standard deep learning practices.
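A per-sample combination of the objective terms might look as follows. This is a hedged sketch: `lam` is an assumed weighting coefficient (the text does not specify one), and `d_hmc` is presumed to be the precomputed hyperbolic distance from the HMC module.

```python
import numpy as np

def cross_entropy(logits, label):
    """Numerically stable cross-entropy for a single sample."""
    z = logits - logits.max()                  # shift for stability
    log_probs = z - np.log(np.exp(z).sum())    # log-softmax
    return -log_probs[label]

def hssh_loss(logits_orig, logits_hall, label, d_hmc, lam=1.0):
    """Total objective: CE on original state + CE on hallucinated state
    + (assumed) weighted HMC geodesic-distance term."""
    return (cross_entropy(logits_orig, label)
            + cross_entropy(logits_hall, label)
            + lam * d_hmc)
```

The shared classifier is reflected here by feeding both logit vectors through the same loss with the same label, so the hallucinated style must remain classifiable as the original fine-grained category.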
5. Role of Hyperbolic Geometry in Fine-grained Discrimination
Hyperbolic space encodes hierarchical and high-order statistical relations, enabling exponential expansion of distance near the ball boundary. This property allows the model to represent many closely related fine-grained classes without crowding, facilitating the discernment of minute semantic differences—such as subtle texture or shape cues—between categories. Projecting state embeddings to a negatively curved manifold amplifies these subtle cues, while suppressing linear style variations arising from SSH, thereby ensuring fine-grained separability even under substantial appearance shifts.
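The exponential expansion can be checked numerically from the origin-distance formula in the unit ball ($c = 1$), $d(0, r) = 2\,\operatorname{artanh}(r)$: Euclidean radii approaching the boundary map to unboundedly large hyperbolic distances, which is why a thin shell near the boundary can hold many well-separated fine-grained classes.

```python
import numpy as np

# Hyperbolic distance from the origin to a point at Euclidean radius r
# in the unit Poincare ball (c = 1) is 2 * artanh(r); it diverges as r -> 1.
for r in (0.5, 0.9, 0.99, 0.999):
    print(f"r = {r:<6}  d(0, r) = {2 * np.arctanh(r):.3f}")
```

For example, $d(0, 0.5) = 2\,\operatorname{artanh}(0.5) = \ln 3 \approx 1.10$, while $d(0, 0.999)$ already exceeds 7.6.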
6. Empirical Evaluation in Fine-grained Domain Generalization
HSSH achieves state-of-the-art results on three FGDG benchmarks:
- CUB ↔ Paintings (birds): VMamba baseline: 63.47% avg.; FSDG (prior best): 55.10% avg.; HSSH: 66.03% avg. (↑2.56% over VMamba, ↑10.93% over FSDG)
- RS-FGDG (remote-sensing scenes): VMamba: 66.85% avg.; HSSH: 69.65% avg. (↑2.80%)
- Birds-31 (natural-image domains): VMamba: 88.24% avg.; FSDG: 82.31% avg.; HSSH: 90.69% avg. (↑2.45% over VMamba, ↑8.38% over FSDG)
Ablation studies on CUB ↔ Paintings show SSH alone raises accuracy from 63.47% to 64.86%, with further gains to 66.03% when HMC is added. Hallucinating at all four encoder stages outperforms hallucinating at any subset of them, highlighting the importance of multi-scale style enrichment and hyperbolic alignment. These results support the effectiveness of HSSH in maintaining discriminative fine-grained recognition across substantial domain-induced style variation.
7. Contextual Significance and Implications
The introduction of Hyperbolic State Space Hallucination represents a shift in FGDG methodology, combining linear-time style extrapolation with hyperbolic manifold consistency to safeguard fine-grained feature separability under diverse unseen conditions. This design effectively mitigates the collapse of subtle semantic cues under cross-domain style shifts, as evidenced by substantial improvements over prior methods. A plausible implication is that negative curvature embedding may find broader applicability in other problem domains requiring robust hierarchical or fine-grained structure under distributional drift (Bi et al., 10 Apr 2025).