Validation of BidirLM on non‑transformer causal architectures

Validate whether the BidirLM adaptation-and-composition framework, which transforms causal decoder language models into bidirectional encoders, extends to non-transformer causal architectures, specifically state-space models such as Mamba and Gated Delta Networks.

Background

BidirLM adapts causal decoder LLMs into bidirectional encoders by enabling bidirectional attention and using a two-step training pipeline: masked next-token prediction followed by contrastive learning. To scale and preserve knowledge, the approach combines linear weight merging with a lightweight multi-domain data mixture. The paper demonstrates strong results on transformer-based backbones (Gemma3 and Qwen3) and shows successful composition with domain- and modality-specialized transformer variants.
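The composition step relies on linear weight merging of same-architecture checkpoints. A minimal sketch of that operation, assuming a simple "model soup"-style interpolation with a single mixing coefficient (the helper name and `alpha` parameter are illustrative, not the paper's API; real checkpoints hold tensors rather than scalars):

```python
def linear_merge(state_a: dict, state_b: dict, alpha: float = 0.5) -> dict:
    """Return alpha * A + (1 - alpha) * B for every shared parameter.

    Illustrative stand-in for linear weight merging: in practice the
    values would be parameter tensors from two same-architecture models.
    """
    assert state_a.keys() == state_b.keys(), "architectures must match"
    return {k: alpha * state_a[k] + (1.0 - alpha) * state_b[k]
            for k in state_a}

# Toy usage with scalar "parameters" standing in for real weights.
a = {"w": 1.0}
b = {"w": 0.0}
merged = linear_merge(a, b, alpha=0.25)  # merged["w"] == 0.25
```

The merge is well defined only when both checkpoints share a parameterization, which is one reason extending the framework across architecture families is non-trivial.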

All results in the paper rely on transformer decoder architectures. The authors explicitly note that the same adaptation and merging procedure has not been validated on non-transformer causal backbones—particularly state-space models such as Mamba and Gated Delta Networks—and remains an open question.
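One concrete obstacle: in a transformer, enabling bidirectional attention amounts to dropping the causal mask, but a state-space scan is inherently left-to-right. A toy sketch of one candidate workaround—running a second scan over the reversed sequence and combining the two, as done in bidirectional Mamba variants such as Vision Mamba (an assumption here, not a technique from the BidirLM paper):

```python
def ssm_scan(xs, a=0.5):
    """Toy scalar SSM recurrence: h_t = a * h_{t-1} + x_t, emitted per step."""
    h, out = 0.0, []
    for x in xs:
        h = a * h + x
        out.append(h)
    return out

def bidirectional_scan(xs, a=0.5):
    """Combine a forward scan with a scan over the reversed sequence,
    so each position aggregates both left and right context."""
    fwd = ssm_scan(xs, a)
    bwd = list(reversed(ssm_scan(list(reversed(xs)), a)))
    return [f + b for f, b in zip(fwd, bwd)]

# With input [1, 0, 0], the first position now also "sees" itself via
# the backward pass, unlike the purely causal forward scan.
ys = bidirectional_scan([1.0, 0.0, 0.0])
```

Whether such architectural changes compose cleanly with BidirLM's two-step training and weight-merging pipeline is exactly what the proposed validation would test.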

References

Finally, validating our framework on non-transformer causal architectures, such as state-space models~\citep{gu2024mambalineartimesequencemodeling, yang2025gateddeltanetworksimproving}, remains an open question.

BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs  (2604.02045 - Boizard et al., 2 Apr 2026) in Future Work — Additional mitigation techniques and model architectures