Adaptive Quaternion Cross-Fusion Block
- The paper demonstrates that the A-QCF block significantly improves liver tumor segmentation, achieving a Dice score increase of up to 5.4% compared to unimodal baselines.
- The A-QCF block is a parameter-efficient, bi-directional attention module that fuses modality-specific features using quaternion convolutions in a dual-encoder architecture.
- Empirical evaluations confirm that adaptive gating and cross-attention in the A-QCF block enhance model focus on critical anatomical details even with unpaired CT and MRI datasets.
The Adaptive Quaternion Cross-Fusion (A-QCF) block is a data-driven, bi-directional attention module developed for effective multimodal feature integration in deep networks, specifically targeting scenarios with unpaired medical imaging datasets. Designed as the central mechanism within the A-QCF-Net for liver tumor segmentation from unpaired CT and MRI cohorts, the A-QCF block enables dynamic, gated exchange of abstract modality-specific information using the representational efficiency of Quaternion Neural Networks. It operates at each level of a two-stream U-Net–style encoder architecture, facilitating robust fusion between streams while maintaining parameter efficiency and full unimodal compatibility (V et al., 25 Dec 2025).
1. Architectural Role within A-QCF-Net
The A-QCF block is positioned after each down-sampling stage in the twin quaternion encoder streams of the A-QCF-Net, which simultaneously processes CT and MRI data. Each stream encodes modality-specific features: CT streams emphasize sharp anatomical boundaries, while MRI streams offer enhanced soft-tissue contrast. The A-QCF block performs bidirectional, attention-gated fusion between these streams, enabling the exchange of complementary cues. Fused features are then propagated to the next encoding stage. Following the final encoder, both streams converge at a shared quaternion bottleneck, resulting in a unified modality-agnostic representation.
2. Quaternion Neural Network Foundations
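A quaternion is expressed as $q = a + b\,\mathbf{i} + c\,\mathbf{j} + d\,\mathbf{k}$, with $a, b, c, d \in \mathbb{R}$ and $\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{i}\mathbf{j}\mathbf{k} = -1$. Quaternion multiplication (the Hamilton product) is non-commutative and, for $q_1 = a_1 + b_1\mathbf{i} + c_1\mathbf{j} + d_1\mathbf{k}$ and $q_2 = a_2 + b_2\mathbf{i} + c_2\mathbf{j} + d_2\mathbf{k}$, is defined by:
$$q_1 \otimes q_2 = (a_1a_2 - b_1b_2 - c_1c_2 - d_1d_2) + (a_1b_2 + b_1a_2 + c_1d_2 - d_1c_2)\,\mathbf{i} + (a_1c_2 - b_1d_2 + c_1a_2 + d_1b_2)\,\mathbf{j} + (a_1d_2 + b_1c_2 - c_1b_2 + d_1a_2)\,\mathbf{k}$$
In implementation, 3D real-valued feature tensors with $4C$ channels are reinterpreted as $C$ quaternion channels. Quaternion convolutions use four real sub-kernels $W_r, W_i, W_j, W_k$, arranged as a $4 \times 4$ block real-weight matrix that mirrors the Hamilton product. This formulation yields a parameter count that is 25% of an equivalent real-valued convolution: a real 3D convolution from $4C_{\mathrm{in}}$ to $4C_{\mathrm{out}}$ channels with a $k \times k \times k$ kernel requires $16\,C_{\mathrm{in}} C_{\mathrm{out}} k^3$ weights, whereas the quaternion variant requires only $4\,C_{\mathrm{in}} C_{\mathrm{out}} k^3$.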
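The Hamilton product and the weight-sharing argument can be checked numerically. The following is a minimal NumPy sketch, not code from the paper: the function names `hamilton`, `hamilton_block`, and `weight_counts` are hypothetical, and the component-major channel layout (all real parts first, then the i, j, and k parts) is an assumption.

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) sequences."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    ], dtype=float)

def hamilton_block(w_r, w_i, w_j, w_k):
    """Arrange four real sub-kernels, each (C_out, C_in), into the
    (4*C_out, 4*C_in) real matrix that applies a left Hamilton product.
    Assumed layout: channels grouped as [real | i | j | k]."""
    return np.block([
        [w_r, -w_i, -w_j, -w_k],
        [w_i,  w_r, -w_k,  w_j],
        [w_j,  w_k,  w_r, -w_i],
        [w_k, -w_j,  w_i,  w_r],
    ])

def weight_counts(c_in, c_out, k):
    """Free weights of a real conv (4*c_in -> 4*c_out) versus a quaternion
    conv (c_in -> c_out quaternion channels), both with a k*k*k kernel."""
    real = (4 * c_in) * (4 * c_out) * k ** 3
    quat = 4 * c_in * c_out * k ** 3
    return real, quat

# Non-commutativity: i*j and j*i differ in sign.
unit_i, unit_j = (0, 1, 0, 0), (0, 0, 1, 0)
ij, ji = hamilton(unit_i, unit_j), hamilton(unit_j, unit_i)

# The block matrix stores 16 sub-blocks but only 4 of them are independent.
real_n, quat_n = weight_counts(c_in=8, c_out=16, k=3)
```

The block matrix reuses each sub-kernel four times (up to sign), which is where the fourfold reduction in free parameters comes from.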
3. Data Flow and Mathematical Operations in the A-QCF Block
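Let $F_{\mathrm{CT}}$ and $F_{\mathrm{MRI}}$ denote the batch-wise raw encoder feature maps of the CT and MRI streams.
For the MRI $\rightarrow$ CT direction, the A-QCF block updates the CT features through the following sequence:
- Quaternion Projections: quaternion convolutions map $F_{\mathrm{CT}}$ to queries and $F_{\mathrm{MRI}}$ to keys and values.
- Channel-wise Cross-Attention: an attention map is computed from the queries and keys. The softmax operates across the $4C$ channels at every voxel position.
- Raw Context Vector: the attention map is applied to the MRI values to form the MRI $\rightarrow$ CT context.
- Adaptive Gating: a gate is computed from the raw CT features and the context, using global average pooling followed by a sigmoid function; the gate is broadcast to the spatial dimensions.
- Gated Fusion and Output Projection: the gated context is fused back into the CT stream through a quaternion convolution.
The reverse path (CT $\rightarrow$ MRI) is computed analogously by swapping stream roles. This mechanism ensures residual-style compatibility: when the gated cross-modal term vanishes, the block reduces to unimodal processing.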
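The sequence of operations can be illustrated with a small NumPy sketch. It is one plausible reading of the steps, not the authors' implementation, and several details are assumptions made for illustration: the attention logits are taken as the elementwise product of queries and keys, the gate is a single linear layer over concatenated pooled descriptors, the fusion is a residual addition, both streams are given matching tensor shapes, and plain real-valued pointwise matrices stand in for the quaternion convolutions. All function and parameter names (`cross_fuse`, `aqcf_block`, `w_q`, `w_g`, and so on) are hypothetical.

```python
import numpy as np

def pointwise(x, w):
    """1x1x1 channel mixing. x: (B, C_in, D, H, W), w: (C_out, C_in).
    Stand-in for a quaternion convolution (assumption)."""
    return np.einsum("oc,bcdhw->bodhw", w, x)

def channel_softmax(x):
    """Softmax across the channel axis at every voxel position."""
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_fuse(f_primary, f_other, p):
    """One direction of the exchange: context flows from f_other into f_primary."""
    q = pointwise(f_primary, p["w_q"])          # queries from the primary stream
    k = pointwise(f_other, p["w_k"])            # keys from the other stream
    v = pointwise(f_other, p["w_v"])            # values from the other stream
    attn = channel_softmax(q * k)               # assumed elementwise logits
    context = attn * v                          # raw context
    pooled = np.concatenate(
        [f_primary.mean(axis=(2, 3, 4)), context.mean(axis=(2, 3, 4))], axis=1
    )                                           # global average pooling, (B, 2*ch)
    gate = sigmoid(pooled @ p["w_g"] + p["b_g"])  # one scalar per sample, (B, 1)
    gate = gate[:, :, None, None, None]         # broadcast to spatial dimensions
    fused = f_primary + pointwise(gate * context, p["w_out"])  # assumed residual
    return fused, gate

def init_params(ch, rng):
    """Random illustrative weights for ch real channels (ch = 4*C)."""
    s = 1.0 / np.sqrt(ch)
    names = ("w_q", "w_k", "w_v", "w_out")
    p = {n: rng.normal(0.0, s, size=(ch, ch)) for n in names}
    p["w_g"] = rng.normal(0.0, s, size=(2 * ch, 1))
    p["b_g"] = np.zeros(1)
    return p

def aqcf_block(f_ct, f_mri, p_ct, p_mri):
    """Bidirectional exchange: MRI -> CT and CT -> MRI, each with its own weights."""
    new_ct, _ = cross_fuse(f_ct, f_mri, p_ct)
    new_mri, _ = cross_fuse(f_mri, f_ct, p_mri)
    return new_ct, new_mri

rng = np.random.default_rng(0)
ch = 8                                          # 4*C with C = 2 quaternion channels
f_ct = rng.normal(size=(2, ch, 4, 4, 4))
f_mri = rng.normal(size=(2, ch, 4, 4, 4))
new_ct, new_mri = aqcf_block(f_ct, f_mri, init_params(ch, rng), init_params(ch, rng))
```

In this sketch, passing an all-zero tensor as the other stream makes the context vanish, so the primary features pass through unchanged, which matches the unimodal fallback described above.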
4. Parameterization and Practical Implementation
At each encoder stage, feature tensors are maintained with $4C$ real channels, interpreted as $C$ quaternion channels, with the quaternion channel count set per stage across increasing depth. The query, key, value, and output projections are all implemented as quaternion convolutions. Training uses standard deep learning initialization, such as Kaiming normal initialization on the real-valued sub-kernels; no quaternion-specific initializers are reported.
5. Mechanisms for Bidirectional Knowledge Transfer
The A-QCF block is explicitly engineered for cross-domain information exchange. At each encoding scale, the block:
- Computes query projections from CT and key/value projections from MRI.
- Constructs an attention map enabling MRI $\rightarrow$ CT context vector formation.
- Generates an adaptive, data-driven gating scalar that modulates MRI-to-CT (and vice versa) fusion strength, contingent on both the raw CT features and the transferred context.
- Fuses this adaptively weighted external context back into the primary stream via a quaternion convolution.
- Applies the same mechanism in reverse to facilitate CT $\rightarrow$ MRI transfer.
This design enables mutual regularization, as each modality constrains and enhances the other's intermediate feature space throughout the encoder. If either source feature map is zeroed, the block defaults to a pure unimodal pass, preserving robustness in degraded or single-modality scenarios.
6. Empirical Performance and Ablation Analysis
Empirical evaluation demonstrates that A-QCF-Net, leveraging the A-QCF block, attains a tumor Dice score of 76.7% on LiTS (CT) and 78.3% on ATLAS (MRI), marking improvements of 5.4% and 4.7% over the unimodal nnU-Net baselines, respectively. Ablation studies quantify the impact of block design:
| Configuration | Mean DSC ± SD | Δ DSC (relative to full model, %) |
|---|---|---|
| Full A-QCF-Net | 0.860 ± 0.007 | 0 |
| No cross-fusion | 0.782 ± 0.018 | –9.1 |
| No adaptive gate (static gate) | 0.845 ± 0.009 | –1.7 |
| Real-valued backbone | 0.821 ± 0.012 | –4.5 |
| No shared bottleneck | 0.838 ± 0.010 | –2.6 |
| No decoder attention gates | 0.849 ± 0.008 | –1.3 |
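
The Δ DSC column is consistent with a relative change against the full model's mean DSC, that is, $100 \times (\mathrm{DSC} - 0.860) / 0.860$. The short check below recomputes the column from the mean DSC values in the table; the variable and function names are illustrative only.

```python
# Mean DSC values copied from the ablation table.
full_dsc = 0.860
ablations = {
    "No cross-fusion": 0.782,
    "No adaptive gate": 0.845,
    "Real-valued backbone": 0.821,
    "No shared bottleneck": 0.838,
    "No decoder attention gates": 0.849,
}

def relative_delta_pct(dsc, reference):
    """Relative change of dsc with respect to reference, in percent."""
    return 100.0 * (dsc - reference) / reference

# Rounded to one decimal place, as in the table.
deltas = {name: round(relative_delta_pct(dsc, full_dsc), 1)
          for name, dsc in ablations.items()}
```

Read this way, removing cross-fusion costs roughly nine percent of the full model's mean DSC, by far the largest drop among the ablations.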
Gate activation statistics, with a reported range of [0.71, 0.91] at stage 1 and a median of $0.49$ [0.31, 0.65] at stage 4, indicate depth-dependent modulation of cross-fusion, with higher gate values at the shallow stage. Grad-CAM and Grad-CAM++ analyses indicate that the fused representations concentrate model attention on relevant pathological anatomy, supporting the clinical interpretability and fidelity of the cross-modal feature integration.
7. Integration within Training and Unified Representation
A-QCF blocks are inserted after each of the four encoder down-sampling blocks in each stream. Following encoding, both streams merge into a shared quaternion bottleneck, which enforces a joint, modality-agnostic latent space. Decoders with attention gates mirror the encoders, with skip connections modulated via quaternion attention. Training is driven by a composite loss whose segmentation term is the sum of Dice and cross-entropy losses. The AdamW optimizer is used with weight decay and adaptive learning-rate scheduling.
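A segmentation term of the kind described, the sum of a soft Dice loss and a cross-entropy loss, can be sketched in NumPy as follows. This is an illustrative formulation rather than the paper's exact loss: the smoothing constant `eps`, the per-class averaging of the Dice term, and the function name `dice_ce_loss` are assumptions.

```python
import numpy as np

def softmax(logits, axis=1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dice_ce_loss(logits, target, eps=1e-6):
    """Soft Dice loss plus cross-entropy.
    logits: (B, K, D, H, W) raw class scores.
    target: (B, D, H, W) integer class labels in [0, K)."""
    num_classes = logits.shape[1]
    probs = softmax(logits, axis=1)
    # One-hot encode labels and move the class axis next to the batch axis.
    onehot = np.moveaxis(np.eye(num_classes)[target], -1, 1)

    # Cross-entropy averaged over all voxels.
    log_p = np.log(np.clip(probs, eps, 1.0))
    ce = -np.mean(np.sum(onehot * log_p, axis=1))

    # Soft Dice per class over batch and spatial axes, then averaged.
    axes = (0, 2, 3, 4)
    intersection = np.sum(probs * onehot, axis=axes)
    denominator = np.sum(probs, axis=axes) + np.sum(onehot, axis=axes)
    dice = np.mean((2.0 * intersection + eps) / (denominator + eps))

    return (1.0 - dice) + ce

rng = np.random.default_rng(0)
target = rng.integers(0, 3, size=(2, 4, 4, 4))   # 3 classes, e.g. background/liver/tumor
logits = rng.normal(size=(2, 3, 4, 4, 4))
loss = dice_ce_loss(logits, target)
```

How the CT and MRI terms are combined into the composite loss is not recoverable from the text above; an unweighted sum of one such term per modality would be one simple choice, stated here only as an assumption.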
In summary, the Adaptive Quaternion Cross-Fusion block constitutes a robust, parameter-efficient, and adaptively gated mechanism for bidirectional multimodal feature integration. Its ability to operate with unpaired datasets and achieve superior segmentation accuracy is empirically substantiated, validating its role as the core component underpinning unified multimodal neural architectures for medical imaging tasks (V et al., 25 Dec 2025).