Adaptive Quaternion Cross-Fusion Block

Updated 1 January 2026
  • The paper demonstrates that the A-QCF block significantly improves liver tumor segmentation, achieving a Dice score increase of up to 5.4% compared to unimodal baselines.
  • The A-QCF block is a parameter-efficient, bi-directional attention module that fuses modality-specific features using quaternion convolutions in a dual-encoder architecture.
  • Empirical evaluations confirm that adaptive gating and cross-attention in the A-QCF block enhance model focus on critical anatomical details even with unpaired CT and MRI datasets.

The Adaptive Quaternion Cross-Fusion (A-QCF) block is a data-driven, bi-directional attention module developed for effective multimodal feature integration in deep networks, specifically targeting scenarios with unpaired medical imaging datasets. Designed as the central mechanism within the A-QCF-Net for liver tumor segmentation from unpaired CT and MRI cohorts, the A-QCF block enables dynamic, gated exchange of abstract modality-specific information using the representational efficiency of Quaternion Neural Networks. It operates at each level of a two-stream U-Net–style encoder architecture, facilitating robust fusion between streams while maintaining parameter efficiency and full unimodal compatibility (V et al., 25 Dec 2025).

1. Architectural Role within A-QCF-Net

The A-QCF block is positioned after each down-sampling stage in the twin quaternion encoder streams of the A-QCF-Net, which simultaneously processes CT and MRI data. Each stream encodes modality-specific features: CT streams emphasize sharp anatomical boundaries, while MRI streams offer enhanced soft-tissue contrast. The A-QCF block performs bidirectional, attention-gated fusion between these streams, enabling the exchange of complementary cues. Fused features are then propagated to the next encoding stage. Following the final encoder, both streams converge at a shared quaternion bottleneck, resulting in a unified modality-agnostic representation.

2. Quaternion Neural Network Foundations

A quaternion $q \in \mathbb{H}$ is expressed as $q = a + b i + c j + d k$, with $i^2 = j^2 = k^2 = ijk = -1$. Quaternion multiplication (the Hamilton product) is non-commutative and defined by:

$q_1 \otimes q_2 = (a_1a_2 - b_1b_2 - c_1c_2 - d_1d_2) + (a_1b_2 + b_1a_2 + c_1d_2 - d_1c_2)i + (a_1c_2 - b_1d_2 + c_1a_2 + d_1b_2)j + (a_1d_2 + b_1c_2 - c_1b_2 + d_1a_2)k$

In implementation, 3D real-valued feature tensors with $4C$ channels are reinterpreted as $C$ quaternion channels. Quaternion convolutions use four real sub-kernels $\{W_r, W_i, W_j, W_k\}$, arranged as a real-weight matrix analogously to the Hamilton product. This formulation yields a parameter count that is 25% of an equivalent real-valued convolution: a $k^3$ real 3D convolution from $4C_\text{in}$ to $4C_\text{out}$ channels requires $16C_\text{out}C_\text{in}k^3$ weights, whereas the quaternion variant requires only $4C_\text{out}C_\text{in}k^3$.
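The weight sharing behind the 25% figure can be illustrated with a minimal sketch. It assumes a single quaternion weight acting on a single quaternion input; the helper names `hamilton`, `left_mult_matrix`, and `matvec` are hypothetical and not taken from the paper. The 4×4 real matrix has 16 entries but only 4 free parameters, which is the source of the saving.

```python
def hamilton(q1, q2):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def left_mult_matrix(w):
    """Real 4x4 matrix M such that M @ x equals hamilton(w, x).

    The 16 entries reuse the 4 components of w (illustrative helper).
    """
    r, i, j, k = w
    return [
        [r, -i, -j, -k],
        [i, r, -k, j],
        [j, k, r, -i],
        [k, -j, i, r],
    ]


def matvec(m, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(row[n] * x[n] for n in range(len(x))) for row in m]


w = (1, 2, 3, 4)
x = (5, 6, 7, 8)
# The real-matrix form and the Hamilton product agree.
assert matvec(left_mult_matrix(w), x) == list(hamilton(w, x))
# Non-commutativity: i * j = k, but j * i = -k.
print(hamilton((0, 1, 0, 0), (0, 0, 1, 0)), hamilton((0, 0, 1, 0), (0, 1, 0, 0)))
```

In a quaternion convolution the same block structure is applied per input/output channel pair and per kernel position, with each scalar component replaced by the corresponding sub-kernel.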

3. Data Flow and Mathematical Operations in the A-QCF Block

Let $F_{ct}, F_{mri} \in \mathbb{R}^{B \times 4C \times D \times H \times W}$ denote the batch-wise raw encoder feature maps for the CT and MRI streams.

The A-QCF block updates features through the following sequence:

  1. Quaternion Projections:

$Q_{ct} = Q\mathrm{Conv}_q(F_{ct})$

$K_{mri} = Q\mathrm{Conv}_k(F_{mri}),\quad V_{mri} = Q\mathrm{Conv}_v(F_{mri})$

  2. Channel-wise Cross-Attention:

$A_{mri \to ct} = \mathrm{softmax}_\text{channel}(Q_{ct} \odot K_{mri})$

The softmax operates across the $4C$ channels at every voxel position.

  3. Raw Context Vector:

$C_{\text{raw}}^{mri \to ct} = A_{mri \to ct} \odot V_{mri}$

  4. Adaptive Gating:

$s_{ct} = \mathrm{GAP}(F_{ct}) \in \mathbb{R}^{B \times 4C \times 1 \times 1 \times 1}$

$s_{ctx} = \mathrm{GAP}(C_{\text{raw}}^{mri \to ct})$

$\lambda_{mri \to ct} = \sigma(\mathrm{MLP}(\mathrm{concat}[s_{ct}, s_{ctx}]))$

Here, $\mathrm{GAP}$ is global average pooling and $\sigma$ is the sigmoid function; $\lambda$ is broadcast to the spatial dimensions.

  5. Gated Fusion and Output Projection:

$F'_{ct} = Q\mathrm{Conv}_{out}(\mathrm{concat}[F_{ct}, \lambda_{mri \to ct} \cdot C_{\text{raw}}^{mri \to ct}])$

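The reverse path (CT $\to$ MRI) is computed analogously by swapping stream roles. This mechanism ensures residual-style compatibility: when $F_{mri} \equiv 0$, the value projection $V_{mri}$ and hence the context term vanish (provided the projection carries no bias term), so the block reduces to unimodal processing of $F_{ct}$.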
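The MRI-to-CT path described above can be traced with a small pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: feature maps are lists of voxels, each a list of channel values; the three projections are passed in as callables (identity stand-ins here, where the paper uses $1 \times 1 \times 1$ quaternion convolutions); the MLP is reduced to one hypothetical linear layer producing a scalar gate; and the final $Q\mathrm{Conv}_{out}$ is omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of channel values."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def gap(feat):
    """Global average pooling: mean over voxels, separately per channel."""
    n = len(feat)
    return [sum(v[c] for v in feat) / n for c in range(len(feat[0]))]


def cross_context(f_ct, f_mri, proj_q, proj_k, proj_v, gate_w, gate_b):
    """Return (lambda, gated context) for the MRI -> CT direction.

    gate_w / gate_b are a hypothetical single linear layer standing in for the MLP.
    """
    q = [proj_q(v) for v in f_ct]
    k = [proj_k(v) for v in f_mri]
    val = [proj_v(v) for v in f_mri]
    # Channel-wise attention: softmax across channels at each voxel.
    attn = [softmax([a * b for a, b in zip(qv, kv)]) for qv, kv in zip(q, k)]
    # Raw context: attention weights times values, element-wise.
    c_raw = [[a * x for a, x in zip(av, vv)] for av, vv in zip(attn, val)]
    # Adaptive gate from pooled CT features and pooled context.
    s = gap(f_ct) + gap(c_raw)  # list concatenation
    logit = sum(w * x for w, x in zip(gate_w, s)) + gate_b
    lam = 1.0 / (1.0 + math.exp(-logit))
    # The scalar gate is broadcast over all voxels and channels.
    gated = [[lam * x for x in vv] for vv in c_raw]
    return lam, gated


identity = lambda v: list(v)  # stand-in for a bias-free projection
f_ct = [[1.0, 2.0, 0.5, -1.0], [0.0, 1.0, 1.0, 2.0]]
f_mri = [[0.5, 0.1, 0.2, 0.3], [1.0, 0.0, -0.5, 0.4]]
lam, gated = cross_context(f_ct, f_mri, identity, identity, identity, [0.1] * 8, 0.0)

# Unimodal fallback: a zeroed MRI stream contributes no context.
zeros = [[0.0] * 4 for _ in f_mri]
_, gated_zero = cross_context(f_ct, zeros, identity, identity, identity, [0.1] * 8, 0.0)
```

With a zeroed MRI stream the gated context is all zeros, so a subsequent output projection would see only the CT features.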

4. Parameterization and Practical Implementation

At each encoder stage, feature tensors are maintained as $F \in \mathbb{R}^{B \times 4C \times D \times H \times W}$, with quaternion channel counts $C \in \{12, 24, 48, 96, 192, 256\}$ across increasing depth. The projection modules $Q\mathrm{Conv}_q$, $Q\mathrm{Conv}_k$, and $Q\mathrm{Conv}_v$ are implemented as $1 \times 1 \times 1$ quaternion convolutions, while $Q\mathrm{Conv}_{out}$ is a $3 \times 3 \times 3$ convolution. Training uses standard deep learning initialization, such as Kaiming normal initialization on the real-valued sub-kernels; no quaternion-specific initializers are reported.
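As a rough worked count, the sketch below tallies convolution weights for one bidirectional A-QCF block at each listed channel width. Several points are assumptions rather than reported facts: biases and the gating MLP are ignored, and $Q\mathrm{Conv}_{out}$ is taken to map the concatenated $2C$ quaternion channels back to $C$. The function names are hypothetical.

```python
def quat_conv_weights(c_in, c_out, k):
    """Weights of a bias-free quaternion conv, c_in -> c_out quaternion channels."""
    return 4 * c_out * c_in * k ** 3


def real_conv_weights(c_in, c_out, k):
    """Weights of the equivalent real conv on 4*c_in -> 4*c_out real channels."""
    return (4 * c_out) * (4 * c_in) * k ** 3


def block_weights(c, counter):
    """Conv weights in one block: q/k/v projections plus output conv, both directions."""
    projections = 3 * counter(c, c, 1)   # three 1x1x1 projections
    output = counter(2 * c, c, 3)        # 3x3x3 conv on the concatenated input (assumed 2C -> C)
    return 2 * (projections + output)    # MRI->CT and CT->MRI paths


STAGES = [12, 24, 48, 96, 192, 256]
for c in STAGES:
    q = block_weights(c, quat_conv_weights)
    r = block_weights(c, real_conv_weights)
    print(f"C={c:3d}  quaternion={q:>10d}  real={r:>10d}  ratio={q / r:.2f}")
```

Under these assumptions the ratio is 0.25 at every stage, matching the per-layer saving stated for quaternion convolutions.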

5. Mechanisms for Bidirectional Knowledge Transfer

The A-QCF block is explicitly engineered for cross-domain information exchange. At each encoding scale, the block:

  1. Computes query projections from CT and key/value projections from MRI.
  2. Constructs an attention map enabling MRI $\to$ CT context vector formation.
  3. Generates an adaptive, data-driven gating scalar ($\lambda$) that modulates MRI-to-CT (and vice versa) fusion strength, contingent on both the raw CT features and the transferred context.
  4. Fuses this adaptively weighted external context back into the primary stream via a quaternion convolution.
  5. Applies the same mechanism in reverse to facilitate CT $\to$ MRI transfer.

This design enables mutual regularization, as each modality constrains and enhances the other's intermediate feature space throughout the encoder. If either source feature map is zeroed, the block defaults to a pure unimodal pass, preserving robustness in degraded or single-modality scenarios.

6. Empirical Performance and Ablation Analysis

Empirical evaluation demonstrates that A-QCF-Net, leveraging the A-QCF block, attains a tumor Dice score of 76.7% on LiTS (CT) and 78.3% on ATLAS (MRI), marking improvements of 5.4% and 4.7% over the unimodal nnU-Net baselines, respectively. Ablation studies quantify the impact of block design:

| Configuration | Mean DSC ± SD | Δ DSC (%) |
|---|---|---|
| Full A-QCF-Net | 0.860 ± 0.007 | 0 |
| No cross-fusion | 0.782 ± 0.018 | -9.1 |
| No adaptive gate ($\lambda$ static) | 0.845 ± 0.009 | -1.7 |
| Real-valued backbone | 0.821 ± 0.012 | -4.5 |
| No shared bottleneck | 0.838 ± 0.010 | -2.6 |
| No decoder attention gates | 0.849 ± 0.008 | -1.3 |
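The Δ DSC column is consistent with a change relative to the full model's mean DSC, rather than an absolute difference in Dice points. The short check below recomputes it from the tabulated means; the variable and function names are illustrative only.

```python
FULL = 0.860  # mean DSC of the full A-QCF-Net, from the table

ablations = {
    "No cross-fusion": 0.782,
    "No adaptive gate": 0.845,
    "Real-valued backbone": 0.821,
    "No shared bottleneck": 0.838,
    "No decoder attention gates": 0.849,
}


def relative_delta(dsc, ref=FULL):
    """Relative change in mean DSC versus the full model, in percent."""
    return round((dsc - ref) / ref * 100, 1)


deltas = {name: relative_delta(dsc) for name, dsc in ablations.items()}
for name, delta in deltas.items():
    print(f"{name:28s} {delta:+.1f}")
```

For example, removing cross-fusion lowers the mean DSC by 0.078 in absolute terms, which is about 9.1% of 0.860.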

Gate activation statistics, such as a median $\lambda_{\text{MRI} \to \text{CT}} = 0.82$ [0.71, 0.91] at stage 1 and $0.49$ [0.31, 0.65] at stage 4, confirm depth-dependent modulation of cross-fusion. Grad-CAM and Grad-CAM++ analyses indicate that the fused representations concentrate model attention on relevant pathological anatomy, supporting the clinical interpretability and fidelity of the cross-modal feature integration.

7. Integration within Training and Unified Representation

A-QCF blocks are inserted after each of four encoder down-sampling blocks per stream. Following encoding, both streams merge into a shared quaternion bottleneck, enforcing the learning of a joint, modality-agnostic latent space. Decoders with attention gates mirror the encoders, incorporating skip connections modulated via quaternion attention. Training is driven by a composite loss:

$L_\text{total} = L_\text{DiceCE}(y_{ct}, \hat{y}_{ct}) + L_\text{DiceCE}(y_{mri}, \hat{y}_{mri})$

where $L_\text{DiceCE}$ denotes the sum of Dice and cross-entropy losses. The AdamW optimizer is used with a learning rate of $1 \times 10^{-4}$, a weight decay of $1 \times 10^{-5}$, and adaptive learning rate scheduling.
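A minimal sketch of the composite loss follows, written for a binary foreground mask on flat lists of voxels. The smoothing constant, the clamping epsilon, the binary simplification, and the function names are assumptions made for illustration; the source specifies only that each term is the sum of Dice and cross-entropy losses and that the CT and MRI terms are added.

```python
import math

EPS = 1e-7  # clamp to keep log() finite (assumed value)


def dice_ce(y_true, y_prob, smooth=1.0):
    """Soft Dice loss plus binary cross-entropy for one modality."""
    p = [min(max(v, EPS), 1.0 - EPS) for v in y_prob]
    inter = sum(t * q for t, q in zip(y_true, p))
    dice = 1.0 - (2.0 * inter + smooth) / (sum(y_true) + sum(p) + smooth)
    ce = -sum(
        t * math.log(q) + (1 - t) * math.log(1.0 - q) for t, q in zip(y_true, p)
    ) / len(p)
    return dice + ce


def total_loss(y_ct, p_ct, y_mri, p_mri):
    """L_total: one DiceCE term per modality, summed with equal weight."""
    return dice_ce(y_ct, p_ct) + dice_ce(y_mri, p_mri)


y = [1, 0, 1, 0]
perfect = [1.0, 0.0, 1.0, 0.0]
uncertain = [0.5, 0.5, 0.5, 0.5]
print(total_loss(y, perfect, y, perfect))      # close to zero
print(total_loss(y, uncertain, y, uncertain))  # clearly larger
```

Because the cohorts are unpaired, the CT and MRI terms are each evaluated against labels from their own dataset; the sum couples them only through the shared network weights.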

In summary, the Adaptive Quaternion Cross-Fusion block constitutes a robust, parameter-efficient, and adaptively gated mechanism for bidirectional multimodal feature integration. Its ability to operate with unpaired datasets and achieve superior segmentation accuracy is empirically substantiated, validating its role as the core component underpinning unified multimodal neural architectures for medical imaging tasks (V et al., 25 Dec 2025).
