
Grouped Coordinate Attention (GCA)

Updated 24 November 2025
  • Grouped Coordinate Attention (GCA) is an attention mechanism that decomposes feature channels into groups and applies directional pooling to achieve efficient global context modeling.
  • GCA integrates with CNN backbones to enhance long-range dependency modeling and boundary segmentation, significantly boosting performance in tasks like medical image segmentation.
  • By using lightweight 1×1 bottlenecks and minimal overhead (under 5% additional FLOPs), GCA approximates global attention while preserving fine spatial details.

Grouped Coordinate Attention (GCA) is an attention mechanism designed to endow convolutional neural networks with efficient long-range dependency modeling by decomposing feature channels into groups, applying directional pooling along spatial axes, and producing global but lightweight attention maps. Originally introduced in the context of medical image segmentation, GCA addresses the limitations of both standard convolution (localized receptive field) and full self-attention (quadratic computational cost), enabling convolutional backbones to approximate global context with negligible parameter and FLOP overhead compared to transformer-based mechanisms (Ding et al., 18 Nov 2025).

1. Motivation and Conceptual Framework

Conventional convolution operations provide localized feature extraction and struggle to encode the long-range contextual information essential for tasks such as boundary delineation in medical images. While self-attention mechanisms, as instantiated in Transformers, allow arbitrary global modeling, they exhibit $O(H^2W^2C)$ cost for feature tensors of spatial resolution $H \times W$ and channel dimension $C$. Existing lightweight attention modules (e.g., SE, CBAM, CoordAtt) are either too aggressive in collapsing spatial inputs or incur non-trivial convolutional overhead to achieve full 2D spatial modeling.

GCA counters these limitations by:

  • Partitioning the channel dimension into $G$ groups of size $C_g = C/G$,
  • Performing coordinate pooling (average and max) independently along height and width within each group,
  • Fusing these axis-pooled descriptors through a two-stage, shared 1×1 bottleneck to produce directional attention maps,
  • Broadcasting these maps to reweight input features within each group, then recombining the results.

This hierarchical and group-wise approach yields efficient, directionally-aware global attention, preserving fine spatial information while incurring under 5% additional FLOPs per block relative to standard convolutions (Ding et al., 18 Nov 2025).

2. Mathematical Formulation and Mechanism

Let $X \in \mathbb{R}^{B \times C \times H \times W}$ denote the input tensor, where $B$ is the batch size, $C$ the number of channels, and $H$, $W$ the spatial dimensions. The steps are:

  1. Channel Grouping: $X$ is split into $G$ groups, $X = [X_1, \ldots, X_G]$, each $X_g \in \mathbb{R}^{B \times C_g \times H \times W}$.
  2. Directional (Coordinate) Pooling for each group $g$:
    • Height:
      • $f_{g}^{h,\text{avg}} = \text{AvgPool}_W(X_g) \in \mathbb{R}^{B \times C_g \times H \times 1}$
      • $f_{g}^{h,\text{max}} = \text{MaxPool}_W(X_g) \in \mathbb{R}^{B \times C_g \times H \times 1}$
    • Width:
      • $f_{g}^{w,\text{avg}} = \text{AvgPool}_H(X_g) \in \mathbb{R}^{B \times C_g \times 1 \times W}$
      • $f_{g}^{w,\text{max}} = \text{MaxPool}_H(X_g) \in \mathbb{R}^{B \times C_g \times 1 \times W}$
    • Descriptors: $F_g^h = [f_{g}^{h,\text{avg}}, f_{g}^{h,\text{max}}] \in \mathbb{R}^{B \times 2C_g \times H \times 1}$; similarly $F_g^w \in \mathbb{R}^{B \times 2C_g \times 1 \times W}$.
  3. Shared Bottleneck and Attention Generation:
    • The concatenated descriptors $[F_g^h; F_g^w]$ are processed by a two-stage 1×1 convolution bottleneck (reduction ratio $r$) with BN, ReLU, and a Sigmoid gate.
    • The output is split into $A_g^h \in \mathbb{R}^{B \times C_g \times H \times 1}$ and $A_g^w \in \mathbb{R}^{B \times C_g \times 1 \times W}$.
  4. Feature Reweighting and Merge:
    • $Y_g = X_g \odot A_g^h \odot A_g^w$.
    • Final output: $Y = \text{Concat}(Y_1, \ldots, Y_G) \in \mathbb{R}^{B \times C \times H \times W}$.
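
The following PyTorch sketch illustrates the computation above. It is a minimal illustration under stated assumptions, not the authors' reference implementation: the class name `GroupedCoordAttention`, the default `groups`/`reduction` values, and the choice to fold the group axis into the batch axis are all illustrative.

```python
import torch
import torch.nn as nn


class GroupedCoordAttention(nn.Module):
    """Minimal sketch of grouped coordinate attention (Section 2).

    Channels are split into `groups`; each group is pooled along H and W
    (average and max), the concatenated descriptors pass through a shared
    two-stage 1x1 bottleneck, and the resulting directional maps reweight
    the input group-wise.
    """

    def __init__(self, channels: int, groups: int = 8, reduction: int = 16):
        super().__init__()
        assert channels % groups == 0, "channels must be divisible by groups"
        self.groups = groups
        cg = channels // groups
        hidden = max(cg // reduction, 4)  # reduction ratio r applied to C_g
        # Shared two-stage 1x1 bottleneck over the (avg, max) descriptors.
        self.bottleneck = nn.Sequential(
            nn.Conv2d(2 * cg, hidden, kernel_size=1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, cg, kernel_size=1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        cg = c // self.groups
        # Fold the group axis into the batch axis: (B*G, Cg, H, W).
        xg = x.reshape(b * self.groups, cg, h, w)

        # Height descriptors F^h: pool along W -> (B*G, 2Cg, H, 1).
        f_h = torch.cat([xg.mean(dim=3, keepdim=True),
                         xg.amax(dim=3, keepdim=True)], dim=1)
        # Width descriptors F^w: pool along H -> (B*G, 2Cg, 1, W).
        f_w = torch.cat([xg.mean(dim=2, keepdim=True),
                         xg.amax(dim=2, keepdim=True)], dim=1)

        # Concatenate [F^h; F^w] along the spatial axis, run the shared
        # bottleneck, apply the sigmoid gate, and split back into A^h, A^w.
        f = torch.cat([f_h, f_w.transpose(2, 3)], dim=2)   # (B*G, 2Cg, H+W, 1)
        a = torch.sigmoid(self.bottleneck(f))               # (B*G, Cg, H+W, 1)
        a_h, a_w = torch.split(a, [h, w], dim=2)
        a_w = a_w.transpose(2, 3)                            # (B*G, Cg, 1, W)

        # Y_g = X_g * A_g^h * A_g^w, then restore the (B, C, H, W) layout.
        y = xg * a_h * a_w
        return y.reshape(b, c, h, w)
```

With hidden width $C_g/r$, the two 1×1 convolutions in this sketch contribute roughly $2C_g \cdot C_g/r + (C_g/r) \cdot C_g = 3C_g^2/r$ parameters, consistent with the per-group estimate in Section 4.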

3. Integration into Residual Architectures

In the GCA-ResUNet architecture, the GCA module is integrated into the bottleneck block of a ResNet-50 backbone. The sequence is as follows:

  • Standard bottleneck: x → 1×1 Conv (reduction) → 3×3 Conv → 1×1 Conv (expansion) → BN.
  • GCA is applied to the output of this expansion+BN, prior to residual addition.
  • The GCA-reweighted features are then added to the skip (identity) connection and passed through a ReLU activation.

Applying GCA at this location ensures both local (convolutional) and long-range (grouped coordinate attention) dependencies are encoded in each residual unit, prior to merging with the skip connection (Ding et al., 18 Nov 2025).
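
A minimal sketch of this placement, reusing the `GroupedCoordAttention` class from the previous sketch; the block structure follows a standard ResNet-50 bottleneck, and the `downsample` projection argument is the usual torchvision-style convention rather than anything specified in the source.

```python
import torch.nn as nn


class GCABottleneck(nn.Module):
    """ResNet-50-style bottleneck with GCA applied before residual addition."""

    expansion = 4

    def __init__(self, in_channels: int, planes: int, stride: int = 1,
                 downsample=None, groups: int = 8, reduction: int = 16):
        super().__init__()
        out_channels = planes * self.expansion
        self.conv1 = nn.Conv2d(in_channels, planes, 1, bias=False)   # 1x1 reduce
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, 3, stride=stride,
                               padding=1, bias=False)                # 3x3
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, out_channels, 1, bias=False)  # 1x1 expand
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.gca = GroupedCoordAttention(out_channels, groups, reduction)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample  # projects the identity branch if needed

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))   # expansion + BN
        out = self.gca(out)               # GCA reweighting before the skip connection
        return self.relu(out + identity)
```

Placing `self.gca` after the expansion convolution and its BN, and before the addition with the identity branch, mirrors the sequence described above.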

4. Computational Complexity and Comparison

The parameter and FLOP profile of GCA is:

  • Per group: the two 1×1 convolutions require $3C_g^2/r$ parameters.
  • Total: $3C^2/r$ parameters across the $G$ groups.

For typical settings ($C=256$, $H=W=56$, $r=16$), this amounts to less than 2% extra parameters relative to the original bottleneck and less than 5% FLOP overhead per block. In comparison:

Layer            FLOPs growth              Parameters
Self-attention   $O((HW)^2 C)$             $O(C^2)$
GCA              $O(C^2/r + HC + WC)$      $O(C^2/r)$

Self-attention scales quadratically with spatial size, whereas GCA's overhead grows only linearly with the spatial extent ($HC + WC$) and adds a single $C^2/r$ channel term. GCA thus delivers substantial computational savings, particularly for high-resolution inputs.
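
As a rough numeric illustration of the scaling gap in the table above, the snippet below evaluates only the leading asymptotic terms at the typical setting $C=256$, $H=W=56$, $r=16$; these are not measured FLOP counts, and the helper names are illustrative.

```python
def self_attention_term(H: int, W: int, C: int) -> int:
    """Leading term from the table: O((HW)^2 * C), constants ignored."""
    return (H * W) ** 2 * C


def gca_term(H: int, W: int, C: int, r: int) -> int:
    """Leading terms from the table: O(C^2/r + H*C + W*C), constants ignored."""
    return C * C // r + H * C + W * C


H, W, C, r = 56, 56, 256, 16
print(self_attention_term(H, W, C))  # 2,517,630,976
print(gca_term(H, W, C, r))          # 32,768
```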

5. Empirical Results and Ablation Studies

Extensive experimentation on Synapse multi-organ CT and ACDC cardiac MR segmentation benchmarks demonstrates the superiority of GCA-integrated architectures:

Synapse Dataset: Dice Similarity (DSC, %)

Method Avg DSC
R50-U-Net 77.61
Att-U-Net 77.77
Swin-U-Net 79.13
SelfReg-U-Net 80.54
VM-U-Net 81.08
GCA-ResUNet 86.11

ACDC Dataset: Dice Similarity (DSC, %)

Method Avg DSC
U-Net 89.68
Swin-U-Net 90.00
SelfReg-U-Net 91.49
GCA-ResUNet 92.64

Inference times and GPU memory utilization for GCA-ResUNet (under 4 GB at $224 \times 224$) match standard ResNet-U-Net pipelines, whereas hybrid transformer-based U-Nets typically require 8–12 GB (Ding et al., 18 Nov 2025).

Ablation studies compared variants of the residual block:

Variant Synapse Avg DSC (%)
Baseline 77.6
+SE Module 79.2
+CBAM 80.1
+GCA 86.1

GCA inserted into every residual block (Layers 1–4) provided the strongest performance, with incremental gains also observed for partial placements.

6. Strengths, Limitations, and Prospective Directions

Strengths:

  • Facilitates global context modeling along spatial axes and channels with minimal compute and parameter overhead.
  • Significantly enhances segmentation of boundaries and small anatomical structures.
  • Offers plug-and-play compatibility with existing 2D and 3D residual architectures.

Limitations:

  • Represents a “factorized” approximation that does not encode arbitrary off-axis pairwise relations as in full nonlocal attention.
  • Requires selection of two hyperparameters: $G$ (number of groups) and $r$ (reduction ratio).

Potential Extensions:

  • Generalization to 3D volumetric data through pooling along three orthogonal axes.
  • Synergy with lightweight MLP modules for further stepwise global modeling.
  • Application to non-medical image tasks such as object detection and instance segmentation where directional global context is advantageous.

7. Context and Significance

Grouped Coordinate Attention constitutes an efficient mechanism for integrating structured global dependencies into convolutional backbones, achieving state-of-the-art segmentation performance on medical image benchmarks while maintaining computational and memory efficiency. Its low overhead and modularity position it as a practical enhancement for residual network architectures, with implications for a broad range of vision tasks requiring both local and global spatial understanding (Ding et al., 18 Nov 2025).
