
LGCOAMix: Superpixel Data Augmentation

Updated 3 December 2025
  • LGCOAMix is a context-aware, object-part-aware superpixel-based data augmentation technique that preserves semantic consistency in mixed images.
  • It employs superpixel grid blending and integrated local-global context learning to maintain fine object details and improve training efficiency.
  • Empirical results on benchmarks like CIFAR100 and CUB200-2011 show enhanced classification accuracy and robust performance for both CNNs and transformers.

LGCOAMix is a context-aware and object-part-aware superpixel-based data augmentation technique designed to overcome the generalization bottlenecks of existing cutmix-style augmentation methods for deep visual recognition. By generating augmented samples via superpixel-based grid blending and integrating both local and global context learning, LGCOAMix achieves improved semantic consistency of mixed images and labels, efficient training, and enhanced discriminative representation for both convolutional and transformer architectures. It is the first approach to propose a label mixing strategy using superpixel attention within the cutmix data augmentation paradigm and is unique in learning local features from discriminative superpixel-wise regions and across-image superpixel contrasts (Dornaika et al., 28 Nov 2025).

1. Motivation and Limitations of Prior Work

Cutmix-based data augmentation (such as CutMix, GridMix, SaliencyMix) constructs augmented images by patch-wise cut-and-paste operations, typically using rectangular or grid-shaped regions. While these strategies introduce global image-level variation and have exhibited strong generalization, they typically degrade local contextual focus and disrupt semantically coherent object parts (e.g., mixing distinct bird body regions), leading to performance ceilings—especially in fine-grained recognition. Furthermore, mixed label computation in prior art is primarily area-based, often resulting in label–image mismatches when background or non-discriminative regions dominate, requiring inefficient solutions like double forward propagation or dependencies on external saliency models. LGCOAMix directly addresses these issues by (1) employing superpixel-wise region mixing to preserve object parts, (2) learning both global and local (superpixel-based) context, and (3) realizing one-pass, semantically consistent label mixing without external models (Dornaika et al., 28 Nov 2025).

2. Superpixel-Based Grid Generation and Mixing

LGCOAMix utilizes the SLIC superpixel algorithm to partition each source image $x_k \in \mathbb{R}^{W \times H \times C}$ into a randomly sampled number of superpixels $q_k \sim U(q_{min}, q_{max})$. This produces superpixel maps $\mathbf{S}_1$ and $\mathbf{S}_2$, where each pixel is associated with a superpixel label in $[1, q_k]$. The LGCOAMixer component then randomly samples each superpixel in $\mathbf{S}_2$ with Bernoulli probability $p = 0.5$ to generate a binary mask $\mathbf{M}$. The mixed image and its corresponding superpixel map are computed as:

$$x_{mix} = (1 - \mathbf{M}) \odot x_1 + \mathbf{M} \odot x_2$$

$$S_{mix} = (1 - \mathbf{M}) \odot S_1 + \mathbf{M} \odot S_2$$

This approach enables boundary-conforming region mixing, ensuring that object-part information and fine structures are preserved. Boundary-truncated superpixels are retained without degrading semantic consistency (Dornaika et al., 28 Nov 2025).
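A minimal sketch of this mixing step, assuming scikit-image's SLIC implementation; function and variable names are illustrative rather than taken from the released code:

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_mix(x1, x2, q_min=30, q_max=40, p=0.5, rng=None):
    """Blend x2 into x1 along superpixel boundaries.

    x1, x2: float images of shape (H, W, C) with values in [0, 1].
    Returns the mixed image, the binary mask M, and the superpixel map S_2.
    """
    rng = rng if rng is not None else np.random.default_rng()
    q2 = int(rng.integers(q_min, q_max + 1))     # q_k ~ U(q_min, q_max)
    s2 = slic(x2, n_segments=q2, start_label=1)  # superpixel map S_2
    labels = np.unique(s2)
    keep = labels[rng.random(labels.size) < p]   # Bernoulli(p) per superpixel
    m = np.isin(s2, keep)                        # binary mask M over pixels
    x_mix = np.where(m[..., None], x2, x1)       # (1 - M) * x1 + M * x2
    return x_mix, m, s2
```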

3. Joint Local and Global Context Representation Learning

LGCOAMix integrates both global and local context modeling in its training pipeline:

  • Global Representation: The mixed image $x_{mix}$ is processed through an encoder $\theta_{enc}$, yielding a feature map $Z$. A global classifier $f_{global}$ computes class predictions supervised by the mixed label $y_{mix}$.
  • Local/Superpixel Representation: The decoded high-resolution features $\hat{Z}$ are average-pooled within each superpixel of $S_{mix}$, yielding a sequence $F \in \mathbb{R}^{L \times D}$ of feature vectors, one per superpixel.
  • Superpixel Self-Attention: Each superpixel feature $F_\ell$ is projected into query, key, and value representations and updated using scaled dot-product self-attention:

$$Q = F W^\phi,\quad K = F W^k,\quad V = F W^v$$

$$\mathrm{SA}(Q, K, V) = \mathrm{softmax}\left(QK^\top/\sqrt{d}\right)V$$

$$C = \mathrm{LayerNorm}(F + \mathrm{SA}(Q, K, V))$$

  • Region Weights and Local Loss: Attention weights $w_\ell = \sigma\left(\sum_c C_{\ell,c}\right)$ identify the most discriminative superpixels. The top $N = \lfloor tL \rfloor$ (typically $t = 0.7$) are classified by a dedicated local head $f_{local}$, contributing the local classification loss $L_{local}$.

Additionally, a superpixel-wise contrastive loss $L_{contrast}$ is adopted, aligning selected superpixel features across batch samples of the same class to promote invariance and enhance representation discriminability (Dornaika et al., 28 Nov 2025).
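To make the local branch concrete, here is a minimal PyTorch sketch of superpixel average pooling followed by single-head self-attention and the region-weight computation; the class and variable names are illustrative, not from the released code:

```python
import torch
import torch.nn as nn

class SuperpixelAttention(nn.Module):
    """Pool decoded features per superpixel, then apply self-attention."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)  # W^phi (query projection)
        self.k = nn.Linear(dim, dim)  # W^k
        self.v = nn.Linear(dim, dim)  # W^v
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats, spx):
        # feats: (D, H, W) decoded feature map; spx: (H, W) superpixel map
        d = feats.shape[0]
        flat = feats.reshape(d, -1)   # (D, H*W)
        ids = spx.reshape(-1)
        # Average-pool within each superpixel -> F in R^{L x D}
        f = torch.stack([flat[:, ids == l].mean(dim=1) for l in ids.unique()])
        q, k, v = self.q(f), self.k(f), self.v(f)
        attn = torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)
        c = self.norm(f + attn @ v)       # C = LayerNorm(F + SA(Q, K, V))
        w = torch.sigmoid(c.sum(dim=-1))  # w_l = sigma(sum_c C_{l,c})
        return c, w
```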

4. Semantic Superpixel Attention for Label Mixing

LGCOAMix introduces a novel mixed-label calculation based on superpixel attention rather than area:

$$\lambda_{att} = \frac{\sum_{i \in I_{x_2}} w_i \cdot |S_{mix}[i]|}{\sum_{j=1}^{L} w_j \cdot |S_{mix}[j]|}$$

where $I_{x_2}$ indexes the superpixels originating from $x_2$ and $|S_{mix}[i]|$ is the pixel area of superpixel $i$. The final mixed label is:

$$y_{mix} = (1 - \lambda_{att})\, y_1 + \lambda_{att}\, y_2$$

This process delivers semantic consistency between image regions and label proportions, eliminates reliance on multiple forward passes or pretrained saliency networks, and is applicable to both CNNs and Vision Transformers (Dornaika et al., 28 Nov 2025).
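As a concrete illustration, the coefficient can be computed directly from the superpixel weights and areas; a minimal NumPy sketch, with illustrative input names:

```python
import numpy as np

def attention_lambda(weights, areas, from_x2):
    """Compute lambda_att from superpixel attention weights.

    weights: (L,) attention weights w_l.
    areas:   (L,) pixel counts |S_mix[l]|.
    from_x2: (L,) boolean flags, True where a superpixel came from x_2.
    """
    return np.sum(weights[from_x2] * areas[from_x2]) / np.sum(weights * areas)

# Mixed label, with y1 and y2 as one-hot vectors:
# lam = attention_lambda(w, areas, mask)
# y_mix = (1 - lam) * y1 + lam * y2
```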

5. Algorithmic Workflow

The LGCOAMix pipeline involves the following sequential steps:

| Step | Operation | Output |
|------|-----------|--------|
| 1 | Superpixel sampling on inputs ($q_1$, $q_2$) | $S_1$, $S_2$ superpixel maps |
| 2 | Bernoulli mask generation on $S_2$ ($p = 0.5$) | $\mathbf{M}$ |
| 3 | Mixing (Eq. 4) | $x_{mix}$, $S_{mix}$ |
| 4 | Encoding and decoding $x_{mix}$ | $Z$, $\hat{Z}$ |
| 5 | Superpixel pooling + self-attention | $C$, $w$, $c_s$ |
| 6 | Label mixing via attention (Eq. 5, 2) | $y_{mix}$ |
| 7–9 | Compute $L_{global}$, $L_{local}$, $L_{contrast}$ | $L_{total}$ |
| 10 | Backpropagation (inference uses encoder + global head only) | — |

Batch formation and training use SGD (momentum 0.9, weight decay $5 \cdot 10^{-4}$), with learning rates and batch sizes tailored to each benchmark dataset. At inference, only the encoder and global classifier are required, ensuring runtime equivalence to OcCaMix (Dornaika et al., 28 Nov 2025).
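A minimal sketch of the optimizer setup and a plausible loss combination, assuming PyTorch; the exact weighting of $L_{total}$ by $\gamma_1$ and $\gamma_2$ is an assumption consistent with the hyperparameters reported in Section 7:

```python
import torch

def total_loss(l_global, l_local, l_contrast, gamma1=0.1, gamma2=0.05):
    # Assumed combination: L_total = L_global + gamma1*L_local + gamma2*L_contrast
    return l_global + gamma1 * l_local + gamma2 * l_contrast

def make_optimizer(model):
    # SGD with the reported momentum and weight decay; the learning rate is a
    # placeholder, since the paper tunes it per dataset.
    return torch.optim.SGD(model.parameters(), lr=0.1,
                           momentum=0.9, weight_decay=5e-4)
```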

6. Empirical Results

LGCOAMix demonstrates state-of-the-art performance on major visual recognition tasks. For classification (Top-1 accuracy):

  • CIFAR100/ResNet18: Baseline 78.58, CutMix 79.69, AutoMix 82.04, LGCOAMix 82.34 (+0.30 over best prior)
  • TinyImageNet/ResNet18: Baseline 61.66, CutMix 64.35, OcCaMix 67.35, LGCOAMix 68.27 (+0.92)
  • CUB200-2011/ResNeXt50: Baseline 81.41, CutMix 82.63, AutoMix 83.52, LGCOAMix 84.37 (+0.85)
  • Stanford Dogs/ResNet50: Baseline 61.46, CutMix 63.92, OcCaMix 69.34, LGCOAMix 70.95
  • ViT-B/16 on CUB200-2011: Baseline 80.45, LGCOAMix 82.20

For weakly supervised object localization (CUB200-2011, ResNet50): localization accuracy improves from 50.21% (baseline) and 55.22% (CutMix) to 58.65% (LGCOAMix).

Ablation studies reveal that switching from square grid to superpixel grid yields +0.78%, adding local classification +0.76%, and contrastive learning +0.31% (total +3.76%, CIFAR100/ResNet18), supporting the method’s core design choices (Dornaika et al., 28 Nov 2025).

7. Implementation, Hyperparameters, and Public Resources

Superpixel generation uses SLIC (Achanta et al., 2012) with $q_{min}, q_{max}$ set per dataset (e.g., $U(30, 40)$ for CIFAR100). The masking probability is fixed at $p = 0.5$ to maximize diversity. The top-ranked region selection parameter $t$ lies in $[0.6, 0.8]$ and is generally set to $0.7$. Loss weights are $\gamma_1 = 0.1$ and $\gamma_2 = 0.05$ (CIFAR100). All major benchmarks report inference speed comparable to state-of-the-art methods. Source code is available at https://github.com/DanielaPlusPlus/LGCOAMix (Dornaika et al., 28 Nov 2025).
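For convenience, the reported CIFAR100 settings can be grouped as follows; the dict layout is illustrative, not taken from the released code:

```python
# Illustrative grouping of the reported CIFAR100 hyperparameters.
CIFAR100_CONFIG = {
    "superpixel_range": (30, 40),  # (q_min, q_max) for SLIC sampling
    "mask_prob": 0.5,              # Bernoulli p for superpixel selection
    "top_region_ratio": 0.7,       # t, chosen from [0.6, 0.8]
    "gamma1": 0.1,                 # weight on L_local
    "gamma2": 0.05,                # weight on L_contrast
    "momentum": 0.9,
    "weight_decay": 5e-4,
}
```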
