AutoMix: Unveiling the Power of Mixup for Stronger Classifiers
AutoMix addresses a persistent challenge in data augmentation for deep neural networks (DNNs): the trade-off between the accuracy of a mixup policy and the computational cost of finding it. The paper proposes a framework, AutoMix, that reformulates mixup-based training as two interconnected sub-tasks, mixed sample generation and mixup classification, and couples them in a bi-level optimization setup, yielding gains over prior methods in both accuracy and computational overhead.
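To ground the two sub-tasks, the sketch below shows the classic hand-crafted mixup of Zhang et al. (2018), in which generation is a fixed linear interpolation; AutoMix keeps the classification sub-task but replaces this static generator with a learned one. Function and parameter names here are illustrative, not taken from the paper's code.

```python
# Minimal sketch of the "mixup classification" sub-task, using the original
# hand-crafted linear interpolation as the sample-generation sub-task.
import torch
import torch.nn.functional as F

def mixup_step(model, x, y, num_classes, alpha=1.0):
    """One training step with classic input mixup (illustrative names)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))          # pair each sample with a partner
    x_mix = lam * x + (1.0 - lam) * x[perm]   # hand-crafted generation sub-task
    y_one_hot = F.one_hot(y, num_classes).float()
    y_mix = lam * y_one_hot + (1.0 - lam) * y_one_hot[perm]
    logits = model(x_mix)                     # mixup classification sub-task
    # cross-entropy against the soft, mixed label
    loss = -(y_mix * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    return loss
```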
Framework and Methodology
AutoMix learns a parametric mixup policy through its core module, the Mix Block (MB). MB uses a cross-attention mechanism to generate adaptive masks that keep the discriminative features of a mixed sample aligned with its mixed label. Rather than relying on the costly offline optimization typical of previous adaptive methods, AutoMix uses a learnable, lightweight mixup generator trained under the direct supervision of the mixed labels. Because masks are computed from feature maps, the patches selected for mixing stay semantically relevant, addressing the label-mismatch problem common to handcrafted methods. A minimal sketch of this idea appears below.
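The following sketch shows one way a cross-attention mask generator of this kind can be built, assuming 1x1 convolutional query/key projections and a sigmoid mask head; it illustrates the mechanism rather than reproducing the paper's exact Mix Block architecture.

```python
# Hedged sketch of a cross-attention mask generator: it scores how each
# spatial location of one image's feature map relates to the other's, then
# turns those scores plus the mixing ratio lam into a soft pixel-level mask.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttentionMaskGen(nn.Module):
    def __init__(self, in_channels, embed_dim=64):
        super().__init__()
        self.q = nn.Conv2d(in_channels, embed_dim, kernel_size=1)
        self.k = nn.Conv2d(in_channels, embed_dim, kernel_size=1)
        # +1 input channel carries the mixing ratio lam at every location
        self.head = nn.Conv2d(in_channels + 1, 1, kernel_size=1)

    def forward(self, feat_i, feat_j, lam, out_size):
        B, C, H, W = feat_i.shape
        q = self.q(feat_i).flatten(2).transpose(1, 2)        # (B, HW, D)
        k = self.k(feat_j).flatten(2)                        # (B, D, HW)
        attn = torch.softmax(q @ k / q.size(-1) ** 0.5, -1)  # (B, HW, HW)
        # aggregate feat_j content at the locations feat_i attends to
        v = feat_j.flatten(2).transpose(1, 2)                # (B, HW, C)
        mixed = (attn @ v).transpose(1, 2).reshape(B, C, H, W)
        lam_map = torch.full((B, 1, H, W), float(lam), device=feat_i.device)
        mask = torch.sigmoid(self.head(torch.cat([mixed, lam_map], 1)))
        # upsample the low-resolution mask to input resolution
        return F.interpolate(mask, size=out_size, mode='bilinear',
                             align_corners=False)
```

The mask then mixes raw inputs as x_mix = mask * x_i + (1 - mask) * x_j, and the generator is trained end to end against the mixed label y_mix.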
To stabilize the bi-level optimization and prevent degradation, AutoMix introduces a Momentum Pipeline. A momentum-based updating rule decouples the two sub-tasks, so the classifier and the mixup generator can be optimized jointly without interfering with each other. This end-to-end training strategy lets AutoMix converge faster and reach higher accuracy without substantial extra computation.
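Below is a minimal sketch of the momentum-update rule that such a pipeline relies on, written as a standard exponential-moving-average step: a slowly moving copy of the encoder supplies stable features to the mixup generator while the online encoder trains as usual. The coefficient and schedule are illustrative assumptions, not the paper's exact settings.

```python
# Exponential-moving-average (momentum) update of a frozen encoder copy.
import copy
import torch

@torch.no_grad()
def momentum_update(online_encoder, momentum_encoder, m=0.999):
    """theta_k <- m * theta_k + (1 - m) * theta_q."""
    for p_q, p_k in zip(online_encoder.parameters(),
                        momentum_encoder.parameters()):
        p_k.mul_(m).add_(p_q, alpha=1.0 - m)

# setup sketch: the momentum encoder starts as a detached copy
#   momentum = copy.deepcopy(online)
#   for p in momentum.parameters(): p.requires_grad_(False)
# then call momentum_update(online, momentum) after each optimizer step
```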
Experimental Results
Extensive experiments on nine image classification benchmarks support these claims. Across tasks and network architectures, AutoMix consistently outperforms state-of-the-art mixup techniques, with marked gains in accuracy and generalization on standard benchmarks such as CIFAR-10/100, Tiny-ImageNet, and ImageNet-1k, and on fine-grained datasets such as CUB-200 and FGVC-Aircraft. Robustness studies further show that AutoMix handles data perturbations well, remaining stable under corruptions and adversarial attacks.
Implications and Future Directions
AutoMix represents a significant step toward stronger mixup strategies for DNNs. Its ability to generate label-consistent mixed samples dynamically and efficiently broadens its range of practical applications, and the fusion of generation and classification within a single framework opens a path to mixup augmentation in unsupervised and semi-supervised learning.
Future research could leverage the modularity of AutoMix to extend learned mixup beyond conventional classification. Its applicability to other tasks, such as object detection and semantic segmentation, and its potential in multimodal settings are natural directions. Further refinement of the cross-attention mechanism could also yield more precise mixing, improving its utility on complex real-world datasets.
In essence, AutoMix provides a robust and efficient solution to the challenges of mixup-based augmentation, with promising applications that extend well beyond image classification.