SaliencyMix: A Saliency Guided Data Augmentation Strategy for Better Regularization (2006.01791v2)

Published 2 Jun 2020 in cs.LG and stat.ML

Abstract: Advanced data augmentation strategies have widely been studied to improve the generalization ability of deep learning models. Regional dropout is one of the popular solutions that guides the model to focus on less discriminative parts by randomly removing image regions, resulting in improved regularization. However, such information removal is undesirable. On the other hand, recent strategies suggest to randomly cut and mix patches and their labels among training images, to enjoy the advantages of regional dropout without having any pointless pixel in the augmented images. We argue that such random selection strategies of the patches may not necessarily represent sufficient information about the corresponding object and thereby mixing the labels according to that uninformative patch enables the model to learn unexpected feature representation. Therefore, we propose SaliencyMix that carefully selects a representative image patch with the help of a saliency map and mixes this indicative patch with the target image, thus leading the model to learn more appropriate feature representation. SaliencyMix achieves the best known top-1 error of 21.26% and 20.09% for ResNet-50 and ResNet-101 architectures on ImageNet classification, respectively, and also improves the model robustness against adversarial perturbations. Furthermore, models that are trained with SaliencyMix help to improve the object detection performance. Source code is available at https://github.com/SaliencyMix/SaliencyMix.

Authors (5)
  1. A. F. M. Shahab Uddin (2 papers)
  2. Mst. Sirazam Monira (1 paper)
  3. Wheemyung Shin (1 paper)
  4. TaeChoong Chung (3 papers)
  5. Sung-Ho Bae (29 papers)
Citations (204)

Summary

  • The paper introduces SaliencyMix, a novel data augmentation technique that leverages saliency maps to retain critical image information and improve CNN generalization.
  • It demonstrates significant performance gains by achieving 2.76% top-1 error on CIFAR-10 and setting new benchmarks on ImageNet with standard architectures.
  • Additionally, SaliencyMix enhances model robustness, improving object detection and adversarial resistance, making it a valuable strategy for regularization.

SaliencyMix: A Saliency Guided Data Augmentation Strategy for Better Regularization

The paper introduces SaliencyMix, a novel data augmentation technique designed to enhance the generalization capabilities of convolutional neural networks (CNNs) by leveraging saliency maps to inform image augmentation. The authors propose this approach to address the limitations found in current regional dropout methodologies, such as Cutout and CutMix, which either remove significant portions of informative data or randomly select image patches for augmentation, possibly leading to ineffective learning of feature representations.

SaliencyMix works by performing a careful selection of image patches guided by saliency maps. It uses these maps to identify the most salient parts of an image—indicative of the object—thus ensuring that the augmented data retains relevant semantic information. Through this process, the model is guided towards learning more meaningful feature representations, enhancing both performance and robustness.
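The procedure above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper uses a spectral-residual saliency detector, whereas this sketch substitutes a gradient-magnitude proxy to stay dependency-free, and the function name and the Beta(1,1) patch sizing (borrowed from CutMix) are assumptions.

```python
import numpy as np

def saliencymix(src, tgt, src_label, tgt_label, rng=None):
    """Sketch of SaliencyMix-style augmentation: cut a patch centred
    on the most salient pixel of `src` and paste it onto `tgt`;
    labels are mixed by the pasted-area ratio."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = src.shape[:2]
    # Saliency proxy: gradient magnitude of the grayscale image.
    # (The paper uses a spectral-residual saliency detector instead.)
    gray = src.mean(axis=2)
    gy, gx = np.gradient(gray)
    sal = np.hypot(gx, gy)
    # Centre the patch on the saliency peak.
    cy, cx = np.unravel_index(sal.argmax(), sal.shape)
    # Patch size from lam ~ Beta(1, 1), as in CutMix.
    lam = rng.beta(1.0, 1.0)
    cut = np.sqrt(1.0 - lam)
    ph, pw = int(h * cut), int(w * cut)
    y1, y2 = np.clip(cy - ph // 2, 0, h), np.clip(cy + ph // 2, 0, h)
    x1, x2 = np.clip(cx - pw // 2, 0, w), np.clip(cx + pw // 2, 0, w)
    mixed = tgt.copy()
    mixed[y1:y2, x1:x2] = src[y1:y2, x1:x2]
    # Recompute lam from the actual pasted area after clipping.
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    mixed_label = lam * tgt_label + (1.0 - lam) * src_label
    return mixed, mixed_label
```

Because the patch is centred on the saliency peak rather than sampled uniformly, the pasted region is likely to contain the object, which is what justifies mixing the labels in proportion to the pasted area.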

The paper reports significant improvements in classification accuracy due to the incorporation of this augmentation strategy. For example, on the CIFAR-10 dataset using the WideResNet architecture, SaliencyMix achieves a top-1 error rate of 2.76%, while on CIFAR-100 it reduces the error to 16.56%. Additionally, on the ImageNet classification task, SaliencyMix sets a new benchmark with a top-1 error of 21.26% for ResNet-50 and 20.09% for ResNet-101.

The authors attribute SaliencyMix's advantage over augmentation techniques such as Cutout and Mixup primarily to its ability to preserve critical object information in the augmented data. By avoiding the random patch selection of CutMix, which can paste regions that carry little or no object evidence while still mixing the labels, SaliencyMix offers a structured approach that keeps the salient features of the original images.
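Because the pasted patch is selected to carry object evidence, weighting the labels by area ratio is well-founded. Training then uses the same area-weighted objective as CutMix: the loss is computed against both labels and combined by the mixing coefficient. A minimal NumPy sketch (function name hypothetical):

```python
import numpy as np

def mixed_cross_entropy(logits, target_a, target_b, lam):
    """Cross-entropy against two one-hot targets, weighted by lam,
    the fraction of the target image that was preserved."""
    # Numerically stable log-softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))

    def ce(t):
        return -(t * log_probs).sum(axis=1).mean()

    return lam * ce(target_a) + (1.0 - lam) * ce(target_b)
```

With lam = 1 this reduces to ordinary cross-entropy on the target image's label; as the pasted patch grows, the loss shifts smoothly toward the source image's label.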

Beyond classification accuracy, the research demonstrates SaliencyMix's utility in related tasks. Initializing object detectors with SaliencyMix-pretrained classifiers improves detection performance, yielding a gain of +1.77 in mean average precision (mAP) on the Pascal VOC dataset.

Furthermore, SaliencyMix provides increased robustness against adversarial attacks, a notable advantage given the susceptibility of deep learning models to adversarially perturbed inputs. By incorporating more meaningful image patches through saliency-enriched augmentation, models trained with SaliencyMix demonstrate better resilience, improving top-1 accuracy on adversarially perturbed inputs by 1.96%.

Despite the slight increase in computational demands due to saliency map generation, SaliencyMix proves to be an effective trade-off, delivering substantial advances in model performance and robustness. The authors conclude that future work could explore integrating more complex or high-level semantic information into the augmentation process, potentially further enhancing the effectiveness of SaliencyMix.

This research contributes an effective augmentation strategy centered on salient image features, with practical benefits for classification, object detection, and adversarial robustness, and it advances the design of data augmentation techniques for computer vision.