
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels (2101.05022v2)

Published 13 Jan 2021 in cs.CV

Abstract: ImageNet has been arguably the most popular image classification benchmark, but it is also the one with a significant level of label noise. Recent studies have shown that many samples contain multiple classes, despite being assumed to be a single-label benchmark. They have thus proposed to turn ImageNet evaluation into a multi-label task, with exhaustive multi-label annotations per image. However, they have not fixed the training set, presumably because of a formidable annotation cost. We argue that the mismatch between single-label annotations and effectively multi-label images is equally, if not more, problematic in the training setup, where random crops are applied. With the single-label annotations, a random crop of an image may contain an entirely different object from the ground truth, introducing noisy or even incorrect supervision during training. We thus re-label the ImageNet training set with multi-labels. We address the annotation cost barrier by letting a strong image classifier, trained on an extra source of data, generate the multi-labels. We utilize the pixel-wise multi-label predictions before the final pooling layer, in order to exploit the additional location-specific supervision signals. Training on the re-labeled samples results in improved model performances across the board. ResNet-50 attains the top-1 classification accuracy of 78.9% on ImageNet with our localized multi-labels, which can be further boosted to 80.2% with the CutMix regularization. We show that the models trained with localized multi-labels also outperform the baselines on transfer learning to object detection and instance segmentation tasks, and various robustness benchmarks. The re-labeled ImageNet training set, pre-trained weights, and the source code are available at {https://github.com/naver-ai/relabel_imagenet}.

Citations (133)

Summary

  • The paper introduces a multi-label, localized re-labeling strategy that improves ResNet-50 top-1 accuracy to 78.9% (80.2% with CutMix).
  • The paper employs a high-performing classifier with pixel-wise predictions to generate refined annotations and reduce label noise inherent in single-label data.
  • The paper demonstrates that the enhanced labels improve transfer learning to object detection and instance segmentation and strengthen model robustness, at minimal extra computational cost.

Re-labeling ImageNet: From Single to Multi-Labels, from Global to Localized Labels

The paper explores an inherent limitation of the ImageNet dataset, its label noise, and proposes a methodology to enhance its utility for training image classification models. ImageNet has long been the standard benchmark for image classification, yet it relies on single-label annotations even though many images contain multiple objects. This discrepancy hampers both evaluation and training: a single label misrepresents a multi-object image, and a random crop taken during training may not contain the labeled object at all, yielding noisy or even incorrect supervision.

To address these challenges, the authors re-label ImageNet with multi-labels and localized annotations. They use a high-performing image classifier, trained on additional external data, to generate multi-label annotations for the ImageNet training set. Crucially, they take the classifier's pixel-wise predictions before its final pooling layer, producing label maps that record which classes appear where, so that the training labels align more closely with the actual content of each image.
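
As a concrete illustration, here is a minimal PyTorch sketch of how such localized label maps could be produced; it is not the authors' implementation (their code is in the linked repository). The names `backbone`, `fc_to_conv`, and `generate_label_map` are illustrative, and a classifier whose head is a single `Linear` layer is assumed.

```python
# Minimal sketch, assuming a PyTorch classifier with a single Linear head.
# `backbone`, `fc_to_conv`, and `generate_label_map` are illustrative names.
import torch

def fc_to_conv(fc: torch.nn.Linear) -> torch.nn.Conv2d:
    """Reinterpret a Linear(C, num_classes) head as a 1x1 convolution so the
    classifier can score every spatial cell of the final feature map."""
    conv = torch.nn.Conv2d(fc.in_features, fc.out_features, kernel_size=1)
    conv.weight.data.copy_(fc.weight.data.view(fc.out_features, fc.in_features, 1, 1))
    conv.bias.data.copy_(fc.bias.data)
    return conv

@torch.no_grad()
def generate_label_map(backbone, head_1x1, image, top_k=5):
    """backbone: feature extractor returning (1, C, h, w) for one image.
    Keeps the class scores *before* global average pooling, yielding a
    localized multi-label annotation; only the top-k classes per spatial
    cell are stored to keep the saved maps compact."""
    features = backbone(image)                         # (1, C, h, w)
    dense_logits = head_1x1(features)                  # (1, num_classes, h, w)
    dense_probs = dense_logits.softmax(dim=1)
    scores, labels = dense_probs.topk(top_k, dim=1)    # each (1, k, h, w)
    return labels.squeeze(0), scores.squeeze(0)
```

Applying the classification head before, rather than after, the pooling step is what turns a global prediction into a spatial map of class scores, which is the location-specific supervision signal the paper exploits.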

The results demonstrate significant performance improvements from this re-labeling approach. Notably, a ResNet-50 trained with the generated labels achieves 78.9% top-1 accuracy on ImageNet, a marked improvement over training with the original labels, and this rises to 80.2% when combined with CutMix regularization. Beyond classification accuracy, the method improves transfer learning to object detection and instance segmentation and increases model robustness against various test-time perturbations.

Several experiments corroborate these results across model architectures, including ResNet and EfficientNet variants. Comparisons against established techniques such as knowledge distillation and label smoothing underscore the efficacy and efficiency of the re-labeling strategy: unlike knowledge distillation, which runs a teacher model throughout training, the proposed method incurs only a one-time computational cost to generate the label maps, making it the more scalable option.
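
To illustrate how the stored maps supervise training, the following is a hedged sketch of the crop-wise label-pooling step, assuming a dense class-score map and using torchvision's `roi_align`; `pooled_target` and the coordinate handling are illustrative assumptions, not the authors' exact code.

```python
# Hedged sketch of training-time label pooling for a random crop.
# `pooled_target` is an illustrative name; a dense label_map is assumed
# (a stored sparse top-k map would first be scattered back to dense form).
import torch
from torchvision.ops import roi_align

def pooled_target(label_map, crop_box):
    """label_map: (1, num_classes, h, w) class-score map stored offline.
    crop_box: (x1, y1, x2, y2) of the random crop, already expressed in
    label-map coordinates. Returns a (1, num_classes) soft target."""
    boxes = torch.tensor([[0.0, *crop_box]], dtype=torch.float32)  # (batch_idx, x1, y1, x2, y2)
    region = roi_align(label_map, boxes, output_size=(1, 1))       # (1, num_classes, 1, 1)
    target = region.flatten(1)
    return target / target.sum(dim=1, keepdim=True)  # normalize to a distribution

# Training then minimizes cross-entropy against the soft target, e.g.:
# loss = -(target * model(crop).log_softmax(dim=1)).sum(dim=1).mean()
```

The design point is that the expensive teacher forward passes happen once, offline; at training time, turning a stored map into a crop-specific soft target is a cheap pooling operation.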

The implications of this research are profound, suggesting new avenues for the refinement of large-scale datasets in AI and computer vision. Future developments could explore the wider applicability of multi-label localized annotations in other domains, deepening the understanding of complex image distributions and bolstering the robustness and adaptability of AI models. Furthermore, the open-sourcing of these enhanced labels and pre-trained models lays the groundwork for collaborative advancements in model accuracy and reliability across diverse applications.
