Fixing the train-test resolution discrepancy (1906.06423v4)

Published 14 Jun 2019 in cs.CV and cs.LG

Abstract: Data-augmentation is key to the training of neural networks for image classification. This paper first shows that existing augmentations induce a significant discrepancy between the typical size of the objects seen by the classifier at train and test time. We experimentally validate that, for a target test resolution, using a lower train resolution offers better classification at test time. We then propose a simple yet effective and efficient strategy to optimize the classifier performance when the train and test resolutions differ. It involves only a computationally cheap fine-tuning of the network at the test resolution. This enables training strong classifiers using small training images. For instance, we obtain 77.1% top-1 accuracy on ImageNet with a ResNet-50 trained on 128x128 images, and 79.8% with one trained on 224x224 images. In addition, if we use extra training data we get 82.5% with the ResNet-50 trained with 224x224 images. Conversely, when training a ResNeXt-101 32x48d pre-trained in weakly-supervised fashion on 940 million public images at resolution 224x224 and further optimizing for test resolution 320x320, we obtain a test top-1 accuracy of 86.4% (top-5: 98.0%) (single-crop). To the best of our knowledge this is the highest ImageNet single-crop, top-1 and top-5 accuracy to date.

Authors (4)
  1. Hugo Touvron (22 papers)
  2. Andrea Vedaldi (195 papers)
  3. Matthijs Douze (52 papers)
  4. Hervé Jégou (71 papers)
Citations (401)

Summary

  • The paper identifies a critical train-test resolution discrepancy caused by standard data augmentation practices.
  • It proposes a resolution adjustment strategy by fine-tuning select network parameters at test-time resolution, enhancing performance.
  • Empirical results demonstrate significant gains, with models like ResNet-50 and ResNeXt-101 achieving top-1 accuracies of 77.1% and 86.4% on ImageNet.

Fixing the Train-Test Resolution Discrepancy

In the domain of deep learning for image classification, the paper "Fixing the train-test resolution discrepancy" addresses the resolution mismatch between the training and testing phases. The authors show that standard data augmentation techniques inadvertently introduce a significant discrepancy between the apparent object sizes seen by the classifier at train and test time. Their remedy is to use different train and test resolutions and to fine-tune the model for the test-time resolution, which yields improved classifier performance.
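To make the discrepancy concrete, the snippet below sketches the standard ImageNet preprocessing pipelines the paper analyzes, written with torchvision. This is an illustrative sketch; the exact crop sizes and parameters are assumptions that vary between training recipes. The random resized crop used at train time zooms into a sub-region of the image, so objects appear larger on average than under the test-time resize-and-center-crop.

```python
import torchvision.transforms as T

# Train time: RandomResizedCrop samples a sub-region of the image (by default covering
# roughly 8-100% of its area) and rescales it to the train resolution, so objects tend
# to appear larger than they do in the full image.
train_transform = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

# Test time: the image is resized so its shorter side is 256 pixels, then center-cropped,
# which keeps more of the scene and hence yields smaller apparent object sizes.
test_transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
])
```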

Summary of Contributions

The paper makes several key contributions:

  1. Identification of the Train-Test Discrepancy: The authors show that the random resized crops used for data augmentation make objects appear larger at train time than at test time, where a plain resize and center crop are used. As a consequence, for a fixed test resolution, training at a lower resolution can paradoxically improve test accuracy, a counterintuitive result for practitioners who keep the two resolutions identical.
  2. Proposed Resolution Adjustment Strategy: To mitigate this discrepancy, the authors propose fine-tuning the network at the test-time resolution. Because only a small subset of the network's parameters (essentially the batch-normalization layers and the final classifier) is adjusted, the method allows strong classifiers to be trained on low-resolution images at a fraction of the usual computational cost (see the sketch after this list).
  3. Empirical Validation: The paper provides empirical results showcasing the effectiveness of the proposed method. For example, a ResNet-50 model, when trained at 128x128 and fine-tuned for a higher test resolution, achieved a top-1 accuracy of 77.1% on ImageNet. More impressively, a ResNeXt-101 32x48d, pre-trained on a vast dataset of 940 million images, attained a top-1 accuracy of 86.4% after adjusting for a 320x320 test resolution.
  4. State-of-the-Art Results: At the time of publication, the authors report the highest ImageNet single-crop top-1 and top-5 accuracy, demonstrating the method's competitive edge and its potential applicability to contemporary models.
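As an illustration of the fine-tuning strategy, the following PyTorch sketch freezes the backbone of a ResNet-50 trained at low resolution and adapts only the batch-normalization layers and the final classifier at the higher test resolution. This is a minimal sketch under those assumptions, not the authors' released code; details such as which layers to unfreeze, the learning rate, and the fine-tuning augmentation follow the paper only loosely.

```python
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as T

# Assume this model was already trained at a low resolution (e.g. 128x128).
model = models.resnet50(weights=None)

# Freeze everything, then unfreeze only the lightweight parts to adapt.
for param in model.parameters():
    param.requires_grad = False
for module in model.modules():
    if isinstance(module, nn.BatchNorm2d):
        module.requires_grad_(True)   # adapt batch-norm affine params (and running stats in train mode)
model.fc.requires_grad_(True)         # also fine-tune the final classifier

# Fine-tuning data is processed at the test resolution (e.g. 224x224).
finetune_transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
])

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-3, momentum=0.9,
)
# ... run a short fine-tuning loop over the training set using finetune_transform ...
```

The point of the design is that the backbone features transfer across resolutions; only the statistics of the activations (handled by batch norm) and the decision layer need to be recalibrated, which keeps the fine-tuning step computationally cheap.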

Theoretical and Practical Implications

From a theoretical standpoint, this work challenges the conventional wisdom of training and evaluating models at identical resolutions, highlighting the potential oversights in pre-processing steps. By methodically analyzing the statistical distortions caused by traditional augmentation techniques, it provides a groundwork for revisiting and potentially redesigning augmentation pipelines.

Practically, the proposed methodology offers substantial efficiency improvements in both training time and resource consumption. It enables models to be trained at reduced resolutions, decreasing computational load and memory requirements; for instance, training at 128x128 instead of 224x224 reduces the number of pixels per image, and hence the convolutional compute and activation memory, by roughly a factor of three. This is particularly advantageous for environments with limited GPU resources.

Future Directions

This research opens several avenues for future exploration:

  • Wider Applicability: Extending this approach to other domains of computer vision, such as object detection and semantic segmentation, could yield further efficiency improvements and insights into task-specific dataset augmentation.
  • Model-Specific Adjustments: Investigating the impact of this resolution adjustment across diverse architectures beyond ResNet and ResNeXt, such as Transformer-based models, could assess the generalized effectiveness of the proposed method.
  • Longitudinal Studies on Scale Invariance: Exploring the effects of train-test resolution adjustments over sustained deployment scenarios, particularly as models interact with varied and potentially unseen datasets, will deepen the understanding of scale invariance in models.

In conclusion, this research provides valuable insights and a practical methodology for a prevalent challenge in image classification, aligning the train and test phases through a resolution-based adjustment. Its implications extend beyond the immediate results, suggesting a re-thinking of how networks are trained and evaluated.