Frozen Feature Augmentation for Few-Shot Image Classification (2403.10519v2)
Abstract: Training a linear classifier or a lightweight model on top of the outputs of a pretrained vision model, so-called 'frozen features', leads to impressive performance on a number of downstream few-shot tasks. Currently, frozen features are not modified during training. On the other hand, when networks are trained directly on images, data augmentation is a standard recipe that improves performance with no substantial overhead. In this paper, we conduct an extensive pilot study on few-shot image classification that explores applying data augmentations in the frozen feature space, dubbed 'frozen feature augmentation (FroFA)', covering twenty augmentations in total. Our study demonstrates that adopting a deceptively simple pointwise FroFA, such as brightness, can improve few-shot performance consistently across three network architectures, three large pretraining datasets, and eight transfer datasets.
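To make the idea concrete, below is a minimal sketch of a pointwise, brightness-style frozen feature augmentation applied to cached features before training a lightweight head. It assumes features have already been extracted from a frozen backbone (e.g. a ViT) and stored as (batch, tokens, channels) arrays; the additive, per-example offset used here is an illustrative parameterization, not necessarily the paper's exact formulation, and all names, shapes, and magnitudes are placeholders.

```python
import numpy as np

def brightness_frofa(features, max_delta=0.2, rng=None):
    """Pointwise, brightness-style frozen feature augmentation (illustrative sketch).

    Adds one random offset per example to every feature value, mimicking an
    image-space brightness shift but applied in feature space. `features` is
    expected to have shape (batch, num_tokens, channels), e.g. cached ViT outputs.
    """
    rng = np.random.default_rng() if rng is None else rng
    # One scalar offset per example, broadcast over tokens and channels.
    delta = rng.uniform(-max_delta, max_delta, size=(features.shape[0], 1, 1))
    return features + delta.astype(features.dtype)

# Hypothetical usage: re-augment the cached features at every training step of a
# lightweight head (e.g. a linear classifier); the frozen backbone is never updated.
cached_feats = np.random.randn(8, 196, 768).astype(np.float32)  # placeholder features
augmented = brightness_frofa(cached_feats, max_delta=0.2)
print(augmented.shape)  # (8, 196, 768)
```

Because the backbone is frozen, features can be extracted once and cached; the augmentation is then cheap to re-sample at every step, which is what makes feature-space augmentation attractive compared to re-running the backbone on augmented images.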