Connect Later: Improving Fine-tuning for Robustness with Targeted Augmentations (2402.03325v2)

Published 8 Jan 2024 in cs.CV and cs.LG

Abstract: Models trained on a labeled source domain (e.g., labeled images from wildlife camera traps) often generalize poorly when deployed on an out-of-distribution (OOD) target domain (e.g., images from new camera trap locations). In the domain adaptation setting where unlabeled target data is available, self-supervised pretraining (e.g., masked autoencoding or contrastive learning) is a promising method to mitigate this performance drop. Pretraining improves OOD error when the generic data augmentations used (e.g., masking or cropping) connect the source and target domains, which may be far apart in the input space. In this paper, we show on real-world tasks that standard fine-tuning after pretraining does not consistently improve OOD error over simply training from scratch on labeled source data. To better leverage pretraining for distribution shifts, we propose Connect Later: after pretraining with generic augmentations, fine-tune with targeted augmentations designed with knowledge of the distribution shift. Pretraining learns good representations within the source and target domains, while targeted augmentations connect the domains better during fine-tuning. Connect Later improves average OOD error over standard fine-tuning and supervised learning with targeted augmentations on 4 real-world datasets: Connect Later achieves the state-of-the-art on astronomical time-series classification (AstroClassification) by 2.5%, wildlife species identification (iWildCam-WILDS) with ResNet-50 by 0.9%, and tumor identification (Camelyon17-WILDS) with DenseNet121 by 1.1%; as well as best performance on a new dataset for astronomical time-series redshift prediction (Redshifts) by 0.03 RMSE (11% relative). Code and datasets are available at https://github.com/helenqu/connect-later.
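To make the two-stage recipe concrete, below is a minimal sketch of the Connect Later training flow, assuming a PyTorch setup. The encoder, the masking used as a generic pretraining augmentation, and the additive-noise stand-in for a targeted augmentation are all illustrative placeholders chosen for this sketch, not the authors' implementation; in the paper the targeted augmentations are designed per dataset with knowledge of the distribution shift.

```python
# Hedged sketch of the Connect Later recipe: (1) self-supervised pretraining
# with generic augmentations on unlabeled source + target data, then
# (2) fine-tuning on labeled source data with targeted augmentations.
# All modules and augmentations here are toy placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

def generic_augment(x, mask_prob=0.5):
    # Generic augmentation for pretraining (e.g., masking); illustrative only.
    mask = (torch.rand_like(x) > mask_prob).float()
    return x * mask

def targeted_augment(x):
    # Targeted augmentation designed with knowledge of the shift; the additive
    # noise here is a hypothetical stand-in for a dataset-specific transform.
    return x + 0.1 * torch.randn_like(x)

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
decoder = nn.Linear(64, 32)     # reconstruction head used only during pretraining
classifier = nn.Linear(64, 10)  # task head used during fine-tuning

# Stage 1: pretrain encoder + decoder by reconstructing masked inputs drawn
# from pooled (unlabeled) source and target data.
pretrain_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
unlabeled = torch.randn(256, 32)  # stand-in for pooled source/target inputs
for _ in range(10):
    x = unlabeled[torch.randperm(len(unlabeled))[:64]]
    recon = decoder(encoder(generic_augment(x)))
    loss = F.mse_loss(recon, x)
    pretrain_opt.zero_grad(); loss.backward(); pretrain_opt.step()

# Stage 2: fine-tune the pretrained encoder with a task head on labeled source
# data, applying targeted augmentations to connect source and target domains.
finetune_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(classifier.parameters()), lr=1e-4)
source_x, source_y = torch.randn(256, 32), torch.randint(0, 10, (256,))
for _ in range(10):
    idx = torch.randperm(len(source_x))[:64]
    logits = classifier(encoder(targeted_augment(source_x[idx])))
    loss = F.cross_entropy(logits, source_y[idx])
    finetune_opt.zero_grad(); loss.backward(); finetune_opt.step()
```

The key design choice the sketch mirrors is the ordering: generic augmentations are only used to learn representations across both domains during pretraining, while the shift-aware augmentations are applied during supervised fine-tuning on source labels.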

Authors (2)
  1. Helen Qu
  2. Sang Michael Xie