Domain-Aware Fine-Tuning: Enhancing Neural Network Adaptability (2308.07728v5)
Abstract: Fine-tuning pre-trained neural network models has become a widely adopted approach across various domains. However, it can distort pre-trained feature extractors that already possess strong generalization capabilities, so mitigating feature distortion during adaptation to new target domains is crucial. Recent studies have shown promising results in handling feature distortion by aligning the head layer on in-distribution datasets before fine-tuning. Nonetheless, a significant limitation arises from the treatment of batch normalization layers during fine-tuning, leading to suboptimal performance. In this paper, we propose Domain-Aware Fine-Tuning (DAFT), a novel approach that incorporates batch normalization conversion and the integration of linear probing and fine-tuning. Our batch normalization conversion method effectively mitigates feature distortion by reducing modifications to the neural network during fine-tuning. Additionally, we integrate linear probing and fine-tuning to optimize the head layer while gradually adapting the feature extractor. By combining these two components, DAFT significantly mitigates feature distortion and achieves improved performance on both in-distribution and out-of-distribution datasets. Extensive experiments show that our method outperforms other baseline methods, demonstrating its effectiveness not only in improving performance but also in mitigating feature distortion.
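The abstract describes two components: a batch normalization conversion that limits how much the normalization layers change during fine-tuning, and a training scheme that integrates linear probing with fine-tuning so the head is optimized while the feature extractor adapts gradually. Below is a minimal PyTorch sketch of both ideas; the `FrozenAffine2d` and `convert_batchnorm` helpers, the ResNet-50 backbone, the 10-class head, and the per-group learning rates are illustrative assumptions, not the paper's exact conversion rule or hyperparameters.

```python
# Illustrative sketch only: folding frozen BN statistics into a fixed affine
# transform is one common way to reduce BN modification during fine-tuning,
# not necessarily the conversion used in the paper.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class FrozenAffine2d(nn.Module):
    """Per-channel affine layer obtained by folding a trained BatchNorm2d's
    running statistics into a scale and shift that stay fixed during
    fine-tuning (hypothetical helper, not from the paper)."""

    def __init__(self, bn: nn.BatchNorm2d):
        super().__init__()
        std = torch.sqrt(bn.running_var + bn.eps)
        scale = bn.weight.detach() / std
        shift = bn.bias.detach() - bn.running_mean * scale
        # Buffers, not parameters: they are never updated by the optimizer.
        self.register_buffer("scale", scale.reshape(1, -1, 1, 1))
        self.register_buffer("shift", shift.reshape(1, -1, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.scale + self.shift


def convert_batchnorm(module: nn.Module) -> nn.Module:
    """Recursively replace every BatchNorm2d with its frozen affine form."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, FrozenAffine2d(child))
        else:
            convert_batchnorm(child)
    return module


# Pre-trained feature extractor plus a fresh head for the target task.
backbone = resnet50(weights="IMAGENET1K_V2")   # downloads ImageNet weights
backbone.fc = nn.Identity()                    # expose 2048-d features
backbone = convert_batchnorm(backbone)
head = nn.Linear(2048, 10)                     # 10 target classes, for illustration

# Integrating linear probing and fine-tuning: the head trains with a larger
# learning rate while the feature extractor adapts with a much smaller one
# (placeholder values, not the paper's hyperparameters).
optimizer = torch.optim.SGD(
    [
        {"params": head.parameters(), "lr": 1e-2},
        {"params": backbone.parameters(), "lr": 1e-4},
    ],
    momentum=0.9,
)

criterion = nn.CrossEntropyLoss()
x = torch.randn(8, 3, 224, 224)                # dummy target-domain batch
y = torch.randint(0, 10, (8,))
optimizer.zero_grad()
loss = criterion(head(backbone(x)), y)
loss.backward()
optimizer.step()
```

One appeal of folding the statistics into a fixed affine transform is that it removes the train/eval mismatch in normalization statistics when target batches differ from the pre-training distribution, which is one way fine-tuning can perturb an otherwise well-generalizing feature extractor.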
Authors: Seokhyeon Ha, Sunbeom Jung, Jungwoo Lee