Representation Transfer Learning via Multiple Pre-trained models for Linear Regression (2305.16440v2)
Abstract: In this paper, we consider the problem of learning a linear regression model on a data domain of interest (target) given only a few samples. To aid learning, we are given a set of regression models pre-trained on potentially different data domains (sources). Assuming a representation structure for the data-generating linear models at the source and target domains, we propose a representation-transfer-based learning method for constructing the target model. The proposed scheme consists of two phases: (i) using the source representations to construct a representation adapted to the target data, and (ii) using the resulting model as the initialization for a fine-tuning procedure that re-trains the entire (over-parameterized) regression model on the target data. For each phase of the training method, we provide excess-risk bounds for the learned model relative to the true data-generating target model. The derived bounds show that, to achieve the same excess risk, the proposed method requires fewer target samples than the baseline that does not leverage source representations, thereby theoretically demonstrating the effectiveness of transfer learning for linear regression.
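To make the two-phase scheme concrete, below is a minimal NumPy sketch under illustrative assumptions: phase (i) is simplified to selecting the source representation with the lowest target empirical risk and fitting a low-dimensional head on it, and phase (ii) runs plain gradient descent on the full parameter vector starting from that initialization. All function names, dimensions, and hyperparameters here are hypothetical and are not taken from the paper; the authors' actual construction of the adapted representation and their fine-tuning procedure may differ.

```python
import numpy as np

def phase1_adapt_representation(source_reps, X_t, y_t):
    """Phase (i) sketch: choose among the pre-trained source representations
    using the few target samples, and fit a low-dimensional head on top.

    source_reps: list of (d, k) matrices B_i from the pre-trained sources.
    X_t: (n, d) target inputs; y_t: (n,) target responses.
    Returns the selected (d, k) representation and its fitted head.
    """
    best_B, best_w, best_err = None, None, np.inf
    for B in source_reps:
        Z = X_t @ B                                   # project target data to k dims
        w, *_ = np.linalg.lstsq(Z, y_t, rcond=None)   # least-squares head on target data
        err = np.mean((Z @ w - y_t) ** 2)
        if err < best_err:
            best_B, best_w, best_err = B, w, err
    return best_B, best_w

def phase2_fine_tune(B, w, X_t, y_t, lr=1e-2, steps=500):
    """Phase (ii) sketch: re-train the full (over-parameterized) parameter
    vector theta = B w on the target data, initialized from phase (i)."""
    theta = B @ w                                     # initialization from phase (i)
    n = X_t.shape[0]
    for _ in range(steps):
        grad = X_t.T @ (X_t @ theta - y_t) / n        # gradient of the squared loss
        theta -= lr * grad
    return theta

# Toy usage in a few-shot regime: d = 50 ambient dimensions,
# k = 3 representation dimensions, n = 20 target samples.
rng = np.random.default_rng(0)
d, k, n = 50, 3, 20
B_true = np.linalg.qr(rng.standard_normal((d, k)))[0]          # ground-truth representation
source_reps = [B_true] + [np.linalg.qr(rng.standard_normal((d, k)))[0] for _ in range(3)]
X_t = rng.standard_normal((n, d))
y_t = X_t @ (B_true @ rng.standard_normal(k)) + 0.01 * rng.standard_normal(n)

B_hat, w_hat = phase1_adapt_representation(source_reps, X_t, y_t)
theta_hat = phase2_fine_tune(B_hat, w_hat, X_t, y_t)
```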