Smoothness Adaptive Hypothesis Transfer Learning (2402.14966v1)
Abstract: Many existing two-phase kernel-based hypothesis transfer learning algorithms employ the same kernel regularization across phases and rely on known function smoothness to obtain optimality; consequently, they fail to adapt to the varying and unknown smoothness of the target/source functions and of their offset that arise in practice. In this paper, we address these problems by proposing Smoothness Adaptive Transfer Learning (SATL), a two-phase kernel ridge regression (KRR)-based algorithm. We first prove that employing a misspecified fixed-bandwidth Gaussian kernel in target-only KRR learning can achieve minimax optimality, and we derive a procedure that adapts to the unknown Sobolev smoothness. Leveraging these results, SATL employs Gaussian kernels in both phases so that the estimators can adapt to the unknown smoothness of the target/source functions and of their offset. We derive the minimax lower bound for the learning problem in excess risk and show that SATL enjoys a matching upper bound up to a logarithmic factor. The minimax convergence rate sheds light on the factors influencing transfer dynamics and demonstrates the superiority of SATL over non-transfer-learning settings. While our main objective is a theoretical analysis, we also conduct several experiments to confirm our results.
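To make the two-phase scheme concrete, below is a minimal sketch of the generic KRR-based transfer estimator the abstract describes: phase one fits a Gaussian-kernel KRR on the source sample, phase two fits a second Gaussian-kernel KRR to the target residuals (the offset), and the final predictor is their sum. The helper names (`gaussian_kernel`, `krr_fit`), the synthetic data, and all bandwidth/regularization values are illustrative assumptions; SATL's actual adaptive, smoothness-driven choices of these parameters are the paper's contribution and are not reproduced here.

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-||x_i - z_j||^2 / bandwidth^2)."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / bandwidth ** 2)

def krr_fit(X, y, bandwidth, lam):
    """Kernel ridge regression: returns a predictor x -> k(x, X) @ alpha."""
    n = len(X)
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    return lambda X_new: gaussian_kernel(X_new, X, bandwidth) @ alpha

# Synthetic data: a large source sample and a small target sample whose
# regression function differs from the source by a smooth offset.
rng = np.random.default_rng(0)
Xs = rng.uniform(size=(500, 1))
ys = np.sin(4 * Xs[:, 0]) + 0.1 * rng.standard_normal(500)
Xt = rng.uniform(size=(50, 1))
yt = np.sin(4 * Xt[:, 0]) + 0.3 * Xt[:, 0] + 0.1 * rng.standard_normal(50)

# Phase 1: estimate the source regression function on the source data.
f_source = krr_fit(Xs, ys, bandwidth=0.2, lam=1e-3)   # illustrative parameters

# Phase 2: fit the offset (target response minus transferred source fit).
residual = yt - f_source(Xt)
f_offset = krr_fit(Xt, residual, bandwidth=0.5, lam=1e-2)  # offset may be smoother

# Final transfer estimator: source fit plus estimated offset.
f_target = lambda X_new: f_source(X_new) + f_offset(X_new)
```

Because the two phases use separate Gaussian kernels, the offset fit can employ a different (typically larger) bandwidth when the offset is smoother than the target function itself; this decoupling of the two regularizations is precisely the flexibility that SATL exploits.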