
Variance Reduced Online Gradient Descent for Kernelized Pairwise Learning with Limited Memory (2310.06483v1)

Published 10 Oct 2023 in cs.LG

Abstract: Pairwise learning is essential in machine learning, especially for problems involving loss functions defined on pairs of training examples. Online gradient descent (OGD) algorithms have been proposed to handle online pairwise learning, where data arrive sequentially. However, the pairwise nature of the problem makes scalability challenging, as the gradient computation for a new sample involves all past samples. Recent advancements in OGD algorithms have aimed to reduce the complexity of calculating online gradients, achieving complexities less than $O(T)$ and even as low as $O(1)$. However, these approaches are primarily limited to linear models and suffer from induced variance. In this study, we propose a limited-memory OGD algorithm that extends to kernel online pairwise learning while improving the sublinear regret. Specifically, we establish a clear connection between the variance of online gradients and the regret, and construct online gradients using the most recent stratified samples with a limited buffer of size $s$ representing all past data, which has a complexity of $O(sT)$ and employs $O(\sqrt{T}\log{T})$ random Fourier features for kernel approximation. Importantly, our theoretical results demonstrate that the variance-reduced online gradients lead to an improved sublinear regret bound. Experiments on real-world datasets demonstrate the superiority of our algorithm over both kernelized and linear online pairwise learning algorithms.
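To make the abstract's core idea concrete, the sketch below combines its two ingredients: pairing each incoming example only against a fixed-size, class-stratified buffer of recent examples (cost $O(s)$ per step instead of $O(t)$), and mapping inputs through random Fourier features to approximate an RBF kernel. This is a minimal illustration, not the paper's algorithm: the function names, the FIFO buffer policy, the pairwise hinge loss, and all hyperparameters are assumptions made here for demonstration; the paper's method uses stratified sampling and a specific $O(\sqrt{T}\log T)$ feature count.

```python
import numpy as np

def rff_features(X, W, b):
    """Random Fourier features approximating an RBF kernel:
    phi(x) = sqrt(2/D) * cos(W x + b)."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def buffered_pairwise_ogd(X, y, buffer_size=16, num_features=64,
                          gamma=0.1, lr=0.1, seed=0):
    """OGD for pairwise (AUC-style) learning: each incoming example is
    paired only with a fixed-size, class-stratified buffer of recent
    examples, so one step costs O(s) instead of O(t)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # For kernel exp(-gamma * ||x - x'||^2), draw frequencies W ~ N(0, 2*gamma*I)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(num_features, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    theta = np.zeros(num_features)
    buf_pos, buf_neg = [], []  # stratified buffers: positives / negatives
    for t in range(X.shape[0]):
        phi_t = rff_features(X[t:t + 1], W, b)[0]
        # pair the new sample with buffered samples of the opposite class
        opposite = buf_neg if y[t] > 0 else buf_pos
        if opposite:
            grad = np.zeros_like(theta)
            for phi_j in opposite:
                diff = phi_t - phi_j if y[t] > 0 else phi_j - phi_t
                if theta @ diff < 1.0:   # pairwise hinge loss is active
                    grad -= diff
            theta -= lr * grad / len(opposite)
        # FIFO update: keep only the buffer_size most recent examples per class
        buf = buf_pos if y[t] > 0 else buf_neg
        buf.append(phi_t)
        if len(buf) > buffer_size:
            buf.pop(0)
    return theta, (W, b)

def pairwise_auc(theta, Wb, X, y):
    """Fraction of (positive, negative) pairs ranked correctly."""
    W, b = Wb
    scores = rff_features(X, W, b) @ theta
    pos, neg = scores[y > 0], scores[y <= 0]
    return float(np.mean(pos[:, None] > neg[None, :]))
```

On a toy stream of two well-separated Gaussian clusters, this buffered learner reaches high pairwise AUC while touching at most `buffer_size` past examples per update, which is the scalability point the abstract makes.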
