
Limited Memory Online Gradient Descent for Kernelized Pairwise Learning with Dynamic Averaging (2402.01146v1)

Published 2 Feb 2024 in cs.LG

Abstract: Pairwise learning, an important domain within machine learning, addresses loss functions defined on pairs of training examples, as in metric learning and AUC maximization. Because the computational cost of pairwise losses grows quadratically with the sample size, researchers have turned to online gradient descent (OGD) methods for better scalability. Recently, an OGD algorithm emerged that computes the gradient from the most recent example and prior examples, a step that effectively reduces the algorithmic complexity to $O(T)$, with $T$ being the number of received examples. This approach, however, is confined to linear models and assumes that examples arrive independently. We introduce a lightweight OGD algorithm that does not require the independence of examples and generalizes to kernelized pairwise learning. Our algorithm builds the gradient from a random past example and a moving average representing the past data, which yields a sub-linear regret bound at $O(T)$ complexity. Furthermore, by integrating $O(\sqrt{T}\log T)$ random Fourier features, the cost of kernel calculations is kept low. Several experiments on real-world datasets show that the proposed technique outperforms kernel and linear baselines in both offline and online scenarios.
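To make the idea concrete, here is a minimal Python sketch of the kind of limited-memory pairwise OGD the abstract describes, applied to AUC maximization. This is an illustration under stated assumptions, not the authors' implementation: the class name `RFFPairwiseOGD` and its parameters are hypothetical, the number of random Fourier features is fixed up front (the paper scales it as $O(\sqrt{T}\log T)$), and a pairwise hinge loss against a running per-class average stands in for the paper's dynamic-averaging scheme.

```python
import numpy as np

class RFFPairwiseOGD:
    """Minimal sketch (not the paper's reference code) of limited-memory
    online gradient descent for pairwise AUC maximization.

    Assumptions made for illustration:
      * An RBF kernel exp(-gamma * ||x - x'||^2) approximated with D random
        Fourier features (Rahimi & Recht): phi(x) = sqrt(2/D) * cos(Wx + b).
      * A pairwise hinge loss between the current example and a moving
        average of the opposite class, standing in for dynamic averaging.
    """

    def __init__(self, dim, n_features=200, gamma=1.0, eta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # For exp(-gamma * ||x - x'||^2), frequencies are drawn N(0, 2*gamma*I).
        self.W = rng.normal(scale=np.sqrt(2 * gamma), size=(n_features, dim))
        self.b = rng.uniform(0, 2 * np.pi, size=n_features)
        self.D = n_features
        self.w = np.zeros(n_features)          # model in RFF space
        self.eta = eta
        self.avg = {+1: np.zeros(n_features),  # running class averages
                    -1: np.zeros(n_features)}
        self.count = {+1: 0, -1: 0}

    def _phi(self, x):
        # Random Fourier feature map approximating the RBF kernel.
        return np.sqrt(2.0 / self.D) * np.cos(self.W @ x + self.b)

    def partial_fit(self, x, y):
        """Process one example (x, y) with y in {+1, -1} in O(D) time."""
        phi = self._phi(x)
        opp = -y
        if self.count[opp] > 0:
            # Pair the new example against the moving average of the
            # opposite class rather than all past examples: O(1) pairs/step.
            diff = y * (phi - self.avg[opp])
            margin = self.w @ diff
            if margin < 1.0:                   # pairwise hinge loss active
                self.w += self.eta * diff      # OGD step on the surrogate
        # Update the dynamic average for the new example's class.
        self.count[y] += 1
        self.avg[y] += (phi - self.avg[y]) / self.count[y]

    def decision_function(self, x):
        return self.w @ self._phi(x)
```

Pairing each new example against a single running average instead of a full buffer of past opposites is what keeps the per-step cost constant; replacing the average with one uniformly sampled stored example would recover the random-example component mentioned in the abstract.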

