Kernel Support Vector Machine Classifiers with the $\ell_0$-Norm Hinge Loss
Abstract: The Support Vector Machine (SVM) has been one of the most successful machine learning techniques for binary classification problems. Its key idea is to maximize the margin between the data and the separating hyperplane subject to correct classification of the training samples. The commonly used hinge loss and its variants are sensitive to label noise and, because of their unboundedness, unstable under resampling. This paper concentrates on the kernel SVM with the $\ell_0$-norm hinge loss (referred to as $\ell_0$-KSVM), a composite of the hinge loss and the $\ell_0$-norm, which overcomes the difficulties mentioned above. Since the $\ell_0$-norm hinge loss is nonconvex and nonsmooth, we first characterize its limiting subdifferential and then establish the equivalence among the proximal stationary points, the Karush-Kuhn-Tucker points, and the locally optimal solutions of $\ell_0$-KSVM. Second, we develop an ADMM algorithm for $\ell_0$-KSVM and show that any limit point of the sequence it generates is a locally optimal solution. Finally, experiments on synthetic and real datasets show that $\ell_0$-KSVM achieves accuracy comparable to that of the standard KSVM while generally requiring fewer support vectors.
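For concreteness, the composite loss described in the abstract can be written as below. This is a minimal sketch based only on the abstract's description of the loss as the $\ell_0$-norm composed with the hinge loss; the notation ($K$, $\alpha$, $b$, $C$, $n$, $y_i$) is chosen for illustration and is not taken from the paper itself.

```latex
% Sketch of the l0-KSVM model implied by the abstract (notation illustrative).
% K: kernel Gram matrix; y_i in {-1,+1}: labels; C > 0: trade-off parameter.
\begin{align*}
  \ell_0\text{-hinge}(t) &= \big\| \max(0,\, t) \big\|_0
    = \begin{cases} 1, & t > 0, \\ 0, & t \le 0, \end{cases} \\
  \min_{\alpha,\, b}\ & \tfrac{1}{2}\, \alpha^{\top} K \alpha
    + C \sum_{i=1}^{n} \ell_0\text{-hinge}\big(1 - y_i\,(K_{i\cdot}\,\alpha + b)\big).
\end{align*}
```

A key ingredient for proximal stationarity and for an ADMM scheme of this kind is the proximal operator of the $\ell_0$-norm hinge loss, which acts elementwise as a hard-thresholding rule. The Python sketch below implements that operator under the standard derivation (minimizing $\lambda\,\mathbf{1}\{z>0\} + \tfrac12(z-v)^2$ over $z$); the function name and the toy demonstration are ours, not the paper's, and this is not presented as the authors' implementation.

```python
import numpy as np

def prox_l0_hinge(v, lam):
    """Elementwise proximal operator of z -> lam * ||max(0, z)||_0.

    Solves min_z  lam * 1{z > 0} + 0.5 * (z - v)^2  for each entry:
      - v <= 0:                z = v  (the loss is already zero)
      - 0 < v <= sqrt(2*lam):  z = 0  (snapping to 0 is cheaper than paying lam)
      - v  > sqrt(2*lam):      z = v  (keeping v and paying lam is cheaper)
    """
    v = np.asarray(v, dtype=float)
    thresh = np.sqrt(2.0 * lam)
    return np.where((v > 0) & (v <= thresh), 0.0, v)

# Toy check: entries in (0, sqrt(2 * 0.5)] = (0, 1] are snapped to zero.
print(prox_l0_hinge([-0.3, 0.4, 0.9, 1.5], lam=0.5))  # [-0.3  0.   0.   1.5]
```

The discontinuous jump at the threshold $\sqrt{2\lambda}$ is exactly what makes the loss nonconvex and nonsmooth, and hence why the limiting subdifferential, rather than the ordinary gradient, is the right tool for the optimality analysis.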