
A new computationally efficient algorithm to solve Feature Selection for Functional Data Classification in high-dimensional spaces (2401.05765v2)

Published 11 Jan 2024 in stat.ML and cs.LG

Abstract: This paper introduces a novel methodology for Feature Selection for Functional Classification, FSFC, that addresses the challenge of jointly performing feature selection and classification of functional data in scenarios with categorical responses and multivariate longitudinal features. FSFC tackles a newly defined optimization problem that integrates logistic loss and functional features to identify the most crucial variables for classification. To solve the minimization problem, we employ functional principal components and develop a new adaptive version of the Dual Augmented Lagrangian algorithm. The computational efficiency of FSFC enables handling high-dimensional scenarios where the number of features may considerably exceed the number of statistical units. Simulation experiments demonstrate that FSFC outperforms other machine learning and deep learning methods in computational time and classification accuracy. Furthermore, the FSFC feature selection capability can be leveraged to significantly reduce the problem's dimensionality and improve the performance of other classification algorithms. The efficacy of FSFC is also demonstrated through a real data application, analyzing relationships between four chronic diseases and other health and demographic factors.
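The pipeline the abstract describes (project each functional feature onto a few principal components, then fit a sparse logistic model that keeps or drops whole features) can be illustrated with a small sketch. This is not the authors' FSFC implementation: the abstract does not spell out the penalty or the solver details, so the code below assumes a plain group-lasso penalty and swaps in ordinary proximal gradient descent in place of the adaptive Dual Augmented Lagrangian algorithm. All function names, the simulated data, and the value of `lam` are hypothetical, and the principal components are computed by a simple SVD on an equally spaced grid.

```python
import numpy as np


def fpc_scores(X, k):
    """Discretized functional PCA, one feature at a time.

    X has shape (n, p, T): n units, p functional features, T grid points.
    Returns scores of shape (n, p, k). Assumes an equally spaced grid.
    """
    n, p, T = X.shape
    scores = np.empty((n, p, k))
    for j in range(p):
        centered = X[:, j, :] - X[:, j, :].mean(axis=0)
        U, s, _ = np.linalg.svd(centered, full_matrices=False)
        scores[:, j, :] = U[:, :k] * s[:k]
    return scores


def group_soft_threshold(B, t):
    """Proximal operator of t * sum_j ||B_j||_2, applied to each row of B."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return scale * B


def fit_group_lasso_logistic(Z, y, lam, n_iter=500):
    """Minimize mean logistic loss + lam * sum_j ||B_j||_2 by proximal gradient.

    Z: (n, p, k) scores, y: labels in {-1, +1}. Returns B of shape (p, k).
    Assumption: stands in for the paper's DAL solver, which is not reproduced.
    """
    n, p, k = Z.shape
    Zf = Z.reshape(n, p * k)
    # Step 1/L, where L = ||Zf||_2^2 / (4n) bounds the curvature of the loss.
    step = 4.0 * n / np.linalg.norm(Zf, 2) ** 2
    B = np.zeros((p, k))
    for _ in range(n_iter):
        margin = y * (Zf @ B.ravel())
        # 1 / (1 + exp(margin)), written with tanh for numerical stability.
        weight = 0.5 * (1.0 - np.tanh(margin / 2.0))
        grad = -(Zf.T @ (y * weight)) / n
        B = group_soft_threshold(B - step * grad.reshape(p, k), step * lam)
    return B


# Hypothetical simulated data: only features 0 and 1 drive the class label.
rng = np.random.default_rng(0)
n, p, T, k = 100, 10, 25, 3
grid = np.linspace(0.0, 1.0, T)
basis = np.stack([np.sin(np.pi * grid), np.cos(np.pi * grid), grid])
coef = rng.normal(size=(n, p, 3))
X = coef @ basis + 0.1 * rng.normal(size=(n, p, T))
y = np.where(coef[:, 0, 0] + coef[:, 1, 1] > 0, 1.0, -1.0)

Z = fpc_scores(X, k)
B = fit_group_lasso_logistic(Z, y, lam=0.05)
selected = np.flatnonzero(np.linalg.norm(B, axis=1) > 0)
print("features with nonzero coefficient blocks:", selected)
```

Each row of `B` holds the coefficients of one functional feature, so a feature is dropped exactly when its whole row is shrunk to zero. Larger values of `lam` drop more features; how to choose it (for example by cross-validation) is left open here.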
