Multiclass ROC (2404.13147v1)

Published 19 Apr 2024 in stat.ML, cs.LG, and stat.ME

Abstract: Model evaluation is of crucial importance in modern statistical applications. The construction of ROC curves and the calculation of AUC have been widely used to evaluate binary classifiers. Recent research generalizing ROC/AUC analysis to multi-class classification falls short in at least one of four areas: (1) failure to provide sensible plots, (2) sensitivity to imbalanced data, (3) inability to specify misclassification costs, and (4) inability to quantify the uncertainty of the evaluation. Borrowing from a binomial matrix factorization model, we provide an evaluation metric that summarizes the pairwise multi-class True Positive Rate (TPR) and False Positive Rate (FPR) with a one-dimensional vector representation. Visualizing this representation vector measures the relative speed of increase of TPR versus FPR across all class pairs, which in turn provides a ROC plot for the multi-class counterpart. Integrating over the factorized vectors yields a binary-AUC-equivalent summary of classifier performance. Misclassification weight specification and bootstrapped confidence intervals are also supported to accommodate a variety of evaluation criteria. To support our findings, we conducted extensive simulation studies and compared our method to pairwise-averaged AUC statistics on benchmark datasets.
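
For context, the comparison baseline and the bootstrap idea mentioned in the abstract can be computed directly. The sketch below is a minimal illustration, not the paper's binomial matrix factorization method: assuming scikit-learn and NumPy are available, integer class labels 0..K-1 in `y_true`, and a score matrix `proba` whose columns correspond to those labels, it computes a simple one-vs-one pairwise-averaged AUC and a percentile-bootstrap confidence interval.

```python
# Sketch of the pairwise-averaged AUC baseline (one-vs-one) with a bootstrap CI.
# This is NOT the paper's factorization-based metric; it is the comparator
# described in the abstract, under the assumptions stated above.
import numpy as np
from itertools import combinations
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def pairwise_averaged_auc(y_true, proba):
    """Average binary AUCs over all unordered class pairs (one-vs-one)."""
    classes = np.unique(y_true)
    aucs = []
    for i, j in combinations(classes, 2):
        mask = np.isin(y_true, [i, j])
        # Binary AUC for class i vs class j, scored by the class-i column.
        aucs.append(roc_auc_score(y_true[mask] == i, proba[mask, i]))
    return float(np.mean(aucs))

def bootstrap_ci(y_true, proba, n_boot=1000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the pairwise-averaged AUC."""
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        try:
            stats.append(pairwise_averaged_auc(y_true[idx], proba[idx]))
        except ValueError:
            # A resample may leave a class pair with only one class present.
            continue
    return tuple(np.quantile(stats, [alpha / 2, 1 - alpha / 2]))
```

As a cross-check, scikit-learn's `roc_auc_score(y_true, proba, multi_class='ovo')` computes a closely related pairwise-averaged one-vs-one AUC.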

