Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
86 tokens/sec
Gemini 2.5 Pro Premium
43 tokens/sec
GPT-5 Medium
19 tokens/sec
GPT-5 High Premium
30 tokens/sec
GPT-4o
93 tokens/sec
DeepSeek R1 via Azure Premium
88 tokens/sec
GPT OSS 120B via Groq Premium
441 tokens/sec
Kimi K2 via Groq Premium
234 tokens/sec
2000 character limit reached

Is the new model better? One metric says yes, but the other says no. Which metric do I use? (2010.09822v2)

Published 19 Oct 2020 in stat.ME

Abstract: Incremental value (IncV) evaluates the performance change from an existing risk model to a new model. It is one of the key considerations in deciding whether a new risk model performs better than the existing one. Problems arise when different IncV metrics contradict each other. For example, compared with a prescribed-dose model, an ovarian-dose model for predicting acute ovarian failure has a slightly lower area under the receiver operating characteristic curve (AUC) but increases the area under the precision-recall curve (AP) by 48%. This phenomenon of conflicting conclusions is not uncommon, and it creates a dilemma in medical decision making. In this article, we examine the analytical connections and differences between two IncV metrics: IncV in AUC (IncV-AUC) and IncV in AP (IncV-AP). Additionally, since they are both semi-proper scoring rules, we compare them with a strictly proper scoring rule: the IncV of the scaled Brier score (IncV-sBrS), via a numerical study. We demonstrate that both IncV-AUC and IncV-AP are weighted averages of the changes (from the existing model to the new one) in separating the risk score distributions between events and non-events. However, IncV-AP assigns heavier weights to the changes in the high-risk group, whereas IncV-AUC weights the changes equally. In the numerical study, we find that IncV-AP has a wide range, from negative to positive, but the size of IncV-AUC is much smaller. In addition, IncV-AP and IncV-sBr Sare highly consistent, but IncV-AUC is negatively correlated with IncV-sBrS and IncV-AP at a low event rate. IncV-AUC and IncV-AP are the least consistent among the three pairs, and their differences are more pronounced as the event rate decreases.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.