
Systematic Misestimation of Machine Learning Performance in Neuroimaging Studies of Depression (1912.06686v2)

Published 13 Dec 2019 in q-bio.NC, cs.CV, and eess.IV

Abstract: We currently observe a disconcerting phenomenon in machine learning studies in psychiatry: While we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect focusing on one of the most heavily studied questions in the field, namely the classification of patients suffering from major depressive disorder (MDD) and healthy controls (HC) based on neuroimaging data. Drawing upon structural magnetic resonance imaging (MRI) data from a balanced sample of $N = 1,868$ MDD patients and HC from our recent international Predictive Analytics Competition (PAC), we first trained and tested a classification model on the full dataset, which yielded an accuracy of $61\,\%$. Next, we mimicked the process by which researchers would draw samples of various sizes ($N = 4$ to $N = 150$) from the population and showed a strong risk of misestimation. Specifically, for small sample sizes ($N = 20$), we observe accuracies of up to $95\,\%$. For medium sample sizes ($N = 100$), accuracies of up to $75\,\%$ were found. Importantly, further investigation showed that sufficiently large test sets effectively protect against performance misestimation, whereas larger datasets per se do not. While these results question the validity of a substantial part of the current literature, we outline the relatively low-cost remedy of larger test sets, which is readily available in most cases.
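The misestimation mechanism the abstract describes can be illustrated with a simple simulation (a minimal sketch, not the authors' code): assume a classifier whose true accuracy equals the $61\,\%$ reported on the full dataset, and look at the best accuracy observed across many hypothetical small studies, each evaluating on its own small test set. The constants `TRUE_ACC` and `N_STUDIES` are illustrative assumptions.

```python
import random

random.seed(0)
TRUE_ACC = 0.61    # full-sample accuracy reported in the abstract
N_STUDIES = 1000   # hypothetical number of independent small studies

def best_observed_accuracy(test_size: int) -> float:
    """Max accuracy seen across many simulated studies, each scoring
    a classifier with true accuracy TRUE_ACC on its own test set."""
    best = 0.0
    for _ in range(N_STUDIES):
        # Each test case is correct with probability TRUE_ACC.
        correct = sum(random.random() < TRUE_ACC for _ in range(test_size))
        best = max(best, correct / test_size)
    return best

for n in (20, 100, 1000):
    print(f"test size {n:4d}: best observed accuracy {best_observed_accuracy(n):.2f}")
```

With small test sets the best observed accuracy drifts far above the true $61\,\%$, while large test sets keep the estimate close to it, mirroring the paper's finding that test-set size, not dataset size per se, protects against misestimation.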

Authors (18)
  1. Claas Flint (2 papers)
  2. Micah Cearns (1 paper)
  3. Nils Opel (9 papers)
  4. Ronny Redlich (6 papers)
  5. David M. A. Mehler (3 papers)
  6. Daniel Emden (12 papers)
  7. Nils R. Winter (11 papers)
  8. Ramona Leenings (14 papers)
  9. Simon B. Eickhoff (11 papers)
  10. Tilo Kircher (9 papers)
  11. Axel Krug (2 papers)
  12. Volker Arolt (2 papers)
  13. Scott Clark (4 papers)
  14. Bernhard T. Baune (5 papers)
  15. Xiaoyi Jiang (21 papers)
  16. Udo Dannlowski (17 papers)
  17. Tim Hahn (18 papers)
  18. Igor Nenadic (6 papers)
Citations (81)
