Cross-validation failure: small sample sizes lead to large error bars (1706.07581v1)

Published 23 Jun 2017 in q-bio.QM, stat.ME, and stat.ML

Abstract: Predictive models ground many state-of-the-art developments in statistical brain image analysis: decoding, MVPA, searchlight, or extraction of biomarkers. The principled approach to establish their validity and usefulness is cross-validation, testing prediction on unseen data. Here, I would like to raise awareness on error bars of cross-validation, which are often underestimated. Simple experiments show that sample sizes of many neuroimaging studies inherently lead to large error bars, eg $\pm$10% for 100 samples. The standard error across folds strongly underestimates them. These large error bars compromise the reliability of conclusions drawn with predictive models, such as biomarkers or methods developments where, unlike with cognitive neuroimaging MVPA approaches, more samples cannot be acquired by repeating the experiment across many subjects. Solutions to increase sample size must be investigated, tackling possible increases in heterogeneity of the data.

Citations (492)
Summary

  • The paper demonstrates that predictive models evaluated on small neuroimaging samples carry cross-validation error bars on the order of ±10% for 100 samples.
  • It uses empirical data and simulations to reveal that sampling noise significantly undermines accuracy estimates.
  • It recommends using larger datasets and robust validation methods to enhance the reliability of neuroimaging studies.

Understanding Cross-Validation Errors in Neuroimaging with Small Sample Sizes

This paper addresses a critical issue in the evaluation of predictive models used in statistical brain image analysis: the limitations of cross-validation (CV) when applied to the small sample sizes common in neuroimaging studies. The key finding is that small samples lead to large error bars on cross-validated accuracy, which can compromise the reliability of conclusions drawn from these models.

Core Findings

Predictive models are central to many neuroimaging applications, such as decoding, MVPA, searchlight analyses, and biomarker extraction. These models must generalize to new, unseen data, which is typically assessed through cross-validation. However, the paper demonstrates that with small sample sizes the error bars on cross-validated accuracy can be as large as ±10% for a sample size of 100. This has substantial implications for applications such as biomarker development, where, unlike in cognitive MVPA studies, more samples cannot be acquired simply by repeating the experiment across many subjects.
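The order of magnitude of this ±10% figure can be sanity-checked with a simple binomial argument (a sketch, not the paper's exact analysis; the function name is mine): an accuracy estimated on n independent test samples fluctuates with standard deviation sqrt(p(1-p)/n), so a 95% interval for n = 100 spans roughly ±10 percentage points.

```python
import math

def accuracy_ci_halfwidth(p, n, z=1.96):
    """Half-width of a normal-approximation ~95% interval for an
    accuracy p estimated from n independent test samples."""
    return z * math.sqrt(p * (1 - p) / n)

# A classifier with true accuracy 70%, evaluated on 100 test samples:
print(round(accuracy_ci_halfwidth(0.7, 100), 3))  # ~0.09, i.e. about +/-10%
```

Note that this is a lower bound on the uncertainty: cross-validation adds further variability because the training sets themselves are small and overlapping.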

Experiments and Results

The paper reports several experiments, including empirical analyses of neuroimaging data and simulations with synthetic data. The experiments show that accuracy estimated by cross-validation can diverge substantially from the true generalization accuracy, owing to the sampling noise inherent in small datasets.

  1. Neuroimaging Data Analysis: Across several neuroimaging datasets, error bars on cross-validated accuracy are consistently on the order of ±10%.
  2. Simulations: Synthetic-data simulations yield similar results, reinforcing that cross-validation with small samples gives imprecise estimates of prediction accuracy.
  3. Statistical Modeling: Modeling prediction outcomes as binomial draws predicts the intrinsic sampling noise, and shows that the standard error computed across cross-validation folds underestimates the true error bars.
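The points above can be illustrated with a small simulation (an independent sketch, not the paper's code; the classifier, data-generating parameters, and function names are my own choices): repeatedly draw datasets of n = 100, run 5-fold cross-validation with a nearest-centroid classifier, and compare the spread of the CV accuracy across independent datasets with the fold-based standard error reported within a single dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_dataset(n, d=20, shift=0.35):
    """Two Gaussian classes, separated by `shift` along every dimension."""
    y = rng.integers(0, 2, size=n)
    X = rng.standard_normal((n, d)) + shift * y[:, None]
    return X, y

def nearest_centroid_acc(X_tr, y_tr, X_te, y_te):
    """Train a nearest-centroid classifier and return test accuracy."""
    c0, c1 = X_tr[y_tr == 0].mean(0), X_tr[y_tr == 1].mean(0)
    pred = (np.linalg.norm(X_te - c1, axis=1)
            < np.linalg.norm(X_te - c0, axis=1)).astype(int)
    return (pred == y_te).mean()

def cv_fold_accuracies(X, y, k=5):
    """Per-fold accuracies of k-fold cross-validation."""
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        accs.append(nearest_centroid_acc(X[tr], y[tr], X[te], y[te]))
    return np.array(accs)

# Repeat the whole experiment on many independent datasets of n = 100
cv_means, fold_sems = [], []
for _ in range(200):
    accs = cv_fold_accuracies(*draw_dataset(100))
    cv_means.append(accs.mean())
    fold_sems.append(accs.std(ddof=1) / np.sqrt(len(accs)))

print(f"spread of CV accuracy across datasets: {np.std(cv_means):.3f}")
print(f"typical fold-based standard error:     {np.mean(fold_sems):.3f}")
```

The first number is the quantity that matters for a claim about a population; the second is what is often reported. Because the folds share training data and are therefore correlated, the fold-based standard error tends to understate the dataset-to-dataset spread.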

Implications

The findings have significant theoretical and practical implications. For neuroimaging methods development and diagnostic applications, reliance on small samples can lead to overfitting and confirmation bias. This underlines the need for larger datasets and robust cross-validation techniques to ensure generalizability and reliability.

Recommendations:

  • Data Pooling: Encouraging data sharing and pooling from various studies can increase sample sizes, although care must be taken to address heterogeneity.
  • Paradigm Adjustments: Shifting experimental paradigms to those facilitating larger data accumulation might help. This includes using standard protocols or naturalistic stimuli.
  • Group-Level Analysis: For cognitive neuroscience, group-level statistical analyses rather than subject-level are recommended to mitigate cross-validation errors.
  • Multiple Dataset Evaluations: For method development, testing across multiple datasets is crucial to avoid pitfalls associated with small sample sizes.

Future Directions

This paper suggests a need for systematic changes in how neuroimaging data is collected and analyzed. Larger datasets and more robust cross-validation strategies are necessary. As the field progresses, innovations in data acquisition and sharing, alongside methodological advancements, are crucial.

In conclusion, while predictive models offer vast potential for advancing neuroimaging, the methodology must evolve to address the challenges of cross-validation on small samples. Doing so will be key to realizing the full potential of machine learning in understanding the brain.

