
A decomposition of Fisher's information to inform sample size for developing fair and precise clinical prediction models -- part 1: binary outcomes (2407.09293v2)

Published 12 Jul 2024 in stat.ME

Abstract: When developing a clinical prediction model, the sample size of the development dataset is a key consideration. Small sample sizes lead to greater concerns of overfitting, instability, poor performance and lack of fairness. Previous research has outlined minimum sample size calculations to minimise overfitting and precisely estimate the overall risk. However even when meeting these criteria, the uncertainty (instability) in individual-level risk estimates may be considerable. In this article we propose how to examine and calculate the sample size required for developing a model with acceptably precise individual-level risk estimates to inform decisions and improve fairness. We outline a five-step process to be used before data collection or when an existing dataset is available. It requires researchers to specify the overall risk in the target population, the (anticipated) distribution of key predictors in the model, and an assumed 'core model' either specified directly (i.e., a logistic regression equation is provided) or based on specified C-statistic and relative effects of (standardised) predictors. We produce closed-form solutions that decompose the variance of an individual's risk estimate into Fisher's unit information matrix, predictor values and total sample size; this allows researchers to quickly calculate and examine individual-level uncertainty interval widths and classification instability for specified sample sizes. Such information can be presented to key stakeholders (e.g., health professionals, patients, funders) using prediction and classification instability plots to help identify the (target) sample size required to improve trust, reliability and fairness in individual predictions. Our proposal is implemented in software module pmstabilityss. We provide real examples and emphasise the importance of clinical context including any risk thresholds for decision making.

Summary

  • The paper introduces a five-step method to calculate sample sizes that yield precise individual risk estimates for binary clinical prediction models.
  • It details a robust methodology including the use of Fisher’s unit information matrix and the pmstabilityss software module for practical implementation.
  • The study demonstrates that proper sample size determination substantially improves model fairness and clinical decision-making reliability.

Sample Size Considerations for Developing Binary Outcome Prediction Models

This paper examines how to determine an adequate sample size for developing clinical prediction models for binary outcomes. As prediction models play an increasingly critical role in clinical decision-making, model developers are responsible for ensuring their outputs are both accurate and fair across diverse population groups. Inadequate sample sizes consistently lead to overfitting, model instability, poor predictive performance, and fairness problems, especially when individual-level risk estimates are needed.

Core Methodology

The authors propose a five-step framework to determine the sample size required for developing a clinical prediction model that reliably estimates individual-level risks:

  • define a core set of predictors;
  • establish the (anticipated) joint distribution of these predictors;
  • specify a core model against which predictions will be evaluated;
  • derive Fisher's unit information matrix;
  • evaluate the impact of candidate sample sizes on predictive precision and uncertainty.

This structured approach contrasts with conventional sample size methods, which primarily target overall event risk and the avoidance of overfitting while neglecting the precision of individual predictions.
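
The closed-form decomposition described in the abstract can be written out as follows. This is our reconstruction of the standard large-sample result for maximum-likelihood logistic regression, in our own notation rather than the paper's:

```latex
% Unit (per-observation) Fisher information of the logistic core model,
% averaged over the assumed predictor distribution:
I_1(\beta) = \mathbb{E}\!\left[\, p(x)\,\{1 - p(x)\}\, x x^{\top} \right],
\qquad p(x) = \operatorname{expit}\!\left(x^{\top}\beta\right).

% Approximate variance of individual i's estimated linear predictor
% when the model is fitted to n observations:
\operatorname{Var}\!\big(\hat{\eta}_i\big) \approx \frac{1}{n}\, x_i^{\top}\, I_1(\beta)^{-1}\, x_i,
\qquad \hat{\eta}_i = x_i^{\top}\hat{\beta}.

% Delta-method variance of the estimated risk \hat{p}_i = \operatorname{expit}(\hat{\eta}_i):
\operatorname{Var}\!\big(\hat{p}_i\big) \approx \big\{ p_i\,(1 - p_i) \big\}^{2}\, \operatorname{Var}\!\big(\hat{\eta}_i\big).
```

Because I_1 is per observation, the variance shrinks as 1/n, which is what allows a target uncertainty interval width to be inverted into a required sample size.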

Key Insights

Model Uncertainty: The focus on individual-level prediction precision underscores the need to account for uncertainty in parameter estimates, a facet of epistemic uncertainty in logistic regression models. Aleatoric uncertainty remains unaddressed in this work, suggesting directions for future methodological development.

Software Implementation: A practical contribution is the pmstabilityss software module, which implements the methodology and speeds up calculations by using closed-form solutions to decompose the variance of individual risk estimates. The module supports both planning the development of new models and assessing existing datasets.
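
The calculation could be prototyped as in the Python sketch below. This is illustrative only and not the pmstabilityss module's actual interface; the coefficient values, the standard-normal predictor distribution, and the Monte Carlo approximation of the unit information are assumptions made for the example:

```python
import numpy as np
from scipy.special import expit, logit
from scipy.stats import norm

rng = np.random.default_rng(0)

# Assumed 'core model': intercept plus two standardised predictors.
# These coefficient values are hypothetical, chosen only for illustration.
beta = np.array([-2.0, 0.8, 0.5])

# Approximate the unit Fisher information I_1 = E[p(x)(1-p(x)) x x^T] by
# Monte Carlo over the anticipated predictor distribution (standard normal here).
m = 200_000
X = np.column_stack([np.ones(m), rng.standard_normal((m, 2))])
p = expit(X @ beta)
w = p * (1.0 - p)
I1 = (X * w[:, None]).T @ X / m
I1_inv = np.linalg.inv(I1)

def risk_interval_width(x_i, n, level=0.95):
    """Approximate width of an individual's uncertainty interval on the risk
    scale when the model is developed on n observations."""
    eta = x_i @ beta
    se_eta = np.sqrt(x_i @ I1_inv @ x_i / n)   # SE of the linear predictor
    z = norm.ppf(0.5 + level / 2)
    return expit(eta + z * se_eta) - expit(eta - z * se_eta)

x_i = np.array([1.0, 1.5, -0.5])               # one individual's predictor vector
for n in (250, 500, 1000, 5000):
    print(f"n={n}: interval width={risk_interval_width(x_i, n):.3f}")
```

Repeating this over a representative set of individuals yields the interval widths that feed prediction instability plots, and scanning over n identifies the smallest sample size that achieves an acceptable width.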

Impact on Fairness and Decision-Making: The paper argues that fairness requires precise predictions across all subgroups, including minority groups, although precision alone does not resolve broader health-inequity issues, which warrant further investigation. Framing decisions around stakeholder-specified risk thresholds aligns the work with decision theory and highlights where predictive uncertainty may undermine clinical utility.
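
On the threshold point, classification instability at a decision threshold can be approximated from the same variance decomposition. The sketch below reuses beta, I1_inv, and x_i from the previous example; the 0.2 threshold is an arbitrary illustration, not a value from the paper:

```python
def classification_instability(x_i, n, threshold=0.2):
    """Approximate probability that the estimated risk for individual i falls
    on the opposite side of the decision threshold from the core-model risk,
    treating the estimated linear predictor as approximately normal."""
    eta = x_i @ beta
    se_eta = np.sqrt(x_i @ I1_inv @ x_i / n)
    z = (logit(threshold) - eta) / se_eta   # threshold distance on the logit scale
    if expit(eta) >= threshold:
        return norm.cdf(z)        # mass of estimates falling below the threshold
    return 1.0 - norm.cdf(z)      # mass of estimates falling above the threshold

for n in (250, 500, 1000, 5000):
    print(f"n={n}: instability={classification_instability(x_i, n):.3f}")
```

Instabilities near 0.5 mean the classification is essentially a coin flip at that sample size, which is the kind of evidence the paper suggests presenting to stakeholders via classification instability plots.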

Applications and Practical Implications

The paper provides case studies demonstrating the framework's application, notably models predicting diabetic foot ulcers and acute kidney injury. These examples show how an adequate sample size contributes to model reliability and strengthens confidence in clinical decision-making. Sample size profoundly affects individual-level prediction precision, underscoring the danger of relying solely on population-level performance metrics.

Conclusion and Future Directions

This research marks a critical step forward in prediction model development by emphasizing individual risk assessment precision. While the authors provide a thorough guide, significant exploration remains, particularly regarding how these techniques apply to penalized regressions, machine learning models, and large-scale predictions involving a vast number of predictors.

Future research should address these challenges, expand on Bayesian approaches to uncertainty estimation, and rigorously explore the trade-offs between model complexity and resource allocation.

For clinical model developers, the insights offered in this paper fundamentally advocate for a more comprehensive evaluation of data requirements prior to model development, laying groundwork for improved model fairness, reliability, and interpretability across diverse patient populations.