Abstract
Background:
External validation is essential to assess the performance of a clinical prediction model (CPM) before deployment. Besides model misspecification, differences in patient population, standard of care, predictor definitions, and other factors also influence a model's discriminative ability, commonly quantified by the AUC (or c-statistic). We aimed to quantify the variation in AUCs across sets of external validation studies and to propose ways to adjust expectations of a model's performance in a new setting.
Methods:
The Tufts-PACE CPM Registry holds a collection of CPMs for prognosis in cardiovascular disease. We analyzed the AUC estimates of 469 CPMs with at least one external validation; combined, these CPMs had a total of 1603 external validations reported in the literature. For each CPM and its associated set of validation studies, we performed a random-effects meta-analysis to estimate the between-study standard deviation τ among the AUCs. Because most of these meta-analyses include only a handful of validations, the resulting per-CPM estimates of τ are very poor. Instead of focusing on a single CPM, we therefore estimated a log-normal distribution of τ across all 469 CPMs and used this distribution as an empirical prior. We used cross-validation to compare this empirical Bayesian approach with frequentist fixed-effects and random-effects meta-analyses.
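The plug-in use of an empirical prior on τ can be sketched as follows. The AUC estimates, their standard errors, and the substitution of the prior mean of τ for a per-CPM estimate are illustrative assumptions, not the authors' actual data or procedure:

```python
import numpy as np

# Hypothetical external-validation AUCs and standard errors for one
# CPM (illustrative values, not taken from the Tufts-PACE registry).
aucs = np.array([0.72, 0.68, 0.75])
ses = np.array([0.02, 0.03, 0.025])

# Empirical prior on the between-study SD tau, estimated across all
# CPMs (mean 0.055, SD 0.015, per the abstract); here we simply plug
# in the prior mean instead of a noisy per-CPM estimate of tau.
tau = 0.055

# Inverse-variance pooled AUC and its standard error.
w = 1.0 / ses**2
mu_hat = np.sum(w * aucs) / np.sum(w)
se_mu = np.sqrt(1.0 / np.sum(w))

# 95% prediction interval for the AUC in a NEW setting: it must
# combine estimation error of the mean with between-study spread.
pred_sd = np.sqrt(se_mu**2 + tau**2)
lo, hi = mu_hat - 1.96 * pred_sd, mu_hat + 1.96 * pred_sd
print(f"pooled AUC = {mu_hat:.3f}, 95% PI = ({lo:.3f}, {hi:.3f})")
```

Note that the interval's width is floored at 2 × 1.96 × τ even if the pooled mean were known exactly, which is the source of the irreducible uncertainty discussed in the results.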
Results:
The 469 CPMs included in our study had a median of 2 external validations (IQR 1–3). The estimated distribution of τ had a mean of 0.055 and a standard deviation of 0.015. If τ = 0.05, then the 95% prediction interval for the AUC in a new setting spans at least ±0.1, no matter how many validations have been done. When there are fewer than 5 validations, which is typically the case, the usual frequentist methods grossly underestimate the uncertainty about the AUC in a new setting. Accounting for τ in a Bayesian approach achieved near-nominal coverage.
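A minimal numerical check of this width claim, assuming a common within-study standard error of 0.03 for each validation (a hypothetical value chosen for illustration):

```python
import math

tau = 0.05    # between-study SD of AUCs (reference value from the abstract)
sigma = 0.03  # assumed within-study SE of each validation AUC (hypothetical)

# The SE of the pooled mean shrinks with the number of validations k,
# but the prediction-interval width is floored by the tau term.
for k in (1, 5, 20, 1000):
    se_mean = sigma / math.sqrt(k)
    width = 2 * 1.96 * math.sqrt(se_mean**2 + tau**2)
    print(f"k = {k:4d}  95% PI width = {width:.3f}")
# As k grows, the width approaches 2 * 1.96 * tau = 0.196 but never
# falls below it, i.e. the interval always spans at least ~ +/- 0.1.
```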
Conclusion:
Due to the large heterogeneity among the validated AUC values of a CPM, there is substantial irreducible uncertainty in predicting the AUC in a new setting. This uncertainty is underestimated by existing methods. The proposed empirical Bayes approach addresses this problem and merits wide application in judging the validity of prediction models.
| Original language | English |
| --- | --- |
| Article number | e70011 |
| Journal | Statistics in Medicine |
| Volume | 44 |
| Issue number | 5 |
| DOIs | |
| Publication status | Published - 28 Feb 2025 |
Bibliographical note
Publisher Copyright: © 2025 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.