Instability of the AUROC of Clinical Prediction Models

Florian D. van Leeuwen, Ewout W. Steyerberg, David van Klaveren, Ben Wessler, David M. Kent, Erik W. van Zwet*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

Background: 

External validations are essential to assess the performance of a clinical prediction model (CPM) before deployment. Apart from model misspecification, differences in patient population, standard of care, predictor definitions, and other factors also influence a model's discriminative ability, as commonly quantified by the AUC (or c-statistic). We aimed to quantify the variation in AUCs across sets of external validation studies and to propose ways to adjust expectations of a model's performance in a new setting.

Methods: 

The Tufts-PACE CPM Registry holds a collection of CPMs for prognosis in cardiovascular disease. We analyzed the AUC estimates of 469 CPMs with at least one external validation. Combined, these CPMs had a total of 1603 external validations reported in the literature. For each CPM and its associated set of validation studies, we performed a random-effects meta-analysis to estimate the between-study standard deviation τ among the AUCs. Because most of these meta-analyses include only a handful of validations, the resulting estimates of τ are very poor. Instead of focusing on a single CPM, we therefore estimated a log-normal distribution of τ across all 469 CPMs and used this distribution as an empirical prior. We used cross-validation to compare this empirical Bayesian approach with frequentist fixed-effect and random-effects meta-analyses.
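As a rough illustration of the approach described above, the sketch below implements an empirical Bayes random-effects meta-analysis in Python. It is not the authors' code: the prior parameters (PRIOR_MEAN_LOG_TAU, PRIOR_SD_LOG_TAU), the function name eb_prediction_interval, and the profile-likelihood and endpoint-averaging approximations are our own illustrative assumptions.

```python
# Minimal sketch of the empirical Bayes idea (illustrative only, not the authors' code).
# Each external validation i of a CPM reports an AUC estimate y[i] with standard error s[i];
# the between-study SD tau gets a log-normal prior fitted across all CPMs.
import numpy as np

PRIOR_MEAN_LOG_TAU = np.log(0.055)  # hypothetical prior parameters, chosen for illustration
PRIOR_SD_LOG_TAU = 0.25

def eb_prediction_interval(y, s, n_grid=400):
    """Approximate 95% empirical Bayes prediction interval for the AUC in a new setting."""
    y, s = np.asarray(y, float), np.asarray(s, float)
    taus = np.exp(np.linspace(np.log(0.005), np.log(0.3), n_grid))  # grid over tau

    # Log-normal prior density for tau (up to an additive constant).
    log_prior = -0.5 * ((np.log(taus) - PRIOR_MEAN_LOG_TAU) / PRIOR_SD_LOG_TAU) ** 2 - np.log(taus)

    mu_hat = np.empty(n_grid)   # pooled AUC given tau
    var_mu = np.empty(n_grid)   # variance of the pooled AUC given tau
    log_post = np.empty(n_grid)
    for k, tau in enumerate(taus):
        w = 1.0 / (s**2 + tau**2)              # inverse-variance weights
        mu_hat[k] = np.sum(w * y) / np.sum(w)
        var_mu[k] = 1.0 / np.sum(w)
        # Profile log-likelihood of the AUCs given tau (mu plugged in at its weighted mean).
        log_lik = -0.5 * np.sum(np.log(s**2 + tau**2) + w * (y - mu_hat[k]) ** 2)
        log_post[k] = log_prior[k] + log_lik

    post = np.exp(log_post - log_post.max())
    post /= post.sum()                          # discrete posterior over the tau grid

    # A new setting's AUC is (approximately) normal around the pooled mean with
    # variance var_mu + tau^2; average the per-tau interval endpoints over the posterior.
    pred_sd = np.sqrt(var_mu + taus**2)
    lo = np.sum(post * (mu_hat - 1.96 * pred_sd))
    hi = np.sum(post * (mu_hat + 1.96 * pred_sd))
    return lo, hi

# Example: a CPM with three external validations.
print(eb_prediction_interval(y=[0.72, 0.68, 0.75], s=[0.02, 0.03, 0.025]))
```

Note that even when many validations are available and var_mu becomes small, the τ² term keeps the prediction interval from collapsing, which is the floor on the width described in the Results.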

Results: 

The 469 CPMs included in our study had a median of 2 external validations (IQR 1–3). The estimated distribution of τ had a mean of 0.055 and a standard deviation of 0.015. If τ = 0.05, then the 95% prediction interval for the AUC in a new setting has a width of at least 0.1, no matter how many validations have been done. When there are fewer than 5 validations, which is typically the case, the usual frequentist methods grossly underestimate the uncertainty about the AUC in a new setting. Accounting for τ in a Bayesian approach achieved near-nominal coverage.
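A rough sketch of the arithmetic behind this floor on the interval width, assuming a normal random-effects model for the AUCs (our reading of the setup, not the authors' derivation):

```latex
% Under a normal random-effects model, the 95% prediction interval for a new setting
% always contains the +/- 1.96*tau band around the pooled estimate.
\[
  \text{95\% PI for a new setting:}\quad
  \hat{\mu} \;\pm\; 1.96\sqrt{\widehat{\mathrm{SE}}(\hat{\mu})^{2} + \tau^{2}}
  \;\supseteq\; \hat{\mu} \;\pm\; 1.96\,\tau,
\]
\[
  \tau = 0.05 \;\Longrightarrow\; \text{half-width} \;\ge\; 1.96 \times 0.05 \approx 0.1,
  \quad \text{however many validations are available.}
\]
```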

Conclusion: 

Because of the large heterogeneity among the validated AUC values of a CPM, there is substantial irreducible uncertainty in predicting the AUC in a new setting. This uncertainty is underestimated by existing methods. The proposed empirical Bayes approach addresses this problem and merits wide application when judging the validity of prediction models.

Original language: English
Article number: e70011
Journal: Statistics in Medicine
Volume: 44
Issue number: 5
Publication status: Published - 28 Feb 2025

Bibliographical note

Publisher Copyright:
© 2025 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.
