Pitfalls of single-study external validation illustrated with a model predicting functional outcome after aneurysmal subarachnoid hemorrhage

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

Background: 

Prediction models are often externally validated with data from a single study or cohort. However, the interpretation of performance estimates obtained with single-study external validation is not as straightforward as assumed. We aimed to illustrate this by conducting a large number of external validations of a prediction model for functional outcome in subarachnoid hemorrhage (SAH) patients.

Methods: 

We used data from the Subarachnoid Hemorrhage International Trialists (SAHIT) data repository (n = 11,931, 14 studies) to refit the SAHIT model for predicting a dichotomous functional outcome (favorable versus unfavorable), with the (extended) Glasgow Outcome Scale or modified Rankin Scale score, at a minimum of three months after discharge. We performed leave-one-cluster-out cross-validation to mimic the process of multiple single-study external validations. Each study represented one cluster. In each of these validations, we assessed discrimination with Harrell’s c-statistic and calibration with calibration plots, the intercepts, and the slopes. We used random effects meta-analysis to obtain the (reference) mean performance estimates and between-study heterogeneity (I² statistic). The influence of case-mix variation on discriminative performance was assessed with the model-based c-statistic and we fitted a “membership model” to obtain a gross estimate of transportability.
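The leave-one-cluster-out procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are simulated, the model has a single hypothetical predictor, and all function names are invented here. The calibration intercept is estimated with the linear predictor as an offset and the calibration slope by regressing the outcome on the linear predictor, which is a common convention but is not specified in the abstract. For a binary outcome, Harrell's c-statistic equals the proportion of concordant event/non-event pairs.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, offset=None, iters=25):
    """Newton-Raphson fit of logit(p) = a + b*x.
    If offset is given, fit logit(p) = a + offset instead (b stays 0)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        if offset is not None:
            p = [sigmoid(a + o) for o in offset]
            grad = sum(yi - pi for yi, pi in zip(y, p))
            hess = sum(pi * (1 - pi) for pi in p)
            a += grad / hess
        else:
            p = [sigmoid(a + b * xi) for xi in x]
            w = [pi * (1 - pi) for pi in p]
            g0 = sum(yi - pi for yi, pi in zip(y, p))
            g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
            h00 = sum(w)
            h01 = sum(wi * xi for wi, xi in zip(w, x))
            h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
            det = h00 * h11 - h01 * h01
            a += (h11 * g0 - h01 * g1) / det  # solve the 2x2 Newton step
            b += (h00 * g1 - h01 * g0) / det
    return a, b

def c_statistic(y, score):
    """Share of (event, non-event) pairs ranked concordantly; ties count 0.5."""
    pos = [s for s, yi in zip(score, y) if yi == 1]
    neg = [s for s, yi in zip(score, y) if yi == 0]
    conc = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return conc / (len(pos) * len(neg))

def leave_one_cluster_out(data):
    """data maps study id -> (x, y). Each study is held out once."""
    results = {}
    for held_out in data:
        x_dev = [v for s, (x, _) in data.items() if s != held_out for v in x]
        y_dev = [v for s, (_, y) in data.items() if s != held_out for v in y]
        a, b = fit_logistic(x_dev, y_dev)           # develop on the other studies
        x_val, y_val = data[held_out]
        lp = [a + b * xi for xi in x_val]           # linear predictor in held-out study
        _, slope = fit_logistic(lp, y_val)          # calibration slope
        intercept, _ = fit_logistic(None, y_val, offset=lp)  # calibration intercept
        results[held_out] = {"c": c_statistic(y_val, lp),
                             "intercept": intercept, "slope": slope}
    return results

def simulate_study(rng, n, baseline, effect, spread):
    """Hypothetical study: one predictor with study-specific case mix."""
    x = [rng.gauss(0.0, spread) for _ in range(n)]
    y = [1 if rng.random() < sigmoid(baseline + effect * xi) else 0 for xi in x]
    return x, y

rng = random.Random(0)
studies = {
    "A": simulate_study(rng, 300, 0.0, 1.0, 1.0),
    "B": simulate_study(rng, 300, -0.5, 1.0, 0.5),  # narrower case mix
    "C": simulate_study(rng, 300, 0.5, 0.6, 1.0),   # weaker predictor effect
    "D": simulate_study(rng, 300, 0.0, 1.2, 1.5),   # wider case mix
}
results = leave_one_cluster_out(studies)
for study, r in results.items():
    print(study, {k: round(v, 2) for k, v in r.items()})
```

Each held-out study yields its own c-statistic, intercept, and slope, so the spread across studies shows how much a single external validation depends on which study happens to be chosen.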

Results: 

Across 14 single-study external validations, model performance was highly variable. The mean c-statistic was 0.74 (95% CI 0.70 to 0.78, range 0.52 to 0.84, I² = 0.92), the mean intercept was -0.06 (95% CI -0.37 to 0.24, range -1.40 to 0.75, I² = 0.97), and the mean slope was 0.96 (95% CI 0.78 to 1.13, range 0.53 to 1.31, I² = 0.90). The decrease in discriminative performance was attributable to case-mix variation, between-study heterogeneity, or a combination of both. Incidentally, we observed poor generalizability or transportability of the model.
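Pooled means and heterogeneity statistics of this kind can be obtained with a random effects meta-analysis. The sketch below assumes the DerSimonian-Laird estimator and a normal-approximation confidence interval; the abstract does not state which estimator was used, and the input values are hypothetical, not taken from the study. The function name and return keys are invented for illustration.

```python
import math

def random_effects_meta(estimates, std_errors):
    """DerSimonian-Laird random effects pooling (assumed estimator)."""
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]            # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                       # between-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    mean = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_mean = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0       # I² as a proportion
    return {"mean": mean,
            "ci": (mean - 1.96 * se_mean, mean + 1.96 * se_mean),
            "tau2": tau2,
            "i2": i2}

# Hypothetical per-study estimates and standard errors, for illustration only.
pooled = random_effects_meta([0.70, 0.80, 0.62, 0.76], [0.02, 0.02, 0.03, 0.02])
print({k: v for k, v in pooled.items()})
```

An I² close to 1 indicates that most of the observed variation in the per-study estimates reflects between-study heterogeneity rather than sampling error. In practice, c-statistics are often pooled on a transformed scale such as the logit; that step is omitted here for brevity.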

Conclusions: 

We demonstrate two potential pitfalls in the interpretation of model performance with single-study external validation. With single-study external validation, (1) model performance is highly variable and depends on the choice of validation data, and (2) no insight is provided into the generalizability or transportability of the model, which is needed to guide local implementation. As such, a single single-study external validation can easily be misinterpreted and lead to a false appreciation of the clinical prediction model. Cross-validation is better equipped to address these pitfalls.

Original language: English
Article number: 176
Journal: BMC Medical Research Methodology
Volume: 24
Issue number: 1
Publication status: Published - Aug 2024

Bibliographical note

Publisher Copyright:
© The Author(s) 2024.
