Rapid Evaluation of Coronavirus Illness Severity (RECOILS) in intensive care: Development and validation of a prognostic tool for in‐hospital mortality

Abstract Background The prediction of in‐hospital mortality for ICU patients with COVID‐19 is fundamental to treatment and resource allocation. Our main aim was to develop an easily implemented score for this prediction. Methods This was an observational, multicenter development and validation study on a national critical care dataset of COVID‐19 patients. A systematic literature review was performed to identify variables potentially important for COVID‐19 mortality prediction. Using a multivariable logistic model with a LASSO penalty, we developed the Rapid Evaluation of Coronavirus Illness Severity (RECOILS) score and compared its performance against published scores. Results Our development (validation) cohort consisted of 1480 (937) adult patients from 14 (11) Dutch ICUs admitted between March 2020 and April 2021. Median age was 65 (65) years, 31% (26%) died in hospital, 74% (72%) were male, average length of ICU stay was 7.83 (10.25) days and average length of hospital stay was 15.90 (19.92) days. Age, platelets, PaO2/FiO2 ratio, pH, blood urea nitrogen, temperature, PaCO2 and Glasgow Coma Scale (GCS) score, measured within ±24 h of ICU admission, were used to develop the score. The AUROC of the RECOILS score was 0.75 (CI 0.71–0.78), higher than that of any of the previously reported predictive scores evaluated (0.68 [CI 0.64–0.71], 0.61 [CI 0.58–0.66], 0.67 [CI 0.63–0.70] and 0.70 [CI 0.67–0.74] for the ISARIC 4C Mortality Score, SOFA, SAPS‐III and age, respectively). Conclusions Using a large dataset from multiple Dutch ICUs, we developed a predictive score for mortality of COVID‐19 patients admitted to the ICU, which outperformed other predictive scores reported so far.


| INTRODUCTION
By 1 April 2021, 129 million infections with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) had been confirmed worldwide. By the same date, the resulting coronavirus disease (COVID-19) had caused an estimated 2.8 million deaths. 1,2 Once patients are admitted to the Intensive Care Unit (ICU), COVID-19 carries a high mortality rate. 3 Moreover, the large numbers of patients requiring hospitalization 4 and/or ICU admission have put healthcare systems under immense pressure, with shortfalls in the availability and quality of many aspects of medical treatment. [5][6][7] The severe form of COVID-19 is most notably characterized by respiratory failure. [8][9][10][11][12][13][14][15][16][17][18][19][20][21] In these patients, predicting outcome within the first 24 h of ICU admission is fundamental to the safe, effective and appropriate allocation of key components of ICU treatment. In this regard, several demographic features and markers of illness severity have been reported to help identify patients at particularly high mortality risk.
Several risk scores have been constructed with the intention of predicting outcome of patients infected with SARS-CoV-2, most importantly mortality. 12,14 The benefit of good quality predictive scores is twofold. First, they can have direct clinical utility if used to stratify patients based on risk, which is often required for triage purposes. Second, they can provide a useful tool in clinical research, where the need to adjust randomization for illness severity is key.
To the best of our knowledge, however, there are still no large, multicenter studies comparing different clinical risk scores for predicting mortality among ICU patients with COVID-19. In this development and validation study, we aimed to use a large multicenter national database from the Netherlands to construct a novel risk score from its detailed, granular data, and to compare its performance with that of previously published scores. In particular, we tested the hypothesis that a better-performing score for predicting in-hospital mortality in COVID-19 patients admitted to the ICU could be developed from routinely collected ICU data.

| Study design and cohort
This was a retrospective, multicenter, observational study in which we developed and validated a prognostic score for the primary outcome of in-hospital mortality. The study cohort consisted of adults (>18 years) with a confirmed SARS-CoV-2 infection admitted to intensive care units (ICUs) in 25 hospitals in the Netherlands between March 2020 and April 2021. Patients who were still in hospital at the time of writing were excluded from the analysis, as were patients transferred to hospitals that were not part of the Dutch COVID-19 database. Patients who were discharged from hospital but readmitted at a later stage were treated as separate patient encounters.
A systematic literature review was conducted to identify all currently reported risk factors for COVID-19 mortality.
Next, the area under the receiver operating characteristic curve (AUROC) was inspected for each candidate feature found in the literature. Additionally, a tree ensemble (Random Forest) model was constructed to predict mortality, with emphasis on inspecting feature importance (based on the Gini index) within this tree-based model. This resulted in a selection of the 10 most predictive variables.
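The per-feature AUROC screen described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: `auroc` and `screen_features` are hypothetical helpers, the toy values are invented, and the Random Forest Gini-importance step is not reproduced. Because some features are protective (a high PaO2/FiO2 ratio lowers risk), each feature is scored by max(AUROC, 1 − AUROC), so direction does not affect the ranking.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random death (label 1) has a higher value than a
    random survivor (label 0); ties count one half (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def screen_features(features: dict[str, Sequence[float]],
                    died: Sequence[int]) -> list[tuple[str, float]]:
    """Rank candidate features by univariable discrimination, best first.
    Each feature is scored as max(auc, 1 - auc) so protective features
    (AUROC below 0.5) are ranked on the same scale as harmful ones."""
    ranked = []
    for name, values in features.items():
        auc = auroc(values, died)
        ranked.append((name, max(auc, 1.0 - auc)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): five patients, two candidate features.
died = [1, 1, 0, 0, 0]
candidates = {
    "age": [80, 70, 60, 55, 72],
    "pf_ratio": [90, 100, 250, 300, 110],
}
print(screen_features(candidates, died))
```

The quadratic pair count keeps the definition transparent. For a full cohort, the rank-based form of the same statistic is faster.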
Patients with three or more missing variables were excluded from the cohort. The remaining missing values were imputed on the assumption that a missing value indicated that the variable lay in the clinically normal range.
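The exclusion and imputation rule above can be sketched as follows. The study does not list the normal values it imputed, so the entries in `NORMAL_VALUES`, the variable names and the helper `exclude_and_impute` are hypothetical placeholders. Only four variables are shown here instead of the ten selected.

```python
from typing import Optional

# Hypothetical "clinically normal" fill-in values (illustration only; the
# values actually used in the study are not reported in this section).
NORMAL_VALUES = {
    "ph": 7.40,
    "paco2_mmhg": 40.0,
    "temperature_c": 37.0,
    "gcs": 15.0,
}

Record = dict[str, Optional[float]]


def exclude_and_impute(patients: list[Record],
                       max_missing: int = 2) -> list[Record]:
    """Drop patients with three or more missing variables (more than
    `max_missing`), then fill any remaining gap with a normal value, encoding
    the assumption that an unmeasured variable was probably unremarkable."""
    kept = []
    for patient in patients:
        n_missing = sum(patient.get(name) is None for name in NORMAL_VALUES)
        if n_missing > max_missing:
            continue
        filled = dict(patient)
        for name, normal in NORMAL_VALUES.items():
            if filled.get(name) is None:
                filled[name] = normal
        kept.append(filled)
    return kept


cohort = [
    {"ph": 7.21, "paco2_mmhg": None, "temperature_c": 38.4, "gcs": 14.0},
    {"ph": None, "paco2_mmhg": None, "temperature_c": None, "gcs": 9.0},
]
print(exclude_and_impute(cohort))
# → [{'ph': 7.21, 'paco2_mmhg': 40.0, 'temperature_c': 38.4, 'gcs': 14.0}]
```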
The study cohort was split into a development set (approximately 60%) and a validation set (approximately 40%). The split was made by hospital, so that the validation data originated from hospitals not included in the development cohort. The split sizes were chosen to give more than 855 patients in the validation cohort, which ensured more than 75% power to detect an AUROC improvement of 0.05 over a baseline score with an AUROC of 0.7, for a test at the 5% significance level, assuming independent predictors.
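A hospital-level split like the one described above can be sketched as follows. The paper does not say how hospitals were chosen for validation. Assigning whole hospitals in a seeded random order until the validation set exceeds 855 patients is an assumption, and the hospital IDs and counts are hypothetical (they only sum to the study's 2417 patients).

```python
import random
from collections import Counter


def split_by_hospital(hospital_ids: list[str], min_validation: int,
                      seed: int = 0) -> tuple[set[str], set[str]]:
    """Assign whole hospitals to the validation set, in random order, until it
    holds at least `min_validation` patients; the remaining hospitals form the
    development set, so no hospital contributes patients to both sets."""
    counts = Counter(hospital_ids)
    hospitals = sorted(counts)  # sort first so the seeded shuffle is reproducible
    random.Random(seed).shuffle(hospitals)
    validation: set[str] = set()
    n_validation = 0
    for hospital in hospitals:
        if n_validation >= min_validation:
            break
        validation.add(hospital)
        n_validation += counts[hospital]
    development = set(hospitals) - validation
    return development, validation


# Hypothetical cohort: one hospital ID per patient.
ids = ["A"] * 400 + ["B"] * 300 + ["C"] * 500 + ["D"] * 450 + ["E"] * 767
dev, val = split_by_hospital(ids, min_validation=856)  # "more than 855"
n_val = sum(ids.count(h) for h in val)
print(sorted(dev), sorted(val), n_val)
```

Splitting by hospital rather than by patient mimics external validation. The price is that the 60/40 proportion can only be approximated, because hospitals are indivisible.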
Instead of selecting variable cut-off points manually by inspecting ROC curves, we determined a clinically relevant range of values for each feature, together with an increase that would be deemed clinically significant. For example, the relevant range for blood urea nitrogen was taken to be 15 to 100 mg/dl, with an increase of 5 mg/dl deemed significant. The selection of relevant thresholds and their associated weights was then left to a logistic regression model with a least absolute shrinkage and selection operator (LASSO) penalty.
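One way to let a LASSO model pick cut-offs from a clinically relevant range is sketched below. The encoding is our assumption, not a detail confirmed by the study: each step in the range (for blood urea nitrogen, 20, 25, …, 100 mg/dl) becomes a cumulative indicator [value ≥ t]. The L1 penalty then zeroes most indicators and keeps a few thresholds as cut-offs. The fit uses plain proximal gradient descent (ISTA) instead of whatever R tooling the authors used. The penalty `lam`, the helper names and the toy data are hypothetical.

```python
import math


def threshold_features(value: float, low: float, high: float,
                       step: float) -> list[float]:
    """Cumulative indicators [value >= t] for t = low + step, ..., high.
    Each indicator's coefficient is the extra log-odds for crossing t."""
    n = round((high - low) / step)
    return [1.0 if value >= low + k * step else 0.0 for k in range(1, n + 1)]


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |z|: shrink z towards zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def fit_lasso_logistic(X: list[list[float]], y: list[int], lam: float,
                       lr: float = 0.1, n_iter: int = 5000):
    """Minimise mean log-loss + lam * sum_j |beta_j| by proximal gradient
    descent (ISTA). The intercept b0 is not penalised."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            eta = b0 + sum(bj * xj for bj, xj in zip(beta, xi))
            r = 1.0 / (1.0 + math.exp(-eta)) - yi  # predicted minus observed
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        b0 -= lr * g0
        beta = [soft_threshold(bj - lr * gj, lr * lam)
                for bj, gj in zip(beta, g)]
    return b0, beta


# Toy data (hypothetical): deaths occur once BUN reaches 50 mg/dl.
bun = [20, 30, 40, 45, 50, 60, 70, 80]
died = [0, 0, 0, 0, 1, 1, 1, 1]
X = [threshold_features(v, low=15, high=100, step=5) for v in bun]
b0, beta = fit_lasso_logistic(X, died, lam=0.05)
cutoffs = [15 + 5 * k for k in range(1, len(beta) + 1)]
print({t: round(b, 2) for t, b in zip(cutoffs, beta) if b != 0.0})
```

On this toy data the penalty should concentrate the weight on the 50 mg/dl indicator, the only cut-off that separates deaths from survivors.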
Several clinical risk scores were considered for comparison with our newly developed score, including the Sequential Organ Failure Assessment (SOFA) score, the Simplified Acute Physiology Score (SAPS-III), and the International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC) 4C mortality score. A baseline using only age as a continuous predictor was also included.

Editorial Comment
Using a large dataset from multiple Dutch ICUs, the authors developed a predictive score for mortality of COVID-19 patients admitted to ICU, which outperformed other predictive scores reported so far.
The performance of the predictive scores was evaluated by inspecting the receiver operating characteristic (ROC) and precision-recall (PR) curves, together with the commonly used area under the curve (AUC). The PR curve shows the positive predictive value (PPV) of a score at each level of sensitivity (recall). We reported 95% confidence intervals (CIs) for points on the ROC and PR curves, and for the AUROC and the area under the PR curve (AUPRC), all obtained by bootstrapping.
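The bootstrap CIs for AUROC and AUPRC described above can be sketched as follows. This is an illustration under assumptions, not the authors' procedure. The CI is the simple percentile interval from patient-level resamples. AUPRC is estimated by step-wise average precision, one common estimator among several. All helper names and the toy risks are hypothetical.

```python
import random


def auroc(scores, labels):
    """Rank-based (Mann-Whitney) AUROC; tied scores receive their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(scores, labels):
    """Step-wise AUPRC estimate: sum of (recall gain) x precision over
    decreasing score thresholds; tied scores form a single threshold."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


def bootstrap_ci(scores, labels, metric, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile bootstrap CI from patient-level resamples."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both deaths and survivors
            stats.append(metric([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return metric(scores, labels), lower, upper


risks = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.3, 0.9]  # hypothetical predicted risks
died = [0, 0, 1, 1, 0, 1, 0, 1]
print(bootstrap_ci(risks, died, auroc))
print(bootstrap_ci(risks, died, average_precision))
```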
We tested several null hypotheses (H0) to determine whether our score significantly improved on the performance of the predictive scores above. Because women made up a smaller proportion of the cohort, in a sensitivity analysis we inspected the ROC curve separately for each sex. We also inspected the ROC curve after splitting the cohort by admission wave (first wave defined as admissions before August 2020, second wave as admissions after August 2020). Lastly, we inspected the calibration of our newly developed score, reporting the average mortality rate (with a 95% CI) for each score level. Because the score was derived from a logistic model, we also provided a formula to compute the expected mortality rate from the score value.
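The calibration check described above (observed mortality with a 95% CI per score level, plus a logistic mapping from score to expected mortality) can be sketched as follows. The Wilson interval is our choice, since the paper does not name its CI method. The `intercept` and `slope` arguments are placeholders, not the published RECOILS coefficients. The Brier score is the standard mean squared error of predicted risks.

```python
import math
from collections import defaultdict


def mortality_by_score(scores, died, z=1.96):
    """Observed mortality per score level with a Wilson 95% CI.
    Returns {score: (rate, lower, upper, n_patients)}."""
    groups = defaultdict(list)
    for s, y in zip(scores, died):
        groups[s].append(y)
    table = {}
    for s in sorted(groups):
        ys = groups[s]
        n = len(ys)
        rate = sum(ys) / n
        denom = 1 + z ** 2 / n
        centre = (rate + z ** 2 / (2 * n)) / denom
        half = z * math.sqrt(rate * (1 - rate) / n + z ** 2 / (4 * n ** 2)) / denom
        table[s] = (rate, max(0.0, centre - half), min(1.0, centre + half), n)
    return table


def expected_mortality(score, intercept, slope):
    """Logistic link from score to expected mortality; intercept and slope are
    hypothetical placeholders, not the published RECOILS coefficients."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def brier_score(probs, died):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, died)) / len(died)


scores = [0, 0, 1, 1, 1, 2, 2, 3]  # hypothetical RECOILS values
died = [0, 0, 0, 1, 0, 1, 1, 1]
for level, (rate, lo, hi, n) in mortality_by_score(scores, died).items():
    print(level, round(rate, 2), round(lo, 2), round(hi, 2), n)
probs = [expected_mortality(s, intercept=-2.0, slope=0.8) for s in scores]
print(round(brier_score(probs, died), 3))
```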

The Medical Ethics Committee at Amsterdam UMC, location VUmc, waived the need for informed patient consent and approved an opt-out procedure for the collection of COVID-19 patient data during the COVID-19 crisis. For statistical analysis and data loading, we used the ricu R-package. 39

| RESULTS
[...] The resulting score was named the Rapid Evaluation of Coronavirus Illness Severity (RECOILS) score and is presented in Table 2. The β-coefficient estimates from which the score was constructed are presented in Table S3. The ROC and PR curves of the score, alongside those of the clinical baseline scores, are presented in Figure 1. Model calibration was inspected by plotting the mortality rate for each value of the score (Figure 2), and the model was found to be well calibrated (Brier score 0.167). The equation that can be used to estimate the expected mortality rate from the RECOILS score is given by [...]
The observed mortality rates for each value of RECOILS are compared with the estimates from this formula graphically in Figure 2 and numerically in Table S4.
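How β-coefficients (Table S3) might be turned into an additive integer score, with subcomponent points summed as stated in the Table 2 footnote, can be sketched as follows. The conversion rule used for RECOILS is not described in this section. Dividing each coefficient by a common unit and rounding is a common approach and is assumed here. The indicator names, coefficients and unit are hypothetical, not the Table S3 values.

```python
def coefficients_to_points(betas: dict[str, float], unit: float) -> dict[str, int]:
    """Divide each coefficient by a common unit and round to whole points;
    components that round to zero points are dropped."""
    points = {}
    for name, beta in betas.items():
        pts = round(beta / unit)
        if pts != 0:
            points[name] = pts
    return points


def total_score(indicators: dict[str, int], points: dict[str, int]) -> int:
    """Add up the points of every score component the patient meets."""
    return sum(pts for name, pts in points.items() if indicators.get(name, 0))


# Hypothetical coefficients for threshold indicators (not Table S3 values).
betas = {
    "age_ge_70": 0.62,
    "pf_ratio_lt_100": 0.41,
    "gcs_lt_13": 0.78,
    "ph_lt_7_25": 0.19,
}
points = coefficients_to_points(betas, unit=0.4)
print(points)  # {'age_ge_70': 2, 'pf_ratio_lt_100': 1, 'gcs_lt_13': 2}
patient = {"age_ge_70": 1, "pf_ratio_lt_100": 0, "gcs_lt_13": 1, "ph_lt_7_25": 1}
print(total_score(patient, points))  # 4
```

Rounding trades some discrimination for a score that can be added up at the bedside. The choice of `unit` controls that trade-off.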

| Key findings
Using a large national database, we developed a novel risk score for [...]

Table 2 (fragment): RECOILS score components include the Glasgow Coma Scale (points) and urea nitrogen (mg/dl). Values of every subcomponent are added together to obtain the final score.

| Implications of study findings
Our findings imply that it is possible to construct a predictive score for mortality of COVID-19 patients admitted to the ICU that can be evaluated within 24 h of ICU admission and that outperforms all predictive scores published so far. Moreover, the proposed score can be used in clinical research to adjust identified treatment effects for baseline risk, and to compare outcomes across hospitals based on the expected mortality proportion for each score value. Finally, the score can be used in trials to stratify randomization by baseline risk.
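Comparing outcomes across hospitals via the expected mortality for each score value is commonly done with a standardized mortality ratio (SMR). The SMR is the number of observed deaths divided by the sum of expected risks. The paper does not prescribe this metric. The sketch below assumes it and uses hypothetical expected rates (not the Table S4 values) and hypothetical hospital names.

```python
from collections import defaultdict


def smr_by_centre(records, expected_rate):
    """Standardised mortality ratio (observed / expected deaths) per centre.

    records: iterable of (centre, score, died) tuples.
    expected_rate: mapping from score value to expected mortality.
    """
    observed = defaultdict(int)
    expected = defaultdict(float)
    for centre, score, died in records:
        observed[centre] += died
        expected[centre] += expected_rate[score]
    return {c: observed[c] / expected[c] for c in sorted(observed)}


expected_rate = {0: 0.10, 1: 0.25, 2: 0.50}  # hypothetical, not Table S4
records = [
    ("hospital_X", 0, 0), ("hospital_X", 1, 1), ("hospital_X", 2, 1),
    ("hospital_Y", 2, 0), ("hospital_Y", 2, 1), ("hospital_Y", 1, 0),
]
for centre, smr in smr_by_centre(records, expected_rate).items():
    print(centre, round(smr, 2))
```

An SMR above 1 means more deaths than the case mix predicts. With real data, a CI around each ratio would be needed before drawing conclusions about a centre.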

| Strengths and limitations
There are several strengths to our study. We conducted a multicenter study, involving 25 different hospitals and almost 2500 patients, making it the largest study on clinical prediction scores for ICU patients with COVID-19 so far. The systematic review of the variables reported in the literature and the data-driven approach to threshold selection served to minimize bias in the score construction. Additionally, the validation of the score on a set of hospitals, separate from those in the development cohort, makes the findings of this study more likely to be both robust and generalizable.
We also acknowledge several limitations to our study. First, it is an observational study and is therefore prone to possible sampling bias, and causal inferences cannot be drawn from it. Second, even though the Dutch COVID-19 database contains information on most of the important comorbidities identified to be associated with mortality from COVID-19, it is possible that some comorbidities were underreported. Third, some previously published risk scores, such as the ISARIC 4C mortality score, were designed for the emergency room setting, and their performance suffers when they are applied to the ICU setting. This, however, emphasizes the importance of developing a risk score specific to the ICU setting. Lastly, the Dutch intensive care system is that of a resource-rich country; our predictive score is likely to be relevant to similar systems, but less likely to be directly relevant to ICU systems in low- and middle-income countries.

FIGURE 2 Calibration of RECOILS score. Increasing values of the RECOILS score are associated with increased mortality, showing good score calibration. The average mortality rate for each value of the RECOILS score, with 95% confidence intervals, is shown as bars. The red line represents the mortality risk estimated using the formula provided in the main text.

| CONCLUSION
In a multicenter study involving over 2400 COVID-19 ICU patients admitted to 25 hospitals across the Netherlands, we developed and externally validated a predictive risk score (RECOILS) for in-hospital mortality, which significantly outperformed all COVID-19-specific outcome prediction scores published thus far. Clinicians can use this score to help prognosticate, to inform decisions about resource allocation, to assist in assessing treatment efficacy, and to help stratify patients for randomization in clinical trials.