Heart failure subphenotypes based on repeated biomarker measurements are associated with clinical characteristics and adverse events (Bio-SHiFT study)

This study aimed to identify


Introduction
The burden of heart failure (HF) is estimated to grow further in the coming years [1][2][3]. Despite advances in HF management, the 5-year mortality rate of heart failure with reduced ejection fraction (HFrEF) remains 67% [4]. Current HFrEF treatment guidelines do not sufficiently consider heterogeneity in clinical profiles and etiologies [5,6]. HFrEF involves many different pathophysiological mechanisms, including neurohormonal, inflammatory and cardio-renal pathways [7,8]. Circulating plasma proteins representing these mechanisms may be used as biomarkers to provide more detailed insights into underlying biological processes in individual HF patients.
Accordingly, previous studies have identified HF subphenotypes based on these proteins and clinical characteristics [5,[9][10][11][12][13]. More specific phenotyping may enable treatment strategies directed towards the different HF subphenotypes' mechanisms. Moreover, improved phenotyping may provide opportunities for more accurate prognostication by associating subphenotypes with the occurrence of clinical endpoints [14][15][16]. These prognostic insights may ultimately contribute to personalized monitoring and timing of treatment strategies.
In this context, subphenotypes of heart failure with preserved ejection fraction (HFpEF) have been investigated in several studies [9][10][11]13,17]. However, studies investigating HFrEF subphenotypes are scarce [5,12]. The latter have based subphenotyping on patient characteristics, medical history, quality of life, exercise capability, and baseline biomarker measurements. They emphasize the heterogeneity within the HFrEF population and the necessity of improved phenotyping. In addition, the syndrome of HF tends to be progressive and includes episodes of decompensation. Thus, using serial biomarker measurements to distinguish subphenotypes, as opposed to single measurements, may enable us to capture this dynamic nature [18][19][20][21][22]. Therefore, in the current study, we aimed to identify HFrEF subphenotypes based on a set of 92 proteins measured in repeatedly collected blood samples of 250 HFrEF patients, and to investigate the association of these subphenotypes with clinical characteristics and adverse events.

Study population
This investigation was performed within the 'Serial Biomarker Measurements and New Echocardiographic Techniques in Chronic Heart Failure Patients Result in Tailored Prediction of Prognosis' (Bio-SHiFT) study. This is a prospective, multi-center cohort study of 398 patients with chronic HF, conducted in the Netherlands in Erasmus MC, Rotterdam, and Noordwest Ziekenhuis-group, Alkmaar. The medical ethics committee of the Erasmus MC approved the study, and it was conducted according to the ethical guidelines of the 1975 Declaration of Helsinki. The study is registered as NCT number 01851538 in ClinicalTrials.gov. Study design has been described in previous papers [18][19][20][21][22]. In brief, the study included outpatients with clinically stable HF (meaning no hospitalization in the past three months). All patients provided written informed consent. Inclusion criteria were ≥ 18 years old; capable of understanding and signing informed consent; and a diagnosis of chronic HF (with reduced or normal ejection fraction) according to the guidelines of the European Society of Cardiology (ESC) [23]. Exclusion criteria were HF secondary to circulatory high output conditions; a scheduled surgery or intervention for a coronary or non-coronary indication; severe renal failure requiring dialysis; pre-existing moderate or severe liver disease; Chronic Obstructive Pulmonary Disease (COPD) Gold stage IV; congenital heart disease; life expectancy of ≤1 year; being unlikely to appear at all scheduled follow-up visits; and a linguistic barrier.
The current investigation used data from 250 HFrEF patients enrolled in the first inclusion round between October 2011 and June 2013 and followed-up until November 2015.

Data-collection and follow-up
Patients received usual care from their physician at the outpatient clinic. Physicians were blinded to results of the biomarker measurements, because these were performed batch-wise after completion of follow-up. Parallel to usual care, study visits were scheduled at baseline and then every three months (± one month) until the end of follow-up or death. Maximum follow-up consisted of ten tri-monthly visits, consisting of a short medical assessment by a research physician and venipuncture. Clinical data were recorded in electronic case report forms. Collected data included demographics, medical history, medication use, and information on occurrence of adverse events.

Selection of blood samples
Blood samples were collected at the baseline and follow-up visits, with a maximum of 11 venipunctures per patient. In the first inclusion round, a total of 1984 samples was collected before occurrence of the primary endpoint (PEP) or censoring (median [25th-75th percentile]: 9 [5][6][7][8][9][10] samples per patient).
For reasons of efficiency, for the current investigation, a selection was made from these samples: all baseline samples, the last sample available in patients in whom the PEP did not occur during follow-up, and the two samples available closest in time before the PEP. Previous investigations in this cohort have demonstrated that several biomarker levels increase in the months preceding the clinical adverse event [20,22]. Thus, by selecting the last two samples before the endpoint, we aimed to capture this increase. Conversely, in event-free patients, previous investigations showed stable biomarker levels, in which case one additional sample suffices. Altogether, the selection resulted in 537 samples for the current analysis.

Biomarker measurements
Within two hours after collection, blood samples were processed and stored at − 80 • C. Biomarkers were measured batch-wise after completion of follow-up. Laboratory personnel were blinded to clinical patient information. The Olink Multiplex platform (Olink Proteomics AB, Uppsala, Sweden) Cardiovascular (CVD) III panel was used. The Olink assay is a Proximity Extension Assay (PEA) enabling high-throughput, multiplex immunoassay-qPCR that measures 92 proteins across 88 samples simultaneously [24,25]. The biomarkers are listed in supplemental table 1.

Primary endpoint
A clinical event committee adjudicated study endpoints based on hospital records and discharge letters, blinded to biomarker results. The PEP comprised cardiovascular death, heart transplantation (HTX), Left Ventricular Assist Device (LVAD) implantation, and hospitalization for the management of acute or worsened HF. Hospitalization for the management of acute or worsened HF was defined as hospitalization due to HF symptom exacerbation, combined with two of the following: BNP or NT-proBNP more than three times the upper limit of normal, signs of worsening HF, increased dosage or intravenous diuretics, or usage of positive inotropic agents [26].

Statistical methods
The full description of the statistical methods is included in the supplemental material, including power and sample size considerations. Continuous variables were described as mean (SD) when normally distributed, and as median (25th-75th percentile) when not normally distributed. Categorical variables were described as counts (percentage). The biomarker measurements provided by Olink Proteomics were already log-transformed.
To perform cluster analysis on repeated biomarker data, we first applied linear mixed modeling to compute individual intercepts and slopes of the temporal biomarker trajectories. Then winsorization [27] and standardization was performed.
The optimal number of clusters was determined using the NbClust R package [28]. Cluster stability was verified by resampling using several schemes (including bootstrapping and noise replacement) [29,30]. Then unsupervised K-means cluster analysis partitioned the patients into this optimal number of clusters. [31].
We repeated clustering and the subsequent analysis described below using 1) solely baseline biomarker measurements, and 2) solely individual intercepts from linear mixed models containing random intercepts only.
Once the clusters were constructed, the biomarker profiles of the clusters were displayed using a heatmap [32]. Individual biomarkers were compared across clusters using one-way analysis of variance (ANOVA) or t-tests, depending on the number of clusters, and were corrected for multiple testing with a Bonferroni correction.
Clinical characteristics were described according to cluster, including age, gender, medical history, risk factors, underlying etiology of left ventricular dysfunction, medication use, symptoms of HF, and echocardiographic measurements. We compared characteristics across clusters using t-tests, Mann-Whitney tests, ANOVA, Kruskal-Wallis tests, χ 2 -tests, and Fisher's exact tests, depending on variable distributions and the number of clusters. Pairwise comparisons were made, including a Bonferroni correction.
The association between cluster membership and the PEP was investigated with Kaplan-Meier curves and Tarone Ware tests. We also fitted accelerated failure time (AFT) models, which produce a time ratio. This time ratio is the exponentiated regression coefficient for cluster membership from the log-linear form of the AFT model, which shows the mathematical relation between the log of time and a set of covariates. Thus the time ratio can be interpreted as the average event-free survival   [33,34]. AFT model performance was assessed using the Akaike Information Criterion (AIC), and a difference of two AIC units was considered significant. All analyses were performed with R Statistical Software version 4.1.1 [35]. Two-tailed p-values below 0.05 were considered as significant.

Clustering based on repeated biomarker measurements
The optimal number of clusters was three, and the K-means clustering algorithm resulted in stable clusters of 93, 78 and 79 patients. The biomarker profiles of the three clusters are displayed in Fig. 1. The ANOVAs showed that 70 out of 92 (76.1%) biomarker intercepts and 56 out of 92 (60.9%) biomarker slopes differed significantly between the clusters. Supplemental table 3 depicts the statistical significance of the differences per biomarker, ranked by p-value. The biomarkers that were in the top 10% both in terms of intercepts and slopes included tumor necrosis factor receptor 1, growth differentiation factor 15, lymphotoxin-beta receptor, trefoil factor 3, and interleukin-18 binding protein. Overall, cluster 3 showed higher intercepts than clusters 1 and 2, signifying higher modeled baseline biomarker levels. Cluster 2 showed larger slopes for most of the biomarkers compared to clusters 1 and 3, which signifies a larger change of levels over time. Additionally, 16 biomarkers showed larger slopes in cluster 3 compared to cluster 2. We identified these biomarkers by eyeballing the heatmap patterns (supplemental fig. 1). Several of these proteins, including NT-proBNP, ST2, GDF-15 and galectin-3, have previously been shown to be important in cardiovascular disease and HF. Based on previous literature, the biological processes in which these 16 proteins are involved include proteolysis, catabolic processes, inflammatory responses, and cardiac remodeling [36][37][38][39][40][41].
The results of the AFT models can be found in Table 2. According to univariable AFT model 1, the time ratio for repeated measurements cluster 3 was 0.21 (95% CI 0.11-0.40, p-value <0.001). Thus, patients in cluster 3 had a statistically significantly poorer prognosis than patients in cluster 1. This difference persisted when correcting for potential confounders in models 2 to 4, with model 4 being the most elaborate. According to model 4, the adjusted time ratio for cluster 3 compared to cluster 1 was 0.25 (95% CI 0.12-0.50, p-value <0.001). The time ratios for cluster 2 compared to cluster 1 were not statistically significant in any AFT model. All models including cluster membership (models 1-4) showed a better model fit than the clinical characteristics only model.

Comparison to single biomarker measurements
For convenience, the clusters based on intercepts and slopes from the repeated measurements mixed models described so far, will be referred to as cluster 1IS, cluster 2IS, and cluster 3IS here. When only using baseline biomarker measurements, 2 clusters of sizes 141 and 109 were found: cluster 1B and cluster 2B. Cluster 2B had higher, and cluster 1B lower baseline biomarker levels (supplemental fig. 2). Cluster 1B mostly overlapped with cluster 1IS and 2IS, while cluster 2B mostly overlapped with cluster 3IS (supplemental fig. 3). Clinical characteristics of clusters 1B and 2B showed similar patterns to those of the repeated measurements clusters, cluster 2B most resembling cluster 3IS (supplemental table 4). Cluster 2B had a significantly poorer prognosis compared to cluster 1B (supplemental fig. 4), also when correcting for potential confounders. Model 3 (corrected for age, sex, NYHA class, duration of HF and HF etiology cardiomyopathy) was the only model that, according to the AIC values, showed a better model fit than the clinical characteristics only model. Notably, the repeated measurements AFT models all had better model fit than the baseline AFT models (Table 2).
Clustering based on linear mixed models with random intercepts only resulted in 2 clusters of sizes 150 and 100 (cluster 1I and cluster 2I, respectively), with biomarker patterns similar to those of the baseline clusters (supplemental fig. 5). Clusters 1IS and 2IS were mostly combined into cluster 1I (supplemental fig. 6). Clinical characteristics again showed similar patterns to the repeated measurements clusters described earlier, with cluster 2I most resembling cluster 3IS (supplemental table 5). Cluster 2I had a worse prognosis compared to cluster 1I (supplemental fig. 7), which persisted after correcting for potential confounders. Model fit of AFT-models including clusters 1I and 2I was better than that of the AFT-models including cluster 1B and cluster 2B (supplemental table 6).

Discussion
In this investigation, we used a set of 92 repeatedly measured blood biomarkers in a cohort of 250 HFrEF patients. Based on their temporal biomarker patterns, we identified three HFrEF clusters, or subphenotypes. The clusters showed significant differences in clinical characteristics and survival. Moreover, survival models including clusters based on repeated biomarker measurements showed better model fit than models including clusters based on baseline biomarker measurements only, as well as a model containing clinical characteristics only. Altogether, our results underscore that identification of subphenotypes based on repeated measurements of a large set of cardiovascular-related biomarkers carries potential to improve prognostication of HFrEF patients.
Proteomics phenotyping is upcoming in patients with heart failure with preserved ejection fraction (HFpEF) [9][10][11]13]. However, studies on HFrEF subphenotypes are still scarce. Ahmad et al. [5] clustered 1619 HFrEF patients based on 45 baseline clinical variables and found 4 clusters. Tromp et al. [12] found 6 clusters based on 92 single 'baseline' cardiovascular biomarker measurements in 1802 HFrEF patients. Similarly to our investigation, these studies found an association between cluster membership and prognosis. These studies based their clustering on patient characteristics, medical history, quality of life scores, exercise capability, and baseline biomarker measurements. The current investigation extends their findings by using repeated biomarker measurements for clustering, which accounts for the dynamic nature of HF and therefore may uncover additional properties of HFrEF subphenotypes.
Cluster 3 showed the highest average age, a longer duration of HF, more advanced HF, lower eGFR levels, and more pre-existing diabetes mellitus, hypertension, and atrial fibrillation. As well as the highest baseline biomarker levels and the worst prognosis. Comparing our results to previous studies, a 'phenotype' similar to our cluster 3 was found by Ahmad et al. [5]. Patients in their cluster 1 were older, had preexisting hypertension, diabetes mellitus and atrial fibrillation, and had more advanced HF. Ahmad et al. also found higher levels of some blood biomarkers (NT-proBNP, galectin-3 and ST2) and the worst prognosis in this cluster. The clusters found by Tromp et al. [12] had less similar characteristics, with 'endotype' 4 most resembling cluster 3 of the current study. Namely, endotype 4 more often showed pre-existing atrial fibrillation, had higher NT-proBNP levels and had the worst prognosis.

Table 2
Associations between clusters based on baseline or repeated biomarker measurements, and the primary endpoint. To our knowledge, this is the first study that associates HFrEF subphenotypes based on temporal evolutions of circulating proteins with prognosis. Circulating protein levels fluctuate over time and repeated protein measurements may account for these fluctuations and thus enhance the prognostic value of the ensuing subphenotypes. Indeed, our prognostic models containing clusters that were based on full temporal trajectories of biomarkers, performed better compared to those containing clusters based on biomarkers measured at one single time-point. This demonstrates the advantage of using temporal biomarker trajectories in subphenotyping of HFrEF.
In addition, we found interesting differences between the clusters' dynamic biomarker profiles. Cluster 2 showed relatively low modeled baseline biomarker levels, with increases in levels of the majority of the biomarkers over time, while cluster 3 showed higher modeled baseline biomarker levels, and increasing levels over time of the 16 remaining biomarkers. Considering the lower average age, shorter duration of HF, and less advanced HF in cluster 2 compared to cluster 3, a tentative hypothesis may be that these phenotypes represent chronologically successive states. Phenotype 2 (milder HF) could transition into phenotype 3 (more advanced HF); i.e., the biomarkers might increase over time in patients with phenotype 2, until the high 'baseline' level of phenotype 3 is reached, whereafter the remaining biomarker levels rise further. We showed the last-mentioned set of biomarkers to include well known cardiovascular-related proteins such as NT-proBNP, ST2, GDF-15 and galectin-3. In this context, the biomarkers that increased over time in cluster 2 and not cluster 3, might potentially have prognostic value in an earlier stage of HF.
Some study limitations should be acknowledged. The sample size and number of endpoints was limited, which limited our ability to correct survival models for potential confounders. We therefore chose to construct multiple AFT models with multiple sets of potential confounders. Moreover, a larger sample size might have better enabled us to uncover heterogeneity and might have provided additional opportunities for identifying subphenotypes. Also, the investigated biomarker panel contained proteins which had previously been linked to cardiovascular disease. A broader panel might better identify specific biomarkers distinguishing clusters and provide additional opportunities for unveiling specific HF mechanisms. Additionally, our results might not be generalizable to HF patients with for example advanced COPD or renal failure. Finally, the Bio-SHiFT study population is mostly Caucasian, and proportion of women is limited.
Clinical application of our findings warrants further attention and additional research. Once our findings are confirmed in additional HFrEF cohorts, strategies should be developed to enable identification of HFrEF subphenotypes in clinical practice. Implementation of extensive biomarker panels may potentially be challenging; smaller multiplex panels that contain the biomarkers which are most essential for discrimination between phenotypes, may perhaps benefit clinical applications. Ensuing phenotype information could be taken into account in addition to clinical information. Treatment could be adapted to prognosis associated with a particular phenotype, and potentially also to its underlying mechanisms. For example, subphenotypes with a worse prognosis might benefit from more intensive monitoring and earlier intervention. Hereby, phenotyping may contribute to further personalization of treatment plans in the future.
In conclusion, we identified three HFrEF subphenotypes based on a set of 92 repeatedly measured proteins. These subphenotypes show significant differences in clinical characteristics and are significantly associated with adverse events. Thus, phenotyping based on temporal biomarker patterns carries potential to contribute to improved prognostication and personalized risk assessment.

Funding
This work was supported by the Jaap Schouten Foundation and the Noordwest Academie. The funding sources had no role in the study design; in the collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the article for publication.

Declaration of Competing Interest
The authors report no relationships that could be construed as a conflict of interest.