Thyroglobulin and thyroglobulin antibodies: assay-dependent management consequences in patients with differentiated thyroid carcinoma

Objectives: International guidelines recommend fixed cut-off values for thyroglobulin (Tg). These cut-offs do not take potential assay differences into account. This study aimed to evaluate whether different assays for Tg and Tg antibodies (TgAb) affect management guidance for differentiated thyroid cancer (DTC) patients. Methods: In 793 samples derived from 413 patients with DTC, Tg and TgAb were simultaneously measured with two immunometric assays: Immulite 2000XPi and Kryptor compact plus. In addition, a qualitative measurement for TgAb interference (recovery test) was performed on the Kryptor compact plus platform. The extent to which different assays lead to different classifications of response to therapy was evaluated when applying the current cut-offs for Tg. Results: Mean Tg concentrations were 37.4% lower with Kryptor as compared with Immulite. Applying guideline-based cut-off values for Tg, 33 (4.7%) samples had a Tg-on concentration ≥1.0 μg/L with Immulite and <1.0 μg/L with Kryptor. Of the samples tested as TgAb+ with at least one assay (n=125), 68 (54.4%) samples showed discrepancy in TgAb status. Differences between Immulite and Kryptor measurements resulted in a change in the response to therapy classification in 94 (12.0%) measurements derived from 67 (16.2%) individual patients. Conclusions: A substantial portion of DTC patients were classified differently depending on which Tg and TgAb assays were used, when applying the cut-off values as defined in clinical guidelines. Such differences can significantly affect clinical management. In the context of large between-method variation, the recommended Tg cut-offs in guidelines should be used with wisdom rather than as fixed cut-offs.


Introduction
Measurement of serum thyroglobulin (Tg) has an essential role in follow-up monitoring of patients with differentiated thyroid cancer (DTC). Tg measurements in combination with a neck ultrasound have a high sensitivity for detecting persistent or recurrent disease in patients previously treated with thyroidectomy and radioactive iodine (RAI) ablation [1].
Currently, most laboratories have adopted an automated immunometric assay for measurement of Tg. In recent years the diagnostic sensitivity has been improved with the introduction of assays with functional sensitivities (FS) ≤0.1 μg/L, also referred to as high-sensitive Tg assays (hs-Tg).
The measurement of Tg is complicated by multiple factors. First, a large between-method variation regarding sensitivity, accuracy and precision prevents comparison of (consecutive) Tg measurements with different assays [2][3][4].
Second, Tg assays suffer from interfering endogenous Tg antibodies (TgAb), which are detected in up to 20% of patients with DTC [2]. As for Tg, immunometric assays for TgAb also show large between-method variation, which impedes the correct identification of patients with interfering TgAb [5][6][7].
This study aimed to evaluate to what extent different methods for Tg and TgAb measurement affect management guidance for individual DTC patients. Therefore, in a large cohort of DTC patients, Tg and TgAb were measured simultaneously with Immulite 2000XPi and Kryptor compact plus, both immunometric assays, and related to clinical disease status.

Selection of patients
This study was performed at the Erasmus University Medical Center, Rotterdam, The Netherlands, a regional referral center for thyroid cancer. Samples for Tg and TgAb measurements were taken from DTC patients who had a follow-up visit between January 2018 and April 2019. All patients had been treated with a total thyroidectomy and RAI ablation. Follow-up was in accordance with national guideline recommendations (see Supplementary Methods). Tg-on, rh-TSH-Tg and Tg after withdrawal for radioiodine treatment (Tg-off) were included in the analysis. Measurements taken within the first six months after RAI ablation were excluded, as were samples from patients with tumors containing areas of poor differentiation. Clinical data were retrieved from medical records. At the time of Tg measurement, the response to therapy was assessed according to the classification proposed in the 2015 American Thyroid Association (ATA) Guideline [8]. Patients were considered to have an excellent response to therapy if Tg-on was <0.2 μg/L or rh-TSH-Tg was <1 μg/L in combination with negative imaging and undetectable TgAb. In case of Tg-on ≥1 μg/L, rh-TSH-Tg ≥10 μg/L or rising TgAb concentrations, a patient was classified as having a biochemical incomplete response. Patients with structural or functional evidence of disease were classified as having a structural incomplete response. Finally, patients with one of the following were considered to have an indeterminate response: Tg-on >0.2 μg/L but <1 μg/L, rh-TSH-Tg >1.0 μg/L but <10 μg/L, nonspecific findings on imaging studies, or stable or declining TgAb. Measurements with Tg-on levels >1.0 μg/L in combination with TgAb above the cut-off value were considered as biochemical disease.
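The criteria above can be read as a small decision rule. The following Python sketch is one possible encoding, not the procedure used in the study: the function and argument names, the order in which the categories are checked, the requirement that every available Tg value lies below its threshold for an excellent response, and the handling of values that fall exactly on a boundary (for example a Tg-on of exactly 0.2 μg/L) are all assumptions made for illustration.

```python
from typing import Optional


def classify_response(
    tg_on: Optional[float] = None,      # Tg-on in ug/L, None if not measured
    rh_tsh_tg: Optional[float] = None,  # rh-TSH-Tg in ug/L, None if not measured
    tgab_detectable: bool = False,      # stable or declining but detectable TgAb
    tgab_rising: bool = False,
    structural_disease: bool = False,   # structural or functional evidence of disease
    nonspecific_imaging: bool = False,
) -> str:
    """Illustrative response-to-therapy rule based on the cut-offs quoted in the text."""
    if tg_on is None and rh_tsh_tg is None:
        raise ValueError("at least one Tg measurement is required")

    # Assumed priority: structural evidence overrides biochemical findings.
    if structural_disease:
        return "structural incomplete"

    if (
        (tg_on is not None and tg_on >= 1.0)
        or (rh_tsh_tg is not None and rh_tsh_tg >= 10.0)
        or tgab_rising
    ):
        return "biochemical incomplete"

    # Assumption: every available Tg value must be below its threshold.
    tg_on_ok = tg_on is None or tg_on < 0.2
    stimulated_ok = rh_tsh_tg is None or rh_tsh_tg < 1.0
    if tg_on_ok and stimulated_ok and not tgab_detectable and not nonspecific_imaging:
        return "excellent"

    # Everything else, including boundary values, is treated as indeterminate.
    return "indeterminate"


# Hypothetical sample: the same serum read as 1.1 ug/L by one assay
# and 0.7 ug/L by another ends up in different categories.
print(classify_response(tg_on=1.1))  # biochemical incomplete
print(classify_response(tg_on=0.7))  # indeterminate
```

The two example calls use invented values. They only show how a between-method difference around the 1.0 μg/L cut-off can move a measurement from one category to another under a fixed rule.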
Existing criteria for disease classifications were tailored to allow for comparison between assays. Patients with a Kryptor Tg-on between 0.15 μg/L and 1.0 μg/L in the absence of a Kryptor rh-TSH-Tg measurement were categorized as having an indeterminate response, irrespective of Immulite rh-TSH-Tg measurements performed before the study period. In the absence of an rh-TSH-Tg measurement, patients were classified as having an excellent response if follow-up since diagnosis had lasted for more than 10 years without detectable Tg-on or radiological signs of recurrent disease. If the assessment of the response to therapy had not yet been performed at all (e.g., after additional treatment for recurrent disease), the disease status was indicated as "not yet stratified". This study was approved by the Medical Ethical Board of the Erasmus University Medical Center (MEC-2018-1195).

Laboratory procedures
Tg assays: Serum Tg was measured with two immunometric assays: Immulite 2000XPi (Siemens Healthcare Diagnostics Inc, Tarrytown, NY, USA) and Kryptor compact plus (B.R.A.H.M.S Thermo Fisher Scientific, Hennigsdorf, Germany), according to the manufacturers' instructions. Both Tg assays have been standardized against the same Tg reference material (CRM 457). The functional sensitivity for Tg measurement (defined as an interassay coefficient of variation of 20%) as provided by the manufacturer is 0.9 μg/L for Immulite and 0.15 μg/L for Kryptor, both verified in an ISO15189 accredited laboratory. Verification performed in our laboratory showed a negative analytical bias for the Kryptor assay of 35% (Supplementary Figure S1). Additionally, an institutional cut-off was established for the Kryptor Tg assay in a subset of 55 patients with both basal Tg-on and rh-TSH-Tg measurements available. The optimum cut-off for Kryptor Tg-on was found to be 0.15 μg/L, which is similar to the functional sensitivity as defined by the manufacturer (see Supplementary Text and Supplementary Figure S2). For the purpose of comparing mean concentrations between Immulite and Kryptor, undetectable Tg concentrations were plotted as 0.0.
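How undetectable results are handled matters when mean concentrations of two assays are compared. The Python sketch below substitutes 0.0 for undetectable values, as described above, and then expresses the difference in means as a percentage of the reference mean. The relative-difference formula, the function name and the use of `None` to mark an undetectable result are assumptions for illustration, and the paired values are invented.

```python
from statistics import mean
from typing import Optional, Sequence


def relative_mean_difference(
    reference: Sequence[Optional[float]],
    comparison: Sequence[Optional[float]],
) -> float:
    """Percent difference of the comparison mean relative to the reference mean.

    None marks an undetectable result and is counted as 0.0.
    """
    if len(reference) != len(comparison):
        raise ValueError("paired measurements are required")
    ref_values = [0.0 if v is None else v for v in reference]
    cmp_values = [0.0 if v is None else v for v in comparison]
    ref_mean = mean(ref_values)
    if ref_mean == 0:
        raise ValueError("reference mean is zero; relative difference is undefined")
    return 100.0 * (mean(cmp_values) - ref_mean) / ref_mean


# Hypothetical paired Tg results in ug/L (not study data).
immulite = [None, 1.2, 4.0, 10.0]
kryptor = [0.3, 0.8, 2.6, 6.1]

difference = relative_mean_difference(immulite, kryptor)
print(f"{difference:.1f}%")  # negative value: comparison assay reads lower on average
```

A negative result means the comparison assay reads lower on average than the reference assay. Substituting 0.0 pulls the mean of the less sensitive assay downwards, because more of its results fall below the detection limit.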
TgAb assays and recovery analysis: TgAb were measured with assays from the same manufacturers as the Tg assays. Both TgAb assays have been standardized against the international reference preparation (MRC 65/93).
For the Kryptor TgAb assay a cut-off of 33 U/mL was used, corresponding to the functional sensitivity (Thermo Fisher instructions for use). For the Immulite TgAb assay the functional sensitivity was not provided by the manufacturer. An institutional cut-off of 9 U/mL was defined above which a clinically relevant interference with Tg measurement of at least 80% was measured. The cut-off of 9 U/mL is compatible with the upper reference limit for the Immulite TgAb assay as established in healthy individuals according to the National Academy of Clinical Biochemistry guidelines [15]. A Tg recovery test was performed on the Kryptor platform. An automatically adapted Tg concentration was added to the serum, based on the native Tg concentration. If the native Tg concentration was <5 μg/L, the Tg concentration of the added solution was 2.5 μg/L. If the native Tg concentration exceeded 5 μg/L, the concentration of the added solution was 20 μg/L. According to the manufacturer's instructions, a recovery between 80 and 120% was considered undisturbed.
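The recovery procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analyzer's actual algorithm: the recovery formula is the generic spike-recovery expression and is not taken from the manufacturer's instructions, the `expected_increase` parameter (the rise in measured Tg expected from the spike) is a hypothetical simplification that ignores dilution, and a native concentration of exactly 5 μg/L is arbitrarily assigned the higher spike because the text does not cover that case.

```python
def spike_concentration(native_tg: float) -> float:
    """Tg concentration (ug/L) of the added solution, per the thresholds in the text."""
    # Assumption: a native value of exactly 5 ug/L receives the higher spike.
    return 2.5 if native_tg < 5.0 else 20.0


def recovery_percent(native_tg: float, spiked_tg: float, expected_increase: float) -> float:
    """Generic spike-recovery formula (assumed, not the manufacturer's)."""
    if expected_increase <= 0:
        raise ValueError("expected_increase must be positive")
    return 100.0 * (spiked_tg - native_tg) / expected_increase


def recovery_status(recovery: float) -> str:
    """Classify a recovery percentage against the 80-120% window."""
    if recovery < 80.0:
        return "disturbed (low)"
    if recovery > 120.0:
        return "disturbed (high)"
    return "undisturbed"


# Hypothetical sample: native Tg 1.0 ug/L, 2.9 ug/L measured after the spike.
native = 1.0
added = spike_concentration(native)            # 2.5 ug/L solution is selected
recovery = recovery_percent(native, 2.9, added)
print(added, recovery_status(recovery))        # 2.5 disturbed (low)
```

In this invented example the measured increase is about 1.9 μg/L against an expected 2.5 μg/L, so the recovery falls below 80% and is flagged as disturbed.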
Median Tg concentrations were significantly higher in TgAb− measurements compared with TgAb+ measurements (Table 2).

Impact on clinical practice
Applying guideline-based cut-off values, 33 (4.7%) samples in 25 (6%) patients had a Tg-on concentration ≥1.0 μg/L with Immulite and <1.0 μg/L with Kryptor (Figure 2). Among rh-TSH-Tg measurements, 3 (4.3%) measurements in three individual patients showed discrepancy with regard to the cut-off of 1.0 μg/L: Tg measured with Immulite was above the cut-off and Tg measured with Kryptor was below it. No discrepancy between the assays was observed for the cut-off of 10 μg/L. Of the samples determined as TgAb+ with at least one assay (n=125), 68 (54.4%) samples, derived from 47 (11.4%) individual patients, showed discrepancy in TgAb positivity between the two assays.
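The counts reported above come from comparing paired results against fixed cut-offs. As a minimal sketch, the Python below counts Tg results that straddle a cut-off and TgAb results whose positivity differs between assays. The function names and the paired values are hypothetical, and treating "positive" as strictly above the assay-specific cut-off is an assumption. The cut-offs of 9 U/mL and 33 U/mL are the ones stated in the laboratory procedures.

```python
from typing import Iterable, Tuple

Pair = Tuple[float, float]  # (Immulite value, Kryptor value) for one sample


def count_tg_cutoff_discordance(pairs: Iterable[Pair], cutoff: float = 1.0) -> int:
    """Count samples at or above the cut-off with Immulite but below it with Kryptor."""
    return sum(1 for immulite, kryptor in pairs if immulite >= cutoff and kryptor < cutoff)


def tgab_status_discordance(
    pairs: Iterable[Pair],
    immulite_cutoff: float = 9.0,  # U/mL, institutional cut-off from the text
    kryptor_cutoff: float = 33.0,  # U/mL, functional sensitivity from the text
) -> Tuple[int, int]:
    """Return (samples positive with at least one assay, samples with discordant status)."""
    positive_any = 0
    discordant = 0
    for immulite, kryptor in pairs:
        # Assumption: "positive" means strictly above the assay's cut-off.
        immulite_pos = immulite > immulite_cutoff
        kryptor_pos = kryptor > kryptor_cutoff
        if immulite_pos or kryptor_pos:
            positive_any += 1
        if immulite_pos != kryptor_pos:
            discordant += 1
    return positive_any, discordant


# Hypothetical paired values, for illustration only.
tg_pairs = [(1.2, 0.8), (0.5, 0.3), (3.0, 1.9), (1.0, 0.6)]
tgab_pairs = [(5.0, 10.0), (12.0, 20.0), (50.0, 100.0), (3.0, 40.0)]

print(count_tg_cutoff_discordance(tg_pairs))  # 2
print(tgab_status_discordance(tgab_pairs))    # (3, 2)

# The reported proportion follows directly from the counts in the text.
print(round(100 * 68 / 125, 1))               # 54.4
```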

Recovery
In 675 samples (85.1%) a Kryptor recovery measurement was performed in addition to the quantitative TgAb measurement. A disturbed recovery with a percentage above 120% was observed in 44 (5.5%) samples, derived from 37 patients. In 6 (13.6%) of these samples TgAb were tested as positive with one or both assays.
In the subset of measurements with a normal recovery value (80-120%), median Kryptor Tg was significantly higher in the TgAb− group compared to the TgAb+ group, whereas median Immulite Tg did not significantly differ between TgAb+ and TgAb− groups (Table 2).

Locoregional or distant metastases
In 21 out of 76 patients (27.6%) with structural disease, Tg-on was undetectable with one or both assays. For 15 patients recovery analysis was available and ranged from 75% to 125% (Table 4). Although Immulite failed to detect Tg in all these patients, Kryptor detected Tg in 12 out of 15 patients.

Discussion
This study evaluated the impact of guideline recommended Tg cut-offs in clinical management of DTC in a cohort of 413 DTC patients. The data show that, when applying these fixed cut-offs, a substantial proportion of DTC patients are classified differently dependent on which Tg assay is used. This implies that clinical management of these patients can differ dependent on which assay is used.
International guidelines advocate the use of both Tg-on and rh-TSH-Tg measurements for determining the response to therapy throughout follow-up in patients who have initially been treated with thyroidectomy and RAI ablation. Patients are stratified as having an excellent, indeterminate or incomplete response based on fixed Tg cut-offs [8,9]. The guideline cut-offs for Tg are based on studies measuring Tg with a variety of methods, mostly with a FS of 1.0 μg/L [10][11][12][13][14]. Although between-method differences for Tg measurement have been clearly recognized, these differences were not addressed in the studies providing the guideline cut-offs [2]. Discrepancies between Tg assays relate, among other factors, to specificity differences in the antibody reagents of the assay, heterogeneity in tumor-derived Tg and interference of TgAb [2,16,17]. Even after standardization against the same reference standard, a two-fold between-method difference in Tg concentrations has been reported [2,3]. Despite these differences, international guidelines propose fixed cut-offs [8,9].
van Kinschot et al.: Tg and Tg antibodies: assay-dependent management consequences in DTC
When applying the most clinically relevant Tg cut-off of 1.0 μg/L, discrepancy in staging was observed between the Immulite and Kryptor assays in 6.0% and 4.3% of patients for Tg-on and rh-TSH-Tg, respectively. Disagreement on response to therapy was observed in 16.2% of patients. The observed discordance on response to therapy classification is caused by between-method slope bias, differences in FS and disagreement on TgAb positivity. The analytical bias, which can be positive or negative, may add to decreased concordance between methods, as demonstrated in a study of Ross et al. [18]. The analytical bias for the studied assays was assessed and was 35%, with Immulite generating higher results. The negative analytical bias for the Kryptor assay was in line with previously published data from our group [19]. Due to the lower FS of Kryptor (0.15 μg/L compared to 0.9 μg/L for Immulite), more patients will be classified as having an indeterminate response with Kryptor, at the cost of the number of patients with an excellent response. In 39 of these patients an Immulite rh-TSH-Tg was performed before the study period, but a Kryptor rh-TSH-Tg measurement was not available. Based on the institutionally established performance of Kryptor Tg-on as a predictor of Kryptor rh-TSH-Tg (positive predictive value of 68%, see Supplementary Figure S2), one would expect that 32% of these patients will have a Kryptor rh-TSH-Tg below the cut-off of 1.0 μg/L. This implies that, if a Kryptor rh-TSH-Tg had been performed, an estimated 12 of these 39 patients would have remained in the excellent response category instead of being reclassified from excellent to indeterminate response. Therefore, even after correction for the missing Kryptor rh-TSH-Tg measurements, the use of the Kryptor assay will result in a larger proportion of patients with an indeterminate response to therapy classification. These patients will be exposed to more intensive follow-up, longer TSH suppressive therapy and additional therapies.
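The estimate of 12 patients follows from the positive predictive value by simple arithmetic. The short Python sketch below reproduces it. The function name is hypothetical, and the calculation assumes, as the text does, that the complement of the positive predictive value applies uniformly to the 39 patients.

```python
def expected_below_cutoff(n_patients: int, positive_predictive_value: float) -> float:
    """Expected number of patients whose stimulated Tg would fall below the cut-off."""
    if not 0.0 <= positive_predictive_value <= 1.0:
        raise ValueError("positive_predictive_value must be between 0 and 1")
    return n_patients * (1.0 - positive_predictive_value)


estimate = expected_below_cutoff(39, 0.68)
print(round(estimate))  # 12 (39 x 0.32 = 12.48)
```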
The clinical significance of detectable Tg levels in the low range (0.2-1.0 μg/L) is not clear. The origin might be thyroid remnant or (irradiated) tumor tissue, or these levels might reflect microscopic tumor foci that will eventually result in clinically apparent disease recurrence. To minimize Tg produced by thyroid remnant, measurements taken in the first six months after the first RAI ablation were excluded from the analysis. Assays with lower FS have been reported to have higher sensitivity for predicting recurrent disease compared to assays with higher FS, although it is unclear if long-term outcomes are influenced [3,20,21]. However, the increased sensitivity of these assays is at the cost of specificity. Multiple studies have shown that measurable Tg levels up to 1.0 μg/L spontaneously convert into undetectable Tg in the majority of patients without further treatment, including in patients with high risk for recurrence [22][23][24][25][26]. Castagna et al. showed that only rh-TSH-Tg measurements could distinguish patients with persistent or biochemical disease from patients free of disease, but others have argued that hs-Tg assays can replace rh-TSH-Tg measurements [27,28]. In our cohort 3 out of 70 rh-TSH-Tg measurements showed discrepancy between the assays when the cut-off of 1.0 μg/L was applied. The comparability between Tg measurements is also compromised by interfering TgAb. Immunoassays are not able to detect TgAb-bound Tg, which may result in underestimation of Tg concentrations [29][30][31]. In our cohort, Tg concentrations were significantly lower in TgAb+ samples compared to TgAb− samples. As the presence of TgAb is associated with a higher risk of recurrence, guidelines recommend measuring TgAb simultaneously with Tg and monitoring TgAb+ patients more frequently.
In this study, disagreement on TgAb positivity between the assays was observed in 11.4% of patients. In these patients Tg would be judged as a reliable marker with one assay, but not with the other, which has an obvious impact on the follow-up strategy. TgAb+ patients have often been excluded in the studies providing the cut-offs for Tg concentration included in guidelines [10][11][12]14]. As determination of TgAb positivity is TgAb assay-dependent, patients whose TgAb were present but undetectable with the specific assays used have possibly been included in these analyses, potentially influencing the established cut-off for Tg [32].
Between-method discordance for TgAb measurement is caused by heterogeneity of endogenous TgAb, differences in sensitivity of the TgAb method and the (arbitrarily) selected cut-off for TgAb positivity [7,30,33]. If liquid chromatography-tandem mass spectrometry (LC-MS/MS), which is considered not to be influenced by interfering TgAb, is used as a reference, four tested TgAb immunoassays (including Immulite) missed 16-54% of interfering TgAb [31]. In addition, the concentration of TgAb is not linearly associated with interference and undetectable TgAb does not rule out interference [30,31,34]. The clinical impact of interfering antibodies remains unclear. A recent study showed that in patients with evidence of structural disease, TgAb interference rarely resulted in undetectable Tg levels [35]. However, low-volume recurrent disease can potentially be missed. As Tg measurement with LC-MS/MS does not suffer from TgAb interference, this method is a promising candidate for reliable Tg measurements. Currently, the inferior FS compared to immunoassays and time-consuming procedures hinder implementation of LC-MS/MS in clinical care [28].
TgAb interference can also be qualitatively investigated with a recovery method. In our study a recovery test was performed on the Kryptor platform. Median recovery values were significantly lower in TgAb+ samples. However, even in measurements with a normal recovery value according to the manufacturer's cut-off, median Tg concentrations were lower in TgAb+ measurements. This suggests that a normal recovery does not rule out TgAb interference. When applying the cut-off for disturbed recovery (<80%), only 2.6% of samples were identified as such. In 5.5% of measurements recovery was >120%, most likely caused by interfering heterophilic antibodies (HAbs), which have a reported prevalence of 1.5-3% [36].
Recovery values were investigated in the subcategory of patients with structural disease and undetectable Tg with one or both assays (assuming false negative results due to TgAb interference in at least a proportion of patients). Only one out of 15 samples showed a recovery <80% and a very high concentration of TgAb (>3,000 U/mL) was detected in that specific sample. Although Immulite failed to detect Tg in all these patients, Kryptor Tg was detectable in 12 out of 15 patients. This illustrates the increased diagnostic accuracy of a high-sensitive assay and that a recovery test does not aid in the discrimination of patients with unreliable Tg concentrations due to TgAb interference. Our study confirmed previous studies reporting a limited value of recovery measurement in determining TgAb interference [17,35,37] and is in agreement with current guidelines advocating against the use of the recovery method [4,8,38].
The results of this study show that comparability between Tg and TgAb assays is affected by the use of fixed cut-offs, which may be ameliorated by the application of assay-specific cut-offs. Trimboli et al. showed that assay-specific cut-offs increase comparability in clinical performance [39]. According to a study of Giovanella et al., these cut-offs should be further tailored based on TgAb status: lower Tg cut-offs should be applied in TgAb+ patients compared to TgAb− patients [40].
Our study has its strengths and limitations. The strength of this study is the simultaneous measurement of Tg and TgAb with two distinct assays in a large cohort of DTC patients. As all samples were measured directly after the blood was drawn from patients, bias due to freeze-thaw cycles or storage was prevented. Assays were not only compared mutually, but the impact on response to therapy classification was also investigated, contributing to assessing the clinical relevance of the between-method variations observed. We would like to mention a number of limitations. First, although the percentage of patients with positive TgAb and with disease recurrence was as expected and in line with the literature, the total numbers of these patients were relatively small. Second, different methods were used for establishing the cut-off for TgAb positivity for the included TgAb assays. Originally, TgAb were used as a marker for thyroid auto-immunity and the reference values were aimed at distinguishing patients with and without thyroid auto-immunity with adequate specificity [41]. At the time of introduction of the Immulite TgAb assay into our hospital, only the upper reference limit based on detecting auto-immunity was provided by the manufacturer. With the increasing use of Tg as a follow-up marker in DTC patients, it became clear that any level of TgAb can potentially interfere with Tg measurement. Therefore, an institutionally determined upper reference limit for the Immulite TgAb assay was used in the follow-up of DTC patients in our hospital. This cut-off was based on the level at which clinically relevant interference of Tg measurement was observed. For the Kryptor TgAb assay the functional sensitivity of the assay was used as the cut-off, in line with recommendations in current guidelines [4]. Evolving insights into the appropriate cut-off for TgAb positivity limit a full comparison of the studied assays. However, our study represents real-life observational data which may be relevant for daily clinical practice. Third, we only compared two assays, limiting the applicability to other Tg/TgAb assays.
Our study indicates that classification and management of a substantial subset of DTC patients is affected, dependent on which Tg and TgAb assays are used, should fixed Tg cut-offs be applied. Therefore, the recommended Tg cut-offs in guidelines should be used with wisdom rather than as fixed cut-offs.
Research funding: None declared. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.