Long-Term Effects of Psychological Interventions to Improve Adherence to Antiretroviral Treatment in HIV-Infected Persons: A Systematic Review and Meta-Analysis

We examined the efficacy of psychological adherence-enhancing interventions (AEIs) compared with usual care in HIV-infected adults under antiretroviral treatment (ART) by focusing on adherence and clinical HIV markers as outcomes in the short term and long term. We searched relevant databases for controlled studies that compared psychological AEIs with usual care. We included 31 comparisons from 27 individual studies in our meta-analyses. Psychological AEIs were significantly superior to usual care in improving adherence [standardized mean difference (SMD) 0.30, 95% CI 0.20–0.40] and reducing HIV viral load (SMD 0.15, 0.07–0.23) at the end of treatment. At the last follow-up, we found no difference between psychological AEIs and usual care, either on adherence (SMD 0.07, −0.11 to 0.24) or on clinical markers (SMD 0.06, −0.03 to 0.15). After excluding outliers from the analyses, between-study heterogeneity was small, and we did not identify any relevant moderators of intervention effects. In summary, psychological AEIs may significantly improve ART adherence and HIV viral load compared with usual care in the short term, but fail to be superior in achieving long-lasting improvements on ART adherence and clinical HIV markers as compared with usual care. Owing to limited study quality and the majority of studies being conducted in the United States or Europe, our results have to be interpreted with caution, and are most relevant to the United States and Europe. The consistently reported difficulties in achieving sustained ART adherence improvements in previous and the present meta-analyses highlight the need to focus on maintaining ART adherence improvements in future research.


Introduction
Adherence to the prescribed antiretroviral treatment (ART) has been described as the "critical link between a prescribed regimen and treatment outcome." 1 Nevertheless, inadequate medication adherence is prevalent, including among HIV-infected persons. 2 Since the emergence of ART, HIV has come to be considered a potentially chronic rather than uniformly fatal disease. 3 Initially, sustained high levels of ART adherence were required to maintain viral suppression, 4 and to enable the reconstitution of CD4 cells, which, in turn, reduce the risk of opportunistic infections. An HIV viral load that exceeds a certain threshold has significant implications for both morbidity and mortality, 5 and is associated with the transmission of the virus to uninfected persons. 6 However, owing to the higher potency and longer half-life of modern ART, 7 new ART regimens are assumed to allow a certain level of nonadherence while still maintaining suppression of viral load. 8,9 Poor ART adherence may have many causes, which range from simple unintended forgetting to intentional and active decisions not to take the prescribed medication. Reasons for intentional nonadherence may include negative concerns, fear of stigmatization, beliefs, and preferences, but also side effects and toxicity. 1,10–12 In addition, polypharmacy and a high daily pill burden have been shown to decrease ART adherence, pointing to the importance of single-tablet antiretroviral regimens in improving ART adherence. 13 Accordingly, medication adherence may be seen as part of a learning cycle, in which medication adherence, experience with the medication, and beliefs about the medication mutually impact each other. 14 It is important to note, therefore, that ART nonadherence and adherence are subjectively motivated, 11 and adherence-enhancing interventions (AEIs) should thus not only include simple reminders but should rather address the multifactorial psychological nature of nonadherence.
Given the observation of both intentional and unintentional reasons for ART nonadherence, psychological AEIs seem particularly promising in improving ART adherence in the long term, because they rely on behavior change theories (e.g., cognitive behavioral model or information-motivation-behavioral skills model), 15 and address individual adherence barriers and patient-level motivation. 16 This is in contrast to more directive or less complex AEIs such as directly observed therapy (DOT) and simple reminders (e.g., alarm clocks), which may be helpful in cases of unintentional nonadherence, but do not have the potential to change motivated constraints in persons who do not adhere to the prescribed ART regimen intentionally.
Previous meta-analyses mostly included a variety of AEI strategies, with and without an underlying psychological theory of behavior change [e.g., cognitive behavioral treatment (CBT) as well as alarm devices]. 17–19 For example, a recent network meta-analysis on different types of AEIs (e.g., device reminders, telephone calls, short message services, and cognitive behavioral therapy) found little evidence for the superiority of any kind of AEI over usual care in achieving sustainable ART adherence improvements. The findings confirm previous conclusions that maintaining long-term ART adherence is a challenge. 18–22 Unlike most previous systematic reviews and meta-analyses, we restricted our meta-analysis to examine whether complex psychological AEIs, including interactive discussions of cognitions, motivations, and expectations, are successful in increasing ART adherence as primary outcome, and clinical markers as secondary outcomes, particularly in the long term. With this meta-analysis, we aimed to complement the recently published comprehensive network meta-analysis, 19 as we did not restrict the psychological AEIs to cognitive behavioral treatment approaches, but rather included additional studies that implemented AEIs that were based on any psychological theory of behavior change (e.g., the social-cognitive theory of behavior change), 23 and we preferred continuous over dichotomous outcome reports. 24 We included previously identified moderators in our analyses and assessed the quality of the included studies.

Search strategy and selection criteria
We searched the electronic databases PubMed, Embase, CINAHL, Cochrane Central, Web of Science, and BIOSIS for controlled studies that compared the effectiveness of a psychological AEI with usual care in HIV-infected persons on ART, and that were published as journal articles until July 3, 2017 in German or English (see Supplement A). We also checked the reference lists of relevant systematic reviews and meta-analyses. 17,18,25 Duplicates were eliminated in EndNote (EndNote X3; Thomson Reuters). We did not restrict included studies to RCTs, nor did we restrict studies to those focusing on patients with adherence problems. We included studies that assessed both medication adherence and clinical markers as outcomes. Psychological AEIs qualified for inclusion if they were based on psychological principles and implemented individualized strategies to improve ART adherence. We excluded studies that did not implement any individualized psychological counseling, but rather evaluated a particular form of ART delivery (i.e., DOT) as well as interventions that used electronic devices only (e.g., alarm clocks).
One researcher (C.L.) screened titles and abstracts of the retrieved records and excluded clearly irrelevant references. Two researchers (H.G. and C.L.) then independently reviewed the full text of potentially relevant publications. Ambiguities were resolved by consensus between the two researchers. Study authors were contacted if a particular publication was not available through the university libraries.

Data extraction
All study data were extracted in duplicate (H.G. and C.L.) on a standardized form (Microsoft Office Excel 2011) after intensive training in using the manual with operational descriptions of each item. Disagreements were resolved by consensus between the two investigators. We extracted means (Ms) and standard deviations (SDs) for continuous outcomes. If SDs were not provided, we calculated them from standard errors (SEs), confidence intervals (CIs), or other measures. 26,27 If we were unable to calculate SDs, we imputed them by the mean of SDs that were based on the same outcome measure. 28 If n was missing in the table of analysis, we used the n of the descriptive statistics.
In the absence of continuous outcome data, we extracted the number of persons who fulfilled a certain criterion (e.g., viral suppression) and the number of persons who did not fulfill the respective criterion. We extracted the authors' definition of "success" as well as the time points of assessments. We planned to extract the effect size (ES) provided by the study authors only if no other information was available for ES calculation. We recorded characteristics of the intervention, usual care, the patient sample, and the study. We assessed risk of bias using the Cochrane risk of bias tool. 29 With respect to the intervention, first we recorded the number of sessions that aimed at adherence improvement in the treatment condition, second we coded the presence of booster sessions, third we recorded the theoretical behavior change underpinning of the intervention, fourth we coded the content of the intervention (i.e., the number of components among education, counseling, cognitive behavioral components, motivational interviewing, or other psychological treatment components; range, 1 to 5), and fifth we coded the intensity of the intervention (i.e., the number of modalities among psychological, behavioral, or external reminder strategies; range, 1 to 3). Interventions with more components and more modalities were considered more complex or more intense, respectively. Further, we coded characteristics of the usual care condition: first we coded whether the authors reported that the importance of medication adherence was explicitly mentioned in the usual care condition, and second we coded whether the control condition received some additional social support (e.g., additional health counseling) as adjunct to the usual care. Then, we coded whether the patient sample was described as restricted with respect to certain characteristics besides being infected with HIV (e.g., inclusion of >80% men who have sex with other men, or >80% of injection drug abusers). We coded additional aspects of the study: first we recorded baseline differences between the groups on all outcomes, as a random allocation was not a prerequisite of study inclusion. Second we recorded the year of publication and assessed the quality of each study following the recommendations of the Cochrane Handbook for Systematic Reviews. 29 If information regarding a certain item was not reported, we assumed the item was not fulfilled.
We calculated a quality score, which reflects the number of quality items that were fulfilled (range, 1 to 7).

Outcome measures
Our prespecified primary outcome was ART adherence. If more than one adherence scale was used, we selected the most commonly used scale, according to a hierarchy that was specified before the beginning of the data extraction (i.e., electronic monitoring was the adherence measure with the highest priority, and unstandardized self-reports were given the lowest priority).
For our secondary outcomes (i.e., HIV viral load and CD4 cell count), we extracted continuous data (i.e., number of prescribed doses taken, total HIV viral load, and total CD4 cell counts) as well as dichotomous data [i.e., number of HIV-infected persons above or below a certain percentage of (a) doses taken, (b) HIV viral loads, or (c) CD4 cell counts, respectively], with a preference for the continuous data over the dichotomous data. We extracted post-treatment data (i.e., the first assessment after the end of treatment) and follow-up data (i.e., the last available assessment after the end of treatment). We preferred objective measures (i.e., electronic pill counts) over subjective measures (e.g., adherence questionnaires), as well as intention-to-treat data over data from treatment completers.

Effect sizes
For the continuous outcomes, we calculated the standardized mean difference (SMD) with small sample correction for each study 26 as ES of differences between the two intervention groups. A positive SMD indicates a beneficial effect of the psychological AEI as compared with usual care. The magnitude of ESs was interpreted as small, moderate, or large, with 0.20, 0.50, and 0.80 SD units, respectively. 30 Corresponding SEs were calculated for all ES indicators. If no continuous data were available, we calculated odds ratios as ES between groups 26 and transformed them into SMDs according to the recommendations in the Cochrane Handbook of Systematic Reviews. 31 Separate results for dichotomous and continuous outcomes can be found in Supplement B. For both ART adherence and clinical markers, we calculated short-term and long-term effects (i.e., end of treatment and longest follow-up available).
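The two effect size calculations described above can be sketched as follows. This is a minimal illustration that assumes the widely used Hedges correction factor for the small sample adjustment and the logistic conversion factor √3/π for re-expressing a log odds ratio as an SMD; the exact formulas of the cited sources are not reproduced in the text, and the function names are hypothetical.

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """SMD with small sample correction and its standard error.

    Group 1 is the intervention (AEI), group 2 is usual care, so a
    positive value favors the intervention. Assumed formulation.
    """
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j * math.sqrt(var_d)

def odds_ratio_to_smd(a, b, c, d):
    """Convert a 2x2 table to an SMD via the log odds ratio.

    a, b: successes and failures in the intervention group
    c, d: successes and failures in the control group
    """
    log_or = math.log((a * d) / (b * c))
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    factor = math.sqrt(3) / math.pi  # assumed logistic conversion factor
    return factor * log_or, factor * se_log_or
```

Both functions return the effect size together with its standard error, which is what the pooling step needs as input.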
If more than one AEI was used in a study and the interventions differed with respect to the psychological content, we included both interventions and divided the number of HIV-infected persons in the usual care control accordingly. If two AEIs were used in a study that differed only with respect to dose or other nonpsychological characteristics, we combined the statistical data necessary for ES calculation according to the recommendations in the Cochrane Handbook of Systematic Reviews. 27
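The handling of multi-arm studies can be illustrated with a short sketch. It assumes the standard formula for merging the summary statistics of two groups into one and an even split of the control group with the remainder assigned to the first comparisons; the rounding rule and the function names are assumptions, not details given in the text.

```python
import math

def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Merge two arms into one group (sample size, mean, SD)."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    # Pooled sum of squares plus the between-arm contribution
    ss = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
          + n1 * n2 / n * (m1 - m2) ** 2)
    return n, mean, math.sqrt(ss / (n - 1))

def split_control(n_control, n_comparisons):
    """Divide the control sample across comparisons (assumed rounding rule)."""
    base, remainder = divmod(n_control, n_comparisons)
    return [base + 1 if i < remainder else base for i in range(n_comparisons)]
```

Splitting the control group avoids counting the same usual care participants twice when two psychologically distinct AEIs from one study both enter the meta-analysis; the control mean and SD stay unchanged and only the sample size is divided.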

Statistical analysis
We applied a random-effects model using the method of DerSimonian and Laird, with the estimate of heterogeneity being taken from the Mantel-Haenszel model. 32 Further, we conducted random-effects meta-regressions using a residual maximum likelihood to estimate the additive (between-study) component of variance τ². SEs were calculated using a method developed by Knapp and Hartung, and coefficients were tested for statistical significance using the t distribution. Using this procedure in meta-regression reduces false positive rates compared with z tests. 33 We assumed statistical significance if p was <0.05.
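For orientation, the pooling step can be sketched as a method-of-moments random-effects estimate in the style of DerSimonian and Laird. The sketch assumes inverse-variance weights on SMDs and a normal-approximation 95% CI; it does not reproduce the software actually used, and it omits the meta-regression step with the Knapp and Hartung adjustment.

```python
import math

def dersimonian_laird(effects, ses):
    """Random-effects pooled estimate with a moment estimate of tau^2."""
    w = [1 / se ** 2 for se in ses]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                   # truncated at zero
    w_re = [1 / (se ** 2 + tau2) for se in ses]     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return {"smd": pooled, "se": se_pooled, "ci": ci, "tau2": tau2, "q": q}
```

When the studies agree more closely than chance would predict, the moment estimate is negative and is set to zero, in which case the result coincides with the fixed-effect estimate.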
First, we calculated an unconditioned model without any predictors based on the SMD to establish the overall relative effect between psychological AEI and usual care on primary and secondary outcomes at the end of treatment and at the last available follow-up.
We conducted a sensitivity analysis of the meta-analyses at the end of treatment and at follow-up, excluding the studies in which we had imputed the SDs, and then we performed outlier analyses by drawing Galbraith plots. We conducted a sensitivity analysis excluding the studies from the analyses if they were clearly identified as outliers with a potential to bias the overall estimate (i.e., if the Galbraith plots indicated a study as outlier and the CI of that study ES did not overlap with the CI of the overall estimate).
Next, we controlled for the impact of potential moderators on the primary outcome ART adherence. For this purpose, we conducted meta-regressions including (a) characteristics of the AEI, (b) characteristics of usual care, (c) characteristics of the patient sample, and (d) characteristics of the study. We entered the respective scores as single predictors in individual meta-regressions.
Finally, we explored the presence of a small sample bias and publication bias by assessing funnel plot asymmetry (i.e., whether studies with negative or nonsignificant results are missing) with a regression test. 34 Given the relatively large number of moderator analyses, we planned the Bonferroni correction of p-values that has been described as a conservative correction of multiple testing in case of significant moderators. 33 As the moderator analyses were planned to explain potentially occurring between-study heterogeneity by identifying characteristics of the studies that modified the observed SMDs, we planned to exclude outliers from the moderator analyses, if the between-study heterogeneity was considerably explained by the exclusion of the outliers.
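The regression test for funnel plot asymmetry and the planned multiplicity correction can be sketched as follows. The sketch assumes the classic Egger formulation, in which the standardized effect (SMD divided by its SE) is regressed on precision (1/SE) and the intercept is examined; the function names are hypothetical, and the p-value lookup against the t distribution with k − 2 degrees of freedom is left to the reader's statistics package.

```python
import math

def egger_intercept(effects, ses):
    """Ordinary least squares intercept of y/se on 1/se, with its SE."""
    k = len(effects)
    x = [1 / se for se in ses]                       # precision
    z = [y / se for y, se in zip(effects, ses)]      # standardized effect
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_intercept = math.sqrt(sse / (k - 2) * (1 / k + x_bar ** 2 / sxx))
    return intercept, se_intercept

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

An intercept far from zero relative to its SE suggests that smaller studies report systematically different effects than larger ones, which is the pattern expected when nonsignificant small studies are missing.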
To evaluate heterogeneity between studies, we examined τ², which is an estimate of the variance among true ESs. Higher τ² values indicate greater variability between studies than would be expected by chance. Based on the definition of small, moderate, and large ES estimates according to Cohen, 30 we interpreted τ² as follows: τ² = (0.2/2)² = 0.01 was considered to represent low heterogeneity, τ² = (0.5/2)² = 0.06 moderate heterogeneity, and τ² = (0.8/2)² = 0.16 high heterogeneity between studies. In addition, we report I².
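The heterogeneity benchmarks above follow directly from halving and squaring the conventional effect size thresholds, which a few lines make explicit. The I² helper assumes the usual definition based on Cochran's Q and its degrees of freedom, which the text does not spell out; both function names are hypothetical.

```python
def tau2_benchmark(effect_size):
    """Heterogeneity benchmark derived from an effect size threshold."""
    return (effect_size / 2) ** 2

def i_squared(q, k):
    """I^2 in percent from Cochran's Q and k comparisons (assumed definition)."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100

# Benchmarks for small, moderate, and large effect sizes
benchmarks = {label: tau2_benchmark(es)
              for label, es in [("low", 0.2), ("moderate", 0.5), ("high", 0.8)]}
```

The moderate benchmark evaluates to 0.0625, which the text reports rounded to 0.06.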

Patient involvement
Patients were not involved at any stage of the research process.

Results
We identified 4837 records, screened 4441 titles and abstracts after we had removed duplicates, and finally included 31 comparisons that were published in 27 reports with 5479 allocated participants. The number of participants per study ranged between 42 and 600 with a median of 155.
The procedure of study selection, including reasons for exclusion after full-text review, is shown in Fig. 1. Studies were conducted in North America (21), South America (2), Europe (3), and Africa (1) and published between 2000 and 2016, with 2008 being the median year of publication. The mean age across the samples of the included studies ranged between 36.4 and 48.0 years. We identified 16 studies with a restricted population, that is, >80% of the study sample fulfilling a certain characteristic (Table 1). In 10 studies, the study sample consisted mostly of persons described as nonwhite, African American, or Hispanic, in 5 studies, drug dependence or a history of drug dependence was reported for >80% of the sample, in 6 studies, most of the participants were unemployed or had low income, 2 studies included mostly male, and 1 study included mostly female participants.
In 5 studies, CBT was applied, in 8 studies, motivational interviewing techniques were applied, and 14 studies referred to other psychological behavior change theories, including the information-motivation-behavioral skills model. The number of sessions ranged between 1 and 12 in usual care (median 4) and between 3 and 14 in the AEI (median 5). Time between baseline and end-of-treatment assessments ranged between 0.9 and 18 months, with a median of 3.7 months for adherence assessments, a median of 5.8 months for HIV viral load, and a median of 3 months for CD4 cell counts. The time between baseline and last follow-up ranged between 2.8 and 24.6 months with a median of 9 months for adherence assessments and CD4 cell counts, and a median of 12 months for HIV viral loads (Table 1).
Three studies fulfilled all seven quality criteria and two studies fulfilled only one criterion (median 4). Two studies did not use random allocation of participants to interventions, whereas 25 studies used random allocation of participants to interventions. The number of study quality criteria that were fulfilled in the studies significantly increased over the years (r = 0.57, p = 0.0008). The study quality of each individual study can be found in Supplement C. Figure 2 shows an overview on the time course of the continuous outcome data (i.e., ART doses taken, log HIV viral loads, and CD4 cell counts).

Effectiveness of psychological AEIs
Primary outcome ART adherence. The initial meta-analysis showed a significant small to moderate superiority of psychological AEIs over usual care at the end of treatment (Table 2), with moderate heterogeneity (τ² = 0.04) and a significant Egger test (p = 0.009). At the last follow-up, superiority of psychological AEIs over usual care was small and not statistically significant. The ES differed between individual studies as indicated by moderate to large heterogeneity (τ² = 0.09). However, the Egger test was not significant. Our outlier analyses identified relevant outliers.
The exclusion of four outliers in the analyses at the end of treatment reduced the ESs only slightly (Table 2), but heterogeneity was explained in this analysis (τ² < 0.001), and the Egger test was no longer significant (p = 0.835). A similar but less pronounced pattern was found in an analysis excluding one outlier using the follow-up data: a slight reduction of the ES, but a considerable reduction of between-study heterogeneity (τ² = 0.008). Thus, excluding the outliers seemed reasonable to us for the exploration of potential moderators.
Meta-regressions did not identify significant moderators of intervention effects, either for the analyses at the end of treatment or for the analyses at last follow-up (Table 3).
Secondary outcome clinical markers. We found a small and significant superiority of psychological AEIs over usual care on HIV viral load at the end of treatment (Table 2), with low between-study heterogeneity (τ² = 0.006) and a nonsignificant Egger test. At the last follow-up, superiority of psychological AEIs over usual care was small and not statistically significant, with very low heterogeneity and a nonsignificant Egger test. The exclusion of two outliers at the end of treatment showed comparable ESs (Table 2), and between-study heterogeneity was explained in this analysis.
We found no superiority of psychological AEIs over usual care on CD4 cell counts, either at the end of treatment or at follow-up (Table 2). Between-study heterogeneity was moderate to small and the Egger tests were not significant. We did not identify outliers in these meta-analyses.
The sensitivity analyses excluding the studies for which we had to impute missing SDs did not alter the results (Supplement B).

Discussion
We identified 27 studies that evaluated the efficacy of theory-based psychological AEIs as compared with usual care. Our findings indicate that there is a significant superiority of psychological AEIs on ART adherence and on HIV viral load compared with usual care directly after termination of the AEI. However, there was a lack of long-term effects on ART adherence and clinical markers when we looked at the data of the last available follow-up assessment. It is important to note that only 3 out of 27 studies were conducted outside the United States or Europe with the large majority (21 studies) being conducted in the United States. Therefore, our results and conclusions can be considered most relevant to the United States and Europe.
The quality of the included studies was unsatisfactory for a large number of included studies with only three study reports fulfilling all seven assessed quality indicators. The number of fulfilled quality criteria was not associated with intervention effects, however ( were large enough to detect a moderate AEI superiority (ES = 0.50 under the assumption of a power of 0.80 and a two-tailed p = 0.05). In contrast to previous meta-analyses, we focused our meta-analysis on studies using theoretically founded psychological AEIs, but did not restrict psychological AEIs to cognitive behavioral treatment approaches as done before. 19 With this approach, we included 11 additional studies, which were not part of the recently published network meta-analysis. 19 In a moderator analysis, we found no significant differences between studies that used cognitive behavioral AEIs and AEIs based on other psychological behavior change theory (including motivational interviewing for instance). The lack of statistical heterogeneity indicates that despite the considerable variation in the conduct of the individual studies (e.g., in adherence measurements or time points of assessment), the outcomes of the obviously diverse psychological interventions are similar enough to warrant the aggregation of the identified studies with meta-analytic procedures. Accordingly, our moderator analyses revealed no statistically significant moderators.
In line with our findings, previous meta-analyses also did not identify consistent patterns of moderators. 18,25 However, with respect to the content and quality of usual care, previous meta-analyses found the superiority of AEIs over usual care to be smaller with increasing levels of usual care. 63,64 As we used just two characteristics of usual care as potential moderators in our meta-regressions, we cannot rule out that other characteristics of the implementation of usual care may impact outcome.
When looking at the long-term outcome data, we did not find a significant difference in absolute adherence between AEI and usual care at the last follow-up, which was due to a return to the baseline adherence level in the AEI group after the end of treatment (Fig. 2). Hence, the argument that simply being part of a clinical study might have improved the effectiveness of usual care cannot explain the observed nonsuperiority of the psychological AEIs over usual care in the long term.
Our observations confirm previous findings, 18,20–22 and are well in line with the notion that different factors may influence the initiation and maintenance of behavior change, 65 not least due to the trade-off between short-term costs and long-term benefits. 66 In contrast, as most of the included studies were conducted in the United States and Europe, a generally high quality of usual care in the context of HIV treatments in these countries may explain the lack of a superiority of the psychological AEIs over usual care in the long term. Thus, the lack of long-term superiority of psychological AEIs over usual care might be seen as questioning the value of implementing such time-intensive interventions to improve ART adherence in clinical practice (at least in Europe and the United States), as mirrored by the conclusion of a previous meta-analysis, which stated that "clinical practice may be best served by implementing current best practice." 64 Accordingly, we recommend carefully weighing the potential benefits of psychological AEIs against their expenditure in terms of time and money.
Our meta-analysis has several strengths. First, we restricted the included AEIs to theoretically founded psychological AEIs, and identified a study pool of 27 studies with little heterogeneity between estimates of individual studies. Second, we assessed medication adherence as well as clinical outcomes. Third, we focused on long-term outcomes in addition to intervention effects immediately after treatment termination. Fourth, we assessed the potentially moderating role of several characteristics of the study, interventions, control treatments, and the patient sample.
Table note: AEI, adherence-enhancing intervention; ART, antiretroviral treatment; CI, confidence interval; I², percentage of overall heterogeneity that is due to variation of the true effects; k, number of included comparisons; SMD, standardized mean difference; τ², variability between studies. Italics indicate results from the analyses excluding outliers. These analyses showed no statistical heterogeneity and no risk for publication bias, and are, therefore, considered most valid.
Our analyses, however, have several limitations as well. First, for adherence as well as for the clinical outcome data, we chose to combine data that were based on diverse measurements (e.g., electronic pill count and adherence questionnaires), continuous and dichotomous data, as well as intention-to-treat and data from treatment completers. Although this procedure appears to introduce variation, the indicators of statistical heterogeneity do not indicate more variation than would be expected by chance, after the exclusion of individual outlier studies, in most analyses.
Second, the number of studies that reported long-term data was limited. Thus, the power to detect relevant differences between intervention and control groups was limited. However, the inspection of the course of outcomes over time looking at the available continuous data (Fig. 2) indicates rather small differences, even in the case of significant findings in our analyses.
Third, the generalizability of our findings has to be considered thoroughly. As most of the included studies were conducted in the United States, it remains unclear whether the observed patterns may be generalized to other geographic regions. With respect to the included participants, we found a variety of restricted participant samples (Table 1), but no differences between studies that included nonrestricted and restricted study samples in a moderator analysis. However, it remains unclear whether our findings may be generalized to HIV-infected persons with other characteristics.
Fourth, recent studies reveal that the incidence of neurocognitive impairment 67,68 as well as psychosocial variables (e.g., depression and use of stimulants) 69 are associated with poor ART adherence in HIV-infected persons. We did not exclude studies on HIV-infected persons with comorbid psychiatric disorders (e.g., depression) or HIV-associated neuropsychiatric problems. But the studies included in this meta-analysis did not specifically focus on such mental health issues, and in none of the studies >80% of the sample fulfilled criteria for such comorbid problems (see Table 1 for further details).
Last, as only one out of six studies that used booster sessions reported long-term outcome data, we cannot draw conclusions regarding the incremental value of booster sessions 70 on long-term efficacy of psychological AEIs to date. This and the incremental value of other recent developments, such as ongoing adherence check-ins 71,72 or mobile health interventions, 73 which were not within the scope of the present meta-analysis, need to be addressed in future research.
Therefore, future research should (a) include more long-term follow-up assessments, as the identified data base was unsatisfactorily small; (b) increase the quality of research to allow more valid conclusions; and (c) investigate what differentiates AEIs (including usual care treatments) that are successful versus less successful in improving ART adherence and, in turn, clinical HIV markers (e.g., investigating potential long-term benefits of booster sessions). To enhance the comparability between individual studies, future research should rather report continuous data instead of, or at least in addition to, dichotomous data that are based on ever-changing cutoffs. Further, the measurement of adherence as well as the most relevant time points of assessment should be standardized.
To conclude, given the repeatedly reported difficulties in identifying interventions that improve ART adherence over and above the improvements seen in usual care in the long term, it seems most relevant to focus attention on long-term benefits with respect to sustained medication adherence. However, owing to the large heterogeneity in usual care services across countries, our results should be interpreted with caution when translating them to countries other than the United States and those in Europe.
Christoph Werner, Cora Wagner, Sarah Bürgler, Dilan Sezer, and Linda Kost for their assistance with editing the article. Further, we thank Alice Graser and Joe Kossowsky for conducting a pilot study and Joe Kossowsky for his helpful comments on a previous version of the article.
Investigators: C.L. is a research fellow in clinical psychology with expertise in meta-analytic and experimental research mainly in the fields of antidepressants and placebo research. M.M. holds a PhD in pharmaceutical sciences. He has previously published on adherence measurement and AEIs. J.G. is a full professor in clinical psychology and psychotherapy and an accredited psychotherapist with a preference for humanistic psychotherapy. He has previously published on HIV, placebo, and psychotherapy. H.G. is a senior research fellow in clinical psychology and psychotherapy. She has been trained in solution-focused brief therapy. She has expertise in meta-research and has published meta-analyses in various areas of health interventions as well as on the quality of meta-research. The authors share research interests on the impact of contextual and extratherapeutic factors on intervention effects. Using constructive feedback loops during the analytic process, the authors worked on minimizing the potential dominance of either professional background on the study results.
, and H.G. critically revised the article and gave important intellectual contribution to it.
All authors had full access to all of the data in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis. H.G. is the guarantor.

Author Disclosure Statement
All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare no financial relationships with any organizations that might have an interest in the submitted work in the previous 3 years.

Data Sharing
Supplementary Data include the search strategy, descriptive information as well as ESs for each included study, and results from Egger's test and sensitivity analyses for the individual meta-analyses.

Transparency Declaration
H.G. affirms that this article is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

Supplementary Material
Supplement A Supplement B Supplement C