Sample size calculation for clinical trials analyzed with the meta‐analytic‐predictive approach

The meta‐analytic‐predictive (MAP) approach is a Bayesian method to incorporate historical controls in new trials that aims to increase the statistical power and reduce the required sample size. Here we investigate how to calculate the sample size of the new trial when historical data is available, and the MAP approach is used in the analysis. In previous applications of the MAP approach, the prior effective sample size (ESS) acted as a metric to quantify the number of subjects the historical information is worth. However, the validity of using the prior ESS in sample size calculation (i.e., reducing the number of randomized controls by the derived prior ESS) is questionable, because different approaches may yield different values for prior ESS. In this work, we propose a straightforward Monte Carlo approach to calculate the sample size that achieves the desired power in the new trial given available historical controls. To make full use of the available historical information to simulate the new trial data, the control parameters are not taken as a point estimate but sampled from the MAP prior. These sampled control parameters and the MAP prior based on the historical data are then used to derive the statistical power for the treatment effect and the resulting required sample size. The proposed sample size calculation approach is illustrated with real‐life data sets with different outcomes from three studies. The results show that this approach to calculating the required sample size for the MAP analysis is straightforward and generic.

What is already known
• The approach to calculate the sample size of a new trial analyzed with the MAP approach is unclear.

What is new
• The study proposed a simulation-based approach to calculate the sample size of a new trial analyzed with the MAP approach.
• The proposed approach is straightforward and generic for sample size calculation in the context of MAP analysis.

Potential impact for Research Synthesis Methods readers outside the authors' field
• The proposed approach could also be used in the sample size calculation of trials/experiments in other fields, for example, education and psychology, to be analyzed with the MAP approach.

| INTRODUCTION
In the design and analysis of randomized clinical trials (RCTs), researchers may incorporate information from control arms of previous trials focusing on the same health outcome. 1 Therefore, several dynamic borrowing methods that incorporate the historical controls in the design and analysis of the new trial have been proposed. 2 The meta-analytic-predictive (MAP) approach is one of the most prominent dynamic borrowing methods, and it has gained popularity in recent years. This approach assumes that control parameters, that is, parameters in the control arm, in different trials are exchangeable. Under this assumption, the predictive distribution for the control parameters in the new trial can be derived based on the meta-analytic methodology. 3 Although the MAP approach is based on meta-analytic methodology, it differs from regular meta-analysis in several ways. First, meta-analysis often focuses on the synthesis of treatment effect estimates from different studies, whereas the MAP approach in the setting of historical controls focuses on the incorporation of the control parameters. In the development of a novel treatment where information on the treatment effect is not available, historical controls are the only source of historical information. Therefore, in that context it is impossible to synthesize treatment effects of different studies via meta-analysis. Second, in meta-analysis, the posterior distribution of the population mean is of primary interest; the MAP approach takes a further step by using the posterior of the population mean to derive the predictive distribution for the control parameter in the new trial. The derived MAP prior is informative and can lead to more precise parameter estimates and higher statistical power. The higher statistical power should enable a reduction of the sample size of the new trial.
Hence, it is of interest to determine how to accurately calculate the required sample size for the MAP analysis based on the available historical control arms.
In the context of the MAP analysis, Neuenschwander et al. proposed the prior effective sample size (ESS) for a control parameter (e.g., response rate, control mean) as a metric to represent the number of subjects the MAP prior is worth. 3 After deriving the prior ESS, the sample size of the control arm may be reduced by the prior ESS. [4][5][6] However, whether the planned statistical power is preserved as in a traditional sample size calculation (i.e., without historical data) using the prior ESS is unclear. In addition, there are also different definitions of the prior ESS to quantify the amount of information contained in a certain prior (not necessarily to be a MAP prior), 7,8 but whether the prior ESS is an appropriate metric to calculate the sample size of a new trial based on the MAP analysis is still questionable.
There are two different frameworks for sample size calculation, namely the frequentist framework and the Bayesian framework. 9 In the frequentist framework, only single values of the parameters are used in the sample size calculation formula or the simulation study, whereas the Bayesian approach allows the uncertainty of the parameters to be taken into account. In both frameworks, the information on the effect size and the variability of the outcome could be derived from similar historical studies. 10 The MAP approach models heterogeneity between trials, leading to a wide range of possible parameter values for the control arm of the new trial. Therefore, we argue that the frequentist framework, which would ignore the uncertainty in the historical data, is not suitable for this specific situation. We thus investigate the best way to perform the sample size calculation within a Bayesian framework.
To properly account for the existing evidence in the sample size calculation of a new trial, Sutton et al. proposed a simulation-based approach to determine the sample size of a new trial based on historical trial data. 11 Unlike previous sample size calculation methods, their approach determines the new trial sample size to achieve the desired power of an updated meta-analysis of the treatment effect that includes the upcoming trial and the historical data. Moreover, a previous study considered the design of future studies based on the results of a network meta-analysis. 12 In this paper, we propose an adaptation of Sutton et al.'s approach to fit the context of sample size calculation for the MAP analysis. The aim of our proposed method is to calculate the required sample size for a new trial in the situation where the data of the new trial will be analyzed in combination with data of historical controls using the MAP approach.
The remainder of the paper is organized as follows. Section 2 presents the basic methodology of this study. Section 3 elaborates on the proposed sample size calculation approach. Section 4 illustrates the proposed sample size calculation approach in motivating studies and interprets the results. Section 5 concludes.

| BACKGROUND
In this section, the MAP approach is first introduced, and the prior ESS, a metric to quantify the amount of information in the MAP prior, is then described. Finally, information on the frequentist and Bayesian sample size calculation frameworks in clinical trials is provided.

| The MAP approach
The MAP approach is a dynamic borrowing method that synthesizes the information from historical controls and derives the informative prior for the control parameters using the meta-analytic methodology. In other words, the MAP approach is in fact a random-effects meta-analysis of historical controls in combination with the prediction of the control parameter in the new trial. In the approach, control parameters from different trials are assumed to be exchangeable, with trial-specific parameters for the control arm assumed to originate from a common normal distribution. In the univariate Gaussian case, suppose that there are J comparable historical control arms and Y_j denotes the estimate of the control parameter in the jth historical trial with standard error σ_j:

Y_j | θ_j ~ N(θ_j, σ_j²), j = 1, …, J,
θ_j | μ, τ ~ N(μ, τ²),
θ* | μ, τ ~ N(μ, τ²),

where θ* is the underlying control parameter in the new control arm, μ denotes the grand (overall) mean of the study-specific parameters, and τ is the between-trial standard deviation of the study-specific means.
For known between-trial standard deviation τ (and a vague prior on μ), the posterior distribution of μ is

μ | Y_1, …, Y_J, τ ~ N( Σ_j w_j Y_j / Σ_j w_j , 1 / Σ_j w_j ),

where w_j = 1/(σ_j² + τ²) denotes the weight of the jth trial, and the posterior distribution for μ is of primary interest in the random effects meta-analysis.
In the MAP approach, we take a step further to derive the prior for the control parameter in the new trial, which is given by

p(θ* | Y_1, …, Y_J, τ) = ∫ N(θ*; μ, τ²) p(μ | Y_1, …, Y_J, τ) dμ,

that is, a normal distribution whose variance adds the between-trial variance τ² to the posterior variance of μ. Note that τ needs to be estimated in practice, and different approaches have been proposed for the estimation, such as the method of moments 13 and fully Bayesian estimation. 3 The fully Bayesian method with a suitably chosen prior for τ is used in this study, and details can be found in Section 3.4. The core assumption of the MAP approach is that the control parameters are exchangeable across studies. To make this assumption tenable, the historical trials should be selected carefully, and their comparability with each other and with the new trial should be assessed on substantive grounds, for example by comparing study designs, treatments, inclusion criteria, and outcome measures using criteria such as those proposed by Pocock. 14,15 A careful selection of the historical trials should allow the researcher to effectively rule out the possibility of clearly biased or highly heterogeneous controls. Nevertheless, even after a careful selection of the historical controls there may still be a risk of a prior-data conflict, where the new study deviates from all the historical studies. To account for this possibility, Schmidli et al. proposed a robust MAP prior. 4 The robust MAP prior adds a robust component, a vague prior for the control arm parameters, with a typically small weight (say 10%).
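For known τ, the Gaussian MAP prior above has a closed form: the predictive mean equals the posterior mean of μ, and the predictive variance adds τ² to the posterior variance of μ. A minimal Python sketch, with hypothetical historical estimates rather than data from the motivating studies:

```python
import math

def map_prior_known_tau(y, se, tau):
    """Gaussian MAP prior for the control parameter of a new trial, given
    J historical estimates y with standard errors se and a known
    between-trial SD tau (vague prior on the grand mean mu)."""
    w = [1.0 / (s**2 + tau**2) for s in se]            # trial weights
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    v_mu = 1.0 / sum(w)                                # posterior variance of mu
    # The predictive (MAP) distribution adds the between-trial variance:
    return mu_hat, math.sqrt(tau**2 + v_mu)

# Three hypothetical historical control means with their standard errors:
m, s = map_prior_known_tau(y=[1.2, 0.8, 1.0], se=[0.3, 0.25, 0.4], tau=0.2)
```

Because the predictive variance is bounded below by τ², the MAP prior never becomes arbitrarily precise, no matter how many historical trials are available.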

| Prior effective sample size
The MAP approach produces an informative prior with the aim of improving the inference of the parameter of interest. Quantifying the amount of prior information contained in the prior distribution via prior ESS may help in the study design phase and in communication with clinicians. 7 The following uses of the prior ESS have been recognized: (1) avoiding the use of an excessively informative prior which dominates the data, (2) interpreting or reviewing the prior used in other Bayesian analyses or designs, (3) performing sensitivity analyses in terms of the prior's informativeness (by modifying the prior ESS value), 7 and (4) quantifying by how much the sample size of the current trial can be reduced by including historical information. [4][5][6]8 There have been several information-based approaches to calculate the prior ESS, including the variance/precision ratio (VR/PR) method, 3,16 the Morita-Thall-Muller (MTM) method, 7 and the expected local-information-ratio (ELIR) method. 8 Neuenschwander et al. presented a detailed overview of different versions of the prior ESS, 8 in which the above methods yielded the same result with conjugate priors in the single-parameter case, but led to surprisingly different results with nonconjugate priors.
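To make the variance-ratio idea concrete: for a Gaussian prior, the VR method takes the ratio of the sampling variance of a single observation to the prior variance. The numbers below are hypothetical, purely for illustration:

```python
def ess_vr(sigma, prior_sd):
    """Variance-ratio (VR) prior ESS for a Gaussian prior on a mean:
    sampling variance of one observation divided by the prior variance."""
    return sigma**2 / prior_sd**2

# A prior with SD 0.29 on a mean, for an outcome with unit-level SD 1.0,
# carries information worth roughly 12 subjects (values hypothetical):
ess = ess_vr(sigma=1.0, prior_sd=0.29)
```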
In addition to the different versions of the prior ESS, Wiesenfarth and Calderazzo proposed an alternative sample size metric, the effective current sample size (ECSS), which calculates the effective sample size of a prior distribution in the new trial accounting for prior-data conflict. 17 The rationale of the ECSS is that the aforementioned prior ESSs are independent of the newly observed data, and do not capture the "loss of information" caused by a potential prior-data conflict. In contrast, the ECSS is calculated accounting for the prior-data conflict, and varies with the level of prior-data conflict.
Our primary aim is sample size calculation of RCTs based on the MAP approach, thus reducing the size of the control arm based on prior ESS (or ECSS) would appear to be a sensible choice. However, the prior ESS is a metric for only one control parameter, whereas the distribution of the control arm data may involve multiple parameters. Although one metric for prior ESS has been defined for two parameters, it only applies in limited scenarios. 18 Even if historical information of multiple parameters is included to derive multiple prior ESS values, it is not clear which ESS value should be used to reduce the sample size for the new control arm. Moreover, none of the available definitions of prior ESS and ECSS serves the ultimate goal of designing an RCT, that is, achieving the desired statistical power for the treatment effect. Therefore, a more straightforward sample size calculation approach is required for a new trial to be analyzed with the MAP approach taking the statistical power of the treatment effect into consideration.

| Sample size calculation in clinical trials
There are generally two sample size calculation frameworks in terms of the way the uncertainty in the parameters is handled, namely the frequentist framework and the Bayesian framework. 9 In the frequentist framework, only single values of the parameters are used in the sample size calculation, and either a formula or a Monte Carlo approach could be used depending on the availability of a closed-form solution. 19 A disadvantage of this framework is that the parameters required in the calculation are assumed to be known. 9 However, ignoring uncertainty in the parameters may underestimate the required sample size and lead to underpowered trials. 10 The Bayesian framework overcomes the aforementioned drawback by averaging over the prior distribution for the unknown parameters to obtain the unconditional probability that the trial will lead to a positive outcome (assurance), that is, ∫ power(θ) p(θ) dθ. 20,21 Due to the lack of a closed-form solution, Bayesian clinical trial simulation (BCTS) is often required. 20 In BCTS, parameter values are sampled from the priors and used to simulate the new trial data, the statistical power is calculated based on the simulated data, and the required sample size is then determined according to the power.
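The assurance integral can be approximated exactly as described: sample parameter values from the prior, compute the conditional power for each draw, and average. A self-contained sketch for a two-arm z-test with a hypothetical normal prior on the effect size (all numbers illustrative):

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def conditional_power(delta, n, sigma=1.0, z_crit=1.959964):
    """Frequentist power of a two-sided two-arm z-test, n subjects per arm."""
    return norm_cdf(abs(delta) * math.sqrt(n / 2.0) / sigma - z_crit)

def assurance(n, prior_mean, prior_sd, n_sim=20000, seed=1):
    """Monte Carlo estimate of the assurance: average the conditional
    power over effect sizes sampled from the prior."""
    rng = random.Random(seed)
    return sum(conditional_power(rng.gauss(prior_mean, prior_sd), n)
               for _ in range(n_sim)) / n_sim

# With a hypothetical N(0.5, 0.2^2) prior on the effect, the assurance at
# n = 64 per arm falls below the power evaluated at the prior mean alone:
a = assurance(n=64, prior_mean=0.5, prior_sd=0.2)
p = conditional_power(0.5, n=64)
```

The gap between the two numbers illustrates the drawback noted above: plugging in a single point estimate overstates the probability of a positive trial when the effect size is uncertain.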

| METHOD
In this section, we give an introduction to the sample size calculation approach proposed by Sutton et al., and propose an adaptation of their method for sample size calculation in the context of the MAP approach.

| Sample size calculation in the context of meta-analysis
Sutton et al. proposed a sample size calculation approach in the context of meta-analysis using a variant of the aforementioned Bayesian framework. 11 Their motivation was that, instead of designing the new trial to have sufficient power for the treatment effect in the new trial (δ*) in isolation, the sample size calculation should be done so that the overall mean of the treatment effect (μ_δ) in the updated meta-analysis including the new trial will have sufficient power. 11 Statistical power for the treatment effect was calculated as the proportion of simulations in which the null hypothesis is rejected, that is, the 95% credible interval does not cover zero. In this way, the sample size calculation is done in the context of designing a new trial to contribute to a larger body of evidence. They proposed a simulation-based approach to sample size planning based on the above perspective, and the algorithm is iterative with a large number of simulations (N_sim) per sample size until the desired power for μ_δ is achieved. The algorithm is visualized in Figure 1.
The algorithm derives the prior distribution for the parameter of interest from the meta-analysis rather than using only a point estimate, which is a Bayesian feature. The prior for the treatment effect can be derived with either a fixed effect meta-analysis (pooling estimates from different trials) or a random effects meta-analysis (assuming study-specific effects). In the context of meta-analysis, the random effects meta-analysis is often preferred because it accounts for the between-study heterogeneity. The results can differ from those of a traditional sample size calculation based on point estimates.
However, the meta-analysis may include highly heterogeneous historical trials. Including highly diverse studies in a single meta-analysis is like "combining apples with oranges": genuine differences in effects may be obscured and the pooled result may be uninterpretable. 22,23 The weight of the ith trial in a random effects meta-analysis is 1/(σ_i² + τ²), so with a considerable level of between-study heterogeneity the weight of even a large new trial is bounded by 1/τ², and the contribution of a new trial to the existing evidence is limited despite its size. In an example of Sutton et al., the sample size per arm based on a random effects meta-analysis for the proportion comparison between two groups was larger than 5 × 10⁴ with a power of less than 5%, and even if five trials were to be conducted in the future, more than 2 × 10⁴ subjects were required per arm per trial to achieve an 80% power for the updated meta-analysis. 11 Thus, calculating the sample size based on an updated random effects meta-analysis of highly heterogeneous trials may lead to an impractically large sample size.
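The bounded-weight argument can be checked numerically; the within-study SD σ and between-study SD τ below are hypothetical:

```python
def trial_weight(n, sigma=1.0, tau=0.5):
    """Random effects weight 1 / (sigma^2 / n + tau^2) of a trial whose
    estimate has standard error sigma / sqrt(n). sigma and tau are
    hypothetical values chosen for illustration."""
    return 1.0 / (sigma**2 / n + tau**2)

cap = 1.0 / 0.5**2          # the bound 1 / tau^2
w_100 = trial_weight(100)   # a moderately sized trial
w_huge = trial_weight(10**6)  # an enormous trial gains almost nothing more
```

Going from 100 to a million subjects moves the weight only from about 3.85 to essentially the cap of 4, which is why the required sample sizes in Sutton et al.'s heterogeneous example explode.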
Therefore, we propose an adaptation of Sutton et al.'s approach in a new setting, that is, focusing on the power of the treatment effect in the new trial estimated with the MAP approach. The comparison between the meta-analysis and the MAP approach is presented in the following subsection.

| Comparison between meta-analysis and the MAP approach
Meta-analysis and the MAP approach share the characteristic that they are both based on meta-analytic methodology, and the implementation of both approaches requires the availability of historical RCTs in advance.
The MAP approach differs from meta-analysis in terms of (1) the data and (2) the parameter of interest. The MAP approach does not require the treatment arm to be the same across all historical trials; instead, it requires the treatment in the control arm to be the same, and only historical controls are used. With regard to the parameter of interest, meta-analysis estimates the overall mean of the treatment effect, whereas the MAP approach derives an informative prior for the control parameter.

| Sample size calculation for the MAP approach
In the proposed approach, the focus is on the subpopulation in which the new trial will be conducted, and the parameter of interest is the treatment effect in the new trial.
The prior for the control parameter in the new trial is derived based on the MAP approach, that is, the control parameter is predicted using the random effects meta-analysis. Because historical controls do not provide information on the treatment effect, the treatment effect needs to be prespecified based on external information, for example, expert opinion or the minimum clinically important difference. 24 The power is derived based on the hypothesis test of the treatment effect in the analysis of the new trial combining historical controls, that is, the meta-analytic-combined (MAC) analysis. 4 Moreover, in addition to the balanced design in Sutton et al.'s approach, an unbalanced design is also possible in the proposed approach because only information from historical controls is considered.
Suppose that θ₀* and θ₁* are the control mean and the treatment mean in the new trial, respectively, and δ* = θ₁* − θ₀* is the treatment effect in the new trial. The required sample size per arm for the balanced design of the proposed approach is chosen as the smallest integer n₁ such that

∫ power_MAP(n₁; θ₀*, δ*) p(θ₀*) dθ₀* ≥ 1 − β,

where power_MAP(·) is the statistical power of the MAP analysis, p(θ₀*) is the MAP prior for θ₀*, and β is the desired type II error rate, which is usually 10% or 20%. Details of the algorithm are shown in Algorithm 1. Note that Step 3 is implemented using the MAC approach. The two-stage MAP analysis (i.e., first specifying the MAP prior and then using this prior to analyze the new trial data) is equivalent to the one-stage MAC analysis. 4 Unlike Algorithm 1, traditional BCTS often uses an informative prior (e.g., the MAP prior) as the sampling prior in the simulation of the new trial data, whereas a noninformative prior is used in the analysis of the simulated new trial data as the fitting prior. In this case, the required sample size is chosen as the smallest integer n₀ such that

∫ power_NB(n₀; θ₀*, δ*) p(θ₀*) dθ₀* ≥ 1 − β,

where power_NB(·) is the statistical power of the analysis of the new trial data with a noninformative fitting prior.
In the design of clinical trials using dynamic borrowing methods, it is possible to keep the required sample size n₀ in the treatment arm while reducing the sample size of the control arm. 1 As a result, more trial resources can be devoted to the novel treatment while the desired power is retained. Therefore, we can modify Algorithm 1 to reduce the sample size only for the new control arm, which leads to an unbalanced trial.
The required sample size for the control arm, with a treatment arm of size n₀ in the unbalanced design, is the smallest integer n₂ that fulfills

∫ power_MAP(n₀, n₂; θ₀*, δ*) p(θ₀*) dθ₀* ≥ 1 − β.

Details of the algorithm for an unbalanced design with more subjects in the treatment arm are shown in Algorithm 2. Note that various combinations of n₀ and n₂ may all achieve the same desired statistical power, and that it is possible to search for the pair of sample sizes simultaneously. However, a balanced trial with n₀ subjects per arm leads to the smallest sample size to achieve the desired power if no historical information is incorporated in the analysis. Therefore, fixing the sample size of the new treatment arm at n₀ and substituting some of the subjects (i.e., n₀ − n₂) in the new control arm with historical information should lead to the pair with the smallest total sample size, n₀ + n₂. The sample size n₀ represents the required sample size without incorporating historical information in the analysis. Although no prior information is used in the analysis of Step 3 of Algorithm 2, we still prefer to use the available historical information in Step 2 of the algorithm by sampling a value from the MAP prior p(θ₀*).

ALGORITHM 1 SAMPLE SIZE CALCULATION ALGORITHM FOR THE MAP APPROACH FOR A TWO-ARMED BALANCED TRIAL
1: Implement the MAP approach on the historical data to derive the prior for the control mean p(θ₀*).
2: Sample a value from p(θ₀*) and choose an initial value of the sample size per arm. Use the sampled value to generate a realization of the new control arm. To generate a realization of the new treatment arm, use the sampled value in combination with a prespecified value δ* of the treatment effect.
3: Apply a MAC analysis to the simulated new trial data by incorporating the available historical controls, and conduct a hypothesis test of δ*.
4: Repeat Steps 2 through 3 a large number of times (N_sim), and calculate the power as the proportion of the N_sim simulations in which the null hypothesis is rejected.
5: Repeat Steps 2 through 4 with different sample sizes until the desired power is achieved. The required sample size is n₁.
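The steps above can be sketched compactly for a normal outcome. The sketch below substitutes a conjugate normal approximation for the MCMC-based MAC analysis of Step 3, and the MAP prior parameters, within-trial SD, and treatment effect are all hypothetical; it is an illustration of the search logic, not the paper's implementation:

```python
import math
import random

rng = random.Random(7)

# Hypothetical Gaussian MAP prior for the control mean (Step 1 would
# normally derive this from the historical data via MCMC):
MAP_MEAN, MAP_SD = 0.0, 0.4
SIGMA = 1.0    # within-trial SD of the outcome (assumed known)
DELTA = 0.5    # prespecified treatment effect delta*
N_SIM = 2000   # simulations per candidate sample size

def simulate_power(n):
    """Steps 2-4: power of the MAP analysis with n subjects per arm."""
    rejections = 0
    for _ in range(N_SIM):
        theta0 = rng.gauss(MAP_MEAN, MAP_SD)  # Step 2: sample from the MAP prior
        ybar_c = rng.gauss(theta0, SIGMA / math.sqrt(n))          # control mean
        ybar_t = rng.gauss(theta0 + DELTA, SIGMA / math.sqrt(n))  # treatment mean
        # Step 3 (conjugate stand-in for the MAC analysis): combine the MAP
        # prior with the new control data; vague prior on the treatment arm.
        prec_c = 1.0 / MAP_SD**2 + n / SIGMA**2
        post_c_mean = (MAP_MEAN / MAP_SD**2 + n * ybar_c / SIGMA**2) / prec_c
        d_mean = ybar_t - post_c_mean
        d_sd = math.sqrt(1.0 / prec_c + SIGMA**2 / n)
        # Reject H0 if the 95% credible interval for delta* excludes zero:
        if abs(d_mean) > 1.959964 * d_sd:
            rejections += 1
    return rejections / N_SIM

# Step 5: increase the sample size until the desired power (80%) is reached.
n1 = 10
while simulate_power(n1) < 0.80:
    n1 += 5
```

Because the analysis borrows from the same prior used to sample θ₀*, the resulting n₁ is slightly smaller than the roughly 63 per arm that a traditional calculation would give for these values without borrowing.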
The alternative approach of specifying a fixed value for θ₀* would ignore the uncertainty in any estimate of θ₀*. The above algorithms also provide an estimate of the reduction in sample size achieved by using the historical data. With a balanced design, the number of subjects can be reduced by 2(n₀ − n₁) subjects, whereas the reduction in the number of subjects in the control arm is n₀ − n₂ in an unbalanced trial. In the above algorithms, a grid of candidate sample sizes is used and a corresponding grid of power values is derived via simulation. Linear interpolation is then used to derive the required sample size, that is, the minimum integer that achieves the desired statistical power (80%).
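The interpolation step can be sketched as follows; the grid of candidate sample sizes and simulated powers below is hypothetical:

```python
import math

def required_n(sizes, powers, target=0.80):
    """Smallest (linearly interpolated) sample size on a simulated power
    grid that reaches the target power, rounded up to an integer."""
    for (na, pa), (nb, pb) in zip(zip(sizes, powers), zip(sizes[1:], powers[1:])):
        if pa < target <= pb:
            # interpolate between the two bracketing grid points
            return math.ceil(na + (target - pa) / (pb - pa) * (nb - na))
    raise ValueError("target power not bracketed by the grid")

# Hypothetical grid of candidate sample sizes and simulated powers:
n_req = required_n(sizes=[40, 50, 60, 70], powers=[0.64, 0.73, 0.80, 0.86])
```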
The proposed approach is a hybrid frequentist-Bayesian approach. Power and power analysis are frequentist concepts, which can nevertheless be applied to Bayesian statistical methods. 9 Our approach is Bayesian in the analysis model and in the simulation of the control arm parameter, both using the MAP approach. However, the assumed treatment effect is chosen to be fixed (i.e., specified without uncertainty), which uses principles from frequentist statistics. The rationale for assuming a fixed treatment effect is that, in the context of dynamic borrowing of historical controls, no empirical information on the treatment effect is available. This means that the assumed treatment effect must be specified based on expert opinion or other sources of external information, for which the specification and elicitation of a sensible prior distribution would be difficult.

| Implementation
In the implementation of the above algorithms, several issues are worth elaborating on, including (1) the choice of N_sim, (2) the choice of a sensible prior for τ, and (3) the possibility of using the robust MAP prior in the analysis.
Ideally, it is favorable to choose N_sim as large as possible to obtain a precise power curve. However, a large N_sim can be highly computationally intensive. The central issue in the choice of N_sim is the Monte Carlo standard error (SE) of the power. 25 Given the desired Monte Carlo SE and the power, the required number of simulations is

N_sim = power × (1 − power) / SE².

We recommend a Monte Carlo SE of 1% or less, which would imply at least 1600 simulations for a power of 80%. In the examples in the next section, we chose N_sim = 5000 to achieve a more precise power estimate (Monte Carlo SE: 0.57%).
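Since the estimated power is a proportion of N_sim binary rejection outcomes, its Monte Carlo SE is sqrt(power × (1 − power) / N_sim), and the numbers in this paragraph can be checked directly:

```python
import math

def mc_se(power, n_sim):
    """Monte Carlo SE of a power estimated from n_sim binary outcomes."""
    return math.sqrt(power * (1.0 - power) / n_sim)

def n_sim_for_se(power, se):
    """Number of simulations needed to reach a given Monte Carlo SE."""
    return power * (1.0 - power) / se**2

n_needed = n_sim_for_se(power=0.80, se=0.01)   # 1600 simulations
se_5000 = mc_se(power=0.80, n_sim=5000)        # about 0.57%
```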
The estimation of the between-study standard deviation τ is of vital importance in the implementation of the MAP approach. If τ is estimated with a fully Bayesian approach, then its prior should be chosen carefully, especially if the number of historical studies is limited. 26 According to previous studies on Bayesian meta-analysis and the MAP approach, a sensible choice is a prior that puts a small probability (e.g., 5%) on values that correspond to substantial heterogeneity among the parameters, which implies almost no borrowing from historical data. 3,27,28 We recommend the use of a half-normal distribution, but other alternatives (e.g., the exponential distribution) could be used. 29 The specification of the reference value of τ that implies large between-study heterogeneity should be based on the within-study standard deviations of the historical studies, because the within-study to between-study variance ratio σ²/τ² determines the level of between-study heterogeneity. 5 A ratio σ²/τ² equal to or smaller than 1 should result in almost no borrowing of the historical data.

ALGORITHM 2 SAMPLE SIZE CALCULATION ALGORITHM FOR THE MAP APPROACH FOR A TWO-ARMED UNBALANCED TRIAL
1: Implement the MAP approach on the historical data to derive the prior for the control mean p(θ₀*).
2: Sample a value from p(θ₀*) and choose an initial value of the sample size per arm. Use the sampled value and the prespecified δ* to generate a realization of the new trial.
3: Analyze the simulated new trial data without incorporating the historical controls, and conduct a hypothesis test of δ*.
4: Repeat Steps 2 through 3 a large number of times (N_sim), and calculate the power.
5: Repeat Steps 2 through 4 with different sample sizes until the desired power is achieved; the required sample size is n₀.
6: Sample a value from p(θ₀*) and generate a realization of the new trial based on the sampled value and the prespecified δ*, with n₀ subjects in the treatment arm and a specific sample size for the control arm.
7: Apply a MAC analysis to the simulated new trial data in Step 6 by incorporating the available historical controls, and conduct a hypothesis test of δ*.
8: Repeat Steps 6 and 7 a large number of times (N_sim in this study; a different value is also possible), and calculate the power.
9: Repeat Steps 6 through 8 with different sample sizes for the control arm until the desired power is achieved; the required sample size for the control arm is n₂.
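As a concrete check of the "small tail probability" recommendation, the exceedance probability of a half-normal prior has a closed form; the scale 1 and reference value τ = 2 below match the log-odds example in the next section:

```python
import math

def halfnormal_tail(tau_ref, scale):
    """P(tau > tau_ref) when tau ~ half-normal with the given scale:
    2 * (1 - Phi(tau_ref / scale))."""
    phi = 0.5 * (1.0 + math.erf(tau_ref / scale / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)

# An HN(1) prior puts roughly 5% probability (4.55%) on tau > 2, a
# reference value for large heterogeneity on the log-odds scale:
p_tail = halfnormal_tail(tau_ref=2.0, scale=1.0)
```

In practice, the scale of the half-normal can be tuned until this tail probability at the chosen reference value is about 5%.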
In principle, our proposed approach should work for the robust MAP approach, but data simulation based on the vague prior of the robust component may be difficult if it is very diffuse or even improper. Therefore, our recommendation is that, if the robust MAP prior will be used in the analysis, it is only incorporated in Step 3 of Algorithm 1 and Step 7 of Algorithm 2. To sample parameters from p(θ₀*), the ordinary MAP approach could be used.
The newly proposed approach has been implemented with R and JAGS, and the syntax for the examples in the following section can be found on GitHub at https://github.com/QiHongchao/SampleSizeCalculation_MAP_publication.

| MOTIVATING EXAMPLES
In this section, we illustrate the proposed approach based on multiple motivating data sets with cross-sectional binary, cross-sectional normal, and longitudinal normal outcomes. In the analysis, we used the proposed algorithms to calculate the sample size of a new trial under different assumptions about the effect size based on the available historical trial data. In all the examples, the significance level was set to two-sided 5%, and the desired power was 80%. The purpose of including multiple values of the effect size was to assess the influence of the effect size on the calculated sample size.

| Steroid-resistant rejections after pediatric liver transplantation
Liver transplantation can be used to treat specific pediatric liver diseases, and effective and safe immunosuppression after the procedure is an indication of successful treatment. To achieve the desired immunosuppression, a number of immunosuppressant drugs have been developed, including Interleukin-2 receptor alpha (IL-2RA). 30 Crins et al. conducted a systematic review and meta-analysis of clinical trials of IL-2RA in pediatric liver transplantation with several outcomes of interest, for example, acute rejections, steroid-resistant rejections (SRRs), graft loss, and death. 31 Suppose that we are interested in designing a new trial to evaluate the effect of a new medication other than IL-2RA on the incidence of SRR; a total of three historical control arms [32][33][34] can then be included in the MAP analysis. The data sets were also used in Friede et al.'s study on the meta-analysis of few small studies. 26 Incidence rates and the corresponding 95% credible intervals (CIs) of SRR in the three historical control arms, together with the derived MAP prior, are presented in Figure 2.
In this example, three algorithms in the sample size calculation approach elaborated in Section 3.3, that is, "No borrowing" (Steps 1-5 in Algorithm 2), "MAP + balanced" (Algorithm 1), and "MAP + unbalanced" (Algorithm 2), were used. The shared characteristic of the three algorithms is that the sampling prior for the SRR rate in the new control arm, p(p₀*), is derived based on the MAP approach, while the difference is that the power for the treatment effect is based on different analysis models, namely (1) the stand-alone analysis of the new trial data, (2) the MAP analysis of the balanced new trial data, and (3) the MAP analysis of the unbalanced new trial data.

| Analysis
The MAP approach was first implemented for the log-odds of the historical control arm SRR rates to derive the informative prior for p₀*. For binary outcomes on the log-odds scale, a half-normal prior for the between-study standard deviation τ with scale parameter 1, which puts about 5% probability on τ larger than 2, has been recommended, 3 that is, HN(1). For the overall mean of the log-odds of the SRR rate, N(0, 2²) was used, as a standard deviation of 2 implies large between-study heterogeneity. 35 To derive the MAP prior for p₀*, four Markov chain Monte Carlo (MCMC) chains with 2 × 10⁵ iterations and 20% burn-in, that is, 4 × 10⁴ iterations, were run. All the MCMC chains achieved convergence according to the Gelman-Rubin statistic. 36 For the sample size calculation with the different algorithms, the new trial data were simulated with different sample sizes and treatment effects. The treatment effect was parameterized by the odds ratio (OR); an OR less than 1 implied a beneficial effect of the new treatment. The new trial data were simulated with OR = 0.6, 0.4, and 0.2 in this example. Five thousand simulations were done for each combination of candidate sample size and treatment effect. In the analysis of the new trial data, a N(0, 2²) prior was used for the log-odds of the SRR rate in the treatment arm. The same simulation settings as in the MAP prior derivation were also used in the analysis of the simulated new trial data.
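Simulating the treatment arm from a sampled control rate and a prespecified OR uses the log-odds relation logit(p₁) = logit(p₀) + log(OR). A minimal sketch, where the control rate 0.25 is hypothetical rather than a value from the SRR data:

```python
import math

def treatment_rate(p0, odds_ratio):
    """Event rate in the treatment arm implied by a control rate p0 and
    an odds ratio: logit(p1) = logit(p0) + log(OR)."""
    odds1 = p0 / (1.0 - p0) * odds_ratio   # shift the odds by the OR
    return odds1 / (1.0 + odds1)

# With a hypothetical sampled control SRR rate of 0.25, OR = 0.4 gives a
# treatment-arm rate of about 0.118:
p1 = treatment_rate(0.25, 0.4)
```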

| Results
The statistical power for the treatment effect corresponding to each combination of the sample size per arm (or the treatment-arm sample size for Algorithm 2) and the treatment effect was derived for "No borrowing," Algorithm 1, and Algorithm 2; the power curves of the three algorithms are shown in Figure 3.
The required sample sizes of the three sample size calculation algorithms, calculated based on linear interpolation (No borrowing: 2n0; Algorithm 1: 2n1; Algorithm 2: n0 + n2), the reductions in sample size of Algorithm 1 and Algorithm 2 (Algorithm 1: 2(n0 − n1); Algorithm 2: n0 − n2), and the relative reductions in sample size compared to 2n0 with different treatment effects are presented in Table 1. Note that the table presents the total sample sizes of the algorithms, whereas the power curves are based on the sample size per arm (or the treatment-arm sample size for Algorithm 2).
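The interpolation step can be sketched as follows (a helper we wrote for illustration; it assumes the simulated power curve is monotone over the interval that brackets the target power):

```python
import numpy as np

def required_n(candidate_n, power, target=0.80):
    """Interpolate a simulated power curve linearly to find the smallest
    sample size reaching the target power, rounded up to a whole subject.

    candidate_n: increasing grid of candidate sample sizes.
    power: simulated power at each candidate sample size.
    """
    candidate_n = np.asarray(candidate_n, dtype=float)
    power = np.asarray(power, dtype=float)
    if power.max() < target:
        raise ValueError("largest candidate sample size is still underpowered")
    # first grid point at/above the target power (at least 1 so we can bracket)
    i = max(np.searchsorted(power, target), 1)
    # invert the curve locally: n as a linear function of power
    n_star = np.interp(target, power[i - 1:i + 1], candidate_n[i - 1:i + 1])
    return int(np.ceil(n_star))
```

For example, with candidate per-arm sizes (50, 100, 150) and simulated powers (0.60, 0.78, 0.90), the 80% target falls between 100 and 150 and interpolates to 109 per arm.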
According to Table 1, the absolute reductions in sample size in Algorithm 1 and Algorithm 2 were negatively related to the effect size; that is, the larger the treatment effect, the smaller the reduction in sample size. In addition, Algorithm 2 led to a larger sample size reduction than Algorithm 1. In contrast to the absolute numbers, the relative reduction in sample size was positively related to the effect size in both algorithms. Finally, different versions of the prior ESS for the MAP prior were also calculated using the RBesT package in R.37

| Vascular endothelial growth factor inhibitors for wet age-related macular degeneration
In this subsection, the sample size calculation of a new trial with a normal outcome is considered. The motivating data are from a network meta-analysis of clinical trials for wet age-related macular degeneration (AMD) by Szabo et al.,38 which included five trials with a common control arm (i.e., Ranibizumab 0.5 mg monthly) and different treatment arms.39-42 The effects of different vascular endothelial growth factor inhibitors (e.g., Ranibizumab at different doses, Aflibercept) on wet AMD in terms of best-corrected visual acuity (BCVA) were evaluated in the trials. The parameter of interest was the change from baseline in BCVA at 3 months; the point estimates and the corresponding 95% CIs of the BCVA change in the control arms of the above trials, along with the MAP prior, are visualized in Figure 4.

| Analysis
The original scale of the outcome was used in this example. For the prior for the between-study standard deviation τ, a half-normal distribution with scale parameter σ/2, that is, HN(σ/2), is plausible,5,6,43 where σ is the within-study standard deviation of the primary outcome (the change in BCVA from baseline in the historical controls). An estimate of σ is needed to formulate the prior for τ based on multiple historical controls. For each historical control, σ was calculated by multiplying the standard error of the BCVA change (as reported in the original publications) by the square root of the sample size. Because the historical controls had similar standard deviations, with σ ≈ 12 in each study, the use of a pooled estimate was considered reasonable. Therefore, HN(σ/2) = HN(6) was used as the prior for τ in this study. The prior for the overall control mean (μ0) was N(0, 1 × 10⁶). In this example, a positive mean difference in BCVA between the treatment arm and the control arm implies remission of wet AMD, that is, a beneficial treatment. Therefore, the treatment effect was considered to be 2, 3, and 4 in the simulation of the new trial data. For the analysis of the new trial data, N(0, 1 × 10⁶) was used for the mean change in the treatment arm of the new trial (μ1). The algorithms and simulation settings used in the MAP prior derivation and the sample size calculation were the same as those in the previous example, and all MCMC chains achieved convergence. The required sample sizes of the three sample size calculation algorithms and the absolute/relative reductions in sample size of Algorithm 1 and Algorithm 2 with different treatment effects in this example are presented in Table 2.
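The back-calculation of σ from the reported summary statistics is a one-liner; the numbers below are hypothetical, chosen only to reproduce the σ ≈ 12 of this example (a reported standard error of 1.2 with 100 control subjects):

```python
import math

def within_study_sd(se, n):
    """Recover the within-study SD of the outcome from the reported
    standard error of the mean: sd = se * sqrt(n)."""
    return se * math.sqrt(n)

sigma = within_study_sd(se=1.2, n=100)  # 12.0
```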

| Results
The required sample sizes per arm derived from the "No borrowing" algorithm (n0) were equivalent to those derived from the standard sample size formula, that is, 2(zα/2 + zβ)²σ²/(δ*)², for the different treatment effects. As in the previous example, the absolute reductions in sample size were negatively related to the effect size, whereas the relative reductions showed the opposite relationship, and Algorithm 2 yielded a larger sample size reduction than Algorithm 1. Moreover, different versions of the prior ESS for the MAP prior derived from the historical controls were calculated using the RBesT package, yielding ESS_VR = 86, ESS_MTM = 604, and ESS_ELIR = 271; these also differed from the sample size reductions presented in Table 2.
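As a cross-check of the "No borrowing" benchmark, the classical formula can be evaluated directly. This is a sketch, not the paper's code; α = 0.05 and 80% power are the conventional choices, and σ = 12 and δ* = 4 are taken from this example:

```python
import math
from statistics import NormalDist

def two_arm_n_per_arm(sigma, delta, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-arm trial with a normal outcome:
    n = 2 * (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2, rounded up."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # e.g., 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)          # e.g., 0.84 for 80% power
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

n0 = two_arm_n_per_arm(sigma=12, delta=4)  # 142 subjects per arm
```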

| Progress of Alzheimer's disease with pharmaceutical intervention
In addition to cross-sectional outcomes, the algorithms could also be implemented in clinical trials with longitudinal outcomes and multiple parameters with some modifications.
Alzheimer's disease, a neurodegenerative disorder, is one of the most severe health problems in aging populations worldwide and the most common cause of dementia.44 Currently, no drugs are available for treating the behavioral and psychiatric symptoms that may develop in the moderate and severe stages of Alzheimer's disease.45 Therefore, the Alzheimer's Disease Cooperative Study (ADCS) was initiated to investigate potential treatments for these symptoms, comprising a number of RCTs evaluating compounds that may ease symptoms of Alzheimer's disease. In this study, we included six RCTs from the ADCS to illustrate the proposed algorithms. In the included trials, the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) score of the patients, which is considered the "gold standard" for assessing antidementia treatments,46 was measured before and 12 months after treatment. The trials all had a pretest-posttest design, and an analysis of covariance (ANCOVA) model can be used to estimate the treatment effect by adjusting for the baseline outcome. The model is given by

y_Fi = β0 + β1 y_Bi + δ trt_i + ε_i,

where y_Fi denotes the response of the ith subject at follow-up, β0 is the intercept, β1 is the effect of the pre-treatment measurement on the post-treatment measurement (i.e., the baseline effect), y_Bi denotes the baseline measurement of the ith subject, δ is the treatment effect, trt_i is the treatment assignment of the ith subject (trt_i = 0 for the control arm, and 1 for the treatment arm), and ε_i ~ N(0, σ²ε) is the random error. The forest plots of the estimates of the intercept and the baseline effect, along with the derived MAP priors, are presented in Figure 6.
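A minimal sketch of this ANCOVA model fitted by ordinary least squares. All numbers below are hypothetical, chosen only to resemble ADAS-Cog-scale data; the paper itself uses a Bayesian multivariate MAP analysis, not OLS:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical pretest-posttest data: y_F = b0 + b1 * y_B + delta * trt + eps
n = 200
y_B = rng.normal(25.0, 8.0, size=n)    # baseline ADAS-Cog-like scores
trt = np.repeat([0, 1], n // 2)        # 1:1 allocation
beta0, beta1, delta = 5.0, 0.8, -3.0   # assumed true parameter values
y_F = beta0 + beta1 * y_B + delta * trt + rng.normal(0.0, 6.0, size=n)

# ANCOVA fit via least squares on the design matrix [1, y_B, trt]
X = np.column_stack([np.ones(n), y_B, trt])
b0_hat, b1_hat, delta_hat = np.linalg.lstsq(X, y_F, rcond=None)[0]
```

The fitted delta_hat estimates the treatment effect adjusted for the baseline score, which is the quantity the MAP analysis borrows historical information for.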

| Analysis
The implementation of Algorithm 1 and Algorithm 2 is slightly different for ANCOVA models, in that a multivariate MAP approach is more suitable; see our previous study for details.29 Note that the focus of our previous study was the implementation of the MAP approach in ANCOVA models, whereas this study investigates the sample size calculation of a new trial analyzed with the MAP approach (regardless of the analysis model).
In this study, historical information on both the intercept and the baseline effect was incorporated. The intercepts and baseline effects of the historical controls and the new trial are assumed to come from a bivariate normal distribution with overall mean μ = (μ0, μ1) and between-study covariance matrix S_β. The prior for μ0 was N(0, 1 × 10⁴), and the prior for μ1 was N(0, 1 × 10²). A separation strategy (i.e., modeling the variances and correlations separately) was used to specify the prior for the between-study variance-covariance matrix S_β, because separate priors outperform the conventional inverse-Wishart prior owing to their robustness in estimation and their flexibility in incorporating prior information.47 The priors for τβ0 and τβ1 were HN(10) and HN(0.36), respectively, because the within-study standard deviations of the intercept and the baseline effect were 20 and 0.72. For the between-study correlation between β0 and β1, a Uniform(−1, 0) prior was used, as recommended by Burke et al.48 The prior for the error variances was Inverse-Gamma(10⁻³, 10⁻³).
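The separation strategy amounts to a reparameterization of the covariance matrix as S = D R D, with D a diagonal matrix of standard deviations and R a correlation matrix. A sketch with hypothetical values (τβ0 = 2, τβ1 = 0.5, ρ = −0.3):

```python
import numpy as np

def separation_cov(tau0, tau1, rho):
    """Build a 2x2 between-study covariance matrix from separately
    modeled standard deviations and a correlation: S = D @ R @ D."""
    D = np.diag([tau0, tau1])
    R = np.array([[1.0, rho], [rho, 1.0]])
    return D @ R @ D

S_beta = separation_cov(tau0=2.0, tau1=0.5, rho=-0.3)
# S_beta = [[4.0, -0.3], [-0.3, 0.25]]
```

Placing separate priors on tau0, tau1, and rho (here half-normals and a uniform) is what gives the approach its flexibility relative to a single inverse-Wishart prior on S.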
In the generation and analysis of the new trial data, the treatment effect δ* was considered to be −2, −3, and −4.

TABLE 2 The required sample sizes and the reductions in sample size (%) with different treatment effects in the wet AMD example with a desired power of 80%

| Results
Power curves of "No borrowing," Algorithm 1, and Algorithm 2 were derived for different combinations of treatment effect and sample size; they are shown in Figure 7.
The required sample size of "No borrowing," Algorithm 1, Algorithm 2, and the absolute and relative reductions in sample size of the latter two algorithms with different treatment effects in this example are presented in Table 3.
The relationships between the absolute/relative reductions in sample size and the effect size were in line with those in the previous examples, and Algorithm 2 again yielded a larger sample size reduction than Algorithm 1. The prior ESS was not calculated in this example, because a single prior ESS is not available for the multiple parameters of an ANCOVA model.

| Main findings
In the above examples, there are three main findings, namely (1) the absolute sample size reduction was consistently smaller with larger effect sizes; (2) the unbalanced design (Algorithm 2) yielded a larger sample size reduction than the balanced design (Algorithm 1) in all examples; and (3) the sample size reduction according to our method may differ substantially from the prior ESS. The explanations are as follows.
The size of the treatment effect determines the required sample size of the new trial; that is, a smaller treatment effect leads to a larger new trial, and this sample size can affect the estimation of τ. A larger simulated new trial data set with characteristics similar to the historical controls carries more weight in the updated MAP analysis, which can lead to a lower estimate of τ. As a result, the historical information is discounted less when τ is lower, and the sample size reduction is thus larger. The simulation results also show the influence of the effect size on the estimation of τ. Posterior means of τ based on Algorithm 1 (n1 subjects in both arms) and Algorithm 2 (n0 subjects in the treatment arm and n2 subjects in the control arm) were derived from 5000 simulations with different treatment effects in Sections 4.1 and 4.2; they are visualized in Figure 8.
According to Figure 8, the median of the posterior means of τ was positively related to the effect size in both examples. That is, a simulated new trial with a smaller effect size led to a lower estimate of τ, which implies that the historical information on the control parameter was downweighted less for smaller effect sizes. Therefore, the historical information was more concentrated and led to a larger sample size reduction.

The unbalanced design with fewer subjects in the control arm achieved a larger sample size reduction than the balanced design with equal numbers of subjects in both arms. Given the total sample size of a trial, the precision of the treatment effect estimate is maximized with equal numbers of subjects in both arms when historical information is ignored.14 However, when historical information is incorporated, a design with balanced randomization in the new trial is actually unbalanced in the total analysis. Given a specific sample size of the new trial, that is, n0 + n2, Algorithm 2 with an unbalanced design can lead to the most precise treatment effect estimate.
Regarding the comparison between the reduction in sample size and the prior ESS: although the two concepts are both closely related to sample size calculation, their calculations are based on different assumptions and methods. The approaches to the prior ESS considered in this study are all information-based,7,8,28 and they determine the prior ESS from the historical data without considering the new trial data. Despite these shared characteristics (i.e., information-based, independent of the new trial data), the approaches can yield "surprisingly different" results in nonconjugate settings,8 as also shown in Sections 4.1 and 4.2. In contrast, the reduction in sample size is calculated based on the desired statistical power, accounting for the (simulated) new trial data. Since the estimation of τ is affected by the treatment effect, it is unsurprising that the sample size reductions differ from the prior ESS.

| DISCUSSION
In this study, we have proposed a new approach to calculate the sample size of a new trial analyzed with the MAP approach. Within the approach, two algorithms, with either a balanced or an unbalanced design, are available. The advantage of the approach is that it determines the required sample size in the context of the MAP analysis in a straightforward and generic manner, and it can be applied in scenarios with a nonconjugate prior or even multiple parameters.
The proposed approach was developed based on the approach of Sutton et al., which calculates the sample size of a new trial based on the result of an updated meta-analysis that will include it.11 Sutton et al.'s approach is a useful tool for calculating the sample size of a new trial to improve the whole body of evidence in different types of clinical trials, for example, cluster randomized trials49 and diagnostic test studies.50 Because of its meta-analytic nature, we modified the approach to fit the context of the MAP analysis.
Although our approach is inspired by the approach mentioned above, it is intended for a very different context, namely dynamic borrowing of historical controls. In dynamic borrowing methods, the shrinkage estimate of the treatment effect in the new trial, rather than the overall estimate of the treatment effect, is of most interest. Dynamic borrowing methods are often more relevant than meta-analysis because clinical trials focusing on the same health outcome often share the same standard of care but less often the same new treatment strategy.1 Moreover, summarizing the treatment effect using a meta-analysis is not necessarily optimal, especially when meta-analyzing highly heterogeneous studies (e.g., due to variation in geographic areas, genetic patterns, and levels of socioeconomic development), which can lead to uninterpretable overall estimates.23 In addition, Sutton et al.'s approach with the random effects assumption may lead to an unrealistically large sample size,11,49,50 which is not a problem with our approach because it typically gives a smaller sample size than the conventional sample size calculation approach.
Both algorithms in the proposed approach can lead to a lower required sample size because historical control information is incorporated in the analysis using the MAP approach. The absolute sample size reduction decreased with the effect size of the treatment, which was caused by different estimates of the between-study heterogeneity in simulated new trials with different effect sizes. With more historical controls included, the estimation of τ may be less affected, and the effect size will have less impact on the absolute reduction in sample size.11 However, the number of historical controls with similar characteristics is limited in practice, and the sample size reduction is therefore likely to be influenced by the assumed effect size.
In contrast to the relationship between the effect size and the absolute reduction in sample size, the relative reduction (in percentage terms) is positively related to the effect size. In practice, the relative reduction can be used to guide decision making on the incorporation of historical information in the new trial. For instance, it may not be worth the effort to deal with the methodological and regulatory difficulties of incorporating historical information if the relative reduction is too low (e.g., less than 5%).
In this study, the reductions in sample size were larger with an unbalanced design, which makes the algorithm with an unbalanced design more favorable. This finding is in line with Pocock's finding that, given the total sample size, allocating more subjects to the treatment arm leads to more precise estimates than equal allocation when information from historical controls is incorporated.14 In fact, Algorithm 1 is unbalanced in terms of the amount of information, whereas Algorithm 2 is balanced with regard to information and can lead to the highest precision of the treatment effect estimate. Moreover, in clinical trials investigating the effect of a novel treatment on a health outcome, it is more sensible and ethical to assign sufficient participants to the treatment arm while saving sample size in the control arm based on the historical information.1,4 Previous studies on dynamic borrowing methods have also discussed and justified adaptive allocation rather than the conventional 1:1 allocation ratio.1,51

In addition to the ordinary MAP prior used to analyze the new trial in the proposed approach, a robust MAP prior with a robust component can be used to deal with potential prior-data conflict.4 In the simulation of the new trial data, the vague prior in the robust component may lead to unrealistic realizations of the new trial data. Therefore, it is sensible to use the ordinary MAP prior for the simulation and the robust MAP prior for the analysis of the new trial, which is in line with the proposal in previous studies to use separate priors for the simulation (i.e., the sampling prior) and the analysis (i.e., the fitting prior) of the new trial data.52-54 In some cases, instead of the one-stage MAC analysis, it is also possible to first approximate the MAP prior using a parametric mixture density and then perform the two-stage MAP analysis for the two algorithms using the RBesT package.
Although it may not always be feasible to approximate the MAP prior (for instance, in a multivariate case), the speed gained by a two-stage MAP analysis may outweigh the loss in generality in specific cases, for example, the univariate normal case.
Regarding the prior ESS, different approaches yielded different ESS values when the prior is nonconjugate, which is common in the MAP analysis. Unlike the sample size reduction calculated using the proposed algorithms, the calculation of the prior ESS is independent of the new trial data, which may not be sensible when the statistical power for the new treatment effect is of interest. The pattern of varying reductions in sample size found in this study has some similarities with the behavior of the ECSS,17 in that the ECSS varies with the level of prior-data conflict. However, the ECSS is calculated based on the similarity between a prespecified "true" new control parameter and the prior distribution, without considering the treatment effect, whereas in this study the new control parameter is sampled from the MAP prior. It is more reasonable to simulate the new trial data based on an informative prior from historical controls with similar characteristics than based on an assumed control parameter.
In the implementation of the proposed approach in this study, summary statistics were used for the univariate examples, whereas individual participant data (IPD) were used for the ADCS example. Note that IPD are not indispensable for the implementation if the sufficient statistics are available.55 In addition, it should be noted that the proposed approach only calculates the required sample size for a desired level of statistical power. If drop-out or other types of missing data are expected, it is recommended to adjust the calculated sample size upward to account for the missing data.
In summary, we have outlined a logical approach to determine the sample size of a new trial to be analyzed with the MAP approach in this study, and the unbalanced Algorithm 2 is recommended because it is ethically preferable and provides a greater reduction in sample size.
AUTHOR CONTRIBUTIONS Hongchao Qi: conceptualization, methodology, software, formal analysis, and writing original draft. Dimitris Rizopoulos: conceptualization, methodology, review and editing of the manuscript, and supervision. Joost van Rosmalen: conceptualization, methodology, review and editing of the manuscript, and supervision.