Multi-arm clinical trials with treatment selection: what can be gained and at what price?

2015 With current success rates of confirmatory studies being only around 50%, new approaches to drug development are paramount. Many trials fail simply because ineffective treatments are identified too late. In this paper, we discuss the utility of multi-arm studies with treatment selection as a potential strategy that can reduce the high attrition rate. We illustrate the large gains in efficiency that are possible based on an example in Alzheimer’s disease while outlining the additional challenges that need to be overcome to implement such studies.

The development of medicinal products and health technologies is time consuming and very costly. In the context of pharmaceutical products, for example, it is estimated that the development of a novel product takes 10-15 years and costs several hundred million pounds on average [1]. Among the largest contributors to both time and cost are confirmatory (Phase III) clinical trials that often involve thousands of patients with follow-up periods frequently lasting years [2]. In recent years, however, around 50% of confirmatory clinical trials have failed to show a beneficial effect or been rejected at regulatory submission [3], resulting in a large number of participants in these trials being exposed to an ineffective or even harmful treatment while at the same time costing substantial amounts of money. The situation within Phase II studies is even worse, with only 18% of these studies progressing a drug candidate into Phase III trials [4]. As a result of these shockingly high failure rates, alternative approaches to drug development are being explored. In this paper, we describe the advantages and additional complications of multi-arm studies that select/drop treatments during the conduct of the study. We begin with a description of the designs followed by relevant additional (practical) aspects that need to be considered before embarking on such a design. We then provide an illustrative example highlighting the efficiency gained by these approaches on the basis of trials in Alzheimer's disease before we finish with some general conclusions.

An overview of different types of multi-arm studies
Multi-arm studies
A multi-arm study is a study that compares several experimental treatments against a common control group. An immediate advantage of such an approach over separate two-arm studies is that only a single control group is used. As a consequence, a patient's chance of receiving an experimental treatment is increased, which, it has been argued, could help with recruiting patients to such studies [5,6]. Additionally, such studies allow a fair, contemporary comparison of different experimental treatments as the comparisons are made against the same control group and under a single protocol so that relevant features of the study, such as inclusion/exclusion criteria, are the same.

Multi-arm studies with treatment selection
One of the drawbacks of traditional two-armed studies is that they do not allow for early conclusions (for better or for worse) about the treatment. To overcome this, group-sequential designs that allow the study to stop early, either because the evidence is already sufficient to claim superiority of the treatment over control or because it is unlikely to reach such a claim, have been developed [7,8] and are now routinely used in practice.
Figure 1. A multi-arm study which selects all promising treatments at interim analysis. Three analyses are planned for a study with four experimental treatments versus control. At the first interim analysis, treatments 1 and 3 are dropped from the study as they are below the futility threshold. At the second interim analysis, the second test statistic exceeds the upper bound so that superiority of treatment 2 over control can be concluded and the study can be stopped.
Clinical Trial Methodology Jaki
In the same spirit, multi-arm studies can be made more efficient by adding interim analyses that allow early stopping, either because the evidence collected is already sufficient to conclude that one or more treatments are superior to control or because none of the experimental treatments looks sufficiently promising. Additionally, interim analyses can be used to select which treatment(s) warrant further experimentation. Typical selection rules select the best performing or the k-best treatments [9,10], select any treatments that are close to the best performing one [11] or select any treatment that looks promising [12,13]. Figure 1 shows an example of such a design where all promising treatments continue in the study. In this example, two interim analyses and a final analysis are planned. After sufficient patients have been recruited for the first interim analysis, test statistics comparing each experimental treatment to control are computed. In the fictitious example, two of the statistics, corresponding to experimental treatments 1 and 3, fall below the lower bound, indicating that no further experimentation is warranted on these treatments, and consequently these arms are dropped from the study. The remaining two test statistics, corresponding to treatments 2 and 4, are above the lower threshold but not above the upper bound. Therefore, additional information is required on these arms (plus control) to reach a definite conclusion. More patients are consequently recruited to these treatments and control. At the second interim analysis, the test statistic comparing experimental treatment 2 to control exceeds the upper boundary and hence superiority of treatment 2 over control can be claimed and the trial can be stopped.
Because treatments are removed from consideration early and the trial can be stopped before the maximum number of patients are recruited, the required sample size of such studies will typically be smaller than that of a multi-arm study without treatment selection. It is, however, possible that the realized sample size is in fact larger than that of a multi-arm study without selection. In particular, in the unlikely case that no treatment arm can be dropped early and no early claim of superiority is possible, the sample size will be larger. This is because allowing for an early claim of superiority also gives additional opportunities to wrongly make such a claim. To counteract such mistakes and to ensure that the overall type I error of the procedure is controlled, more stringent critical values than the ones utilized for multi-arm studies without interim analyses are required. The impact of allowing arms to be dropped, on the other hand, is an increase in type II error if the sample size is kept the same.
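The inflation that the more stringent critical values guard against can be illustrated with a small simulation. The following is a minimal sketch under simplifying assumptions that are not part of the designs discussed here: a single experimental arm versus control, one interim analysis at the half-way point, no futility bound and the same critical value at both looks. The function name `rejection_rate` is hypothetical, and the constant 2.178 (the Pocock critical value for two equally spaced looks at an overall two-sided 5% level) is used purely for illustration.

```python
import random
from statistics import NormalDist


def rejection_rate(critical_value, n_sim=200_000, seed=1):
    """Estimate P(reject at interim or final look) when the treatment is ineffective."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        z_stage1 = rng.gauss(0.0, 1.0)  # standardized statistic from first-half data
        z_stage2 = rng.gauss(0.0, 1.0)  # independent statistic from second-half data
        z_interim = z_stage1
        # Equal stage sizes: the final statistic pools both halves with equal weight.
        z_final = (z_stage1 + z_stage2) / 2 ** 0.5
        if z_interim > critical_value or z_final > critical_value:
            rejections += 1
    return rejections / n_sim


# Reusing the single-look critical value (about 1.96) at both analyses.
naive_rate = rejection_rate(NormalDist().inv_cdf(0.975))
# A more stringent common critical value (Pocock constant, illustrative assumption).
adjusted_rate = rejection_rate(2.178)

print(f"unadjusted critical value: {naive_rate:.3f}")
print(f"more stringent critical value: {adjusted_rate:.3f}")
```

Under these assumptions the unadjusted rule rejects noticeably more often than the nominal one-sided 2.5%, while the more stringent value brings the error rate back to roughly the nominal level.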
To design a multi-arm study with treatment selection, two different statistical approaches can be utilized. The so-called 'pre-planned' adaptive designs [9,10,12,14], which are extensions of group-sequential methods, require specification of how treatments will be selected (e.g., select the best treatment or any treatment surpassing a predetermined threshold) while 'fully flexible' adaptive designs [15,16] do not require such pre-specification. The cost of the additional flexibility of the latter approach is a potential loss in efficiency (typically power is lowered by a few percent [17]). In either case, we believe that it is paramount that the overall type-I error is controlled [18]. Adding additional treatments would otherwise increase the chance of finding an effect (even when all are truly ineffective). As a consequence, in a trial without overall type-I-error control, one could simply include numerous doses of the same treatment in a study to be almost certain to show that the treatment has an effect. In current practice, overall type-I-error control is, however, not always adhered to [19].
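How quickly the chance of a false finding grows with the number of arms can be sketched with a short simulation. This is an illustration under assumptions not taken from the paper: a single-stage design, equal allocation, known variance and every experimental arm truly ineffective. The function name `familywise_error` is hypothetical, and the Bonferroni adjustment is shown only as one simple way of restoring control.

```python
import random
from statistics import NormalDist


def familywise_error(n_arms, critical_value, n_sim=100_000, seed=2):
    """Estimate P(at least one false claim) when all arms are truly ineffective."""
    rng = random.Random(seed)
    trials_with_false_claim = 0
    for _ in range(n_sim):
        control_mean = rng.gauss(0.0, 1.0)  # standardized mean of the shared control arm
        # Each comparison uses the same control, so the statistics are correlated.
        # The difference of two unit-variance means has variance 2.
        if any(
            (rng.gauss(0.0, 1.0) - control_mean) / 2 ** 0.5 > critical_value
            for _ in range(n_arms)
        ):
            trials_with_false_claim += 1
    return trials_with_false_claim / n_sim


unadjusted = NormalDist().inv_cdf(1 - 0.025)
fwer_by_arms = {k: familywise_error(k, unadjusted) for k in (1, 3, 10)}
bonferroni_three_arms = familywise_error(3, NormalDist().inv_cdf(1 - 0.025 / 3))

for k, rate in fwer_by_arms.items():
    print(f"{k} arm(s), unadjusted: {rate:.3f}")
print(f"3 arms, Bonferroni-adjusted: {bonferroni_three_arms:.3f}")
```

With one arm the estimate sits near the nominal 2.5%; it rises as arms are added and returns to at most about 2.5% once the critical value is adjusted.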
Besides the added efficiency that will be illustrated below, a further advantage of a multi-arm study with treatment selection, illustrated in Figure 2, is that one can use an interim analysis to mark the end of Phase II and beginning of Phase III thereby allowing a seamless Phase II/III study which removes the white space between the phases. Although this may not always be desirable [20], it can reduce the time of drug development notably.

Adaptive multi-arm studies with treatment selection
When performing an interim analysis for treatment selection, it is natural to also consider other adaptations. Most commonly, sample size reassessment is of interest. Such an adaptation uses the accumulated trial data to verify assumptions made about factors of the study and potentially adjusts the sample size required based on these new estimates. Typically, a factor that is not of primary interest (e.g., variability of the end point) is estimated based on the accumulated trial data and, if the estimate deviates from the value assumed on initiation of the study, the new estimate is used to update the sample size required. Including such additional modifications in the study is straightforward if fully flexible designs are used. For pre-planned adaptive designs, the conditional error approach [21,22] can be used to incorporate such additional adaptations. Using a pre-planned design and making it flexible is, however, only advisable when such adaptations are unplanned as a fully flexible design will typically be more efficient otherwise [23]. For an easy to follow overview of fully flexible adaptive design ideas, see [24].
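The mechanics of updating the sample size from a revised variability estimate can be sketched as follows. This is a deliberately simplified illustration rather than one of the methods cited above: it uses the standard normal-approximation formula for a single two-arm comparison, ignores multiplicity and the effect of the adaptation on the type-I error, and never reduces the sample size below the original plan. All numerical inputs and the function names are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def per_arm_sample_size(sd, delta, alpha=0.025, power=0.90):
    """Normal-approximation sample size per arm for a one-sided two-arm comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


def reestimated_sample_size(planned_sd, interim_sd, delta):
    """Recompute the per-arm sample size from an interim SD estimate.

    Assumption: the sample size may be increased but is never reduced below the plan.
    """
    planned_n = per_arm_sample_size(planned_sd, delta)
    updated_n = per_arm_sample_size(interim_sd, delta)
    return max(planned_n, updated_n)


planned_n = per_arm_sample_size(sd=6.0, delta=2.0)
larger_sd_n = reestimated_sample_size(planned_sd=6.0, interim_sd=7.0, delta=2.0)
smaller_sd_n = reestimated_sample_size(planned_sd=6.0, interim_sd=5.0, delta=2.0)

print(planned_n, larger_sd_n, smaller_sd_n)  # 190 258 190
```

Because the required sample size scales with the variance, a modest underestimate of the standard deviation at the planning stage translates into a substantial increase in the number of patients needed.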

An example of a multi-arm study with treatment selection
In the above section, we have described the general concept of a multi-arm study with treatment selection and argued that such studies are an efficient way to investigate several different treatments against a common control group. In this section, we will give a numerical illustration of the gains possible, based on trials in Alzheimer's disease. A review in Alzheimer's disease published in 2010 [25] found that a large number of different treatments were being tested. No fewer than 13 Phase III studies were ongoing, with most using traditional two-arm designs, meaning that equally many control groups were being used. Although not all treatments target the same mechanism of action and hence are not immediately comparable, there are still three or four treatments being evaluated when only considering treatments that target the same mechanism of action. In this illustration we will compare the sample size requirements of different strategies to evaluate three experimental treatments in Alzheimer's disease. The first strategy evaluates all three experimental treatments in three distinct two-arm trials while the second utilizes separate group-sequential designs with triangular stopping boundaries [26]. The third strategy evaluates all three experimental treatments in a single multi-arm study without selection, while a multi-arm design with selection of all promising treatments [27] is used as the final alternative.
For this hypothetical example, we will use the design parameters used in the recently completed LADDER trial [28]. The primary end point of interest is the change from baseline in the 11-item Alzheimer's Disease Assessment Scale-cognitive subscale [29] at week 24 and we model the outcome as normally distributed. In line with [28], we assume a standard deviation of 6 in the primary outcome and that a 2-point difference is considered a clinically relevant effect. A one-sided type-I error of 2.5% and power of 90% are used. Table 1 provides the (maximum) sample sizes of the four different strategies to evaluate three experimental treatments. We consider conducting three separate two-armed trials, three group-sequential trials as well as multi-arm trials with and without selection and use equal allocation of patients to all arms. Calculations were performed using the R package MAMS [30]. The group-sequential design and the multi-arm trial with selection each use one interim analysis conducted at the half-way point of the study. As group-sequential designs and multi-arm designs with treatment selection offer the opportunity to stop early, the expected sample sizes when no treatment is better than control and when exactly one treatment is superior to control are also provided. The (maximum) sample size of using separate trials to evaluate the experimental treatments is larger than the sample size required if a multi-arm trial design (with or without selection) is used. This is despite the fact that no attempt has been made to correct for multiplicity when using separate studies. The sample size of three separate single-stage trials when using a Bonferroni correction to ensure overall type-I-error control (something that the multi-arm designs discussed here provide automatically) is 1464 patients, about 50% larger than the multi-arm strategies.
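The Bonferroni figure quoted above can be reproduced with the textbook normal-approximation formula for a two-arm comparison of means, n per arm = 2(z_{1-α} + z_{1-β})²σ²/δ². The sketch below is an illustration only: it is not the MAMS calculation used for Table 1, and the function name `two_arm_total` is hypothetical. The inputs (standard deviation 6, difference 2, one-sided 2.5% type-I error, 90% power, three trials) are those stated in the text.

```python
from math import ceil
from statistics import NormalDist


def two_arm_total(sd, delta, alpha, power):
    """Total patients (both arms) in a single-stage two-arm trial, one-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    per_arm = ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)
    return 2 * per_arm


N_TRIALS = 3  # one separate trial per experimental treatment

# No multiplicity correction: each trial is tested at the full one-sided 2.5% level.
unadjusted_total = N_TRIALS * two_arm_total(6.0, 2.0, alpha=0.025, power=0.90)
# Bonferroni correction: the 2.5% level is split equally across the three trials.
bonferroni_total = N_TRIALS * two_arm_total(6.0, 2.0, alpha=0.025 / N_TRIALS, power=0.90)

print(unadjusted_total, bonferroni_total)  # 1140 1464
```

Under this approximation each Bonferroni-corrected trial needs 244 patients per arm, which gives the 1464 patients in total reported above.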
When acknowledging that a multi-arm design with treatment selection is expected to drop at least some arms, we find the advantage of the multi-arm design to be even larger. With only around 640 patients expected to be required before a definitive conclusion is reached, a multi-arm strategy is clearly more efficient than conducting separate studies.

Practical considerations for multi-arm studies
Clearly there is a substantial efficiency advantage in using a multi-arm study with treatment selection instead of conducting several separate trials. The above arguments, though statistically correct, are however somewhat over-enthusiastic. This is because some additional considerations and administrative hurdles need to be overcome to benefit from these, potentially large, gains. First, there are considerations that come from the desire to evaluate several experimental treatments against a common control and, second, there are considerations that only apply because interim analyses are used for treatment selection. The latter are by and large similar to challenges encountered in two-arm group-sequential designs.
The first challenge introduced by comparing multiple arms is that different trials comparing a single treatment against control are often initiated and conducted by different centres. As a result, they have different inclusion and exclusion criteria, may use different primary and secondary end points and possibly a different comparator treatment. All of these must be standardized for a multi-arm trial, which requires negotiations and compromises between investigators. Since a multi-arm trial operates as a single trial under one protocol, all treatments in the study need to be available at the same time to ensure contemporary evaluation. Additionally, a multi-arm study implicitly assumes that all experimental treatments start on an equal footing and hence such studies will only be efficient if there is no reason to believe that one treatment will have a better chance of yielding an improvement over control than any other.
A second challenge is to ensure that no bias in the evaluation is introduced in multi-center multi-arm studies through imbalances between allocations to treatments at different centers/regions. It is therefore paramount that randomization to all arms (including the control arm) is stratified by center or region to ensure that the risk of bias is minimized.
The third challenge concerns the analysis of such studies. At the end of a multi-arm study, estimating the effect of the best experimental treatment is often of main interest. Using standard analysis methods for this purpose will result in an over-enthusiastic (upward biased) estimate of the effect [31]. Specialized methods that lead to unbiased estimators [31] or reduce the bias [32] are therefore necessary for analyzing such  [33,34].
The first important consideration when allowing interim analyses for treatment selection is that the maximum sample size required will be larger than for a multi-arm study without selection (although the expected number of patients is typically notably smaller). Even though the increase in maximum sample size is typically small, recruiting the maximum number of patients still needs to be possible and investigators need to be prepared to recruit that many patients in the unlikely event that no treatment can be dropped early and no early claim of superiority is possible.
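The upward bias of the standard estimate for the best-performing arm, raised under the third challenge above, can be made concrete with a short simulation. This is a minimal sketch under assumptions chosen for illustration: a single-stage study with three experimental arms and a shared control, all treatments truly ineffective, a known standard deviation of 6, a hypothetical 190 patients per arm and an outcome scaled so that larger values are better. The function name `simulate_selection_bias` is hypothetical.

```python
import random
from statistics import fmean


def simulate_selection_bias(n_arms=3, n_per_arm=190, sd=6.0, true_effect=0.0,
                            n_sim=50_000, seed=3):
    """Return the average error of (a) the best-looking arm and (b) a pre-specified arm."""
    rng = random.Random(seed)
    se_arm_mean = sd / n_per_arm ** 0.5  # standard error of a single arm mean
    best_estimates = []
    first_arm_estimates = []
    for _ in range(n_sim):
        control_mean = rng.gauss(0.0, se_arm_mean)
        # Naive effect estimate per arm: arm mean minus shared control mean.
        estimates = [rng.gauss(true_effect, se_arm_mean) - control_mean
                     for _ in range(n_arms)]
        best_estimates.append(max(estimates))      # arm chosen because it looks best
        first_arm_estimates.append(estimates[0])   # arm chosen before seeing the data
    return fmean(best_estimates) - true_effect, fmean(first_arm_estimates) - true_effect


bias_best, bias_prespecified = simulate_selection_bias()
print(f"bias when reporting the best arm: {bias_best:.2f}")
print(f"bias when reporting a pre-specified arm: {bias_prespecified:.2f}")
```

In this setting the estimate for a pre-specified arm is unbiased, whereas the estimate for the arm that happens to look best is too optimistic on average, even though no treatment has any effect.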
Second, in order to observe (notable) reductions in the sample size required, the end point utilized for treatment selection (typically the primary end point or some short-term surrogate) needs to be available quickly relative to the recruitment rate. The reason for this is that patients will continue to be randomized to each arm while the data to make the interim treatment selection are being collected. In the most extreme case, therefore, all patients could already be recruited by the time the information from assessed patients is available for making the treatment selection decision.
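The interplay between recruitment rate and end point delay amounts to simple arithmetic, sketched below. Apart from the 24-week end point taken from the example above, every number is a hypothetical value chosen for illustration (maximum sample size, interim sample size and weekly recruitment rates), recruitment is assumed to proceed at a constant rate, and the function name `recruited_at_decision` is likewise hypothetical.

```python
def recruited_at_decision(max_patients, interim_patients, weekly_recruitment,
                          endpoint_delay_weeks):
    """Patients already randomized when the interim end point data become available.

    Recruitment continues while the last interim patient is followed up, so the
    overrun is the recruitment rate multiplied by the end point delay.
    """
    overrun = weekly_recruitment * endpoint_delay_weeks
    return min(max_patients, interim_patients + overrun)


# Hypothetical study: at most 800 patients, interim analysis on the first 400.
slow_recruitment = recruited_at_decision(800, 400, weekly_recruitment=5,
                                         endpoint_delay_weeks=24)
fast_recruitment = recruited_at_decision(800, 400, weekly_recruitment=20,
                                         endpoint_delay_weeks=24)

print(slow_recruitment)  # 520: arms can still be dropped before 280 patients are randomized
print(fast_recruitment)  # 800: everyone is recruited before the selection decision
```

With slow recruitment most of the potential saving is retained, whereas with fast recruitment the interim analysis can no longer reduce the number of patients randomized.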
Third, the organization of interim analyses must be efficient, with data monitoring and statistical analysis done to tight deadlines, as delays in the selection decision reduce the benefit of such a design as argued above. Additional resources may therefore be required to allow quick decision making as well as to ensure that blinding and trial integrity are maintained. To achieve this, efficient communication between the investigators, data management and statisticians is essential.
A fourth consideration concerns communicating the more complex design of a multi-arm study with treatment selection to both patients and investigators. In particular, the informed consent procedure requires careful consideration as patients need to be fully aware of all possibilities. In the STAMPEDE trial, for example, a two-part patient information sheet was used. Information on all arms was provided to all patients while further details on the allocated arms were made available after randomization [35].
The final challenge concerns planning and ensuring treatment supply. The maximum drug supply is uncertain as arms can be stopped prior to the end of the study. Although the same issue is present in group-sequential designs, the additional arms make this challenge more pronounced in multi-arm studies with treatment selection. Accurate planning and in particular precise estimation of recruitment rates, particularly for multi-center studies (e.g., [36]), are paramount.

Discussion
Multi-arm studies with treatment selection are an efficient means for drug development when several potentially useful treatments are available for testing, and a number of different studies are now being run under this framework in a variety of disease areas [37-39]. In this paper, we have not only highlighted the potential gains that are possible when using such an approach but also discussed the additional complexities such designs bring with them. While we have kept the illustration simple, it should be noted that in-depth evaluations of the design alternatives, usually via simulations, are crucial when deciding which design is best. To support the implementation of these ideas, various software solutions exist. Commercial software packages such as Add-Plan [40] or EAST [41] provide tools to design, simulate and analyze such studies. Additionally, the add-on packages MAMS [30] and asd [42] for the statistical software R [43] are freely available.

Future perspective
Multi-arm designs will become more widely used in the future as an efficient tool to make evidence-based decisions about different licensed treatments. Additionally, their use during the development of novel treatments will increase as familiarity with these ideas rises.