Yarahmadi, Amin and Jacko, Peter and Glazebrook, Kevin (2023) Stochastic Models for Dynamic Resource Allocation. PhD thesis, Lancaster University.
Abstract
Determining the efficacy of a novel intervention is vital before making it available to the public. The standard equal fixed randomisation procedure in the design of (static) experiments leads to an unbiased Maximum Likelihood Estimator (MLE) for each intervention. However, this approach results in a heavily suboptimal cumulative reward. It also imposes limitations in some situations, especially for rare diseases, where it is desirable to design a clinical trial on a small number of subjects while treating them as well as possible. This motivates the use of response-adaptive procedures, in which the allocation ratios to each arm can be skewed toward the better-performing intervention as subject responses become available. Hence, we consider the Bayesian Beta-Bernoulli finite-horizon two-armed bandit problem with binary responses and the objective of maximising the Bayes-expected total number of subject successes in the trial, which we call the subject benefit. Dynamic programming (DP), in a memory-efficient implementation, is used as the solution method for the proposed model to derive the randomised designs. Regardless of the type of randomisation procedure, the MLE is computed in a frequentist way at the end of the trial using the DP-based solutions. We first evaluate the bias of the MLE and show that it is unacceptably high and variable due to the model's adaptiveness. We propose a new augmented estimator that aims to mitigate the estimation bias while the DP actions are deterministic. Moreover, by modifying the allocation decision at every time step, we introduce two novel allocation procedures that mitigate the bias induced by the DP procedure: (i) DP using an augmented estimator, which adds a number of pseudo-successes to the worse-performing intervention, and (ii) a randomised DP procedure, which perturbs the Bayes-optimal allocation decision with a given probability.
Lastly, another DP design is proposed that sets an interim analysis in the middle of the trial, for which some novel and non-trivial stopping criteria have been developed. The interim look can be implemented either in the simulation step alone or, identically, in both the DP procedure and the simulation step. We evaluate the proposed designs via extensive simulation studies in a broad range of scenarios. This thesis addresses some key issues in the trade-off between reducing the estimation bias and improving the subject benefit in bandit models, a trade-off that can be regarded as a limitation preventing bandit models from being implemented in practice.
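As a rough illustration of the dynamic programming formulation described in the abstract, the following minimal sketch computes the Bayes-expected number of successes (the subject benefit) for a two-armed Beta-Bernoulli bandit over a finite horizon. It is not the thesis implementation: the function name `expected_successes`, the uniform Beta(1, 1) priors, and the memoised recursion (chosen for brevity rather than memory efficiency) are all assumptions made here for illustration.

```python
from functools import lru_cache


def expected_successes(horizon, prior=(1, 1, 1, 1)):
    """Bayes-expected total successes under the Bayes-optimal allocation.

    `prior` holds hypothetical Beta parameters (a1, b1, a2, b2) for the two
    arms; uniform Beta(1, 1) priors are an assumption of this sketch.
    """
    a1, b1, a2, b2 = prior

    @lru_cache(maxsize=None)
    def value(s1, f1, s2, f2):
        # State: successes and failures observed so far on each arm.
        remaining = horizon - (s1 + f1 + s2 + f2)
        if remaining == 0:
            return 0.0
        # Posterior mean success probability of each arm.
        p1 = (a1 + s1) / (a1 + b1 + s1 + f1)
        p2 = (a2 + s2) / (a2 + b2 + s2 + f2)
        # Expected reward-to-go when allocating the next subject to arm 1 or 2.
        q1 = p1 * (1 + value(s1 + 1, f1, s2, f2)) + (1 - p1) * value(s1, f1 + 1, s2, f2)
        q2 = p2 * (1 + value(s1, f1, s2 + 1, f2)) + (1 - p2) * value(s1, f1, s2, f2 + 1)
        # Bellman equation: take the better of the two allocations.
        return max(q1, q2)

    return value(0, 0, 0, 0)


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, expected_successes(n))
```

Under uniform priors, equal fixed randomisation has a Bayes-expected subject benefit of half the horizon, so comparing that baseline with the value returned here gives a sense of the gain available from adaptive allocation in this simplified setting.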