Glazebrook, K D and Minty, R J (2009) A generalised Gittins index for a class of multi-armed bandits with general resource requirements. Mathematics of Operations Research, 34. pp. 26-44. ISSN 0364-765X
Abstract
We generalise classical multi-armed bandits to allow for the distribution of a (fixed amount of a) divisible resource among the constituent bandits at each decision point. Bandit activation consumes amounts of the available resource, which may vary by bandit and state. Any collection of bandits may be activated at any decision epoch, provided they do not consume more resource than is available. We propose suitable bandit indices that reduce to those proposed by Gittins [Gittins, J. C. 1979. Bandit processes and dynamic allocation indices (with discussion). J. R. Statist. Soc. B41 148–177] for the classical models. The index that emerges is an elegant generalisation of the Gittins index, which measures in a natural way the reward earnable from a bandit per unit of resource consumed. The paper discusses both how such indices may be computed and how they may be used to construct heuristics for resource distribution. We also describe how to develop bounds on the closeness to optimality of index heuristics and demonstrate a form of asymptotic optimality for a greedy index heuristic in a class of simple models. A numerical study testifies to the strong performance of a weighted index heuristic.
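As an illustration only, the sketch below shows one way a greedy index heuristic of the kind the abstract mentions might select bandits at a single decision epoch. It assumes the index of each bandit in its current state has already been computed and is supplied as input; the function name `greedy_index_allocation`, its parameters, and the rule of skipping a bandit that does not fit (rather than stopping at it) are hypothetical choices made here, not the definitions used in the paper.

```python
from typing import List, Sequence


def greedy_index_allocation(
    indices: Sequence[float],
    costs: Sequence[float],
    budget: float,
) -> List[int]:
    """Pick bandits to activate at one decision epoch (illustrative sketch).

    indices[i] : assumed precomputed index of bandit i in its current state
                 (reward earnable per unit of resource consumed)
    costs[i]   : resource consumed if bandit i is activated in that state
    budget     : total resource available at this epoch
    """
    # Consider bandits in decreasing order of index.
    order = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)

    active: List[int] = []
    remaining = budget
    for i in order:
        # Assumption: a bandit that does not fit is skipped, not a stopping point.
        if costs[i] <= remaining:
            active.append(i)
            remaining -= costs[i]
    return active


# Three bandits, 3 units of resource: bandit 2 (index 3.0) is taken first,
# bandit 0 no longer fits, bandit 1 uses the last unit.
chosen = greedy_index_allocation([2.0, 1.5, 3.0], [2.0, 1.0, 2.0], 3.0)
print(chosen)  # [2, 1]
```

The selected set never consumes more than the available resource, which matches the feasibility condition stated in the abstract; how the indices themselves are computed, and the weighted variant of the heuristic, are left to the paper.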