BinaryBandit : An Efficient Julia Package for Optimization and Evaluation of the Finite-Horizon Bandit Problem with Binary Responses

Jacko, Peter (2019) BinaryBandit : An Efficient Julia Package for Optimization and Evaluation of the Finite-Horizon Bandit Problem with Binary Responses. Working Paper. Lancaster University Management School, Lancaster.

[thumbnail of Jacko2019_binarybandit_wp]

Text (Jacko2019_binarybandit_wp)
Jacko2019_binarybandit_wp.pdf - Published Version
Download (665kB)

Abstract

Variants of the multi-armed bandit problem for design of sequential experiments have been studied in several disciplines for almost a century, but the performance evaluation of proposed designs or finding a Bayes-optimal design over the finite horizon has resisted derivation of a closed formulae. Computational optimization and evaluation is thus the only possible approach. The BinaryBandit package in Julia programming language has been developed to provide such framework with a number of designs, easily extendable to add new designs. The package is based on the use an efficient implementation of backward recursion which gives accurate (up to computer accuracy) evaluation for small and moderate horizons. For instance, on a standard laptop or desktop computer, the Bayes-optimal design for the two-armed problem can be computed for offline use or evaluated in online fashion in a few minutes (horizon around $ 1,000 $), in a few hours (horizon around $ 2,000 $), or in a few days (horizon around $ 4,000 $). 32GB of RAM allows storing (e.g., for offline use) of the whole design up to horizon around $ 1440 $; when its storing is not needed (e.g., for Bayesian evaluation or for calculation of the initial action) it allows up to horizon around $ 4440 $. These problems are significantly larger than what has been reported in the literature, since moderate and large horizons have only been evaluated by simulation, trading-off accuracy. This paper describes the details of the backward recursion implementation and gives an example of the package usage.

Item Type:

Monograph (Working Paper)

Subjects:

?? multi-armed bandit problemdesign of sequential experimentsbayesian decision theorydynamic programmingindex rulesresponse-adaptive randomizationjulia programming language ??

Departments:

Lancaster University Management School > Management Science

ID Code:

136340

Deposited By:

ep_importer_pure

Deposited On:

27 Aug 2019 12:25

Refereed?:

Published?:

Published

Last Modified:

12 Jul 2025 02:03

URI:

https://eprints.lancs.ac.uk/id/eprint/136340