Regret bounds for Gaussian process bandit problems

Grunewalder, S., Audibert, J.-Y., Opper, M. and Shawe-Taylor, J. (2010) Regret bounds for Gaussian process bandit problems. In: Artificial Intelligence and Statistics (AISTATS), pp. 273-280.

Full text not available from this repository.

Abstract

Bandit algorithms are concerned with trading off exploration against exploitation when a number of options are available but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution for arms is modelled by a Gaussian process and there is no noise in the observed reward. Our main result bounds the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout, based on benign assumptions about the covariance function defining the Gaussian process. We further complement these upper bounds with corresponding lower bounds for particular covariance functions, demonstrating that in general there is at most a logarithmic looseness in our upper bounds.

Item Type: Contribution in Book/Report/Proceedings
Departments: Faculty of Science and Technology > Mathematics and Statistics
ID Code: 85099
Deposited By: ep_importer_pure
Deposited On: 07 Mar 2017 11:52
Refereed?: Yes
Published?: Published
Last Modified: 01 Jan 2020 05:52
URI: https://eprints.lancs.ac.uk/id/eprint/85099
