Filtered Poisson process bandit on a continuum

Grant, James and Szechtman, Roberto (2021) Filtered Poisson process bandit on a continuum. European Journal of Operational Research. ISSN 0377-2217

Text (FPPBanditEJOR-9)
FPPBanditEJOR_9.pdf - Accepted Version
Available under License Creative Commons Attribution-NonCommercial-NoDerivs.


Abstract

We consider a version of the continuum-armed bandit where an action induces a filtered realisation of a non-homogeneous Poisson process. Point data in the filtered sample are then revealed to the decision-maker, whose reward is the total number of revealed points. Using knowledge of the function governing the filtering, but without knowledge of the Poisson intensity function, the decision-maker seeks to maximise the expected number of revealed points over T rounds. We propose an upper confidence bound algorithm for this problem utilising data-adaptive discretisation of the action space. This approach enjoys \tilde{O}(T^{2/3}) regret under a Lipschitz assumption on the reward function. We provide lower bounds on the regret of any algorithm for the problem, via new lower bounds for related finite-armed bandits, and show that the orders of the upper and lower bounds match up to a logarithmic factor.
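To make the setting in the abstract concrete, the sketch below simulates one round of the observation model and runs a simple UCB learner on it. Everything specific here is a hypothetical illustration, not taken from the paper. The action and location space is assumed to be [0, 1], and the intensity function `intensity` and filtering function `filter_prob` are made up. The learner is plain UCB on a fixed uniform grid of K ≈ (T / log T)^{1/3} arms, the usual grid size for Lipschitz continuum-armed bandits. It is not the authors' data-adaptive discretisation, and it uses only the counts of revealed points, not their locations or the known filtering function. The confidence bonus is a heuristic Bernstein-style bound for Poisson counts. The filtered process is sampled by Lewis-Shedler thinning: candidate points come from a homogeneous Poisson process with rate `LAMBDA_MAX`, and each is kept with probability intensity(z) / LAMBDA_MAX times filter_prob(x, z).

```python
import math
import random

# Hypothetical intensity on [0, 1]; unknown to the learner in the paper's setting.
def intensity(z):
    return 20.0 + 15.0 * math.sin(3.0 * z)

LAMBDA_MAX = 35.0  # upper bound on intensity(z) over [0, 1], needed for thinning

# Hypothetical filtering function: probability that a point at location z
# is revealed when action x is played (known to the learner in the paper).
def filter_prob(x, z, width=0.2):
    return math.exp(-abs(x - z) / width)

def sample_revealed_points(x, rng):
    """Simulate one round: filtered non-homogeneous Poisson points via thinning."""
    points = []
    z = rng.expovariate(LAMBDA_MAX)  # first candidate of a rate-LAMBDA_MAX process
    while z <= 1.0:
        # Thinning (keep with prob intensity/LAMBDA_MAX) combined with
        # filtering (reveal with prob filter_prob) in a single Bernoulli draw.
        if rng.random() < intensity(z) / LAMBDA_MAX * filter_prob(x, z):
            points.append(z)
        z += rng.expovariate(LAMBDA_MAX)  # next candidate location
    return points

def ucb_index(total, n, t):
    """Heuristic Bernstein-style upper confidence bound for a Poisson mean."""
    mean = total / n
    log_t = math.log(t)
    return mean + math.sqrt(2.0 * (mean + 1.0) * log_t / n) + log_t / n

def uniform_grid_ucb(T, seed=0):
    """UCB on a fixed uniform grid (not the paper's adaptive discretisation)."""
    rng = random.Random(seed)
    K = max(1, round((T / math.log(T + 1)) ** (1 / 3)))  # number of grid arms
    arms = [(k + 0.5) / K for k in range(K)]  # midpoints of K equal cells
    counts = [0] * K
    sums = [0.0] * K
    total_reward = 0
    for t in range(1, T + 1):
        if t <= K:
            k = t - 1  # initialise: play each arm once
        else:
            k = max(range(K), key=lambda j: ucb_index(sums[j], counts[j], t))
        reward = len(sample_revealed_points(arms[k], rng))  # number of revealed points
        counts[k] += 1
        sums[k] += reward
        total_reward += reward
    return arms, counts, total_reward

if __name__ == "__main__":
    arms, counts, total = uniform_grid_ucb(2000)
    for a, c in zip(arms, counts):
        print(f"arm {a:.3f}: played {c} times")
    print("total revealed points:", total)
```

Because the reward depends on the action only through the integral of filter_prob(x, z) times intensity(z) over z, a learner that also used the revealed locations and the known filtering function could share information across arms. The simplified grid version above deliberately leaves that out.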

Item Type:
Journal Article
Journal or Publication Title:
European Journal of Operational Research
Uncontrolled Keywords:
ASJC subject code 1802 (Decision Sciences area, 1800)
ID Code:
153070
Deposited On:
23 Mar 2021 13:30
Refereed?:
Yes
Published?:
Published
Last Modified:
24 Jun 2021 04:01