Alsomali, Mohammad and Rodrigues-Filho, Roberto and Soriano Marcolino, Leandro and Porter, Barry (2024) An Online Incremental Learning Approach for Configuring Multi-arm Bandits Algorithms. In: ECAI: European Conference on Artificial Intelligence. (In Press)
m2033.pdf - Accepted Version
Available under License Creative Commons Attribution-NonCommercial.
Abstract
This paper introduces Dynamic Bayesian Optimisation for Multi-Arm Bandits (DBO-MAB), an algorithm that dynamically adapts the hyperparameters of multi-arm bandit algorithms using incremental Bayesian optimisation. DBO-MAB addresses the challenge of tuning hyperparameters in uncertain and dynamic environments, particularly for applications such as web server optimisation. It uses a dynamic range adjustment approach based on the interquartile mean (IQM) of observed rewards to focus the search space on promising regions. Evaluated across diverse static and dynamic environments, DBO-MAB outperforms state-of-the-art algorithms such as Bootstrapped UCB and f-Discounted-Sliding-Window Thompson Sampling, reducing average response time by approximately 55%.
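
The abstract does not spell out how the IQM drives the range adjustment, so the following is only a minimal sketch of the general idea, not the paper's method. It computes an interquartile mean by dropping the lowest and highest quarter of the sorted rewards (rounded down), then applies a hypothetical narrowing rule: keep the part of the search range spanned by hyperparameter values whose reward is at least the IQM. The names `interquartile_mean`, `narrow_range`, and `trials`, and the narrowing rule itself, are assumptions made for illustration.

```python
from statistics import mean


def interquartile_mean(values):
    """Mean of the middle half of the values.

    After sorting, the lowest and highest quarter (rounded down) are dropped.
    This is a simple variant; exact IQM definitions may weight boundary
    values fractionally when the count is not divisible by four.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    cut = len(ordered) // 4
    return mean(ordered[cut:len(ordered) - cut])


def narrow_range(trials, low, high):
    """Hypothetical range adjustment (an assumption, not the paper's rule).

    trials is a list of (hyperparameter_value, reward) pairs. The range
    [low, high] is shrunk to span the values whose reward is at least the
    IQM of all observed rewards, clipped to the original bounds.
    """
    threshold = interquartile_mean([reward for _, reward in trials])
    good = [value for value, reward in trials if reward >= threshold]
    return max(low, min(good)), min(high, max(good))


# Illustrative data only: eight (value, reward) observations.
trials = [(0.1, 1), (0.5, 5), (0.9, 6), (1.5, 7),
          (2.0, 2), (0.7, 100), (1.2, 3), (1.8, 4)]
new_low, new_high = narrow_range(trials, 0.0, 2.0)
```

Because the IQM discards the extremes, the single outlying reward of 100 in the illustrative data does not pull the threshold upward, which is the usual motivation for preferring it over a plain mean when rewards are noisy.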