Bandit learning in concave N-player games

Bravo, Mario and Leslie, David Stuart and Mertikopoulos, Panayotis (2018) Bandit learning in concave N-player games. In: NeurIPS Proceedings. UNSPECIFIED, CAN, pp. 5661-5671. ISBN 9781-5108-84472

PDF (BravoLeslieMertikopoulosNIPS2018.pdf)
Main.pdf - Accepted Version
Available under License Creative Commons Attribution-NonCommercial.

Abstract

This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players' behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.
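As a rough illustration of the kind of process the abstract describes, the following minimal sketch runs projected gradient ascent (mirror descent with a Euclidean regularizer) driven by a single-point payoff-based gradient estimate. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-player quadratic game in `payoff`, the action set [0, 1], the step-size schedule `step0 / n`, the query-radius schedule `radius0 / n ** (1 / 3)`, and the function names. Each player observes only their own realized payoff, never a gradient.

```python
import random


def payoff(i, x):
    """Hypothetical toy payoff of player i: concave in the player's own action x[i]."""
    j = 1 - i
    return -(x[i] - 0.5 * x[j] - 0.25) ** 2


def clip(value, low, high):
    """Euclidean projection of a scalar onto the interval [low, high]."""
    return max(low, min(high, value))


def bandit_gradient_play(rounds=20000, step0=1.0, radius0=0.25, seed=0):
    """Two players on [0, 1] learn from realized payoffs only (illustrative sketch)."""
    rng = random.Random(seed)
    pivot = [0.9, 0.1]  # arbitrary starting actions
    for n in range(1, rounds + 1):
        step = step0 / n                  # assumed step-size schedule
        radius = radius0 / n ** (1 / 3)   # assumed query-radius schedule
        # Keep pivots far enough inside [0, 1] that perturbed actions stay feasible.
        pivot = [clip(p, radius, 1.0 - radius) for p in pivot]
        # Each player independently perturbs their pivot by +radius or -radius.
        signs = [rng.choice((-1.0, 1.0)) for _ in pivot]
        played = [p + radius * z for p, z in zip(pivot, signs)]
        # Bandit feedback: only the payoff at the played action profile is seen.
        rewards = [payoff(i, played) for i in range(2)]
        # Single-point estimate of each player's own payoff gradient.
        grads = [r * z / radius for r, z in zip(rewards, signs)]
        # Ascent step on the estimated gradient; projection happens next round.
        pivot = [p + step * g for p, g in zip(pivot, grads)]
    return [clip(p, 0.0, 1.0) for p in pivot]


if __name__ == "__main__":
    print(bandit_gradient_play())
```

In this toy game the best responses are x_i = 0.5 x_j + 0.25, so the unique Nash equilibrium is (0.5, 0.5), and the game is strictly monotone because the Jacobian of the players' payoff gradients is negative definite. Under these assumptions the returned actions should end up close to that point; the sketch makes no claim about matching the paper's algorithm or rates in detail.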

Item Type: Contribution in Book/Report/Proceedings
Departments: Faculty of Science and Technology > Mathematics and Statistics
ID Code: 128155
Deposited By: ep_importer_pure
Deposited On: 12 Oct 2018 14:16
Refereed?: Yes
Published?: Published
Last Modified: 10 Dec 2025 19:22
URI: https://eprints.lancs.ac.uk/id/eprint/128155
