Individual Q-learning in normal form games

Leslie, David S. and Collins, E. J. (2005) Individual Q-learning in normal form games. SIAM Journal on Control and Optimization, 44 (2). pp. 495-514. ISSN 0363-0129

Full text not available from this repository.

Abstract

The single-agent multi-armed bandit problem can be solved by an agent that learns the values of each action using reinforcement learning. However, the multi-agent version of the problem, the iterated normal form game, presents a more complex challenge, since the rewards available to each agent depend on the strategies of the others. We consider the behavior of value-based learning agents in this situation, and show that such agents cannot generally play at a Nash equilibrium, although if smooth best responses are used, a Nash distribution can be reached. We introduce a particular value-based learning algorithm, which we call individual Q-learning, and use stochastic approximation to study the asymptotic behavior, showing that strategies will converge to a Nash distribution almost surely in 2-player zero-sum games and 2-player partnership games. Player-dependent learning rates are then considered, and it is shown that this extension converges in some games for which many algorithms, including the basic algorithm initially considered, fail to converge.
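Because the full text is not hosted here, the following is only a minimal sketch of the kind of learner the abstract describes, not the paper's exact algorithm or conditions. Each player keeps values for its own actions only and plays a smooth best response to them. The assumptions are: a logit (Boltzmann) smooth best response with temperature `tau`; an update that moves only the played action's value and divides by that action's probability (an importance weight); step sizes `0.1 / n**rho`; and matching pennies as the 2-player zero-sum test game. The function names `boltzmann` and `individual_q_learning` are hypothetical.

```python
import numpy as np


def boltzmann(q, tau):
    """Logit (Boltzmann) smooth best response to the action values q."""
    z = (q - q.max()) / tau  # shift by the max for numerical stability
    p = np.exp(z)
    return p / p.sum()


def individual_q_learning(A, B, n_steps=20_000, tau=1.0, rho=0.7, seed=0):
    """Hypothetical sketch: each player learns values for its own actions only.

    A[a1, a2] and B[a1, a2] are the payoffs to players 1 and 2.
    Returns the final smooth-best-response strategies of both players.
    """
    rng = np.random.default_rng(seed)
    q1 = np.zeros(A.shape[0])
    q2 = np.zeros(A.shape[1])
    for n in range(1, n_steps + 1):
        pi1, pi2 = boltzmann(q1, tau), boltzmann(q2, tau)
        a1 = rng.choice(len(pi1), p=pi1)
        a2 = rng.choice(len(pi2), p=pi2)
        # Assumed step-size schedule: the sum diverges, the sum of squares
        # converges. The 0.1 factor keeps lam / pi below 1 in the 2x2 example.
        lam = 0.1 / n**rho
        # Only the played action's value moves. Dividing by its probability
        # (an assumed importance weight) makes the expected change of every
        # action value equal to lam * (expected reward - current value).
        q1[a1] += lam * (A[a1, a2] - q1[a1]) / pi1[a1]
        q2[a2] += lam * (B[a1, a2] - q2[a2]) / pi2[a2]
    return boltzmann(q1, tau), boltzmann(q2, tau)


# Matching pennies: a 2-player zero-sum game whose Nash distribution
# under logit responses is the uniform strategy for both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
p1, p2 = individual_q_learning(A, -A)
print(p1, p2)  # both expected to be close to [0.5, 0.5]
```

Here a Nash distribution means a pair of strategies in which each is the smooth best response to the other, which is why the limit is a fixed point of `boltzmann` rather than an exact Nash equilibrium. Player-dependent learning rates could be explored in the same sketch by giving each player its own `rho`; that variant is an illustration only and is not shown here.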

Item Type:

Journal Article

Journal or Publication Title:

SIAM Journal on Control and Optimization

Uncontrolled Keywords:

/dk/atira/pure/subjectarea/asjc/2600/2606

Subjects:

Control and Optimization; Applied Mathematics

Departments:

Faculty of Science and Technology > Mathematics and Statistics

ID Code:

70770

Deposited By:

ep_importer_pure

Deposited On:

12 Sep 2014 11:28

Refereed?:

Yes

Published?:

Published

Last Modified:

11 Dec 2025 00:13

URI:

https://eprints.lancs.ac.uk/id/eprint/70770