Construction of approximation spaces for reinforcement learning

Böhmer, Wendelin and Grunewalder, Steffen and Shen, Yun and Musial, Marek and Obermayer, Klaus (2013) Construction of approximation spaces for reinforcement learning. Journal of Machine Learning Research, 14. pp. 2067-2118. ISSN 1532-4435

Full text not available from this repository.

Abstract

Linear reinforcement learning (RL) algorithms like least-squares temporal difference learning (LSTD) require basis functions that span approximation spaces of potential value functions. This article investigates methods to construct these bases from samples. We hypothesize that an ideal approximation space should encode diffusion distances and that slow feature analysis (SFA) constructs such spaces. To validate our hypothesis we provide theoretical statements about the LSTD value approximation error and the induced metric of approximation spaces constructed by SFA and by the state-of-the-art methods Krylov bases and proto-value functions (PVF). In particular, we prove that SFA minimizes the average (over all tasks in the same environment) bound on the above approximation error. Compared to other methods, SFA is very sensitive to sampling and can sometimes fail to encode the whole state space. We derive a novel importance sampling modification to compensate for this effect. Finally, the LSTD and least-squares policy iteration (LSPI) performance of approximation spaces constructed by Krylov bases, PVF, SFA and PCA is compared in benchmark tasks and a visual robot navigation experiment (both in a realistic simulation and with a robot). The results support our hypothesis and suggest that (i) SFA provides subspace-invariant features for MDPs with self-adjoint transition operators, which allows strong guarantees on the approximation error, (ii) the modified SFA algorithm is best suited for LSPI in both discrete and continuous state spaces and (iii) approximation spaces encoding diffusion distances facilitate LSPI performance.
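The pipeline the abstract describes (construct a basis from samples, then run LSTD on it) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes NumPy, a hypothetical 20-state chain with a symmetric random walk (a self-adjoint transition operator), a hypothetical reward for ending a transition in the right-most state, one-hot state encodings, and a constant feature appended to the slow features. It implements plain linear SFA (whitening followed by an eigendecomposition of the covariance of temporal differences) and omits the article's importance sampling modification.

```python
import numpy as np


def random_walk(n_states, n_steps, rng):
    """Sample a symmetric random walk on a chain with reflecting ends.

    The transition matrix is symmetric, i.e. self-adjoint with respect to
    the uniform stationary distribution (hypothetical toy environment).
    """
    s = np.empty(n_steps + 1, dtype=int)
    s[0] = rng.integers(n_states)
    steps = rng.choice([-1, 1], size=n_steps)
    for t in range(n_steps):
        s[t + 1] = min(max(s[t] + steps[t], 0), n_states - 1)
    return s


def linear_sfa(X, n_features):
    """Plain linear SFA on a time series X of shape (T, d).

    Returns the mean, a projection W and the slowness values, such that
    (X - mean) @ W has zero mean, unit variance and uncorrelated outputs,
    ordered from slowest (smallest mean squared temporal difference) up.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    keep = evals > 1e-10 * evals.max()          # drop zero-variance directions
    S = evecs[:, keep] / np.sqrt(evals[keep])   # whitening matrix
    dZ = np.diff(Xc @ S, axis=0)                # temporal differences of whitened data
    slowness, R = np.linalg.eigh(dZ.T @ dZ / len(dZ))  # eigenvalues in ascending order
    return mean, S @ R[:, :n_features], slowness[:n_features]


def sfa_basis(states, n_states, n_features):
    """Feature map: constant feature plus SFA outputs of one-hot states."""
    onehot = np.eye(n_states)
    mean, W, _ = linear_sfa(onehot[states], n_features)

    def features(s):
        slow = (onehot[s] - mean) @ W
        return np.column_stack([np.ones(len(slow)), slow])

    return features


def lstd(phi, phi_next, rewards, gamma):
    """LSTD(0): solve Phi^T (Phi - gamma * Phi') theta = Phi^T r."""
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)


n_states, gamma = 20, 0.9
rng = np.random.default_rng(0)
s = random_walk(n_states, 20_000, rng)
# Hypothetical task: reward 1 for every transition that ends in the right-most state.
r = (s[1:] == n_states - 1).astype(float)

# Tabular (one-hot) LSTD estimate for reference.
onehot = np.eye(n_states)
v_tab = lstd(onehot[s[:-1]], onehot[s[1:]], r, gamma)

# LSTD on a small SFA basis: constant + 5 slowest features.
features = sfa_basis(s, n_states, n_features=5)
theta = lstd(features(s[:-1]), features(s[1:]), r, gamma)
v_sfa = features(np.arange(n_states)) @ theta

print(np.round(v_tab, 2))
print(np.round(v_sfa, 2))
```

With `n_features = n_states - 1`, the constant plus the slow features span the same space as the one-hot encoding, so LSTD returns the same values as the tabular estimate. With fewer features, LSTD is restricted to the slowest directions, which on this chain tend to be smooth, low-frequency functions of the state; comparing `v_sfa` with `v_tab` shows what that restriction costs on this toy task.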

Item Type:

Journal Article

Journal or Publication Title:

Journal of Machine Learning Research

Subjects:

reinforcement learning, diffusion distance, proto-value functions, slow feature analysis, least-squares policy iteration, visual robot navigation, artificial intelligence, software, statistics and probability, control and systems engineering

Departments:

Faculty of Science and Technology > Mathematics and Statistics

ID Code:

76773

Deposited By:

ep_importer_pure

Deposited On:

24 Nov 2015 09:04

Refereed?:

Yes

Published?:

Published

Last Modified:

15 Jul 2024 15:36

URI:

https://eprints.lancs.ac.uk/id/eprint/76773