Optimality of LSTD and its relation to MC

Grunewalder, Steffen and Hochreiter, Sepp and Obermayer, Klaus (2007) Optimality of LSTD and its relation to MC. In: International Joint Conference on Neural Networks, 2007. IJCNN 2007. IEEE. ISBN 9781424413799

Abstract

In this analytical study we compare the risk of the Monte Carlo (MC) and the least-squares TD (LSTD) estimator. We prove that for the case of acyclic Markov Reward Processes (MRPs) LSTD has minimal risk for any convex loss function in the class of unbiased estimators. When comparing the Monte Carlo estimator, which does not assume a Markov structure, and LSTD, we find that the Monte Carlo estimator is equivalent to LSTD if both estimators have the same amount of information. Theoretical results are supported by an empirical evaluation of the estimators.
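
To make the two estimators concrete, the following is a minimal sketch in Python, not code from the paper. It assumes a small hypothetical acyclic MRP with three states (A = 0, B = 1, C = 2), tabular (one-hot) features, no discounting, and a hand-picked set of four episodes; the function names `mc_estimate` and `lstd_estimate` and the episode format are illustrative assumptions. Under these assumptions LSTD reduces to solving the empirical Bellman equations, whereas MC averages the returns observed from each state.

```python
import numpy as np


def mc_estimate(episodes, n_states):
    """Monte Carlo: average the undiscounted return observed from each state."""
    totals = np.zeros(n_states)
    counts = np.zeros(n_states)
    for episode in episodes:
        ret = 0.0
        # Walk backwards so `ret` is the return from the current state onward.
        for state, reward, _next_state in reversed(episode):
            ret += reward
            totals[state] += ret
            counts[state] += 1
    return totals / np.maximum(counts, 1)


def lstd_estimate(episodes, n_states):
    """Tabular LSTD (gamma = 1): solve A v = b with one-hot features.

    A accumulates phi(s) (phi(s) - phi(s'))^T and b accumulates phi(s) * r.
    Assumes every state is visited at least once, so A is invertible
    for an acyclic process.
    """
    A = np.zeros((n_states, n_states))
    b = np.zeros(n_states)
    for episode in episodes:
        for state, reward, next_state in episode:
            A[state, state] += 1.0
            if next_state is not None:  # None marks termination
                A[state, next_state] -= 1.0
            b[state] += reward
    return np.linalg.solve(A, b)


# Hypothetical data: each transition is (state, reward, next_state).
# A and B both lead to C; C terminates with reward 0 or 1.
episodes = [
    [(0, 0.0, 2), (2, 1.0, None)],  # A -> C -> end, reward 1
    [(1, 0.0, 2), (2, 0.0, None)],  # B -> C -> end, reward 0
    [(1, 0.0, 2), (2, 0.0, None)],  # B -> C -> end, reward 0
    [(1, 0.0, 2), (2, 1.0, None)],  # B -> C -> end, reward 1
]

v_mc = mc_estimate(episodes, n_states=3)
v_lstd = lstd_estimate(episodes, n_states=3)
print("MC:  ", v_mc)
print("LSTD:", v_lstd)
```

On this toy data MC estimates the value of A as 1, using only the single episode that started in A, while LSTD estimates it as 0.5 because the Markov structure lets it reuse all four visits to C. The sketch only illustrates how the two estimators use the data differently; it says nothing about risk or optimality, which the paper treats analytically.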

Item Type: Contribution in Book/Report/Proceedings
ID Code: 85096
Deposited By:
Deposited On: 07 Mar 2017 11:28
Refereed?: Yes
Published?: Published
Last Modified: 16 Jul 2024 03:41