Temporal-difference Learning with Sampling Baseline for Image Captioning

Chen, Hui and Ding, Guiguang and Zhao, Sicheng and Han, Jungong (2018) Temporal-difference Learning with Sampling Baseline for Image Captioning. In: 32nd AAAI Conference on Artificial Intelligence 2018. AAAI, Palo Alto, pp. 6706-6713. ISBN 9781577358008

PDF (2018-4)
2018_4.pdf - Accepted Version
Available under License Creative Commons Attribution-NonCommercial.

Abstract

Existing image captioning methods usually train the language model with a cross-entropy loss, which leads to exposure bias and to a mismatch between the training objective and the evaluation metric. Recent research has shown that both issues can be addressed by policy gradient methods from reinforcement learning, because they can directly optimize discrete, non-differentiable evaluation metrics. In this paper, we use reinforcement learning to train the image captioning model. Specifically, we train the model to maximize the overall reward of the generated sentences using temporal-difference (TD) learning, which takes the correlation between temporally successive actions into account. To do so, we assign different values to the words of a sampled sentence through a discount coefficient when back-propagating the gradient with the REINFORCE algorithm, so that the correlation between actions can be learned. In addition, instead of estimating a "baseline" with a separate network to normalize the rewards, we use the reward of another Monte-Carlo sample as the baseline to reduce variance. We show that the proposed method improves the quality of the generated captions and outperforms state-of-the-art methods on the MS COCO benchmark across seven evaluation metrics.
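The two training ideas described in the abstract (per-word discounting of the sentence reward, and a sampled baseline) can be illustrated with a short sketch. It assumes the standard discounted return for a single terminal reward: the caption-level reward r (for example a CIDEr score) is credited to word t with weight gamma ** (T - 1 - t), and the baseline is the reward of a second caption sampled independently for the same image. The paper's exact discounting scheme may differ, and the function names below are hypothetical, not taken from the authors' code.

```python
from typing import List


def discounted_word_weights(num_words: int, gamma: float) -> List[float]:
    """Assumed TD-style credit: word t of a T-word caption gets gamma ** (T - 1 - t).

    With a single reward at the end of the caption, this is the discounted
    return seen from step t, so the last word gets weight 1 and earlier
    words get geometrically smaller weights.
    """
    return [gamma ** (num_words - 1 - t) for t in range(num_words)]


def td_sampling_baseline_loss(
    log_probs: List[float],
    reward: float,
    baseline_reward: float,
    gamma: float,
) -> float:
    """REINFORCE surrogate loss for one sampled caption (illustrative sketch).

    log_probs:       log p(w_t | w_<t, image) for each sampled word.
    reward:          metric score of the sampled caption.
    baseline_reward: metric score of a second, independently sampled caption
                     (the "sampling baseline"; no separate critic network).
    gamma:           discount coefficient in (0, 1].
    """
    # Positive when this sample beats the other sample, negative otherwise.
    advantage = reward - baseline_reward
    weights = discounted_word_weights(len(log_probs), gamma)
    # Minimizing this raises the log-probabilities of words in captions that
    # beat the baseline and lowers them otherwise, scaled per word by the discount.
    return -sum(w * advantage * lp for w, lp in zip(weights, log_probs))


if __name__ == "__main__":
    # Hypothetical numbers: a 3-word caption scoring 1.2 against a baseline sample scoring 0.8.
    loss = td_sampling_baseline_loss([-1.0, -2.0, -0.5], reward=1.2, baseline_reward=0.8, gamma=0.5)
    print(f"loss = {loss:.3f}")
```

In a real model the log-probabilities would be differentiable tensors, and the loss would be averaged over a batch before back-propagation. Using a second sample's reward as the baseline keeps the estimator unbiased (the baseline does not depend on the action being scored) while avoiding a learned value network, which is the trade-off the abstract highlights.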

Item Type:

Contribution in Book/Report/Proceedings

Subjects:

image captioning; reinforcement learning; LSTM

Departments:

Faculty of Science and Technology > School of Computing & Communications

ID Code:

123576

Deposited By:

ep_importer_pure

Deposited On:

22 Feb 2018 16:40

Refereed?:

Yes

Published?:

Published

Last Modified:

30 Jun 2026 20:56

URI:

https://eprints.lancs.ac.uk/id/eprint/123576
