‘Until it bores me’: Learning Progress Maximization as the Reward Mechanism to solve the Exploration-Exploitation Dilemma in Infants

Altmann, E. C. and Bazhydai, Marina and Westermann, Gert (2021) ‘Until it bores me’: Learning Progress Maximization as the Reward Mechanism to solve the Exploration-Exploitation Dilemma in Infants. In: Development in Motion Conference 2021, 2021-06-22 - 2021-06-24, Online.

Image (Altmann, Bazhydai, Westermann (2021) DevMoCon Poster)
Altmann_Bazhydai_Westermann_2021_DevMoCon_Poster.png - Published Version

Abstract

Infants explore the world to learn about it based on their intrinsically motivated curiosity. However, the mechanisms underlying such exploratory behavior are largely unknown. We propose a new theory in which active learners explore randomly until encountering a familiar entity (e.g., a second stimulus from a previously encountered category), because learning is suddenly maximized at that point. Such a category will then be exploited as long as the learning progress is above an individually varying ‘boredom threshold’. Above this threshold, learning is rewarding, positively reinforcing exploitation. Below this threshold, the learning progress is too small to be rewarding, and learners will return to random exploration. The threshold itself can be lowered through inhibition, allowing sustained attention despite smaller learning progress. Here, we will first test this theory in a gaze-contingent eye-tracking task: 10-month-old infants will be introduced to two novel stimulus categories with 30 exemplars each (Fribbles, TarrLab). Two identical “houses” will be presented on a computer screen, and a new exemplar from either category will be revealed when the infant fixates on the corresponding house. This design will enable us to distinguish between exploration (switching from one category to the other) and exploitation (consecutively triggering exemplars from the same category). In follow-on studies we will test older children as well as adults, who will be able to trigger exemplar presentations via key presses. Across age groups, we will measure the number, speed, and sequence of trigger events, as well as the switches between categories. We hypothesize that if a category has been triggered twice, it is more likely to be triggered again: the first two triggers establish familiarity and allow for learning, which will be rewarding and will reinforce further exploitation. While the length of ‘exploitation runs’ may differ between participants (representing varying boredom thresholds), constant switching between categories is unlikely because it inhibits maximized learning.
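
The explore-exploit rule described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' model: the abstract gives no formula for learning progress, so the geometric `learning_progress` curve, the `decay` and `threshold` values, and all function names are hypothetical. The sketch generates a sequence of category triggers for two categories and then derives the two measures the abstract mentions, the lengths of exploitation runs and the number of category switches.

```python
import random
from itertools import groupby


def learning_progress(exposures, decay=0.7):
    """Hypothetical learning-progress curve (an assumption, not from the abstract).

    A first exposure yields no progress (the category is still novel), the
    second exposure yields maximal progress, and progress then shrinks
    geometrically with every further exposure.
    """
    if exposures < 2:
        return 0.0
    return decay ** (exposures - 2)


def simulate(n_trials=30, threshold=0.3, decay=0.7, seed=0):
    """Simulate one learner choosing between categories "A" and "B".

    The learner explores at random; once the progress for the category just
    seen exceeds the boredom threshold, it keeps exploiting that category
    until progress falls to or below the threshold.
    """
    rng = random.Random(seed)
    exposures = {"A": 0, "B": 0}
    exploiting = None  # category currently being exploited, if any
    sequence = []
    for _ in range(n_trials):
        if exploiting is not None:
            category = exploiting
        else:
            category = rng.choice(["A", "B"])  # random exploration
        exposures[category] += 1
        sequence.append(category)
        # Continue with this category only while learning is still rewarding.
        if learning_progress(exposures[category], decay) > threshold:
            exploiting = category
        else:
            exploiting = None
    return sequence


def run_lengths(sequence):
    """Lengths of consecutive same-category runs."""
    return [len(list(group)) for _, group in groupby(sequence)]


def count_switches(sequence):
    """Number of times consecutive triggers come from different categories."""
    return sum(a != b for a, b in zip(sequence, sequence[1:]))


if __name__ == "__main__":
    seq = simulate()
    print("".join(seq))
    print("run lengths:", run_lengths(seq))
    print("switches:", count_switches(seq))
```

In this sketch a lower `threshold` stands in for a learner who tolerates smaller learning progress before returning to random exploration. Because exposures are counted cumulatively per category, a category that has become boring stays boring, which is a simplification of this illustration.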

Item Type: Contribution to Conference (Poster)
Journal or Publication Title: Development in Motion Conference 2021
ID Code: 156675
Deposited By:
Deposited On: 30 Jun 2021 16:00
Refereed?: Yes
Published?: Published
Last Modified: 22 Oct 2021 00:07