Hickman, Xavier and Lu, Yang and Prince, Daniel (2025) Hybrid Safe Reinforcement Learning : Tackling Distribution Shift and Outliers with the Student-t’s Process. Neurocomputing. ISSN 0925-2312 (In Press)
Abstract
Safe reinforcement learning (SRL) aims to optimise control policies that maximise long-term reward while adhering to safety constraints. SRL has many real-world applications, such as autonomous vehicles, industrial robotics, and healthcare. Recent advances in offline reinforcement learning (RL), where agents learn policies from static datasets without interacting with the environment, have made it a promising approach for deriving safe control policies. However, offline RL faces significant challenges, such as covariate shift and outliers in the data, which can lead to suboptimal policies. Similarly, online SRL, which derives safe policies through real-time environment interaction, struggles with outliers and often relies on unrealistic regularity assumptions, limiting its practicality. This paper addresses these challenges by proposing a hybrid offline-online approach. First, prior knowledge from offline learning guides online exploration. Then, during online learning, we replace the popular Gaussian Process (GP) with the Student-t's Process (TP) to enhance robustness to covariate shift and outliers.
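To illustrate the core substitution the abstract describes, the sketch below shows Student-t Process regression in the form given by Shah et al. (2014): the posterior mean coincides with the GP posterior, but the predictive covariance is rescaled by a data-dependent factor, so outlying observations inflate uncertainty rather than being absorbed. This is a minimal illustration, not the authors' implementation; the RBF kernel, noise level, degrees of freedom, and toy data are assumptions made for the example.

```python
# Minimal Student-t Process (TP) regression sketch, following Shah et al. (2014).
# Illustrative only: kernel, hyperparameters, and data are assumed, not taken
# from the paper.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row-wise inputs A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def tp_posterior(X, y, X_star, nu=5.0, noise=1e-2):
    """Posterior mean and covariance of a zero-mean TP with nu > 2 dof.

    The mean matches the GP posterior; the covariance is rescaled by
    (nu + beta - 2) / (nu + n - 2) with beta = y^T K^{-1} y, so gross
    outliers (large beta) widen predictive error bars instead of being
    silently absorbed as under a GP.
    """
    n = len(y)
    K = rbf_kernel(X, X) + noise * np.eye(n)
    K_s = rbf_kernel(X, X_star)
    K_ss = rbf_kernel(X_star, X_star)
    K_inv_y = np.linalg.solve(K, y)
    mean = K_s.T @ K_inv_y
    beta = float(y @ K_inv_y)
    cov_gp = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    cov = (nu + beta - 2.0) / (nu + n - 2.0) * cov_gp
    return mean, cov  # predictive is multivariate Student-t with nu + n dof

# Toy usage: a single corrupted observation inflates the TP's uncertainty.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)
y[0] += 5.0  # inject an outlier
X_star = np.linspace(-3, 3, 50)[:, None]
mu, cov = tp_posterior(X, y, X_star)
print(mu[:3], np.sqrt(np.diag(cov))[:3])
```

Because the TP predictive distribution has heavy tails and a covariance that responds to how surprising the observed data are, it degrades more gracefully than a GP when the online data drift away from the offline dataset or contain outliers, which is the robustness property the abstract highlights.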