In All Likelihoods: Robust Selection of Pseudo-Labeled Data

Rodemann, Julian and Jansen, Christoph and Schollmeyer, Georg and Augustin, Thomas (2023) In All Likelihoods: Robust Selection of Pseudo-Labeled Data. In: Proceedings of the Thirteenth International Symposium on Imprecise Probabilities: Theories and Applications (ISIPTA '23). PMLR, pp. 412-425.

Full text not available from this repository.

Abstract

Self-training is a simple yet effective method in semi-supervised learning. Its rationale is to iteratively augment the training data by adding pseudo-labeled data. Its generalization performance depends heavily on how these pseudo-labeled data are selected (pseudo-label selection, PLS). In this paper, we make PLS more robust to the modeling assumptions involved. To this end, we treat PLS as a decision problem, which allows us to introduce a generalized utility function. The idea is to select pseudo-labeled data that maximize a multi-objective utility function. We demonstrate that such a utility function can be constructed to account for different sources of uncertainty and explore three examples: model selection, accumulation of errors, and covariate shift. In the absence of second-order information on such uncertainties, we furthermore consider the generic approach of the generalized Bayesian α-cut updating rule for credal sets. We illustrate three of our robust extensions on simulated data and on three real-world data sets. In a benchmarking study, we compare these extensions to traditional PLS methods. Results suggest that robustness with regard to model choice can lead to substantial accuracy gains.
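Since the full text is not available here, the following is only a hypothetical sketch of how "PLS as a decision problem" and the α-cut rule can fit together; it is not the authors' implementation. It assumes a finite set of candidate models (standing in for model-selection uncertainty), a finite credal set of priors over those models, and a per-model utility for each candidate pseudo-labeled point. The α-cut rule is used in its usual form: keep only the priors whose marginal likelihood of the labeled data is at least α times the largest one, then update each retained prior by Bayes' rule (α = 0 keeps every prior, α = 1 keeps only the maximum-likelihood ones). The decision criterion (Γ-maximin over the updated credal set) and all function names, likelihoods, and utilities are assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical sketch: NOT the paper's method. Priors are probability vectors
# over a finite set of candidate models; lik[m] is the (assumed) likelihood
# of the labeled data under model m.
Prior = List[float]


def marginal_likelihood(prior: Sequence[float], lik: Sequence[float]) -> float:
    """Marginal likelihood of the labeled data under one prior over models."""
    return sum(p * l for p, l in zip(prior, lik))


def bayes_update(prior: Sequence[float], lik: Sequence[float]) -> Prior:
    """Ordinary Bayesian update of a single prior over models."""
    z = marginal_likelihood(prior, lik)
    return [p * l / z for p, l in zip(prior, lik)]


def alpha_cut_update(credal: Sequence[Sequence[float]], lik: Sequence[float],
                     alpha: float) -> List[Prior]:
    """alpha-cut rule: retain priors whose marginal likelihood is at least
    alpha * (maximum marginal likelihood), then update each retained prior.
    Priors that give the data zero probability are always dropped."""
    mls = [marginal_likelihood(p, lik) for p in credal]
    threshold = alpha * max(mls)
    return [bayes_update(p, lik) for p, m in zip(credal, mls)
            if m > 0 and m >= threshold]


def lower_expected_utility(posteriors: Sequence[Sequence[float]],
                           utilities: Sequence[float]) -> float:
    """Worst-case expected utility of one candidate over the credal set."""
    return min(sum(p * u for p, u in zip(post, utilities)) for post in posteriors)


def select_pseudo_label(candidates: Dict[str, Sequence[float]],
                        posteriors: Sequence[Sequence[float]]) -> str:
    """Gamma-maximin (an assumed criterion): choose the candidate whose
    lower expected utility is highest."""
    return max(candidates,
               key=lambda c: lower_expected_utility(posteriors, candidates[c]))


# Toy usage with made-up numbers: two candidate models A and B.
lik = [0.2, 0.1]                                  # labeled-data likelihood under A, B
credal = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]     # finite credal set of priors
candidates = {"x1": [0.9, 0.4], "x2": [0.7, 0.7]}  # utility of each candidate under A, B

if __name__ == "__main__":
    for alpha in (0.0, 0.7):
        posts = alpha_cut_update(credal, lik, alpha)
        print(alpha, len(posts), select_pseudo_label(candidates, posts))
    # prints:
    # 0.0 3 x2
    # 0.7 2 x1
```

In this toy run, α = 0 keeps all three priors, so the prior favoring model B drags the worst case for `x1` down and the cautious choice is `x2`; with α = 0.7 that prior is cut away because it explains the labeled data poorly, and `x1` becomes the selected pseudo-labeled point. The sketch only shows how α trades off robustness against the evidence in the labeled data.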

Item Type:

Contribution in Book/Report/Proceedings

Uncontrolled Keywords:

Research Output Funding/no_not_funded

Subjects:

no - not funded

ID Code:

221176

Deposited By:

ep_importer_pure

Deposited On:

10 Jun 2024 12:35

Refereed?:

Yes

Published?:

Published

Last Modified:

06 Jun 2025 11:42

URI:

https://eprints.lancs.ac.uk/id/eprint/221176