Creating a Hybrid Rule and Neural Network Based Semantic Tagger using Silver Standard Data : the PyMUSAS framework for Multilingual Semantic Annotation

Moore, Andrew and Rayson, Paul and Knight, Dawn and Czerniak, Tim and Archer, Dawn and Lal, Daisy and Ó Donnchadha, Gearóid and Ó Meachair, Mícheál and Piao, Scott and Uí Dhonnchadha, Elaine and Vuorinen, Johanna and Yabo, Yan and Yang, Xiaobin (2026) Creating a Hybrid Rule and Neural Network Based Semantic Tagger using Silver Standard Data : the PyMUSAS framework for Multilingual Semantic Annotation. In: Fifteenth biennial Language Resources and Evaluation Conference, 2026-05-11 - 2026-05-16, Palau de Congressos de Palma. (In Press)

Neural_Tagger_LREC_2026_accepted.pdf - Accepted Version
Available under License Creative Commons Attribution.

Abstract

Word Sense Disambiguation (WSD) has been widely evaluated using the semantic frameworks of WordNet (Maru et al., 2022), BabelNet (Pasini et al., 2021), and the Oxford Dictionary of English (Gadetsky et al., 2018; Chang et al., 2018). However, for the UCREL Semantic Analysis System (USAS) framework, no extensive open evaluation has been performed beyond lexical coverage or single-language evaluation. In this work, we perform the largest semantic tagging evaluation of the rule-based system that uses the lexical resources in the USAS framework, covering five different languages using four existing datasets and one novel Chinese dataset. To overcome the lack of manually tagged training data, we create a new silver-labelled English dataset and use it to train and evaluate various monolingual and multilingual neural models in both monolingual and cross-lingual evaluation setups, with comparisons to their rule-based counterparts. We also show how a rule-based system can be enhanced with a neural network model. The resulting neural network models, the data they were trained on, the Chinese evaluation dataset, and all of the code have been released as open resources.

Item Type:
Contribution to Conference (Paper)
Journal or Publication Title:
Fifteenth biennial Language Resources and Evaluation Conference
Uncontrolled Keywords:
Research Output Funding/yes_externally_funded
Subjects:
semantic tagging, lexicons, multilingual annotation, machine learning, yes - externally funded
ID Code:
236364
Deposited By:
Deposited On:
01 Apr 2026 13:40
Refereed?:
Yes
Published?:
In Press
Last Modified:
01 Apr 2026 13:40