ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification

North, Kai and Zampieri, Marcos and Ranasinghe, Tharindu (2022) ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification. In: Proceedings of the 29th International Conference on Computational Linguistics. COLING Proceedings. International Committee on Computational Linguistics, KOR, pp. 6057-6062.

Text (2022.coling-1.529)
2022.coling-1.529.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

Lexical simplification (LS) is the task of automatically replacing complex words with easier ones, making texts more accessible to various target populations (e.g. individuals with low literacy, individuals with learning disabilities, second language learners). To train and test models, LS systems usually require corpora that feature complex words in context along with their candidate substitutions. To continue improving the performance of LS systems, we introduce ALEXSIS-PT, a novel multi-candidate dataset for Brazilian Portuguese LS containing 9,605 candidate substitutions for 387 complex words. ALEXSIS-PT has been compiled following the ALEXSIS protocol for Spanish, opening exciting new avenues for cross-lingual models. ALEXSIS-PT is the first LS multi-candidate dataset that contains Brazilian newspaper articles. We evaluated four models for substitute generation on this dataset, namely mDistilBERT, mBERT, XLM-R, and BERTimbau. BERTimbau achieved the highest performance across all evaluation metrics.
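
To make the notion of a multi-candidate LS entry concrete, the following is a minimal Python sketch of how one such entry (a sentence, its complex word, and several annotator-suggested substitutes) might be represented and how a model's top-k predictions could be checked against it. The class `LSInstance`, the function `hit_at_k`, and the example sentence are all hypothetical and invented for illustration; they do not reproduce the actual ALEXSIS-PT file format or the evaluation metrics reported in the paper.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class LSInstance:
    """Hypothetical multi-candidate entry: sentence, complex word, suggestions."""
    sentence: str
    complex_word: str
    suggestions: List[str] = field(default_factory=list)

    def ranked_candidates(self) -> List[str]:
        # Most frequently suggested substitutes first; ties keep first-seen order.
        counts = Counter(s.lower() for s in self.suggestions)
        return [word for word, _ in counts.most_common()]


def hit_at_k(predicted: List[str], instance: LSInstance, k: int) -> bool:
    """True if any of the top-k predicted substitutes is among the gold suggestions."""
    gold = {s.lower() for s in instance.suggestions}
    return any(p.lower() in gold for p in predicted[:k])


# Toy entry invented for illustration; NOT taken from ALEXSIS-PT.
example = LSInstance(
    sentence="O governo decidiu postergar a votação.",
    complex_word="postergar",
    suggestions=["adiar", "adiar", "atrasar", "adiar", "prorrogar"],
)
```

In a setup like this, a substitute-generation model would supply the `predicted` list (for instance, the highest-scoring fillers from a masked language model), and `hit_at_k` would be averaged over all entries to give a simple dataset-level score.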

Item Type:
Contribution in Book/Report/Proceedings
ID Code:
221667
Deposited By:
Deposited On:
12 Nov 2024 10:05
Refereed?:
Yes
Published?:
Published
Last Modified:
12 Nov 2024 10:05