Shardlow, Matthew and Alva-Manchego, Fernando and Batista-Navarro, Riza Theresa and Bott, Stefan and Calderon Ramirez, Saul and Cardon, Rémi and François, Thomas and Hayakawa, Akio and Horbach, Andrea and Hülsing, Anna and Ide, Yusuke and Imperia, Joseph Marvin and Nohej, Adam and North, Kai and Occhipinti, Laura and Rojas, Nelson Peréz and Raihan, Md Nishat and Ranasinghe, Tharindu and Salazar, Martin Solis and Zampieri, Marcos and Saggion, Horacio (2024) An Extensible Massively Multilingual Lexical Simplification Pipeline Dataset using the MultiLS Framework. In: Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024. ELRA and ICCL, ITA, pp. 38-46. ISBN 9782493814340
2024.readi-1.4.pdf - Published Version
Available under License Creative Commons Attribution-NonCommercial.
Abstract
We present preliminary findings on the MultiLS dataset, developed in support of the 2024 Multilingual Lexical Simplification Pipeline (MLSP) Shared Task. This dataset currently comprises 300 instances of lexical complexity prediction and lexical simplification across 10 languages. In this paper, we (1) describe the annotation protocol, to support the contribution of future datasets, and (2) present summary statistics on the data gathered so far. Multilingual lexical simplification can help low-ability readers engage with otherwise difficult texts in their native, often low-resourced, languages.