Complex Concept-Based Readability Estimation from Arabic Curriculum

Almujaiwel, Sultan and Premasiri, Damith and Ranasinghe, Tharindu and El-Haj, Mo and Mitkov, Ruslan (2025) Complex Concept-Based Readability Estimation from Arabic Curriculum. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 24 (11): 128. pp. 1-21. ISSN 2375-4699

Full text not available from this repository.

Abstract

This article presents an approach to readability estimation that focuses on conceptual rather than linguistic complexity, drawing on the extensive SaudiTextBooks collection. We introduce DARES 2.0, an enhanced concept-based readability training dataset designed to estimate the readability of Saudi educational texts. Building on DARES 1.0, DARES 2.0 extends the scope of conceptual complexity by replacing repetitive concepts and manually revising the input features, using unique terms and their surrounding contexts from SaudiTextBooks, which spans grades 1 to 12. The refined DARES 2.0 is used to fine-tune pre-trained transformer models, including XLM-R Base, mBERT, AraELECTRA, AraBERTv2, and CAMeLBERTmix. The findings suggest that both the dataset and the experimental setup require further development: a larger, higher-quality dataset, more extensive fine-tuning experiments, exploration of transfer learning from other languages, and greater diversity and richness of Arabic concepts. These developments pave the way for further advances in concept-based readability estimation in educational contexts.

Item Type: Journal Article
Journal or Publication Title: ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
ID Code: 234186
Deposited On: 12 Dec 2025 11:30
Refereed?: Yes
Published?: Published
Last Modified: 13 Dec 2025 03:23