Almujaiwel, Sultan and Premasiri, Damith and Ranasinghe, Tharindu and El-Haj, Mo and Mitkov, Ruslan (2025) Complex Concept-Based Readability Estimation from Arabic Curriculum. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 24 (11): 128. pp. 1-21. ISSN 2375-4699
Abstract
This article presents an approach to readability estimation that focuses on conceptual rather than linguistic complexity, using the extensive SaudiTextBooks collection of textbooks. We introduce DARES 2.0, an enhanced concept-based readability training dataset designed to estimate the readability of Saudi educational texts. Building on DARES 1.0, DARES 2.0 extends the scope of conceptual complexity by replacing repetitive concepts and manually revising the input features with unique terms and their surrounding contexts from the SaudiTextBooks, spanning grades 1 to 12. The refined DARES 2.0 is used to fine-tune pre-trained transformer models, including XLM-R Base, mBERT, AraELECTRA, AraBERTv2, and CAMeLBERTmix. The findings suggest that both the dataset and the experimental setup need further development: a larger, higher-quality dataset, more extensive fine-tuning experiments, exploration of transfer learning from other languages, and greater diversity and richness of Arabic concepts. These directions pave the way for future advances in concept-based readability estimation in educational contexts.