Creating language resources for under-resourced languages: methodologies, and experiments with Arabic

El-Haj, Mahmoud and Kruschwitz, Udo and Fox, Chris (2015) Creating language resources for under-resourced languages: methodologies, and experiments with Arabic. Language Resources and Evaluation, 49 (3). pp. 549-580. ISSN 1574-020X

ELHAJ_LREV.pdf - Accepted Version
Available under License Creative Commons Attribution.



Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to help advance research in fields such as natural language processing, machine learning, information retrieval and text analysis in general. We describe the creation of useful resources for languages that currently lack them, taking resources for Arabic summarisation as a case study. We illustrate three different paradigms for creating language resources, namely: (1) using crowdsourcing to produce a small resource rapidly and relatively cheaply; (2) translating an existing gold-standard dataset, which is relatively easy but potentially of lower quality; and (3) using manual effort with appropriately skilled human participants to create a resource that is more expensive but of high quality. The last of these was used as a test collection for TAC-2011. An evaluation of the resources is also presented.
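Crowdsourcing, as in paradigm (1), often collects several independent judgments per document, which then have to be merged into a single reference. The sketch below is purely illustrative and is not taken from the paper: it assumes that each crowd worker submits an extractive summary as a set of sentence indices, and that sentences chosen by at least `min_votes` workers form the gold standard. The helper `majority_gold_standard` and the toy data are hypothetical.

```python
from collections import Counter


def majority_gold_standard(summaries, min_votes):
    """Hypothetical helper: merge crowdsourced extractive summaries by voting.

    summaries: list of sets of sentence indices, one set per crowd worker.
    min_votes: minimum number of workers who must select a sentence.
    Returns the sorted indices of sentences meeting the threshold.
    """
    votes = Counter()
    for selected in summaries:
        # Each worker contributes at most one vote per sentence.
        votes.update(set(selected))
    return sorted(idx for idx, n in votes.items() if n >= min_votes)


# Five hypothetical workers' extractive summaries of one document.
workers = [{0, 2, 5}, {0, 1, 2}, {0, 2, 7}, {2, 3}, {0, 5}]
print(majority_gold_standard(workers, min_votes=3))  # prints [0, 2]
```

Raising `min_votes` trades coverage for agreement: a stricter threshold keeps only sentences most workers agree on, while a looser one retains more candidate sentences.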

Item Type: Journal Article
Journal or Publication Title: Language Resources and Evaluation
Additional Information: The final publication is available at Springer via
Uncontrolled Keywords: /dk/atira/pure/subjectarea/asjc/3300/3309
Departments: Faculty of Science and Technology > School of Computing & Communications
ID Code: 71289
Deposited By: ep_importer_pure
Deposited On: 15 Oct 2014 14:59
Refereed?: Yes
Published?: Published
Last Modified: 28 Feb 2020 02:08
