A basic language resource kit implementation for the IgboNLP project

Onyenwe, Ikechukwu E. and Hepple, Mark and Chinedu, Uchechukwu and Ezeani, Ignatius (2018) A basic language resource kit implementation for the IgboNLP project. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 17 (2): 10. ISSN 2375-4699

Full text not available from this repository.

Abstract

Igbo, an African language with around 32 million speakers worldwide, is one of the many languages that have few or none of the language processing resources needed for advanced language technology applications. In this article, we describe the approach taken to create an initial set of resources for Igbo, including an electronic text corpus, a part-of-speech (POS) tagset, and a POS-tagged subcorpus. We discuss the approach taken in gathering texts, the preprocessing of these texts, and the development of the POS-tagged corpus. We also discuss some of the problems encountered during corpus and tagset development and the solutions adopted.

Item Type:
Journal Article
Journal or Publication Title:
ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
Uncontrolled Keywords:
/dk/atira/pure/subjectarea/asjc/1700/1700
Subjects:
african language; corpora; corpus annotation; human annotator; igbo; interannotation agreement; language technology; morphology; natural language processing (nlp); normalization; part-of-speech (pos) tagging; segmentation; tagset; text processing; tokenization; general computer science
ID Code:
142313
Deposited On:
12 Mar 2020 13:25
Refereed?:
Yes
Published?:
Published
Last Modified:
17 Sep 2024 09:51