Hardie, Andrew and Lohani, Ram Raj and Regmi, Bhim N. and Yadava, Yogendra P. (2009) A morphosyntactic categorisation scheme for the automated analysis of Nepali. In: Annual Review of South Asian Languages and Linguistics 2009. Trends in Linguistics. Studies and Monographs. Mouton de Gruyter, Berlin, pp. 171-196. ISBN 9783110225594
Abstract
This paper describes the linguistic rationale underlying the part-of-speech tagset used for tagging the Nepali National Corpus. In particular, three conceptually complex areas are discussed in detail. First, the nature of Nepali postpositions is explored, and the approach that the tagset takes to them, in which postpositions are tokenised separately from the nouns or other words to which they are attached, is justified. Second, a similar exploration of gender marking supports the opposite approach, where gender is treated as a feature of the word on which it is marked and is indicated in that word’s tag. It is further argued that an inconsistent treatment of gender on nouns, as opposed to adjectives and other words that agree with nouns, is justified for Nepali. Third, the very great complexity of Nepali verb inflection (some of it created by very productive compounding) is shown to necessitate the use, within the tagset, of a simplified model of the Nepali verb. A brief analysis of the similarities and differences between this tagset and part-of-speech annotation schemes for some closely related languages is undertaken. Finally, the implementation of the tagset in an automated tagging system is summarised and some directions for future work are outlined.