McEnery, A. M. and Xiao, R. Z. (2004) The Lancaster Corpus of Mandarin Chinese: A corpus for monolingual and contrastive language study. In: LREC 2004, 2004-05-24 - 2004-05-30, Lisbon, Portugal.
Full text not available from this repository.
This paper presents the newly released Lancaster Corpus of Mandarin Chinese (LCMC), a Chinese match for the FLOB and Frown corpora of British and American English. LCMC is a one-million-word balanced corpus of written Mandarin Chinese. The corpus contains five hundred 2,000-word samples of written Chinese texts sampled from fifteen text categories published in Mainland China around 1991, totalling one million words. LCMC is XML-compliant and conforms to CES, with each document containing a corpus header giving general information about the corpus and a body of text. The corpus is segmented and POS tagged with a tagging precision rate of over 98%. The corpus is a useful resource for research into modern Chinese as well as the cross-linguistic contrast between English and Chinese.
|Item Type:||Conference or Workshop Item (Paper)|
|Journal or Publication Title:||LREC 2004|
|Uncontrolled Keywords:||corpus, Chinese, contrastive study|
|Subjects:||Corpus; Chinese; Contrastive study|
|Departments:||Faculty of Arts & Social Sciences > Linguistics & English Language|
|Deposited By:||Dr Richard Xiao|
|Deposited On:||17 Jun 2005|
|Last Modified:||17 Feb 2012 12:22|