The Lancaster Corpus of Mandarin Chinese: A corpus for monolingual and contrastive language study

McEnery, A. M. and Xiao, R. Z. (2004) The Lancaster Corpus of Mandarin Chinese: A corpus for monolingual and contrastive language study. In: LREC, Lisbon, Portugal.

Full text not available from this repository.

Abstract

This paper presents the newly released Lancaster Corpus of Mandarin Chinese (LCMC), a Chinese match for the FLOB and Frown corpora of British and American English. LCMC is a one-million-word balanced corpus of written Mandarin Chinese, made up of five hundred 2,000-word samples drawn from fifteen categories of text published in Mainland China around 1991. LCMC is XML-compliant and conforms to the Corpus Encoding Standard (CES); each document contains a corpus header giving general information about the corpus, followed by the body of the text. The corpus is word-segmented and POS-tagged, with a tagging precision of over 98%. It is a useful resource for research into modern Chinese and for cross-linguistic contrast between English and Chinese.
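As a rough illustration of the design described in the abstract, the sketch below checks the nominal corpus size (500 samples of 2,000 words) and reads word/POS pairs from a small XML fragment. The element and attribute names (`text`, `s`, `w`, `POS`), the tag values, and the sample sentence are assumptions made for this example; the abstract does not specify the actual markup, so treat this as a sketch rather than a description of the released files.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names, attribute names, POS tags and the
# sentence itself are illustrative assumptions, not taken from the abstract.
SAMPLE = """<text ID="A">
  <s n="0001"><w POS="n">语料库</w><w POS="v">包含</w><w POS="m">五百</w><w POS="q">个</w><w POS="n">样本</w></s>
</text>"""


def tagged_tokens(xml_string):
    """Return (word, POS) pairs for every <w> element, in document order."""
    root = ET.fromstring(xml_string)
    return [(w.text, w.get("POS")) for w in root.iter("w")]


def nominal_size(samples=500, words_per_sample=2000):
    """Nominal corpus size implied by the sampling frame in the abstract."""
    return samples * words_per_sample


tokens = tagged_tokens(SAMPLE)
print(tokens)
print(nominal_size())
```

With segmentation and tagging stored as markup around each word, frequency counts by word or by tag reduce to a single pass over the `w` elements.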

Item Type:

Contribution in Book/Report/Proceedings

Subjects:

corpus; Chinese; contrastive study; P Philology. Linguistics

Departments:

Faculty of Arts & Social Sciences > Linguistics & English Language

ID Code:

49554

Deposited By:

ep_importer_pure

Deposited On:

25 Aug 2011 13:56

Refereed?:

Published?:

Published

Last Modified:

30 Jun 2026 20:02

URI:

https://eprints.lancs.ac.uk/id/eprint/49554