McEnery, A. M. and Xiao, R. Z. (2004) The Lancaster Corpus of Mandarin Chinese: A corpus for monolingual and contrastive language study. In: LREC. UNSPECIFIED.
Full text not available from this repository.

Abstract
This paper presents the newly released Lancaster Corpus of Mandarin Chinese (LCMC), a Chinese match for the FLOB and Frown corpora of British and American English. LCMC is a one-million-word balanced corpus of written Mandarin Chinese. The corpus contains five hundred 2,000-word samples of written Chinese texts sampled from fifteen text categories published in Mainland China around 1991, totalling one million words. LCMC is XML-compliant and conforms to CES, with each document containing a corpus header giving general information about the corpus and a body of text. The corpus is segmented and POS tagged with a tagging precision rate of over 98%. The corpus is a useful resource for research into modern Chinese as well as the cross-linguistic contrast between English and Chinese.
| Item Type: | Contribution in Book/Report/Proceedings |
|---|---|
| Uncontrolled Keywords: | corpus ; Chinese ; contrastive study |
| Subjects: | P Language and Literature > P Philology. Linguistics |
| Departments: | Faculty of Arts & Social Sciences > Linguistics & English Language |
| ID Code: | 49554 |
| Deposited By: | ep_importer_pure |
| Deposited On: | 25 Aug 2011 14:56 |
| Refereed?: | No |
| Published?: | Published |
| Last Modified: | 26 Jul 2012 22:58 |
| URI: | http://eprints.lancs.ac.uk/id/eprint/49554 |

