Lancaster EPrints

Character encoding in corpus construction.

McEnery, A. M. and Xiao, R. Z. (2005) Character encoding in corpus construction. In: Developing Linguistic Corpora: A Guide to Good Practice. AHDS, Oxford, UK.


    Abstract

    This chapter first briefly reviews the history of character encoding. It then discusses standard and non-standard native encoding systems and evaluates the efforts to unify these character codes, before turning to Unicode and the various Unicode Transformation Formats (UTFs). In conclusion, we recommend that Unicode (UTF-8, to be precise) be used in corpus construction.

    Item Type: Contribution in Book/Report/Proceedings
    Additional Information: Standards Documentation
    Uncontrolled Keywords: character encoding ; Unicode ; corpus creation
    Subjects: P Language and Literature > P Philology. Linguistics
    Departments: Faculty of Arts & Social Sciences > Linguistics & English Language
    ID Code: 60
    Deposited By: Dr Richard Xiao
    Deposited On: 17 Jun 2005
    Refereed?: No
    Published?: Published
    Last Modified: 26 Jul 2012 22:08
    URI: http://eprints.lancs.ac.uk/id/eprint/60
