Character encoding in corpus construction.

McEnery, A. M. and Xiao, R. Z. (2005) Character encoding in corpus construction. In: Developing Linguistic Corpora: A Guide to Good Practice. AHDS, Oxford, UK.

Abstract

This chapter first briefly reviews the history of character encoding. It then discusses standard and non-standard native encoding systems and evaluates the efforts to unify these character codes, before turning to Unicode and the various Unicode Transformation Formats (UTFs). We conclude by recommending that Unicode (specifically UTF-8) be used in corpus construction.

Item Type:

Contribution in Book/Report/Proceedings

Additional Information:

Standards Documentation

Subjects:

character encoding; Unicode; corpus creation; P Philology. Linguistics

Departments:

Faculty of Arts & Social Sciences > Linguistics & English Language

ID Code:

Deposited By:

Dr Richard Xiao

Deposited On:

17 Jun 2005

Refereed?:

Published?:

Published

Last Modified:

06 Jan 2026 00:02

URI:

https://eprints.lancs.ac.uk/id/eprint/60