McEnery, A. M. and Xiao, R. Z. (2005) Character encoding in corpus construction. In: Developing Linguistic Corpora: A Guide to Good Practice. AHDS, Oxford, UK.
Abstract
This chapter first briefly reviews the history of character encoding. It then discusses standard and non-standard native encoding systems and evaluates the efforts to unify these character codes, before turning to Unicode and the various Unicode Transformation Formats (UTFs). In conclusion, we recommend that Unicode (UTF-8, to be precise) be used in corpus construction.
Item Type:
        Contribution in Book/Report/Proceedings
    Additional Information:
          Standards Documentation
        Uncontrolled Keywords:
          /dk/atira/pure/researchoutput/libraryofcongress/p1
        Subjects:
          character encoding; Unicode; corpus creation; P Philology. Linguistics
        Departments:
          
        ID Code:
          60
        Deposited By:
          
        Deposited On:
          17 Jun 2005
        Refereed?:
          No
        Published?:
          Published
        Last Modified:
          19 Sep 2025 23:57