Guidelines for normalising early modern English corpora: decisions and justifications

Archer, Dawn and Kytö, Merja and Baron, Alistair and Rayson, Paul Edward (2015) Guidelines for normalising early modern English corpora: decisions and justifications. ICAME Journal, 39 (1). pp. 5-24. ISSN 1502-5462

PDF (icame-2015-0001)
icame_2015_0001.pdf - Published Version
Available under License Creative Commons Attribution-NonCommercial-NoDerivs.

Abstract

Corpora of Early Modern English have been collected and released for research for a number of years. With large-scale digitisation activities gathering pace in the last decade, much more historical textual data is now available for research on numerous topics, including historical linguistics and conceptual history. We summarise previous research which has shown that it is necessary to map historical spelling variants to modern equivalents in order to successfully apply natural language processing and corpus linguistics methods. Manual and semiautomatic methods have been devised to support this normalisation and standardisation process. We argue that it is important to develop a linguistically meaningful rationale to achieve good results from this process. In order to do so, we propose a number of guidelines for normalising corpora and show how these guidelines have been applied in the Corpus of English Dialogues.

Item Type: Journal Article
Journal or Publication Title: ICAME Journal
Additional Information: © 2015. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. (CC BY-NC-ND 3.0)
Departments: Faculty of Science and Technology > School of Computing & Communications
ID Code: 78019
Deposited By: ep_importer_pure
Deposited On: 28 Jan 2016 10:18
Refereed?: Yes
Published?: Published
Last Modified: 25 Feb 2020 02:31
URI: https://eprints.lancs.ac.uk/id/eprint/78019
