The written British National Corpus 2014 : design, compilation and analysis

Hawtin, Abigail and McEnery, Tony (2019) The written British National Corpus 2014 : design, compilation and analysis. PhD thesis, Lancaster University.

Text (2018hawtinphd)
2018hawtinphd.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.


Abstract

The ESRC-funded Centre for Corpus Approaches to Social Science at Lancaster University (CASS) and the English Language Teaching Group at Cambridge University Press (CUP) have collaborated to compile a new, publicly accessible corpus of contemporary written British English, known as the Written British National Corpus 2014 (Written BNC2014). The Written BNC2014 is an updated version of the Written British National Corpus (Written BNC1994), which was created in the 1990s. The Written BNC1994 is often used as a proxy for present-day British English, so the Written BNC2014 has been created both to allow comparisons between the two corpora and to allow research on British English to be carried out using a state-of-the-art contemporary dataset. The Written BNC2014 contains approximately 90 million words of written British English, published between 2010 and 2018, from a wide variety of genres. The corpus will be publicly released in 2019. This thesis presents a detailed account of the design and compilation of the corpus, focusing on the many challenges that had to be overcome to create it and the solutions devised for them. It also demonstrates the utility of the corpus by presenting a diachronic comparison of academic writing in the 1990s and 2010s, with a focus on the theory of colloquialisation. This thesis, whilst not a Written BNC2014 user guide, presents all of the decisions made in the design and creation of the corpus and, as such, will help to make the corpus useful to as many people, and for as many purposes, as possible.

Item Type:
Thesis (PhD)
Subjects:
corpus; British National Corpus; design; compilation; colloquialisation; representativeness
ID Code:
134977
Deposited By:
Deposited On:
02 Jul 2019 08:20
Refereed?:
No
Published?:
Unpublished
Last Modified:
15 Nov 2023 00:30