Using the OED Quotations Database as a Corpus - a Linguistic Appraisal.

Hoffmann, Sebastian (2004) Using the OED Quotations Database as a Corpus - a Linguistic Appraisal. ICAME Journal, 28. pp. 17-30.


Over the past decades, the number of historical corpora available has steadily grown. Perhaps the best-known and most widely used is the Helsinki Corpus. (See Kytö 1996[1991] for a description of the corpus and Rissanen et al. 1993 for a range of possible applications.) Other historical corpora include ARCHER (A Representative Corpus of Historical English Registers), the Corpus of Early English Correspondence (CEEC), the Innsbruck Computer Archive of Machine-Readable English Texts (ICAMET), the Lampeter Corpus of Early Modern English Tracts, and the Zurich English Newspaper Corpus (ZEN), to name just a few (cf. Biber et al. 1994; Fries 1994; Schmied 1994; Keränen 1998; Markus 1999a). However, given their relatively small size, these historical corpora are unfortunately only of limited value for the study of less frequent features of the English language. The Helsinki Corpus, for instance, spans almost a thousand years (ca. 750 to 1700) but contains only 1.57 million words. Even for the period of Late Modern English, suitable corpus data is not abundant. For example, although ARCHER covers a shorter time-span, from 1650 to 1990, and offers detailed categorization by register, its overall size of less than two million words still results in many of the same limitations as the Helsinki Corpus. For the study of less frequent features, the researcher therefore has to make use of alternative – albeit potentially less reliable – sources of data. One of the options available is the Oxford English Dictionary (OED) with its large quotations database, covering more than a thousand years of English usage. This database, which is considerably larger than any of the historical corpora mentioned above, has been successfully employed to trace both lexical and grammatical changes over time (e.g. Jucker 1994; Fischer 1997; Markus 1999b and 2001; Mair 2001).
However, to my knowledge there is no detailed appraisal of the OED quotations database as a tool for linguistic research. The present paper is intended to fill this gap.

Item Type: Journal Article
Journal or Publication Title: ICAME Journal
Deposited On: 30 Apr 2008 10:56
Last Modified: 11 Sep 2023 14:05