Tracing verbal aggression over time, using the Historical Thesaurus of English

Malory, Beth (2015) Tracing verbal aggression over time, using the Historical Thesaurus of English. In: Corpus Linguistics 2015. UNSPECIFIED, GBR, p. 27.

Full text not available from this repository.


The work reported here seeks to demonstrate that automatic content analysis tools can be used effectively to trace pragmatic phenomena – including aggression – over time. In doing so, it builds upon preliminary work conducted by Archer (2014) using Wmatrix (Rayson 2008), in which she used six semtags – Q2.2 (speech acts), A5.1+/- (‘good/bad’ evaluation), A5.2+/- (‘true/false’ evaluation), E3- (‘angry/violent’), S1.2.4+/- (‘im/politeness’), and S7.2+/- (‘respect/lack of respect’) – to examine aggression in 200 Old Bailey trial texts covering the decade 1783-93. Having annotated this dataset using Wmatrix, Archer (2014) targeted the utterances captured by the semtags listed above. By providing multiple potential indicators, this afforded her a useful “way in” to verbal aggression in the late eighteenth-century English courtroom. Archer then re-contextualised the instances identified as verbally aggressive, using the ‘expand context’ facility within Wmatrix and consulting the original trial transcripts, and so was able to disregard any that did not point to aggression in the final instance. The success of this approach allowed her to conclude that automatic content analysis tools like USAS, the semantic tagger within Wmatrix, can indeed be used to trace pragmatic phenomena, in historical as well as modern texts.
This approach was not without its teething problems, however. First, apart from those semtags which were used in conjunction with others, as portmanteau tags (e.g. Q2.2 with E3- to capture aggressive speech acts), the approach necessitated the targeting of individual semtags within a given text. The need to perform a time-intensive manual examination of the wider textual context thus made the use of large datasets impractical. Second, and closely related, the tagset is based on The Longman Lexicon of Contemporary English (McArthur, 1981) and consequently cannot take account of diachronic meaning change. This resulted in the occasional mis-assignment of words which have been subject to significant semantic change over time, including politely, insult and insulted. In one instance, for example, politely was used to describe the deftness with which a thief picked his victim’s pocket! The manual checks needed to prevent such mis-assignments from affecting results further restricted the scope of Archer’s (2014) study.
In the extension to this work, reported here, the authors present their solutions to these problems. These solutions have at their core an innovation which allows historical datasets to be tagged semantically, using themes derived from the Historical Thesaurus of the Oxford English Dictionary (henceforth HTOED). These themes have been identified as part of an AHRC/ESRC funded project entitled “Semantic Annotation and Mark Up for Enhancing Lexical Searches”, henceforth SAMUELS (grant reference AH/L010062/1). The SAMUELS project has also enabled researchers from the Universities of Glasgow, Lancaster, Huddersfield, Strathclyde and Central Lancashire to work together to develop a semantic annotation tool which, thanks to its advanced disambiguation facility, enables the automatic annotation of words, as well as multi-word units, in historical texts with their precise meanings. This means that pragmatic phenomena such as aggression can be sought automatically, and more profitably, following the initial identification of what the authors have termed a ‘meaning chain’, that is, a series of HTOED-derived ‘themes’ analogous to DNA strings. This paper reports, first, on the authors’ identification of 68 potentially pertinent HTOED ‘themes’ and, second, on their investigation of the possible permutations of these themes, and the process by which they assessed which themes in which combinations best identified and captured aggression in their four datasets.
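The abstract does not say how theme combinations were assessed, so the following is only a minimal sketch of one way such an assessment could be set up. It assumes each utterance has already been tagged with a set of themes and manually judged as aggressive or not, and it ranks unordered combinations of themes by F1 score. The theme labels, the toy data, and the choice of F1 as the criterion are all hypothetical and are not taken from the SAMUELS tool or the HTOED.

```python
from itertools import combinations

# Hypothetical data: (set of theme labels on an utterance, manual judgement).
# The labels are placeholders, not real HTOED theme codes.
UTTERANCES = [
    ({"T_anger", "T_speech"}, True),
    ({"T_anger", "T_speech", "T_contempt"}, True),
    ({"T_speech"}, False),
    ({"T_anger"}, False),
    ({"T_anger", "T_speech"}, True),
    ({"T_speech", "T_weather"}, False),
    ({"T_contempt"}, False),
]


def score(combo, utterances):
    """Return (precision, recall, f1) for the rule
    'an utterance is flagged if it carries every theme in combo'."""
    required = set(combo)
    tp = fp = fn = 0
    for themes, aggressive in utterances:
        flagged = required <= themes  # subset test: all required themes present
        if flagged and aggressive:
            tp += 1
        elif flagged:
            fp += 1
        elif aggressive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def rank_combinations(candidates, utterances, max_size=2):
    """Score every combination of up to max_size themes, best F1 first."""
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(candidates), size):
            results.append((combo, *score(combo, utterances)))
    return sorted(results, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    candidates = {"T_anger", "T_speech", "T_contempt"}
    for combo, p, r, f1 in rank_combinations(candidates, UTTERANCES):
        print(combo, round(p, 2), round(r, 2), round(f1, 2))
```

On this toy data the pair of `T_anger` and `T_speech` ranks first, because neither theme alone separates the aggressive utterances from the rest. With 68 candidate themes the number of combinations grows quickly, so a cap such as `max_size` would be needed in practice.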
The datasets used for this research are drawn from Hansard and from Historic Hansard, and are taken from periods judged to be characterised, in some way, by political/national unrest or disquiet. The datasets represent the periods 1812-14 (i.e., “The War of 1812” between Great Britain and America), 1879-81 (a period of complex wrangling between two English governments and their opposition, led by fierce rivals Disraeli and Gladstone), 1913-19 (the First World War, including its immediate build-up and aftermath), and 1978-9 (“The Winter of Discontent”).
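The semtag-targeting step attributed above to Archer (2014) can be illustrated with a small sketch. It assumes the tagged text is available as (word, tag) pairs and that portmanteau tags are written with a slash, as in `Q2.2/E3-`. The token list and its tag assignments are invented for illustration, and the functions are hypothetical helpers, not part of Wmatrix or USAS.

```python
# The six semtags named in the abstract; "+/-" is read here as both polarities.
TARGET_SEMTAGS = {
    "Q2.2", "A5.1+", "A5.1-", "A5.2+", "A5.2-",
    "E3-", "S1.2.4+", "S1.2.4-", "S7.2+", "S7.2-",
}


def split_tag(tag):
    """Split a portmanteau tag such as 'Q2.2/E3-' into its component semtags."""
    return set(tag.split("/"))


def candidate_tokens(tagged_tokens, targets=TARGET_SEMTAGS):
    """Return (index, word, tag) for tokens carrying at least one target semtag."""
    return [
        (i, word, tag)
        for i, (word, tag) in enumerate(tagged_tokens)
        if split_tag(tag) & targets
    ]


def aggressive_speech_acts(tagged_tokens):
    """Return tokens whose tag combines Q2.2 (speech acts) with E3- (angry/violent)."""
    return [
        (i, word, tag)
        for i, (word, tag) in enumerate(tagged_tokens)
        if {"Q2.2", "E3-"} <= split_tag(tag)
    ]


def context(tagged_tokens, index, window=3):
    """Return the words around a hit, for the manual check of the wider context."""
    start = max(0, index - window)
    return " ".join(word for word, _ in tagged_tokens[start:index + window + 1])


# Invented example; the tag assignments are illustrative only.
TOKENS = [
    ("he", "Z8"), ("threatened", "Q2.2/E3-"), ("the", "Z5"),
    ("witness", "G2.1"), ("very", "A13.3"), ("politely", "S1.2.4+"),
]

if __name__ == "__main__":
    for i, word, tag in candidate_tokens(TOKENS):
        print(word, tag, "|", context(TOKENS, i, window=2))
```

Every hit still has to be read in context, which is the manual step the abstract identifies as the bottleneck: a token such as `politely` is returned as a candidate whether or not its historical sense has anything to do with politeness.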

Item Type:
Contribution in Book/Report/Proceedings
Deposited On:
15 Jul 2022 11:05
Last Modified:
15 Jul 2022 11:05