"i didn't spel that wrong did i. Oops": Analysis and normalisation of SMS spelling variation

Tagg, Caroline and Baron, Alistair and Rayson, Paul (2012) "i didn't spel that wrong did i. Oops": Analysis and normalisation of SMS spelling variation. Lingvisticæ Investigationes, 35 (2). pp. 367-388. ISSN 0378-4169

PDF (Article (submitted version))
LI_paper_submitted.pdf - Accepted Version

Abstract

Spelling variation, although present in all varieties of English, is particularly prevalent in SMS text messaging. Researchers argue that spelling variants in SMSes are principled and meaningful, reflecting patterns of variation across historical and contemporary texts, and contributing to the performance of social identities. However, little attempt has yet been made to validate SMS spelling patterns empirically (for most languages, with the notable exception of French) or to verify the extent to which they mirror those in other texts. This article reports on the use of the VARD2 tool to analyse and normalise the spelling variation in a corpus of over 11,000 SMSes collected in the UK between 2004 and 2007. A second tool, DICER, was used to examine the mappings between variants and their equivalents in the normalised corpus. The resulting database of rules and frequencies enables comparison with other text types and the automatic normalisation of spelling in larger SMS corpora. In addition to examining spelling trends through the DICER analysis, it was possible to place the spelling variants found in the SMS corpus into functional categories, with the ultimate aim of creating a taxonomy of SMS spelling. The article reports on the findings from this categorisation process and discusses the difficulty of choosing categories for some spelling variants.
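
The idea of a "database of rules and frequencies" derived from variant-to-equivalent mappings can be illustrated with a minimal sketch. This is not the DICER or VARD2 implementation; it is an assumption-laden illustration that uses Python's standard difflib module to extract character-level edits from pairs and count how often each edit occurs. The function names and the example pairs are hypothetical and are not taken from the article's corpus.

```python
from collections import Counter
from difflib import SequenceMatcher


def edit_rules(variant, standard):
    """Return the character-level edits that turn `variant` into `standard`.

    Each rule is a tuple (operation, variant_chars, standard_chars), where
    operation is one of "insert", "delete" or "replace" as reported by
    difflib. This is an illustrative stand-in, not DICER's rule format.
    """
    matcher = SequenceMatcher(None, variant, standard, autojunk=False)
    rules = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            rules.append((op, variant[i1:i2], standard[j1:j2]))
    return rules


def rule_frequencies(pairs):
    """Count how often each edit rule occurs across (variant, standard) pairs."""
    counts = Counter()
    for variant, standard in pairs:
        counts.update(edit_rules(variant, standard))
    return counts


if __name__ == "__main__":
    # Hypothetical variant/equivalent pairs, invented for illustration.
    pairs = [("spel", "spell"), ("tel", "tell"), ("wot", "what")]
    # "spel" -> "spell" yields the single rule ("insert", "", "l").
    for rule, count in rule_frequencies(pairs).most_common():
        print(rule, count)
```

Counting rules in this way is what would allow a comparison of rule frequencies across text types, although the tools described in the article may define and weight rules quite differently.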

Item Type:
Journal Article
Journal or Publication Title:
Lingvisticæ Investigationes
Additional Information:
© 2012 John Benjamins. This article has been published in Lingvisticæ Investigationes, 35:2 (2012). The publisher should be contacted for permission to re-use the material in any form.
Subjects:
Psychology (all); Computer Science (all); Linguistics and Language; Arts and Humanities (all)
ID Code:
60484
Deposited On:
05 Dec 2012 08:39
Refereed?:
Yes
Published?:
Published
Last Modified:
19 Mar 2024 00:30