Dewhurst, Maya (2022) Classification of oral and nasalised vowels from acoustic features in Glasgow English. In: BAAP 2022, 2022-04-04 - 2022-04-08, University of York.
Abstract
While sociophonetic research has made great progress in understanding the phonetic and social mechanisms behind variation in segmental features, much less is known about prosodies such as nasal voice quality. Existing studies report auditory or impressionistic analyses of nasality in the UK (Laver, 1972; Trudgill, 1974), but there remains a need for fine-grained acoustic evidence in order to make claims about the degree of nasality present in particular dialects. Anecdotal evidence suggests that lower working-class and middle-class males in Glasgow speak with an overall “nasal” voice quality. This is supported by the Vocal Profile Analysis in Stuart-Smith (1999), which found some degree of nasalisation in all Glaswegian gender, age, and socioeconomic groups. As such, non-phonemic nasality has the potential to be an important characteristic of Glasgow English, possibly acting as a perceptual cue to the speaker’s social class.

This paper builds on the VPA work of Stuart-Smith (1999) and adds an acoustic analysis based on the multiple parameters found to correlate with nasalisation (for an overview, see Stevens et al., 1987). I use these acoustic measures to assess the performance of binary classifiers trained on four different predictor combinations, with the aim of eventually establishing a method for future research into nasalisation in Glasgow English.

Eight middle-class male speakers from Glasgow, aged 24-61, produced monosyllabic words containing oral (CVC) and nasalised (CVN) vowels, as well as three reading passages with increasing concentrations of nasalised vowels. Around 658 tokens per speaker were recorded, totalling 5,264 tokens for analysis. Eighteen acoustic correlates of nasalisation (including the A1-P0 and A1-P1 measures) and 13 MFCCs, predictors used successfully by Carignan (2021), were then estimated at the midpoint of each vowel and used as input variables to a Support Vector Machine, a supervised learning model trained on a binary distinction (CVC vs CVN) that then aims to classify unseen tokens in terms of the learned categories. SVM models were fitted for each speaker using four predictor combinations: A1-P0 only; all 18 acoustic correlates; MFCCs only; all 18 acoustic correlates plus MFCCs. This resulted in four models per speaker, and confusion matrices were used to assess overall classification accuracy. Additional analysis assessed the influence of individual acoustic correlates on model accuracy.

The results show that the combination of all 18 acoustic correlates and 13 MFCCs yielded the highest overall classifier accuracies, though only marginally higher than the acoustic correlates or MFCCs alone. Classification accuracies peaked at ~70%, considerably lower than those found in comparable previous studies. Performance across individual speakers and the contribution of individual acoustic features to classification accuracy are discussed.
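To illustrate the kind of measure involved, the sketch below shows one common way of estimating A1-P0 from an FFT spectrum at the vowel midpoint. This is an illustrative reconstruction, not the procedure reported in the paper: it assumes F0 and F1 values supplied by an external tracker (e.g. Praat), and treats P0 as the stronger of the first two harmonics, following standard practice for this measure.

    import numpy as np

    def a1_p0(frame: np.ndarray, sr: int, f0: float, f1: float) -> float:
        """Estimate A1-P0 (dB) from one windowed frame at the vowel midpoint."""
        windowed = frame * np.hanning(len(frame))
        spec = 20 * np.log10(np.abs(np.fft.rfft(windowed)) + 1e-12)
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)

        def harmonic_amp(target_hz: float) -> float:
            # amplitude of the harmonic of f0 nearest to target_hz
            h = max(1, round(target_hz / f0))
            band = (freqs > (h - 0.5) * f0) & (freqs < (h + 0.5) * f0)
            return float(spec[band].max())

        a1 = harmonic_amp(f1)                             # harmonic nearest F1
        p0 = max(harmonic_amp(f0), harmonic_amp(2 * f0))  # nasal peak: H1 or H2
        return a1 - p0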
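Similarly, the 13 MFCCs at the vowel midpoint could be extracted along these lines. This is a minimal sketch assuming librosa; the 25 ms window, FFT settings, and frame averaging are assumptions, not details taken from the abstract.

    import librosa
    import numpy as np

    def midpoint_mfccs(wav_path: str, start: float, end: float) -> np.ndarray:
        """13 MFCCs from a 25 ms window centred on the vowel midpoint."""
        y, sr = librosa.load(wav_path, sr=None)      # keep the native sample rate
        mid = (start + end) / 2                      # vowel midpoint in seconds
        half = 0.0125                                # half of a 25 ms window
        seg = y[int((mid - half) * sr): int((mid + half) * sr)]
        mfcc = librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=13, n_fft=512, hop_length=128)
        return mfcc.mean(axis=1)                     # average across frames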
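Finally, the per-speaker SVM fitting and confusion-matrix evaluation might look something like the sketch below, assuming a pandas DataFrame `df` with one row per token. All column names, the train/test split, and the RBF kernel are assumptions for illustration, since the abstract does not specify these details.

    import pandas as pd
    from sklearn.metrics import accuracy_score, confusion_matrix
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    ACOUSTIC = ["A1P0", "A1P1"]                 # plus the remaining 16 correlates (names hypothetical)
    MFCC = [f"mfcc_{i}" for i in range(1, 14)]  # the 13 MFCCs
    PREDICTOR_SETS = {
        "A1-P0 only": ["A1P0"],
        "acoustic correlates": ACOUSTIC,
        "MFCCs only": MFCC,
        "acoustic correlates + MFCCs": ACOUSTIC + MFCC,
    }

    def fit_speaker_models(df: pd.DataFrame) -> dict:
        """One SVM per speaker per predictor set; returns accuracies and confusion matrices."""
        results = {}
        for speaker, sub in df.groupby("speaker"):
            for name, cols in PREDICTOR_SETS.items():
                X_tr, X_te, y_tr, y_te = train_test_split(
                    sub[cols], sub["context"],          # "context" holds the CVC / CVN labels
                    test_size=0.2, stratify=sub["context"], random_state=1)
                model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
                model.fit(X_tr, y_tr)
                pred = model.predict(X_te)
                results[(speaker, name)] = {
                    "accuracy": accuracy_score(y_te, pred),
                    "confusion": confusion_matrix(y_te, pred),
                }
        return results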