Language change and evolution in Online Social Networks

Kershaw, Daniel (2018) Language change and evolution in Online Social Networks. PhD thesis, UNSPECIFIED.

2018kershawphd.pdf - Published Version
Available under License Creative Commons Attribution.


Language is in constant flux, whether through the creation of new terms or through changes in the meanings of existing words. Language change happens through complex, mutually reinforcing interactions between individuals and the social structures in which they exist. There has been much research into language change and evolution, but it has relied on manual processes that are both time-consuming and costly. With the growth in popularity of online social networks (OSNs), researchers have, for the first time, access to fine-grained records of language and user interactions that not only contain data on the creation of language innovations but also reveal the inter-user and inter-community dynamics that influence their adoption and rejection. Access to these OSN datasets means that language change and evolution can now be assessed and modelled with computational and machine-learning-based methods. This thesis therefore asks how one can detect and predict language change in OSNs, and which factors language change depends on. The answer to this over-arching question lies in three core components: first, detecting the innovations; second, modelling the individual user adoption process; and third, looking at collective adoption across a network of individuals.
For the first question, we operationalise traditional language acceptance heuristics (used to detect the emergence of new words) into three classes of computational time-series measures that capture variation in frequency, form and/or meaning. These grounded methods are applied to two OSNs, Twitter and Reddit, with results demonstrating the ability to detect language change across both networks. Applying the methods additionally to communities within each network, e.g. geographical regions on Twitter and subreddits on Reddit, indicates that language variation and change can depend on community membership.
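As an illustration of what a frequency-based time-series measure might look like, the following minimal Python sketch computes a term's relative frequency per time period and flags it as rising. The function names, the half-versus-half comparison and the `min_growth` parameter are assumptions made for this example; they are not the measures defined in the thesis.

```python
from collections import Counter


def relative_frequencies(posts_by_period, term):
    """Return the share of tokens equal to `term` in each period.

    `posts_by_period` is a list of periods, each a list of post strings.
    Tokenisation is naive whitespace splitting (an assumption for brevity).
    """
    freqs = []
    for posts in posts_by_period:
        tokens = [tok.lower() for post in posts for tok in post.split()]
        counts = Counter(tokens)
        freqs.append(counts[term] / len(tokens) if tokens else 0.0)
    return freqs


def is_rising(freqs, min_growth=2.0):
    """Hypothetical heuristic: the mean relative frequency in the second
    half of the series is at least `min_growth` times that of the first."""
    half = len(freqs) // 2
    early = sum(freqs[:half]) / max(half, 1)
    late = sum(freqs[half:]) / max(len(freqs) - half, 1)
    if early == 0.0:
        # A term absent early on counts as rising if it appears later.
        return late > 0.0
    return late / early >= min_growth
```

The same series could be computed per community (for example per subreddit) by passing only that community's posts, which is one simple way to compare variation across communities.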
The second question focuses on how users adopt language innovations in relation to the other users with whom they are in contact. By modelling influence between users as a function of past innovation cascades, we compute a global activation threshold at which users adopt new terms, dependent on their exposure to those terms from their neighbours. Additionally, by testing the user interaction networks against random shuffles, we show that the time at which a user adopts a term depends on the local network structure; however, a large part of the influence comes from sources external to the observed OSN.
The final question looks at how the speakers of a language are embedded in social networks, and how the resulting network structures and dynamics influence language usage and adoption patterns. We show that language innovations diffuse across a network in a predictable manner that can be modelled using structural, grammatical and temporal measures, with a logistic regression model used to predict the vitality of the diffusion. With regard to network structure, we show that innovations that manifest across structural holes and weak ties diffuse deeper into the network. Beyond network influence, our results demonstrate that the grammatical context in which innovations emerge also plays an essential role in diffusion dynamics; this indicates that the adoption of new words is enabled by a complex interplay of network and linguistic factors.
Together, the three questions answer the over-arching question, showing that one can indeed model language change and forecast user and community adoption of language innovations. We also show that grounded models and methods can be applied within a scalable computational framework. However, this is a challenging task that is heavily influenced by underlying processes not recorded in the OSN data.
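The idea of a global activation threshold on neighbour exposure can be illustrated with a generic linear-threshold-style simulation. This is a minimal sketch under stated assumptions: the function name, the synchronous rounds and the use of a simple fraction of adopting neighbours as "exposure" are all hypothetical choices for illustration, whereas the thesis derives influence from past innovation cascades.

```python
def simulate_adoption(neighbours, seeds, threshold, max_rounds=100):
    """Simulate term adoption under a single global exposure threshold.

    neighbours: dict mapping each user to the set of users they are exposed to.
    seeds: users who use the term at round 0.
    threshold: fraction of a user's neighbours that must have adopted
               before the user adopts (assumed identical for all users).
    Returns a dict mapping each adopter to the round in which they adopted.
    """
    adopted = set(seeds)
    adoption_round = {user: 0 for user in seeds}
    for rnd in range(1, max_rounds + 1):
        newly_adopted = set()
        for user, nbrs in neighbours.items():
            if user in adopted or not nbrs:
                continue
            # Exposure is measured against adopters from earlier rounds only.
            exposure = len(nbrs & adopted) / len(nbrs)
            if exposure >= threshold:
                newly_adopted.add(user)
        if not newly_adopted:
            break  # the cascade has stopped spreading
        for user in newly_adopted:
            adoption_round[user] = rnd
        adopted |= newly_adopted
    return adoption_round
```

A model of this kind only captures influence inside the observed network; it has no term for adoption driven by external sources, which the abstract identifies as a large part of the influence.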

Item Type:
Thesis (PhD)
Deposited On:
18 Dec 2018 11:01
Last Modified:
18 Sep 2020 06:52