More than the Sum of Their Words : Generating and Contrasting Large Linguistic Networks

Schmück, Hanna and Brezina, Vaclav (2025) More than the Sum of Their Words : Generating and Contrasting Large Linguistic Networks. PhD thesis, Lancaster University.

Text (2024SchmueckPhD)
2024schmueckphd.pdf - Published Version
Available under License Creative Commons Attribution-NonCommercial.

Abstract

“You shall know a word by the company it keeps” (Firth, 1957), and, as this thesis attempts to demonstrate, also by how the company is kept. The primary motivation of this thesis is to connect current psycholinguistic evidence, i.e. experimental and theoretical findings on the structural design of the mental lexicon, with empirical findings from large-scale corpus-based collocation networks. One contribution of this work therefore lies in the triangulation (Noble & Heale, 2019, p. 67) of corpus linguistics, psycholinguistics and graph theory: bridging gaps between these approaches to language and developing new viewpoints on the data might help overcome or seriously limit fundamental biases, and might offer a better-founded way of interpreting results from collocation analyses with regard to their capacity to portray mental processes and to act as a proxy for how readers and speakers perceive certain concepts. To address the existing research gap, this thesis carries out a large-scale analysis of computationally generated corpus-based collocation networks based on the BNC 2014 and of psycholinguistic word association networks based on the word association database SWOW-UK. Word associations were chosen as the basis for the psycholinguistic network because, like collocations, they portray the perceived relation between concepts via discrete linguistic units (Kang, 2018, p. 87). From a theoretical perspective, in addition to new insights into open questions regarding the structure and organisation of collocational knowledge, this approach also provides new research prompts for investigating the internal structure of the mental lexicon (ML) further.
Another key contribution of this thesis is the development of a full pipeline for large-scale collocation network generation that can be used by other researchers. This includes a thorough explanation of graph-theoretical concepts for a linguistic audience, paired with an in-depth analysis of how suitable existing approaches to association measure calculation are for ascribing perceptual reality to the identified collocations. The findings reveal that combinations of association measures (corpus linguistic approach), particularly log Dice, LL, and χ², provide the best approximation of word association networks (psycholinguistic evidence), though systematic discrepancies remain. Additionally, word association networks are more tightly knit and generally strongly connected, whereas collocation networks are more specialised and fragmented.
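
As a reading aid for the three association measures named above, the following is a minimal Python sketch of how they are commonly computed from co-occurrence counts. It is not taken from the thesis pipeline: the function names and the example counts are hypothetical, LL is read here as the log-likelihood ratio over a 2x2 contingency table, and log Dice follows the usual definition 14 + log2(2·f_xy / (f_x + f_y)).

```python
import math


def contingency(f_xy, f_x, f_y, n):
    """Observed 2x2 table for a node x and a collocate y.

    f_xy: co-occurrence frequency, f_x / f_y: individual frequencies,
    n: total number of observations (all names are illustrative).
    """
    return [[f_xy, f_x - f_xy],
            [f_y - f_xy, n - f_x - f_y + f_xy]]


def expected(obs):
    """Expected frequencies under independence: row total * column total / n."""
    n = sum(sum(row) for row in obs)
    rows = [sum(row) for row in obs]
    cols = [sum(col) for col in zip(*obs)]
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def log_dice(f_xy, f_x, f_y):
    """log Dice: 14 + log2 of the Dice coefficient; 14 is the theoretical maximum."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def log_likelihood(obs):
    """Log-likelihood ratio: 2 * sum(O * ln(O / E)), skipping empty cells."""
    exp = expected(obs)
    return 2 * sum(
        o * math.log(o / e)
        for o_row, e_row in zip(obs, exp)
        for o, e in zip(o_row, e_row)
        if o > 0
    )


def chi_squared(obs):
    """Pearson chi-squared: sum((O - E)^2 / E) over the four cells."""
    exp = expected(obs)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(obs, exp)
        for o, e in zip(o_row, e_row)
    )


if __name__ == "__main__":
    # Made-up counts for one word pair, purely for illustration.
    f_xy, f_x, f_y, n = 80, 100, 100, 1000
    table = contingency(f_xy, f_x, f_y, n)
    print("log Dice:", log_dice(f_xy, f_x, f_y))
    print("LL:", log_likelihood(table))
    print("chi-squared:", chi_squared(table))
```

In a collocation network, scores like these would typically decide which word pairs become edges. Any cut-off values, and the way the measures are combined, are choices made in the thesis itself and are not reproduced here.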

Item Type:
Thesis (PhD)
Uncontrolled Keywords:
Research Output Funding/yes_externally_funded
Subjects:
graph theory; corpus linguistics; psycholinguistics; computational linguistics; collocation networks; collocation; word association; yes - externally funded; no; language and linguistics; computer science (miscellaneous)
ID Code:
228937
Deposited By:
Deposited On:
24 Apr 2025 12:45
Refereed?:
No
Published?:
Published
Last Modified:
24 Apr 2025 12:45