The interpretation of topic models for scholarly analysis: An evaluation and critique of current practice

Gillings, Mathew and Hardie, Andrew (2022) The interpretation of topic models for scholarly analysis: An evaluation and critique of current practice. Digital Scholarship in the Humanities. ISSN 2055-7671


Abstract

Topic modelling is a method of statistical data mining of a corpus of documents, popular in the digital humanities and, increasingly, in social sciences. A critical methodological issue is how ‘topics’ (groups of co-selected word types) can be interpreted in analytically meaningful terms. In the current literature, this is typically done by ‘eyeballing’; that is, cursory and largely unsystematic examination of the ‘top’ words in each algorithmically identified word group. We critically evaluate this approach in a dual analysis, comparing the ‘eyeballing’ approach with an alternative using sample close reading across the corpus. We used MALLET to extract two topic models from a test corpus: one with stopwords included, another with stopwords excluded. We then used the aforementioned methods to assign labels to these topics. The results suggest that a close-reading approach is more effective not only in level of detail but even in terms of accuracy. In particular, we found that: assigning labels via eyeballing yields incomplete or incorrect topic labels; removing stopwords drastically affects the analysis outcome; topic labelling and interpretation depend considerably on the analysts’ specialist knowledge; and differences of perspective or construal are unlikely to be captured through a topic model. We conclude that an interpretive paradigm founded in close reading may make topic modelling more appealing to humanities researchers.
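As a rough illustration of what the 'eyeballing' approach inspects, the sketch below ranks the words of a single topic by weight and shows how excluding stopwords changes the 'top' list an analyst would see. It is a minimal sketch under stated assumptions, not the authors' procedure: the function `top_words`, the example words, their weights, and the stopword list are all hypothetical, and no MALLET output is used. Note also that the study removed stopwords before training, which changes the model itself; here stopwords are only filtered from an existing ranked list.

```python
from typing import Dict, Iterable, List, Optional


def top_words(weights: Dict[str, float], n: int = 5,
              stopwords: Optional[Iterable[str]] = None) -> List[str]:
    """Return the n highest-weighted words of one topic, optionally skipping stopwords."""
    excluded = set(stopwords or ())
    ranked = sorted(
        (w for w in weights if w not in excluded),
        key=lambda w: (-weights[w], w),  # weight descending, ties broken alphabetically
    )
    return ranked[:n]


# Hypothetical topic: invented words and weights, not taken from the study.
topic = {
    "the": 0.09, "of": 0.07, "and": 0.06,
    "court": 0.05, "judge": 0.04, "law": 0.03, "trial": 0.02,
}
STOPWORDS = {"the", "of", "and"}  # illustrative list only

with_stopwords = top_words(topic, n=4)
without_stopwords = top_words(topic, n=4, stopwords=STOPWORDS)

print(with_stopwords)     # top list dominated by function words
print(without_stopwords)  # top list of content words only
```

In this toy case the two lists share no words, which hints at why a label assigned from the top words alone can depend heavily on the stopword decision.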

Item Type:

Journal Article

Journal or Publication Title:

Digital Scholarship in the Humanities

Subjects:

Computer science applications; linguistics and language; language and linguistics; information systems

Departments:

Faculty of Arts & Social Sciences > Linguistics & English Language

ID Code:

223346

Deposited By:

ep_importer_pure

Deposited On:

21 Aug 2024 15:05

Refereed?:

Yes

Published?:

Published

Last Modified:

13 Dec 2025 13:01

URI:

https://eprints.lancs.ac.uk/id/eprint/223346