Computational Analysis of Historical Narratives through Large Language Models

Nanomi Arachchige, Isuri and Mitkov, Ruslan and Rayson, Paul (2026) Computational Analysis of Historical Narratives through Large Language Models. PhD thesis, Lancaster University.

Text (2026isuriphd)
2026isuriphd.pdf - Published Version
Available under License Creative Commons Attribution-NonCommercial.

Abstract

Over the last decade, Large Language Models (LLMs) have pushed the boundaries of artificial intelligence in creativity, language generation, and specialised problem-solving. Their ability to understand and generate human-like text makes them particularly useful for Natural Language Processing (NLP) tasks such as information extraction, text classification, summarisation, and question answering. Despite these advancements, the integration of LLMs into digital humanities for sensitive, domain-specific contexts remains largely unexplored. This is mainly due to the unstructured, context-dependent nature of historical texts such as oral Holocaust narratives, which present unique linguistic and ethical challenges. This study focuses on developing domain-specific NLP techniques for processing oral narratives within the broader field of digital humanities. First, the thesis introduces a domain-specific framework for extracting named entities and relationships from historically and culturally sensitive narratives by employing state-of-the-art information extraction techniques. Second, we propose a novel, lightweight, and reproducible adapter-based architecture for information retrieval from oral narratives, which integrates advanced retrieval-augmented generation (RAG) techniques. Third, we construct a knowledge graph to systematically capture and analyse common patterns and insights across the narratives. Finally, given the rapid emergence of LLMs and their increasing application in sensitive historical domains such as Holocaust research, this study critically analyses the ethical challenges of using LLMs in such research. Overall, the research is designed with a flexible and modular architecture, enabling reproducibility and extension to similar historical documents in archival collections that have yet to be digitised.

Item Type:
Thesis (PhD)
Uncontrolled Keywords:
Research Output Funding/no_not_funded
ID Code:
237723
Deposited By:
Deposited On:
04 Jun 2026 16:25
Refereed?:
No
Published?:
Published
Last Modified:
16 Jun 2026 23:30