Nanomi Arachchige, Isuri and Mitkov, Ruslan and Rayson, Paul (2026) Computational Analysis of Historical Narratives through Large Language Models. PhD thesis, Lancaster University.
2026isuriphd.pdf - Published Version
Available under License Creative Commons Attribution-NonCommercial.
Abstract
Over the last decade, Large Language Models (LLMs) have been pushing the boundaries of artificial intelligence in creativity, language generation, and specialised problem-solving. Their ability to understand and generate human-like text makes them particularly useful for Natural Language Processing (NLP) tasks such as information extraction, text classification, summarisation, and question answering. Despite these advancements, the integration of LLMs into the digital humanities, particularly in sensitive, domain-specific contexts, remains largely unexplored. This is mainly due to the unstructured, context-dependent nature of historical texts such as oral Holocaust narratives, which present unique linguistic and ethical challenges. This study focuses on developing domain-specific NLP techniques for processing oral narratives within the broader field of digital humanities. First, the thesis introduces a domain-specific framework for extracting named entities and relationships from historically and culturally sensitive narratives using state-of-the-art information extraction techniques. Second, we propose a novel, lightweight, and reproducible adapter-based architecture for information retrieval from oral narratives, which integrates advanced retrieval-augmented generation (RAG) techniques. Third, we construct a knowledge graph to systematically capture and analyse common patterns and insights across the narratives. Finally, given the rapid emergence of LLMs and their increasing application in sensitive historical domains such as Holocaust research, this study critically analyses the ethical challenges of using LLMs in such research. Overall, the research is designed with a flexible and modular architecture, enabling reproducibility and extension to similar historical documents that have yet to be digitised in archival collections.