Evaluating Large Language Models in Relationship Extraction from Unstructured Data: Empirical Study from Holocaust Testimonies

Nanomi Arachchige, Isuri and Ha, Le An and Mitkov, Ruslan and Nahar, Vinitar (2023) Evaluating Large Language Models in Relationship Extraction from Unstructured Data: Empirical Study from Holocaust Testimonies. In: International Conference Recent Advances in Natural Language Processing, RANLP 2023. International Conference Recent Advances in Natural Language Processing, RANLP. Association for Computational Linguistics, pp. 117-123. ISBN 9789544520922

Full text not available from this repository.

Abstract

Relationship extraction from unstructured data remains one of the most challenging tasks in the field of Natural Language Processing (NLP). The complexity of relationship extraction arises from the need to comprehend the underlying semantics, syntactic structures, and contextual dependencies within the text. Unstructured data poses further challenges: diverse linguistic patterns, implicit relationships, and contextual nuances all complicate accurate relationship identification and extraction. The emergence of Large Language Models (LLMs), such as GPT (Generative Pre-trained Transformer), has marked a significant advancement in the field of NLP. In this work, we evaluate the effectiveness of LLMs in relationship extraction from Holocaust testimonies within the historical domain. By focusing on this domain-specific context, we aim to gain deeper insight into how accurately LLMs capture and extract relationships within the Holocaust domain, and we develop a novel knowledge graph to visualise the relationships of the Holocaust. To the best of our knowledge, no existing study discusses relationship extraction in Holocaust testimonies. The majority of current approaches to Information Extraction (IE) in historical documents are either manual or based on Optical Character Recognition (OCR). In this study, we found that Subject-Object-Verb extraction using GPT-3-based relations produced more meaningful results than Semantic Role Labeling-based triple extraction.

Item Type: Contribution in Book/Report/Proceedings

Departments: Faculty of Science and Technology > School of Computing & Communications

ID Code: 222906

Deposited By: ep_importer_pure

Deposited On: 25 Nov 2024 16:00

Refereed?: Yes

Published?: Published

Last Modified: 20 Sep 2025 03:33

URI: https://eprints.lancs.ac.uk/id/eprint/222906
