Zmandar, Nadhem and Rayson, Paul and El-Haj, Mo (2024) Multilingual Financial Text Summarisation. PhD thesis, Lancaster University.
Abstract
As the number of public firms worldwide grows, the volume of financial disclosures and financial texts in different languages and forms is increasing sharply; as a result, the study of Natural Language Processing (NLP) methods that automatically summarise content has grown rapidly into a major research area. Financial communication is a vital component of market transparency and a key element of investor confidence and of the credibility and quality of a financial marketplace. Public firms are obliged to communicate regularly with their shareholders. The financial communication policy of a listed company reflects the regulatory constraints related to going public as well as the willingness of executives to communicate regularly with financial market players in a transparent, professional and responsive fashion. Firms use financial narratives to communicate with their stakeholders (investors, shareholders, customers, employees, financial analysts, regulators, lenders, rating agencies, and suppliers). Through these communications, stakeholders can assess how a company creates value. This thesis explores the financial text summarisation task from different angles. The goal is to develop general and scalable algorithms that improve the state of the art (SOTA) in financial text summarisation, and to compare different methodologies using both quantitative and qualitative measures of performance. The ability to extract key information from financial documents and generate summaries in multiple languages is crucial for financial professionals and organisations. However, current text summarisation methods cannot accurately identify and extract relevant information when applied to financial texts, owing to the domain-specific nature of the language, the differing structures of financial documents, the complexity of financial concepts and the lack of well-developed language resources and models.
This study investigates how to adapt different transformer language models (general and domain-specific) and alternative unsupervised techniques to generate coherent summaries. It then presents different ways to measure performance by combining automatic and human evaluations, and finally proposes several adversarial attacks and statistical methods to test the robustness of the results. The models in this thesis provide state-of-the-art performance on the multilingual financial summarisation task. This research contributes to the field of NLP by demonstrating the effectiveness of our approach to multilingual financial text summarisation and by providing valuable insights for developing multilingual text summarisation systems. This thesis targets three languages: Arabic, French and English. It covers three financial reporting frameworks and three financial market cultures. It deals with three types of documents: long unstructured documents (English reports), medium structured reports (French reports) and financial newswires (Arabic). In addition, this thesis combines several novel contribution types: dataset creation, ontology labelling, benchmarks for financial summarisation systems, monitoring of the NLP training process and pretraining of novel language models to address the lack of domain-specific and language-specific language models. This thesis also presents a novel approach to automatically summarising long financial texts in multiple languages. Using advanced pretrained transformers, our system can accurately identify and extract essential information from financial documents and generate extractive and abstractive summaries in various languages.