A Comparative Study of Evaluation Metrics for Long-Document Financial Narrative Summarization with Transformers

Zmandar, Nadhem, El-Haj, Mahmoud and Rayson, Paul (2023). A Comparative Study of Evaluation Metrics for Long-Document Financial Narrative Summarization with Transformers. In: Natural Language Processing and Information Systems: 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer, pp. 391-403. ISBN 9783031353192.

A_Comparative_Study_of_Evaluation_Metrics_for_Long-Document_Financial_Narrative_Summarization_with_Transformers.pdf (Accepted Version, available under a Creative Commons Attribution licence)

Abstract

More than 2,000 companies are listed on the UK’s London Stock Exchange, divided into 11 sectors, and all of them are required to communicate their financial results at least twice in a single financial year. UK annual reports are lengthy documents, averaging around 80 pages. In this study, we benchmark a variety of summarisation methods built on different pre-trained transformers combined with different extraction techniques. In addition, we consider multiple evaluation metrics in order to investigate their differing behaviour and applicability on a dataset from the Financial Narrative Summarisation (FNS 2020) shared task, which is composed of annual reports published by firms listed on the London Stock Exchange together with their corresponding summaries. We hypothesise that some evaluation metrics do not reflect true summarisation ability and propose a novel BRUGEscore metric, defined as the harmonic mean of ROUGE-2 and BERTScore. Finally, we perform a statistical significance test to verify whether our results are statistically robust, alongside an adversarial analysis task with three different corruption methods.
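The abstract defines BRUGEscore only as the harmonic mean of ROUGE-2 and BERTScore. Below is a minimal sketch of that combination; the library choices (rouge-score, bert-score) and the use of the F1 variant of both component metrics are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch of BRUGEscore: harmonic mean of ROUGE-2 and BERTScore.
# Assumes the rouge-score and bert-score packages; F1 is assumed for both.
from rouge_score import rouge_scorer
from bert_score import score as bert_score


def bruge_score(candidate: str, reference: str) -> float:
    # ROUGE-2 F1 between the candidate summary and the reference summary.
    scorer = rouge_scorer.RougeScorer(["rouge2"], use_stemmer=True)
    r2 = scorer.score(reference, candidate)["rouge2"].fmeasure

    # BERTScore F1 (bert_score returns precision, recall, F1 tensors).
    _, _, f1 = bert_score([candidate], [reference], lang="en")
    bs = f1.item()

    # Harmonic mean of the two components; defined as 0 when both are 0.
    return 2 * r2 * bs / (r2 + bs) if (r2 + bs) > 0 else 0.0


# Usage: higher is better, like the component metrics it combines.
print(bruge_score("the firm reported higher revenue",
                  "the company reported increased revenue"))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a summary must score well on both lexical overlap (ROUGE-2) and semantic similarity (BERTScore) to obtain a high BRUGEscore.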

Item Type:
Contribution in Book/Report/Proceedings
Subjects:
benchmarking; evaluation metrics; long document summarization; Theoretical Computer Science; Computer Science (all)
ID Code:
209656
Deposited On:
08 Nov 2023 13:15
Refereed?:
Yes
Published?:
Published
Last Modified:
29 Apr 2024 23:28