Arabic multi-document text summarisation

El-Haj, Mahmoud (2012) Arabic multi-document text summarisation. PhD thesis, University of Essex.

Mahmoud_ELHAJ_PHD_Thesis_2012.pdf - Accepted Version

Abstract

Multi-document summarisation is the process of producing a single summary of a collection of related documents. Much of the current work on multi-document text summarisation is concerned with the English language, for which relevant resources are numerous and readily available. These resources include human-generated (gold-standard) and automatic summaries. Arabic multi-document summarisation is still in its infancy. One of the obstacles to progress is the limited availability of Arabic resources to support this research. When we started our research there were no publicly available Arabic multi-document gold-standard summaries, which are needed to automatically evaluate system-generated summaries. The Document Understanding Conference (DUC) and Text Analysis Conference (TAC) at that time provided resources such as gold-standard extractive and abstractive summaries (both human- and system-generated), but these were available only in English. Our aim was to push forward the state of the art in Arabic multi-document summarisation. This required advancements in at least two areas. The first was the creation of Arabic test collections. The second concerned the summarisation process itself: finding methods that improve the quality of Arabic summaries. To address both points, we created single- and multi-document Arabic test collections, both automatically and manually, drawing on a commonly used English dataset and on human participants. We developed extractive language-dependent and language-independent single- and multi-document summarisers for both Arabic and English. In our work we provided state-of-the-art approaches for Arabic multi-document summarisation. We succeeded in including Arabic in one of the leading summarisation conferences, the Text Analysis Conference (TAC). Researchers working on Arabic multi-document summarisation now have resources and tools that can be used to advance research in this field.

Item Type:
Thesis (PhD)
Additional Information:
Thesis (Ph.D.), School of Computer Science and Electronic Engineering, University of Essex, 2012
ID Code:
71279
Deposited On:
17 Oct 2014 08:58
Refereed?:
No
Published?:
Published
Last Modified:
21 Sep 2024 23:55