Lancaster EPrints

Building and annotating a corpus for the study of journalistic text reuse

Piao, Scott and Clough, Paul and Gaizauskas, Robert (2002) Building and annotating a corpus for the study of journalistic text reuse. In: 3rd International Conference on Language Resources and Evaluation (LREC-2002), Las Palmas de Gran Canaria, Spain, pp. 1678-1691.

Full text not available from this repository.

Abstract

In this paper we present the METER Corpus, a novel resource for the study and analysis of journalistic text reuse. The corpus consists of a set of news stories written by the Press Association (PA), the major UK news agency, and a set of stories about the same news events, as published in various British newspapers. In some cases the newspaper stories are rewritten from the PA source; in other cases they have been independently written by the newspapers' own journalists. We discuss the motivation for creating the corpus, its contents, the annotation of certain attributes for analysis of text reuse and finally the encoding of those annotations into a standardised corpus format: the Text Encoding Initiative (TEI).

Item Type: Contribution in Book/Report/Proceedings
Uncontrolled Keywords: Journalistic text reuse; TEI markup; Corpus annotation; Corpus; Paraphrase
Departments: Faculty of Science and Technology > School of Computing & Communications
ID Code: 52135
Deposited By: ep_importer_pure
Deposited On: 20 Dec 2011 09:58
Refereed?: No
Published?: Published
Last Modified: 10 Apr 2014 00:48
URI: http://eprints.lancs.ac.uk/id/eprint/52135
