UPPC - Urdu Paraphrase Plagiarism Corpus

Muhammad, Sharjeel and Rayson, Paul Edward and Nawab, Rao Muhammad Adeel (2016) UPPC - Urdu Paraphrase Plagiarism Corpus. In: Proceedings of LREC 2016, Tenth International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA), pp. 1832-1836. ISBN 9782951740891

PDF: uppc_urdu_paraphrase.pdf (Accepted Version, 341kB)
Available under License Creative Commons Attribution.

Abstract

Paraphrase plagiarism is a significant and widespread problem, and research shows that it is hard to detect. Several methods and automatic systems have been proposed to deal with it. However, evaluating and comparing such solutions is not possible because no benchmark corpora containing manually created examples of paraphrase plagiarism are available. To address this issue, we present a new paraphrase plagiarism corpus containing simulated (manually created) examples in Urdu, a language widely spoken around the world. This resource is the first of its kind developed for Urdu, and we believe it will be a valuable contribution to the evaluation of paraphrase plagiarism detection systems.

Item Type:
Contribution in Book/Report/Proceedings
Subjects:
paraphrase plagiarism, corpus generation, Urdu plagiarism detection, natural language processing
ID Code:
78962
Deposited On:
04 Apr 2016 12:56
Refereed?:
Yes
Published?:
Published
Last Modified:
19 Feb 2024 00:54