Creating and validating multilingual semantic representations for six languages: expert versus non-expert crowds

El-Haj, Mahmoud and Rayson, Paul and Piao, Scott and Wattam, Stephen (2017) Creating and validating multilingual semantic representations for six languages: expert versus non-expert crowds. In: Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 61-71. ISBN 9781945626500


PDF (W17-1908): W17_1908.pdf, Published Version (1MB)
Available under License Creative Commons Attribution.

Abstract

Creating high-quality, wide-coverage multilingual semantic lexicons to support knowledge-based approaches is a challenging and time-consuming manual task. It has traditionally been performed by linguistic experts, which makes it slow and expensive. We present an experiment in which we adapt and evaluate crowdsourcing methods that employ native speakers to generate a list of coarse-grained senses, under a common multilingual semantic taxonomy, for sets of words in six languages. 451 non-experts (including 427 Mechanical Turk workers) and 15 expert participants manually annotated 250 words with semantic senses for Arabic, Chinese, English, Italian, Portuguese and Urdu lexicons. To avoid erroneous (spam) crowdsourced results, we used a novel task-specific two-phase filtering process in which users were asked to identify synonyms in the target language and to remove erroneous senses.

Item Type:

Contribution in Book/Report/Proceedings

Departments:

Faculty of Science and Technology > School of Computing & Communications

ID Code:

85797

Deposited By:

ep_importer_pure

Deposited On:

04 Apr 2017 08:28

Refereed?:

Yes

Published?:

Published

Last Modified:

05 May 2026 23:10

URI:

https://eprints.lancs.ac.uk/id/eprint/85797
