Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

Nekoto, Wilhelmina and Marivate, Vukosi and Matsila, Tshinondiwa and Fasubaa, Timi and Kolawole, Tajudeen and Fagbohungbe, Taiwo and Akinola, Solomon Oluwole and Muhammad, Shamsuddeen Hassan and Kabongo, Salomon and Osei, Salomey and Freshia, Sackey and Niyongabo, Rubungo Andre and Macharm, Ricky and Ogayo, Perez and Ahia, Orevaoghene and Meressa, Musie and Adeyemi, Mofe and Mokgesi-Selinga, Masabata and Okegbemi, Lawrence and Martinus, Laura Jane and Tajudeen, Kolawole and Degila, Kevin and Ogueji, Kelechi and Siminyu, Kathleen and Kreutzer, Julia and Webster, Jason and Ali, Jamiil Toure and Abbott, Jade and Orife, Iroro and Ezeani, Ignatius and Dangana, Idris Abdulkabir and Kamper, Herman and Elsahar, Hady and Duru, Goodness and Kioko, Ghollah and Murhabazi, Espoir and Biljon, Elan van and Whitenack, Daniel and Onyefuluchi, Christopher and Emezue, Chris and Dossou, Bonaventure and Sibanda, Blessing and Bassey, Blessing Itoro and Olabiyi, Ayodele and Ramkilowan, Arshath and Öktem, Alp and Akinfaderin, Adewale and Bashir, Abdallah (2020) Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages. arXiv. ISSN 2331-8422

Text (Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages)
2010.02353v2.pdf - Accepted Version

Abstract

Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role in information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released under https://github.com/masakhane-io/masakhane-mt.

Item Type:
Journal Article
Journal or Publication Title:
arXiv
Additional Information:
Findings of EMNLP 2020; updated benchmarks
ID Code:
150109
Deposited On:
05 Jan 2021 09:49
Refereed?:
Yes
Published?:
Published
Last Modified:
01 Aug 2021 06:56