AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages

Ogundepo, Odunayo and Gwadabe, Tajuddeen R. and Rivera, Clara E. and Clark, Jonathan H. and Ruder, Sebastian and Adelani, David Ifeoluwa and Ezeani, Ignatius and Chukwuneke, Chiamaka and Dossou, Bonaventure F. P. and Abdou, Aziz DIOP and Sikasote, Claytone and Hacheme, Gilles and Buzaaba, Happy and Mabuya, Rooweither and Osei, Salomey and Emezue, Chris and Kahira, Albert Njoroge and Muhammad, Shamsuddeen H. and Oladipo, Akintunde and Owodunni, Abraham Toluwase and Tonja, Atnafu Lambebo and Shode, Iyanuoluwa and Asai, Akari and Ajayi, Tunde Oluwaseyi and Siro, Clemencia and Arthur, Steven and Adeyemi, Mofetoluwa and Ahia, Orevaoghene and Anuoluwapo, Aremu and Awosan, Oyinkansola and Opoku, Bernard and Ayodele, Awokoya and Otiende, Verrah and Mwase, Christine and Sinkala, Boyd and Rubungo, Andre Niyongabo and Ajisafe, Daniel A. and Onwuegbuzia, Emeka Felix and Mbow, Habib and Niyomutabazi, Emile and Mukonde, Eunice and Lawan, Falalu Ibrahim and Ahmad, Ibrahim Said and Alabi, Jesujoba O. and Namukombo, Martin and Chinedu, Mbonu and Phiri, Mofya and Putini, Neo and Mngoma, Ndumiso and Amuok, Priscilla A. and Iro, Ruqayya Nasir and Adhiambo, Sonia (2023) AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages. Other. arXiv.

Text (2305.06897v1)
2305.06897v1.pdf - Other
Available under License Creative Commons Attribution.



African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems -- those that retrieve answer content from other languages while serving people in their native language -- offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.

Item Type:
Monograph (Other)
Deposited On:
06 Jun 2023 13:45
Last Modified:
04 May 2024 23:35