Enhancing retrieval effectiveness of diacritisized Arabic passages using stemmer and thesaurus

Hammo, Bassam and Sleit, Azzam and El-Haj, Mahmoud (2008) Enhancing retrieval effectiveness of diacritisized Arabic passages using stemmer and thesaurus. In: The 19th Midwest Artificial Intelligence and Cognitive Science Conference (MAICS 2008), pp. 189–196.

Full text not available from this repository.

Abstract

In this paper we discuss enhancing Arabic passage retrieval for both diacritisized and non-diacritisized text. Most previous work suggests that retrieval should start by pre-processing the Arabic text to remove the diacritical marks (short vowels) and so unify the text. In most cases this causes considerable ambiguity at the word level in the absence of context. Searching for a word in diacritisized text, however, requires typing and matching all of its diacritical marks, which is cumbersome and prevents users from searching for, and hence retrieving, a valuable amount of text. The alternative is to ignore these marks and accept the problem of ambiguity. We propose a passage retrieval approach that searches both diacritisized and diacritic-less text by using query expansion to match a user’s query. For this purpose we applied a rule-based stemmer and compiled a large thesaurus. We tested the approach on the scripts of the Quran as an open-domain source of diacritisized text, using a set of 40 non-diacritical words obtained from testers. The results are presented, and the approach suggests future directions for search engines.
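
To make the trade-off described in the abstract concrete, the following is a minimal sketch of matching a diacritic-less query against diacritisized passages, with a simple thesaurus lookup for query expansion. It is not the authors' implementation: the function names, the dictionary-based thesaurus, and the whitespace tokenization are all assumptions made for illustration, and the rule-based stemmer mentioned in the abstract is left out entirely. Note that the sketch normalizes the passage text, which is exactly the step the abstract says introduces word-level ambiguity.

```python
import re

# Arabic marks U+064B..U+0652: tanwin, fatha, damma, kasra, shadda, sukun.
# Only this explicit range is stripped, so base letters such as alef with
# madda (U+0622) are left untouched.
HARAKAT = re.compile("[\u064B-\u0652]")


def strip_diacritics(text: str) -> str:
    """Return the text with the Arabic diacritical marks removed."""
    return HARAKAT.sub("", text)


def expand_query(query: str, thesaurus: dict) -> set:
    """Return the query word plus its thesaurus entries, all diacritic-less.

    `thesaurus` is a hypothetical mapping from a diacritic-less word to a
    list of related words; the paper's actual thesaurus format is not
    described in the abstract.
    """
    base = strip_diacritics(query)
    terms = {base}
    for related in thesaurus.get(base, ()):
        terms.add(strip_diacritics(related))
    return terms


def retrieve(query: str, passages: list, thesaurus: dict) -> list:
    """Return the passages containing any expanded query term.

    Passages are compared in diacritic-less form but returned unchanged,
    so the diacritisized source text is preserved in the results.
    """
    terms = expand_query(query, thesaurus)
    hits = []
    for passage in passages:
        tokens = set(strip_diacritics(passage).split())
        if terms & tokens:
            hits.append(passage)
    return hits
```

Calling `retrieve(word, passages, thesaurus)` with a word typed without diacritics returns every passage whose diacritic-less tokens include that word or one of its thesaurus entries. A fuller version would also need to handle clitics and inflected forms, which is where a stemmer would come in.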

Item Type:
Contribution in Book/Report/Proceedings
ID Code:
71277
Deposited On:
15 Oct 2014 14:38
Refereed?:
Yes
Published?:
Published
Last Modified:
01 Jan 2020 05:48