Ahmed, Naeem and Amin, Rashid and Aldabbas, Hamza and Saeed, Muhammad and Bilal, Muhammad and Song, Houbing (2024) A Novel Approach for Sentiment Analysis of a Low Resource Language Using Deep Learning Models. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP). ISSN 2375-4699
Accepted Version (PDF), available under a Creative Commons Attribution license.
Abstract
Sentiment analysis is the process of analyzing people's opinions, remarks, and comments to extract valuable insights from them. It can be used for purposes such as market analysis, campaign monitoring, and decision-making. In recent years, there has been much research on sentiment classification, particularly for English. However, the existing approaches developed for English cannot be applied to Urdu. The substantial rise in communication traffic, including audio, text, video, and images, has significantly shifted the Internet of Things (IoT) from scalar to the Multimedia Internet of Things (MIoT). So far, the integration of MIoT and natural language processing (NLP) systems has received little attention, but it has emerged as a novel research paradigm for smart applications. This article proposes deep learning techniques for sentence-level Urdu sentiment analysis (Urdu SA) for MIoT. Our approach consists of several phases: data gathering, text preprocessing, model training, testing, and evaluation. A data set of 25,000 Urdu reviews is used to train the proposed models. This data set was built by scraping various Urdu blogs and social media platforms, and part of the IMDB data set was also included after being translated into Urdu. The data were annotated by native Urdu speakers, and various preprocessing techniques, such as tokenization and stemming, were applied. Two deep learning models, a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, are trained on the preprocessed Urdu reviews to classify their sentiment. Both models are tested with various combinations of hyperparameters, and each model's accuracy and F1 score are evaluated. The results show that the LSTM model outperforms the CNN model, achieving 96% accuracy and a 91% F1 score.
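As a rough illustration of the two evaluation metrics mentioned in the abstract, the sketch below computes accuracy and F1 from true and predicted sentiment labels. It is not the authors' evaluation code. The function name `accuracy_and_f1`, the binary label encoding (1 = positive, 0 = negative), and the use of binary F1 for the positive class are all assumptions, since the abstract does not state whether the reported F1 is binary, macro-averaged, or weighted.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (accuracy, F1 for the `positive` class).

    Hypothetical helper: assumes binary sentiment labels and binary
    (positive-class) F1, which the abstract does not specify.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp = fp = fn = correct = 0
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct += 1
        if p == positive and t == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Invented toy labels (not data from the study): 3 positives out of 10.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
    acc, f1 = accuracy_and_f1(y_true, y_pred)
    print(f"accuracy={acc:.2f} f1={f1:.2f}")  # accuracy=0.80 f1=0.67
```

On these toy labels the function reports 80% accuracy but an F1 of about 0.67, showing how the two metrics can diverge when the positive class is a minority. This is only a property of the metrics; it says nothing about the class balance of the study's data set.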