A Novel Approach for Sentiment Analysis of a Low Resource Language Using Deep Learning Models

Ahmed, Naeem and Amin, Rashid and Aldabbas, Hamza and Saeed, Muhammad and Bilal, Muhammad and Song, Houbing (2024) A Novel Approach for Sentiment Analysis of a Low Resource Language Using Deep Learning Models. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP). ISSN 2375-4699

A_Novel_Approach_for_Sentiment_Analysis_of_a_Low_Resource_Language_Using_Deep.pdf - Accepted Version
Available under License Creative Commons Attribution.

Abstract

Sentiment analysis extracts valuable insights from people's opinions, remarks, and comments. It can be used for purposes such as market analysis, campaign monitoring, and decision-making. In recent years, there has been much research on sentiment classification, particularly for English. However, the existing approaches developed for English cannot be applied to the Urdu language. The substantial rise in communication traffic, including audio, text, video, and pictures, has significantly shifted the Internet of Things (IoT) from scalar to Multimedia Internet of Things (MIoT). So far, the integration of MIoT and natural language processing (NLP) systems has received little attention, but it has emerged as a novel research paradigm for smart applications. This article proposes deep learning techniques for sentence-level Urdu sentiment analysis (Urdu SA) for MIoT. Our approach consists of several phases: data gathering, text preprocessing, model training, testing, and evaluation. A data set of 25 thousand Urdu reviews is used to train the proposed models. The data set is built by scraping various Urdu blogs and social media platforms, and part of the IMDB data set is included after translation into Urdu. Native Urdu speakers annotate the data, and preprocessing techniques such as tokenization and stemming are applied. Two deep learning models, a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, are trained on the preprocessed Urdu reviews to classify their sentiment. Both models are tested with various combinations of hyperparameters, and each model's accuracy and F1 score are evaluated. The results show that the LSTM model outperforms the CNN model, achieving 96% accuracy and a 91% F1 score.
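
The abstract reports accuracy and F1 score as separate figures, and the two can differ noticeably on the same predictions. The following is a minimal sketch of how both are commonly computed for a binary sentiment task. It assumes the standard definitions (F1 as the harmonic mean of precision and recall on the positive class); the abstract does not state which F1 variant the authors used, and the labels below are hypothetical toy data, not results from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (1 = positive review, 0 = negative review).
toy_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

print(f"accuracy {accuracy(toy_true, toy_pred):.2f}, "
      f"F1 {f1_score(toy_true, toy_pred):.2f}")
# prints: accuracy 0.90, F1 0.80
```

In this toy case a single missed positive review lowers F1 more than accuracy because the positive class is the minority, which is one reason both metrics are usually reported together.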

Item Type:
Journal Article
Journal or Publication Title:
ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
Uncontrolled Keywords:
Research Output Funding/no_not_funded
ID Code:
225169
Deposited By:
Deposited On:
18 Oct 2024 12:50
Refereed?:
Yes
Published?:
Published
Last Modified:
19 Dec 2024 01:23