Using Arabic Twitter to support analysis of the spread of Infectious Diseases

Alsudias, Lama and Rayson, Paul (2022) Using Arabic Twitter to support analysis of the spread of Infectious Diseases. PhD thesis, Lancaster University.

Text (2022LamaPhD)
2022LamaPhD.pdf - Published Version
Available under License Creative Commons Attribution-NonCommercial-ShareAlike.


This study investigates how Arabic social media content, especially Twitter, can be used to measure the incidence of infectious diseases. People use social media applications such as Twitter to find news about diseases and to express their opinions and feelings about them. As a result, a vast amount of information is available that NLP researchers could exploit for many kinds of analysis, despite the informal writing style of social media. Systematic monitoring of social media posts (infodemiology or infoveillance) could help to detect misinformation outbreaks, reduce reporting lag time, and provide an independent source of data that complements traditional surveillance approaches. However, little research has analysed Arabic tweets for health surveillance purposes, because few Arabic social media datasets exist compared with those available for English and some other languages. We therefore needed to create our own corpus. In addition, building ontologies is a crucial part of the semantic web endeavour. In recent years, research interest in supporting languages such as Arabic in NLP has grown rapidly, but there has been very little research on medical ontologies for Arabic. In this thesis, the first and largest Arabic Twitter dataset in the area of health surveillance was created for training and testing in the research studies presented. Machine learning algorithms, combined with NLP techniques designed for Arabic, were used to classify tweets into five source categories: academic, media, government, health professional, and the public. Taking the source of the information into account alongside the content of tweets assists reliability and trust judgements. An Arabic Infectious Diseases Ontology was presented and evaluated as part of a new method to bridge formal and informal descriptions of infectious diseases.
Several qualitative and quantitative studies were performed to analyse Arabic tweets written during the COVID-19 pandemic and to show how public health organisations can learn from social media. A system based on our ontology was presented that measures the spread of two infectious diseases, illustrating the quantitative patterns and qualitative themes that can be extracted.

Item Type:
Thesis (PhD)
Deposited On:
01 Jun 2022 16:15
Last Modified:
16 Feb 2024 00:21