Empirical Evaluation Methodology for Target Dependent Sentiment Analysis

Moore, Andrew and Rayson, Paul (2021) Empirical Evaluation Methodology for Target Dependent Sentiment Analysis. PhD thesis, Lancaster University.

Text (2021moorephd)
2021moorephd.pdf - Published Version
Available under License Creative Commons Attribution.


The area of sentiment analysis has been around for at least 20 years in one form or another. In that time, it has had many and varied applications, ranging from predicting film successes to social media analytics, and it has gained widespread use through being sold as a tool via application programming interfaces. The focus of this thesis is not on the application side but rather on novel evaluation methodology for the most fine-grained form of sentiment analysis, target dependent sentiment analysis (TDSA). TDSA has seen a recent upsurge, but to date most research only evaluates on very similar datasets, which limits the conclusions that can be drawn from it. Further, most research only marginally improves results, chasing the State Of The Art (SOTA), and these prior works cannot empirically show where their improvements come from beyond overall metrics and small qualitative examples. Through an extensive literature review of the different granularities of sentiment analysis, from coarse (document level) to fine-grained, a new and extended definition of fine-grained sentiment analysis, the hextuple, is created, which removes ambiguities that can arise from the context. In addition, examples are provided from the literature of studies that can be neither replicated nor reproduced. This thesis includes the largest empirical analysis on six English datasets across multiple existing neural and non-neural methods, allowing the methods to be tested for generalisability. These experiments show that factors such as dataset size and sentiment class distribution determine whether neural or non-neural approaches are best, and further that no method is generalisable. By formalising, analysing, and testing prior TDSA error splits, newly created error splits, and a new TDSA-specific metric, a new empirical evaluation methodology has been created for TDSA.
This evaluation methodology is then applied to multiple case studies to empirically justify improvements, such as position encoding, and to show how contextualised word representation improves TDSA methods. From the first reproduction study in TDSA, it is believed that the significant effect of random seeds on the neural method is the reason behind the difficulty in reproducing or replicating the original study's results. This highlights empirically, for the first time in TDSA, the need to report results from multiple runs for neural methods, to allow for better reporting and improved evaluation. This thesis is fully reproducible through the codebases and Jupyter notebooks referenced, making it an executable thesis.
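
As a rough illustration of the multi-run reporting the abstract argues for, the sketch below summarises scores from several random seeds instead of quoting a single run. It is not taken from the thesis codebases: `run_experiment` is a hypothetical stand-in that only simulates a seed-dependent score, and `summarise_runs`, the base score of 0.70, the spread of 0.03, and the choice of eight seeds are all assumptions made for the example.

```python
import random
import statistics


def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for training and evaluating a neural TDSA model.

    A real experiment would train a model with this seed and return a test
    metric; here a seed-dependent score is simulated (assumed values).
    """
    rng = random.Random(seed)
    return 0.70 + rng.uniform(-0.03, 0.03)


def summarise_runs(scores: list[float]) -> dict[str, float]:
    """Summarise several runs so the seed-induced spread is visible."""
    return {
        "n_runs": len(scores),
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
    }


seeds = list(range(8))  # assumed number of runs
scores = [run_experiment(seed) for seed in seeds]
summary = summarise_runs(scores)

print(
    f"{summary['n_runs']} runs: "
    f"mean={summary['mean']:.4f} std={summary['std']:.4f} "
    f"min={summary['min']:.4f} max={summary['max']:.4f}"
)
```

Reporting the mean together with the spread (and ideally the individual scores) lets a reader judge whether a claimed improvement is larger than the variation caused by the random seed alone.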

Item Type:
Thesis (PhD)
Deposited On:
01 Sep 2021 08:50
Last Modified:
03 Jun 2024 23:33