Mohd Ali, Azliza and Angelov, Plamen (2018) Anomalous behaviour detection using heterogeneous data. PhD thesis, Lancaster University.
2018MohdAliphd.pdf - Published Version
Available under License Creative Commons Attribution-NoDerivs.
Download (7MB)
Abstract
Anomaly detection is one of the most important methods to process and find abnormal data, as this method can distinguish between normal and abnormal behaviour. Anomaly detection has been applied in many areas such as the medical sector, fraud detection in finance, fault detection in machines, intrusion detection in networks, surveillance systems for security, as well as forensic investigations. Abnormal behaviour can give information or answer questions when an investigator is performing an investigation. Anomaly detection is one way to simplify big data by focusing on data that have been grouped or clustered by the anomaly detection method. Forensic data usually consists of heterogeneous data which have several data forms or types such as qualitative or quantitative, structured or unstructured, and primary or secondary. For example, when a crime takes place, the evidence can be in the form of various types of data. The combination of all the data types can produce rich information insights. Nowadays, data has become ‘big’ because it is generated every second of every day and processing has become time-consuming and tedious. Therefore, in this study, a new method to detect abnormal behaviour is proposed using heterogeneous data and combining the data using data fusion technique. Vast challenge data and image data are applied to demonstrate the heterogeneous data. The first contribution in this study is applying the heterogeneous data to detect an anomaly. The recently introduced anomaly detection technique which is known as Empirical Data Analytics (EDA) is applied to detect the abnormal behaviour based on the data sets. Standardised eccentricity (a newly introduced within EDA measure offering a new simplified form of the well-known Chebyshev Inequality) can be applied to any data distribution. Then, the second contribution is applying image data. The image data is processed using pre-trained deep learning network, and classification is done using a support vector machine (SVM). After that, the last contribution is combining anomaly result from heterogeneous data and image recognition using new data fusion technique. There are five types of data with three different modalities and different dimensionalities. The data cannot be simply combined and integrated. Therefore, the new data fusion technique first analyses the abnormality in each data type separately and determines the degree of suspicious between 0 and 1 and sums up all the degrees of suspicion data afterwards. This method is not intended to be a fully automatic system that resolves investigations, which would likely be unacceptable in any case. The aim is rather to simplify the role of the humans so that they can focus on a small number of cases to be looked in more detail. The proposed approach does simplify the processing of such huge amounts of data. Later, this method can assist human experts in their investigations and making final decisions.