Haynes, Kaylea and Fearnhead, Paul and Eckley, Idris (2017) Detecting abrupt changes in big data. PhD thesis, Lancaster University.
2017haynesphd.pdf - Published Version
Available under License Creative Commons Attribution-NonCommercial-NoDerivs.
Abstract
This thesis develops methods for changepoint detection that are suited to Big Data. In particular, we develop methods that scale to the volume of data now readily collected and stored, and that are versatile enough to handle different varieties of data. A well-established approach to detecting changes uses penalised optimisation, where the choice of penalty has a large impact on the performance of the method. In the first part of this thesis we propose an algorithm, CROPS (Changepoints over a Range of PenaltieS), which finds the optimal solutions for a range of penalties rather than for a single pre-specified penalty. The second part looks at the choice of cost function used in the optimisation. In particular, we develop a computationally efficient method that uses a nonparametric cost function, allowing changes to be detected in a larger variety of data sets. This nonparametric approach uses the empirical cumulative distribution function of the data and thus does not require any assumptions about distributional parameters. The third part looks at ways to parallelise detection methods in order to use multi-core computers, thus allowing changes to be detected in much larger data sets than was previously possible. We look at different ways to split the data across multiple cores and then merge the results, aiming to preserve as much as possible of the accuracy obtained when only one core is used.
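To illustrate how the penalty drives the result in penalised optimisation, the following is a minimal sketch and not the thesis's implementation. It uses plain optimal partitioning (a quadratic-time dynamic programme) with a Gaussian change-in-mean cost, both chosen here as assumptions for illustration. It does not implement CROPS or the nonparametric cost. The function name `optimal_partitioning`, the toy series `y`, and the penalty values are all hypothetical.

```python
import numpy as np


def optimal_partitioning(y, beta):
    """Minimise (sum of segment costs) + beta * (number of changepoints).

    Illustrative O(n^2) dynamic programme; returns sorted changepoint
    indices k, meaning a new segment starts at y[k].
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Cumulative sums give any segment's cost in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def cost(s, t):
        # Residual sum of squares of y[s:t] about its own mean
        # (assumed Gaussian change-in-mean cost, up to constants).
        total = s1[t] - s1[s]
        return (s2[t] - s2[s]) - total * total / (t - s)

    F = np.empty(n + 1)          # F[t]: optimal penalised cost of y[:t]
    F[0] = -beta                 # so the first segment carries no penalty
    last = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        candidates = [F[s] + cost(s, t) + beta for s in range(t)]
        last[t] = int(np.argmin(candidates))
        F[t] = candidates[last[t]]

    # Backtrack through the stored segment starts.
    cps = []
    t = n
    while last[t] > 0:
        t = int(last[t])
        cps.append(t)
    return sorted(cps)


# Toy series: mean 0 for 50 points, then mean 5, with small alternating noise.
y = np.concatenate((np.zeros(50), np.full(50, 5.0))) + 0.1 * (-1.0) ** np.arange(100)
for beta in (0.001, 10.0, 1000.0):
    print(beta, optimal_partitioning(y, beta))
```

On this toy series a moderate penalty recovers the single change at index 50, a very large penalty returns no changes, and a very small penalty over-segments. Sensitivity of this kind is the reason for exploring a range of penalties rather than fixing one in advance.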