Bezerra, C.G. and Costa, B.S.J. and Guedes, L.A. and Angelov, P.P. (2020) An evolving approach to data streams clustering based on typicality and eccentricity data analytics. Information Sciences, 518. pp. 13-28. ISSN 0020-0255
Paper_Information_Sciences_Revised_auto_cloud_2020.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
In this paper we propose AutoCloud, an algorithm for online clustering of data streams. It is based on the recently introduced concept of Typicality and Eccentricity Data Analytics, which has mainly been used for anomaly detection. AutoCloud is an evolving, online, and recursive technique that needs no training or prior knowledge about the data set, so it requires no offline processing. It creates and merges clusters autonomously as new data observations become available. The clusters created by AutoCloud are called data clouds, which are structures without a pre-defined shape or boundaries. Using fuzzy concepts, AutoCloud allows each data sample to belong to multiple data clouds simultaneously. AutoCloud can also handle concept drift and concept evolution, two problems inherent to data streams. Because the algorithm is recursive and online, it is suitable for applications that require a real-time response. We validate our proposal on multiple well-known data sets from the literature.
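To make the "recursive and online" idea concrete, the following is a minimal sketch of a recursive eccentricity update of the kind used in Typicality and Eccentricity Data Analytics (TEDA). It is not AutoCloud itself, and the abstract does not state these formulas: the update rules and the Chebyshev-style threshold are assumed here from the way they are commonly given in the TEDA literature. The names `RecursiveEccentricity`, `is_outlier`, and the parameter `m` are hypothetical.

```python
import numpy as np


class RecursiveEccentricity:
    """Running mean, variance, and eccentricity for a single data stream.

    Assumed (not taken from the abstract) update rules:
      mean_k = ((k-1)/k) * mean_{k-1} + x_k / k
      var_k  = ((k-1)/k) * var_{k-1} + ||x_k - mean_k||^2 / (k-1)
      ecc_k  = 1/k + ||x_k - mean_k||^2 / (k * var_k)
    """

    def __init__(self):
        self.k = 0        # number of samples seen so far
        self.mean = None  # running mean vector
        self.var = 0.0    # running scalar variance

    def update(self, x):
        """Absorb one sample and return its eccentricity."""
        x = np.asarray(x, dtype=float)
        self.k += 1
        k = self.k
        if k == 1:
            # With a single sample, the formula reduces to 1/k = 1.
            self.mean = x.copy()
            return 1.0
        self.mean = ((k - 1) / k) * self.mean + x / k
        dist2 = float(np.sum((x - self.mean) ** 2))
        self.var = ((k - 1) / k) * self.var + dist2 / (k - 1)
        if self.var == 0.0:
            # All samples identical so far: only the 1/k term remains.
            return 1.0 / k
        return 1.0 / k + dist2 / (k * self.var)


def is_outlier(ecc, k, m=3.0):
    """Assumed threshold: normalized eccentricity ecc/2 > (m^2 + 1) / (2k)."""
    return ecc / 2.0 > (m * m + 1.0) / (2.0 * k)


if __name__ == "__main__":
    est = RecursiveEccentricity()
    for value in [0.0, 0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05]:
        est.update([value])
    ecc = est.update([10.0])  # a sample far from everything seen so far
    print(is_outlier(ecc, est.k))
```

Each update touches only the current sample and three stored quantities, so no past data is kept in memory. A clustering method built on this would maintain one such state per data cloud; how AutoCloud creates and merges clouds is described in the paper, not in this sketch.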