Divisive clustering of high dimensional data streams

Hofmeyr, David and Pavlidis, Nicos and Eckley, Idris (2016) Divisive clustering of high dimensional data streams. Statistics and Computing, 26 (5). 1101–1120. ISSN 0960-3174

PDF (Divisive_Clustering_of_High_Dimensional_Data_Streams)
HSDC.pdf - Accepted Version
Available under License Creative Commons Attribution.

Abstract

Clustering streaming data is gaining importance as automatic data acquisition technologies are deployed in diverse applications. We propose a fully incremental projected divisive clustering method for high-dimensional data streams that is motivated by high density clustering. The method is capable of identifying clusters in arbitrary subspaces, estimating the number of clusters, and detecting changes in the data distribution which necessitate a revision of the model. The empirical evaluation of the proposed method on numerous real and simulated datasets shows that it is scalable in dimension and number of clusters, is robust to noisy and irrelevant features, and is capable of handling a variety of types of non-stationarity.

Item Type:
Journal Article
Journal or Publication Title:
Statistics and Computing
Additional Information:
Publication is available at: http://link.springer.com/article/10.1007%2Fs11222-015-9597-y
Uncontrolled Keywords:
/dk/atira/pure/subjectarea/asjc/1700/1703
Subjects:
clustering; data stream; high dimensionality; population drift; modality testing; computational theory and mathematics; theoretical computer science; statistics and probability; statistics, probability and uncertainty
ID Code:
75191
Deposited By:
Deposited On:
12 Aug 2015 08:44
Refereed?:
Yes
Published?:
Published
Last Modified:
25 Oct 2024 00:10