Degrees of freedom and model selection for k-means clustering

Hofmeyr, David P. (2020) Degrees of freedom and model selection for k-means clustering. Computational Statistics & Data Analysis, 149: 106974.


Abstract

A thorough investigation into the model degrees of freedom in k-means clustering is conducted. An extension of Stein's lemma is used to obtain an expression for the effective degrees of freedom in the k-means model. Approximating the degrees of freedom in practice requires simplifications of this expression; however, empirical studies evince the appropriateness of the proposed approach. The practical relevance of this new degrees of freedom formulation for k-means is demonstrated through model selection using the Bayesian Information Criterion. The reliability of this method is then validated through experiments on simulated data as well as on a large collection of publicly available benchmark data sets from diverse application areas. Comparisons with popular existing techniques indicate that this approach is extremely competitive for selecting high-quality clustering solutions.
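
For orientation, the standard notion of effective degrees of freedom that the abstract builds on can be sketched as follows. This is the textbook Gaussian-noise definition and the classical form of Stein's lemma, not the paper's extended expression, which is not reproduced in this record. The symbols X, M, \hat{M}, \sigma^2, n, d and the generic form of the BIC are notation assumed here for illustration.

```latex
% Assumed setup: n observations in d dimensions, means M, isotropic Gaussian noise.
\[
  X = M + E, \qquad X \in \mathbb{R}^{n \times d}, \qquad
  E_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2).
\]
% Effective degrees of freedom of a fitted-value map X -> \hat{M}(X).
\[
  \mathrm{df}(\hat{M})
  = \frac{1}{\sigma^2} \sum_{i=1}^{n} \sum_{j=1}^{d}
    \mathrm{Cov}\bigl(\hat{M}_{ij}, X_{ij}\bigr).
\]
% Classical Stein's lemma: valid when \hat{M} is almost differentiable in X.
\[
  \mathrm{df}(\hat{M})
  = \mathbb{E}\Biggl[\, \sum_{i=1}^{n} \sum_{j=1}^{d}
    \frac{\partial \hat{M}_{ij}}{\partial X_{ij}} \Biggr].
\]
% Generic BIC with the degrees of freedom as the penalised quantity;
% \hat{L}_k is the maximised likelihood of the model with k clusters.
\[
  \mathrm{BIC}(k) = -2 \log \hat{L}_k + \log(n)\, \mathrm{df}_k .
\]
```

In k-means, each row of \hat{M} is the centroid assigned to the corresponding observation. A small perturbation of the data can change cluster assignments, so the fitted values are not continuous in X and the classical lemma does not apply directly, which is consistent with the abstract's reference to an extension of Stein's lemma.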

Item Type:
Journal Article
Journal or Publication Title:
Computational Statistics & Data Analysis
Subjects:
Bayesian information criterion; cluster number determination; clustering; degrees of freedom; model selection; penalised likelihood; k-means; statistics and probability; computational theory and mathematics; computational mathematics; applied mathematics
ID Code:
231594
Deposited By:
Deposited On:
12 Sep 2025 13:20
Refereed?:
Yes
Published?:
Published
Last Modified:
12 Sep 2025 13:20