Research Article

A Fast Deterministic Kmeans Initialization

by Omar Kettani, Faical Ramdani
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 12 - Issue 2
Published: May 2017
DOI: 10.5120/ijais2017451683

Omar Kettani, Faical Ramdani. A Fast Deterministic Kmeans Initialization. International Journal of Applied Information Systems 12, 2 (May 2017), 6-11. DOI=10.5120/ijais2017451683

@article{ 10.5120/ijais2017451683,
  author    = { Omar Kettani and Faical Ramdani },
  title     = { A Fast Deterministic Kmeans Initialization },
  journal   = { International Journal of Applied Information Systems },
  year      = { 2017 },
  volume    = { 12 },
  number    = { 2 },
  pages     = { 6-11 },
  doi       = { 10.5120/ijais2017451683 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2017
%A Omar Kettani
%A Faical Ramdani
%T A Fast Deterministic Kmeans Initialization
%J International Journal of Applied Information Systems
%V 12
%N 2
%P 6-11
%R 10.5120/ijais2017451683
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The k-means algorithm remains one of the most widely used clustering methods, despite its sensitivity to initial settings. This paper explores a simple, computationally inexpensive, deterministic method that provides k-means with initial seeds for clustering a given data set. The method simply computes the means of k equal-sized samples taken from the given data set. We test and compare this method with the related, well-known KKZ initialization algorithm for k-means, using both simulated and real data, and find it to be more efficient in many cases.
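To make this concrete, the sketch below (in Python) shows one plausible reading of the seeding scheme described above, next to KKZ for comparison. It assumes that "k equal-sized samples" means k equal-sized contiguous blocks of the data set; the function names block_mean_init and kkz_init and the toy data are illustrative only and are not taken from the paper.

    import numpy as np

    def block_mean_init(X, k):
        # Hypothetical reading of the paper's deterministic seeding: split the
        # data set into k equal-sized contiguous blocks and use each block's
        # mean as an initial centroid. np.array_split keeps the split
        # deterministic and tolerates n not being divisible by k.
        blocks = np.array_split(X, k)
        return np.vstack([b.mean(axis=0) for b in blocks])

    def kkz_init(X, k):
        # KKZ seeding (Katsavounidis et al., 1994): start from the point with
        # the largest norm, then repeatedly add the point farthest from its
        # nearest already-chosen seed.
        seeds = [X[np.argmax(np.linalg.norm(X, axis=1))]]
        for _ in range(1, k):
            dist_to_nearest_seed = np.min(
                np.linalg.norm(X[:, None, :] - np.asarray(seeds)[None, :, :], axis=2),
                axis=1,
            )
            seeds.append(X[np.argmax(dist_to_nearest_seed)])
        return np.asarray(seeds)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)        # toy data only
        X = rng.normal(size=(300, 2))
        print(block_mean_init(X, 3))          # deterministic block-mean seeds
        print(kkz_init(X, 3))                 # KKZ seeds for comparison

Either array of seeds can then be supplied as the initial centroids to a standard k-means routine (for example, via the init parameter of scikit-learn's KMeans).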

References
  • Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75, 245–249 (2009).
  • Lloyd, S. P.: Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137 (1982). doi:10.1109/TIT.1982.1056489.
  • Peña, J.M., Lozano, J.A., Larrañaga, P.: An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recognition Letters, 20(10), 1027–1040 (1999).
  • Forgy, E.: Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications. Biometrics, 21, 768–769 (1965).
  • Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035 (2007).
  • Bahmani, B., Moseley, B., Vattani, A., Kumar, R., Vassilvitskii, S.: Scalable k-means++. In: Proceedings of the VLDB Endowment (2012).
  • Katsavounidis, I., Kuo, C.-C. J., Zhang, Z.: A new initialization technique for generalized Lloyd iteration. IEEE Signal Processing Letters, 1(10), 144–146 (1994).
  • Asuncion, A., Newman, D.J.: UCI Machine Learning Repository [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, School of Information and Computer Science (2007).
  • Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley (1990).
Index Terms
Computer Science
Information Sciences
Keywords

k-means, initialization, KKZ
