An Improved Agglomerative Clustering Method

Omar Kettani; Faical Ramdani

Research Article

International Journal of Applied Information Systems

Foundation of Computer Science (FCS), NY, USA

Volume 12 - Issue 3

Published: June 2017

10.5120/ijais2017451689

Omar Kettani, Faical Ramdani. An Improved Agglomerative Clustering Method. International Journal of Applied Information Systems. 12, 3 (June 2017), 16-23. DOI=10.5120/ijais2017451689

@article{10.5120/ijais2017451689,
  author    = {Omar Kettani and Faical Ramdani},
  title     = {An Improved Agglomerative Clustering Method},
  journal   = {International Journal of Applied Information Systems},
  year      = {2017},
  volume    = {12},
  number    = {3},
  pages     = {16--23},
  doi       = {10.5120/ijais2017451689},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2017
%A Omar Kettani
%A Faical Ramdani
%T An Improved Agglomerative Clustering Method
%J International Journal of Applied Information Systems
%V 12
%N 3
%P 16-23
%R 10.5120/ijais2017451689
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Clustering is a common and useful exploratory task, widely used in data mining. Among the many existing clustering algorithms, the Agglomerative Clustering Method (ACM) introduced by the authors suffers from an obvious drawback: its sensitivity to data ordering. To overcome this issue, we propose in this paper to initialize the ACM with the KKZ seeding algorithm. The proposed approach (called KKZ_ACM) has a lower computational time complexity than the well-known k-means algorithm. We evaluated its performance by applying it to various benchmark datasets and comparing it with ACM, k-means++ and KKZ_k-means. Our experiments show that the proposed approach produces consistent clustering results in terms of the average Silhouette index.
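To make the seeding step concrete, the following is a minimal sketch of KKZ seeding as it is commonly described: the first seed is the point with the largest Euclidean norm, and each later seed is the point farthest from its nearest already-chosen seed. The function name `kkz_seeds` and the tuple-based data layout are assumptions made for illustration. How the paper feeds these seeds into the ACM is not stated in the abstract and is not shown here.

```python
import math


def kkz_seeds(points, k):
    """Pick k initial centers with the KKZ rule (illustrative sketch).

    points: list of equal-length numeric tuples; k: number of seeds.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    # First seed: the point with the largest Euclidean norm.
    origin = (0.0,) * len(points[0])
    first = max(range(len(points)), key=lambda i: math.dist(points[i], origin))
    chosen = [first]

    # min_dist[i] holds the distance from point i to its nearest chosen seed.
    min_dist = [math.dist(p, points[first]) for p in points]

    while len(chosen) < k:
        # Next seed: the point farthest from its nearest seed.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))

    return [points[i] for i in chosen]


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (10, 0), (10, 1), (0, 9)]
    print(kkz_seeds(data, 3))
    # Same seeds when the input order is reversed (no distance ties here).
    print(kkz_seeds(list(reversed(data)), 3))
```

Because the rule involves no random choice, the selected seeds depend on the input order only when two candidate points are exactly tied in distance, which is the property that makes it attractive for an order-sensitive method.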

References

Kettani, O.; Ramdani, F. & Tadili, B. An Agglomerative Clustering Method for Large Data Sets. International Journal of Computer Applications 92(14):1-7, April 2014. DOI:10.5120/16074-4952
I. Katsavounidis, C.-C. J. Kuo, Z. Zhang, A New Initialization Technique for Generalized Lloyd Iteration, IEEE Signal Processing Letters 1 (10) (1994) 144–146.
Aloise, D.; Deshpande, A.; Hansen, P.; Popat, P. (2009). "NP-hardness of Euclidean sum-of-squares clustering". Machine Learning 75: 245–249. doi:10.1007/s10994-009-5103-0.
Garey M.R., Johnson D.S. “Computers and Intractability: A Guide to the Theory of NP-Completeness”. W. H. Freeman & Co., New York, NY, USA, 1979.
E. Forgy, Cluster Analysis of Multivariate Data: Efficiency vs. Interpretability of Classification, Biometrics 21 (1965) 768.
MacQueen, J.B., 1967. Some Methods for Classification and Analysis of Multivariate Observations, Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability (MSP’67), Berkeley, University of California Press, pp. 281-297.
L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, 1990.
Lloyd, S. P. (1982). "Least squares quantization in PCM". IEEE Transactions on Information Theory 28 (2): 129–137. doi:10.1109/TIT.1982.1056489.
D. Arthur, S. Vassilvitskii, k-means++: The Advantages of Careful Seeding, in: Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, 2007, pp. 1027–1035.
Asuncion, A. and Newman, D.J. (2007). UCI Machine Learning Repository [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, School of Information and Computer Science.

Index Terms

Computer Science

Information Sciences

Keywords

Clustering, k-means, k-means++, KKZ