Research Article

An Improved Agglomerative Clustering Method

by Omar Kettani, Faical Ramdani
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 12 - Issue 3
Published: June 2017
DOI: 10.5120/ijais2017451689

Omar Kettani and Faical Ramdani. An Improved Agglomerative Clustering Method. International Journal of Applied Information Systems 12, 3 (June 2017), 16–23. DOI: 10.5120/ijais2017451689

@article{10.5120/ijais2017451689,
  author    = {Omar Kettani and Faical Ramdani},
  title     = {An Improved Agglomerative Clustering Method},
  journal   = {International Journal of Applied Information Systems},
  year      = {2017},
  volume    = {12},
  number    = {3},
  pages     = {16--23},
  doi       = {10.5120/ijais2017451689},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2017
%A Omar Kettani
%A Faical Ramdani
%T An Improved Agglomerative Clustering Method
%J International Journal of Applied Information Systems
%V 12
%N 3
%P 16-23
%R 10.5120/ijais2017451689
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering is a common and useful exploratory task widely used in data mining. Among the many existing clustering algorithms, the Agglomerative Clustering Method (ACM) previously introduced by the authors suffers from an obvious drawback: its sensitivity to data ordering. To overcome this issue, we propose in this paper to initialize ACM with the KKZ seeding algorithm. The proposed approach (called KKZ_ACM) has a lower computational time complexity than the well-known k-means algorithm. We evaluated its performance by applying it to various benchmark datasets and comparing it with ACM, k-means++ and KKZ_k-means. Our performance studies show that the proposed approach is effective at producing consistent clustering results in terms of the average Silhouette index.
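The abstract relies on two ingredients it does not spell out: the KKZ seeding rule (Katsavounidis, Kuo and Zhang, 1994) and the average Silhouette index used for evaluation. The Python sketch below illustrates both, assuming Euclidean distance and the standard KKZ rule (the largest-norm point first, then repeatedly the point farthest from its nearest chosen seed). The function names `kkz_seeds`, `assign_to_nearest` and `average_silhouette` are hypothetical, and the nearest-seed assignment is only a stand-in clustering step; it is not the authors' ACM, whose procedure is not described here.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def dist(p: Point, q: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kkz_seeds(data: List[Point], k: int) -> List[int]:
    """Indices of k seeds chosen with the KKZ rule (assumed standard form).

    1. The first seed is the point with the largest Euclidean norm.
    2. Each further seed is the non-seed point whose distance to its
       nearest already-chosen seed is largest.
    Ties are broken by the lowest index.
    """
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(data)")
    origin = [0.0] * len(data[0])
    first = max(range(n), key=lambda i: dist(data[i], origin))
    seeds = [first]
    # nearest[i]: distance from point i to its closest seed chosen so far
    nearest = [dist(p, data[first]) for p in data]
    while len(seeds) < k:
        candidates = [i for i in range(n) if i not in seeds]
        nxt = max(candidates, key=lambda i: nearest[i])
        seeds.append(nxt)
        # Update each point's distance to its nearest seed
        nearest = [min(d, dist(p, data[nxt])) for d, p in zip(nearest, data)]
    return seeds


def assign_to_nearest(data: List[Point], centers: List[Point]) -> List[int]:
    """Hypothetical stand-in clustering: label each point with its nearest center."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c])) for p in data]


def average_silhouette(data: List[Point], labels: List[int]) -> float:
    """Mean of s(i) = (b - a) / max(a, b) over all points.

    a: mean distance from point i to the other members of its own cluster.
    b: smallest mean distance from point i to the members of another cluster.
    Points in singleton clusters contribute s(i) = 0.
    """
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(data):
        own = [q for j, q in enumerate(data) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0
        a = sum(dist(p, q) for q in own) / len(own)
        b = min(
            sum(dist(p, q) for j, q in enumerate(data) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(data)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.4), (5.0, 5.0), (5.3, 4.8), (4.9, 5.4)]
    seeds = kkz_seeds(data, 2)  # [5, 0]
    labels = assign_to_nearest(data, [data[i] for i in seeds])  # [1, 1, 1, 0, 0, 0]
    print(seeds, labels, round(average_silhouette(data, labels), 3))
```

In this sketch the seed points depend only on pairwise distances and norms, so shuffling the input returns the same seed points (under different indices) unless two candidates tie. That property fits the abstract's motivation for seeding an order-sensitive procedure with KKZ.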

References
  • O. Kettani, F. Ramdani, B. Tadili, An Agglomerative Clustering Method for Large Data Sets, International Journal of Computer Applications 92 (14) (2014) 1–7. doi:10.5120/16074-4952.
  • I. Katsavounidis, C.-C. J. Kuo, Z. Zhang, A New Initialization Technique for Generalized Lloyd Iteration, IEEE Signal Processing Letters 1 (10) (1994) 144–146.
  • D. Aloise, A. Deshpande, P. Hansen, P. Popat, NP-hardness of Euclidean sum-of-squares clustering, Machine Learning 75 (2009) 245–249. doi:10.1007/s10994-009-5103-0.
  • M. R. Garey, D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman & Co., New York, NY, USA, 1979.
  • E. Forgy, Cluster Analysis of Multivariate Data: Efficiency vs. Interpretability of Classification, Biometrics 21 (1965) 768.
  • J. B. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, in: Proc. of the 5th Berkeley Symposium on Mathematical Statistics and Probability (MSP'67), University of California Press, Berkeley, 1967, pp. 281–297.
  • L. Kaufman, P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, 1990.
  • S. P. Lloyd, Least Squares Quantization in PCM, IEEE Transactions on Information Theory 28 (2) (1982) 129–137. doi:10.1109/TIT.1982.1056489.
  • D. Arthur, S. Vassilvitskii, k-means++: The Advantages of Careful Seeding, in: Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, 2007, pp. 1027–1035.
  • A. Asuncion, D. J. Newman, UCI Machine Learning Repository [http://www.ics.uci.edu/~mlearn/MLRepository.html], University of California, School of Information and Computer Science, Irvine, CA, 2007.
Index Terms
Computer Science
Information Sciences
Keywords

Clustering, k-means, k-means++, KKZ
