Application of Numerical Measure Variations in K-Means Clustering for Grouping Data

  • Relita Buaton, Sekolah Tinggi Manajemen Informatika dan Komputer Kaputama, Binjai, Indonesia
  • Solikhun Solikhun, STIKOM Tunas Bangsa
Keywords: Data Grouping, Distance Calculation, K-Means Clustering, Numerical Measures

Abstract

The K-Means Clustering algorithm is commonly used by researchers to group data. The problem addressed in this study was that it was not yet known how optimal the grouping is under different distance calculations in K-Means Clustering. The purpose of this research was therefore to compare distance calculation methods in the K-Means algorithm, namely Euclidean Distance, Canberra Distance, Chebyshev Distance, Cosine Similarity, Dynamic Time Warping Distance, Jaccard Similarity, and Manhattan Distance, covering seven types of numerical measures, to determine which distance calculation is most optimal in the K-Means method. The best distance calculation was determined by the smallest Davies-Bouldin Index (DBI) value. The data used in this study were cosmetics sales at Devi Cosmetics, consisting of sales from January to April 2022 with 56 product items. The result of this study was a comparison of numerical measures in the K-Means Clustering algorithm. The optimal clustering was obtained with Euclidean Distance at 9 clusters, with a DBI value of 0.224; Euclidean Distance also produced the best average DBI value of 0.265.
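The comparison the abstract describes can be sketched as follows. This is not the authors' code: it is a minimal K-Means implementation with an interchangeable distance measure, scored by the Davies-Bouldin Index (smaller is better), run on synthetic data in place of the Devi Cosmetics sales set. Only the five measures supported directly by `scipy.spatial.distance.cdist` are shown; Dynamic Time Warping and Jaccard similarity would need dedicated distance functions.

```python
# Sketch: K-Means with a pluggable distance measure, evaluated by the
# Davies-Bouldin Index (DBI); the best measure yields the smallest DBI.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics import davies_bouldin_score

def kmeans(X, k, metric="euclidean", n_iter=100, seed=0):
    """Lloyd's algorithm with distances computed by the given metric."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center under the chosen metric.
        labels = cdist(X, centers, metric=metric).argmin(axis=1)
        # Recompute centers; keep the old center if a cluster emptied out.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in for the sales data: three well-separated groups.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc, 0.3, size=(30, 2)) for loc in (1, 4, 7)])

for metric in ("euclidean", "cityblock", "chebyshev", "canberra", "cosine"):
    labels, _ = kmeans(X, k=3, metric=metric)
    print(f"{metric:10s} DBI = {davies_bouldin_score(X, labels):.3f}")
```

Note that centroids are still arithmetic means regardless of the assignment metric, a common simplification; `cityblock` is SciPy's name for the Manhattan distance.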


Published
2023-11-20
How to Cite
Buaton, R., & Solikhun, S. (2023). Application of Numerical Measure Variations in K-Means Clustering for Grouping Data. MATRIK : Jurnal Manajemen, Teknik Informatika Dan Rekayasa Komputer, 23(1), 103-112. https://doi.org/10.30812/matrik.v23i1.3269
Section
Articles