Comparison of Distance Measurements Based on k-Numbers and Its Influence to Clustering
DOI:
https://doi.org/10.30812/matrik.v23i1.3078
Keywords:
Distance Measurements, Clustering, Numerical Measurements, Optimal Cluster
Abstract
Heuristic data requires appropriate clustering methods so that the information produced by the grouping process can be trusted. Choosing an optimal cluster from the grouping results remains challenging. This study aimed to analyze four numerical measurement formulas in light of the categorical data patterns now available, in order to give users of heuristic data recommendations on how to derive knowledge or information from the best clusters. The method was clustering with four distance measurements (Euclidean, Canberra, Manhattan, and Dynamic Time Warping), with the Elbow approach used for optimization. The Elbow method with the Sum of Squared Errors (SSE) was employed to determine the optimal cluster, and the number of test clusters ranged from k = 2 to k = 10. Testing used data on students' use of social media to help achieve higher GPAs, collected from 300 completed questionnaires. The results showed that the Manhattan distance was the best numerical measurement, with the largest SSE of 45.359 and optimal clustering at k = 5. The optimal cluster generated with the Manhattan distance was made up of students with GPAs above 3.00, with websites/vlogs used as learning tools by the mathematics and computer department. Varying the number of clusters changes the proximity of attributes, which can affect each cluster's ability to produce information.
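The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a k-means-style loop (nearest-centroid assignment, mean update) with a pluggable distance, and it assumes SSE is the sum of squared distances from each point to its nearest centroid under that distance. The function names (`cluster_sse`, `elbow_table`, `make_toy_data`) and the synthetic data are hypothetical and stand in for the questionnaire data, which is not reproduced here.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def canberra(a, b):
    # Terms whose denominator is zero are skipped (a common convention).
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom:
            total += abs(x - y) / denom
    return total

def dtw(a, b):
    # Dynamic-programming DTW over two 1-D sequences with |x - y| as local cost.
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def cluster_sse(points, k, dist, iters=20, seed=0):
    # Assumption: k-means-style clustering with the chosen distance for assignment.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: dist(p, centroids[c]))
            groups[idx].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Assumption: SSE is the sum of squared distances to the nearest centroid.
    return sum(min(dist(p, c) for c in centroids) ** 2 for p in points)

def elbow_table(points, dist, k_values=range(2, 11)):
    # SSE for each k from 2 to 10, the range tested in the study.
    return {k: cluster_sse(points, k, dist) for k in k_values}

def make_toy_data(n=60, seed=1):
    # Synthetic stand-in data: three loose groups of 3-feature records.
    rng = random.Random(seed)
    centers = [(1.0, 1.0, 4.0), (4.0, 2.0, 1.0), (2.0, 5.0, 3.0)]
    return [[c + rng.uniform(-0.5, 0.5) for c in centers[i % 3]] for i in range(n)]

if __name__ == "__main__":
    data = make_toy_data()
    measures = [("Euclidean", euclidean), ("Manhattan", manhattan),
                ("Canberra", canberra), ("DTW", dtw)]
    for name, fn in measures:
        table = elbow_table(data, fn)
        print(name, {k: round(v, 3) for k, v in table.items()})
```

Reading the printed SSE values for each measure across k = 2 to k = 10 and looking for the point where the curve bends is the Elbow step. SSE values computed under different distances are on different scales, so comparing them across measures, as the study does, depends on how the authors normalized or interpreted them, which the abstract does not state.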