Comparison of Distance Measurements Based on k-Numbers and Its Influence to Clustering

  • Deny Jollyta, Institut Bisnis dan Teknologi Pelita Indonesia, Riau, Indonesia
  • Prihandoko Prihandoko, Universitas Gunadarma, Depok, Indonesia
  • Dadang Priyanto, Universitas Bumigora, Mataram, Indonesia
  • Alyauma Hajjah, Institut Bisnis dan Teknologi Pelita Indonesia, Riau, Indonesia
  • Yulvia Nora Marlim, Institut Bisnis dan Teknologi Pelita Indonesia, Riau, Indonesia
Keywords: Distance Measurements, Clustering, Numerical Measurements, Optimal Cluster

Abstract

Heuristic data requires an appropriate clustering method so that the information produced by the grouping process can be trusted. Choosing the optimal cluster from the grouping results remains challenging. This study aimed to analyze four numerical distance measurements against the patterns in the available categorical data, in order to give users of heuristic data recommendations on how to derive knowledge or information from the best clusters. The method was clustering with four distance measurements (Euclidean, Canberra, Manhattan, and Dynamic Time Warping), with the Elbow approach used for optimization. The Elbow method with the Sum of Squared Errors (SSE) was employed to determine the optimal cluster, and the number of clusters tested ranged from k = 2 to k = 10. Testing used data on students' use of social media to help achieve higher GPAs, collected through 300 completed questionnaires. The results showed that Manhattan distance was the best numerical measurement, with the largest SSE of 45.359 and optimal clustering at k = 5. The optimal cluster produced with Manhattan distance consisted of students with GPAs above 3.00 from the mathematics and computer department who used websites/vlogs as learning tools. The ability of each cluster to yield information can be affected by the proximity of attributes, which changes as the number of clusters varies.
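
For readers unfamiliar with the four measurements named in the abstract, the following is a minimal sketch of their textbook definitions, together with a helper that computes the SSE of a given cluster assignment. It is not the authors' implementation, which the abstract does not show. All function names are hypothetical, the Dynamic Time Warping variant assumes an absolute-difference local cost on one-dimensional sequences, and the Canberra function assumes the common convention that a 0/0 term contributes zero.

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def canberra(a, b):
    """Sum of |x - y| / (|x| + |y|); a 0/0 term is treated as 0 (assumption)."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom:
            total += abs(x - y) / denom
    return total


def dtw(a, b):
    """Dynamic Time Warping cost between two 1-D sequences.

    Uses |a_i - b_j| as the local cost (assumption) and the classic
    dynamic-programming recurrence over match, insertion, and deletion.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] matched again with earlier b
                cost[i][j - 1],      # b[j-1] matched again with earlier a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def sse(points, labels, centroids, dist):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[c]) ** 2 for p, c in zip(points, labels))
```

In an Elbow analysis such as the one described, a helper like `sse` would be evaluated once per candidate k (here k = 2 to k = 10) for each distance function, and the resulting curves compared.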

Published
2023-11-20
How to Cite
Jollyta, D., Prihandoko, P., Priyanto, D., Hajjah, A., & Nora Marlim, Y. (2023). Comparison of Distance Measurements Based on k-Numbers and Its Influence to Clustering. MATRIK : Jurnal Manajemen, Teknik Informatika Dan Rekayasa Komputer, 23(1), 93-102. https://doi.org/10.30812/matrik.v23i1.3078
Section
Articles