Comparing K-Means, GMM and BIRCH for Student Academic Performance Data: Evaluation on Two Public Datasets

Ricky Aurelius Nurtanto Diaz; Ni Luh Gede Pivin Suwirmayanti; Emy Setyaningsih; I Wayan Budi Sentana

doi:10.30812/matrik.v25i1.5843

Authors

Ricky Aurelius Nurtanto Diaz Institut Teknologi dan Bisnis STIKOM Bali, Bali, Indonesia
Ni Luh Gede Pivin Suwirmayanti Institut Teknologi dan Bisnis STIKOM Bali, Bali, Indonesia
Emy Setyaningsih Universitas AKPRIND, Yogyakarta, Indonesia
I Wayan Budi Sentana Politeknik Negeri Bali, Bali, Indonesia

DOI:

https://doi.org/10.30812/matrik.v25i1.5843

Keywords:

BIRCH, Clustering, Comparison, GMM, K-Means, Student Academic

Abstract

Academic data contains complex patterns that require appropriate clustering approaches to support informed educational decision-making. However, comparative studies that systematically evaluate different clustering methods for student academic performance, using diverse public datasets and consistent evaluation criteria, remain limited. This study aims to identify the most effective clustering algorithm for modeling student academic performance by comparing three techniques, K-Means, Gaussian Mixture Models (GMM), and BIRCH, on two publicly available datasets: the Student Performance Metrics (SPM) dataset, with 16 features and 493 instances, and the Higher Education Students Performance Evaluation (HESPE) dataset, with 32 features and 145 instances. The algorithms were evaluated using the Sum of Squared Errors (SSE), the Davies–Bouldin Index (DBI), the Silhouette Score, and computational time. The results show that K-Means consistently provides superior clustering quality on both datasets, outperforming the other algorithms on four evaluation criteria, whereas BIRCH is superior on two metrics and achieves the shortest computational time. These findings indicate that clustering effectiveness is strongly influenced by both algorithm characteristics and data structure, with K-Means more suitable for accuracy-oriented clustering and BIRCH for time-critical applications. Overall, this study contributes to educational data mining by providing comparative evidence on algorithm performance and by showing how methodological choices influence the interpretation of student performance patterns. In practice, institutions can choose the clustering method that best suits their needs, such as K-Means for precise academic profiling or BIRCH for rapid, large-scale analysis, to help students graduate successfully.
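The comparison described above can be sketched as follows. This is a minimal illustration assuming scikit-learn, not the authors' code, and it does not reproduce their results. Several choices are assumptions made only for the example: the synthetic `make_blobs` matrix (it matches the SPM dataset's shape of 493 instances by 16 features, nothing more), standardization with `StandardScaler`, k = 3 clusters, BIRCH's default threshold, and the hypothetical helper names `sse` and `evaluate`.

```python
import time

import numpy as np
from sklearn.cluster import Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def sse(X, labels):
    """Hypothetical helper: sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total


def evaluate(X, n_clusters=3, seed=0):
    """Hypothetical helper: fit each algorithm, then score its labels
    with SSE, DBI, Silhouette and wall-clock fitting time."""
    models = {
        "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "GMM": GaussianMixture(n_components=n_clusters, random_state=seed),
        "BIRCH": Birch(n_clusters=n_clusters),  # default threshold (assumption)
    }
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        labels = model.fit_predict(X)
        elapsed = time.perf_counter() - start
        results[name] = {
            "SSE": sse(X, labels),                       # lower is better
            "DBI": davies_bouldin_score(X, labels),      # lower is better
            "Silhouette": silhouette_score(X, labels),   # higher is better, in [-1, 1]
            "time_s": elapsed,
        }
    return results


# Synthetic stand-in for a student-performance feature matrix (assumption):
# same shape as the SPM dataset (493 x 16), but not its actual values.
X_raw, _ = make_blobs(n_samples=493, n_features=16, centers=3, random_state=42)
X = StandardScaler().fit_transform(X_raw)

results = evaluate(X, n_clusters=3)
for name, scores in results.items():
    print(f"{name:8s} SSE={scores['SSE']:.1f} DBI={scores['DBI']:.3f} "
          f"Silhouette={scores['Silhouette']:.3f} time={scores['time_s']:.4f}s")
```

SSE is computed from each cluster's mean for all three algorithms so that the values are comparable; scikit-learn's `inertia_` attribute exists only on `KMeans`, and GMM and BIRCH do not minimize this quantity directly. Timing with `time.perf_counter()` around `fit_predict` measures fitting plus label assignment on a single run, so repeated runs would give a more stable estimate.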
