Comparing K-Means, GMM and BIRCH for Student Academic Performance Data: Evaluation on Two Public Datasets
DOI: https://doi.org/10.30812/matrik.v25i1.5843
Keywords: BIRCH, Clustering, Comparison, GMM, K-Means, Student Academic
Abstract
Academic data contains complex patterns that require appropriate clustering approaches to support informed educational decision-making. However, comparative studies that systematically evaluate various clustering methods for student academic performance, using diverse public datasets and consistent evaluation criteria, remain limited. This study aims to identify the most effective clustering algorithm for modeling student academic performance by comparing three techniques: K-Means, GMM, and BIRCH, on two publicly available datasets: the Student Performance Metrics (SPM) dataset with 16 features and 493 instances, and the Higher Education Students Performance Evaluation (HESPE) dataset with 32 features and 145 instances. Algorithm evaluation was performed using the Sum of Squared Errors (SSE), Davies–Bouldin Index (DBI), Silhouette Score, and computational time. The results show that K-Means consistently provides superior clustering quality on both datasets, outperforming the other algorithms on four evaluation criteria, while BIRCH is superior on two metrics and achieves the shortest computational time. These findings highlight that clustering effectiveness is strongly influenced by algorithm characteristics and data structure, with K-Means being more suitable for accuracy-oriented clustering and BIRCH for time-critical applications. Overall, this study contributes to educational data mining by providing comparative evidence on algorithm performance and demonstrating how methodological choices influence the interpretation of student performance patterns. In practice, institutions can choose the clustering method that best suits their needs, such as K-Means for precise academic profiling or BIRCH for rapid, large-scale analysis, to help students graduate successfully.
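For readers who wish to reproduce this kind of comparison, the following is a minimal sketch of how the three algorithms could be benchmarked with scikit-learn against the four evaluation criteria named above (SSE, DBI, Silhouette Score, and computational time). It is not the authors' code: the CSV file name, the restriction to numeric features, and the fixed cluster count k are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): benchmark K-Means, GMM,
# and BIRCH on a student-performance table using SSE, DBI, Silhouette, and time.
import time
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, Birch
from sklearn.mixture import GaussianMixture
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Placeholder data loading: numeric features only, standardized before clustering.
X = StandardScaler().fit_transform(
    pd.read_csv("student_performance.csv").select_dtypes("number")  # assumed file
)
k = 3  # assumed number of clusters; in practice this would be tuned per dataset


def sse(X, labels):
    """Within-cluster sum of squared errors around each cluster centroid."""
    return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in np.unique(labels))


models = {
    "K-Means": KMeans(n_clusters=k, n_init=10, random_state=42),
    "GMM": GaussianMixture(n_components=k, random_state=42),
    "BIRCH": Birch(n_clusters=k),
}

for name, model in models.items():
    start = time.perf_counter()
    labels = model.fit_predict(X)  # all three estimators support fit_predict
    elapsed = time.perf_counter() - start
    print(f"{name:8s} SSE={sse(X, labels):10.2f} "
          f"DBI={davies_bouldin_score(X, labels):.3f} "
          f"Silhouette={silhouette_score(X, labels):.3f} "
          f"time={elapsed:.3f}s")
```

Because only K-Means exposes an inertia_ attribute, the sketch computes SSE directly from the assigned labels so that all three algorithms are scored in the same way.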
License
Copyright (c) 2025 Ricky Aurelius Nurtanto Diaz, Ni Luh Gede Pivin Suwirmayanti, Emy Setyaningsih, I Wayan Budi Sentana

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.