Comparing K-Means, GMM and BIRCH for Student Academic Performance Data: Evaluation on Two Public Datasets
DOI:
https://doi.org/10.30812/matrik.v25i1.5843
Keywords:
Birch, Clustering, Comparison, GMM, K-Means, Student Academic
Abstract
Academic data contain complex patterns that require appropriate clustering approaches to support informed educational decision-making. However, few comparative studies systematically evaluate clustering methods for student academic performance using diverse public datasets and consistent evaluation criteria. This study aims to identify the most effective clustering algorithm for modeling student academic performance by comparing three techniques, K-Means, GMM, and BIRCH, on two publicly available datasets: the Student Performance Metrics (SPM) dataset, with 16 features and 493 instances, and the Higher Education Students Performance Evaluation (HESPE) dataset, with 32 features and 145 instances. The algorithms were evaluated using Sum of Squared Errors (SSE), Davies–Bouldin Index (DBI), Silhouette Score, and computational time. The results show that K-Means consistently provides superior clustering quality on both datasets, outperforming the other algorithms on four evaluation criteria, while BIRCH is superior on two metrics and achieves the shortest computational time. These findings indicate that clustering effectiveness is strongly influenced by algorithm characteristics and data structure, with K-Means better suited to accuracy-oriented clustering and BIRCH to time-critical applications. Overall, this study contributes to educational data mining by providing comparative evidence on algorithm performance and by demonstrating how methodological choices influence the interpretation of student performance patterns. In practice, institutions can choose the clustering method that best suits their needs, such as K-Means for precise academic profiling or BIRCH for rapid, large-scale analysis, to help students graduate successfully.
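The comparison described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and a synthetic stand-in dataset. It is not the authors' code: the data come from `make_blobs` rather than SPM or HESPE, and the number of clusters, the feature scaling, the random seed, and the helper names `sse` and `compare` are all assumptions made for illustration. SSE is computed here against each cluster's mean so that the same definition applies to all three algorithms. Lower SSE and DBI and a higher Silhouette Score indicate tighter, better-separated clusters.

```python
import time

import numpy as np
from sklearn.cluster import Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def sse(X, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total


def compare(X, n_clusters=3, seed=0):
    """Fit K-Means, GMM, and BIRCH on X and collect the four criteria."""
    # Hypothetical settings; the paper's hyperparameters are not given here.
    models = {
        "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "GMM": GaussianMixture(n_components=n_clusters, random_state=seed),
        "BIRCH": Birch(n_clusters=n_clusters),
    }
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        labels = model.fit_predict(X)
        elapsed = time.perf_counter() - start
        results[name] = {
            "sse": sse(X, labels),
            "dbi": davies_bouldin_score(X, labels),
            "silhouette": silhouette_score(X, labels),
            "seconds": elapsed,
        }
    return results


# Synthetic stand-in for a student-performance table (assumption, not real data).
X_raw, _ = make_blobs(n_samples=300, n_features=8, centers=3, random_state=0)
X = StandardScaler().fit_transform(X_raw)

results = compare(X)
for name, m in results.items():
    print(
        f"{name:8s} SSE={m['sse']:.2f} DBI={m['dbi']:.3f} "
        f"Silhouette={m['silhouette']:.3f} time={m['seconds']:.4f}s"
    )
```

To apply the sketch to a real table, replace `X_raw` with the numeric feature matrix of the dataset and choose `n_clusters` by whatever selection procedure the analysis calls for. Timings from a single run are noisy, so repeated runs would be needed before drawing conclusions about speed.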
License
Copyright (c) 2025 Ricky Aurelius Nurtanto Diaz, Ni Luh Gede Pivin Suwirmayanti, Emy Setyaningsih, I Wayan Budi Sentana

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.