Comparing K-Means, GMM and BIRCH for Student Academic Performance Data: Evaluation on Two Public Datasets
DOI:
https://doi.org/10.30812/matrik.v25i1.5843
Keywords:
Birch, Clustering, Comparison, GMM, K-Means, Student Academic
Abstract
Academic data contains complex patterns that require appropriate clustering approaches to support informed educational decision-making. However, comparative studies that systematically evaluate multiple clustering methods for student academic performance, using diverse public datasets and consistent evaluation criteria, are limited. This study aims to identify the most effective clustering algorithm for modeling student academic performance by comparing three techniques, K-Means, GMM, and BIRCH, on two publicly available datasets: the Student Performance Metrics (SPM) dataset with 16 features and 493 instances, and the Higher Education Students Performance Evaluation (HESPE) dataset with 32 features and 145 instances. The algorithms were evaluated using Sum of Squared Errors (SSE), Davies–Bouldin Index (DBI), Silhouette Score, and computational time. The results show that K-Means consistently provides superior clustering quality on both datasets, outperforming the other algorithms on four evaluation criteria, while BIRCH is superior on two metrics and achieves the shortest computational time. These findings highlight that clustering effectiveness is strongly influenced by algorithm characteristics and data structure, with K-Means being more suitable for accuracy-oriented clustering and BIRCH for time-critical applications. Overall, this study contributes to educational data mining by providing comparative evidence on algorithm performance and demonstrating how methodological choices influence the interpretation of student performance patterns. In practice, institutions can choose the clustering method that best suits their needs, such as K-Means for precise academic profiling or BIRCH for rapid, large-scale analysis, to help students graduate successfully.
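To make the evaluation criteria concrete, the following is a minimal sketch of how SSE, the Davies–Bouldin Index, and the Silhouette Score can be computed for a given cluster assignment, together with a simple wall-clock timer. It is not the authors' code: the four toy points, the two labelings (`good`, `bad`), and all function names are assumptions made for illustration, and Euclidean distance is assumed throughout. In practice a library such as scikit-learn provides equivalent metrics (`silhouette_score`, `davies_bouldin_score`, and the `inertia_` attribute of `KMeans` for SSE).

```python
import math
import time
from collections import defaultdict

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(members):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(members) for col in zip(*members)]

def group(points, labels):
    """Map each cluster label to the list of its points."""
    clusters = defaultdict(list)
    for p, label in zip(points, labels):
        clusters[label].append(p)
    return clusters

def sse(points, labels):
    """Sum of squared distances from each point to its cluster centroid (lower is better)."""
    total = 0.0
    for members in group(points, labels).values():
        c = centroid(members)
        total += sum(dist(p, c) ** 2 for p in members)
    return total

def davies_bouldin(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance (lower is better)."""
    clusters = group(points, labels)
    cents = {k: centroid(m) for k, m in clusters.items()}
    scatter = {k: sum(dist(p, cents[k]) for p in m) / len(m) for k, m in clusters.items()}
    worst = [
        max((scatter[i] + scatter[j]) / dist(cents[i], cents[j]) for j in clusters if j != i)
        for i in clusters
    ]
    return sum(worst) / len(worst)

def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points, in [-1, 1] (higher is better)."""
    clusters = group(points, labels)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:  # convention: a singleton cluster scores 0
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)  # mean intra-cluster distance
        b = min(sum(dist(p, q) for q in m) / len(m)        # nearest other cluster
                for k, m in clusters.items() if k != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def timed(fn, *args):
    """Return (result, elapsed seconds) for one call, as a simple computing-time measure."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Hypothetical toy data: two tight pairs of points far apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = [0, 0, 1, 1]  # groups the nearby points together
bad = [0, 1, 0, 1]   # splits each nearby pair across clusters

for name, labels in (("good", good), ("bad", bad)):
    score, seconds = timed(silhouette, points, labels)
    print(name, sse(points, labels), davies_bouldin(points, labels), score, seconds)
```

On this toy example the sensible labeling scores better on all three quality metrics (lower SSE and DBI, higher Silhouette), which is the direction of comparison the study relies on when ranking the algorithms.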
License
Copyright (c) 2025 Ricky Aurelius Nurtanto Diaz, Ni Luh Gede Pivin Suwirmayanti, Emy Setyaningsih, I Wayan Budi Sentana

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.