Comparing K-Means, GMM and BIRCH for Student Academic Performance Data: Evaluation on Two Public Datasets
DOI:
https://doi.org/10.30812/matrik.v25i1.5843
Keywords:
BIRCH, Clustering, Comparison, GMM, K-Means, Student Academic
Abstract
Academic data contains complex patterns that require appropriate clustering approaches to support informed educational decision-making. However, comparative studies that systematically evaluate multiple clustering methods for student academic performance, using diverse public datasets and consistent evaluation criteria, remain limited. This study aims to identify the most effective clustering algorithm for modeling student academic performance by comparing three techniques, K-Means, Gaussian Mixture Model (GMM), and BIRCH, on two publicly available datasets: the Student Performance Metrics (SPM) dataset with 16 features and 493 instances, and the Higher Education Students Performance Evaluation (HESPE) dataset with 32 features and 145 instances. The algorithms were evaluated using Sum of Squared Errors (SSE), Davies–Bouldin Index (DBI), Silhouette Score, and computational time. The results show that K-Means consistently provides superior clustering quality on both datasets, outperforming the other algorithms on four evaluation criteria, while BIRCH leads on two metrics and achieves the shortest computational time. These findings indicate that clustering effectiveness is strongly influenced by algorithm characteristics and data structure, with K-Means better suited to accuracy-oriented clustering and BIRCH to time-critical applications. Overall, this study contributes to educational data mining by providing comparative evidence on algorithm performance and by demonstrating how methodological choices influence the interpretation of student performance patterns. In practice, institutions can choose the clustering method that best suits their needs, such as K-Means for precise academic profiling or BIRCH for rapid, large-scale analysis, to help students graduate successfully.
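The evaluation criteria named in the abstract (SSE, DBI, Silhouette Score, and computational time) can be illustrated with a minimal sketch. The toy points, the two label assignments, and all helper names below are hypothetical and chosen only for illustration; they are not the SPM or HESPE data and not the authors' implementation. Libraries such as scikit-learn provide equivalent metric functions, so this standard-library version is meant only to show what each criterion measures.

```python
import math
import time


def group_by_label(X, labels):
    """Collect points into a dict keyed by cluster label."""
    clusters = {}
    for x, lab in zip(X, labels):
        clusters.setdefault(lab, []).append(x)
    return clusters


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sse(X, labels):
    """Sum of squared distances from each point to its cluster centroid (lower is better)."""
    total = 0.0
    for pts in group_by_label(X, labels).values():
        c = centroid(pts)
        total += sum(math.dist(p, c) ** 2 for p in pts)
    return total


def davies_bouldin(X, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance (lower is better)."""
    clusters = group_by_label(X, labels)
    cents = {k: centroid(v) for k, v in clusters.items()}
    scatter = {k: sum(math.dist(p, cents[k]) for p in v) / len(v)
               for k, v in clusters.items()}
    keys = list(clusters)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in keys if j != i)
             for i in keys]
    return sum(worst) / len(keys)


def silhouette(X, labels):
    """Mean of (b - a) / max(a, b) over all points (higher is better)."""
    clusters = group_by_label(X, labels)
    scores = []
    for x, lab in zip(X, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(x, p) for p in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(x, p) for p in pts) / len(pts)
                for k, pts in clusters.items() if k != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call, as a stand-in for computational time."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical toy data: two well-separated groups of three points each.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
good = [0, 0, 0, 1, 1, 1]   # labels that match the visible groups
bad = [0, 1, 0, 1, 0, 1]    # labels that mix the groups

for name, labels in (("good", good), ("bad", bad)):
    score, seconds = timed(silhouette, X, labels)
    print(name, sse(X, labels), davies_bouldin(X, labels), score, seconds)
```

A labeling that matches the data's structure should give a lower SSE and DBI and a higher Silhouette Score than one that mixes the groups, which is the sense in which the abstract ranks the three algorithms on clustering quality.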
License
Copyright (c) 2025 Ricky Aurelius Nurtanto Diaz, Ni Luh Gede Pivin Suwirmayanti, Emy Setyaningsih, I Wayan Budi Sentana

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.