Prediction of Student Major Selection at High School Using a Machine Learning Approach
Abstract
This research aimed to develop and evaluate a machine learning prediction system that matches students of Senior High School (SMA) Nusa Putra Kota Tangerang with suitable school majors based on their academic interests and performance levels. Five machine learning algorithms were compared: Random Forest, Support Vector Machine (SVM), logistic regression, K-Nearest Neighbor (K-NN), and Naive Bayes. Data were collected from academic records, interest tests, and questionnaires, then processed and analyzed to train and test the algorithms. Random Forest achieved the best performance among the models, with an accuracy of 85%, a precision of 82%, a recall of 88%, and an AUC score of 0.92. The factors that affected the prediction of major selection were Grade XII Mathematics scores and Science Interest Test results. These findings suggest that a Random Forest model can improve the accuracy of major selection while promoting fairness, supporting better educational choices and increased student satisfaction. Future studies should investigate additional factors that influence major selection.
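For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and AUC can be computed from a classifier's outputs. It is not the study's code. The labels, scores, the 0.5 decision threshold, and the binary framing (1 = a given major, 0 = any other) are all illustrative assumptions, and the function names are hypothetical.

```python
# Illustrative only: hypothetical labels and scores, not data from the study.
# Assumption: major selection is framed as a binary task (1 = target major).

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    # Share of all predictions that are correct.
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    # Share of predicted positives that are truly positive.
    return tp / (tp + fp)

def recall(tp, fn):
    # Share of true positives that the model recovers.
    return tp / (tp + fn)

def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical ground truth and predicted probabilities for eight students.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("accuracy :", accuracy(tp, fp, fn, tn))
print("precision:", precision(tp, fp))
print("recall   :", recall(tp, fn))
print("AUC      :", auc(y_true, scores))
```

Accuracy, precision, and recall depend on the chosen threshold, whereas AUC depends only on how the scores rank the students, which is why the four figures are usually reported together when comparing models.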

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.