Ensemble Implementation for Predicting Student Graduation with Classification Algorithm
Abstract
Graduating on time is one of the main targets of every student and higher education institution. Many factors can affect a student's length of study, and the individual characteristics of each student are an internal factor that affects the study period. This study uses these characteristics to classify students into those who graduated on time and those who did not. Classification was chosen because it can find a model or pattern that describes and distinguishes the classes in a dataset. The study applies ensemble learning to predict student graduation using a dataset from Kaggle: a GPA (IPK) dataset collected from a university in Indonesia, consisting of 1687 records and 5 attributes, with an imbalanced class distribution. The target is whether a student is predicted to graduate on time or not. The proposed method is the ensemble learning method Different Contribution Sampling (DCS), and the algorithms used are Logistic Regression, Decision Tree Classifier, Gaussian Naive Bayes, Random Forest Classifier, AdaBoost Classifier, Support Vector Classifier (SVC), KNeighbors Classifier (KNN), and MLP Classifier. For each classification algorithm, the test value and accuracy are calculated and then compared across algorithms. The results show that the MLP Classifier achieves the best accuracy, predicting on-time graduation with an accuracy of 91.87%. The DCS-LCA classification model does not give better results than its constituent base classifiers, namely the MLP Classifier at 91.87%, SVC at 91.64%, Logistic Regression at 91.46%, AdaBoost Classifier at 90.87%, Random Forest Classifier at 90.45%, and KNN at 89.80%.
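The comparison step described above, computing each algorithm's accuracy on the test data and ranking the algorithms, can be illustrated with a minimal sketch. It is not the study's code: the labels, the model names (`model_a`, `model_b`, `always_on_time`), and the helper functions are hypothetical and exist only to show the calculation.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty test set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rank_by_accuracy(
    y_true: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Score every model and sort from highest to lowest accuracy."""
    scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up test labels (assumption): 1 = graduated on time, 0 = did not.
# Seven of ten are positive, mimicking an imbalanced class distribution.
y_test = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

# Made-up predictions from hypothetical models (assumption).
toy_predictions = {
    "model_a": [1, 1, 1, 1, 1, 1, 1, 0, 0, 1],   # 9 of 10 correct
    "model_b": [1, 1, 1, 1, 1, 1, 0, 0, 0, 1],   # 8 of 10 correct
    "always_on_time": [1] * 10,                  # majority-class baseline
}

ranking = rank_by_accuracy(y_test, toy_predictions)
for name, score in ranking:
    print(f"{name}: {score:.2%}")
```

The majority-class baseline is included because, on an imbalanced dataset, a model that always predicts the larger class already reaches an accuracy equal to that class's share of the data, so reported accuracies are easier to interpret against that floor.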