Applying Ensemble Learning with Hard Voting for Customer Churn Classification
DOI:
https://doi.org/10.30812/corisindo.v1.5340
Keywords:
customer churn, ensemble learning, hard voting classifier, random forest, SMOTE-Tomek
Abstract
Customer churn is one of the greatest challenges for telecommunications companies because it directly affects revenue and business sustainability. This study aims to improve the accuracy of churn prediction by developing an ensemble learning model based on a Hard Voting Classifier that combines three different algorithms: Naïve Bayes, Random Forest, and Nearest Centroid. The customer dataset covers demographic information, service usage behavior, and churn status, and is processed through data cleaning, feature selection, normalization, and SMOTE-Tomek resampling to balance the class distribution. Feature selection uses Information Gain and correlation analysis, so that only relevant attributes enter the model. Test results show that the Hard Voting Classifier reaches an accuracy of 90% with a churn-class recall of 81%, higher than the churn-class recall of Random Forest (78%), although Random Forest attains the higher accuracy (95%). Precision for the non-churn class also rises to 97%, indicating that the model makes fewer errors when it predicts that a customer will stay. These findings show that ensemble learning with heterogeneous base learners can combine the strengths of the individual algorithms to improve churn detection. Nevertheless, Hard Voting performance still depends on the quality of each base classifier, so hyperparameter optimization and exploration of other model combinations are recommended for future research. The results are expected to help companies formulate more targeted and sustainable customer retention strategies.
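As a rough illustration of the voting rule the abstract describes, the sketch below applies a majority vote to the predictions of three base learners and then computes churn-class recall and overall accuracy. It is a minimal sketch, not the authors' implementation: the function names (`hard_vote`, `recall`, `accuracy`) and the toy label lists are hypothetical, and the labels are made up for illustration rather than taken from the study's dataset.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Return the majority label for each sample.

    predictions_per_model: list of equal-length label lists,
    one list per base learner.
    """
    voted = []
    for sample_votes in zip(*predictions_per_model):
        # With three voters and two classes a tie cannot occur,
        # so the most common label is always well defined here.
        label, _count = Counter(sample_votes).most_common(1)[0]
        voted.append(label)
    return voted


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that were predicted as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Share of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy labels (1 = churn, 0 = non-churn); not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
naive_bayes = [1, 1, 0, 1, 0, 0, 0, 0]
random_forest = [1, 0, 0, 0, 0, 0, 0, 0]
nearest_centroid = [1, 1, 1, 0, 1, 0, 0, 0]

ensemble = hard_vote([naive_bayes, random_forest, nearest_centroid])

print("ensemble predictions:", ensemble)
print("churn recall (ensemble):", recall(y_true, ensemble))
print("churn recall (random forest alone):", recall(y_true, random_forest))
print("accuracy (ensemble):", accuracy(y_true, ensemble))
```

In this toy case the vote recovers a churner that the Random Forest stand-in misses, because the other two learners outvote it. In practice, scikit-learn's `VotingClassifier` with `voting="hard"` applies the same majority rule to fitted estimators.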