Handling Imbalanced Data in K-Nearest Neighbor Algorithm using Synthetic Minority Oversampling Technique-Nominal Continuous
DOI: https://doi.org/10.30812/ijecsa.v4i2.5142

Keywords: k-fold Cross Validation, K-NN, Credit, SMOTE-NC

Abstract
Classification is a data mining task that predicts the class of new data using a trained machine learning model. K-Nearest Neighbor (K-NN) is a classification method that assigns a class based on the distance to the nearest neighbors. However, K-NN performs poorly on imbalanced class distributions. This problem can be addressed with a class balancing technique such as the Synthetic Minority Oversampling Technique for Nominal and Continuous features (SMOTE-NC), which is designed for datasets containing both nominal and continuous variables. This research classifies Honda motorcycle loan customer data at Company Z using K-NN combined with SMOTE-NC to address class imbalance. The method is experimental, using 10-fold cross-validation to partition the training and testing data. The input variables are gender, occupation, installment term, income, installment amount, motorcycle price, and down payment; the output variable is payment status (current or non-current). The results show that the optimal K for K-NN with SMOTE-NC is K = 1, with an average Apparent Error Rate (APER) of 0.143. The best result occurs in subset 8, with an APER of 0.033: of its 61 data points, 34 current-status customers are correctly classified as current and 25 non-current-status customers are correctly classified as non-current, with one misclassification in each class. The study concludes that combining SMOTE-NC with K-NN (K = 1) yields high classification accuracy on imbalanced data and can effectively support credit risk assessment in motorcycle financing.
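The pipeline described above can be sketched as follows. This is a minimal, simplified illustration, not the authors' exact implementation: the `smote_nc` function below is a hand-rolled sketch of SMOTE-NC (in practice the `SMOTENC` class from the imbalanced-learn library would be used), and the toy dataset, feature names, and class sizes are invented stand-ins for the loan data described in the abstract. SMOTE-NC generates a synthetic minority sample by interpolating the continuous features between a minority point and one of its k nearest minority neighbors, while each nominal feature takes the most frequent category among those neighbors.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)

def smote_nc(X, y, cat_idx, k=5):
    """Simplified SMOTE-NC sketch: oversample the minority class until
    both classes have equal size. Continuous features are interpolated
    between a minority point and a random one of its k nearest minority
    neighbours; nominal features (columns in cat_idx) take the most
    frequent category among those neighbours."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min = X[y == minority]
    n_new = counts.max() - counts.min()
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # idx[:, 0] is the point itself
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        neigh = idx[i, 1:]                   # the k nearest minority neighbours
        j = rng.choice(neigh)
        gap = rng.random()
        new = X_min[i] + gap * (X_min[j] - X_min[i])   # continuous interpolation
        for c in cat_idx:                    # nominal: majority vote of neighbours
            vals, cnt = np.unique(X_min[neigh, c], return_counts=True)
            new[c] = vals[np.argmax(cnt)]
        synth.append(new)
    X_bal = np.vstack([X] + [np.asarray(synth)]) if synth else X
    y_bal = np.concatenate([y, np.full(n_new, minority)])
    return X_bal, y_bal

# Toy imbalanced data: 2 continuous features plus 1 nominal (column 2),
# hypothetical stand-ins for variables such as income, installment
# amount, and gender; class 1 (non-current) is the minority.
n_maj, n_min = 180, 30
X = np.vstack([
    np.column_stack([rng.normal(0, 1, n_maj), rng.normal(0, 1, n_maj),
                     rng.integers(0, 2, n_maj)]),
    np.column_stack([rng.normal(2, 1, n_min), rng.normal(2, 1, n_min),
                     rng.integers(0, 2, n_min)]),
]).astype(float)
y = np.array([0] * n_maj + [1] * n_min)      # 0 = current, 1 = non-current

# 10-fold cross-validation: balance each training fold with SMOTE-NC,
# fit 1-NN, and record the fold's error rate (APER).
errors = []
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train, test in skf.split(X, y):
    X_bal, y_bal = smote_nc(X[train], y[train], cat_idx=[2])
    clf = KNeighborsClassifier(n_neighbors=1).fit(X_bal, y_bal)
    errors.append(np.mean(clf.predict(X[test]) != y[test]))

print(f"mean APER over 10 folds: {np.mean(errors):.3f}")
```

Note that oversampling is applied only to the training fold in each split; balancing before splitting would leak synthetic copies of test points into the training set and understate the APER.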
License
Copyright (c) 2025 Anjani Anjani, Memi Nor Hayati, Surya Prangga

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
