Evaluating Different K Values in K-Fold Cross Validation for Binary Logistic Regression to Classify Poverty
DOI:
https://doi.org/10.30812/varian.v8i2.4403
Keywords:
Binary Logistic Regression, Classification, K-Fold Cross Validation, Poverty Depth Levels
Abstract
Data mining helps decision-makers analyze data and extract insights efficiently. Classification is a data mining technique that organizes data based on its features, helping to identify patterns and make predictions. This study evaluates Binary Logistic Regression (BLR), a type of generalized linear model that is suitable for binary outcomes, for classifying poverty depth across Indonesian regencies/cities in 2022, with a focus on the impact of different K values in K-Fold Cross Validation. The dataset includes 514 regencies/cities and 11 predictor variables, with the Poverty Depth Index as the target variable, categorized into high (1) and low (0) levels. K-Fold Cross Validation was performed with K values of 3, 5, and 10, using accuracy and Area Under the Curve (AUC) as evaluation metrics. The mean accuracy values for BLR are 75.7% for K=3, 74.3% for K=5, and 75.1% for K=10. The results show that K=3 gives the highest accuracy in classifying poverty depth in Indonesia, with the lowest standard deviation of 0.03. However, K=10 gives BLR better discriminative ability, reflected in a higher AUC value. This study highlights the significant influence of the K value in K-Fold Cross Validation on BLR performance.
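The procedure described in the abstract (fit a binary logistic regression inside each fold, then compare mean accuracy, its standard deviation, and AUC across K = 3, 5, and 10) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic (two made-up features rather than the 11 predictors and 514 regencies/cities of the study), the model is fitted by plain gradient descent, the folds are not stratified, and every function name below is hypothetical. It uses only the Python standard library.

```python
import math
import random


def make_synthetic(n=300, seed=0):
    """Synthetic binary data (NOT the study's data): two features, noisy linear boundary."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        p = 1 / (1 + math.exp(-(1.5 * x1 - 1.0 * x2)))
        X.append([x1, x2])
        y.append(1 if rng.random() < p else 0)
    return X, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def predict_proba(w, x):
    """P(y = 1 | x) for weights w = [bias, w1, w2, ...]."""
    return sigmoid(w[0] + sum(wj * v for wj, v in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient descent on the mean log-loss (an assumed, simple fitting method)."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the test fold")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k, seed=0):
    """Return per-fold accuracy and AUC; each fold is the test set exactly once."""
    accs, aucs = [], []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        w = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        probs = [predict_proba(w, X[i]) for i in test_idx]
        y_test = [y[i] for i in test_idx]
        correct = sum((p >= 0.5) == (t == 1) for p, t in zip(probs, y_test))
        accs.append(correct / len(y_test))
        aucs.append(auc(y_test, probs))
    return accs, aucs


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Population standard deviation of the per-fold scores."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


if __name__ == "__main__":
    X, y = make_synthetic()
    for k in (3, 5, 10):
        accs, aucs = cross_validate(X, y, k)
        print(f"K={k}: accuracy {mean(accs):.3f} (sd {std(accs):.3f}), AUC {mean(aucs):.3f}")
```

Because the data here are simulated, the printed numbers say nothing about the study's results; the sketch only shows how the per-fold scores behind a mean accuracy, its standard deviation, and a mean AUC would be produced for each K.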
Copyright (c) 2025 Julia Oriana Sinaga, M. Fathurrahman, Sri Wahyuningsih, Memi Nor Hayati

This work is licensed under a Creative Commons Attribution 4.0 International License.