Enhancing Predictive Models: An In-depth Analysis of Feature Selection Techniques Coupled with Boosting Algorithms

  • Neny Sulistianingsih Universitas Bumigora, Mataram, Indonesia
  • Galih Hendro Martono Universitas Bumigora, Mataram, Indonesia https://orcid.org/0000-0002-0697-010X
Keywords: Boosting Algorithm, Feature Selection, Fetal Health Dataset, Fetal Health, Recursive Feature Elimination

Abstract

This research aims to improve predictive models for fetal health classification based on Cardiotocography (CTG) data. A review of the literature identifies three recurring challenges: imbalanced class labels, feature selection, and efficient data handling. To address them, the study combines Recursive Feature Elimination (RFE) with boosting algorithms (XGBoost, AdaBoost, LightGBM, CatBoost, and Histogram-Based Boosting). Precision, recall, F1-Score, accuracy, and AUC vary notably across algorithms and RFE configurations. The Random Forest with XGBoost combination performs best, with a precision of 0.940, recall of 0.890, F1-Score of 0.920, accuracy of 0.950, and AUC of 0.960. Logistic Regression with AdaBoost, by contrast, shows lower performance. Running the models without RFE also affects their effectiveness. Overall, the study shows that combining RFE with boosting algorithms improves fetal health classification models, offering useful insights for prenatal diagnosis.
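The abstract pairs RFE with several boosting classifiers and reports precision, recall, F1-Score, accuracy and AUC. As a rough illustration of those two steps (not the authors' code), the following dependency-free Python sketch runs the RFE loop and computes the classification metrics. Several assumptions apply: the ranking estimator is an ordinary least-squares model on standardized features, swapped in for the tree and linear rankers that pairings such as "Random Forest with XGBoost" appear to refer to; the data are synthetic; the helper names `linear_importance`, `rfe` and `macro_scores` are hypothetical; precision, recall and F1 are macro-averaged because the abstract does not state the averaging scheme; and AUC is left out.

```python
import random
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]


def standardize(X: Matrix) -> Matrix:
    """Rescale each column to zero mean and unit variance so coefficient sizes are comparable."""
    n, d = len(X), len(X[0])
    out = [row[:] for row in X]
    for j in range(d):
        mean = sum(row[j] for row in X) / n
        std = (sum((row[j] - mean) ** 2 for row in X) / n) ** 0.5 or 1.0
        for i in range(n):
            out[i][j] = (X[i][j] - mean) / std
    return out


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def linear_importance(X: Matrix, y: Sequence[float]) -> List[float]:
    """Hypothetical stand-in ranker: |least-squares coefficient| of each standardized feature."""
    Z = [[1.0] + row for row in standardize(X)]  # prepend an intercept column
    d = len(Z[0])
    xtx = [[sum(r[i] * r[j] for r in Z) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(Z, y)) for i in range(d)]
    beta = solve(xtx, xty)  # normal equations: (Z^T Z) beta = Z^T y
    return [abs(b) for b in beta[1:]]  # drop the intercept


def rfe(X: Matrix, y: Sequence[float], n_keep: int,
        importance_fn: Callable[[Matrix, Sequence[float]], List[float]] = linear_importance) -> List[int]:
    """Recursive Feature Elimination: refit on the surviving features, drop the weakest, repeat."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        subset = [[row[j] for j in remaining] for row in X]
        scores = importance_fn(subset, y)
        weakest = min(range(len(remaining)), key=lambda i: scores[i])
        del remaining[weakest]
    return remaining


def macro_scores(y_true: Sequence, y_pred: Sequence) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 (averaging scheme is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    prec, rec, f1 = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        prec.append(p_c)
        rec.append(r_c)
        f1.append(2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0)
    k = len(labels)
    return {"accuracy": sum(t == p for t, p in pairs) / len(pairs),
            "precision": sum(prec) / k, "recall": sum(rec) / k, "f1": sum(f1) / k}


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.random() for _ in range(4)] for _ in range(30)]  # 4 synthetic features
    y = [2 * r[0] - r[1] for r in X]                            # only features 0 and 1 matter
    print(rfe(X, y, n_keep=2))  # -> [0, 1]
    scores = macro_scores([1, 1, 1, 2, 2, 3], [1, 1, 2, 2, 2, 3])
    print({k: round(v, 3) for k, v in scores.items()})
    # -> {'accuracy': 0.833, 'precision': 0.889, 'recall': 0.889, 'f1': 0.867}
```

Each pass refits the ranker on the surviving columns before dropping one, which is what distinguishes RFE from a one-shot ranking. In practice, scikit-learn's `sklearn.feature_selection.RFE` implements the same loop for any estimator exposing `coef_` or `feature_importances_`, and the selected columns would then be fed to a boosting classifier such as XGBoost or LightGBM.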

Author Biographies

Neny Sulistianingsih, Universitas Bumigora, Mataram, Indonesia

Lecturer, Department of Computer Science, Faculty of Engineering, Universitas Bumigora, Mataram, Indonesia, 83127

Galih Hendro Martono, Universitas Bumigora, Mataram, Indonesia

Lecturer, Department of Computer Science, Faculty of Engineering, Universitas Bumigora, Mataram, Indonesia, 83127

Published
2024-03-08
How to Cite
Sulistianingsih, N., & Martono, G. H. (2024). Enhancing Predictive Models: An In-depth Analysis of Feature Selection Techniques Coupled with Boosting Algorithms. MATRIK : Jurnal Manajemen, Teknik Informatika Dan Rekayasa Komputer, 23(2), 353-364. https://doi.org/10.30812/matrik.v23i2.3788
Section
Articles