Stroke Prediction with Enhanced Gradient Boosting Classifier and Strategic Hyperparameter

Dela Ananda Setyarini; Agnes Ayu Maharani Dyah Gayatri; Christian Sri Kusuma Aditya; Didih Rizki Chandranegara

doi:10.30812/matrik.v23i2.3555

Dela Ananda Setyarini, Universitas Muhammadiyah Malang, Malang, Indonesia
Agnes Ayu Maharani Dyah Gayatri, Universitas Muhammadiyah Malang, Malang, Indonesia
Christian Sri Kusuma Aditya, Universitas Muhammadiyah Malang, Malang, Indonesia
Didih Rizki Chandranegara, Universitas Muhammadiyah Malang, Malang, Indonesia

Keywords: Comparison Boosting, Hyperparameter, Machine Learning, Stroke

Abstract

A stroke is a medical condition that occurs when the blood supply to the brain is interrupted. The resulting brain damage can impair a person's ability to move, speak, think, and feel normally. Because of this impact on health, early stroke detection is important, and an effective prediction model is needed. This research aimed to improve stroke prediction performance by comparing four algorithms derived from Gradient Boosting, each combined with hyperparameter tuning. Hyperparameter tuning was used to find the combination of parameter values that improves model accuracy. The methods compared were Categorical Boosting, Histogram Gradient Boosting, Light Gradient Boosting, and Extreme Gradient Boosting. The research involved retrieving, cleaning, and analyzing the data, after which model performance was evaluated using a confusion matrix and execution time. The results showed that Light Gradient Boosting with RandomSearchCV hyperparameter tuning achieved the highest accuracy among the algorithms tested, at 95%, while also being the fastest in execution. This research can contribute to the medical field by helping doctors and patients predict stroke early and reduce its serious consequences.
