Stroke Prediction Using Machine Learning Method with Extreme Gradient Boosting Algorithm

  • Abd Mizwar A Rahim, Universitas Amikom Yogyakarta
  • Andi Sunyoto, Universitas Amikom Yogyakarta
  • Muhammad Rudyanto Arief, Universitas Amikom Yogyakarta
Keywords: Cardiovascular Disease, Ensemble Learning, Machine Learning, Stroke Prediction, Xtreme Gradient Boosting


According to WHO data, stroke is the second leading cause of death. A stroke occurs when a blood vessel is blocked or ruptures, so that part of the brain no longer receives the blood supply that carries the oxygen it needs, which can lead to death. In the health sector, machine learning models can make it easier for users to predict certain diseases. Previous studies, however, have reported low accuracy when applied in healthcare. The purpose of this research is to increase accuracy by applying one of the ensemble learning algorithms, namely the Extreme Gradient Boosting (XGBoost) algorithm. The data were split 70/30 into training and test sets, giving 3582 training records (70%) and 1536 test records (30%), and the resulting model reached 96% accuracy. This study increases accuracy in predicting stroke cases and achieves better accuracy than previous studies.
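The split sizes reported in the abstract can be checked with a short sketch. It assumes a total of 5118 records (simply the sum of the two reported counts) and assumes the test-set size is rounded up, which is a convention used by common split utilities; the abstract does not state which tool or rounding rule was used. The function names and the seed are hypothetical, and the XGBoost model that would then be trained on the training portion is not shown.

```python
import math
import random


def split_sizes(n_total, test_fraction=0.3):
    """Return (n_train, n_test), rounding the test size up (assumed convention)."""
    n_test = math.ceil(test_fraction * n_total)
    return n_total - n_test, n_test


def train_test_indices(n_total, test_fraction=0.3, seed=0):
    """Shuffle row indices and cut them into disjoint train and test lists."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary illustrative choice
    n_train, _ = split_sizes(n_total, test_fraction)
    return indices[:n_train], indices[n_train:]


# Assumption: total record count inferred from the abstract (3582 + 1536).
N_TOTAL = 3582 + 1536

n_train, n_test = split_sizes(N_TOTAL)
train_idx, test_idx = train_test_indices(N_TOTAL)
print(n_train, n_test)
```

Under these assumptions the sketch reproduces the 3582 and 1536 record counts given in the abstract; a different rounding rule could shift each count by one.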




How to Cite
Rahim, A. M. A., Sunyoto, A., & Arief, M. R. (2022). Stroke Prediction Using Machine Learning Method with Extreme Gradient Boosting Algorithm. MATRIK : Jurnal Manajemen, Teknik Informatika Dan Rekayasa Komputer, 21(3), 595-606.