Implementation of The Extreme Gradient Boosting Algorithm with Hyperparameter Tuning in Celiac Disease Classification
Abstract
Celiac Disease (CeD) is an autoimmune disorder triggered by gluten consumption that involves the immune system and human leukocyte antigen (HLA) in the intestine. Its global incidence ranges from 0.5% to 1%, with only 30% of cases correctly diagnosed. Diagnosis remains challenging because it requires complex procedures such as blood tests, small bowel biopsy, and elimination of gluten from the diet, so a faster and more efficient alternative is needed. This study uses Extreme Gradient Boosting (XGBoost), an ensemble machine learning technique built on decision trees, to classify Celiac disease. The aim was to classify patients into six classes (potential, atypical, silent, typical, latent, and no disease) based on attributes such as blood test results, clinical symptoms, and medical history. The method employs 5-fold cross-validation to tune four parameters: max depth, n estimator, gamma, and learning rate. A total of 96 experiments were conducted to find the best combination of parameters. Tuning improved accuracy by 0.45 percentage points over the 98.19% obtained with the default XGBoost parameters. The best model was obtained with a max depth of 3, an n estimator of 100, a gamma of 0, and a learning rate of 0.3 or 0.5, yielding an accuracy of 98.64%, a sensitivity of 98.43%, and a specificity of 99.72%. This research shows that tuning the XGBoost parameters for Celiac
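The tuning procedure described in the abstract (an exhaustive search over parameter combinations, each scored with 5-fold cross-validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the candidate values in `PARAM_GRID` are hypothetical and were chosen only so that the grid has 96 combinations, and `fit_and_score` is a placeholder callable that would train a model on the training indices and return its accuracy on the held-out fold (for example by wrapping an XGBoost classifier).

```python
from itertools import product
from statistics import mean

# Hypothetical candidate values: the abstract reports 96 trials but does not
# list the grid, so these values are assumptions chosen to give 4*4*2*3 = 96.
PARAM_GRID = {
    "max_depth": [3, 4, 5, 6],
    "n_estimators": [100, 200, 300, 400],
    "gamma": [0, 1],
    "learning_rate": [0.1, 0.3, 0.5],
}


def iter_param_combinations(grid):
    """Yield one dict per combination of the candidate values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds.

    No shuffling or stratification is done here; with six imbalanced
    classes, shuffled and stratified folds would likely be preferable.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def grid_search_cv(n_samples, grid, fit_and_score, k=5):
    """Return (best_params, best_mean_score) over every combination.

    fit_and_score(params, train_idx, test_idx) is a caller-supplied
    placeholder that returns a score where higher is better.
    """
    best_params, best_score = None, float("-inf")
    for params in iter_param_combinations(grid):
        fold_scores = [
            fit_and_score(params, train_idx, test_idx)
            for train_idx, test_idx in k_fold_indices(n_samples, k)
        ]
        avg = mean(fold_scores)
        if avg > best_score:  # ties keep the first combination found
            best_params, best_score = params, avg
    return best_params, best_score
```

Because ties keep the first combination encountered, two settings with the same mean score (such as the learning rates 0.3 and 0.5 reported in the abstract) would need to be recorded separately if both are to be reported.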
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.