Investigating Liver Disease Machine Learning Prediction Performance through Various Feature Selection Methods

Authors

DOI:

https://doi.org/10.30812/matrik.v24i3.4531

Keywords:

Feature Selection, Liver Disease Prediction, Ensemble Machine Learning

Abstract

Liver diseases are increasingly prevalent and impose a significant health burden worldwide, so improving the accuracy of predictive models is essential for early diagnosis and effective treatment. This study systematically analyzes how different feature selection methods affect the performance of various machine learning classifiers for liver disease prediction. Four feature selection techniques (regular, analysis of variance (ANOVA), univariate, and model-based) were evaluated on a suite of classifiers: decision forest, decision tree, support vector classifier, multi-layer perceptron, and linear discriminant analysis. The results revealed a significant and variable impact of feature selection on model accuracy. Notably, the ANOVA method paired with the multi-layer perceptron achieved the highest accuracy, 0.801724, while the univariate method was optimal for the decision forest classifier (0.741379). In contrast, model-based selection often degraded performance, particularly for the decision tree classifier, likely because it introduced noise and overfitting. The support vector classifier, however, showed robust and consistent accuracy across all selection techniques. These findings indicate that no feature selection method is universally superior; optimal predictive performance depends on matching the selection technique to the specific machine learning model. The study contributes practical, evidence-based insight into the interplay between feature selection and model choice in medical data analysis, and offers guidance for improving classification accuracy in liver disease prediction. Future work should explore more sophisticated and hybrid feature selection methods to improve model performance further.
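To make the ANOVA-based selection mentioned in the abstract more concrete, the following is a minimal sketch of how a filter method can rank features by a one-way ANOVA F statistic and keep the top k. It is not the authors' implementation, which this page does not describe. The function names `anova_f_score` and `select_top_k`, the feature names, and the toy data are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic for a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand = mean(values)
    # Between-group sum of squares: spread of class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(columns, labels, k):
    """Return the names of the k features with the largest F scores."""
    scores = {name: anova_f_score(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data, invented for illustration only (not taken from the study).
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "marker_a": [1.0, 1.2, 0.9, 3.1, 2.9, 3.0],  # class means differ clearly
    "marker_b": [5.0, 4.0, 6.0, 5.0, 4.0, 6.0],  # same distribution in both classes
}
selected = select_top_k(columns, labels, k=1)
print(selected)
```

A feature whose class means differ strongly relative to its within-class spread gets a high score and is retained. Whether the retained subset actually helps depends on the downstream classifier, which is the interaction the abstract reports.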

Published

2025-07-09

Issue

Section

Articles

How to Cite

[1]
A. Z. A. Wafi, F. P. Rochim, and V. Bezaleel, “Investigating Liver Disease Machine Learning Prediction Performance through Various Feature Selection Methods”, MATRIK, vol. 24, no. 3, pp. 507–520, Jul. 2025, doi: 10.30812/matrik.v24i3.4531.
