Cluster Validity for Optimizing Classification Model: Davies Bouldin Index – Random Forest Algorithm
DOI:
https://doi.org/10.30812/matrik.v24i1.4043
Keywords:
Classification, Davies Bouldin Index, Maternal health risk, Optimal cluster, Random Forest algorithm
Abstract
Several factors affect the health and mortality rates of pregnant women, and the symptoms of disease in pregnant women are often similar. This makes it difficult to determine which factors contribute to a low, medium, or high risk of mortality. The purpose of this research is to generate classification rules for maternal health risk using optimal clusters, where the optimal cluster is identified through cluster validity. The methods used are K-Means clustering, the Davies Bouldin Index (DBI), and the Random Forest algorithm. Together, these methods build optimal clusters from a set of k-tests to produce the best classification. The optimal clusters, whose members are strongly similar, consist of high-dimensional data, so Principal Component Analysis (PCA) is required to evaluate the attribute values. The best classification rule was obtained from k-tests = 22 on the 20th cluster, with an accuracy of 97% for the low, mid, and high risk classes. The novelty lies in applying DBI to the data that the Random Forest will classify. According to the research findings, the classification rules created through optimal clusters are 9.7% better than those created without the clustering process. This demonstrates that optimizing the data groups improves the performance of the classification algorithm.
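To make the cluster validity step concrete, the following is a minimal sketch of how the Davies Bouldin Index can rank candidate clusterings, where a lower value indicates more compact and better separated clusters. It is not the authors' implementation: the toy points, the two candidate labelings, and the function names `centroid` and `davies_bouldin` are assumptions made for illustration, and the K-Means, PCA, and Random Forest stages described in the abstract are left out.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(fmean(axis) for axis in zip(*points))


def davies_bouldin(points, labels):
    """Davies Bouldin Index of a labeling; lower is better.

    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two clusters")

    centers = {lab: centroid(pts) for lab, pts in clusters.items()}
    # Scatter: mean distance of each cluster's members to its centroid.
    scatter = {
        lab: fmean(dist(p, centers[lab]) for p in pts)
        for lab, pts in clusters.items()
    }
    # For each cluster, keep the worst (largest) similarity ratio to any other.
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return fmean(worst)


# Toy data (hypothetical): two well separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidates = {
    "two_blobs": [0, 0, 0, 1, 1, 1],  # matches the visible groups
    "mixed": [0, 1, 0, 1, 0, 1],      # deliberately poor assignment
}

scores = {name: davies_bouldin(points, labs) for name, labs in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

In a pipeline like the one described, the same comparison would be run over the labelings produced by K-Means for each tested value of k, and the labeling with the lowest index would be passed on to the classifier.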