Cluster Validity for Optimizing Classification Model: Davies Bouldin Index – Random Forest Algorithm
DOI:
https://doi.org/10.30812/matrik.v24i1.4043
Keywords:
Classification, Davies Bouldin Index, Maternal health risk, Optimal cluster, Random Forest algorithm
Abstract
Several factors affect the health and mortality rates of pregnant women. Because the symptoms of disease in pregnant women are often similar, it is difficult to determine which factors contribute to a low, medium, or high risk of mortality. The purpose of this research is to generate classification rules for maternal health risk using optimal clusters. The optimal cluster is obtained through cluster validation. The methods used are K-Means clustering, the Davies Bouldin Index (DBI), and the Random Forest algorithm. Together, these methods build optimal clusters from a set of k-tests to produce the best classification. Optimal clusters, whose members are strongly similar to one another, are high-dimensional data. Therefore, the Principal Component Analysis (PCA) technique is required to evaluate attribute values. The best classification rule was obtained from k-tests = 22 on the 20th cluster, with an accuracy of 97% across the low, mid, and high risk classes. The novelty lies in applying DBI to the data that the Random Forest will classify. According to the research findings, the classification rules created through optimal clusters are 9.7% better than those created without the clustering process. This demonstrates that optimizing the data groups can enhance the performance of the classification algorithm.
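The cluster-validity step described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It computes the Davies Bouldin Index in plain Python for candidate partitions and keeps the value of k with the lowest score. The toy points, the hand-written labelings (stand-ins for K-Means output), and the function names `centroid`, `davies_bouldin_index`, and `best_k` are assumptions made for illustration only. The sketch also assumes Euclidean distance and that no two cluster centroids coincide.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin_index(points, labels):
    """Davies Bouldin Index of a partition; lower values mean better clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two clusters")

    keys = list(clusters)
    centers = {k: centroid(clusters[k]) for k in keys}
    # Scatter S_i: mean distance from each member to its cluster centroid.
    scatter = {
        k: sum(dist(p, centers[k]) for p in clusters[k]) / len(clusters[k])
        for k in keys
    }

    worst_ratios = []
    for i in keys:
        # R_i: largest ratio of combined scatter to centroid separation
        # between cluster i and any other cluster j.
        worst_ratios.append(max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in keys if j != i
        ))
    # DBI is the average of the worst-case ratios over all clusters.
    return sum(worst_ratios) / len(worst_ratios)


def best_k(points, labelings):
    """Return the k whose labeling has the lowest DBI, plus all scores."""
    scores = {k: davies_bouldin_index(points, labels)
              for k, labels in labelings.items()}
    return min(scores, key=scores.get), scores


# Hypothetical toy data: three tight groups of two points each.
points = [(0, 0), (0, 2), (10, 0), (10, 2), (20, 0), (20, 2)]
# Hand-written labelings standing in for K-Means results at k = 2 and k = 3.
labelings = {
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}

k, scores = best_k(points, labelings)
print("DBI per k:", scores)
print("selected k:", k)  # k = 3 has the lower DBI on this toy data
```

In a fuller pipeline, the labelings would come from running K-Means for each tested k, and the records of the selected clustering would then be passed to a Random Forest classifier.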