Cluster Validity for Optimizing Classification Model: Davies Bouldin Index – Random Forest Algorithm
DOI: https://doi.org/10.30812/matrik.v24i1.4043
Keywords: Classification, Davies Bouldin index, Maternal health risk, Optimal cluster, Random forest algorithm
Abstract
Several factors affect pregnant women’s health and mortality rates, and the symptoms of disease in pregnant women are often similar. This makes it difficult to evaluate which factors contribute to a low, medium, or high risk of mortality among pregnant women. The purpose of this research is to generate classification rules for maternal health risk using optimal clusters. The optimal cluster is obtained through a cluster validity process. The methods used are K-Means clustering, the Davies Bouldin Index (DBI), and the Random Forest algorithm. Together, these methods build optimal clusters from a set of k-tests to produce the best classification. The optimal clusters, whose members are strongly similar, contain high-dimensional data, so the Principal Component Analysis (PCA) technique is required to evaluate the attribute values. The best classification rule was obtained at k-tests = 22 on the 20th cluster, with an accuracy of 97% across the low, mid, and high risk classes. The novelty lies in applying DBI to the data that Random Forest then classifies. According to the research findings, the classification rules created through optimal clusters are 9.7% better than those created without the clustering process. This demonstrates that optimizing the data grouping improves the classification algorithm’s performance.
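The cluster validity step described in the abstract can be illustrated with a minimal sketch. It only shows how a Davies Bouldin Index could be computed and used to pick among several k-tests; it is not the authors' implementation. The toy points, the hand-written candidate partitions (standing in for K-Means runs at different k), and the names `davies_bouldin`, `candidates`, and `best_k` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Davies Bouldin Index of a labelled partition (lower is better)."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two clusters")

    # Centroid and mean distance to centroid (scatter) for each cluster.
    centroids, scatter = {}, {}
    for lab, members in clusters.items():
        c = tuple(sum(col) / len(members) for col in zip(*members))
        centroids[lab] = c
        scatter[lab] = sum(math.dist(p, c) for p in members) / len(members)

    # For each cluster, keep its worst (largest) similarity ratio
    # against any other cluster, then average over all clusters.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy 2-D data with three visually separate groups (illustrative values only).
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (20, 0), (20, 1), (21, 0),
]

# Hypothetical candidate partitions, standing in for K-Means results
# at k = 2 and k = 3.
candidates = {
    2: [0, 0, 0, 1, 1, 1, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],
}

# Score every k-test and keep the one with the lowest DBI.
scores = {k: davies_bouldin(points, labels) for k, labels in candidates.items()}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

In a pipeline like the one the abstract outlines, the partition chosen this way would then be passed on to the classifier; that step is omitted here because the abstract does not specify its details.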