Cluster Validity for Optimizing Classification Model: Davies Bouldin Index – Random Forest Algorithm
DOI: https://doi.org/10.30812/matrik.v24i1.4043
Keywords: Classification, Davies Bouldin index, Maternal health risk, Optimal cluster, Random forest algorithm
Abstract
Several factors affect the health and mortality rates of pregnant women. Because the symptoms of disease in pregnant women are often similar, it is difficult to determine which factors contribute to a low, medium, or high risk of mortality. The purpose of this research is to generate classification rules for maternal health risk using optimal clusters, where the optimal cluster is identified through cluster validity. The methods used are K-Means clustering, the Davies Bouldin Index (DBI), and the Random Forest algorithm. Together, these methods build optimal clusters from a set of k-tests to produce the best classification. Optimal clusters, whose members are strongly similar to one another, are high-dimensional data; therefore, Principal Component Analysis (PCA) is required to evaluate attribute values. The best classification rule was obtained from k-tests = 22 on the 20th cluster, with an accuracy of 97% across the low, mid, and high risk classes. The novelty lies in applying DBI to the data that the Random Forest will classify. According to the research findings, the classification rules created through optimal clusters are 9.7% better than those created without the clustering process. This demonstrates that optimizing the data groups enhances the performance of the classification algorithm.
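The cluster-validity step described above, running K-Means for several candidate values of k and keeping the clustering with the lowest DBI, can be illustrated with a minimal sketch. It is not the authors' implementation. The toy data, the function names (`farthest_point_init`, `kmeans`, `davies_bouldin`), and the deterministic farthest-point seeding are all assumptions chosen to keep the example reproducible. The PCA and Random Forest steps are not shown. For clusters with mean within-cluster distance s_i and centroid distance d_ij, the index averages, over all clusters i, the largest ratio (s_i + s_j) / d_ij; lower values indicate more compact, better separated clusters.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Update step: recompute centroids, keeping old centers for empty clusters.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def davies_bouldin(points, labels):
    """Davies-Bouldin Index of a labelling (needs at least two clusters)."""
    ids = sorted(set(labels))
    if len(ids) < 2:
        raise ValueError("DBI is undefined for fewer than two clusters")
    groups = {i: [p for p, lab in zip(points, labels) if lab == i] for i in ids}
    cents = {i: centroid(g) for i, g in groups.items()}
    # s_i: mean distance of cluster members to their own centroid.
    scatter = {
        i: sum(math.dist(p, cents[i]) for p in groups[i]) / len(groups[i])
        for i in ids
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in ids
            if j != i
        )
        for i in ids
    ]
    return sum(worst) / len(ids)


# Hypothetical toy data: three compact groups of five 2-D points each.
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5), (0.0, 0.0)]
group_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in group_centers for dx, dy in offsets]

# The "k-tests": score each candidate k and keep the one with the lowest DBI.
scores = {k: davies_bouldin(points, kmeans(points, k)) for k in range(2, 7)}
best_k = min(scores, key=scores.get)
print(best_k, round(scores[best_k], 4))
```

On this toy data the lowest DBI occurs at k = 3, which matches the three groups used to build the points. In a fuller pipeline, the records of a selected cluster would then be passed to a classifier such as Random Forest.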