Regional Clustering Based on Types of Non-Communicable Diseases Using k-Means Algorithm
Abstract
Noncommunicable diseases (NCDs) have become a global threat to public health, and devising appropriate interventions requires a clear understanding of their geographic and epidemiological distribution. The objective of this study is to cluster the regions of Banten Province by their NCD profiles using an unsupervised learning technique. The method used is the k-means algorithm, which groups regions according to their types of non-communicable diseases. NCD prevalence data from various health sources were processed and normalised before cluster analysis with the k-means algorithm. The study considers two scenarios: the first clusters the data obtained from outlier analysis, while the second excludes all outliers. The aim is to observe how the regional clustering outcomes differ between these two scenarios. The silhouette index is used to assess the validity of the cluster results, and the findings are analysed in depth to determine the geographic and socioeconomic patterns associated with each cluster's NCD profile. With a mean silhouette index of 0.812, k = 2 is the optimal number of clusters in this case. The first cluster (C1) groups 202 regions in which five NCDs, namely diabetes, hypertension, obesity, stroke, and cataracts, require significant attention. The second cluster (C2) contains six regions that are susceptible not only to the five NCDs of cluster C1 but also to breast cancer, cervical cancer, heart disease, chronic obstructive pulmonary disease (COPD), and congenital deafness.
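The pipeline summarised in the abstract (normalise the prevalence data, run k-means, then validate with the silhouette index) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which normalisation was used, so min-max scaling is an assumption here, and the five regions and their prevalence figures are hypothetical values invented for illustration only.

```python
import math
import random


def min_max_normalise(rows):
    """Rescale every column to [0, 1] (assumed scheme; constant columns map to 0)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def k_means(points, k, iterations=100, seed=0):
    """Plain Lloyd-style k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids are k distinct data points
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mean_silhouette(points, labels):
    """Average of s = (b - a) / max(a, b); singleton clusters contribute 0."""
    total = 0.0
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.get(labels[i])
        others = [sum(d) / len(d) for lab, d in by_cluster.items() if lab != labels[i]]
        if not own or not others:
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(others)          # mean distance to the nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical prevalence figures (columns: diabetes, hypertension, stroke).
regions = [[12, 30, 5], [11, 28, 6], [13, 31, 4], [40, 75, 22], [42, 80, 25]]
scaled = min_max_normalise(regions)

labels = k_means(scaled, k=2)
score = mean_silhouette(scaled, labels)

# Pick the k with the highest mean silhouette among a few candidates.
best_k = max(range(2, 5), key=lambda k: mean_silhouette(scaled, k_means(scaled, k)))
print(labels, round(score, 3), best_k)
```

A mean silhouette close to 1 indicates compact, well-separated clusters, which is the sense in which the abstract's value of 0.812 supports k = 2.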
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.