K-Means Optimization Algorithm to Improve Cluster Quality on Sparse Data
DOI:
https://doi.org/10.30812/matrik.v23i3.3936
Keywords:
Clustering, K-Means, Optimization algorithm, Sparse data
Abstract
This research aims to cluster sparse data using several K-Means optimization algorithms. The sparse data came from reviews of the game Citampi Stories on the Google Play Store. The methods used were Density-Based Spatial Clustering of Applications with Noise-Kmeans (DB-Kmeans), Particle Swarm Optimization-Kmeans (PSO-Kmeans), and Robust Sparse Kmeans Clustering (RSKC), each evaluated using the silhouette score. Clustering sparse data is challenging because sparsity complicates the analysis and can lead to suboptimal or non-representative results. To address this challenge, the research divided the data based on the number of terms, in three different scenarios, to reduce sparsity. The results showed that DB-Kmeans had the potential to enhance clustering quality across most data scenarios. The research also found that dividing the data based on the number of terms effectively mitigated sparsity and significantly influenced the optimization of topic formation within each cluster. The research concludes that this approach is effective in enhancing clustering quality for sparse data, providing more diverse and easily interpretable information. The results could be valuable for developers seeking to understand user preferences and enhance game quality.
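The idea of dividing reviews by number of terms can be illustrated with a minimal sketch. It is not the authors' pipeline: the cut-offs in `bounds`, the bucket names, the helper functions, and the toy reviews are all hypothetical, and the abstract does not state how the three scenarios were defined. The sketch only measures sparsity (the share of zero entries in a document-term count matrix) before and after the split; it does not run DB-Kmeans, PSO-Kmeans, or RSKC.

```python
from collections import defaultdict


def sparsity(docs):
    """Fraction of zero entries in the document-term count matrix of `docs`."""
    vocab = {term for doc in docs for term in doc}
    if not docs or not vocab:
        return 0.0
    nonzero = sum(len(set(doc)) for doc in docs)
    return 1.0 - nonzero / (len(docs) * len(vocab))


def split_by_term_count(docs, bounds=(3, 6)):
    """Group tokenized docs into short / medium / long buckets.

    `bounds` is a hypothetical pair of cut-offs, not taken from the paper.
    """
    low, high = bounds
    buckets = defaultdict(list)
    for doc in docs:
        n = len(doc)
        key = "short" if n <= low else "medium" if n <= high else "long"
        buckets[key].append(doc)
    return dict(buckets)


# Toy, made-up reviews standing in for the real review data.
reviews = [
    "good game".split(),
    "fun story".split(),
    "nice graphics".split(),
    "the story is fun but short".split(),
    "please add more jobs and items".split(),
    "game crashes after the latest update on my phone".split(),
]

overall = sparsity(reviews)
per_bucket = {
    name: sparsity(group)
    for name, group in split_by_term_count(reviews).items()
}

print(f"all reviews: {overall:.2f}")
for name, value in per_bucket.items():
    print(f"{name}: {value:.2f}")
```

On this toy input each bucket is less sparse than the full set, because documents of similar length share a smaller vocabulary. Whether the same holds for real review data depends on the corpus and on the chosen cut-offs.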