Novel Application of K-Means Algorithm for Unique Sentiment Clustering in 2024 Korean Movie Reviews on TikTok Platform
Abstract
In recent years, social media has become a major influence on public perception of films. TikTok, a rapidly growing video-sharing platform, shapes audience opinion through comments, short reviews, and user discussions. This is increasingly relevant to the Korean film industry, which attracts global attention with its diverse genres and engaging narratives. However, how audiences respond to films of different genres remains poorly understood, especially in the dynamic context of social media. This study therefore analyzes audience sentiment on TikTok toward Korean films released in 2024, focusing on the distribution of sentiment across four main genres: comedy, romance, action, and fun stories. The methodology comprises data collection by web crawling on TikTok, text preprocessing, and feature extraction using IndoBERT. Comments are classified as positive, negative, or neutral using SentimentIntensityAnalyzer. Because the dataset consists of unlabeled text, K-Means clustering is used to identify sentiment groupings, with principal component analysis used to validate cluster quality. The findings indicate that the romance and comedy genres are predominantly associated with neutral sentiment, at 89.6% and 87.4%, respectively. In contrast, the action genre is more polarized, with 14.9% positive and 24.7% negative sentiment. The fun story genre shows a more evenly distributed sentiment pattern. The main challenges are determining the optimal number of clusters and handling the imbalanced sentiment distribution across genres. This study offers filmmakers and marketers insight into audience reactions on social media, enabling more targeted promotional strategies. It also contributes to the literature on sentiment analysis in the film industry, highlighting the importance of genre-specific audience reception patterns for future research.
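The clustering step and the cluster-count challenge mentioned above can be illustrated with a minimal sketch. It is not the study's implementation: the 2-D points are toy stand-ins for IndoBERT comment embeddings, the farthest-point seeding is an assumption made only so the example is reproducible, and the function names are hypothetical. The sketch runs K-Means for several values of k and reports the inertia (within-cluster sum of squared distances), a quantity commonly compared across k when choosing the number of clusters.

```python
# Minimal K-Means sketch in pure Python (standard library only).
# Assumption: the 2-D points below are toy stand-ins for comment embeddings;
# the study extracts features with IndoBERT, which is not reproduced here.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption, not the study's stated method):
    start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, max_iterations=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[i])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))
    return labels, centroids, inertia


# Three well-separated toy groups of three points each.
POINTS = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.1),
    (10.0, 0.0), (10.1, 0.2), (9.9, -0.1),
]

# Compare inertia across candidate cluster counts.
for k in range(1, 5):
    _, _, inertia = kmeans(POINTS, k)
    print(f"k={k}  inertia={inertia:.3f}")
```

On data like this, inertia drops sharply until k matches the number of natural groups and only slightly afterwards. Real comment embeddings are much higher-dimensional and rarely separate this cleanly, which is one reason choosing k is a challenge.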

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.