Detecting Hoax News Regarding the Covid-19 Vaccine using Levenshtein Distance

  • Gilang Brilians Firmanesha, Telkom University
  • Sri Suryani Prasetyowati, Telkom University
  • Yuliant Sibaroni, Telkom University
Keywords: Hoax Detection, Covid-19, Vaccination, Levenshtein Distance, TF-IDF


The internet is a communication tool that we use often, and it has brought many benefits. However, some people misuse it. Individuals or groups spread hoaxes, or fake news, to incite the public and steer public opinion toward the side they want. When COVID-19 spread in Indonesia and the government made vaccination mandatory, the number of hoaxes about vaccination increased rapidly. Several studies have tried to address the large number of COVID-19 vaccination hoaxes on the internet by building hoax detection systems with various methods. One of these studies compared several methods, including Levenshtein distance, which achieved a relatively low performance of 40% compared with the other methods. This motivated the researchers to develop a hoax detection system based on the same method but with an added feature extraction step, with the aim of improving on the performance of that previous research. In this study, two main experiments were conducted with Levenshtein distance as the main classification method. The best result was obtained in Experiment 2, with an F1-score of 70.2%. This is an improvement over the previous study, attributed to the added TF-IDF feature extraction.
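The abstract does not state exactly how TF-IDF and Levenshtein distance are combined. The following is only a minimal sketch under stated assumptions. TF-IDF is assumed to select the k highest-weighted terms of each document. Levenshtein distance is assumed to be computed over those token sequences and normalised by length. A new text is assumed to receive the label of the nearest labelled document (a 1-nearest-neighbour rule). The smoothed IDF formula (the form scikit-learn's TfidfVectorizer uses by default), the helper names `fit_idf`, `key_terms` and `classify`, the value of k, and the toy data are all hypothetical illustrations, not the authors' implementation or dataset.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions turning a into b.
    Works on strings (character level) or on lists of tokens (word level)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter sequence in the inner loop
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def fit_idf(docs):
    """Smoothed IDF learned from the labelled corpus (assumed formula):
    idf(t) = ln((1 + n) / (1 + df(t))) + 1, so unseen terms still get a weight."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return lambda t: math.log((1 + n) / (1 + df[t])) + 1.0


def key_terms(doc, idf, k):
    """Hypothetical feature-extraction step: keep only the k terms with the highest
    TF-IDF weight, preserving their original order for the edit distance."""
    tf = Counter(doc)
    weights = {t: (c / len(doc)) * idf(t) for t, c in tf.items()}
    top = set(sorted(weights, key=lambda t: (-weights[t], t))[:k])  # ties broken alphabetically
    return [t for t in doc if t in top]


def classify(query, train_docs, train_labels, k=5):
    """1-nearest-neighbour rule (assumed): return the label of the training document
    whose key terms have the smallest length-normalised Levenshtein distance to the query's."""
    idf = fit_idf(train_docs)
    q = key_terms(query, idf, k)

    def dist(doc):
        r = key_terms(doc, idf, k)
        return levenshtein(q, r) / max(len(q), len(r), 1)

    best = min(range(len(train_docs)), key=lambda i: dist(train_docs[i]))
    return train_labels[best]


# Toy, made-up examples for illustration only (not the paper's dataset).
TRAIN_DOCS = [
    "vaccine contains microchip to track people".split(),
    "vaccine causes magnetism in arm".split(),
    "ministry reports vaccine stock arrives in jakarta".split(),
]
TRAIN_LABELS = ["hoax", "hoax", "valid"]

if __name__ == "__main__":
    query = "vaccine contains microchip to monitor people".split()
    print(classify(query, TRAIN_DOCS, TRAIN_LABELS))  # hoax
```

Working on word tokens rather than characters keeps the dynamic-programming table small for long news texts. A character-level variant is the same `levenshtein` function applied to raw strings. Whether the original system works at word or character level, and how it normalises or thresholds the distance, is not specified in the abstract.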

