Hate Speech Detection for Banjarese Languages on Instagram Using Machine Learning Methods

  • Muhammad Alkaff Universitas Lambung Mangkurat, Banjarmasin, Indonesia
  • Muhammad Afrizal Miqdad Universitas Lambung Mangkurat, Banjarmasin, Indonesia
  • Muhammad Fachrurrazi Universitas Lambung Mangkurat, Banjarmasin, Indonesia
  • Muhammad Nur Abdi Universitas Lambung Mangkurat, Banjarmasin, Indonesia
  • Ahmad Zainul Abidin Universitas Lambung Mangkurat
  • Raisa Amalia Universitas Lambung Mangkurat, Banjarmasin, Indonesia
Keywords: Banjarese language, Dataset, Hate Speech Detection, Instagram, Machine Learning

Abstract

Hate speech refers to verbal expression or communication that aims to provoke or discriminate against individuals. The Ministry of Communication and Information of Indonesia encountered and dealt with 3,640 cases of hate speech transmitted through digital channels between 2018 and 2021. In South Kalimantan in particular, hate speech in the local language, Banjarese, has become increasingly prevalent in recent years. Nevertheless, there is a lack of research on using machine learning to detect hate speech in the Banjarese language, specifically on Instagram. This study therefore aimed to address this gap by constructing a dataset of Banjarese hate speech and comparing several feature extraction techniques and machine learning models for detecting it. The feature extraction methods used were Word N-Gram, Term Frequency-Inverse Document Frequency (TF-IDF), a combination of Word N-Gram and TF-IDF, Word2Vec, and GloVe, while the machine learning methods used were Support Vector Machine (SVM), Naïve Bayes, and Decision Tree. The results showed that the combination of TF-IDF for feature extraction and SVM as the model achieved exceptional performance. The average Recall, Precision, Accuracy, and F1-Score scores exceeded 90%, demonstrating the model's ability to identify Banjarese hate speech accurately.
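
As a rough illustration of the "Word N-Gram combined with TF-IDF" feature extraction named in the abstract, the following minimal Python sketch computes TF-IDF weights over word unigrams and bigrams. It is not the authors' implementation: the function names `word_ngrams` and `tfidf_vectors`, the whitespace tokenization, the smoothed IDF formula, and the placeholder documents are all assumptions made for this example, since the abstract does not specify these details.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return word n-grams of length 1..n_max (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_max=2):
    """Map each document to a dict of {n-gram: TF-IDF weight}.

    Assumed weighting (not taken from the paper):
      tf  = term count / number of n-grams in the document
      idf = ln((1 + N) / (1 + df)) + 1   (smoothed)
    """
    term_lists = [word_ngrams(d, n_max) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))

    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1  # guard against empty documents
        vectors.append({
            t: (c / total) * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
            for t, c in counts.items()
        })
    return vectors


# Placeholder tokens only; these are not real Banjarese comments.
docs = ["alpha beta gamma", "alpha beta delta", "epsilon zeta"]
vectors = tfidf_vectors(docs)
```

In a pipeline like the one the abstract describes, sparse vectors of this kind would then be passed to a classifier such as an SVM. N-grams that occur in fewer documents (here `gamma`) receive a higher weight than those shared across documents (here `alpha`).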

Published
2023-07-07
How to Cite
Alkaff, M., Miqdad, M., Fachrurrazi, M., Abdi, M., Abidin, A., & Amalia, R. (2023). Hate Speech Detection for Banjarese Languages on Instagram Using Machine Learning Methods. MATRIK : Jurnal Manajemen, Teknik Informatika Dan Rekayasa Komputer, 22(3), 495-504. https://doi.org/10.30812/matrik.v22i3.2939
Section
Articles