Hate Speech Detection for Banjarese Languages on Instagram Using Machine Learning Methods

Muhammad Alkaff; Muhammad Afrizal Miqdad; Muhammad Fachrurrazi; Muhammad Nur Abdi; Ahmad Zainul Abidin; Raisa Amalia

doi:10.30812/matrik.v22i3.2939

Authors

Muhammad Alkaff Universitas Lambung Mangkurat, Banjarmasin, Indonesia
Muhammad Afrizal Miqdad Universitas Lambung Mangkurat, Banjarmasin, Indonesia
Muhammad Fachrurrazi Universitas Lambung Mangkurat, Banjarmasin, Indonesia
Muhammad Nur Abdi Universitas Lambung Mangkurat, Banjarmasin, Indonesia
Ahmad Zainul Abidin Universitas Lambung Mangkurat
Raisa Amalia Universitas Lambung Mangkurat, Banjarmasin, Indonesia

DOI:

https://doi.org/10.30812/matrik.v22i3.2939

Keywords:

Banjarese language, Dataset, Hate Speech Detection, Instagram, Machine Learning

Abstract

Hate speech refers to verbal expression or communication that aims to provoke or discriminate against individuals. The Ministry of Communication and Information of Indonesia handled 3,640 cases of hate speech transmitted through digital channels between 2018 and 2021. In South Kalimantan in particular, hate speech in the local language, Banjarese, has become increasingly prevalent in recent years. Despite this, there is a lack of research on using machine learning to detect hate speech in the Banjarese language, specifically on Instagram. This study aimed to address this gap by constructing a dataset of Banjarese language hate speech and comparing several feature extraction techniques and machine learning models for detecting it. The feature extraction methods used were Word N-Gram, Term Frequency-Inverse Document Frequency (TF-IDF), a combination of Word N-Gram and TF-IDF, Word2Vec, and GloVe, while the machine learning methods used were Support Vector Machine (SVM), Naïve Bayes, and Decision Tree. The results showed that the combination of TF-IDF for feature extraction and SVM as the model performed strongly: the average Recall, Precision, Accuracy, and F1-Score all exceeded 90%, demonstrating the model's ability to identify Banjarese hate speech accurately.
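As a rough illustration of the feature extraction and evaluation steps named in the abstract, the following Python sketch computes word n-gram TF-IDF weights and the four reported metrics using only the standard library. It is not the authors' implementation. The abstract does not state which TF-IDF variant was used, so the sketch assumes relative term frequency and idf = ln(N / df). The function names and the placeholder tokens are hypothetical, and the classifier itself (SVM, Naïve Bayes, or Decision Tree) is left out.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf(docs, n_max=1):
    """Map each document to {term: tf * idf}.

    Assumed variant: tf = count / number of terms in the document,
    idf = ln(N / df). The paper may use a different weighting.
    """
    term_lists = [word_ngrams(d, n_max) for d in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(t for terms in term_lists for t in set(terms))
    n_docs = len(docs)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        vectors.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return vectors


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Placeholder tokens, not real Banjarese comments.
    docs = ["w1 w2 w3", "w1 w4"]
    for vec in tfidf(docs, n_max=2):
        print(vec)
    # Hypothetical labels: 1 = hate speech, 0 = not hate speech.
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

In this sketch, setting n_max=1 gives plain unigram TF-IDF, while n_max=2 adds bigrams, which loosely corresponds to the combined Word N-Gram and TF-IDF setting. A term that occurs in every document receives weight 0 under the assumed idf.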
