Comparative Analysis of TF-IDF and Modern Text Embedding for the Classification of Islamic Ideologies on Indonesian Twitter
DOI:
https://doi.org/10.30812/matrik.v25i1.5600
Keywords:
Islamic Ideologies, Machine Learning, Social Media, Support Vector Machine, Text Classification
Abstract
Ideological polarization on social media platforms such as Twitter, particularly in discussions of Islamic ideologies in Indonesia, has led to the rapid spread of da’wah. It has also made it difficult to classify tweets into distinct Islamic ideologies, such as Liberal Islam and Moderate Islam (Wasathiyyah), because effective methods for accurately classifying such nuanced content are lacking. To address this problem, this research aimed to develop and evaluate a machine learning model that compares traditional word vectorization (TF-IDF) with a modern text embedding model (Nomic Embed v2). The study followed the Knowledge Discovery in Databases (KDD) framework: relevant data were scraped using the Twitter API, and the dataset was annotated by ideology. Preprocessing consisted of case folding, stopword removal, and symbol removal. Classification was carried out with a Support Vector Machine (SVM), and cross-validation was used to assess the model’s accuracy. The findings indicate that the embedding model improved accuracy by providing nuanced semantic context for the tweets, suggesting that modern semantic models can outperform traditional methods in classifying complex, context-dependent texts.
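As a rough illustration of the preprocessing and TF-IDF steps named in the abstract, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the stopword list, the example tweets, the URL-stripping step, and the function names `preprocess` and `tfidf` are all assumptions made for illustration, and the SVM and cross-validation stages are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny Indonesian stopword list for illustration only;
# the stopword list used in the study is not given in the abstract.
STOPWORDS = {"yang", "dan", "di", "itu", "ini", "adalah"}


def preprocess(tweet: str) -> list[str]:
    """Apply case folding, symbol removal, and stopword removal."""
    text = tweet.lower()                        # case folding
    text = re.sub(r"https?://\S+", " ", text)   # assumed extra step: drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # symbol (and digit) removal
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Plain TF-IDF: tf = count / document length, idf = log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up example tweets, not taken from the study's dataset.
tweets = [
    "Islam moderat itu menjaga toleransi!",
    "Toleransi adalah inti dakwah yang damai :)",
    "Tafsir liberal dan kebebasan berpikir https://example.com",
]
tokens = [preprocess(t) for t in tweets]
vectors = tfidf(tokens)
```

In this sketch a term shared by several tweets (here "toleransi") receives a lower weight than a term unique to one tweet, which is the behaviour TF-IDF is designed to produce. Because the weights depend only on term counts, two tweets that express the same idea with different words share no features, which is the gap that semantic embedding models are meant to close.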
License
Copyright (c) 2025 Siti Ummi Masruroh, Cong Dai Nguyen, Doni Febrianus

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.