Comparative Analysis of TF-IDF and Modern Text Embedding for the Classification of Islamic Ideologies on Indonesian Twitter

Authors

  • Siti Ummi Masruroh, Universitas Islam Negeri Syarif Hidayatullah, Jakarta, Indonesia
  • Cong Dai Nguyen, Le Quy Don Technical University, Hanoi 10000, Viet Nam
  • Doni Febrianus, Universitas Islam Negeri Syarif Hidayatullah, Jakarta, Indonesia

DOI:

https://doi.org/10.30812/matrik.v25i1.5600

Keywords:

Islamic Ideologies, Machine Learning, Social Media, Support Vector Machine, Text Classification

Abstract

The ideological polarization that has emerged on social media platforms such as Twitter, particularly in discussions of Islamic ideologies in Indonesia, has led to the rapid spread of da’wah. However, it has also made it difficult to classify tweets into distinct Islamic ideologies, such as Liberal Islam and Moderate Islam (Wasathiyyah), and effective methods for accurately classifying such nuanced content are lacking. To address this problem, this research aimed to develop and evaluate a machine learning model and to compare the effectiveness of a traditional word vectorization method (TF-IDF) with a modern text embedding model (Nomic Embed v2). The study followed the Knowledge Discovery in Databases (KDD) framework, scraped relevant data using the Twitter API, and annotated the dataset by ideology. Preprocessing techniques such as case folding, stopword removal, and symbol removal were applied to the dataset. Classification was carried out using a Support Vector Machine (SVM) model, and cross-validation was employed to assess the model’s accuracy. The findings indicate that the embedding model improved classification accuracy by providing nuanced semantic context for the tweets, suggesting that modern semantic models can outperform traditional methods in classifying complex, context-dependent texts.
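The preprocessing and TF-IDF steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the order of the preprocessing steps, the stopword list, or the TF-IDF variant, so the tiny stopword set, the URL-stripping step, the plain tf × log(N/df) weighting, and the two example tweets below are all assumptions made for illustration. In practice the resulting vectors (or, alternatively, embedding vectors) would be passed to an SVM and scored with cross-validation, which is not shown here.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, tiny stopword list for illustration only; the abstract does
# not say which Indonesian stopword list was used.
STOPWORDS = {"yang", "dan", "di", "itu", "ini"}


def preprocess(tweet: str) -> List[str]:
    """Apply case folding, symbol removal, and stopword removal to one tweet."""
    text = tweet.lower()                       # case folding
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs (assumed extra step)
    text = re.sub(r"[^a-z\s]", " ", text)      # symbol removal: keep letters only
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Plain TF-IDF: tf = count / document length, idf = log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up example tweets, not taken from the study's dataset.
tweets = [
    "Islam Moderat itu INDAH! #wasathiyyah https://t.co/x",
    "Islam liberal dan moderat",
]
tokens = [preprocess(t) for t in tweets]
vectors = tfidf(tokens)
```

Terms shared by every document (here "islam" and "moderat") receive a weight of zero, while terms unique to one tweet (such as "liberal") receive a positive weight. This reflects the purely lexical nature of TF-IDF that the abstract contrasts with embedding models, which encode semantic context.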



Published

2025-11-30

Section

Articles

How to Cite

[1]
S. U. Masruroh, C. D. Nguyen, and D. Febrianus, “Comparative Analysis of TF-IDF and Modern Text Embedding for the Classification of Islamic Ideologies on Indonesian Twitter”, MATRIK, vol. 25, no. 1, pp. 63–72, Nov. 2025, doi: 10.30812/matrik.v25i1.5600.
