Multiclass Text Classification of Indonesian Short Message Service Spam using Deep Learning Method and Easy Data Augmentation

  • Nurun Latifah, Universitas Mataram, Mataram, Indonesia
  • Ramaditia Dwiyansaputra, Universitas Mataram, Mataram, Indonesia
  • Gibran Satya Nugraha, Universitas Mataram, Mataram, Indonesia
Keywords: Easy Data Augmentation, Multiclass Classification, Short Message Service Spam, Text Classification


The ease of using the Short Message Service (SMS) has given rise to SMS spam, which consists of unsolicited and unwanted messages. To address this problem, many studies have applied machine learning methods to build models that classify SMS spam. However, most of these studies still rely on traditional methods, and deep learning-based approaches remain largely unexplored. Traditional methods have a limitation compared to deep learning: they require manual feature extraction, whereas deep learning models learn features directly from the data. Moreover, many of these studies focus only on binary classification rather than multiclass SMS classification, which can provide more detailed results. The aim of this research is to analyze a deep learning model for multiclass Indonesian SMS spam classification with six categories, and to assess how effectively text augmentation addresses the data imbalance that arises from the increased number of SMS categories. The methods used were the Indonesian version of Bidirectional Encoder Representations from Transformers (IndoBERT) and the Easy Data Augmentation (EDA) technique to address the imbalanced dataset. The evaluation compares the performance of the IndoBERT model on the original dataset with its performance after EDA is applied to enhance the representation of minority classes. The results show that IndoBERT achieves 91% accuracy in classifying SMS spam. Furthermore, EDA yields a significant improvement in F1-score, with an average increase of 12% in the minority classes, and overall model accuracy improves to 93% after EDA is applied. This research concludes that IndoBERT is effective for multiclass SMS spam classification and that EDA is beneficial for handling imbalanced data, contributing to improved model performance.
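To make the augmentation step more concrete, the following is a minimal sketch of the four standard EDA operations (synonym replacement, random insertion, random swap, random deletion), together with a simple loop that oversamples each minority class until it matches the largest class. It is not the authors' implementation. The synonym table, the alpha value of 0.1, the number of augmentations per message, and the example labels are all hypothetical. The abstract does not specify which Indonesian thesaurus, parameters, or category names were used.

```python
import random
from collections import Counter

# HYPOTHETICAL toy synonym table; a real Indonesian pipeline would need a
# proper thesaurus resource, which the abstract does not specify.
SYNONYMS = {
    "gratis": ["cuma-cuma"],
    "hadiah": ["bonus"],
    "segera": ["cepat"],
}

def synonym_replacement(words, n, rng):
    """Replace up to n words that have an entry in SYNONYMS."""
    out = words[:]
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out

def random_insertion(words, n, rng):
    """Insert a synonym of a random known word at a random position, n times."""
    out = words[:]
    for _ in range(n):
        candidates = [w for w in out if w in SYNONYMS]
        if not candidates:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out

def random_swap(words, n, rng):
    """Swap the positions of two random words, n times."""
    out = words[:]
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(words, p, rng):
    """Drop each word with probability p, never returning an empty message."""
    if len(words) <= 1:
        return words[:]
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]

def eda(sentence, alpha=0.1, num_aug=4, seed=0):
    """Produce num_aug variants, cycling through the four EDA operations."""
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(alpha * len(words)))
    ops = [
        lambda w: synonym_replacement(w, n, rng),
        lambda w: random_insertion(w, n, rng),
        lambda w: random_swap(w, n, rng),
        lambda w: random_deletion(w, alpha, rng),
    ]
    return [" ".join(ops[k % len(ops)](words)) for k in range(num_aug)]

def balance_with_eda(texts, labels, seed=0):
    """Oversample every minority class with EDA until it matches the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, l in zip(texts, labels) if l == label]
        needed, k = target - count, 0
        while needed > 0:
            for aug in eda(pool[k % len(pool)], seed=seed + k):
                if needed == 0:
                    break
                out_texts.append(aug)
                out_labels.append(label)
                needed -= 1
            k += 1
    return out_texts, out_labels
```

This design augments only the minority classes and leaves the majority-class messages untouched. That choice matches the abstract's stated goal of improving minority-class representation. Augmentation should be applied to the training split only, so that synthetic variants do not leak into the evaluation data.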




