TY  - JOUR
AU  - Susandri Susandri
AU  - Ahmad Zamsuri
AU  - Nurliana Nasution
AU  - Yoyon Efendi
AU  - Hiba Alwan
PY  - 2025/03/10
Y2  - 2025/04/03
TI  - The Mitigating Overfitting in Sentiment Analysis Insights from CNN-LSTM Hybrid Models
JF  - MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer
JA  - matrik
VL  - 24
IS  - 2
SE  - Articles
DO  - https://doi.org/10.30812/matrik.v24i2.4742
UR  - https://journal.universitasbumigora.ac.id/index.php/matrik/article/view/4742
AB  - This study aims to improve sentiment analysis accuracy and address overfitting challenges in deep learning models by developing a hybrid model based on Convolutional Neural Networks and Long Short-Term Memory Networks. The research methodology involved multiple stages, starting with preprocessing a dataset of 5,456 rows. This process included removing duplicate data, empty entries, and neutral sentiments, resulting in 2,685 usable rows. To overcome data quantity limitations, data augmentation expanded the training dataset from 2,148 to 10,740 samples. Data transformation was carried out using tokenization, padding, and embedding techniques, leveraging Word2Vec and GloVe to produce numerical representations of textual data. The hybrid model demonstrated strong performance, achieving a training accuracy of 99.51%, validation accuracy of 99.25%, and testing accuracy of 87.34%, with a loss value of 0.56. Evaluation metrics showed precision, recall, and F1-Score values of 86%, 87%, and 86%, respectively. The hybrid model outperformed individual models, including Convolutional Neural Networks (70% accuracy) and Long Short-Term Memory Networks (81% accuracy). It also surpassed other hybrid models, such as the multiscale Convolutional Neural Network-Long Short-Term Memory Network, which achieved a maximum accuracy of 89.25%. The implications of this study demonstrate that the hybrid model based on Convolutional Neural Networks and Long Short-Term Memory Networks effectively improves sentiment analysis accuracy while reducing the risk of overfitting, particularly in small or imbalanced datasets. Future research is recommended to enhance data quality, adopt more advanced embedding techniques, and optimize model configurations to achieve better performance.
ER  - 