Comparative Analysis of Indonesian Pre-trained BERT Models for the Extractive Question Answering Task on an Indonesian-Translated SQuAD Dataset
DOI:
https://doi.org/10.30812/matrik.v25i2.5847

Keywords:
Fine-tuning, IndoBERT, Natural Language Processing, Pre-training, Question-Answering

Abstract
Transformer-based architectures have significantly advanced Natural Language Processing (NLP), with Bidirectional Encoder Representations from Transformers (BERT) serving as a strong baseline for extractive Question Answering (QA). This study evaluates the performance of Indonesian BERT models on extractive QA tasks and identifies the most effective model for low-resource language settings. The research employed a comparative experimental method using two Indonesian BERT variants: indobert-base-uncased (IndoLEM) and indobert-base-p1 (IndoNLU/IndoBenchmark). Both models were fine-tuned on an Indonesian version of SQuAD 2.0, automatically translated via the Google Translate API. Answer-span alignment errors introduced by translation were corrected using fuzzy string matching. Evaluation was conducted under identical hyperparameter settings and training schemes, using Exact Match (EM) and F1-score as performance metrics. The results indicate that IndoLEM achieved superior performance, with better loss convergence and a higher F1-score (71.58) than IndoNLU (63.59); the difference was statistically significant (p < 0.001). In conclusion, IndoLEM is the more effective baseline for Indonesian extractive QA systems. The findings also demonstrate that the composition and scale of the pre-training corpus substantially influence model performance in low-resource language contexts and highlight the importance of transfer learning for advancing NLP in underrepresented languages.
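To make the span-repair step concrete, the sketch below shows one plausible way to recover the character offset of a machine-translated answer inside its translated context using fuzzy string matching. It is a minimal illustration in plain Python (standard library only), not the authors' code; the function name, the 0.8 similarity threshold, and the example sentence are assumptions for demonstration.

# Illustrative sketch (not the paper's code): realigning a machine-
# translated answer span with its translated context via fuzzy matching.
from difflib import SequenceMatcher

def realign_answer(context: str, answer: str, min_ratio: float = 0.8):
    """Return (start_char, matched_text) for the best fuzzy match of
    `answer` inside `context`, or None if no window scores >= min_ratio."""
    n, ans = len(answer), answer.lower()
    best_ratio, best_start = 0.0, -1
    # Slide an answer-sized window across the context and keep the
    # window with the highest character-level similarity ratio.
    for start in range(max(1, len(context) - n + 1)):
        ratio = SequenceMatcher(None, ans, context[start:start + n].lower()).ratio()
        if ratio > best_ratio:
            best_ratio, best_start = ratio, start
    if best_ratio >= min_ratio:
        return best_start, context[best_start:best_start + n]
    return None  # span not recoverable; the example can be discarded

# Usage: SQuAD-style data stores answers as character offsets, so the
# recovered index can replace the stale answer_start field.
ctx = "Ibu kota Indonesia adalah Jakarta, sebuah kota metropolitan."
print(realign_answer(ctx, "jakarta"))  # -> (26, 'Jakarta')

Returning None rather than forcing a low-quality match lets badly translated examples be dropped instead of training the model on misplaced gold spans.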
References
[1] M. Zaib, W. E. Zhang, Q. Z. Sheng, A. Mahmood, and Y. Zhang, “Conversational question answering: A survey,” Knowledge
and Information Systems, vol. 64, no. 12, pp. 3151–3195, Dec. 2022, https://doi.org/10.1007/s10115-022-01744-y.
[2] Y. Chen, “Intelligent question answering system for internet of things data analysis and educational technology,” in Proceedings
of the 2021 International Conference on Smart Technologies and Systems for Internet of Things (STS-IOT 2021). Atlantis
Press, 2022, pp. 274–279, https://doi.org/10.2991/ahis.k.220601.052.
[3] S. Cai, Q. Ma, Y. Hou, and G. Zeng, “Knowledge graph multi-hop question answering based on dependent syntactic semantic
augmented graph networks,” Electronics, vol. 13, no. 8, p. 1436, Apr. 2024, https://doi.org/10.3390/electronics13081436.
[4] X. Luo, Z. Deng, B. Yang, and M. Y. Luo, “Pre-trained language models in medicine: A survey,” Artificial Intelligence in
Medicine, vol. 154, p. 102904, Aug. 2024, https://doi.org/10.1016/j.artmed.2024.102904.
[5] H. Wang, J. Li, H. Wu, E. Hovy, and Y. Sun, “Pre-trained language models and their applications,” Engineering, vol. 25, pp.
51–65, Jun. 2023, https://doi.org/10.1016/j.eng.2022.04.024.
[6] X. Qiu, T. Sun, Y. Xu, Y. Shao, N. Dai, and X. Huang, “Pre-trained models for natural language processing: A survey,” Science
China Technological Sciences, vol. 63, pp. 1872–1897, Oct. 2020, https://doi.org/10.1007/s11431-020-1647-3.
[7] H. W. Chung, L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, Y. Li, X. Wang, M. Dehghani, and S. Brahma, “Scaling
instruction-finetuned language models,” Journal of Machine Learning Research, vol. 25, pp. 1–53, 2024.
[8] H. Wang, J. Li, H. Wu, E. Hovy, and Y. Sun, “Pre-trained language models and their applications,” Engineering, vol. 25, pp.
51–65, Jun. 2023, https://doi.org/10.1016/j.eng.2022.04.024.
[9] W. Zheng, S. Lu, Z. Cai, R. Wang, L. Wang, and L. Yin, “PAL-BERT: An improved question answering model,” Computer
Modeling in Engineering & Sciences, vol. 139, no. 3, pp. 2729–2745, 2024, https://doi.org/10.32604/cmes.2023.046692.
[10] F. Koto, A. Rahimi, J. H. Lau, and T. Baldwin, “IndoLEM and IndoBERT: A benchmark dataset and pre-trained language model
for Indonesian NLP,” in Proceedings of the 28th International Conference on Computational Linguistics. International Committee
on Computational Linguistics, 2020, pp. 757–770, https://doi.org/10.18653/v1/2020.coling-main.66.
[11] B. Wilie, K. Vincentio, G. I. Winata, S. Cahyawijaya, X. Li, Z. Y. Lim, S. Soleman, R. Mahendra, P. Fung, S. Bahar, and
A. Purwarianti, “IndoNLU: Benchmark and resources for evaluating Indonesian natural language understanding,” in Proceedings
of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International
Joint Conference on Natural Language Processing, Dec. 2020, pp. 843–857, https://doi.org/10.48550/arXiv.2009.05387.
[12] F. J. Muis and A. Purwarianti, “Sequence-to-sequence learning for Indonesian automatic question generator,” in 2020 7th
International Conference on Advance Informatics: Concepts, Theory and Applications (ICAICTA). IEEE, Sep. 2020, pp. 1–6,
https://doi.org/10.1109/ICAICTA49861.2020.9429032.
[13] S. Moon, H. He, H. Jia, H. Liu, and J. W. Fan, “Extractive clinical question-answering with multianswer and multifocus
questions: Data set development and evaluation study,” JMIR AI, vol. 2, p. e41818, Jun. 2023, https://doi.org/10.2196/41818.
[14] B. Richardson and A. Wicaksana, “Comparison of IndoBERT-Lite and RoBERTa in text mining for Indonesian language question
answering application,” International Journal of Innovative Computing, Information and Control, vol. 18, no. 6, pp. 1719–1734,
Jul. 2022, https://doi.org/10.24507/ijicic.18.06.1719.
[15] W. Suwarningsih, R. A. Pramata, F. Y. Rahadika, and M. H. A. Purnomo, “RoBERTa: Language modelling in building Indonesian
question-answering systems,” TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 20, no. 6, p.
1248, Dec. 2022, https://doi.org/10.12928/telkomnika.v20i6.24248.
[16] G. N. Ahmad and A. Romadhony, “End-to-end question answering system for Indonesian documents using TF-IDF and IndoBERT,”
in 2023 10th International Conference on Advanced Informatics: Concept, Theory and Application (ICAICTA). IEEE, Oct.
2023, pp. 1–6, https://doi.org/10.1109/ICAICTA59291.2023.10390111.
[17] J. H. Clark, E. Choi, M. Collins, D. Garrette, T. Kwiatkowski, V. Nikolaev, and J. Palomaki, “TyDi QA: A benchmark for
information-seeking question answering in typologically diverse languages,” Transactions of the Association for Computational
Linguistics, vol. 8, pp. 454–470, Dec. 2020, https://doi.org/10.1162/tacl_a_00317.
License
Copyright (c) 2026 Fattah Al Ilmi Suhendra, Astie Darmayantie, Adang Suhendra, Pa Pa Min

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.