Comparative Analysis of Indonesian Pre-trained BERT Models for the Extractive Question Answering Task on an Indonesian-Translated SQuAD Dataset

Authors

Fattah Al Ilmi Suhendra, A. Darmayantie, A. S. Suhendra, and Pa Pa Min

DOI:

https://doi.org/10.30812/matrik.v25i2.5847

Keywords:

Fine-tuning, IndoBERT, Natural Language Processing, Pre-training, Question Answering

Abstract

Transformer-based architectures have significantly advanced Natural Language Processing (NLP), with Bidirectional Encoder Representations from Transformers (BERT) serving as a strong baseline for extractive Question Answering (QA). This study evaluates the performance of Indonesian BERT models on extractive QA tasks and identifies the most effective model for low-resource language settings. The research employed a comparative experimental method using two Indonesian BERT variants: indobert-base-uncased (IndoLEM) and indobert-base-p1 (IndoNLU/IndoBenchmark). Both models were fine-tuned on an Indonesian version of SQuAD 2.0, automatically translated via the Google Translate API. Answer-span alignment errors introduced by translation were corrected using fuzzy string matching. Evaluation was conducted under identical hyperparameter settings and training schemes, using Exact Match (EM) and F1-score as performance metrics. The results indicate that IndoLEM achieved superior performance, with better loss convergence and a higher F1-score (71.58) than IndoNLU (63.59); the difference was statistically significant (p < 0.001). In conclusion, IndoLEM is a more effective baseline model for Indonesian extractive QA systems. The findings also demonstrate that the composition and scale of pre-training corpora substantially influence model performance in low-resource language contexts and highlight the importance of transfer learning for advancing NLP in underrepresented languages.
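A minimal sketch may make the span-realignment step concrete. The snippet below is an illustration under stated assumptions, not the authors' implementation: after machine translation, the recorded answer_start offsets no longer match the translated context, so every answer-length window in the context is fuzzily compared against the translated answer text and the best-scoring position is kept. The function name realign_answer and the 0.8 acceptance threshold are hypothetical choices.

# Sketch of translation-induced answer-span realignment via fuzzy matching.
# Assumptions: stdlib difflib as the fuzzy matcher; the function name and
# the 0.8 threshold are illustrative, not taken from the paper.
from difflib import SequenceMatcher

def realign_answer(context: str, answer: str, min_ratio: float = 0.8):
    """Return (start, matched_text) for the best fuzzy occurrence of
    `answer` inside `context`, or None if no window clears `min_ratio`."""
    n = len(answer)
    best_ratio, best_start = 0.0, None
    # Slide an answer-length window across the context and score each one.
    for start in range(max(1, len(context) - n + 1)):
        window = context[start:start + n]
        ratio = SequenceMatcher(None, answer.lower(), window.lower()).ratio()
        if ratio > best_ratio:
            best_ratio, best_start = ratio, start
    if best_start is None or best_ratio < min_ratio:
        return None
    return best_start, context[best_start:best_start + n]

# The translated answer may differ in casing or wording from the context.
context = "Ibu kota Indonesia adalah Jakarta, kota terbesar di negara itu."
print(realign_answer(context, "jakarta"))  # -> (26, 'Jakarta')

The two reported metrics can be sketched the same way; the functions below follow the standard SQuAD definitions of Exact Match and token-level F1, but omit the punctuation and article stripping performed by the official evaluation script.

# Simplified SQuAD-style metrics: Exact Match and token-overlap F1.
from collections import Counter

def normalize(text: str) -> list:
    # Official SQuAD evaluation also strips punctuation and articles.
    return text.lower().split()

def exact_match(prediction: str, gold: str) -> int:
    return int(normalize(prediction) == normalize(gold))

def f1_score(prediction: str, gold: str) -> float:
    pred, ref = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("di Jakarta", "Jakarta"))         # 0
print(round(f1_score("di Jakarta", "Jakarta"), 2))  # 0.67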


Author Biography

  • Fattah Al Ilmi Suhendra, Universitas Gunadarma, Depok, Indonesia

    Fattah Al Ilmi Suhendra is a student in the Management of Information Systems Department at Universitas Gunadarma. His research interests include natural language processing (NLP), pre-trained language models, and Indonesian language representation learning, particularly IndoBERT. His current research focuses on the comparative evaluation of Indonesian pre-trained BERT models for question-answering tasks.


Published

2026-03-11

Issue

Vol. 25 No. 2 (2026)

Section

Articles

How to Cite

[1] F. A. I. Suhendra, A. Darmayantie, A. S. Suhendra, and Pa Pa Min, “Comparative Analysis of Indonesian Pre-trained BERT Models for the Extractive Question Answering Task on an Indonesian-Translated SQuAD Dataset”, MATRIK, vol. 25, no. 2, pp. 311–322, Mar. 2026, doi: 10.30812/matrik.v25i2.5847.