Essay auto-scoring using N-Gram and Jaro Winkler based Indonesian Typos
DOI:
https://doi.org/10.30812/matrik.v22i2.2473
Keywords:
Automation, Spelling error detection and correction, N-Gram, Jaro Winkler
Abstract
Writing errors in e-essay exams reduce scores, so errors in written answers need to be detected and corrected automatically. A combination of Levenshtein Distance and N-Gram can detect writing errors, but the process is slow because of the distance method used. This research therefore aims to combine the Jaro Winkler and N-Gram methods to detect and correct writing errors automatically. The process requires preprocessing and then finds the best word recommendations with the Jaro Winkler method, which refers to the Kamus Besar Bahasa Indonesia (KBBI), while the N-Gram method refers to a corpus. The final scoring uses the Vector Space Model (VSM) method, based on the similarity of words between the answer keys and the respondents' answers. The dataset consisted of 115 answers from 23 respondents, some of which contained writing errors. The combined Jaro Winkler and N-Gram methods performed well in detecting and correcting Indonesian words: detection accuracy averaged 83.64% (minimum 57.14%, maximum 100.00%), and error correction accuracy averaged 78.44% (minimum 40.00%, maximum 100.00%). However, further Natural Language Processing (NLP) work is needed to improve the word recommendations.
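The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny word set standing in for KBBI, the 0.85 acceptance threshold, the raw term-frequency weighting, and all function names are assumptions made for illustration. The N-Gram step that picks among candidates using corpus context is omitted here; the sketch only shows Jaro Winkler candidate selection followed by VSM-style cosine scoring.

```python
import math
from collections import Counter


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i, ch in enumerate(s1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if ch != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # number of transpositions
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the shared prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


def correct_tokens(tokens, dictionary, threshold=0.85):
    """Replace out-of-dictionary tokens with the closest dictionary word.

    The threshold is an assumed cut-off, not a value from the article.
    """
    corrected = []
    for tok in tokens:
        if tok in dictionary:
            corrected.append(tok)
            continue
        best = max(sorted(dictionary), key=lambda w: jaro_winkler(tok, w))
        corrected.append(best if jaro_winkler(tok, best) >= threshold else tok)
    return corrected


def cosine_similarity(a_tokens, b_tokens) -> float:
    """Cosine similarity of raw term-frequency vectors (a simple VSM score)."""
    va, vb = Counter(a_tokens), Counter(b_tokens)
    dot = sum(count * vb[term] for term, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical stand-in for the KBBI word list.
kbbi_sample = {"sistem", "operasi", "adalah", "perangkat", "lunak"}
answer = ["sistm", "oprasi", "adalah", "perangkat", "lunak"]  # contains two typos
key = ["sistem", "operasi", "adalah", "perangkat", "lunak"]

fixed = correct_tokens(answer, kbbi_sample)
score = cosine_similarity(fixed, key)
```

In this toy example the uncorrected answer shares only three of five terms with the answer key, so its cosine score is 0.6. After both typos are corrected the answer matches the key exactly, which shows why correcting typos before scoring matters.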