Analyzing the Application of Optical Character Recognition: A Case Study in International Standard Book Number Detection

  • Imam Fahrur Rozi Politeknik Negeri Malang, Malang, Indonesia
  • Ahmadi Yuli Ananta Politeknik Negeri Malang, Malang, Indonesia
  • Endah Septa Sintiya Politeknik Negeri Malang, Malang, Indonesia
  • Astrifidha Rahma Amalia Politeknik Negeri Malang, Malang, Indonesia
  • Yuri Ariyanto Politeknik Negeri Malang, Malang, Indonesia
  • Arin Kistia Nugraeni Politeknik Negeri Malang, Malang, Indonesia
Keywords: Book Number Detection, International Standard, Optical Character Recognition

Abstract

In higher education, assessing lecturer performance is crucial to maintaining educational quality, and one part of this assessment is evaluating the textbooks that lecturers author. This study addresses the problem of efficiently detecting International Standard Book Numbers (ISBNs) in these textbooks and examines optical character recognition (OCR) as a potential solution. The objective is to determine how effectively OCR, specifically the Tesseract engine, supports ISBN detection for lecturer performance assessment. The method combines automated data collection with ISBN detection by Tesseract OCR on several sections of each textbook, including the cover, table of contents, and identity page, across two file formats (JPG and PDF) and different orientations. OCR performance is evaluated with respect to image quality, rotation, and file type. The results show that Tesseract performs well on high-quality, low-noise JPG images, achieving an F1 score of 0.97 for JPG and 0.99 for PDF files. However, its performance decreases on rotated images and under certain PDF conditions, which highlights specific limitations of OCR for ISBN detection. These findings suggest that OCR can be a valuable tool for supporting lecturer performance assessment through efficient ISBN detection in textbooks.
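The ISBN detection step can be sketched as post-processing of the text that Tesseract returns (in Python, for example, through the pytesseract wrapper's `image_to_string`). This is an illustration under stated assumptions, not the authors' pipeline. The regular expression for 10- and 13-digit candidates, the hypothetical helper names `extract_isbns` and `micro_f1`, and the choice to pool true positives, false positives, and false negatives across documents before computing F1 are all assumptions. The checksum rules are the standard ones: ISBN-10 weights its ten characters 10 down to 1 and requires the sum to be divisible by 11 (with X standing for 10), and ISBN-13 uses alternating weights 1 and 3 and requires divisibility by 10.

```python
import re
from typing import Iterable, List, Sequence, Tuple

# Assumed candidate pattern (hypothetical, not from the paper): an optional
# 978/979 prefix followed by nine digits and a final digit or X, with optional
# single spaces or hyphens between characters. The lookarounds stop matches
# inside longer digit runs such as phone numbers.
_CANDIDATE = re.compile(
    r"(?<![0-9])((?:97[89][ -]?)?(?:[0-9][ -]?){9}[0-9Xx])(?![0-9])"
)
_DIGITS = set("0123456789")


def is_valid_isbn10(code: str) -> bool:
    """Weights 10..1 over the ten characters; the sum must be divisible by 11."""
    if len(code) != 10 or not set(code[:9]) <= _DIGITS:
        return False
    last = code[9].upper()
    if last not in _DIGITS and last != "X":
        return False
    values = [int(c) for c in code[:9]] + [10 if last == "X" else int(last)]
    return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0


def is_valid_isbn13(code: str) -> bool:
    """Alternating weights 1 and 3; the sum must be divisible by 10."""
    if len(code) != 13 or not set(code) <= _DIGITS:
        return False
    return sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(code)) % 10 == 0


def extract_isbns(ocr_text: str) -> List[str]:
    """Hypothetical helper: return checksum-valid ISBNs found in OCR output."""
    found: List[str] = []
    for match in _CANDIDATE.finditer(ocr_text):
        # Strip separators so "978-0-306-40615-7" becomes "9780306406157".
        code = re.sub(r"[ -]", "", match.group(1)).upper()
        valid = is_valid_isbn13(code) if len(code) == 13 else is_valid_isbn10(code)
        if valid and code not in found:
            found.append(code)
    return found


def micro_f1(results: Sequence[Tuple[Iterable[str], Iterable[str]]]) -> float:
    """Hypothetical helper: pool TP/FP/FN over (predicted, expected) pairs, then F1."""
    tp = fp = fn = 0
    for predicted, expected in results:
        pred, gold = set(predicted), set(expected)
        tp += len(pred & gold)
        fp += len(pred - gold)
        fn += len(gold - pred)
    if tp == 0:
        return 0.0  # no correct detections, so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    page = "Cover text\nISBN 978-0-306-40615-7\nFirst edition, 2023"
    print(extract_isbns(page))  # ['9780306406157']
    # One document found, one missed: precision 1.0, recall 0.5, F1 about 0.667.
    print(micro_f1([(extract_isbns(page), ["9780306406157"]), ([], ["0306406152"])]))
```

Checksum validation is what lets a sketch like this reject OCR misreads of a single digit, which a pattern match alone would accept. It cannot recover the correct number, though, so noisy or rotated inputs that Tesseract misreads would still lower recall.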

Published
2025-02-03
How to Cite
Rozi, I., Ananta, A., Sintiya, E., Amalia, A., Ariyanto, Y., & Nugraeni, A. (2025). Analyzing the Application of Optical Character Recognition: A Case Study in International Standard Book Number Detection. MATRIK : Jurnal Manajemen, Teknik Informatika Dan Rekayasa Komputer, 24(2), 195-206. https://doi.org/10.30812/matrik.v24i2.4367
Section
Articles