Analyzing the Application of Optical Character Recognition: A Case Study in International Standard Book Number Detection

Imam Fahrur Rozi; Ahmadi Yuli Ananta; Endah Septa Sintiya; Astrifidha Rahma Amalia; Yuri Ariyanto; Arin Kistia Nugraeni

doi:10.30812/matrik.v24i2.4367

Authors

Imam Fahrur Rozi, Politeknik Negeri Malang, Malang, Indonesia
Ahmadi Yuli Ananta, Politeknik Negeri Malang, Malang, Indonesia
Endah Septa Sintiya, Politeknik Negeri Malang, Malang, Indonesia
Astrifidha Rahma Amalia, Politeknik Negeri Malang, Malang, Indonesia
Yuri Ariyanto, Politeknik Negeri Malang
Arin Kistia Nugraeni, Politeknik Negeri Malang, Malang, Indonesia

Keywords:

Book Number Detection, International Standard, Optical Character Recognition

Abstract

In the era of advanced education, assessing lecturer performance is crucial to maintaining educational quality. One aspect of this assessment is the evaluation of textbooks authored by lecturers. This study addresses the problem of efficiently detecting International Standard Book Numbers (ISBNs) in these textbooks, using optical character recognition (OCR) as a potential solution. The objective is to determine how effectively OCR, specifically the Tesseract platform, supports ISBN detection for lecturer performance assessments. The research method involves automated data collection and ISBN detection using Tesseract OCR on various sections of textbooks, including covers, tables of contents, and identity pages, across different file formats (JPG and PDF) and orientations. The study evaluates OCR performance with respect to image quality, rotation, and file type. The results indicate that Tesseract performs effectively on high-quality, low-noise JPG images and achieves an F1 score of 0.97 for JPG files and 0.99 for PDF files. However, its performance decreases on rotated images and under certain PDF conditions, which highlights specific limitations of OCR in ISBN detection. These findings suggest that OCR can be a valuable tool for enhancing lecturer performance assessments through efficient ISBN detection in textbooks.
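The abstract does not describe how an ISBN is located once a page has been converted to text. As a minimal sketch only, assuming the OCR output (for example, from Tesseract) is already available as a plain string, the detection step could be approximated by a pattern match followed by a check-digit test. The pattern and the function names `find_isbns`, `is_valid_isbn13`, `is_valid_isbn10`, and `f1_score` are illustrative assumptions, not the authors' implementation; only the ISBN check-digit rules and the F1 formula are standard.

```python
import re
from typing import List

# Hypothetical pattern (an assumption, not taken from the study): the label
# "ISBN", an optional "-10"/"-13" suffix and colon, then digits, hyphens,
# or spaces.
ISBN_PATTERN = re.compile(
    r"ISBN(?:-1[03])?\s*:?\s*([0-9Xx][0-9Xx\- ]{8,20}[0-9Xx])"
)


def normalize(candidate: str) -> str:
    """Remove hyphens and whitespace and upper-case a trailing 'x'."""
    return re.sub(r"[\s\-]", "", candidate).upper()


def is_valid_isbn13(digits: str) -> bool:
    """ISBN-13 rule: weights alternate 1, 3; the sum must be divisible by 10."""
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def is_valid_isbn10(chars: str) -> bool:
    """ISBN-10 rule: weights run 10 down to 1; 'X' counts as 10; sum mod 11 is 0."""
    if not re.fullmatch(r"\d{9}[\dX]", chars):
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(chars))
    return total % 11 == 0


def find_isbns(ocr_text: str) -> List[str]:
    """Return normalized ISBNs in the text that pass a check-digit test."""
    found = []
    for match in ISBN_PATTERN.finditer(ocr_text):
        value = normalize(match.group(1))
        if is_valid_isbn13(value) or is_valid_isbn10(value):
            found.append(value)
    return found


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there is nothing to score."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

The check-digit test is a design choice for this sketch: it rejects many candidates in which OCR misread a single digit, at the cost of discarding ISBNs that were printed correctly but recognized with one error.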
