Thesis Topic Modeling Study: Latent Dirichlet Allocation (LDA) and Machine Learning Approach

Hairani Hairani; Mengas Janhasmadja; Abu Tholib; Juvinal Ximenes Guterres; Yuri Ariyanto

doi:10.30812/ijecsa.v3i2.4375


Authors

Hairani Hairani Universitas Bumigora, Mataram, Indonesia
Mengas Janhasmadja Universitas Bumigora, Mataram, Indonesia
Abu Tholib Universitas Nurul Jadid, Probolinggo, Indonesia
Juvinal Ximenes Guterres Universidade Oriental Timur Lorosa’e, Dili, Timor Leste
Yuri Ariyanto Politeknik Negeri Malang, Malang, Indonesia

DOI:

https://doi.org/10.30812/ijecsa.v3i2.4375

Keywords:

Latent Dirichlet Allocation, Machine Learning Approach, Thesis Topic, Topic Modelling

Abstract

The thesis reports housed in the campus repository have yet to be analyzed to reveal valuable knowledge patterns. Analyzing trends in thesis research topics can facilitate the selection of research topics, aid in mapping research areas, and identify underexplored topics. Therefore, this research aims to model and classify thesis topics: Latent Dirichlet Allocation (LDA) is used for topic modeling, while Naïve Bayes and Support Vector Machine (SVM) are used to classify the topics. The results show that LDA modeled the five most popular thesis topics: two related to computer networks, two to software engineering, and one to multimedia. For thesis topic classification, SVM achieved higher accuracy than Naïve Bayes, reaching 92.80% after the data were balanced using the Synthetic Minority Oversampling Technique (SMOTE). These findings imply that LDA-based topic modeling can identify dominant thesis topics and that SVM is the more accurate of the two classifiers for this task.


Published

2024-09-03

How to Cite

H. Hairani, M. Janhasmadja, A. Tholib, J. Ximenes Guterres, and Y. Ariyanto, “Thesis Topic Modeling Study: Latent Dirichlet Allocation (LDA) and Machine Learning Approach”, IJECSA, vol. 3, no. 2, pp. 51–60, Sep. 2024.

Issue

Vol. 3 No. 2 (2024): September 2024

Section

Articles
