Educational Data Mining: Multiple Choice Question Classification in Vocational School

Keywords: Educational data mining, Difficulty index, Classification, Multiple choice questions, Vocational school

Abstract

Data mining on student learning outcomes in the education sector can overcome this problem. This research aimed to provide a solution for selecting quality multiple choice questions (MCQs) from the results of students’ mid-semester exams in vocational high schools using a data mining approach. The research method followed the Cross-Industry Standard Process for Machine Learning (CRISP-ML) model, whose steps were used to assess how accurately the difficulty level of questions can be analyzed from student profile data and midterm test results. The data were the results of basic computer tests on mid-term exams in mathematics at vocational high schools. Several classification algorithms were used, including SVM, Naive Bayes, Random Forest, Decision Tree, Linear Regression, and KNN. The results of evaluating the classification
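As a rough illustration of the difficulty analysis mentioned in the abstract, the following minimal sketch computes a classical difficulty index per question (the proportion of students answering correctly) and maps it to a label. The function names, the sample scores, and the 0.30 and 0.70 cut-offs are assumptions chosen for illustration; the abstract does not state the thresholds or features the authors used, and such labels would only be the target for the classifiers it lists.

```python
from typing import List


def difficulty_index(responses: List[List[int]]) -> List[float]:
    """Proportion of students who answered each question correctly.

    responses[s][q] is 1 if student s answered question q correctly, else 0.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[q] for row in responses) / n_students for q in range(n_items)]


def difficulty_label(p: float, hard_below: float = 0.30, easy_above: float = 0.70) -> str:
    """Map a difficulty index to a label (cut-offs are assumed, not from the paper)."""
    if p < hard_below:
        return "difficult"
    if p > easy_above:
        return "easy"
    return "moderate"


# Hypothetical scored answers: 5 students x 4 questions.
scores = [
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]

indices = difficulty_index(scores)
labels = [difficulty_label(p) for p in indices]
print(list(zip(indices, labels)))
```

A higher index means more students answered correctly, so the question is easier. In a classification setting, labels like these could be paired with student profile features and passed to any of the algorithms named in the abstract.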

Published
2024-03-16
How to Cite
Sucipto, S., Dwi Prasetya, D., & Widiyaningtyas, T. (2024). Educational Data Mining: Multiple Choice Question Classification in Vocational School. MATRIK: Jurnal Manajemen, Teknik Informatika Dan Rekayasa Komputer, 23(2), 379-388. https://doi.org/10.30812/matrik.v23i2.3499