Comparison of k-Nearest Neighbor and Naive Bayes Methods for SNP Data Classification
DOI: https://doi.org/10.30812/matrik.v22i1.1758
Keywords: Classification, k-Nearest Neighbor, Naive Bayes, Single Nucleotide Polymorphism
Abstract
In an accident, the identity of a victim is sometimes hard to establish, so biological data such as Single Nucleotide Polymorphism (SNP) data can be used to identify the person's origin. This research compares the accuracy and F1 score of the k-Nearest Neighbor method and the Naive Bayes method in classifying SNP data from 120 people divided into two groups, European (CEU) and Yoruba (YRI). The better method was determined from the average accuracy and the average F1 score over 1000 iterations with various training/testing split percentages. The SNP locations used in the classification process were selected by correlation analysis. The k-Nearest Neighbor method with k = 31 obtained an average accuracy of 98.38% and an average F1 score of 98.39%, while the Naive Bayes method obtained an average accuracy of 96.74% and an average F1 score of 96.63%. In this case, the k-Nearest Neighbor method performs better than the Naive Bayes method in classifying SNP data to determine whether a person's ancestry tends to be from CEU or YRI.
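The abstract names the steps of the experiment (correlation-based SNP selection, k-Nearest Neighbor with k = 31, Naive Bayes, and accuracy and F1 averaged over repeated random splits) but not their exact settings. The Python sketch below strings these steps together under assumptions that do not come from the paper: genotypes are encoded as 0/1/2 alternate-allele counts, CEU is label 0 and YRI is label 1, the top m SNPs by absolute Pearson correlation with the label are kept (m = 10 is arbitrary), Naive Bayes is the categorical variant with Laplace smoothing, F1 is computed for the YRI class, and the data are synthetic. All helper names (`select_snps`, `evaluate`, `toy_data`, and so on) are hypothetical.

```python
import math
import random
from collections import Counter

# Assumed encoding (not from the paper): genotype = number of alternate alleles
# (0, 1, 2); label 0 = CEU, label 1 = YRI.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0  # 0 for constant columns

def select_snps(X, y, m):
    """Indices of the m SNP columns with the largest |Pearson r| against the label."""
    r = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(r)), key=lambda j: r[j], reverse=True)[:m]

def knn_predict(X_train, y_train, x, k=31):
    """Majority vote among the k training rows nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

def nb_fit(X, y, values=(0, 1, 2), alpha=1.0):
    """Categorical Naive Bayes with Laplace smoothing, stored as log-probabilities."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        denom = len(rows) + alpha * len(values)
        per_snp = []
        for j in range(len(X[0])):
            counts = Counter(row[j] for row in rows)
            per_snp.append({v: math.log((counts[v] + alpha) / denom) for v in values})
        model[c] = (math.log(len(rows) / len(X)), per_snp)
    return model

def nb_predict(model, x):
    """Class with the highest log prior + sum of per-SNP log likelihoods."""
    def score(c):
        log_prior, per_snp = model[c]
        return log_prior + sum(per_snp[j][v] for j, v in enumerate(x))
    return max(model, key=score)

def accuracy_f1(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    return acc, (2 * tp / (2 * tp + fp + fn) if tp else 0.0)  # F1 = 2TP / (2TP + FP + FN)

def evaluate(X, y, train_frac, iterations, k=31, m=10, seed=0):
    """Mean (accuracy, F1) of kNN and NB over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    sums = {"knn": [0.0, 0.0], "nb": [0.0, 0.0]}
    for _ in range(iterations):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))  # train_frac must leave a non-empty test set
        tr, te = idx[:cut], idx[cut:]
        ytr, yte = [y[i] for i in tr], [y[i] for i in te]
        cols = select_snps([X[i] for i in tr], ytr, m)  # selection on training rows only
        Xtr = [[X[i][j] for j in cols] for i in tr]
        Xte = [[X[i][j] for j in cols] for i in te]
        model = nb_fit(Xtr, ytr)
        preds = {"knn": [knn_predict(Xtr, ytr, x, k) for x in Xte],
                 "nb": [nb_predict(model, x) for x in Xte]}
        for name, p in preds.items():
            acc, f1 = accuracy_f1(yte, p)
            sums[name][0] += acc
            sums[name][1] += f1
    return {name: (a / iterations, f / iterations) for name, (a, f) in sums.items()}

def toy_data(n_per_class=60, n_snps=50, informative=5, seed=1):
    """Synthetic genotypes: the first `informative` SNPs differ in allele frequency by class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            freqs = [(0.2 if label == 0 else 0.8) if j < informative else 0.5
                     for j in range(n_snps)]
            X.append([(rng.random() < p) + (rng.random() < p) for p in freqs])
            y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = toy_data()
    for frac in (0.5, 0.7, 0.9):
        print(frac, evaluate(X, y, frac, iterations=50))
```

The sketch selects SNPs inside each split from the training rows only, which keeps test labels out of the feature selection; the abstract does not say whether the original study did the same. An odd k such as 31 avoids tied votes in this two-class setting as long as there are at least k training rows. The printed averages come from synthetic data and say nothing about the paper's reported results.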
Similar Articles
- Yusri Ikhwani, As`ary Ramadhan, Muhammad Bahit, Taufik Hidayat Faesal, Single elimination tournament design using dynamic programming algorithm, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 23 No. 1 (2023)
- Fadhilah Dwi Ananda, Yoga Pristyanto, Analisis Sentimen Pengguna Twitter Terhadap Layanan Internet Provider Menggunakan Algoritma Support Vector Machine, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 20 No. 2 (2021)
- Wahyu Styo Pratama, Didik Dwi Prasetya, Triyanna Widyaningtyas, Muhammad Zaki Wiryawan, Lalu Ganda Rady Putra, Tsukasa Hirashima, Performance Evaluation of Artificial Intelligence Models for Classification in Concept Map Quality Assessment, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 24 No. 3 (2025)
- Purnawarman Musa, Eri Prasetyo Wibowo, Saiful Bahri Musa, Iqbal Baihaqi, Pelican Crossing System for Control a Green Man Light with Predicted Age, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 21 No. 2 (2022)
- Lalu Ganda Rady Putra, Anthony Anggrawan, Pengelompokan Penerima Bantuan Sosial Masyarakat dengan Metode K-Means, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 21 No. 1 (2021)
- Hadi Santoso, Hilyah Magdalena, Helna Wardhana, Aplikasi Dynamic Cluster pada K-Means Berbasis Web untuk Klasifikasi Data Industri Rumahan, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 21 No. 3 (2022)
- Muhammad Ibnu Choldun Rachmatullah, The Application of Repeated SMOTE for Multi Class Classification on Imbalanced Data, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 22 No. 1 (2022)
- Mochamad Wahyudi, Firmansyah Firmansyah, Analisis Performa Open Shortest Path First Load Balancing dengan Metode Cost Manipulation, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 21 No. 3 (2022)
- Zilvanhisna Emka Fitri, Lalitya Nindita Sahenda, Sulton Mubarok, Abdul Madjid, Arizal Mujibtamala Nanda Imron, Implementing K-Nearest Neighbor to Classify Wild Plant Leaf as a Medicinal Plants, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 23 No. 1 (2023)
- Muhamad Nur Gunawan, Titi Farhanah, Siti Ummi Masruroh, Ahmad Mukhlis Jundulloh, Nafdik Zaydan Raushanfikar, Rona Nisa Sofia Amriza, Accuracy of K-Nearest Neighbors Algorithm Classification For Archiving Research Publications, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 23 No. 3 (2024)
Most read articles by the same author(s)
- Vikky Aprelia Windarni, Adi Setiawan, Atina Rahmatalia, Comparison of the Karney Polygon Method and the Shoelace Method for Calculating Area, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 23 No. 1 (2023)