Comparison of k-Nearest Neighbor and Naive Bayes Methods for SNP Data Classification
DOI:
https://doi.org/10.30812/matrik.v22i1.1758
Keywords:
Classification, k-Nearest Neighbor, Naive Bayes, Single Nucleotide Polymorphism
Abstract
When an accident victim's identity is hard to establish, biological data such as Single Nucleotide Polymorphism (SNP) data can be used to identify the person's ancestral origin. This research compares the accuracy and F1 score of the k-Nearest Neighbor method and the Naive Bayes method in classifying SNP data from 120 people divided into two groups, European (CEU) and Yoruba (YRI). The best method is determined by the average accuracy and the average F1 score over 1000 iterations with various percentage splits between training and testing datasets. The SNP locations used for classification were selected by correlation analysis. The k-Nearest Neighbor method with k=31 achieved an average accuracy of 98.38% and an average F1 score of 98.39%, while the Naive Bayes method achieved an average accuracy of 96.74% and an average F1 score of 96.63%. In this case, the k-Nearest Neighbor method is better than the Naive Bayes method at classifying SNP data to determine whether a person's ancestry tends to be CEU or YRI.
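The evaluation protocol described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic genotypes coded as 0/1/2 allele counts rather than the CEU/YRI samples, every function name is hypothetical, SNP selection is assumed to rank columns by absolute Pearson correlation with the group label on the training part only, k-Nearest Neighbor is assumed to use Euclidean distance, Naive Bayes is assumed to be categorical with Laplace smoothing, and F1 is computed for the class labelled 1. Only k=31, the use of correlation analysis, and the repeated train/test splits come from the abstract.

```python
import math
import random
from collections import Counter

def make_data(n_per_class=60, n_snps=40, n_informative=8, seed=1):
    """Synthetic genotypes (0/1/2 allele counts); NOT the CEU/YRI data."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = []
            for j in range(n_snps):
                # Only the first n_informative SNPs differ between groups.
                p = (0.8 if label else 0.2) if j < n_informative else 0.5
                row.append(int(rng.random() < p) + int(rng.random() < p))
            X.append(row)
            y.append(label)
    return X, y

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def select_snps(X, y, top):
    """Keep the `top` SNP columns with the largest |correlation| to the label."""
    scores = [(abs(pearson([r[j] for r in X], y)), j) for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, reverse=True)[:top]]

def knn_predict(Xtr, ytr, x, k):
    """Majority vote among the k training rows closest to x (Euclidean)."""
    def dist(pair):
        return sum((a - b) ** 2 for a, b in zip(pair[0], x))
    nearest = sorted(zip(Xtr, ytr), key=dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def nb_fit(Xtr, ytr, n_values=3):
    """Categorical Naive Bayes: log prior and smoothed log P(genotype | class)."""
    model = {}
    for c in set(ytr):
        rows = [x for x, label in zip(Xtr, ytr) if label == c]
        cond = [[math.log((sum(r[j] == v for r in rows) + 1) / (len(rows) + n_values))
                 for v in range(n_values)] for j in range(len(Xtr[0]))]
        model[c] = (math.log(len(rows) / len(ytr)), cond)
    return model

def nb_predict(model, x):
    def log_post(c):
        prior, cond = model[c]
        return prior + sum(cond[j][v] for j, v in enumerate(x))
    return max(model, key=log_post)

def accuracy_f1(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return acc, f1

def evaluate(X, y, train_frac, iterations, k, top, seed=0):
    """Average accuracy and F1 of both methods over repeated random splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    totals = {"knn": [0.0, 0.0], "nb": [0.0, 0.0]}
    for _ in range(iterations):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        tr, te = idx[:cut], idx[cut:]
        ytr, yte = [y[i] for i in tr], [y[i] for i in te]
        cols = select_snps([X[i] for i in tr], ytr, top)
        Xtr = [[X[i][j] for j in cols] for i in tr]
        Xte = [[X[i][j] for j in cols] for i in te]
        model = nb_fit(Xtr, ytr)
        preds = {"knn": [knn_predict(Xtr, ytr, x, k) for x in Xte],
                 "nb": [nb_predict(model, x) for x in Xte]}
        for name, pred in preds.items():
            acc, f1 = accuracy_f1(yte, pred)
            totals[name][0] += acc
            totals[name][1] += f1
    return {name: (a / iterations, f / iterations) for name, (a, f) in totals.items()}

X, y = make_data()
# 50 iterations keep the sketch fast; the abstract reports 1000 per split.
for name, (acc, f1) in evaluate(X, y, train_frac=0.7, iterations=50, k=31, top=8).items():
    print(f"{name}: mean accuracy={acc:.4f}, mean F1={f1:.4f}")
```

Calling `evaluate` with several `train_frac` values would mimic the "various percentage distributions" mentioned above. The numbers printed for synthetic data say nothing about the results reported in the abstract.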