Comparison of k-Nearest Neighbor and Naive Bayes Methods for SNP Data Classification
DOI:
https://doi.org/10.30812/matrik.v22i1.1758
Keywords:
Classification, k-Nearest Neighbor, Naive Bayes, Single Nucleotide Polymorphism
Abstract
After an accident, the identity of a victim can be hard to establish, so biological data such as Single Nucleotide Polymorphism (SNP) data are needed to identify the person's origin. This research compares the accuracy and F1 score of the k-Nearest Neighbor method and the Naive Bayes method in classifying SNP data from 120 people divided into two groups, namely European (CEU) and Yoruba (YRI). The best method was determined from the average accuracy and the average F1 score over 1000 iterations with various percentage splits between training and testing datasets. The SNP locations used for classification were selected by correlation analysis. The k-Nearest Neighbor method with k=31 obtained an average accuracy of 98.38% and an average F1 score of 98.39%, while the Naive Bayes method obtained an average accuracy of 96.74% and an average F1 score of 96.63%. In this case, the k-Nearest Neighbor method is better than the Naive Bayes method at classifying SNP data to determine whether a person's ancestry tends to be CEU or YRI.
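The evaluation procedure summarized in the abstract (correlation-based SNP selection, then repeated random train/test splits scored by average accuracy and F1) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic genotypes coded 0/1/2 rather than the CEU/YRI samples, the Naive Bayes variant is a categorical one with Laplace smoothing, the distance is Euclidean, SNP selection is done on the training split only, and every function name and default value other than k=31 is hypothetical.

```python
import math
import random
from collections import Counter


def make_synthetic(n_per_class=60, n_snps=30, n_informative=5, seed=0):
    """Hypothetical genotypes coded 0/1/2; NOT the study's CEU/YRI data."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = []
            for j in range(n_snps):
                # Only the first n_informative SNPs differ between the classes.
                p = (0.15 if label == 0 else 0.85) if j < n_informative else 0.5
                row.append(sum(rng.random() < p for _ in range(2)))
            X.append(row)
            y.append(label)
    return X, y


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0


def select_snps(X, y, top):
    """Indices of the `top` SNPs with the largest |correlation| to the label."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:top]


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training rows (squared Euclidean)."""
    dist = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_train]
    nearest = sorted(range(len(X_train)), key=dist.__getitem__)[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


def nb_fit(X_train, y_train):
    """Per-class genotype counts for a categorical Naive Bayes model."""
    n_c = Counter(y_train)
    counts = {c: [Counter() for _ in X_train[0]] for c in n_c}
    for row, c in zip(X_train, y_train):
        for j, v in enumerate(row):
            counts[c][j][v] += 1
    return n_c, counts


def nb_predict(model, x):
    n_c, counts = model
    total = sum(n_c.values())

    def log_posterior(c):
        # Laplace smoothing over the three possible genotype values.
        likelihood = sum(math.log((counts[c][j][v] + 1) / (n_c[c] + 3))
                         for j, v in enumerate(x))
        return math.log(n_c[c] / total) + likelihood

    return max(n_c, key=log_posterior)


def accuracy_f1(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return acc, f1


def compare(X, y, k=31, train_frac=0.7, iterations=100, top=5, seed=1):
    """Average (accuracy, F1) of both classifiers over random splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    totals = {"knn": [0.0, 0.0], "nb": [0.0, 0.0]}
    for _ in range(iterations):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        y_train, y_test = [y[i] for i in train], [y[i] for i in test]
        # Assumption: SNPs are selected from the training split only.
        cols = select_snps([X[i] for i in train], y_train, top)
        X_train = [[X[i][j] for j in cols] for i in train]
        X_test = [[X[i][j] for j in cols] for i in test]
        model = nb_fit(X_train, y_train)
        preds = {"knn": [knn_predict(X_train, y_train, x, k) for x in X_test],
                 "nb": [nb_predict(model, x) for x in X_test]}
        for name, y_pred in preds.items():
            acc, f1 = accuracy_f1(y_test, y_pred)
            totals[name][0] += acc
            totals[name][1] += f1
    return {name: (a / iterations, f / iterations)
            for name, (a, f) in totals.items()}


if __name__ == "__main__":
    X, y = make_synthetic()
    print(compare(X, y))
```

Calling `compare` with several values of `train_frac` would mimic the "various percentage splits" mentioned in the abstract, and `iterations=1000` would match the reported number of repetitions. The numbers this sketch prints come from synthetic data and say nothing about the reported 98.38% and 96.74% averages.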