Comparison of k-Nearest Neighbor and Naive Bayes Methods for SNP Data Classification

  • Denny Indrajaya Departemen Matematika dan Sains Data, Fakultas Sains dan Matematika, Universitas Kristen Satya Wacana, Salatiga, Jawa Tengah 50711
  • Adi Setiawan Departemen Matematika dan Sains Data, Fakultas Sains dan Matematika, Universitas Kristen Satya Wacana, Salatiga, Jawa Tengah 50711
  • Bambang Susanto Departemen Matematika dan Sains Data, Fakultas Sains dan Matematika, Universitas Kristen Satya Wacana, Salatiga, Jawa Tengah 50711
Keywords: Classification, k-Nearest Neighbor, Naive Bayes, Single Nucleotide Polymorphism

Abstract

After an accident, a victim's identity can be hard to establish, so biological data such as Single Nucleotide Polymorphism (SNP) data may be needed to identify the person's ancestral origin. This research compares the accuracy and the F1 score of the k-Nearest Neighbor method and the Naive Bayes method in classifying SNP data from 120 people divided into two groups, namely European (CEU) and Yoruba (YRI). The best method is determined from the average accuracy and the average F1 score over 1000 iterations with various percentage splits between training and testing datasets. The SNP locations used in the classification process were selected by correlation analysis. The k-Nearest Neighbor method with k=31 obtained an average accuracy of 98.38% and an average F1 score of 98.39%, while the Naive Bayes method obtained an average accuracy of 96.74% and an average F1 score of 96.63%. In this case, the k-Nearest Neighbor method is better than the Naive Bayes method at classifying SNP data to determine whether a person's ancestry tends to be CEU or YRI.
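The evaluation protocol described in the abstract (repeated random train/test splits, with accuracy and F1 score averaged over the iterations) can be sketched as follows. This is a minimal illustration and not the authors' implementation. The genotype data are synthetic (codes 0/1/2 drawn from assumed allele frequencies), the Manhattan distance for k-Nearest Neighbor and the Laplace smoothing for Naive Bayes are assumptions, and the correlation-based SNP selection step is omitted. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def knn_predict(train_X, train_y, x, k):
    # Assumption: Manhattan distance on genotype codes 0/1/2.
    dists = sorted((sum(abs(a - b) for a, b in zip(row, x)), label)
                   for row, label in zip(train_X, train_y))
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

def nb_fit(train_X, train_y, n_levels=3):
    # Categorical Naive Bayes with Laplace (add-one) smoothing (assumption).
    prior, loglik = {}, {}
    for c in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == c]
        prior[c] = math.log(len(rows) / len(train_y))
        loglik[c] = []
        for j in range(len(train_X[0])):
            counts = Counter(r[j] for r in rows)
            loglik[c].append([math.log((counts[v] + 1) / (len(rows) + n_levels))
                              for v in range(n_levels)])
    return prior, loglik

def nb_predict(model, x):
    prior, loglik = model
    return max(prior, key=lambda c: prior[c] + sum(loglik[c][j][v]
                                                   for j, v in enumerate(x)))

def make_data(rng, n_per_group=60, n_snps=20):
    # Synthetic stand-in for the SNP data: group-specific allele frequencies.
    freqs = {"CEU": [rng.uniform(0.1, 0.4) for _ in range(n_snps)],
             "YRI": [rng.uniform(0.6, 0.9) for _ in range(n_snps)]}
    X, y = [], []
    for label, group_freqs in freqs.items():
        for _ in range(n_per_group):
            # Genotype = number of copies (0, 1 or 2) of the allele.
            X.append([(rng.random() < p) + (rng.random() < p) for p in group_freqs])
            y.append(label)
    return X, y

def evaluate(X, y, n_iter=20, train_frac=0.7, k=31, seed=0):
    # Average accuracy and F1 over n_iter random train/test splits.
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_train = round(train_frac * len(X))
    totals = {"knn": [0.0, 0.0], "nb": [0.0, 0.0]}
    for _ in range(n_iter):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        Xte, yte = [X[i] for i in test], [y[i] for i in test]
        model = nb_fit(Xtr, ytr)
        preds = {"knn": [knn_predict(Xtr, ytr, x, k) for x in Xte],
                 "nb": [nb_predict(model, x) for x in Xte]}
        for name, y_pred in preds.items():
            totals[name][0] += accuracy(yte, y_pred)
            totals[name][1] += f1_score(yte, y_pred, positive="CEU")
    return {name: (acc / n_iter, f1 / n_iter) for name, (acc, f1) in totals.items()}

if __name__ == "__main__":
    X, y = make_data(random.Random(42))
    for name, (acc, f1) in evaluate(X, y).items():
        print(f"{name}: mean accuracy={acc:.4f}, mean F1={f1:.4f}")
```

Because the data here are synthetic, the printed averages are not comparable to the figures reported in the abstract. Setting `n_iter=1000` and varying `train_frac` would mirror the protocol more closely.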

Published
2022-11-22
How to Cite
Indrajaya, D., Setiawan, A., & Susanto, B. (2022). Comparison of k-Nearest Neighbor and Naive Bayes Methods for SNP Data Classification. MATRIK : Jurnal Manajemen, Teknik Informatika Dan Rekayasa Komputer, 22(1), 149-164. https://doi.org/10.30812/matrik.v22i1.1758
Section
Articles