Comparison of Support Vector Machine Performance with Oversampling and Outlier Handling in Diabetic Disease Detection Classification

Firda Yunita Sari; Maharani sukma Kuntari; Hani Khaulasari; Winda Ari Yati

doi:10.30812/matrik.v22i3.2979

Firda Yunita Sari Universitas Islam Negeri Sunan Ampel, Surabaya, Indonesia
Maharani sukma Kuntari Universitas Islam Negeri Sunan Ampel, Surabaya, Indonesia
Hani Khaulasari Universitas Islam Negeri Sunan Ampel, Surabaya, Indonesia
Winda Ari Yati Universitas Islam Negeri Sunan Ampel, Surabaya, Indonesia

Keywords: Accuracy, Diabetes Mellitus, Support Vector Machine, Synthetic Minority Over-Sampling Technique

Abstract

Diabetes mellitus is a chronic metabolic disease characterized by the body’s inability to process carbohydrates and fats properly, which leads to high blood glucose levels. Diabetes mellitus is the sixth leading cause of death in the world. Classifying data on diabetes mellitus makes the disease easier to predict, and with advances in technology it can be detected using machine learning methods. One such method is the support vector machine (SVM). An advantage of SVM is that it is highly effective for classification, separating positive and negative data points efficiently. This study aimed to obtain the best SVM classification model, judged by accuracy, sensitivity, and precision, for detecting diabetes when combined with the Synthetic Minority Over-Sampling Technique (SMOTE) and outlier handling. SMOTE was applied to handle class imbalance. The SVM method seeks a function that acts as a dividing boundary, called a hyperplane, that separates the input data with the smallest possible error. The data studied were diabetes indicators, consisting of eight factor variables and one class variable. The test results show that the SVM-SMOTE scenario produces the best accuracy: with the RBF kernel it reached an accuracy of 88% (an error rate of 12%), obtained with a 90:10 split between training and test data. This scenario also produced a precision of 0.880 and a sensitivity of 0.880. The results show that classification is more accurate when the SVM method is combined with imbalanced-data handling (SMOTE), and that the proportion of training to test data influences the outcome of a test scenario.
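The abstract names the processing steps (outlier handling, SMOTE oversampling, a 90:10 data split, and evaluation by accuracy, precision, and sensitivity) without spelling them out. The sketch below, in plain Python with no third-party libraries, shows one way these steps could fit together. It is not the authors' code: the abstract does not say how outliers were handled, so clipping each feature to its interquartile-range (IQR) fences is an assumption, and every function name (`cap_outliers`, `split_data`, `smote`, `balance_with_smote`, `evaluate`) is hypothetical. The SVM training step itself is left out.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def iqr_bounds(values: Sequence[float]) -> Tuple[float, float]:
    """Tukey fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR) for a single feature."""
    s = sorted(values)
    n = len(s)

    def quantile(p: float) -> float:
        # Linear interpolation between order statistics (NumPy's default rule).
        pos = p * (n - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, n - 1)
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def cap_outliers(X: List[Vector]) -> List[Vector]:
    """ASSUMPTION: outliers are handled by clipping each feature to its IQR fences.

    Rows are kept; only extreme values are pulled back to the fence.
    """
    bounds = [iqr_bounds([row[j] for row in X]) for j in range(len(X[0]))]
    return [[min(max(v, lo), hi) for v, (lo, hi) in zip(row, bounds)] for row in X]


def split_data(X, y, test_fraction=0.1, seed=0):
    """Random split; test_fraction=0.1 gives a 90:10 training/test split."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(len(y) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples.
        neighbours = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: math.dist(base, m),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (t - b) for b, t in zip(base, nb)])
    return synthetic


def balance_with_smote(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class of a binary dataset until both classes match."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new = smote(minority, max(n_majority - len(minority), 0), k, seed)
    return X + new, y + [minority_label] * len(new)


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision and sensitivity (recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, sensitivity
```

In a full pipeline, an SVM with an RBF kernel would be trained between `balance_with_smote` and `evaluate`, for example with scikit-learn's `sklearn.svm.SVC(kernel="rbf")`; imbalanced-learn's `imblearn.over_sampling.SMOTE` with `fit_resample` is a maintained alternative to the hand-written `smote` above. In this sketch, oversampling would be applied only to the training portion returned by `split_data`, so that synthetic points derived from test rows cannot inflate the test scores.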
