Combination of Smote and Random Forest Methods for Lung Cancer Classification
DOI:
https://doi.org/10.30812/ijecsa.v2i2.3333
Keywords:
Data Mining, Lung Cancer, Prediction Method, Random Forest
Abstract
Lung cancer is a growth of abnormal cells in the lungs. It has four severity levels, stages 1 to 4, and it can be fatal if not treated quickly. This research aimed to combine the Synthetic Minority Over-sampling Technique (SMOTE) and Random Forest methods for lung cancer classification. SMOTE was used to balance the data, while Random Forest was used to classify the lung cancer data. The combination of SMOTE and Random Forest obtained an accuracy of 94.1%, a sensitivity of 94.5%, and a specificity of 93.7%. Without SMOTE, the accuracy was 89.1%, the sensitivity 55%, and the specificity 94.5%. SMOTE therefore improved the performance of the Random Forest classifier in terms of accuracy and sensitivity: accuracy rose by 5 percentage points and sensitivity by about 39 percentage points, while specificity fell slightly, by 0.8 percentage points.
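The two ideas in the abstract, oversampling the minority class by interpolation and scoring a classifier by accuracy, sensitivity, and specificity, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `smote` and `binary_metrics`, the neighbour count `k`, the toy data, and the binary (positive/negative) labelling are all hypothetical choices made for the example. A Random Forest classifier is deliberately left out; in practice one would more likely rely on an established library for both steps.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Sketch of SMOTE: build n_new synthetic minority samples.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples with numeric features.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall on positives), and specificity
    (recall on negatives) from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical toy data: three minority samples with two features each.
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0]]
new_points = smote(minority, n_new=4, k=2)

# Hypothetical labels: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
scores = binary_metrics(y_true, y_pred)
```

Because sensitivity is computed only over the positive (minority) cases, a classifier trained on unbalanced data can score high on accuracy and specificity while missing many positives. That pattern matches the figures reported without SMOTE (sensitivity 55% against specificity 94.5%).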