Combination of Smote and Random Forest Methods for Lung Cancer Classification

  • Christopher Michael Lauw Universitas Bumigora
  • Hairani Hairani Universitas Bumigora
  • Ilham Saifuddin Universitas Muhammadiyah Jember
  • Juvinal Ximenes Guterres Universidade Oriental Timur Lorosa’e
  • Muhammad Maariful Huda Politeknik Angkatan Darat
  • Mayadi Mayadi Universiti Teknologi MARA
Keywords: Data Mining, Lung Cancer, Prediction Method, Random Forest


Lung cancer is tissue formed by cells that grow abnormally in the lungs. It has four severity levels, stages 1 to 4, and if it is not treated quickly it can be fatal. This research aimed to combine the Synthetic Minority Over-sampling Technique (SMOTE) and Random Forest methods for lung cancer classification: SMOTE was used to balance the class distribution of the data, while Random Forest was used to classify the lung cancer data. The results showed that the combination of SMOTE and Random Forest obtained an accuracy of 94.1%, a sensitivity of 94.5%, and a specificity of 93.7%. Without SMOTE, the accuracy was 89.1%, the sensitivity 55%, and the specificity 94.5%. SMOTE therefore improved the performance of the Random Forest classifier in terms of accuracy and sensitivity, with gains of 5 percentage points in accuracy and about 39 percentage points in sensitivity, while specificity was slightly lower (93.7% versus 94.5%).
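To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows the core interpolation step commonly associated with SMOTE (a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours) and how accuracy, sensitivity, and specificity follow from confusion-matrix counts. The function names `smote_oversample` and `classification_metrics`, the toy data, and the counts are all hypothetical and chosen only for illustration; the Random Forest step is omitted.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Simplified illustration; assumes at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion counts.
    Sensitivity is recall on the positive class; specificity is recall
    on the negative class."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


# Toy, made-up minority class: 3 samples with 2 features each.
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]
new_points = smote_oversample(minority, n_new=4, k=2)

# Made-up confusion-matrix counts, not taken from the paper.
acc, sens, spec = classification_metrics(tp=45, fn=5, tn=40, fp=10)
```

With these made-up counts the metrics are 0.85, 0.90, and 0.80. The sensitivity formula also suggests why balancing matters: when positive cases are rare, a classifier can miss many of them (large `fn`) while accuracy stays high, which is consistent with the 55% sensitivity reported without SMOTE. In practice, oversampling should be applied to the training split only, so that synthetic samples do not leak into the evaluation data.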


How to Cite
C. Michael Lauw, H. Hairani, I. Saifuddin, J. Ximenes Guterres, M. Maariful Huda, and M. Mayadi, “Combination of Smote and Random Forest Methods for Lung Cancer Classification,” International Journal of Engineering and Computer Science Applications (IJECSA), vol. 2, no. 2, pp. 59–64, Sep. 2023.
