TY  - JOUR
AU  - Muhammad Latief
AU  - Luthfi Nabila
AU  - Wildan Miftakhurrahman
AU  - Saihun Ma'rufatullah
AU  - Henri Tantyoko
PY  - 2024/02/04
Y2  - 2025/04/03
TI  - Handling Imbalance Data using Hybrid Sampling SMOTE-ENN in Lung Cancer Classification
JF  - International Journal of Engineering and Computer Science Applications (IJECSA)
JA  - IJECSA
VL  - 3
IS  - 1
SE  - Articles
DO  - https://doi.org/10.30812/ijecsa.v3i1.3758
UR  - https://journal.universitasbumigora.ac.id/index.php/IJECSA/article/view/3758
AB  - Classification is a typical problem addressed with machine learning. When the classes in a dataset are imbalanced, machine learning models tend to over-predict the majority class. As a result, the model achieves high accuracy on the majority class and low accuracy on the minority class. Most datasets have roughly equal class counts, but problems arise when the difference is too great. This imbalance is evident in the lung cancer data, which contain 283 positive and 38 negative instances. Therefore, this research uses a hybrid sampling technique, combining the Synthetic Minority Over-sampling Technique (SMOTE) with Edited Nearest Neighbors (ENN), together with Random Forest, to balance the class-imbalanced data of lung cancer patients. The method applies SMOTE-ENN as a preprocessing step to balance the data and Random Forest as the classifier to predict lung cancer, with training and testing data split by 10-fold cross-validation. The results show that SMOTE-ENN with Random Forest performs best on all metrics used, compared with SMOTE alone and with no oversampling. In conclusion, the SMOTE-ENN hybrid sampling technique with the Random Forest model significantly improves the model's ability to identify and classify data.
ER  - 