Evaluating Random Forest Regression for Air Quality Prediction
DOI:
https://doi.org/10.30812/varian.v9i1.6046
Keywords:
Air Pollution Prediction, Air Quality, Environmental Data Analysis, Machine Learning, Random Forest
Abstract
Air pollution is a growing environmental problem in Makassar, driven by rapid urban development and increasing transportation activity. This study models and predicts air pollutant concentrations using Random Forest (RF) regression. The data consist of daily PM2.5, PM10, CO, NO2, SO2, and O3 measurements from September 2024 to September 2025, totaling 395 observations. Missing values (14.05%) were imputed with a hybrid approach combining linear interpolation and multiple linear regression. The RF model was trained under two train-test split scenarios (70:30 and 80:20) and evaluated using SMAPE, RMSE, MAE, and R². The 80:20 split gives the best predictive accuracy. CO and O3 are predicted most accurately, with SMAPE values of 9.75% and 10.87% and R² values of 0.973 and 0.964, respectively. PM2.5 and PM10 also perform well, with R² values above 0.84. These results indicate that the RF model captures pollutant variability effectively and produces reliable predictions. Overall, Random Forest proves to be a robust and accurate method for predicting air quality in Makassar, supporting environmental monitoring and early warning systems. The study is limited to two data-partition schemes and does not incorporate temporal deep-learning architectures. Future studies may investigate hybrid ensembles or deep learning approaches to determine whether sequential modeling further improves predictive stability.
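The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact formulas used, so the definitions below are assumptions: SMAPE is taken in its common symmetric form (absolute error divided by the mean of the absolute actual and predicted values, expressed as a percentage), and R² is the usual coefficient of determination. The function names and the toy values are hypothetical and are not data from the study.

```python
import math


def smape(y_true, y_pred):
    """Symmetric MAPE in percent (assumed form: |y - p| / ((|y| + |p|) / 2))."""
    terms = []
    for y, p in zip(y_true, y_pred):
        denom = (abs(y) + abs(p)) / 2
        # Treat a 0/0 pair as a perfect prediction to avoid division by zero.
        terms.append(0.0 if denom == 0 else abs(y - p) / denom)
    return 100 * sum(terms) / len(terms)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = [(y - p) ** 2 for y, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Toy values for illustration only (not measurements from the study).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 40.0]

print(f"SMAPE = {smape(observed, predicted):.2f}%")
print(f"RMSE  = {rmse(observed, predicted):.3f}")
print(f"MAE   = {mae(observed, predicted):.3f}")
print(f"R2    = {r2(observed, predicted):.3f}")
```

SMAPE is scale-free, which makes it comparable across pollutants measured in different ranges, while RMSE and MAE stay in the units of each pollutant.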
Copyright (c) 2026 Muh. Basyar Izabi, Suwardi Annas, Ansari Saleh Ahmar

This work is licensed under a Creative Commons Attribution 4.0 International License.