Evaluating Random Forest Regression for Air Quality Prediction

Authors

DOI:

https://doi.org/10.30812/varian.v9i1.6046

Keywords:

Air Pollution Prediction, Air Quality, Environmental Data Analysis, Machine Learning, Random Forest

Abstract

Air pollution is a growing environmental problem in Makassar, driven by rapid urban development and increasing transportation activity. This study aims to model and predict air pollutant concentrations using the Random Forest (RF) regression method. The data consist of daily PM2.5, PM10, CO, NO2, SO2, and O3 measurements from September 2024 to September 2025, totaling 395 daily observations. Missing values (14.05% of the data) were imputed using a hybrid approach that combines linear interpolation and multiple linear regression. The RF model was trained under two training-to-testing split scenarios (70:30 and 80:20) and evaluated using SMAPE, RMSE, MAE, and R². The results show that the 80:20 configuration yields the higher predictive accuracy of the two. CO and O3 are predicted most accurately, with SMAPE values of 9.75% and 10.87% and R² values of 0.973 and 0.964, respectively. PM2.5 and PM10 also show strong performance, with R² values above 0.84. These results indicate that the RF model effectively captures pollutant variability and provides reliable forecasts. Overall, Random Forest proved to be a robust and accurate method for predicting air quality in Makassar, supporting environmental monitoring and early warning systems. Despite its strong performance, this study is limited to two data-partition schemes and does not incorporate temporal deep-learning architectures. Future studies may investigate hybrid ensembles or deep learning approaches to determine whether incorporating sequential modeling further improves predictive stability.
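The abstract lists the preprocessing, splitting, and evaluation steps without giving formulas. The following is a minimal standard-library Python sketch of the parts that can be written down unambiguously; it is not the authors' code. Assumptions: all function names are hypothetical; the split is taken to be chronological (the abstract does not say whether observations were shuffled); SMAPE uses the common variant with denominator (|A| + |F|)/2, since the abstract does not give its formula; and only the linear-interpolation half of the hybrid imputation is shown, with the multiple-linear-regression fallback and the Random Forest fit itself omitted.

```python
import math
from typing import List, Optional, Sequence, Tuple


def interpolate_gaps(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill interior missing values (None) by straight-line interpolation.

    Only the linear-interpolation half of the hybrid imputation is sketched;
    leading/trailing gaps stay None because the regression fallback is not shown.
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)  # fractional position inside the gap
            out[i] = out[left] + t * (out[right] - out[left])
    return out


def chronological_split(series: Sequence[float], train_percent: int) -> Tuple[List[float], List[float]]:
    """Split an ordered series into train/test parts without shuffling (assumed, not stated)."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must lie strictly between 0 and 100")
    cut = len(series) * train_percent // 100  # integer arithmetic avoids float rounding
    return list(series[:cut]), list(series[cut:])


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """SMAPE in percent, using the (|A| + |F|) / 2 denominator variant (assumed)."""
    _check(actual, predicted)
    total = 0.0
    for a, f in zip(actual, predicted):
        denom = (abs(a) + abs(f)) / 2
        total += 0.0 if denom == 0 else abs(f - a) / denom  # 0/0 counted as no error
    return 100.0 * total / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    _check(actual, predicted)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - f) for a, f in zip(actual, predicted)) / len(actual)


def r2(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(actual, predicted)
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not the study's data.
    print(interpolate_gaps([1.0, None, None, 4.0]))
    for pct in (70, 80):
        train, test = chronological_split(list(range(395)), pct)
        print(f"{pct}:{100 - pct} -> {len(train)} train / {len(test)} test")
    actual, predicted = [10.0, 12.0, 14.0, 16.0], [11.0, 12.0, 13.0, 17.0]
    print(smape(actual, predicted), rmse(actual, predicted), mae(actual, predicted), r2(actual, predicted))
```

With 395 observations and integer truncation, this sketch gives 276 training and 119 test days for the 70:30 scenario and 316 and 79 for the 80:20 scenario; the paper may have rounded its split point differently.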

Author Biographies

  • Suwardi Annas, Universitas Negeri Makassar, Makassar, Indonesia

    Dean of the Faculty of Mathematics and Natural Sciences, State University of Makassar

  • Ansari Saleh Ahmar, Universitas Negeri Makassar, Makassar, Indonesia

    Director of Information and Communication Technology (ICT), State University of Makassar

Published

2026-02-28

Issue

Vol. 9, No. 1 (2026)

Section

Articles

How to Cite

[1]
“Evaluating Random Forest Regression for Air Quality Prediction”, JV, vol. 9, no. 1, pp. 77–84, Feb. 2026, doi: 10.30812/varian.v9i1.6046.