Performance Comparison of Decision Tree, KNN, and Naive Bayes for Air Quality Classification
DOI:
https://doi.org/10.30812/matrik.v25i2.5121
Keywords:
Air Quality Classification, Decision Tree, k-Nearest Neighbor, Naive Bayes, Stochastic Gradient Descent
Abstract
Air quality degradation has become a critical environmental and public health issue, necessitating accurate and reliable classification models to support effective monitoring systems. This study conducts a comparative analysis of four machine learning algorithms (Decision Tree, k-Nearest Neighbor (kNN), Naive Bayes, and Stochastic Gradient Descent (SGD)) for classifying air quality from environmental parameters: particulate matter ≤ 2.5 μm (PM2.5), carbon monoxide (CO), temperature, humidity, nitrogen dioxide (NO2), and sulfur dioxide (SO2). The methodology employs supervised learning, where each model is trained and then evaluated using classification accuracy, area under the receiver operating characteristic curve (AUC), F1-Score, precision, recall, and Matthews Correlation Coefficient (MCC), supported by ROC curve and confusion matrix analyses. The results show that the Decision Tree algorithm achieves the best overall performance, attaining a classification accuracy of 93.8% with balanced precision, recall, and F1-Score, indicating strong and consistent predictive capability. The kNN and Naive Bayes models record the highest AUC values (0.980 and 0.982, respectively), demonstrating excellent class separability, although their accuracy and F1-Score are lower than those of the Decision Tree. In addition, the SGD model, implemented with a modified Huber loss function and L2 regularization, provides interpretable feature-weight analysis, identifying PM2.5 and CO as dominant indicators of the Hazardous air quality class, while temperature and humidity significantly influence the Fair and Good classes. Based on the comprehensive evaluation, the Decision Tree algorithm is recommended as the most reliable model for accurate air quality classification, whereas the SGD model is particularly suitable for feature contribution analysis to enhance interpretability. These findings offer practical insights for selecting appropriate machine learning models in air quality monitoring and decision-support systems.
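Several of the metrics named in the abstract (accuracy, precision, recall, F1-Score, and MCC) can be derived from a confusion matrix alone. The following is a minimal sketch of that derivation, not the authors' evaluation code. The three class names come from the abstract, but the counts in `CM` are made up for illustration, the function name `evaluate` is hypothetical, and macro averaging over classes is an assumption, since the abstract does not state which averaging scheme was used. AUC is omitted because it needs per-class scores rather than counts.

```python
from math import sqrt

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
# The counts are illustrative only and are NOT the paper's results.
LABELS = ["Good", "Fair", "Hazardous"]
CM = [
    [8, 2, 0],  # true Good
    [1, 7, 2],  # true Fair
    [0, 1, 9],  # true Hazardous
]


def evaluate(cm):
    """Return accuracy, macro precision/recall/F1, and multiclass MCC."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    true_counts = [sum(cm[k]) for k in range(n)]                       # row sums
    pred_counts = [sum(cm[i][k] for i in range(n)) for k in range(n)]  # column sums

    # Per-class precision and recall, guarding against empty rows or columns.
    precisions = [cm[k][k] / pred_counts[k] if pred_counts[k] else 0.0 for k in range(n)]
    recalls = [cm[k][k] / true_counts[k] if true_counts[k] else 0.0 for k in range(n)]
    f1s = [2 * p * r / (p + r) if (p + r) else 0.0 for p, r in zip(precisions, recalls)]

    # Multiclass generalization of MCC computed from the confusion matrix.
    cov = correct * total - sum(p * t for p, t in zip(pred_counts, true_counts))
    denom = sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    mcc = cov / denom if denom else 0.0

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,  # macro average (assumed)
        "recall": sum(recalls) / n,        # macro average (assumed)
        "f1": sum(f1s) / n,                # macro average (assumed)
        "mcc": mcc,
    }


metrics = evaluate(CM)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In practice a library such as scikit-learn offers equivalent functions (for example `accuracy_score`, `f1_score`, and `matthews_corrcoef`). Writing the computation out shows why MCC can sit well below accuracy: it accounts for how predictions are distributed across every class, not only for the correct ones.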
License
Copyright (c) 2026 Yan Yang Thanri, Juli Iriani, Lili Tanti, Luthfi Zaidi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.