Performance Comparison of Decision Tree, KNN, and Naive Bayes for Air Quality Classification

Yan Yang Thanri; Juli Iriani; Lili Tanti; Luthfi Zaidi

doi:10.30812/matrik.v25i2.5121

Authors

Yan Yang Thanri, Universitas Potensi Utama, Medan, Indonesia
Juli Iriani, Universitas Potensi Utama, Medan, Indonesia
Lili Tanti, Universitas Potensi Utama, Medan, Indonesia
Luthfi Zaidi, Universitas Potensi Utama, Medan, Indonesia

DOI:

https://doi.org/10.30812/matrik.v25i2.5121

Keywords:

Air Quality Classification, Decision Tree, k-Nearest Neighbor, Naive Bayes, Stochastic Gradient Descent

Abstract

Air quality degradation has become a critical environmental and public health issue, creating a need for accurate and reliable classification models that support effective monitoring systems. This study presents a comparative analysis of four machine learning algorithms, namely Decision Tree, k-Nearest Neighbor (kNN), Naive Bayes, and Stochastic Gradient Descent (SGD), for classifying air quality from environmental parameters: particulate matter ≤ 2.5 μm (PM2.5), carbon monoxide (CO), temperature, humidity, nitrogen dioxide (NO2), and sulfur dioxide (SO2). The methodology uses supervised learning: each model is trained and then evaluated using classification accuracy, area under the receiver operating characteristic curve (AUC), F1-Score, precision, recall, and the Matthews Correlation Coefficient (MCC), supported by ROC curve and confusion matrix analyses. The results show that the Decision Tree algorithm achieves the best overall performance, with a classification accuracy of 93.8% and balanced precision, recall, and F1-Score, indicating strong and consistent predictive capability. The kNN and Naive Bayes models record the highest AUC values (0.980 and 0.982, respectively), demonstrating excellent class separability, although their accuracy and F1-Score are lower than those of the Decision Tree. In addition, the SGD model, implemented with a modified Huber loss function and L2 regularization, provides an interpretable feature-weight analysis that identifies PM2.5 and CO as dominant indicators of the Hazardous air quality class, while temperature and humidity significantly influence the Fair and Good classes. Based on the comprehensive evaluation, the Decision Tree algorithm is recommended as the most reliable model for accurate air quality classification, whereas the SGD model is particularly suited to feature contribution analysis that enhances interpretability. These findings offer practical guidance for selecting machine learning models in air quality monitoring and decision-support systems.
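Except for AUC, every evaluation metric named in the abstract can be read directly off a multiclass confusion matrix. The following is a minimal, dependency-free Python sketch, not the authors' code. It assumes macro averaging across classes, which the abstract does not specify. The Good/Fair/Hazardous labels are invented toy data used only for illustration. AUC is omitted because it requires predicted class probabilities rather than hard labels, and the helper names `confusion_matrix` and `summarize` are hypothetical.

```python
from math import sqrt


def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs: rows are true classes, columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def summarize(matrix):
    """Accuracy, macro-averaged precision/recall/F1 (assumed averaging), and multiclass MCC."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    true_counts = [sum(matrix[i]) for i in range(k)]                       # row sums
    pred_counts = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # column sums

    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = matrix[c][c]
        prec = tp / pred_counts[c] if pred_counts[c] else 0.0
        rec = tp / true_counts[c] if true_counts[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    # Multiclass MCC (Gorodkin's R_K generalization), built from the matrix totals.
    cov_ytyp = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    cov_ypyp = total ** 2 - sum(p * p for p in pred_counts)
    cov_ytyt = total ** 2 - sum(t * t for t in true_counts)
    denom = sqrt(cov_ypyp * cov_ytyt)
    mcc = cov_ytyp / denom if denom else 0.0

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
        "mcc": mcc,
    }


# Invented toy labels for illustration only; these are not the study's data.
labels = ["Good", "Fair", "Hazardous"]
y_true = ["Good"] * 3 + ["Fair"] * 3 + ["Hazardous"] * 3
y_pred = ["Good", "Good", "Fair", "Fair", "Fair", "Good",
          "Hazardous", "Hazardous", "Hazardous"]

cm = confusion_matrix(y_true, y_pred, labels)
scores = summarize(cm)
# For this toy matrix: accuracy, precision, recall and F1 are all 0.778; MCC is 0.667.
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Reporting MCC alongside accuracy is useful because MCC draws on every cell of the confusion matrix, so a model that does well only on a dominant class scores lower than its accuracy alone would suggest. The MCC expression above is the same multiclass form used by scikit-learn's `matthews_corrcoef`.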
