Enhancing Multiple Linear Regression with Stacking Ensemble for Dissolved Oxygen Estimation
DOI:
https://doi.org/10.30812/matrik.v24i1.4280
Keywords:
Dissolved oxygen, Multiple Linear Regression, Stacking Ensemble
Abstract
Maintaining optimal dissolved oxygen levels is essential for aquatic ecosystems, yet industrial and domestic waste has led to a global decline in dissolved oxygen. Traditional measurement methods, such as oxygen meters and Winkler titration, are often costly or time-consuming. This study aims to improve the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and R² obtained when estimating dissolved oxygen levels. The research method uses Multiple Linear Regression with various training and testing data splits, both before and after applying polynomial features. The model is then further optimized with a stacking technique that uses a Random Forest Regressor and a Gradient Boosting Regressor as base models.
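The method described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, the four predictors are hypothetical, the polynomial degree of 2 is a guess, and using Linear Regression as the final (meta) estimator is an assumption about how the Multiple Linear Regression model is combined with the two base models.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (assumption): four hypothetical water-quality
# predictors and a dissolved-oxygen-like target with nonlinear terms.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (
    7.0
    + 1.5 * X[:, 0]
    - 0.8 * X[:, 1] ** 2
    + 0.5 * X[:, 2] * X[:, 3]
    + rng.normal(scale=0.5, size=300)
)

# 90:10 train/test split, the split the abstract reports as best.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0
)

# Base models named in the abstract; the linear meta-model is an assumption.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("gbr", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=LinearRegression(),
    cv=5,  # out-of-fold base predictions are used to train the meta-model
)

# Polynomial expansion (degree 2 is an assumption), scaling, then the stack.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    stack,
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(y_pred[:5])
```

Wrapping the steps in one pipeline keeps the polynomial expansion and scaling fitted on the training portion only, so the held-out 10% does not leak into preprocessing.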
The results show that the best model was obtained using the stacking ensemble technique with a 90:10 data split and polynomial features, yielding an RMSE of 1.206, an MAE of 0.990, and an R² of 0.670. This model also meets the assumptions of linear regression: residual normality, homoscedasticity, and no autocorrelation of residuals. The study concludes that the stacking ensemble technique and the addition of polynomial features can improve the model's estimates of dissolved oxygen. It also contributes an accessible user interface, built with the Gradio framework, that lets users estimate dissolved oxygen levels.
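For readers unfamiliar with the three reported metrics, the following is a small sketch of their standard definitions in plain Python. The function name `regression_metrics` and the observed and predicted values are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # RMSE: square root of the mean squared error.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # MAE: mean of the absolute errors.
    mae = sum(abs(e) for e in errors) / n

    # R^2: one minus residual sum of squares over total sum of squares.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return rmse, mae, r2


# Hypothetical dissolved-oxygen readings (mg/L) and model estimates.
observed = [6.0, 7.0, 8.0, 9.0]
predicted = [6.5, 6.5, 8.5, 8.5]

rmse, mae, r2 = regression_metrics(observed, predicted)
print(rmse, mae, r2)
```

Lower RMSE and MAE indicate smaller errors in the units of the target, while R² closer to 1 indicates that more of the variance in the observations is explained.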
Similar Articles
- Vivin Nur Aziza, Utami Dyah Syafitri, Anwar Fitrianto, Optimizing Currency Circulation Forecasts in Indonesia: A Hybrid Prophet-Long Short Term Memory Model with Hyperparameter Tuning, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 24 No. 1 (2024)
- Fristi Riandari, Hengki Tamando Sihotang, Husain Husain, Forecasting the Number of Students in Multiple Linear Regressions, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 21 No. 2 (2022)
- M Safii, Rika Setiana, Population Prediction Using Multiple Regression and Geometry Models Based on Demographic Data, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 24 No. 1 (2024)
- Mamluatul Hani'ah, Moch Zawaruddin Abdullah, Wilda Imama Sabilla, Syafaat Akbar, Dikky Rahmad Shafara, Google Trends and Technical Indicator based Machine Learning for Stock Market Prediction, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 22 No. 2 (2023)
- Mardiana Mardiana, Eka Hartati, Analisis Pengukuran Tingkat Kepuasan Pengguna Terhadap Penerapan Aplikasi SISKEUDES Pada Kabupaten Banyuasin Sumatera Selatan, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 19 No. 1 (2019)
- Sucipto Sucipto, Didik Dwi Prasetya, Triyanna Widiyaningtyas, Educational Data Mining: Multiple Choice Question Classification in Vocational School, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 23 No. 2 (2024)
- Wahyu Styo Pratama, Didik Dwi Prasetya, Triyanna Widyaningtyas, Muhammad Zaki Wiryawan, Lalu Ganda Rady Putra, Tsukasa Hirashima, Performance Evaluation of Artificial Intelligence Models for Classification in Concept Map Quality Assessment, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 24 No. 3 (2025)
- Dewa Ayu Kadek Pramita, Ni Wayan Sumartini Saraswati, I Putu Dedy Sandana, Poria Pirozmand, I Kadek Agus Bisena, Optimizing Hotel Room Occupancy Prediction Using an Enhanced Linear Regression Algorithms, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 24 No. 1 (2024)
- Bakhtiyar Hadi Prakoso, Implementasi Support Vector Regression pada Prediksi Inflasi Indeks Harga Konsumen, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 19 No. 1 (2019)
- Abd Mizwar A Rahim, Andi Sunyoto, Muhammad Rudyanto Arief, Stroke Prediction Using Machine Learning Method with Extreme Gradient Boosting Algorithm, MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer: Vol. 21 No. 3 (2022)