Optimizing Rain Prediction Model Using Random Forest and Grid Search Cross-Validation for Agriculture Sector
Abstract
Agriculture is highly sensitive to weather conditions and faces growing challenges as weather patterns become increasingly unpredictable. The aim of this research is to build an optimal rainfall prediction model that helps farmers plan irrigation, fertilizer use, and planting schedules, and protect crops from extreme weather events. To obtain the best rain prediction model, this research combines the Random Forest algorithm with Grid Search Cross-Validation. Random Forest, known for its robustness and accuracy, is a suitable algorithm for predicting rain. The study uses a substantial dataset from the West Nusa Tenggara Meteorology, Climatology, and Geophysics Agency covering the period 2000 to 2023. The data is first preprocessed to ensure it is ready for use: outlier data points, empty data entries, and unused features are removed. After preprocessing, the data is used to train a Random Forest model, which yields an R-squared value of 0.1334. Grid Search Cross-Validation is then used to obtain the optimal model. The resulting best rain prediction model has an R-squared value of 0.0268. This model will be used to predict rain in the agricultural sector. This research concludes that the best rain prediction model can be obtained by combining Random Forest and Grid Search Cross-Validation. Further research could compare other rain prediction methods, add features, and combine datasets from a wider area.
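The workflow summarized in the abstract (remove empty entries and outliers, train a Random Forest, then tune it with Grid Search Cross-Validation) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the synthetic data, the IQR outlier rule, the feature count, and the hyperparameter grid are all assumptions made for the example, and the agency dataset itself is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def make_synthetic_weather(n_rows=400, seed=0):
    """Hypothetical stand-in for station records (NOT the agency dataset)."""
    rng = np.random.default_rng(seed)
    # Four made-up predictors, e.g. temperature, humidity, wind, sunshine.
    X = rng.normal(size=(n_rows, 4))
    noise = rng.normal(scale=1.0, size=n_rows)
    # Rainfall-like target: non-negative, driven by two of the predictors.
    y = np.clip(2.0 * X[:, 1] - X[:, 0] + noise, 0.0, None)
    # Inject missing entries and a few extreme values to clean up later.
    X[rng.choice(n_rows, 10, replace=False), 2] = np.nan
    y[rng.choice(n_rows, 5, replace=False)] += 50.0
    return X, y


def preprocess(X, y, iqr_factor=1.5):
    """Drop rows with missing values, then rows whose target is an IQR outlier.

    The IQR rule is an assumption; the abstract does not name the method used.
    """
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    X, y = X[keep], y[keep]
    q1, q3 = np.percentile(y, [25, 75])
    spread = q3 - q1
    inlier = (y >= q1 - iqr_factor * spread) & (y <= q3 + iqr_factor * spread)
    return X[inlier], y[inlier]


def fit_and_tune(X, y, seed=0):
    """Train a baseline forest, then search a small illustrative grid."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    baseline = RandomForestRegressor(n_estimators=50, random_state=seed)
    baseline.fit(X_train, y_train)

    # Hypothetical grid; the values actually searched are not given in the text.
    param_grid = {
        "n_estimators": [50, 100],
        "max_depth": [None, 5],
        "min_samples_leaf": [1, 5],
    }
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed), param_grid, cv=3, scoring="r2"
    )
    search.fit(X_train, y_train)

    return {
        "baseline_r2": r2_score(y_test, baseline.predict(X_test)),
        "tuned_r2": r2_score(y_test, search.best_estimator_.predict(X_test)),
        "best_params": search.best_params_,
    }


if __name__ == "__main__":
    X_clean, y_clean = preprocess(*make_synthetic_weather())
    print(fit_and_tune(X_clean, y_clean))
```

Grid search selects hyperparameters by cross-validated score on the training folds, so the tuned model's score on a separate held-out set is not guaranteed to exceed the baseline's; reporting both, as the sketch does, makes the comparison explicit.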
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.