Kernel Nonparametric Regression for Forecasting Local Original Income

Regional Original Revenue (ROR) is income collected by a region under regional regulations, in accordance with statutory regulations. ROR aims to give Regional Governments the authority to fund the implementation of regional autonomy in line with regional potential. Every year, the Central Lombok Regency government sets ROR targets to assist in formulating regional policies. The targets set do not always match their realization. This study aims to determine a model that can be used to forecast ROR targets. One way to predict the value of ROR is a nonparametric regression approach, which is flexible because it does not depend on a particular model form. Nonparametric kernel regression with the Gaussian kernel function obtained a minimum GCV value of 1.769688931 at optimum bandwidths $h_1$ = 0.212740452 and $h_2$ = 0.529682589. Modeling with the optimum bandwidths produces a coefficient of determination of 87.55%. The best model is used for forecasting and produces a MAPE value of 5.4%. The analysis results show that the value of ROR is influenced by ROR receipts in the previous month and 12 months earlier.


B. RESEARCH METHOD
The data used are secondary data obtained from the Office of the Central Lombok Regency Revenue Management Agency, located at Jalan Raden Puguh PrayaPuyung 2, Praya. The variable is the monthly amount of ROR receipts in Central Lombok. The 60 observations from January 2016 to December 2020 serve as in-sample data used to obtain the nonparametric kernel model, and the 12 observations from 2021 serve as out-sample data. The model is used to predict the ROR value for these 12 months, and the predictions are compared with the actual out-sample data to assess their accuracy. ROR data consist of regional taxes, regional levies, the results of separated regional wealth management, and other legitimate regional income. ROR data are monthly time series data. Data analysis was carried out with R software. This study aims to determine the nonparametric kernel model and predict ROR receipts in Central Lombok. The method used is nonparametric regression analysis. The steps in this research are divided into several stages:
1. Determine the dependent variable ($Y$) and independent variables ($X$) by looking at the correlated lags on the autocorrelation function (ACF) plot (Palma, 2016).
2. Create a scatterplot. A scatterplot is used to see the pattern of data distribution.
3. Perform a multicollinearity test. Multicollinearity is a condition in which there is a strong correlation between independent variables in the regression model. It can be detected from the Variance Inflation Factor (VIF): if the VIF value > 10, multicollinearity occurs. The VIF is computed with Equation 1 (Michael et al., 2015):
$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2} \tag{1}$$
where $R_j^2$ is the coefficient of determination obtained by regressing the $j$-th independent variable on the other independent variables.
4. Determine the optimal bandwidth value that minimizes the Generalized Cross Validation (GCV) value. The formula is in Equation 2 (Budiantara et al., 2015; Fitriyani and Budiantara, 2014):
$$\mathrm{GCV}(h) = \frac{n^{-1}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\left(n^{-1}\,\mathrm{tr}\left[I - X(h)\right]\right)^2} \tag{2}$$
where $y_i$ is the actual data, $\hat{y}_i$ is the estimated data, $I$ is the identity matrix, $X(h)$ is the weighting matrix, $n$ is the number of data, and $\mathrm{tr}$ is the trace, i.e., the sum of the main diagonal elements of a matrix.
5. Model the ROR data using nonparametric kernel regression. In general, the kernel function is defined in Equation 3 (Deshpande et al., 2017):
$$K_h(x) = \frac{1}{h}\,K\!\left(\frac{x}{h}\right) \tag{3}$$
where $K$ is the kernel function and $h$ is the bandwidth. Kernel functions must meet several conditions; conventionally, $K$ is non-negative, symmetric about zero, and integrates to one. A kernel estimator is a nonparametric approach to estimating curves. The kernel estimator is very sensitive to the choice of bandwidth, which controls the smoothness of the curve. Nadaraya and Watson define a kernel regression estimator, called the Nadaraya-Watson estimator, in Equation 4 (Ghosh, 2018; Pratiwi et al., 2020):
$$\hat{m}(x) = \frac{\sum_{i=1}^{n} K_h(x - x_i)\,y_i}{\sum_{i=1}^{n} K_h(x - x_i)} \tag{4}$$
The kernel function used is the Gaussian kernel in Equation 5:
$$K(x) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{x^2}{2}\right) \tag{5}$$
6. Calculate the goodness of the model with the coefficient of determination in Equation 6 (Cheng et al., 2014; Zhang, 2017):
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{6}$$
where $\hat{y}_i$ is the estimated value of $y_i$ and $\bar{y}$ is the average value of $y_i$.
7. Check the assumption that the residuals are random, i.e., identically and independently distributed ($\varepsilon \sim \mathrm{IID}(0, \sigma^2)$).
8. Forecast with the model obtained and test the forecasting accuracy with the Mean Absolute Percentage Error (MAPE) criterion in Equation 7 (Palma, 2016):
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\% \tag{7}$$
where $\hat{y}_i$ is the forecast value of $y_i$.
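As an illustration of step 5, the following Python sketch evaluates a Nadaraya-Watson estimate with a Gaussian kernel for several predictors at once. It is not the R code used in this study: the function names, the use of a product of univariate kernels across predictors, and the toy numbers are assumptions made for the example.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def nadaraya_watson(x_train, y_train, x_query, h):
    """Nadaraya-Watson estimate at each row of x_query.

    x_train: (n, d) predictors, y_train: (n,) responses,
    x_query: (m, d) evaluation points, h: (d,) bandwidths.
    Assumption: a product of univariate Gaussian kernels is used.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    h = np.asarray(h, dtype=float)
    # Scaled differences between every query point and every training point,
    # shape (m, n, d).
    u = (x_query[:, None, :] - x_train[None, :, :]) / h
    # Product kernel K_h1 * K_h2 * ... for each (query, train) pair, shape (m, n).
    k = np.prod(gaussian_kernel(u) / h, axis=2)
    # Normalise each row so the weights sum to one. A query point very far
    # from all training points can underflow to a zero row sum.
    weights = k / k.sum(axis=1, keepdims=True)
    return weights @ y_train


# Toy data, purely illustrative (not ROR values).
x = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
y = np.array([1.0, 3.0, 5.0])
print(nadaraya_watson(x, y, x, h=[0.5, 0.5]))
```

With a very large bandwidth every training point receives nearly the same weight and the estimate approaches the sample mean; with a very small bandwidth the estimate at a training point approaches the observed value there, which is the rough-versus-smooth trade-off that bandwidth selection has to balance.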

C. RESULTS AND DISCUSSION
The data used in this research are secondary data, namely monthly ROR data in Central Lombok. The 60 observations from January 2016 to December 2020 are the in-sample data, and the 12 observations from January 2021 to December 2021 are the out-sample data. The in-sample data are used to form the best model, and the out-sample data are used to measure the accuracy of ROR predictions in Central Lombok for the next 12 months based on the model obtained.
The ACF of the ROR data was plotted to determine the independent variables, since it shows the autocorrelation at each time lag. It can be seen in Figure 2 that the correlated lags on the ACF plot are lag 1 and lag 12. Therefore, in this study, lag 1 and lag 12 were used as independent variables that were indicated to affect the ROR value at time $t$, with $t = 1, 2, 3, \ldots, 60$. In what follows, the lag 1 data $x_{t-1}$ are referred to as $x_1$, the lag 12 data $x_{t-12}$ are referred to as $x_2$, and the data $x_t$ are referred to as $y$. Scatter plots between the variables are used to identify the data distribution pattern. Figure 3 indicates the relationship pattern between the dependent variable $y$ and each of the independent variables $x_1$ and $x_2$. The relationship between $x_1$ and $y$, as well as that between $x_2$ and $y$, is scattered and does not follow a specific pattern such as linear or quadratic. Therefore, the appropriate approach for this relationship pattern is nonparametric regression. In this study, a nonparametric kernel regression approach was used.
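The construction of $y$, $x_1$, and $x_2$ from a single monthly series can be sketched as follows. This is a minimal Python illustration, not the authors' R code; the helper name `make_lag_features` and the placeholder series are assumptions. It also shows that, because the lag 12 predictor needs 12 earlier months, 60 monthly values yield 48 complete rows.

```python
import numpy as np


def make_lag_features(series, lags=(1, 12)):
    """Return (X, y): y is the series from position max(lags) onward and
    column k of X holds the series lagged by lags[k]."""
    series = np.asarray(series, dtype=float)
    max_lag = max(lags)
    y = series[max_lag:]
    X = np.column_stack(
        [series[max_lag - lag: len(series) - lag] for lag in lags]
    )
    return X, y


# Placeholder series of 60 monthly values (0, 1, ..., 59), not the ROR data.
demo = np.arange(60, dtype=float)
X, y = make_lag_features(demo)
print(X.shape, y.shape)  # (48, 2) (48,)
print(X[0], y[0])        # first row pairs month 12 with months 11 and 0
```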

Figure 3. Data autocorrelation function plot
Next, VIF values were used to check for multicollinearity. A VIF value greater than 10 indicates multicollinearity, that is, a strong relationship between the independent variables. Both variables have a VIF value of 1.016, which is less than 10 and indicates that there is no multicollinearity.
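To make the size of this VIF concrete, the relation $\mathrm{VIF}_j = 1/(1 - R_j^2)$ can be inverted. The short calculation below is a worked illustration; it assumes the reported value 1.016 is exact to the digits shown, and it uses the fact that with only two independent variables $R_j^2$ equals the squared correlation $r^2$ between $x_1$ and $x_2$.

$$R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j} = 1 - \frac{1}{1.016} \approx 0.0157, \qquad |r| = \sqrt{R_j^2} \approx 0.125$$

Under these assumptions the two lagged predictors share under 2% of their variance, far below the level implied by the VIF > 10 threshold ($R_j^2 > 0.9$).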
Next, the optimum bandwidth used in the modeling must be determined. A bandwidth that is too small produces a curve that is too rough; a bandwidth that is too large produces a curve that is too smooth and does not fit the data pattern. One method for choosing the optimal bandwidth is the GCV criterion, which is commonly used in nonparametric regression (Fitriyani and Budiantara, 2014; Ghosh, 2018). GCV values were computed for a range of candidate bandwidths. Based on the GCV criterion in Equation 2, the minimum GCV value is 1.769688931, with optimal bandwidths $h_1 = 0.212740452$ and $h_2 = 0.529682589$. The kernel regression curve is estimated with the multivariate form of the Nadaraya-Watson estimator with $j = 1, 2$, where $j$ indexes the independent variables, and the data used are $i = 1, 2, 3, \ldots, 48$:
$$\hat{y} = \sum_{i=1}^{48} W_{hi}(x_j)\,y_i$$
The $W_{hi}(x_j)$ function is formed from the multivariate kernel density function, namely:
$$W_{hi}(x_j) = \frac{\prod_{j=1}^{2} K_{h_j}(x_j - x_{ji})}{\sum_{k=1}^{48}\prod_{j=1}^{2} K_{h_j}(x_j - x_{jk})}$$
The kernel function used is the Gaussian kernel in Equation 5. Therefore, the form of the function $W_{hi}(x_j)$ becomes
$$W_{hi}(x_j) = \frac{\prod_{j=1}^{2} \frac{1}{h_j\sqrt{2\pi}}\exp\!\left(-\frac{(x_j - x_{ji})^2}{2h_j^2}\right)}{\sum_{k=1}^{48}\prod_{j=1}^{2} \frac{1}{h_j\sqrt{2\pi}}\exp\!\left(-\frac{(x_j - x_{jk})^2}{2h_j^2}\right)}$$
The summation form of $\hat{y}$ is:
$$\hat{y} = W_{h1}(x_j)\,y_1 + W_{h2}(x_j)\,y_2 + \cdots + W_{h48}(x_j)\,y_{48}$$
Since the model applies from $i = 1$ to $i = 48$, it can also be written in matrix form.
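The bandwidth search described here can be sketched as a grid search that evaluates GCV for each candidate pair $(h_1, h_2)$ and keeps the smallest value. The Python code below is a hedged illustration only: the study used R, and the candidate grid, the synthetic data, and the function names are assumptions. The constant factors $1/(h_j\sqrt{2\pi})$ are omitted because they cancel when each row of weights is normalised.

```python
import numpy as np


def hat_matrix(X, h):
    """Row-normalised Gaussian product-kernel weights, shape (n, n)."""
    X = np.asarray(X, dtype=float)
    h = np.asarray(h, dtype=float)
    u = (X[:, None, :] - X[None, :, :]) / h
    k = np.exp(-0.5 * np.sum(u ** 2, axis=2))  # diagonal entries are exactly 1
    return k / k.sum(axis=1, keepdims=True)


def gcv(X, y, h):
    """GCV(h) = MSE(h) / (tr(I - X(h)) / n)^2 for the weights above."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    H = hat_matrix(X, h)
    mse = np.mean((y - H @ y) ** 2)
    return mse / ((n - np.trace(H)) / n) ** 2


def grid_search(X, y, grid1, grid2):
    """Return (minimum GCV, (h1, h2)) over the Cartesian grid.
    Candidates whose GCV is not a finite number never win the comparison."""
    best_score, best_h = np.inf, None
    for h1 in grid1:
        for h2 in grid2:
            score = gcv(X, y, (h1, h2))
            if score < best_score:
                best_score, best_h = score, (h1, h2)
    return best_score, best_h


# Synthetic stand-in for the 48 lagged observations (not the ROR data).
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(48, 2))
y_demo = np.sin(2 * np.pi * X_demo[:, 0]) + 0.5 * X_demo[:, 1] \
    + 0.1 * rng.normal(size=48)
grid = np.linspace(0.05, 1.0, 20)  # assumed candidate bandwidths
print(grid_search(X_demo, y_demo, grid, grid))
```

A grid is the simplest choice and its resolution limits how precisely the optimum is located; a numerical optimizer could refine the result.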
The estimated curve of the nonparametric kernel model is written in matrix form in Equation 8:
$$\hat{\mathbf{y}} = X(h)\,\mathbf{y} \tag{8}$$
with $\hat{\mathbf{y}}$ the vector of the 48 estimated values, $\mathbf{y}$ the vector of the 48 observed values, and $X(h)$ the weighting matrix whose entries are the weights $W_{hi}(x_j)$.
Furthermore, the coefficient of determination ($R^2$), a criterion for the model's goodness, was used to assess the accuracy of the estimate. The value of $R^2$ indicates how well the resulting model describes the variability of the data. Based on Equation 6, the coefficient of determination is 0.8755, which indicates that the independent variables used in the Nadaraya-Watson kernel estimator explain 87.55% of the variability of the dependent variable. The model obtained with the optimum bandwidth can be used for forecasting. In this study, the out-sample data are the 2021 data, with $i = 1, 2, 3, \ldots, 12$. Figure 4 compares the actual and estimated data; the estimated values are close to the actual values. The forecast accuracy is measured by a MAPE value of 5.4%. Because the MAPE value is less than 10%, the predictions can be regarded as very accurate. The analysis also shows that the value of ROR is influenced by ROR receipts in the previous month and 12 months before, which the relevant government agencies can take into consideration. Compared to the research conducted by Latipah et al. (2019), who predicted ROR using the Grey-Markov (1,1) model, the comparison of prediction accuracy is still better than the results obtained by the kernel nonparametric method. Similar research was also conducted by Fitriyani et al. (2023), who performed modeling and comparison of estimated kernel nonparametric regression curves.
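The two accuracy measures used above can be computed as in the following Python sketch. It assumes the common definitions $R^2 = 1 - SS_{res}/SS_{tot}$ and MAPE as the mean absolute relative error in percent; the function names and the small example numbers are illustrative assumptions, not values from this study.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def mape(y, y_hat):
    """Mean absolute percentage error in percent; y must not contain zeros."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return 100.0 * np.mean(np.abs((y - y_hat) / y))


# Illustrative numbers only: relative errors of 10%, 5% and 0%.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(mape(actual, forecast))       # 5.0
print(r_squared(actual, forecast))
```

MAPE is scale-free, which makes it convenient for comparing forecasts of revenue series of different magnitudes, but it is undefined when an actual value is zero.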

D. CONCLUSION AND SUGGESTION
Based on the analysis performed, it can be concluded that the kernel nonparametric regression model was obtained with optimum bandwidths $h_1$ = 0.212740452 and $h_2$ = 0.529682589 and a minimum GCV value of 1.769688931. Modeling ROR in Central Lombok with nonparametric kernel regression gave very accurate forecasts. The results of this study suggest that the Regional Revenue Management Agency for Central Lombok Regency consider the ROR data for the previous month and for 12 months earlier when determining ROR targets. In addition, future research is expected to use optimization algorithms, such as genetic algorithms, Particle Swarm Optimization (PSO), and Simulated Annealing (SA), to determine optimal bandwidth values in kernel approaches.