Factors Affecting Dissolved Oxygen at Bengawan Solo River: A Spatial Filtering with Eigenvector Technique

River water quality changes with the development of the surrounding environment, which is shaped by various human activities. Analysis of the factors affecting Dissolved Oxygen (DO) in the Bengawan Solo River is crucial for river management and pollution control. Previous research suggested the use of classical multiple linear regression. However, DO measurements are usually taken at sampling sites along the river channel, so there is a high chance that the measurements are spatially correlated. As a consequence, applying multiple linear regression to such a dataset can be inappropriate. In this paper, we applied a modification of the multiple linear regression model that accounts for the spatial autocorrelation in the data by adding eigenvectors as control variables, a technique known as Spatial Filtering with Eigenvector (SFE). The results showed that nitrate and nitrite were the predictor variables with a negative and significant effect on DO. However, the residuals of the ordinary regression model contained spatial autocorrelation. Applying the SFE technique by adding three eigenvectors as control variables succeeded in making the model residuals free from spatial autocorrelation. However, a new problem arose: the homoscedasticity assumption was violated.


A. INTRODUCTION
Rivers are a water source with many benefits for human life. They serve household needs, tourism, fisheries, and irrigation, and can also be used for transportation (Iloms et al., 2020). The river channel itself can also be used for human needs (Wurtsbaugh et al., 2019): river water can be channelled to agricultural land for irrigation, to rice fields, and to fish ponds. However, the use of river water must meet the water quality criteria appropriate to its use (Musa et al., 2019).
River water quality changes with the development of the surrounding environment, which is shaped by various human activities. The more activities there are along the riverbanks, the higher the potential for water quality degradation (Arnop et al., 2019). The water quality of most rivers in Indonesia is categorized as polluted, mainly due to dense settlements and the large number of human activities (Hamakonda et al., 2019). This condition can disrupt the river ecosystem: fish die, the water becomes discolored and produces odors, and human health problems arise (Hertika et al., 2018). One of the important indicators of river water quality is Dissolved Oxygen (DO). Oxygen is needed for the respiration and metabolism of aquatic organisms (Lusiana et al., 2021). Dissolved oxygen is consumed in the decomposition of waste discharged into the water, so a large amount of waste lowers DO levels (Sandi et al., 2017). This can harm aquatic organisms, for example by stressing fish. A DO of 0.3-1 mg/L will kill fish if it persists for a long time, while a DO of 1-5 mg/L will slow fish growth (Arend et al., 2011).
The Bengawan Solo River is one of the longest rivers on the island of Java (600 km). It crosses two provinces, Central Java and East Java, with a river basin area of 16,000 km². The area drained by this river covers 9 districts/municipalities in Central Java and 11 districts/municipalities in East Java (Dani et al., 2015). Water quality in the Bengawan Solo River, from upstream to downstream, has declined from year to year (Sutriati, 2012). Both the upstream and downstream parts of the river physically fail to meet the requirements for clean water, as indicated by an unpleasant smell, yellow-black water, and the amount of garbage on the riverbanks (Astuti, 2015). In addition, the river water is used by the community for fishing, agriculture, industry, domestic activities, and other purposes (Dani et al., 2015). These activities lower water quality, and development along the river leaves waste disposal uncontrolled (Sutriati, 2012).
DO modelling of the Bengawan Solo River is crucial for river management and pollution control. Previous research suggested the use of the classical multiple linear regression approach to identify factors affecting DO in rivers (Imran et al., 2014; Ouma et al., 2020; Prasad et al., 2014). However, DO and other water quality factors are usually measured at sampling sites along the river channel, so there is a high chance that the measurements are spatially correlated (Pratiwi et al., 2018). As a consequence, applying multiple linear regression to such a dataset can be inappropriate (Thayn and Simanis, 2013). This is because multiple linear regression rests on several assumptions that should be met, such as normality, homoscedasticity, non-multicollinearity, and non-autocorrelation (Thayn, 2017). Spatial connectivity among water quality measurements violates the non-autocorrelation assumption and thus makes the results of linear regression analysis invalid (Blanchet et al., 2008). In this paper, we applied a modification of the multiple linear regression model that accounts for the spatial autocorrelation in the data by adding eigenvectors as control variables to the model (Getis and Griffith, 2002). The technique is known as spatial filtering with eigenvector (SFE).

B. LITERATURE REVIEW
1. Linear Regression Model
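Linear regression analysis is a statistical method used to determine the effect of one or more independent variables on one dependent variable (Laake and Fagerland, 2015). The general linear regression model is stated in equation (1):
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, \dots, n \tag{1}$$
where $y_i$ is the value of the dependent variable for the $i$-th observation, $x_{ik}$ is the value of the $k$-th independent variable for the $i$-th observation, $\beta_0, \beta_1, \dots, \beta_p$ are the regression parameters, and $\varepsilon_i$ is the error term. The matrix form can be written as in equation (2):
$$Y = X\beta + \varepsilon \tag{2}$$
Parameter estimation of the linear regression model is usually done by Ordinary Least Squares (OLS), with the formula $\hat{\beta} = (X^T X)^{-1} X^T Y$ (Lusiana et al., 2019).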
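As an illustration of the OLS estimator, the following minimal sketch fits a regression on synthetic data with NumPy. The data, the coefficient values, and the function name `ols_fit` are assumptions made for this example only; they are not the Bengawan Solo dataset or the software used in this study.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients for y = X @ beta + eps.

    X is an (n, p) design matrix that already contains an intercept column.
    """
    # lstsq solves the least-squares problem without explicitly
    # inverting X^T X, which is numerically safer
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic illustration only (hypothetical values, not the river data)
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 6.0 - 1.5 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

beta_hat = ols_fit(X, y)

# The same estimate from the textbook normal equations (X^T X) beta = X^T y
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals used by the diagnostic tests
residuals = y - X @ beta_hat
```

Both routes give the same coefficients on well-conditioned data; the least-squares solver is generally preferred when predictors are strongly correlated.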

Residual Diagnostics Test
Linear regression is a popular data analysis technique despite its simple structure (Altman and Krzywinski, 2016; Zeileis and Hothorn, 2002). Whether the linear regression assumptions are fulfilled is validated by performing diagnostic tests on the model residuals (Cox, 2004). For linear regression inference in particular, the important residual diagnostics are normality, homoscedasticity and, for geographic data, non-autocorrelation of the residuals. In addition, the independent variables are assumed to be free of multicollinearity (Thayn and Simanis, 2013).
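To make the homoscedasticity check concrete, the sketch below computes a Breusch-Pagan type statistic with NumPy. It uses Koenker's studentized variant (the sample size times the R² of an auxiliary regression of squared residuals on the predictors), which is an assumption of this example and may differ from the exact variant applied in this study. The data and the function names are hypothetical.

```python
import numpy as np

def breusch_pagan_lm(X, resid):
    """Studentized Breusch-Pagan LM statistic, n * R^2.

    X     : (n, p) design matrix including the intercept column
    resid : OLS residuals of the model being checked
    """
    X = np.asarray(X, dtype=float)
    e2 = np.asarray(resid, dtype=float) ** 2
    n = X.shape[0]
    sst = np.sum((e2 - e2.mean()) ** 2)
    if sst == 0.0:
        # Squared residuals are constant: no evidence of heteroscedasticity
        return 0.0
    # Auxiliary regression of squared residuals on the predictors
    gamma, *_ = np.linalg.lstsq(X, e2, rcond=None)
    ssr = np.sum((e2 - X @ gamma) ** 2)
    return n * (1.0 - ssr / sst)

def ols_residuals(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Synthetic illustration only (hypothetical values)
rng = np.random.default_rng(42)
n = 500
x = rng.uniform(0.5, 5.0, size=n)
X = np.column_stack([np.ones(n), x])

y_homo = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)  # constant error variance
y_hetero = 2.0 + 0.5 * x + rng.normal(size=n) * x       # error spread grows with x

bp_homo = breusch_pagan_lm(X, ols_residuals(X, y_homo))
bp_hetero = breusch_pagan_lm(X, ols_residuals(X, y_hetero))
```

Under the null hypothesis of constant variance, the statistic is compared with a chi-square distribution whose degrees of freedom equal the number of predictors excluding the intercept (here 1, with a 5% critical value of 3.841).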

Spatial Autocorrelation: Global Moran's I
The global Moran's I statistic is a measure used to indicate the presence of spatial autocorrelation globally (Getis, 1995; Moran, 1950). It is essentially the cross product between a variable and its spatially weighted counterpart, with the variable expressed as deviations from its mean, as shown in equation (3):
$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{3}$$
where:
$I$ = global Moran's I statistic
$n$ = sample size
$w_{ij}$ = spatial weight between observations $i$ and $j$
$x_i$ = $i$-th observed value
$\bar{x}$ = mean of the observed values
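The statistic can be computed directly from its definition. The following sketch is an illustration under stated assumptions: seven hypothetical sites arranged in a chain, each connected to its immediate upstream and downstream neighbour with binary weights. Neither the values nor the weights are taken from the study data.

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I for values x and spatial weight matrix W."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    n = x.size
    z = x - x.mean()   # deviations from the mean
    num = z @ W @ z    # sum_i sum_j w_ij * z_i * z_j
    den = z @ z        # sum_i z_i^2
    return (n / W.sum()) * (num / den)

# Hypothetical chain of 7 sites: each site neighbours the adjacent ones
n_sites = 7
W = np.zeros((n_sites, n_sites))
idx = np.arange(n_sites - 1)
W[idx, idx + 1] = 1.0
W[idx + 1, idx] = 1.0

# A smooth downstream trend gives positive spatial autocorrelation
trend = np.arange(1.0, 8.0)
i_trend = morans_i(trend, W)

# Alternating high/low values give negative spatial autocorrelation
alternating = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0])
i_alt = morans_i(alternating, W)
```

For the linear trend the statistic equals 2/3, while the alternating pattern yields a negative value, which matches the usual reading of Moran's I as positive for clustered and negative for dispersed patterns.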

Spatial Filtering with Eigenvector (SFE)
Spatial filtering is a method that produces dummy spatial variables, which are used as additional independent variables in the regression model. This addresses model specification errors. Spatial data can thus be handled appropriately by the regression model and its diagnostic statistics, making the regression results more robust and easier to interpret (Thayn and Simanis, 2013).
SFE is a spatial filtering technique that creates a set of dummy spatial patterns by finding the eigenvectors associated with the independent variables of the linear regression model and the spatial weighting matrix (Getis and Griffith, 2002). The eigenvectors are extracted from the matrix $MCM$, where $M = I - X(X^T X)^{-1} X^T$ and $C$ is the spatially weighted connectivity matrix. The combination of eigenvectors to include is then selected by a step-wise regression procedure (Thayn, 2017). Finally, the linear regression model with SFE is presented in equation (4):
$$Y = X\beta + E\gamma + \eta \tag{4}$$
where $E$ is the matrix of selected eigenvectors, $\gamma$ is the corresponding coefficient vector, $E\gamma$ is the misspecification term, and $E\gamma + \eta$ is equivalent to $\varepsilon$ in equation (2). By substituting $X_{mod}\beta_{mod} = X\beta + E\gamma$, equation (4) can be rewritten as equation (5):
$$Y = X_{mod}\beta_{mod} + \eta \tag{5}$$
Therefore, the parameters can be estimated using the OLS method: $\hat{\beta}_{mod} = (X_{mod}^T X_{mod})^{-1} X_{mod}^T Y$.
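The eigenvector extraction and the augmented regression can be sketched as follows. This is a minimal illustration on synthetic data, not the procedure as run in this study: the chain-shaped connectivity matrix, the data, and the function names are assumptions, and the step-wise selection is replaced here by simply taking the `k` eigenvectors with the largest eigenvalues.

```python
import numpy as np

def sfe_eigenvectors(X, C):
    """Eigen-decompose M C M with M = I - X (X^T X)^{-1} X^T.

    Returns eigenvalues and eigenvectors sorted from the largest
    eigenvalue (strongest positive spatial pattern) to the smallest.
    C must be symmetric.
    """
    n = X.shape[0]
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    MCM = M @ C @ M
    MCM = 0.5 * (MCM + MCM.T)  # remove tiny asymmetries from rounding
    eigvals, eigvecs = np.linalg.eigh(MCM)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def morans_i(x, W):
    z = x - x.mean()
    return (x.size / W.sum()) * (z @ W @ z) / (z @ z)

# Synthetic illustration only: 30 hypothetical sites along a chain
rng = np.random.default_rng(1)
n = 30
C = np.zeros((n, n))
idx = np.arange(n - 1)
C[idx, idx + 1] = 1.0
C[idx + 1, idx] = 1.0

x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
smooth = np.sin(np.linspace(0.0, 3.0 * np.pi, n))  # spatially structured error
y = 6.0 - 1.0 * x1 + smooth + rng.normal(scale=0.2, size=n)

# Ordinary regression
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_ols = y - X @ beta_ols

# Spatially filtered regression: append k eigenvectors as control variables
eigvals, eigvecs = sfe_eigenvectors(X, C)
k = 3  # assumption for this sketch; the text selects eigenvectors step-wise
E = eigvecs[:, :k]
X_mod = np.column_stack([X, E])
beta_mod, *_ = np.linalg.lstsq(X_mod, y, rcond=None)
resid_sfe = y - X_mod @ beta_mod

# Residual spatial autocorrelation before and after filtering
moran_before = morans_i(resid_ols, C)
moran_after = morans_i(resid_sfe, C)
```

Because eigenvectors of $MCM$ with non-zero eigenvalues lie in the space orthogonal to the columns of $X$, appending them leaves the coefficient estimates of the original predictors unchanged; only the residuals and the standard errors change.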

C. RESEARCH METHOD
The study used secondary data collected from the Bengawan Solo River Basin Center (BBWS) for the 2016-2020 period. The monitoring sites in this study were those observed by the BBWS agency. There were 7 sampling sites, as presented in Figure 1. We used several water quality variables, which are presented in Table 1; these include chemical oxygen demand (COD, mg/L) and biological oxygen demand (BOD, mg/L). The data analysis steps in this study are as follows:
1. Correlation analysis to select the independent variables that have a significant association with the dependent variable
2. Estimation of the linear regression model parameters using the OLS method
3. Diagnostic tests on the linear regression residuals, namely spatial autocorrelation (Global Moran's I test), normality (Shapiro-Wilk test), and heteroscedasticity (Breusch-Pagan test)
4. Formation of the spatial weight matrix (k-nearest neighbour)
5. Extraction of the eigenvectors associated with the independent variables and the spatial weighting matrix
6. Step-wise selection of eigenvector combinations
7. Estimation of the regression model parameters with spatial filtering (addition of the selected eigenvector combinations)
8. Diagnostic tests on the residuals of the SFE linear regression model
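The k-nearest-neighbour spatial weight matrix in step 4 can be sketched as below. The coordinates are hypothetical positions for seven sites (not the actual BBWS locations), `k = 2` is an assumed value, and the symmetrization step is an assumption added so that the matrix can be used in a symmetric eigen-decomposition.

```python
import numpy as np

def knn_weights(coords, k, symmetric=True):
    """Binary k-nearest-neighbour spatial weight matrix.

    coords    : (n, 2) array of site coordinates
    k         : number of nearest neighbours per site
    symmetric : if True, i and j are linked when either is among
                the other's k nearest neighbours
    """
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a site is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, nearest.ravel()] = 1.0
    if symmetric:
        W = np.maximum(W, W.T)
    return W

# Hypothetical site positions, in km along the channel
coords = np.array([
    [0.0, 0.0], [10.0, 0.0], [25.0, 0.0], [45.0, 0.0],
    [70.0, 0.0], [100.0, 0.0], [135.0, 0.0],
])

W_knn = knn_weights(coords, k=2, symmetric=False)  # raw kNN, may be asymmetric
W_sym = knn_weights(coords, k=2)                   # symmetrized version
```

A raw k-nearest-neighbour matrix is generally not symmetric, because site A can be among the nearest neighbours of site B without the reverse being true; whether and how to symmetrize it is a modelling choice.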

D. RESULT AND DISCUSSION
1. Association Between Variables
The degree of association between the independent variables and the dependent variable (DO) in this research is shown in Figure 2. DO is significantly correlated with pH, NO3, NO2, COD and BOD. Therefore, these variables were used in the linear regression model.

OLS Linear Regression Model of DO
Table 2 presents the linear regression model of DO measured in the Bengawan Solo River. NO2 and NO3 significantly affect DO at the 0.01 and 0.10 levels, respectively. Furthermore, both NO2 and NO3 have negative coefficients, which implies that the higher the NO3 and NO2, the lower the DO value. Further diagnostics on the model residuals are presented in Table 3. The residual diagnostics in Table 3 and Table 4 show that the residuals of the linear regression model contain spatial autocorrelation, with a positive Moran coefficient. On the other hand, the normality test showed that the model residuals are normally distributed, and the Breusch-Pagan test showed that the residual variance was constant (homoscedastic). In addition, no independent variable had a VIF value greater than 10, meaning that the non-multicollinearity assumption was met.
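The VIF criterion used above can be illustrated with a short NumPy sketch. Each predictor is regressed on the remaining predictors, and the VIF is $1/(1 - R_j^2)$. The synthetic predictors and the function name `vif` are assumptions for this example and do not reproduce the study's values.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.

    X : (n, p) matrix of predictors WITHOUT the intercept column.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on an intercept plus all other predictors
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic illustration only (hypothetical predictors)
rng = np.random.default_rng(7)
n = 200
a = rng.normal(size=n)
b = rng.normal(size=n)                  # unrelated to a
c = a + rng.normal(scale=0.1, size=n)   # nearly a copy of a

vif_values = vif(np.column_stack([a, b, c]))
# Flag predictors above the threshold of 10 used in the text
flagged = vif_values > 10
```

In this synthetic case the two nearly identical predictors are flagged while the unrelated one is not, which is the pattern the threshold of 10 is meant to detect.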

SFE Linear Regression Model
Because the residual diagnostic tests of the linear regression model indicated spatial autocorrelation, the SFE technique was applied. This technique produced three eigenvectors, which were added as independent variables in the linear regression model, as shown in Table 5. The number of eigenvectors incorporated in the model was based on the eigenvector spatial pattern in Figure 3: at the third eigenvector, the Moran's I value began to stabilize around zero. Therefore, we added three eigenvectors to the linear regression model.
Note: significant at level 0.05 (), 0.01 (*), 0.001 (**), 0.0000 (***)
The SFE regression model did not change the coefficient estimates of the independent variables compared with classical linear regression. Furthermore, the results of the residual diagnostic tests for the SFE linear regression model are presented in Table 6 and Table 7. Based on these tables, the spatial autocorrelation has been successfully resolved. However, heteroscedasticity emerged in this model, whereas in the standard model the homoscedasticity assumption was met.

E. CONCLUSION AND SUGGESTION
Modelling dissolved oxygen in the Bengawan Solo River with linear regression shows that nitrate and nitrite were the predictor variables with a negative and significant effect. However, the residual diagnostic tests show that this model contains spatial autocorrelation. Applying the SFE technique by adding three eigenvectors as control variables succeeded in making the model residuals free from spatial autocorrelation. However, a new problem arose: the homoscedasticity assumption was violated. Therefore, further investigation of the SFE technique in linear regression models is recommended to find ways of satisfying this assumption.