Robust Singular Value Decomposition Method on Minor Outlier Data

Article history: Received: 26-08-2020; Revised: 08-09-2020; Accepted: 09-09-2020

In multivariate statistics, the Singular Value Decomposition (SVD) of a data matrix containing outliers does not yield results that can be analyzed optimally. This study aims to handle outlier data using the Robust Singular Value Decomposition (RSVD) method and to compare it with the SVD method. The RSVD analysis involves several steps: determining an initial estimate of the vector u, performing the regression, normalizing the estimated vector β, and iterating until the results converge. The results indicate that, for data with minor outliers, the RSVD is not influenced by the initial estimates. The RSVD method is, however, strongly influenced by the number of outliers: the more extreme the outliers, the more iterations are needed.


A. INTRODUCTION
An outlier is a datum that deviates from the rest of the data (Neter, J., Wasserman, W., and Kutner, 1990). Outliers can be identified with a scatter plot or a boxplot in a statistics software package. In regression, outliers interfere with the fulfillment of the model assumptions, so the resulting model is unreliable. Likewise, in the multivariate case they lead to inaccurate interpretations and to errors in decisions based on the fitted model. For this reason, outliers are removed from the data as far as possible (Liu, Hawkins, Ghosh, & Young, 2003). For example, studies accommodating missing data include work on mortality data by (Zhang, L., Shen, H., Huang, 2013). The singular value decomposition method introduced in (Zhang, L., Marron, J.S., Shen, H., 2007) makes sequential estimates of the eigenvalues and of the left and right eigenvectors, ignores missing values, and is resistant to outliers. In multivariate statistics, outliers can be handled with the Singular Value Decomposition (SVD) method, although it often does not give the expected results or still leaves deviations in the data (Huber & Ronchetti, 2011). A better method is therefore needed; here, the author proposes the Robust Singular Value Decomposition (RSVD) method as a solution for handling outliers. The RSVD method is formulated to minimize the problems caused by eliminating outliers, through a matrix approach to the data (Liu et al., 2003). It uses a regression approach, carrying out an iteration process for each of the eigenvalues and eigenvectors. This study aims to obtain the estimates of the RSVD method and to compare them with those of the SVD method.

B. LITERATURE REVIEW
1. Eigenvalue and Eigenvector
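If $A$ is an $n \times n$ matrix, then a nonzero vector $\mathbf{x}$ in $\mathbb{R}^n$ is called an eigenvector of $A$ if $A\mathbf{x}$ is a scalar multiple of $\mathbf{x}$, that is, if it satisfies equation (1) (Aa, Morsche, & Mattheij, 2007):

$$A\mathbf{x} = \lambda \mathbf{x} \qquad (1)$$

for some scalar $\lambda$. The scalar $\lambda$ is called an eigenvalue of $A$, and $\mathbf{x}$ is the eigenvector corresponding to $\lambda$ (Anton, 1987).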
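As a quick numerical illustration of equation (1), the sketch below, assuming NumPy, uses `np.linalg.eig` on a small hypothetical matrix and checks that each returned pair satisfies $A\mathbf{x} = \lambda\mathbf{x}$. The matrix is illustrative only and is not taken from the study's data.

```python
import numpy as np

# A small symmetric matrix chosen only for illustration (not the study's data).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns are eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

for lam, x in zip(eigenvalues, eigenvectors.T):
    # Equation (1): A x should equal lambda x up to floating-point error.
    residual = np.linalg.norm(A @ x - lam * x)
    print(f"lambda = {lam:.4f}, residual = {residual:.2e}")
```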

Singular Value
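The singular values of an $m \times n$ matrix $A$ are the square roots of the eigenvalues of the symmetric matrix $A^{T}A$. They are denoted by $\sigma_1, \sigma_2, \ldots, \sigma_n$ and arranged in the order $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$ (Locantore et al., 1999).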
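The relation between the singular values and the eigenvalues of $A^{T}A$ can be checked numerically. This is a minimal sketch assuming NumPy; the 4×3 matrix is hypothetical.

```python
import numpy as np

# Illustrative 4x3 matrix (hypothetical values).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# Singular values from the SVD, returned in descending order.
sigma = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of the symmetric matrix A^T A (eigvalsh returns them ascending).
lam = np.linalg.eigvalsh(A.T @ A)
# Clip tiny negative round-off before the square root, then sort descending.
sigma_from_eig = np.sqrt(np.clip(lam, 0.0, None))[::-1]

print(sigma)
print(sigma_from_eig)
```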

Singular Value Decomposition
Singular Value Decomposition is a method that can be applied to any matrix of size $m \times n$, whether or not the matrix is invertible, that is, whether or not it has full rank. A matrix $A$ of size $m \times n$ is decomposed as in equation (2):

$$A = U \Sigma V^{T} \qquad (2)$$

with:
$U$ = an orthogonal matrix of size $m \times m$ (the columns of $U$ are the eigenvectors of $AA^{T}$),
$V$ = an orthogonal matrix of size $n \times n$ (the columns of $V$ are the eigenvectors of $A^{T}A$),
$\Sigma$ = a diagonal matrix of size $m \times n$ whose non-negative diagonal elements $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$ are called the singular values,
where the singular values of $A$ are $\sigma_i = \sqrt{\lambda_i}$, with $\lambda_i$ the eigenvalues of $A^{T}A$ (Bretscher, 1997).

The robust method is a statistical procedure that is not very sensitive to deviations from the underlying assumptions. In research, outliers are often encountered, and they can produce large errors (Bali, Boente, Tyler, & Wang, 2011). In such cases the assumptions of regression analysis are not fulfilled, so the interpretation of the model becomes wrong if the Ordinary Least Squares (OLS) method is still applied directly (Huber & Ronchetti, 2011).
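Equation (2) can be verified numerically. The sketch below, assuming NumPy, calls `np.linalg.svd` with `full_matrices=True` so that $U$ is $m \times m$ and $V^{T}$ is $n \times n$, rebuilds the $m \times n$ matrix $\Sigma$ from the singular values, and checks both the decomposition and the orthogonality of $U$ and $V$. The 2×3 matrix is a hypothetical example.

```python
import numpy as np

# Hypothetical 2x3 matrix (m = 2, n = 3).
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])
m, n = A.shape

# full_matrices=True gives U (m x m) and V^T (n x n), as in equation (2).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Place the singular values on the diagonal of an m x n matrix Sigma.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

A_rebuilt = U @ Sigma @ Vt
print(np.allclose(A, A_rebuilt))          # equation (2) holds
print(np.allclose(U.T @ U, np.eye(m)))    # U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(n)))  # V is orthogonal
```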
To handle outliers, regression analysis can be carried out with coefficient estimation methods that are proven to be robust against outliers, namely a development of the Alternating Least Squares (ALS) method. This method uses a regression approach, iterating the eigenvalues and eigenvectors of an $m \times n$ matrix (Ren, Li, & Haupt, 2017).
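The alternating idea behind ALS can be sketched for the leading eigen triple: fix $\mathbf{u}$ and regress each column of the matrix on it to estimate $\mathbf{v}$, normalize, then fix $\mathbf{v}$ and regress each row on it to re-estimate $\mathbf{u}$. With least squares this amounts to a power iteration and recovers the first singular triple of the classical SVD; robust variants replace the least-squares fits with a robust regression. The helper `als_rank_one`, its starting vector, and its stopping rule are hypothetical choices for illustration, not the procedure of the cited work.

```python
import numpy as np

def als_rank_one(X, n_iter=500, tol=1e-12):
    """Leading eigen triple (lam, u, v) of X by alternating least squares."""
    m, n = X.shape
    u = np.ones(m) / np.sqrt(m)          # hypothetical starting vector
    v = np.zeros(n)
    for _ in range(n_iter):
        # Fix u: the LS coefficient of column x_j on u is (u . x_j) / (u . u).
        v_new = X.T @ u / (u @ u)
        v_new /= np.linalg.norm(v_new)   # normalize the estimated vector
        # Fix v: since ||v|| = 1, the LS coefficient of row x_i on v is x_i . v.
        u = X @ v_new
        done = np.linalg.norm(v_new - v) < tol
        v = v_new
        if done:
            break
    lam = np.linalg.norm(u)              # the scale left in u is the singular value
    return lam, u / lam, v

X = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 4.0],
              [2.0, 2.0, 2.0]])          # hypothetical data
lam, u, v = als_rank_one(X)
print(lam, np.linalg.svd(X, compute_uv=False)[0])  # the two values should agree
```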

Outlier Data
Outliers arise from errors in recording, in data processing, or in regression modeling. They increase the variability of the data and can distort the predictive model, so the resulting regression model cannot be relied on: an outlier pulls the estimated regression line disproportionately toward itself (Härdle & Simar, 2015).
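The pull described above can be seen with a least-squares line fitted to hypothetical data lying exactly on $y = 2x + 1$, before and after a single observation is corrupted. This sketch assumes NumPy's `np.polyfit`; the numbers are invented for illustration.

```python
import numpy as np

# Hypothetical data lying exactly on y = 2x + 1.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

slope_clean, intercept_clean = np.polyfit(x, y, deg=1)

# Corrupt one observation at the end of the range (its true value is 19).
y_out = y.copy()
y_out[-1] = 60.0

slope_out, intercept_out = np.polyfit(x, y_out, deg=1)

print(f"clean:   slope={slope_clean:.3f}, intercept={intercept_clean:.3f}")
print(f"outlier: slope={slope_out:.3f}, intercept={intercept_out:.3f}")
```

With a single outlier at the end of the range, the fitted slope roughly doubles.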
When an outlier is extreme enough, it is often removed from the data. When outlier values are less extreme, researchers often hesitate over whether to exclude them.
Outliers are divided into two types: major outliers and minor outliers. Major outliers lie more than $3\,\mathrm{IQR}$ (3 times the interquartile range) beyond the quartiles, while minor outliers lie between $1.5\,\mathrm{IQR}$ and $3\,\mathrm{IQR}$ beyond the quartiles (Filzmoser & Gregorich, 2020).
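The quartile-based rule can be sketched as a small classifier. This assumes NumPy's default linear interpolation in `np.percentile` for the quartiles $Q_1$ and $Q_3$, with $\mathrm{IQR} = Q_3 - Q_1$, and treats a value exactly $3\,\mathrm{IQR}$ beyond a quartile as minor; other quartile conventions can shift the fences slightly. The helper name `classify_outliers` and the sample values are hypothetical.

```python
import numpy as np

def classify_outliers(data):
    """Label each value 'normal', 'minor' or 'major' using IQR fences."""
    data = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    # Distance beyond the nearer quartile; zero for values inside [q1, q3].
    excess = np.maximum(q1 - data, 0.0) + np.maximum(data - q3, 0.0)
    labels = np.full(data.shape, "normal", dtype=object)
    labels[(excess > 1.5 * iqr) & (excess <= 3.0 * iqr)] = "minor"
    labels[excess > 3.0 * iqr] = "major"
    return labels

values = [10, 11, 12, 12, 13, 14, 15, 21, 40]   # hypothetical sample
print(classify_outliers(values))
```

Here $Q_1 = 12$ and $Q_3 = 15$, so 21 (6 above $Q_3$) falls in the minor band and 40 in the major band.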

C. RESEARCH METHOD
In this study, the authors used data containing minor outliers, as presented in Table 1.
6. Perform an iterative process to obtain convergent results.
7. After obtaining the first eigen triple, calculate the error (residual) matrix to find the next eigen triple, $E = X - \lambda_1 \mathbf{u}_1 \mathbf{v}_1^{T}$.
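Steps 6 and 7 can be sketched under explicit assumptions. Each eigen triple is estimated here by alternating L1 (least absolute deviation) regressions, the robust regression used for the robust SVD of Liu et al. (2003); an L1 fit through the origin reduces to a weighted median. The initial $\mathbf{u}$ is taken from the classical SVD, iteration stops once the estimated $\lambda$ stops changing, and the residual $E = X - \lambda \mathbf{u}\mathbf{v}^{T}$ is used to estimate the next triple. All function names, the starting value, and the tolerance are hypothetical; this is not claimed to be the exact procedure or settings of this study.

```python
import numpy as np

def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

def l1_coef(y, x):
    """b minimizing sum |y_i - b * x_i| (L1 regression through the origin)."""
    mask = np.abs(x) > 1e-12
    if not mask.any():
        return 0.0
    # sum |y_i - b x_i| = sum |x_i| * |y_i / x_i - b|, so b is a weighted median.
    return weighted_median(y[mask] / x[mask], np.abs(x[mask]))

def robust_rank_one(X, n_iter=100, tol=1e-10):
    """One eigen triple (lam, u, v) by alternating L1 regressions."""
    m, n = X.shape
    u = np.linalg.svd(X)[0][:, 0]        # assumed initial u: classical SVD
    lam_old = 0.0
    for it in range(1, n_iter + 1):
        v = np.array([l1_coef(X[:, j], u) for j in range(n)])
        if np.linalg.norm(v) == 0:
            return 0.0, u, v, it
        v = v / np.linalg.norm(v)        # normalize the estimated vector
        b = np.array([l1_coef(X[i, :], v) for i in range(m)])
        lam = np.linalg.norm(b)
        if lam == 0:
            return 0.0, u, v, it
        u = b / lam
        if abs(lam - lam_old) <= tol * lam:   # step 6: stop once converged
            break
        lam_old = lam
    return lam, u, v, it

def robust_svd(X, k):
    """Estimate k eigen triples, deflating the residual each time (step 7)."""
    E = np.asarray(X, dtype=float).copy()
    triples = []
    for _ in range(k):
        lam, u, v, iters = robust_rank_one(E)
        triples.append((lam, u, v, iters))
        E = E - lam * np.outer(u, v)     # residual used for the next triple
    return triples

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 1.0, -1.0])
X = np.outer(a, b)                       # hypothetical rank-one data
lam, u, v, iters = robust_svd(X, k=1)[0]
print(lam, iters)
```

A clean rank-one matrix is recovered exactly; with contaminated cells, the weighted medians tend to limit the influence of individual cells on $\mathbf{u}$ and $\mathbf{v}$.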

D. RESULTS AND DISCUSSION
In the case of minor outliers, a mean-corrected data matrix $X$ is used. The Singular Value Decomposition gives the singular values of $X$. The orthogonal matrix whose columns are the eigenvectors of $XX^{T}$ is denoted by $U$, and the orthogonal matrix whose columns are the eigenvectors of $X^{T}X$ is denoted by $V$; multiplying these back gives the matrix $\hat{X}_{\mathrm{SVD}} = U \Sigma V^{T}$.

The robust singular value decomposition is computed iteratively until the results converge. The first eigen triple converged after three iterations. Continuing the analysis, the fifth eigen triple was obtained and converged after four iterations. The matrices obtained through the robust singular value decomposition are then multiplied back together to give the reconstructed matrix $\hat{X}_{\mathrm{RSVD}}$.
For the data with minor outliers, the reconstructed matrix $\hat{X}_{\mathrm{RSVD}}$ obtained through the robust singular value decomposition method is the same as the actual data matrix $X$.

E. CONCLUSION AND SUGGESTION
Based on the analysis with the singular value decomposition and robust singular value decomposition methods, applied to sample data on the nutrient content of organic fertilizers containing minor outliers, the robust singular value decomposition method gives better results than the singular value decomposition method. Further research should examine the RSVD method on data with major outliers.