Response Surface Regression with LTS and MM-Estimator to Overcome Outliers on Red Roselle Flowers

Article history: Accepted: 05-09-2020; Revision: 18-01-2021; Approved: 21-01-2021

The response surface method is similar to regression analysis in that it estimates the parameters of the response function with the Ordinary Least Squares (OLS) method. Unfortunately, the OLS method has a drawback: it is sensitive to violations of its assumptions caused by outliers. One solution to the outlier problem is robust regression. Many robust parameter estimation methods exist, but this study uses the Least Trimmed Square (LTS) and MM-estimator methods because both have a high breakdown point of nearly 50%. The response variables studied were red roselle plant height (Y1) and red roselle flower weight (Y2). The independent variables were the soil moisture factor (X1) and the NPK fertilizer application factor (X2). The purpose of this study is to estimate the response surface regression parameters using the LTS and MM-estimator methods on data that contain outliers. For both responses, the analysis shows that the best model is the one obtained with the LTS estimation method. The plant height model has an R-Square value of 98.27% with an error of 1.243. The red roselle flower weight model has an R-Square value of 97.31% with an error of 0.6632.

Keywords: Response Surface Regression; Outliers; Least Trimmed Square (LTS); MM-estimator

This is an open access article under the CC BY-SA license.
DOI: https://doi.org/10.30812/varian.v4i2.882


A. INTRODUCTION
Experiments are carried out in every field of science and technology. Their purpose is to obtain new information and insight from the data they generate, which can then serve as material for further scientific study. The term experimental design is commonly used when experiments are discussed in mathematical terms. An experimental design is the pattern or form in which an experiment is carried out for observation, whether the observations are random or consecutive (Jensen, 2017). Within the study of experimental design, there is a statistical analysis method known as the response surface methodology, or response surface method.
The response surface method combines mathematics and statistics to model a response, usually the quality of a product, that is influenced by certain variables, with the aim of optimizing that response (Shemi & Procter, 2018). In other words, the method takes the results of a designed experiment and uses statistics to find the optimal value of a response. If the response variable y is influenced by n predictor variables $x_1, x_2, \ldots, x_n$, it can be assumed that y is adequately described by a polynomial regression model within a certain region (Peroumal et al., 2019). The response surface method was first introduced around 1951 and is still widely used in research, especially in the industrial sector. It is similar to regression analysis in that the parameters of the response function are estimated with the OLS method (Sawale et al., 2020).
The OLS method is commonly used to estimate the parameters of the second-order response surface equation. Unfortunately, the OLS method has a drawback: its assumptions are easily violated when the data contain outliers. Outliers can inflate both the error and the variance of the estimates (Zhang & Ato Xu, 2017). Rousseeuw & Hubert (2018) describe robust regression methods that can handle outliers in the data and produce a model that is resistant to them. Many robust parameter estimation methods exist, but the ones used in this study are the Least Trimmed Square (LTS) and MM-estimator methods. Previous research on LTS was conducted by Wulandari et al. (2013) and on the MM-estimator by Yuliana et al. (2014), who reported that each method has its own advantages and disadvantages. Both estimation methods are used here because they have high breakdown points of almost 50%. The breakdown point is a measure of the robustness of an estimator (Rousseeuw & Hubert, 2018).

B. LITERATURE REVIEW
This article applies response surface regression with the LTS and MM-estimator methods to overcome outliers in data on the plant height and flower weight of red roselle. Indonesians have known roselle since 1922 as a hedge, an ornamental plant, and a fiber-producing plant. Roselle is now in great demand in Indonesia because various products can be made from both its flowers and its stem fibers, so its cultivation has increased considerably. The commonly cultivated variety from the Malvaceae family is the roselle with red flower petals, with the Latin name H. sabdariffa var. sabdariffa (Winarti and Firdaus, 2010). The growth and development of roselle plants are complex and are influenced by both plant factors and environmental factors. Environmental factors usually include water, light, air humidity, temperature, planting distance, nutrient content, and so on. Two further environmental factors thought to affect the growth and flower yield of red roselle are the regulation of soil moisture and the dose of NPK fertilizer.
Because several factors are involved and they affect more than one response, regression analysis with the response surface method is used. This method is widely used to find the levels of the predictors that are thought to optimize the responses being studied. The advantage of the response surface method is that it does not require much data, so the optimum response conditions can be found quickly and at minimum cost (Yu et al., 2019). A first-order model is used in the initial experiment, but when the experimental region is close to the optimum response, a second-order model is widely applied because of the curvature in the surface (Ajala et al., 2017). Modeling in the response surface method therefore uses the second-order model, which is better suited to describing a curved response surface. The second-order response surface model is stated by Yu et al. (2019).
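For reference, the standard second-order response surface model for k predictors is usually written as follows. The symbols $\beta_0, \beta_i, \beta_{ii}, \beta_{ij}$ (regression coefficients) and $\varepsilon$ (random error) are conventional notation assumed here, not taken from the original equation.

$$
y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j + \varepsilon
$$

With k = 2 predictors this gives six coefficients (intercept, two linear terms, two quadratic terms, and one interaction term), which is the form of the fitted equations reported for plant height and flower weight.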
The second-order regression model was fitted to experimental data arranged in a Central Composite Design (CCD) (Calfee, R. and Piontkowski, 2016).
The data analysis begins with a literature review that examines the technical and theoretical problems encountered, based on the literature sources obtained. The first stage of the analysis itself is a descriptive statistical analysis of the data. The factors are then screened by inspecting data plots and the results of the regression model to see whether each variable has a linear or quadratic effect. Next, an analysis of variance (ANOVA) is performed and the interaction plot of the variables is examined, followed by the design of a response surface regression model using a second-order design of experiment (DOE) for the two responses studied.
The next step is to analyze the results of the response surface regression. The coefficients of the response surface regression equation are tested both simultaneously (as a whole) and partially (for each term) for the two responses, together with the ANOVA hypothesis test and the lack-of-fit test. The assumptions on the residuals of the two responses are then tested, and outliers are detected from the values of DFITS and Cook's Distance: an observation is treated as an outlier if (1) |DFITS| > 1 and (2) Cook's Distance > 4/n, where n is the number of observations (Rohmawati & Dwidayati, 2018). Once the outliers are known, the next step is to build a robust regression model using the LTS and MM-estimator methods.
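As an illustration of what a CCD looks like in coded units, the sketch below generates the factorial, axial, and center runs for two factors. The function name `central_composite_design`, the rotatable axial distance, and the number of center runs are assumptions made for this example; the run layout of the 18 observations analyzed in this study is not stated in the text.

```python
import itertools
import math


def central_composite_design(k=2, n_center=5, alpha=None):
    """Return the coded runs of a CCD for k factors as a list of tuples.

    alpha is the axial distance; by default the rotatable value
    (2**k) ** 0.25 is used (an assumption for this sketch).
    """
    if alpha is None:
        alpha = (2 ** k) ** 0.25
    # Factorial (corner) points at every combination of -1 and +1
    factorial = list(itertools.product((-1.0, 1.0), repeat=k))
    # Axial (star) points: one factor at +/- alpha, the others at 0
    axial = []
    for j in range(k):
        for level in (-alpha, alpha):
            point = [0.0] * k
            point[j] = level
            axial.append(tuple(point))
    # Replicated center points
    center = [tuple([0.0] * k) for _ in range(n_center)]
    return factorial + axial + center


design = central_composite_design(k=2, n_center=5)
for run in design:
    print(run)
```

For two factors this gives 4 factorial runs, 4 axial runs at a distance of about 1.414, and the chosen number of center runs.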
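The two outlier criteria above can be sketched with NumPy as follows. This is a minimal illustration on synthetic data, not the study's R code: the function name `influence_measures` and the toy data are assumptions, and DFITS is computed with the usual externally studentized residual formula.

```python
import numpy as np


def influence_measures(X, y):
    """Return (DFITS, Cook's distance) for an OLS fit of y on X.

    X is the (n, p) design matrix and must already contain the intercept column.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverages: diagonal of the hat matrix H = X X^+
    leverage = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    mse = resid @ resid / (n - p)
    # Leave-one-out error variance for externally studentized residuals
    s2_loo = ((n - p) * mse - resid**2 / (1 - leverage)) / (n - p - 1)
    t_ext = resid / np.sqrt(s2_loo * (1 - leverage))
    dffits = t_ext * np.sqrt(leverage / (1 - leverage))
    cooks = resid**2 * leverage / (p * mse * (1 - leverage) ** 2)
    return dffits, cooks


# Synthetic example: a straight line with one contaminated observation
x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x + 0.1 * (-1.0) ** np.arange(10)
y[9] += 15.0  # artificial outlier
X = np.column_stack([np.ones_like(x), x])

dffits, cooks = influence_measures(X, y)
n = len(y)
flag_dffits = np.abs(dffits) > 1   # criterion (1)
flag_cooks = cooks > 4 / n         # criterion (2)
print(np.where(flag_dffits)[0], np.where(flag_cooks)[0])
```

For a second-order response surface model, `X` would hold the columns 1, x1, x2, x1², x2², and x1·x2.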
The LTS method was suggested by Rousseeuw & Hubert (2011) as an alternative in robust regression analysis that overcomes the weakness of OLS parameter estimation. Instead of using all n squared residuals, it uses only the h (h ≤ n) smallest squared residuals (Seheult et al., 1989). The advantage of the LTS method is that its estimates are not affected even when the number of outliers increases (Shodiqin et al., 2018). The LTS estimate is obtained by solving:
$\hat{\beta}_{LTS} = \arg\min_{\beta} \sum_{i=1}^{h} e_{(i)}^{2}$ (2)
where $e_{(1)}^{2} \le e_{(2)}^{2} \le \ldots \le e_{(n)}^{2}$ are the ordered squared residuals. The value h is the size of the subset of the data that gives the smallest value of the quadratic objective function, and it is chosen so that the breakdown point is large, close to 50%. According to Rousseeuw & Hubert (2018), the LTS estimate is computed by combining the FAST-LTS algorithm with C-steps: estimate the parameter β using the OLS method, then determine the residuals from the data. Next, calculate $\sum_{i=1}^{h} e_{(i)}^{2}$ over the $h = (n + p + 1)/2$ observations with the smallest values of $e_i^{2}$, where n is the number of observations and p is the number of parameters. These steps are repeated until the objective function reaches its minimum and converges.
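The concentration idea behind the LTS algorithm can be sketched as below. This is a simplified illustration under stated assumptions, not the FAST-LTS implementation used in R: the function name `lts_fit`, the number of random starts, the integer rounding of h, and the synthetic data are all choices made for this example.

```python
import numpy as np


def lts_fit(X, y, h=None, n_starts=50, max_steps=100, seed=0):
    """Approximate LTS fit using C-steps from an OLS start and random starts."""
    n, p = X.shape
    if h is None:
        h = (n + p + 1) // 2  # integer part of (n + p + 1) / 2 (assumed rounding)
    rng = np.random.default_rng(seed)

    def c_steps(beta):
        prev_obj = np.inf
        for _ in range(max_steps):
            sq_resid = (y - X @ beta) ** 2
            keep = np.argsort(sq_resid)[:h]  # h smallest squared residuals
            beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
            obj = np.sort((y - X @ beta) ** 2)[:h].sum()
            if prev_obj - obj < 1e-12:  # objective no longer decreases
                break
            prev_obj = obj
        return beta, obj

    # Start once from the OLS fit, as in the description above
    ols_beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    best_beta, best_obj = c_steps(ols_beta)
    # Additional starts from exact fits to random p-point subsets
    for _ in range(n_starts):
        subset = rng.choice(n, size=p, replace=False)
        start, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        beta, obj = c_steps(start)
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta, best_obj


# Synthetic example: y = 1 + 2x with 25% contaminated observations
x = np.arange(20, dtype=float)
y = 1.0 + 2.0 * x + 0.05 * (-1.0) ** np.arange(20)
y[15:] += 30.0  # artificial outliers
X = np.column_stack([np.ones_like(x), x])

beta_lts, objective = lts_fit(X, y)
print(beta_lts, objective)
```

Each C-step refits OLS on the h observations with the smallest squared residuals, so the trimmed objective cannot increase from one step to the next. Several starting points are used because a single start may converge to a local minimum.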
Robust regression analysis using the MM-estimator was first introduced by Yohai in 1987. The MM-estimator combines a high breakdown point with high efficiency. Parameter estimation with the MM-estimator begins by calculating the parameter $\hat{\beta}$ using the OLS method. Then the residuals $e_i$ of the S-estimate, the scale estimate $\hat{\sigma}$, and the scaled residuals $u_i = e_i / \hat{\sigma}$ are calculated. Next, the parameter $\hat{\beta}$ is calculated using the weighted least squares (WLS) method with weights $w_i$ and tuning constant $c = 4.685$, a value chosen so that the weighting function gives high efficiency when the residuals are normally distributed. The weighting step is repeated until $\hat{\beta}$ converges. After the regression model for each response variable has been obtained through parameter estimation with the LTS and MM-estimator methods, the best regression model is selected by comparing the $R^2$ and error values, and the resulting model is interpreted for the two response variables (Aminuddin, A., Sudarno, S. and Sugito, 2013).
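The reweighting step described above can be sketched as follows. This is a simplified illustration, not a full MM-estimator: it starts from OLS and holds a MAD-type scale fixed, whereas a complete MM-estimator takes both from a high-breakdown S-estimate. The Tukey bisquare weight function (for which c = 4.685 is the usual tuning constant), the function names, and the synthetic data are assumptions made for this example.

```python
import numpy as np


def bisquare_weights(u, c=4.685):
    """Tukey bisquare weights: (1 - (u/c)^2)^2 for |u| <= c, otherwise 0."""
    return np.where(np.abs(u) <= c, (1 - (u / c) ** 2) ** 2, 0.0)


def irls_bisquare(X, y, c=4.685, tol=1e-8, max_iter=200):
    """Iteratively reweighted least squares with bisquare weights."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # initial OLS estimate
    resid = y - X @ beta
    # Robust residual scale, kept fixed during the iterations (assumed choice)
    scale = max(np.median(np.abs(resid)) / 0.6745, 1e-12)
    for _ in range(max_iter):
        u = (y - X @ beta) / scale          # scaled residuals
        sqrt_w = np.sqrt(bisquare_weights(u, c))
        # Weighted least squares via rescaled rows
        new_beta, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Synthetic example: y = 1 + 2x with one contaminated observation
x = np.arange(20, dtype=float)
y = 1.0 + 2.0 * x + 0.05 * (-1.0) ** np.arange(20)
y[10] += 25.0  # artificial outlier
X = np.column_stack([np.ones_like(x), x])

beta_mm = irls_bisquare(X, y)
print(beta_mm)
```

Observations whose scaled residual exceeds c in absolute value receive zero weight, which is how the outlier is removed from the final weighted fit.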

C. RESEARCH METHODS
This study uses secondary data published in 2019 by the Indonesian Sweetener and Fiber Crops Research Institute, usually abbreviated as BALITTAS. The data consist of 18 observations with two factors that are thought to affect two response variables. The response variables are red roselle plant height in cm ($Y_1$) and red roselle flower weight in kg ($Y_2$). The predictor variables are the soil moisture factor in % ($X_1$) and the NPK fertilizer application factor in kg ($X_2$). The analysis was carried out with the help of R software. The steps taken are:
(1) Design a response surface regression model using a second-order design.
(2) Test the coefficients of the response surface regression equation and perform the ANOVA and lack-of-fit tests.
(3) Test the assumptions on the residuals and detect outliers using DFITS and Cook's Distance.
(4) Perform robust regression modeling using the LTS and MM-estimator methods.
(5) Select the best model by comparing the $R^2$ and error values.

Estimation of Order Two Response Surface Regression Model
The response surface regression models for plant height and red roselle flower weight are as follows:
$\hat{Y}_1 = 186.45 + 0.253X_1 + 9.485X_2 - 5.61X_1^2 - 4.42X_2^2 + 0.80X_1X_2$, with $R^2 = 92.34\%$
$\hat{Y}_2 = 39.832 - 0.193X_1 + 2.083X_2 - 3.021X_1^2 - 5.993X_2^2 - 0.100X_1X_2$, with $R^2 = 86.17\%$
$R^2$ values of this size show that the predictor variables explain the two response variables very well. Hypothesis tests are then carried out on the second-order equations estimated by OLS, namely the significance test with analysis of variance (ANOVA) and the lack-of-fit test. Before the ANOVA significance test, the residuals of the two responses are tested for identical distribution, independence, and normality. Figure 1 and Figure 4 show that the errors of the two responses are randomly distributed and do not form a particular pattern, which indicates that the assumption of identical residuals is fulfilled. Figure 2 and Figure 5 show that the residuals plotted against observation order for plant height and red roselle flower weight tend to be random and do not form a particular pattern, so the assumption of independence is fulfilled. Figure 3 and Figure 6 show the results of the Anderson-Darling test for normality at significance level α = 0.05. The p-values of the Anderson-Darling statistic are 0.437 and 0.358. Both p-values are greater than α = 0.05, which means that the residuals of the regression models are normally distributed. Since all of these assumptions are fulfilled, the ANOVA test is valid. The hypotheses of the ANOVA are:
$H_0: \beta_i = 0,\ \beta_{ii} = 0,\ \beta_{ij} = 0$
$H_1: \beta_i \neq 0,\ \beta_{ii} \neq 0,\ \beta_{ij} \neq 0$
The test criterion is that $H_0$ is rejected if $F_{calc} > F_{(\alpha,\,k,\,n-k-1)}$, and $H_0$ is also rejected if the p-value < α (0.05).
The ANOVA results for plant height and red roselle flower weight can be seen in Table 1 and Table 2, respectively. From the F distribution table with numerator df = 5, denominator df = 12, and α = 0.05, we get $F_{(0.05,5,12)} = 3.1058$. The ANOVA output gives $F_{calc}$ = 28.95 for plant height and $F_{calc}$ = 14.96 for red roselle flower weight, so that 28.95 > 3.1058 and 14.96 > 3.1058, with p-values of 0.000 < α (0.05). $H_0$ is therefore rejected for both responses, which means that the resulting equations are significant: the predictor variables jointly affect the response variables. Next, the lack-of-fit test is performed to determine the suitability of the model obtained. A non-significant lack of fit is the main requirement for a good regression model, because it shows that the model equation is suitable for the response variable (Keshani et al., 2010). The lack-of-fit tests for plant height and red roselle flower weight are shown in Table 4 and Table 5. The hypotheses are $H_0$: the model does not contain a lack of fit (the regression model is appropriate) and $H_1$: the model contains a lack of fit (the regression model is not appropriate), with significance level α = 0.05. The test criterion is to reject $H_0$ if $F_{calc} > F_{table}$. From the two tables, with numerator df = 3, denominator df = 9, and α = 0.05, we get $F_{(0.05,3,9)} = 3.86255$. The lack-of-fit statistics are $F_{calc}$ = 1.42 < 3.86255 for plant height and $F_{calc}$ = 0.65 < 3.86255 for red roselle flower weight, so $H_0$ is not rejected. This means that the model is suitable and there is no lack of fit for either plant height or red roselle flower weight.
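As a quick arithmetic check of the overall F statistics, the standard identity $F = \dfrac{R^2 / k}{(1 - R^2)/(n - k - 1)}$ can be applied to the reported $R^2$ values with n = 18 observations and k = 5 model terms. The sketch below assumes this identity applies to the fitted models (it holds for OLS regression with an intercept); the function name `overall_f` is hypothetical, and the critical value is the one quoted in the text.

```python
def overall_f(r2, n, k):
    """Overall regression F statistic computed from R^2 (model with intercept)."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


n_obs, n_terms = 18, 5
f_height = overall_f(0.9234, n_obs, n_terms)   # plant height
f_weight = overall_f(0.8617, n_obs, n_terms)   # flower weight
f_critical = 3.1058                            # F(0.05; 5, 12) as quoted in the text

print(round(f_height, 2), round(f_weight, 2))
print(f_height > f_critical, f_weight > f_critical)
```

The values obtained this way agree with the reported 28.95 and 14.96 to within the rounding of the $R^2$ values, and both exceed the critical value.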

Outlier Detection
From the response surface regression estimated by OLS, the values of DFITS and Cook's Distance are calculated to detect outliers in the data. The outlier detection results are shown in Table 5. Based on Table 5, the 4th, 7th, 10th, 12th, and 13th observations were detected as outliers. Because these outliers make the OLS equation unreliable, the parameters were re-estimated using robust regression, which is resistant to outliers.

Estimation of Parameters Using the Least Trimmed Square (LTS) Method on Plant Height and Weight of Red Rosella Flowers
The parameter estimation of the LTS method was carried out by applying the FAST-LTS and C-steps algorithm with the assistance of R software (Roozbeh et al., 2018). The response surface regression equations obtained with the LTS method for plant height and red roselle flower weight, respectively, are as follows:
$Y_1^{*} = 185.2238 + 5.4426X_1^{*} + 5.5543X_2^{*} - 3.0114X_2^{*2} - 5.0986X_1^{*}X_2^{*}$
$Y_2^{*} = 40.8597 + 1.3518X_1^{*} + 3.6134X_2^{*} - 4.5383X_1^{*2} - 7.5477X_2^{*2} - 2.4017X_1^{*}X_2^{*}$

Determination of the Best Method by Comparing the Least Trimmed Square (LTS) and MM-estimator Methods on Plant Height and Weight of Red Rosella Flowers
From the parameter estimates of the LTS and MM-estimator methods, the $R^2$ value and the error of each method are obtained, as shown in Table 6 and Table 7. Based on Table 6 and Table 7, the $R^2$ value of the LTS method is greater than that of the MM-estimator method for both plant height and red roselle flower weight. Likewise, the residual error of the LTS method is smaller than that of the MM-estimator method for both responses. The LTS method is therefore a better estimator than the MM-estimator method for handling outliers in the second-order response surface regression models of the plant height and red roselle flower weight data.
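The two comparison criteria can be computed as in the sketch below. The text does not state how the error value is defined, so the residual standard error used here is an assumption, as are the function names and the small illustrative numbers, which are not the study's data.

```python
import math


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - sse / sst


def residual_standard_error(y, y_hat, n_params):
    """Square root of SSE divided by the residual degrees of freedom."""
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return math.sqrt(sse / (len(y) - n_params))


# Illustrative values only
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]

r2_demo = r_squared(observed, fitted)
rse_demo = residual_standard_error(observed, fitted, n_params=2)
print(r2_demo, rse_demo)
```

Between two candidate models fitted to the same response, the one with the larger $R^2$ and the smaller error is preferred, which is the rule applied in Table 6 and Table 7.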

E. CONCLUSIONS AND SUGGESTIONS
Based on the results and discussion, it can be concluded that response surface regression estimated with LTS is better than with the MM-estimator. The LTS method is more robust to outliers in modeling the plant height and flower weight of red roselle. This can be seen from the coefficient of determination ($R^2$) of the LTS method, which is greater than that of the MM-estimator method. In addition, the residual error of the LTS method is smaller than that of the MM-estimator method.
For future research, other factors that influence the plant height and flower weight of red roselle should be considered. In determining the model, other methods for overcoming outliers can also be compared, such as least median of squares (LMS) regression, S-estimation, and M-estimation.