Spline and Kernel Mixed Nonparametric Regression for Malnourished Children Model in West Nusa Tenggara

Article history: Received: 23-12-2020; Revised: 16-01-2021; Accepted: 21-01-2021
Jurnal Varian, Vol. 4, No. 2, April 2021, pp. 99-108

Health sector development is essential to improving the quality of human life, especially in West Nusa Tenggara (NTB) Province. Based on data from the NTB Provincial Health Office for 2011 to 2016, the number of children under five suffering from malnutrition continued to increase, driven by several factors. An appropriate analysis is therefore needed to model malnutrition among children in NTB Province in 2016, which consists of 10 districts, based on the variables that influence it. The analysis in this study used a mixed truncated spline and kernel nonparametric regression model. The estimate of the nonparametric regression curve depends on the knot points and bandwidth parameters, so the optimal knot points and bandwidths were selected using Generalized Cross-Validation (GCV). The resulting mixed truncated spline and kernel model has optimal knot points $k_{11} = 10.13846154$, $k_{12} = 72.19692308$, $k_{13} = 44.25846154$, and $k_{14} = 91.01461538$ for each variable, and optimal bandwidths $h_1 = 0.5561442$, $h_2 = 1.0220133$, $h_3 = 0.7163110$, $h_4 = 1.2240464$, and $h_5 = 1.2900146$, with a GCV value of $0.003038719$. Judged by its $R^2$ and MSE values, the mixed model fits the data well. In addition, the MAPE value indicates a high degree of accuracy, so the model forecasts well.

Keywords: Bandwidth; Kernel; Knot; Malnutrition; Truncated Spline

This is an open access article under the CC BY-SA license.
DOI: https://doi.org/10.30812/varian.v4i2.1003

A. INTRODUCTION
Health development is organized to improve everyone's awareness, willingness, and ability to live healthily so that a high level of public health can be achieved. To achieve this quickly, a sound health information system is needed, especially in West Nusa Tenggara (NTB) Province. Improving the quality of human life is one of the most critical steps toward a better future. One indicator used to monitor the level of public health is the nutritional status of infants. Nutritional status describes a state of balance expressed in terms of particular variables. When this balance is disturbed, body growth tends to be impaired. Many mothers give birth to babies with low birth weight, and some infant deaths are attributed to maternal malnutrition before delivery. A baby with low birth weight is at risk of nutrient deficiency and even malnutrition (BPS, 2016).

According to the NTB Provincial Health Office (2016), many cases of malnutrition were found in West Nusa Tenggara Province. Data for 2011-2015 show that the number of cases declined, but it rose to 403 cases in 2016. According to Ramadani et al. (2013), the factors that cause malnutrition in infants include the percentage of babies without exclusive care; babies with low birth weight (< 2500 g); houses categorized as unhealthy; households with access to clean water; active integrated centers of the ministry; and the use of health facilities. Other factors are the percentage of children with incomplete immunization; the percentage of children receiving vitamin A; the percentage of households using health facilities; the percentage of babies receiving health services; the percentage of poor villagers; and the percentage of first marriages before the age of 15 (Maulani et al., 2016).

The percentage of malnourished children is one of the simplest parameters for assessing infant nutritional status (WHO, 2010). One way to monitor it is to model the relationship between malnutrition and the factors that affect it.
The relationship between a dependent variable and one or more independent variables can be expressed statistically through a regression model. Regression analysis is a statistical method concerned with the systematic pattern of relationships between variables (Daoud, 2017). According to Hardle et al. (2004), two approaches can be used to determine the regression curve: parametric regression and nonparametric regression. Parametric regression requires assumptions such as normally distributed errors with constant variance. In practice these assumptions, normality in particular, are often violated. To avoid such strict assumptions, a statistical technique is needed that is tied neither to them nor to a particular form of the regression curve. One alternative is nonparametric regression, which is used when prior information about the shape of the regression curve is limited or unavailable (Eubank, 1999).

The nonparametric methods most often used are the truncated spline and the kernel estimator. Growth patterns in infants generally tend to change at certain ages. The form of the pattern cannot be specified in advance, so estimates obtained with parametric regression are inaccurate. For this reason, data on the percentage of children suffering from malnutrition are modeled with nonparametric regression (Pratiwi, 2017). Initial analysis shows that the relationship between the percentage of malnourished children under five and several of the variables that influence it fluctuates over certain intervals. This behavior suits the spline approach, which is highly flexible and does not depend on the researcher's subjectivity (Eubank, 1999). Other influencing variables show no particular pattern in the data, which suits the kernel approach because it can model data without a specific pattern. The kernel approach has also received special attention from researchers because it converges relatively quickly compared with other approaches (Hardle et al., 2004).

Two basic assumptions are commonly made when a nonparametric regression model is built. The first is that, in a multivariable model, every independent variable follows the same pattern. The second is that a single form of estimator is used for every independent variable. In practice, the data pattern often differs from one independent variable to another. If only one estimator is used to estimate the nonparametric regression curve, the estimator does not match the data pattern, and the resulting regression model is less precise and tends to produce large errors (Budiantara et al., 2015). Based on this description, this study models the percentage of malnourished children in NTB Province using a mixed truncated spline and kernel nonparametric regression model.

B. LITERATURE REVIEW
1. Nonparametric Regression
Nonparametric regression is an approach used to determine the pattern of the relationship between dependent and independent variables when the regression curve is unknown or there is no complete prior information about the shape of the data pattern. The approach is highly flexible because the data are expected to find their own form of the regression curve estimate without being influenced by the researcher's subjectivity. The general nonparametric regression model is given in Equation (1).
Muhammad Sopian Sauri, Nonparametric Regression Mixed...

$$y_i = f(x_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{1}$$

where $y_i$ is the dependent variable, $x_i$ is the independent variable, $f(x_i)$ is a regression function of unknown shape, and $\varepsilon_i$ is a random error assumed to have zero mean and constant variance (Eubank, 1999).

2. Mixed Estimator Nonparametric Regression Truncated Spline and Kernel
Consider data $(t_i, z_i, y_i)$, where the relationship between the independent variables $(t_i, z_i)$ and the dependent variable $y_i$ is assumed to follow a nonparametric regression model. In general, the model is defined as in Equation (2).

$$y_i = \mu(t_i, z_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{2}$$

The shape of the regression curve $\mu(t_i, z_i)$ is assumed to be unknown and smooth, meaning continuous and differentiable. The random error $\varepsilon_i$ is normally distributed with zero mean and constant variance. The regression curve $\mu(t_i, z_i)$ is assumed to be additive, meaning it can be written as in Equation (3).

$$\mu(t_i, z_i) = f(t_i) + g(z_i) \tag{3}$$

The main problem in mixed nonparametric regression is how to obtain an estimate of the regression curve of the form below, with a vector of bandwidth parameters $\mathbf{h}$ and a vector of knot points $\mathbf{k}$.

$$\hat{\mu}_{\mathbf{h},\mathbf{k}}(t_i, z_i) = \hat{f}_{\mathbf{k}}(t_i) + \hat{g}_{\mathbf{h}}(z_i) \tag{4}$$

To obtain the mixed truncated spline and kernel estimator, the regression curve $f_{\mathbf{k}}(t_i)$ is approximated by a truncated spline function with knot points $\mathbf{k} = (k_1, k_2, \ldots, k_k)$, and the regression curve $g_{\mathbf{h}}(z_i)$ is approximated by a kernel function. A basis for the truncated spline space is given as follows, with $I$ being an indicator function.

$$\left\{ 1, t, t^2, \ldots, t^m, (t - k_1)^m I(t \ge k_1), (t - k_2)^m I(t \ge k_2), \ldots, (t - k_k)^m I(t \ge k_k) \right\} \tag{5}$$

The regression curve $f_{\mathbf{k}}(t_i)$ can then be written as follows, with $\beta_0, \beta_1, \ldots, \beta_m, \lambda_1, \lambda_2, \ldots, \lambda_k$ being unknown parameters.

$$f_{\mathbf{k}}(t_i) = \beta_0 + \beta_1 t_i + \cdots + \beta_m t_i^m + \lambda_1 (t_i - k_1)^m I(t_i \ge k_1) + \cdots + \lambda_k (t_i - k_k)^m I(t_i \ge k_k) \tag{6}$$

Moreover, the estimate of the kernel regression curve $g(z_i)$ can be presented as the following formula.



The quantities in this kernel estimator are, in turn, written in terms of a kernel function.
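As an illustration of the two building blocks described above, the following sketch constructs the truncated spline basis of Equation (5) and a kernel smoother matrix. The kernel part assumes a Nadaraya-Watson estimator with a Gaussian kernel, which is a common choice but is not confirmed by the text; the function names and the toy data are hypothetical.

```python
import numpy as np

def truncated_spline_basis(t, knots, degree=1):
    """Design matrix with columns 1, t, ..., t^m, (t - k_j)^m I(t >= k_j)."""
    t = np.asarray(t, dtype=float)
    poly = [t ** p for p in range(degree + 1)]
    trunc = [np.where(t >= k, (t - k) ** degree, 0.0) for k in knots]
    return np.column_stack(poly + trunc)

def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice of kernel function)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def nw_smoother_matrix(z, h):
    """Nadaraya-Watson weights: row i holds the weights used to smooth y at z_i."""
    z = np.asarray(z, dtype=float)
    u = (z[:, None] - z[None, :]) / h
    w = gaussian_kernel(u) / h
    return w / w.sum(axis=1, keepdims=True)

# Toy inputs, not the NTB data
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
G = truncated_spline_basis(t, knots=[2.0], degree=1)
D = nw_smoother_matrix(t, h=0.5)
print(G)
print(D.sum(axis=1))
```

Each row of the smoother matrix sums to one, so the kernel estimate at each point is a weighted average of the observed responses.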

Estimation of Mixed Nonparametric Regression Model Truncated Spline and Kernel
According to Ratnasari et al. (2016), the estimate of the mixed truncated spline and kernel nonparametric regression curve is obtained through several lemmas and a theorem, given the multivariable nonparametric regression model in Equation (3).
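A minimal sketch of one way such a mixed estimator can be computed is shown below. It assumes a single spline variable and a single kernel variable, a Nadaraya-Watson smoother matrix D(h) with a Gaussian kernel, and spline coefficients obtained by least squares on the kernel residual (I - D(h))y. This construction is an assumption made for illustration; it is not taken from the lemmas and theorem referred to above, and all names and data are hypothetical.

```python
import numpy as np

def spline_basis(t, knots, degree=1):
    # Truncated spline design matrix as in Equation (5)
    t = np.asarray(t, dtype=float)
    cols = [t ** p for p in range(degree + 1)]
    cols += [np.where(t >= k, (t - k) ** degree, 0.0) for k in knots]
    return np.column_stack(cols)

def kernel_matrix(z, h):
    # Nadaraya-Watson weights with a Gaussian kernel (assumed)
    z = np.asarray(z, dtype=float)
    w = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
    return w / w.sum(axis=1, keepdims=True)

def mixed_hat_matrix(t, z, knots, h, degree=1):
    """Return Z such that y_hat = Z @ y, with Z = C + D (assumed form)."""
    G = spline_basis(t, knots, degree)
    D = kernel_matrix(z, h)
    n = G.shape[0]
    # Spline part: least squares fit of (I - D) y on the spline basis
    C = G @ np.linalg.pinv(G.T @ G) @ G.T @ (np.eye(n) - D)
    return C + D

# Synthetic data, not the NTB data
rng = np.random.default_rng(0)
n = 30
t = np.sort(rng.uniform(0.0, 10.0, n))
z = rng.uniform(0.0, 1.0, n)
y = (1.0 + 0.5 * t + 2.0 * np.maximum(t - 5.0, 0.0)
     + np.sin(2.0 * np.pi * z) + rng.normal(0.0, 0.1, n))

Z = mixed_hat_matrix(t, z, knots=[5.0], h=0.1)
y_hat = Z @ y
print(np.mean((y - y_hat) ** 2))
```

Because the fitted values are linear in y, the matrix Z can be reused directly in selection criteria such as GCV.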

Choosing Knot Points and Optimal Bandwidths on Mixed Truncated Spline and Kernel
As explained above, the estimate of the mixed truncated spline and kernel nonparametric regression curve depends on the knot points and the bandwidth parameters. According to Budiantara et al. (2015), one method that can be used to choose the optimal knot points and bandwidths is Generalized Cross-Validation (GCV), which is a modified form of Cross-Validation (CV). A case study by Fitriyani & Budiantara (2014) showed that GCV gives better results than CV in terms of the model's goodness criteria. The GCV function is given in Equation (13).
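For illustration, the sketch below computes a commonly used form of GCV for a linear smoother with fitted values y_hat = Z y, namely the mean squared residual divided by the squared normalized trace of (I - Z). Whether Equation (13) has exactly this form is an assumption. The hat matrix in the example comes from an ordinary straight-line fit on synthetic data, purely to exercise the function.

```python
import numpy as np

def gcv(y, hat_matrix):
    """GCV = (RSS / n) / (trace(I - Z) / n)^2 for fitted values Z @ y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    resid = y - hat_matrix @ y
    mse = resid @ resid / n
    denom = (np.trace(np.eye(n) - hat_matrix) / n) ** 2
    return mse / denom

# Synthetic example: hat matrix of a straight-line least squares fit
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.solve(X.T @ X, X.T)
rng = np.random.default_rng(1)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.2, x.size)

score = gcv(y, H)
print(score)
```

Smaller values indicate a better trade-off between fit and model complexity, so candidate knot points and bandwidths are compared by their GCV scores.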

Goodness Criteria
One objective of regression analysis is to obtain the best model for explaining the relationship between the dependent variable and the independent variables according to specific criteria. The criteria used here to measure the goodness of fit of a regression model are the coefficient of determination $R^2$ and the Mean Square Error (MSE). The smaller the MSE, the better the model; conversely, the larger the $R^2$, the better the model (Cheng et al., 2014).
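The two criteria can be computed as in the short sketch below. It assumes the usual definitions, MSE as the mean of squared residuals and $R^2$ as one minus the ratio of residual to total sum of squares; the paper's exact formulas are not shown in the text, and the numbers are made up for illustration.

```python
import numpy as np

def mse(y, y_hat):
    """Mean of squared residuals (assumed definition, divisor n)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean((y - y_hat) ** 2))

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return float(1.0 - sse / sst)

# Made-up values for illustration only
y_obs = [1.0, 2.0, 3.0, 4.0]
y_fit = [1.1, 1.9, 3.2, 3.8]
print(mse(y_obs, y_fit), r_squared(y_obs, y_fit))
```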

Residual Assumptions and Accuracy of Prediction
The residuals of the mixed truncated spline and kernel model must satisfy several assumptions, including being identical and independent. Testing these assumptions provides assurance that the estimated regression equation is accurate, unbiased, and consistent. Another objective of regression analysis is to predict the average value of the dependent variable from known values of the independent variables, so prediction accuracy also matters. Several measures can be used to quantify the prediction error of a model, one of which is the Mean Absolute Percentage Error (MAPE) (Wei, 2006). It is calculated by dividing the absolute error in each period by the actual value and then averaging the resulting percentages. This measure is useful for evaluating how accurate the predictions are and how large the prediction error is (Khair et al., 2017).
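The MAPE calculation described above can be sketched as follows. The function name and the sample values are hypothetical, and the actual values are assumed to be non-zero.

```python
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(100.0 * np.mean(np.abs(actual - predicted) / np.abs(actual)))

# Made-up values for illustration only
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(mape(actual, predicted))
```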

C. RESEARCH METHODS
The research was carried out in the following stages: (1) determining the variables and collecting the data; (2) producing scatter plots of the data and choosing the appropriate approach for each variable based on the plots; (3) modeling the data with the kernel approach and determining the optimal bandwidths from the minimum GCV; (4) modeling the data with the mixed truncated spline and kernel nonparametric regression model for various knot points, using the bandwidths obtained in step (3), and calculating the GCV value for each candidate model; (5) determining the optimal knot points and bandwidths and fitting the final model; (6) evaluating the goodness criteria of the model and testing the residual assumptions; (7) computing predictions and assessing their accuracy with the MAPE value; and (8) drawing conclusions.
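Steps (3) to (5) can be sketched as a two-stage grid search. The sketch assumes one spline variable with a single linear knot, one kernel variable, a Gaussian Nadaraya-Watson smoother, and the same assumed forms of the mixed hat matrix and GCV used in standard linear smoothing. The grids, helper names, and synthetic data are hypothetical and do not reproduce the study's computations.

```python
import numpy as np

def spline_basis(t, knot):
    # Linear truncated spline with one knot: 1, t, (t - k) I(t >= k)
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(t), t, np.maximum(t - knot, 0.0)])

def kernel_matrix(z, h):
    # Nadaraya-Watson weights with a Gaussian kernel (assumed)
    z = np.asarray(z, dtype=float)
    w = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
    return w / w.sum(axis=1, keepdims=True)

def gcv(y, Z):
    n = y.size
    resid = y - Z @ y
    return float((resid @ resid / n) / (np.trace(np.eye(n) - Z) / n) ** 2)

def mixed_hat(t, z, knot, h):
    # Assumed mixed hat matrix: spline fit on the kernel residual, plus kernel part
    G, D = spline_basis(t, knot), kernel_matrix(z, h)
    return G @ np.linalg.pinv(G.T @ G) @ G.T @ (np.eye(len(G)) - D) + D

def select_knot_and_bandwidth(t, z, y, knot_grid, h_grid):
    y = np.asarray(y, dtype=float)
    # Step (3): bandwidth with minimum GCV for the kernel-only fit
    h_best = min(h_grid, key=lambda h: gcv(y, kernel_matrix(z, h)))
    # Steps (4)-(5): scan the knot candidates using that bandwidth
    k_best = min(knot_grid, key=lambda k: gcv(y, mixed_hat(t, z, k, h_best)))
    return k_best, h_best, gcv(y, mixed_hat(t, z, k_best, h_best))

# Synthetic data, not the NTB data
rng = np.random.default_rng(2)
n = 40
t = np.sort(rng.uniform(0.0, 10.0, n))
z = rng.uniform(0.0, 1.0, n)
y = (0.5 * t + 2.0 * np.maximum(t - 5.0, 0.0)
     + np.sin(2.0 * np.pi * z) + rng.normal(0.0, 0.1, n))

knot_grid = [2.0, 4.0, 5.0, 6.0, 8.0]
h_grid = [0.05, 0.1, 0.2, 0.4]
k_best, h_best, score = select_knot_and_bandwidth(t, z, y, knot_grid, h_grid)
print(k_best, h_best, score)
```

Fixing the bandwidth before scanning the knots keeps the search small; a joint search over both grids is an alternative when the number of candidates allows it.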

D. RESULTS AND DISCUSSION
The choice between the truncated spline and the kernel nonparametric regression approach for each independent variable is based on the shape of the relationship between that variable and malnutrition in children under five. The scatter plots between the dependent variable $y$ and each independent variable $x_j$ are shown in Figure 1. For several independent variables, the relationship with $y$ forms a specific pattern, so the truncated spline nonparametric regression approach was used for them. For the remaining variables, including $x_9$, the data do not form any particular pattern, so the kernel nonparametric regression approach is the appropriate choice.
According to Budiantara et al. (2015), the main task in the kernel nonparametric regression approach is determining the optimal bandwidths. If the chosen bandwidth is too small, the estimated regression curve is under-smoothed. If the chosen bandwidth is too large, the curve is over-smoothed.

Based on the optimal knot points and bandwidths, the parameter estimates for the mixed truncated spline and kernel nonparametric regression model, with one knot point for each spline variable and the optimal bandwidths, are listed in Table 3. The $R^2$ value indicates that the independent variables used explain most of the variation in the percentage of children suffering from malnutrition in West Nusa Tenggara Province, and the remainder is due to other variables. It can therefore be concluded that the mixed model fits well and is suitable for use.
Residual assumption tests are then needed to assess the feasibility of the model. The results show that the residuals satisfy the identical and independent assumptions, so the model can be used for prediction. Figure 2 compares the actual data with the predicted values, based on data from the known independent variables. The MAPE value was then used to measure the accuracy of the predictions from the mixed truncated spline and kernel model. Because the MAPE does not exceed the 10% margin of error (Khair et al., 2017), the model has a high degree of accuracy. The model therefore predicts well, with a MAPE value of 0.306711%.

E. CONCLUSIONS AND SUGGESTION
Applying the mixed truncated spline and kernel nonparametric regression model to the percentage of malnourished children in West Nusa Tenggara Province produced a good model, with an MSE value of 0.008353 and an $R^2$ value of 99.998838%. The model's predictive accuracy, measured by MAPE, was 0.3067109%, which indicates a high degree of accuracy and a model that forecasts well. Future research is suggested to fit the mixed truncated spline and kernel model with more than one knot point.