Classification Of Perceptions Of The Covid-19 Vaccine Using Multivariate Adaptive Regression Spline

Indonesia is one of the countries affected by the covid-19 virus, and one of the government's efforts to control it is covid-19 vaccination. However, vaccination caused controversy because many people refused to be vaccinated. Public perception of the covid-19 vaccine can be categorized as positive or negative, and a survey by the Indonesian Ministry of Health on acceptance of the covid-19 vaccine states that this perception can be influenced by many factors. Knowing these factors is important for increasing acceptance of the covid-19 vaccine. The purpose of this study is to determine a classification model of public perception of the covid-19 vaccine and the factors that influence it. The method used is Multivariate Adaptive Regression Splines (MARS), a classification method suitable for categorical response variables. The results show that the optimal MARS model is obtained with the combination BF = 24, MI = 3, and MO = 1, which gives GCV = 0.07340546. The resulting classification accuracy is 91.5%, and the influencing factors are gender (x1), age (x2), last education (x4), insurance ownership status (x5), willingness to be vaccinated (x6), and covid-19 vaccine education (x8). Based on these results, the government can consider these factors when planning socialization efforts.


A. INTRODUCTION
In December 2019, the city of Wuhan in China reported the emergence of a novel coronavirus, later named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). SARS-CoV-2 causes atypical pneumonia that spread rapidly throughout the world, a disease known as coronavirus disease 2019 (covid-19) (Kim et al., 2020). On January 30, 2020, the World Health Organization (WHO) declared the covid-19 outbreak a Public Health Emergency of International Concern (PHEIC), and on March 11, 2020, WHO formally classified covid-19 as a pandemic. Covid-19 symptoms include cough, fever, diarrhea, shortness of breath, myalgia, sore throat, headache, and tiredness (Vkovski et al., 2021). As of October 25, 2020, more than 114 countries had been affected by covid-19, with more than 43,140,173 confirmed cases and more than 1,155,235 deaths (Ozkara et al., 2020).
Indonesia is one of the nations affected by covid-19; its first cases were reported in March 2020. One of the strategies the government has put in place is covid-19 vaccination. However, the vaccination effort did not go as planned: it caused controversy, and some community groups opposed vaccination. The spread of false information changes people's opinions of the covid-19 vaccine and, in turn, how they act, because decisions and choices are often made based on information from the internet, particularly social media (Rizma et al., 2020).
RIZKI FITRI ANANDA | JURNAL VARIAN | e-ISSN: 2581-2017
Perception is a process of selection, arrangement, and completion by which individuals interpret information as meaningful, logical images. Perception occurs when external stimuli are captured by a person's senses and then enter the brain; that is, perception is the process of using the sensory organs to find information to be understood (Duri Kartika et al., 2015). Indonesian people's perceptions of the covid-19 vaccine can be categorized into two, namely positive and negative (Priadi, 2017). These perceptions can be influenced by many factors, which may differ from one individual to another, and these factors can be analyzed using statistical methods.
In statistics, classification techniques are used to organize data systematically, and a good classification method produces a minimal number of errors. One method for data classification is Multivariate Adaptive Regression Spline (MARS), introduced by Friedman (1991). MARS is commonly used for two types of statistical problems: those with continuous response variables and those with categorical response variables. MARS is a flexible method for determining nonlinear relationships between response and predictor variables, and it does not depend on the model assumptions of parametric regression. The MARS technique produces a continuous model whose knots are selected on the basis of the smallest generalized cross-validation (GCV) value; it can handle high-dimensional data and produce accurate predictions of the response variable (Addini et al., 2023). Data with many predictor variables are referred to as high-dimensional data, here defined as data whose number of predictor variables n satisfies 3 ≤ n ≤ 20 (Zurimi et al., 2020). Based on this explanation, the MARS method is appropriate for public perception data on the covid-19 vaccine, which are high-dimensional, with 8 predictor variables, and categorical, with a response of positive or negative perception. The purpose of this study is to analyze public perceptions of the covid-19 vaccine, a problem that is still very new and has not yet been analyzed using the MARS method.

Multivariate Adaptive Regression Spline (MARS)
Multivariate adaptive regression splines (MARS) is a nonparametric regression technique that can be used to identify nonlinear relationships and interactions between response and predictor variables. MARS models the relationship between the dependent and independent variables with a series of splines that have different slopes (Friedman, 1991). The end points of the splines, called knots, mark the end of one data region and the beginning of another, resulting in piecewise curves called basis functions or hinge functions (Qureshi et al., 2022). The maximum number of basis functions is usually set to 2-4 times the number of predictor variables. In this study, the MARS model is analyzed with the earth package (Milborrow, 2021). The general MARS model can be written as follows:
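$$\hat{f}(x) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m} \left[ S_{km}\left(X_{v(k,m)} - t_{km}\right) \right]_+$$
where $[z]_+ = \max(0, z)$ and:
$a_0$ = regression constant (coefficient of the constant basis function)
$a_m$ = coefficient of the m-th basis function, where m = 1, 2, ..., M
$M$ = maximum number of basis functions
$K_m$ = number of interactions (factors) in the m-th basis function
$S_{km}$ = +1 if the data lie to the right of the knot point and −1 if they lie to the left
$X_{v(k,m)}$ = the v-th predictor variable, selected for the k-th factor of the m-th basis function
$t_{km}$ = knot value of the predictor variable $X_{v(k,m)}$ (Kharisma et al., 2021)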
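To make the general model concrete, the Python sketch below evaluates the hinge factors $[S_{km}(X_{v(k,m)} - t_{km})]_+$, multiplies them into basis functions, and sums the weighted basis functions with the constant $a_0$. It is a minimal illustration, not the earth package's implementation; the function names (`hinge`, `basis_function`, `mars_predict`) and the toy knots and coefficients are hypothetical.

```python
import numpy as np


def hinge(x, knot, sign):
    """One MARS factor [s * (x - t)]_+ = max(0, s * (x - t)).

    sign = +1 gives max(0, x - t) (data to the right of the knot);
    sign = -1 gives max(0, t - x) (data to the left of the knot).
    """
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, sign * (x - knot))


def basis_function(X, factors):
    """Product of K_m hinge factors; each factor is (column index v, knot t, sign s)."""
    X = np.asarray(X, dtype=float)
    out = np.ones(X.shape[0])
    for v, knot, sign in factors:
        out *= hinge(X[:, v], knot, sign)
    return out


def mars_predict(X, a0, coefs, basis):
    """f(x) = a0 + sum_m a_m * B_m(x), with each B_m built by basis_function."""
    X = np.asarray(X, dtype=float)
    f = np.full(X.shape[0], float(a0))
    for a_m, factors in zip(coefs, basis):
        f += a_m * basis_function(X, factors)
    return f


# Hypothetical toy model with two predictors (columns 0 and 1):
#   B_1 = max(0, 1 - x0)                    (one factor, K_1 = 1)
#   B_2 = max(0, 1 - x0) * max(0, x1 - 2)   (interaction, K_2 = 2)
X_demo = np.array([[0.0, 3.0],
                   [2.0, 1.0]])
basis_demo = [[(0, 1.0, -1)],
              [(0, 1.0, -1), (1, 2.0, +1)]]
pred = mars_predict(X_demo, a0=0.5, coefs=[-0.6, 0.7], basis=basis_demo)
print(pred)  # approximately [0.6, 0.5]
```

Representing each basis function as a list of (variable, knot, sign) triples mirrors the product over $k = 1, \ldots, K_m$ directly, so a main effect is a one-element list and a two-way interaction is a two-element list.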

MARS Parameter Estimation
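The estimation method used in this study is Ordinary Least Squares (OLS). The MARS model can be written in matrix form (Zurimi et al., 2020):
$$\mathbf{y} = \mathbf{B}\boldsymbol{\alpha} + \boldsymbol{\varepsilon}$$
with $\mathbf{y} = (y_1, \ldots, y_n)^T$ the response vector, $\mathbf{B}$ the $n \times (M+1)$ matrix whose first column is all ones and whose $(m+1)$-th column contains the values of the m-th basis function, $\boldsymbol{\alpha} = (a_0, a_1, \ldots, a_M)^T$ the coefficient vector, and $\boldsymbol{\varepsilon}$ the error vector. To obtain the estimator of $\boldsymbol{\alpha}$, OLS minimizes the sum of squared errors $\boldsymbol{\varepsilon}^T\boldsymbol{\varepsilon} = (\mathbf{y} - \mathbf{B}\boldsymbol{\alpha})^T(\mathbf{y} - \mathbf{B}\boldsymbol{\alpha})$, as in linear regression. Setting its derivative with respect to $\boldsymbol{\alpha}$ to zero gives the OLS estimator:
$$\hat{\boldsymbol{\alpha}}_{OLS} = (\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T\mathbf{y}$$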
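The OLS estimator can be checked numerically. The sketch below is a minimal illustration rather than the procedure inside the earth package: it builds a basis matrix $\mathbf{B}$ whose first column is ones and whose second column is one hinge function, then solves the normal equations $(\mathbf{B}^T\mathbf{B})\hat{\boldsymbol{\alpha}} = \mathbf{B}^T\mathbf{y}$. The data, the knot at 1, and the helper name `ols_coefficients` are made up for illustration.

```python
import numpy as np


def ols_coefficients(B, y):
    """OLS estimate alpha_hat = (B^T B)^{-1} B^T y via the normal equations.

    np.linalg.solve avoids forming the inverse explicitly; B must have full column rank.
    """
    B = np.asarray(B, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(B.T @ B, B.T @ y)


# Hypothetical data generated exactly as y = 1 + 2 * max(0, x - 1).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
hinge_col = np.maximum(0.0, x - 1.0)
B = np.column_stack([np.ones_like(x), hinge_col])  # column 0 carries a_0
y = 1.0 + 2.0 * hinge_col

alpha_hat = ols_coefficients(B, y)
print(alpha_hat)  # close to [1.0, 2.0]
```

Because the data were generated without noise, OLS recovers the generating coefficients; `np.linalg.lstsq` is a numerically safer alternative when $\mathbf{B}$ is close to rank-deficient.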

Choosing the Best MARS Model
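Knots in MARS are determined with a forward stepwise algorithm followed by a backward stepwise algorithm, and model selection is based on the smallest Generalized Cross Validation (GCV) value. The forward step adds basis functions and knots, and the backward step removes the basis functions that contribute least, so the chosen knots are those that give the minimum GCV value. Candidate models built with different maximum numbers of basis functions are therefore compared by their GCV values, and the model with the lowest GCV is the best. The GCV function is defined as follows (Addini et al., 2023):
$$GCV(M) = \frac{\frac{1}{n}\sum_{i=1}^{n}\left[y_i - \hat{f}_M(x_i)\right]^2}{\left[1 - \frac{\tilde{C}(M)}{n}\right]^2}$$
with $\tilde{C}(M) = C(M) + dM$ and $C(M) = \operatorname{trace}\left[\mathbf{B}(\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T\right] + 1$, where $d$ is the cost of each basis function, typically between 2 and 4 (Friedman, 1991).

Significance Test of MARS Model
1. Simultaneous test
Hypothesis formulation (Azmi & Perdana, 2021):
H0: $a_1 = a_2 = \cdots = a_M = 0$ (the basis functions jointly have no effect on the model)
H1: at least one $a_m \neq 0$, with m = 1, 2, ..., M (at least one coefficient affects the model)
Test statistic (Addini et al., 2023):
$$F_{count} = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 / k}{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 / (n - k - 1)}$$
Critical area: H0 is rejected if $F_{count} > F_{\alpha}(k; n - k - 1)$, where $k$ is the number of basis functions in the model.
2. Partial test
Hypothesis formulation:
H0: $a_m = 0$ (the coefficient $a_m$ has no effect on the model)
H1: $a_m \neq 0$, with m = 1, 2, ..., M (the coefficient $a_m$ affects the model)
Test statistic:
$$t_{count} = \frac{\hat{a}_m}{\sqrt{\hat{\sigma}^2 C_{mm}}}$$
where $C_{mm}$ are the entries on the main diagonal of the matrix $(\mathbf{B}^T\mathbf{B})^{-1}$ and $\hat{\sigma}^2$ is the mean squared error of the model (Azmi and Perdana, 2021). H0 is rejected if $|t_{count}| > t_{\alpha/2}(n - k)$.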
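As a sketch of the GCV criterion in Friedman's (1991) form, the function below computes $C(M) = \operatorname{trace}[\mathbf{B}(\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T] + 1$, adds the penalty $dM$, and divides the mean squared residual by $[1 - \tilde{C}(M)/n]^2$. The default $d = 3$ and the function name `gcv` are assumptions for illustration; the earth package uses its own effective-parameter convention, so its reported GCV values need not match this sketch exactly.

```python
import numpy as np


def gcv(B, y, d=3.0):
    """Friedman-style GCV for a basis matrix B (column 0 = intercept).

    GCV = [RSS / n] / [1 - C~(M) / n]^2, with
    C(M)  = trace(B (B^T B)^{-1} B^T) + 1 and C~(M) = C(M) + d * M,
    where M is the number of non-constant basis functions.
    """
    B = np.asarray(B, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    M = B.shape[1] - 1
    BtB = B.T @ B
    alpha = np.linalg.solve(BtB, B.T @ y)            # OLS coefficients
    rss = float(np.sum((y - B @ alpha) ** 2))        # residual sum of squares
    c_m = np.trace(B @ np.linalg.solve(BtB, B.T)) + 1.0
    c_tilde = c_m + d * M
    if c_tilde >= n:
        raise ValueError("effective number of parameters must be below n")
    return (rss / n) / (1.0 - c_tilde / n) ** 2


# Intercept-only model (M = 0) on hypothetical data:
# RSS = 5, C~ = 1 + 1 = 2, so GCV = (5 / 4) / (1 - 2 / 4)^2 = 5.0
print(gcv(np.ones((4, 1)), [1.0, 2.0, 3.0, 4.0]))
```

The denominator grows the penalty with every added basis function, which is why a model with a slightly larger residual sum of squares but fewer basis functions can still have the lower GCV.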

Multicollinearity Assumption
A multicollinearity test determines whether the predictor variables in a regression model are highly correlated. If they are, the estimated relationship between the predictor variables and the response variable is distorted. Multicollinearity is indicated by the tolerance and VIF (Variance Inflation Factor) values: a regression model is said to be free of multicollinearity if every predictor has a VIF value less than 10 and a tolerance greater than 0.10. The VIF value of the j-th predictor can be calculated with the following equation:
$$VIF_j = \frac{1}{1 - R_j^2}, \qquad Tolerance_j = 1 - R_j^2 = \frac{1}{VIF_j}$$
where $R_j^2$ is the coefficient of determination from regressing the j-th predictor on the other predictors. Source: (Kim, 2019)

Classification Accuracy
The Apparent Error Rate (APER) can be used to calculate the prediction error of the model on the grouping results. APER is a metric of the misclassification of a classification function (Kharisma et al., 2021): the APER value is the proportion of samples incorrectly classified by the classification function. In this study the response variable is binary, so the classification error can be calculated from Table 1.

Table 1. Classification Table
| Observation | Predicted $y_1$ | Predicted $y_2$ |
|---|---|---|
| $y_1$ | $n_{11}$ | $n_{12}$ |
| $y_2$ | $n_{21}$ | $n_{22}$ |
Source: (Utami et al., 2020)

Information:
$y_1$: response variable category 1
$y_2$: response variable category 2
$n_{11}$: the number of $y_1$ observations correctly classified as $y_1$
$n_{12}$: the number of $y_1$ observations incorrectly classified as $y_2$
$n_{21}$: the number of $y_2$ observations incorrectly classified as $y_1$
$n_{22}$: the number of $y_2$ observations correctly classified as $y_2$
$$APER(\%) = \frac{n_{21} + n_{12}}{n_{11} + n_{12} + n_{21} + n_{22}} \times 100\% \tag{8}$$
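A minimal sketch of the VIF computation, assuming each predictor $x_j$ is regressed by OLS (with an intercept) on the remaining predictors to obtain $R_j^2$; tolerance is then $1/VIF_j$. The helper name `vif` and the toy data are hypothetical; statistical packages such as statsmodels provide their own VIF utilities (for example `variance_inflation_factor`).

```python
import numpy as np


def vif(X):
    """Variance inflation factor VIF_j = 1 / (1 - R_j^2) for each column of X.

    R_j^2 comes from an OLS regression of column j on an intercept plus the
    other columns. Columns must not be constant; a perfectly collinear column
    gives R_j^2 = 1 and is reported as VIF = inf.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        tss = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - (resid @ resid) / tss
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


# Hypothetical data: two uncorrelated predictors, then a nearly duplicated pair.
X_ok = np.column_stack([[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]])
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X_bad = np.column_stack([a, a + np.array([0.01, -0.01, 0.0, 0.01, -0.01])])

print(vif(X_ok))        # approximately [1. 1.]: no multicollinearity
print(vif(X_bad))       # both values far above 10
print(1.0 / vif(X_ok))  # tolerance = 1 / VIF
```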

B. RESEARCH METHOD
This study uses primary data from a survey distributed in West Nusa Tenggara (NTB) to respondents aged 18-59 years, with a total of 390 respondents. The response variable is public perception of the covid-19 vaccine, with categories 1 and 0 denoting positive and negative perception, respectively. The predictor variables are gender (x1), age (x2), employment status (x3), last education (x4), insurance ownership status (x5), willingness to be vaccinated (x6), history of non-communicable diseases (x7), and education on covid-19 vaccines (x8). Descriptive statistics describe the object of research, taken from a sample or population, to produce useful information; in this study, they are used to describe the characteristics of people's perceptions of the covid-19 vaccine. The distribution of respondents by predictor variable is as follows:

Test for Multicollinearity
A multicollinearity test determines whether the predictor variables in a regression model are highly correlated. A regression model is said to be free of multicollinearity if its VIF values are less than 10 and its tolerance values are greater than 0.10. The results in Table 4 show that there is no multicollinearity in the data, because every variable has a VIF value less than 10.

Constructing the Best MARS Model
The MARS model with the minimum GCV value is the best one. A GCV value is obtained for each combination of the number of basis functions (BF), maximum interaction (MI), and minimum observation (MO). Based on the table above, there are 36 models, and the combinations BF = 24 or 32, MI = 3, and MO = 1, 2, or 3 give the best (lowest) GCV. When several models share the same minimum GCV, the next step is to choose the one with the lower MSE. These models all have the same lowest MSE of 0.06571309 for BF = 24, 32 with MO = 1, 2, 3, so their classification accuracy is examined next. For these models, the classification accuracy is 91.53%. Therefore, in the final stage, the smallest combination, BF = 24, MI = 3, and MO = 1, with a GCV value of 0.07340546, is chosen as the optimal model. The resulting MARS model uses the following variable symbols: JK: gender (x1); U: age (x2); P: last education (x4); SA: insurance ownership status (x5); SV: willingness to be vaccinated (x6); E: covid-19 vaccine education (x8).
Of the 8 predictor variables, 6 influence the public's perception of the covid-19 vaccine in the MARS model, namely last education (x4), age (x2), covid-19 vaccine education (x8), insurance ownership status (x5), willingness to be vaccinated (x6), and gender (x1). The basis functions of the model are interpreted as follows:
1. BF1 = SV, with a coefficient of 0.2003294, means that if the willingness to be vaccinated variable equals 1 (willing to be vaccinated), positive perception increases by 0.2003294.
2. BF2 = max(0, 1 − U), with a coefficient of −0.6303254, means that if the age variable is smaller than 1 (18-25 years old), positive perception of the covid-19 vaccine decreases by 0.6303254.
3. BF3 = JK × max(0, 1 − U), with a coefficient of −0.1023236, means that if the gender variable equals 1 (male) in interaction with an age variable smaller than 1 (18-25 years), positive perception of the covid-19 vaccine decreases by 0.1023236.
4. BF4 = max(0, 1 − U) × SA, with a coefficient of −0.1267894, means that if the age variable is smaller than 1 (18-25 years) and the insurance ownership status variable equals 1 (BPJS/private), positive perception of the covid-19 vaccine decreases by 0.1267894.
5. BF5 = SV × max(0, 1 − E), with a coefficient of −0.1918498, means that if the willingness to be vaccinated variable equals 1 (willing) in interaction with a covid-19 vaccine education variable smaller than 1 (never), positive perception of the covid-19 vaccine decreases by 0.1918498.
6. BF6 = max(0, 1 − U) × max(0, P − 2), with a coefficient of 0.6741352, means that if the age variable is smaller than 1 (18-25 years) in interaction with a last education variable greater than 2 (D3/S1/S2/S3), positive perception of the covid-19 vaccine increases by 0.7056162.
7. BF7 = max(0, 1 − U) × max(0, E − 1), with a coefficient of −0.1242206, means that if the age variable is smaller than 1 (18-25 years) in interaction with a covid-19 vaccine education variable greater than 1 (often), positive perception of the covid-19 vaccine decreases by 0.1242206.
8. BF8 = max(0, 1 − U) × max(0, P − 2) × max(0, 1 − E), with a coefficient of −0.714493, means that if the age variable is smaller than 1 (18-25 years) in interaction with a last education variable greater than 2 (D3/S1/S2/S3) and a covid-19 vaccine education variable smaller than 1 (never), positive perception of the covid-19 vaccine decreases by 0.714493.
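The selection procedure described above (minimum GCV, then minimum MSE, then classification accuracy, then the smallest combination) can be written as a lexicographic sort key. The sketch below is a hypothetical illustration: the `Candidate` record, the rounding of GCV and MSE to 8 decimals to detect ties, and the reading of "smallest combination" as the smallest (BF, MI, MO) in that order are assumptions, and the candidate values are made up rather than taken from the study's table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    bf: int           # maximum number of basis functions (BF)
    mi: int           # maximum interaction (MI)
    mo: int           # minimum observation (MO)
    gcv: float        # generalized cross-validation value
    mse: float        # mean squared error
    accuracy: float   # classification accuracy in percent


def select_best(candidates, ndigits=8):
    """Pick the model with minimum GCV, breaking ties by lower MSE,
    then higher accuracy, then the smallest (BF, MI, MO) combination.

    Rounding to `ndigits` (an assumption) treats values that agree to the
    reported precision as ties.
    """
    return min(
        candidates,
        key=lambda c: (round(c.gcv, ndigits), round(c.mse, ndigits),
                       -c.accuracy, c.bf, c.mi, c.mo),
    )


# Hypothetical candidates: three of them tie on GCV, MSE, and accuracy.
candidates = [
    Candidate(bf=16, mi=2, mo=0, gcv=0.0950, mse=0.0800, accuracy=89.0),
    Candidate(bf=32, mi=3, mo=1, gcv=0.0800, mse=0.0700, accuracy=91.0),
    Candidate(bf=24, mi=3, mo=2, gcv=0.0800, mse=0.0700, accuracy=91.0),
    Candidate(bf=24, mi=3, mo=1, gcv=0.0800, mse=0.0700, accuracy=91.0),
]
best = select_best(candidates)
print(best.bf, best.mi, best.mo)  # 24 3 1
```

Encoding the rules as one tuple key keeps the order of the criteria explicit, and the final (BF, MI, MO) entries favor the simplest model when everything else is equal.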
Simultaneous Test
Based on the F count obtained, H0 is rejected because $F_{count} > F_{\alpha}(k; n - k - 1)$, that is, 129.5531898 > 1.96, so the model is significant.

Partial Test
Hypothesis formulation:
H0: $a_m = 0$ (the coefficient $a_m$ has no effect on the model)
H1: $a_m \neq 0$, with m = 1, 2, 4, 5, 6, 8 (the coefficient $a_m$ affects the model)
The calculated t values are shown in Table 7 (T Count Score). The t table value is $t_{\alpha/2}(n - k) = t_{0.025}(382) = 1.96$. Based on the t count values obtained, H0 is rejected because $|t_{count}| > t_{\alpha/2}(n - k)$ for all coefficients, so each coefficient $a_m$ has a significant effect on the model.
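Both test statistics can be computed from the fitted basis matrix. In this minimal sketch, $F = [\sum(\hat{y}_i - \bar{y})^2/k] / [\sum(y_i - \hat{y}_i)^2/(n-k-1)]$ and $t_m = \hat{a}_m / \sqrt{\hat{\sigma}^2 C_{mm}}$, with $C_{mm}$ the diagonal of $(\mathbf{B}^T\mathbf{B})^{-1}$ and $\hat{\sigma}^2$ estimated with $n-k-1$ degrees of freedom; that degrees-of-freedom choice, the function name `significance_stats`, and the toy data are assumptions. Critical values such as $F_\alpha(k; n-k-1)$ and $t_{\alpha/2}$ would still be read from tables (or computed with, for example, `scipy.stats.f.ppf` and `scipy.stats.t.ppf`).

```python
import numpy as np


def significance_stats(B, y):
    """Return (F_count, t_counts) for an OLS fit on basis matrix B.

    B has an intercept column followed by k basis-function columns.
    """
    B = np.asarray(B, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = B.shape
    k = p - 1                                     # basis functions excluding the intercept
    BtB_inv = np.linalg.inv(B.T @ B)              # its diagonal holds C_mm
    alpha = BtB_inv @ B.T @ y                     # OLS coefficients a_0, ..., a_k
    y_hat = B @ alpha
    ssr = float(np.sum((y_hat - y.mean()) ** 2))  # regression sum of squares
    sse = float(np.sum((y - y_hat) ** 2))         # error sum of squares
    sigma2 = sse / (n - k - 1)                    # residual variance estimate
    f_count = (ssr / k) / sigma2
    t_counts = alpha / np.sqrt(sigma2 * np.diag(BtB_inv))
    return f_count, t_counts


# Hypothetical data: one basis column x with small deterministic noise.
x = np.arange(6.0)
y = 1.0 + 2.0 * x + np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1])
B = np.column_stack([np.ones_like(x), x])
f_count, t_counts = significance_stats(B, y)
# With a single basis function, F equals the square of the slope's t statistic.
print(f_count, t_counts[1] ** 2)
```

The decision rule then matches the text: reject H0 in the simultaneous test when `f_count` exceeds the F table value, and reject H0 for coefficient $a_m$ when `abs(t_counts[m])` exceeds the t table value.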

MARS CLASSIFICATION
The classification results obtained are shown in the following table. The APER value is calculated as follows:
$$APER(\%) = \frac{n_{21} + n_{12}}{n_{11} + n_{12} + n_{21} + n_{22}} \times 100\% = \frac{6 + 27}{160 + 27 + 6 + 197} \times 100\% = \frac{33}{390} \times 100\% \approx 8.5\%$$
So the classification accuracy is 100% − 8.5% = 91.5%. This APER value shows that the MARS method performs well in this case, with a classification accuracy of 91.5%, so these results can serve as a reference for the government in increasing acceptance of the covid-19 vaccine and for further research. This study differs from previous research in the case studied: public perception of the covid-19 vaccine, which has not previously been analyzed using statistical methods, especially the MARS method.
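The APER arithmetic can be reproduced in a few lines of Python using the counts from the classification results ($n_{11} = 160$, $n_{12} = 27$, $n_{21} = 6$, $n_{22} = 197$); the function name `aper` is hypothetical.

```python
def aper(n11, n12, n21, n22):
    """Apparent error rate in percent: misclassified / total * 100."""
    total = n11 + n12 + n21 + n22
    return (n21 + n12) / total * 100.0


error_pct = aper(n11=160, n12=27, n21=6, n22=197)
accuracy_pct = 100.0 - error_pct
print(round(error_pct, 1), round(accuracy_pct, 1))  # 8.5 91.5
```

Equivalently, the accuracy is the share of correctly classified respondents, $(n_{11} + n_{22})/n = 357/390$.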

D. CONCLUSION AND SUGGESTION
Based on the results, the MARS model used to classify people's perceptions of the covid-19 vaccine is the model with the combination BF = 24, MI = 3, and MO = 1, because it has the minimum GCV of 0.07340546. In this model, 6 variables affect perception, namely age (x2), last education (x4), willingness to be vaccinated (x6), insurance ownership status (x5), covid-19 vaccine education (x8), and gender (x1), and the classification accuracy obtained is 91.5%. The government can therefore plan further efforts to increase vaccine acceptance by considering these factors. Future researchers are advised to extend this research by increasing the number of predictor variables and observed response variables, or by applying the MARS method to different cases.