Robust Spatial-Temporal Analysis of Toddler Pneumonia Cases and its Inﬂuencing Factors

Pneumonia is a disease that causes inﬂammation of the lungs and is one of the most common diseases infecting toddlers. As a directly infectious disease, there is a possibility of the inﬂuence of location diversity on the number of pneumonia sufferers. This study was conducted to model pneumonia sufferers under ﬁve and to ﬁnd out the factors that signiﬁcantly affect the number of sufferers in each observation. Robust Geographically and Temporally Weighted Regression (RGTWR) is a method used to model data by considering the heterogeneity of location and time and to overcome outliers in the data. The data used is the number of pneumonia sufferers aged under ﬁve and the factors that are thought to inﬂuence it, namely the number of health centers, population density, percentage of children under ﬁve with complete basic immunizations, percentage of children under ﬁve who are exclusively breastfed 0-6 months, and percentage of poor people. RGTWR produces an optimal model with an R 2 value of 99.9997%, a Mean Absolute Deviation of 21.6852, and a Median Absolute Deviation of 6.9661 compared to Geographically and Temporally Weighted Regression model. Variables number of puskesmas, percentage of infants with complete basic immunization, and percentage of poor population are factors that inﬂuence the number of pneumonia sufferers under ﬁve in most locations in 34 provinces and 5 years of observation.


A. INTRODUCTION
Pneumonia is one type of direct infectious disease. The Ministry of Health of the Republic of Indonesia through the publication of Profil Kesehatan Indonesia stated that the handling of pneumonia has so far been prioritized on controlling pneumonia under five (Kementerian Kesehatan Republik Indonesia, 2020). The number of children aged 0-4 years in Indonesia in 2018 is estimated to be around 23,729,583 people (Kementerian Kesehatan Republik Indonesia, 2019). Global estimates show that in that year, every hour 71 children in Indonesia contracted pneumonia and more than 19,000 children under five died from pneumonia, or more than 2 children under five every hour (UNICEF, 2019). The importance of preventing pneumonia in children under 5 years of age should be considered, including considering the factors that influence on increasing the number of pneumonia sufferers in children under five.
Spatial analysis techniques have proven useful in understanding the epidemiology of infectious diseases. For example, it can identify new risk factors or areas of increased risk that could benefit from targeted public health interventions (Blain et al., 2014). Referring to previous research, pneumonia as an infectious disease causes a possible spatial effect on the number of cases (Alwi et al., 2022;Nadya, 2017;Renika et al., 2021;Ristiani et al., 2021). For example, research conducted by (Nadya, 2017), shows that in general the factors that influence the number of pneumonia sufferers in toddlers in West Java are the number of severely malnourished toddlers, the percentage of infants who are fully immunized, and the number of health centers. Research by (Ristiani et al., 2021) shows the results that the number of poor people, population density, malnutrition, and water suitable for consumption JURNAL VARIAN | e- ISSN: 2581ISSN: -2017 are the significant factors that affect the number of toddlers with pneumonia in West Java. Spatio-temporal modelling is the process of extracting hidden and useful knowledge from large spatio-temporal datasets, finds wide-ranging applications in the geospatial domain (Liu et al., 2017;Yin et al., 2016). One method that can be used to analyse the factors that influence a variable by considering different locations or the presence of spatial diversity is the Geographically Weighted Regression (GWR) model (Fotheringham et al., 2003). GWR is a popular local regression method used in various disciplines (Wu et al., 2019). Geographically Weighted model is suitable when the data is not well explained by the global model (Gollini et al., 2013). The GWR models produce local parameter estimators for each observation location, contrast to the linear regression model (Yasin et al., 2012). Then, to get more accurate parameter estimates, a time component should be added to the GWR model (Fotheringham et al., 2015). Adding the time element in GWR can be done using the Geographically and Temporally Weighted Regression (GTWR) method. This explains the spatial and temporal heterogeneity (Guo et al., 2017;Huang et al., 2010;Ma et al., 2018;Wu et al., 2018). Unlike standard GWR models, GTWR combines temporal and spatial information in a weighting matrix to identify the presence of spatial and temporal heterogeneity (Conita and Purwaningsih, 2020).
GTWR is not robust to outliers. Therefore, outliers in the data create biased coefficient estimates and errors in completing regression relationships (Zhang and Mei, 2011). One method that accommodates outliers is Robust Geographically and Temporally Weighted Regression (RGTWR) using M-Estimator. Robust regression is a regression method that is used when the distribution of the residuals is not normal and/or when there are a large number of outliers that influence the model (Olive, 2008). The M-Estimator is a robust regression method that is commonly used which was introduced by Huber (Huber, 1965). The parameter estimation in this method uses an iterative process called iteratively reweighted least squares (IRLS), and used to build outlier-resistant models (Siswanto et al., 2017). Several previous research results show that RGTWR model produces a better model than GTWR model in explaining data conditions (Erda et al., 2019;Haryanto et al., 2019;Putra, 2019). For example, (Putra, 2019) modelled the RGTWR using the S-Estimator in cases of crime rates in East Java. This research produced a GTWR model with a high coefficient of determination, which is 97.5%, but there are many outliers in the model error. RGTWR modeling with the S-Estimator then produces a higher coefficient of determination, amounting to 98.2%, and the value of the goodness models which is smaller than the GTWR model.
Based on the description above, the purpose of this study was to model the RGTWR with the M-Estimator in pneumonia patients under five years of age in Indonesia and obtain the factors that significantly influence the number of cases. Research related to factors that affect the number of under-five pneumonia sufferers previously used many spatial, temporal, or focused aspects to overcome outliers with robust methods, but no one has used all three. In this study, these three aspects will be used, so that the spatial and temporal aspects of the factors thought to influence changes in the number of pneumonia sufferers under five can be accommodated, as well as outlier problems that can reduce the accuracy of the analysis results can be overcome. The data used in this study is the number of under-five pneumonia sufferers in Indonesia during the period 2016 to 2020 and the factors that can influence it.
Aside from the various data and research locations, the novel aspect of this study is the use of a robust M estimator in estimating the regression parameters, as well as the use of two robust objective functions in the estimation process, namely the Tukey and Huber objective functions, with the Tukey objective function being the objective function used because it produces the highest coefficient of determination.
where the element vector f is: with: e i : Residual at observation-i Z : A Matrix of size n × (p + 1) containing the standardized vectors for each observation. H 0 will be rejected if BP value > χ 2 (α;p) (Breusch and Pagan, 1980

Geographically and Temporally Weighted Regression
Geographically and temporally weighted regression (GTWR) is a method developed for GWR that accommodates for spatial and temporal heterogeneity (Huang et al., 2010). The general equation for the GTWR model of the response variable p with the explanatory variable y i at the location (u i , v i , t i ) is as follows: with: y i : The observed value of the response variable for the-i observation location : Observation value of the k-th explanatory variable at the i-th observation location with k = 1, 2, . . . , p ε i : Observation residual at location-i Parameter estimation is written as follows.

Robust Geographically and Temporally Weighted Regression
A Robust regression for GTWR is done by adding weights to Equation 2. Robust Geographically and Temporally Weighted Regression (RGTWR) model for the i-th location and time with outliers is: RGTWR using M-Estimator is done by minimizing the error with equation: The M-Estimator is performed using the Iteratively Reweighted Least Square (IRLS) method, or weighted least squares iteration. In this iteration, the value of w i changes its value each iteration, so that: with m is the number of iterations. At the given weight W m we get the estimator: The computation of the above equation is repeated until a convergence estimator is obtained, that is, until the difference between the values of (u i , v i , t i ) m+1 and β(u i , v i , t i ) m approaches zero (Erda et al., 2019).

B. RESEARCH METHOD 1. Data
The data used in this study is secondary data on pneumonia sufferers aged under five in 34 provinces of Indonesia in 2016-2020 obtained through the publication of Badan Pusat Statistik (BPS) and also the publication of the Minister of Health of the Population density X 3 Percentage of infants with complete basic immunization X 4 Percentage of infants exclusively breastfed 0-6 months X 5 Percentage of poor people

Analysis Step
The research flow is explained through the flowchart according to Figure 1, and explained in detail through the following steps: 1. Exploring data on the number of pneumonia sufferers in toddlers 2. Perform linear regression modelling and test residual assumptions 3. Checking the spatial diversity and time diversity of the data using a boxplot and the Breusch-Pagan test using Equation 1 4. GTWR model analysis (a) Determination of temporal spatial ratio parameter (τ ) (b) Determination of spatial parameters (λ) and temporal parameters (µ) . 5. Detect outlier data using a boxplot 6. Analyze RGTWR model using M-Estimator for each observation in location and time (a) Calculate the value of y i using the formula:  From Table 2, every year the median and average pneumonia sufferers aged under five in Indonesia are decreasing, and a large decline occurred in 2020. One of the reasons for this large decline was the stigma of COVID-19 sufferers which affected the decrease in the number of visits. toddler cough or difficulty breathing at the Puskesmas. In 2019 the number of visits for toddlers with coughing or difficulty breathing was 7,047,834 visits, in 2020 it was 4,972,553 visits, a decrease from visits in 2019 which ultimately resulted in the discovery of pneumonia under five (Kementerian Kesehatan Republik Indonesia, 2020)

Multiple Linear Regression Modelling
Linear regression analysis is a method of linearly describing the relationship between response and predictor variables (Draper & Smith, 1998). Multiple linear regression modelling was carried out on the data on the number of pneumonia patients aged under five. In the simultaneous test of the model, the results are as shown in Table 3 with the hypothesis used as follows. H 0 : β 0 = β 1 = . . . = β 5 = 0 H 1 : At least one β k = 0; k = 0, 1, 2, . . . , 5 Obtained p − value = 0.000 < α = 0.05, then reject H 0 , there is at least 1 parameter that affects the number of pneumonia patients aged under five. The independent variables simultaneously affect the number of pneumonia sufferers under five. Subsequently, a partial test of the parameters of the linear regression model was carried out as shown in Table 4 with the following hypothesis.
H 0 : β k = 0 H 1 : β k = 0; k = 0, 1, 2, . . . , 5 distributed or not. The Kolmogorov-Smirnov test results yield a value of 0.52353 and a p − value < 2.2 × 10 −16 , which is smaller than the significant level of 0.05, so the decision taken is that the residual is not normally distributed.

Independent Test
The autocorrelation test was conducted to determine the residuals of the multiple linear regression models were independent of each other or not (Draper and Smith, 1998 it is concluded that there is no correlation between residuals, and the independent assumptions of the model are met.

Identic Test
Identical assumption (homoscedasticity) means that the variance on the residuals is the same or identical. The opposite is heteroscedasticity, which is a condition if the residual variance is not identical (Gujarati, 2003). The Glejser test was carried out to see whether there was no heteroscedasticity problem in the model, the F test value was 15.683 and the p-value was 0.000. Based on the decision criteria for the Glejser test, because the value F test = 15.683 > F (α,p,(n−p−1)) = 2.2693 and p−value = 0.000 < α = 0.05, the identical assumption is not met, there is a heteroscedasticity problem in the residual model. One of the possible causes of the data not meeting the assumption of homoscedasticity is the presence of spatial heterogeneity, which will be tested using the Breusch-Pagan test.

Multicollinearity Test
A Multicollinearity test was performed to determine if there was a correlation between the independent variables. A VIF score > 10 indicates the presence of multicollinearity (Laksana, 2018). Based on the test results as shown in Table 5, it is found that the VIF value for all independent variables is less than 10, so it can be concluded that there is no correlation between the independent variables.

Spatial and Temporal Heterogeneity Test
The Breuch-Pagan test was conducted to determine the presence or absence of spatial heterogeneity in the data. After testing, the value of = 39.3860 and p − value = 1.985 × 10 −7 . BP value = 39.3860 > χ 2 0.05,7 = 11.0705 and p − value = 1.985 × 10 −7 < α = 0.05, it can be concluded that there is spatial heterogeneity in the data. Furthermore, spatial heterogeneity testing was carried out using a boxplot diagram.

Modelling Geographically and Temporally Weighted Regression
The initial step taken in GTWR modelling is to determine the kernel function that will be used for weighting, with the selection criteria being the kernel function with the optimum bandwidth that produces the smallest Cross Validation value, which is the best kernel function used for GTWR modelling.  Table 6 shows that the Exponential kernel function produces the smallest CV value compared to the optimal bandwidths of the other kernel functions, so the Exponential kernel function is chosen for weighting the GTWR model. Next, determine the bandwidth value and calculate the parameters that will be used in GTWR modelling.  Table 7 shows the obtained GTWR model parameter values. The spatial temporal bandwidth value of 0.9875 indicates that the area that is less than or equal to 0.9875 has a large influence on the i-th observation point. GTWR modelling produced 170 models from 34 provinces and 5 years of observation. The following shows an example of the GTWR model obtained, namely the model for Aceh Province in 2016.    Figure 4 shows the distribution of the parameter β 1 , the number of puskesmas. From Figure 4 it can be seen that most of the parameters have positive values. This mean that an increase in the number of puskesmas will also increase the number of pneumonia sufferers found. Figure 5. Distribution of parameters β 2 Figure 5 Shows the distribution of the parameter β 2 , namely population density. From Figure 5 it can be seen that most of the parameters show positive values, meaning that the greater population density will also increase the number of pneumonia sufferers. West Sumatra Province is the province that has the highest parameter value, it means that in that province, population density has a big effect when compared to other observations.    Figure 8 shows the distribution of the parameter β 5 , the percentage of poor people. Most of the parameters are negative, meaning that the higher the percentage of poor people, the number of pneumonia sufferers found will decrease. The following are some of the RGTWR models obtained.
Aceh Province in 2016: North Sumatra Province in 2016: y = 3.6458 + 6.8796X 1 + 30.1988X 2 + 0.3026X 3 − 51.9513X 4 + 15.9813X 5 Figure 9. Boxplot Residual Model RGTWR Figure 9 shows the detection of outliers using a boxplot on the RGTWR model which resulted in 32 observations whose model residuals were detected as outliers. Next, the best model was selected using the criteria of Mean Absolute Deviation (MeAD), Median Absolute Deviation (MdAD) and R 2 .  Table 8 shows that the RGTWR model succeeded in reducing the MdAD and MeAD values from the GTWR modelling. This means that the RGTWR modelling with the M-Estimator has an estimated value that is closer to the actual value. The R 2 value of the RGTWR model is also higher than the GTWR model, so it can be said that the RGTWR model is able to explain the diversity of models better than the GTWR model. Compared with previous research, for example, research by (Febrianti et al., 2023) modeling the incidence of pneumonia in toddlers in Bandung using the GTWR method, obtained a coefficient of determination of 87.83%. Research conducted by (Renika et al., 2021) who modeled the extrinsic risk factors for pneumonia in toddlers on Java Island in 2018 using GWR obtained a coefficient of determination of 46.2617%, higher than modeling with multiple linear regression of 35.9%. This study used the RGTWR by taking into account spatial, temporal, and outlier factors, which resulted in a model with a high coefficient of determination, indicating that the method used was correct compared to the GTWR method or multiple linear regression. Based on this, it can be concluded that the RGTWR model with the M-Estimator is better used in modelling pneumonia patients aged under five in Indonesia.
The results of the RGTWR modeling with the M-Estimator show that the variability variable has a significant effect on the number of cases of pneumonia under five each year, both in terms of the magnitude of the influence and the direction of the influence of the variable, namely positive or negative effect. In general, the number of puskesmas, the proportion of infants receiving complete basic gymnastics, and the proportion of poor people are variables that significantly influence the number of pneumonia sufferers under five in most of the observations. Compared to previous studies (Nadya, 2017;Renika et al., 2021;Vol. 6 Ristiani et al., 2021), the factors that influence the number of pneumonia sufferers under five show mixed results, but there are several significant variables that are the same. In line with research conducted by Erda (Erda et al., 2019), the RGTWR method produces a model that is better at describing data conditions, compared to the GTWR method and multiple linear regression.

D. CONCLUSION AND SUGGESTION
The number of cases of pneumonia patients under five in Indonesia can be modelled using the RGTWR method with the M-Estimator. This is due to the spatial and temporal heterogeneity of the data, and the outliers in the GTWR model so that the RGTWR method is appropriate. The RGTWR model with M-Estimator produces a MeAD value of 21.6852, an MdAD of 6.9661 and an R 2 value of 99,9997%. Variable number of puskesmas, percentage of infants with complete basic immunization, and percentage of poor population are variables that affect the number of pneumonia sufferers under five in Indonesia in most of the observations in 34 provinces and 5 years of observation.
To produce a more optimal model for all observations, it is advised to conduct future research in an area with fewer islands. Alternatively, adaptive bandwidth, which has a different bandwidth value for each observation, can be used in conjunction with other robust estimators in RGTWR modeling to produce the best model.