Application of Principal Component Regression in Analyzing Factors Affecting Human Development Index

The human development index is an indicator that measures the quality of people's lives: the higher the human development index, the better the quality of people's lives. Many factors or variables affect the level of the human development index, ranging from economic factors to education, health and others. However, not all factors have a positive and significant effect. Thus, this study aims to determine the factors that significantly affect the human development index in South Sulawesi. The method used in this study is principal component regression, which is suited to analyses involving many variables. The variables involved are expected length of schooling, average length of schooling, percentage of population with the highest Diploma, Bachelor and Masters education, school enrollment rate for people aged 7-24 years, percentage of poor people, spending per capita, and life expectancy. Principal component analysis of the data yields four principal components that represent the original variables; they were selected for the principal component regression on the basis of a cumulative proportion of variance > 80%. The results of this study indicate that the human development index in South Sulawesi is influenced by all the variables involved, which together explain 95.7% of its variation. The percentage of poor people is one of the variables with a negative effect on HDI in South Sulawesi, which shows that the higher the percentage of poverty, the lower the human development index. Thus, in order to increase the human development index in Indonesia, strategic steps are needed to improve people's welfare.


A. INTRODUCTION
Human resources play a very important role in development, which aims to create a healthy community environment and a productive life. To achieve sustainable development, human resources must be able to develop and optimize their capabilities. In a simple sense, development can be interpreted as an effort or process to bring about change for the better. In its implementation, development faces problems of varying complexity. The development process occurs in all aspects of people's lives, including economic, political, social and cultural aspects (Mahrany, 2012).
In 1990, the United Nations Development Programme (UNDP) introduced the concept of the Human Development Index (HDI; in Indonesian, Indeks Pembangunan Manusia) for the first time. This concept combines the life expectancy index, the education index and the purchasing power index.
The human development index is an indicator used to measure the quality of life or community welfare. A nation is said to be advanced when its human development index continues to increase. The HDI has indicators of a long and healthy life, knowledge, and a decent standard of living, so it can be assessed from three aspects, namely education, health and the economy. Education can be measured by the school participation rate, the expected length of schooling and the average length of schooling, as well as several other supporting indicators. The economic aspect can be assessed through the labor force participation rate, poverty rate, unemployment rate, per capita expenditure and the Gini ratio. Health can be measured by life expectancy.
SUMARNI SUSILAWATI JURNAL VARIAN | e-ISSN: 2581-2017
Based on data from the Central Statistics Agency of South Sulawesi, the HDI of South Sulawesi in 2021 increased by 0.31% from the previous year, reaching 72.24. This was accompanied by an increase in the school participation rate for children aged 7 years, which reached 13.52%. This figure is almost the same as the percentage of the population aged 25 years who have completed education at the diploma or undergraduate level, which is 13.45% (Badan Pusat Statistika, 2022).
Other factors that can also affect the HDI in South Sulawesi are per capita expenditure, the percentage of poor people and unemployment. Per capita expenditure rose by 0.95%, and the percentage of poor people also increased by 0.11%, while the unemployment rate decreased by 0.59%.
Winanda (2021) studied the human development index and found that the percentage of poor people and economic growth have a significant negative effect on the human development index in South Sulawesi. In another study, Hasanah et al. (2021) analyzed the human development index in South Sulawesi using nonparametric spline regression and found that the labor force participation rate, the student-to-school ratio, population density, health facilities and gross regional domestic product strongly influence the increase in the human development index in South Sulawesi. Furthermore, Humaira and Nugraha (2018) found that the variables influencing HDI are Life Expectancy (AHH), Adjusted Per Capita Expenditure, Average Length of Schooling (RLS), Expected Length of Schooling (HLS) and Gross Regional Domestic Product at Constant Prices (GRDP).
Referring to the existing data, many factors are related to the increase in HDI. However, it is necessary to know which factors are more dominant in supporting the improvement of the HDI in South Sulawesi, so that plans and strategies for increasing the HDI can be better planned and on target. Finding this out requires further analysis using statistical methods. Research involving many factors (multivariable research) commonly uses multiple regression and factor analysis, the latter comprising several methods, one of which is principal component analysis (PCA).
Several previous studies involved only two to six independent variables considered to affect the human development index. This study therefore involves more factors considered to affect the human development index in South Sulawesi, namely the expected length of schooling, average length of schooling, percentage of the population whose highest education is a diploma, undergraduate or postgraduate degree, school enrollment rate for the population aged 7-24 years, Gini ratio, percentage of poor people, expenditure per capita, labor force and life expectancy. Principal component regression (PCR), a combination of principal component analysis and regression analysis, is used for this purpose. PCR is chosen because involving many variables makes multicollinearity between them likely: the variables may be very closely correlated, which produces large standard errors that distort the results of the analysis. For this reason, several studies suggest using PCR to overcome multicollinearity in data.

Human Development Index
The Human Development Index (HDI) explains how the population can access development outcomes in terms of income, health, education and so on. The HDI is built on three basic dimensions, namely a long and healthy life, knowledge and a decent standard of living. Economic development is an important topic for both developed and developing countries seeking growth in production and consumption. Countries that want to improve economic development through investment in human development gain several benefits, the main one being improved welfare of the country's citizens (Ridha and Parwanto, 2020).
Local governments are responsible for meeting the needs of the population by developing human resources through allocating funds to the education and health sectors. Training costs are used to build and maintain facilities and infrastructure that guide quality investment models (Muliza et al., 2017).
Government spending can be interpreted as a financial plan that describes policy choices for a certain future period. The consumption pattern describes the allocation of the budget according to a certain structure (Astri et al., 2013). Depending on the form of spending used, provincial government spending is referred to as the selected budget. In the Indonesian budgeting method, there are two types of government spending: routine (daily) expenditures and development expenditures. Routine expenditures are the government's operational expenses, including payment of employee wages and other expenses. Development expenditure is spending classified as government investment expenditure, including investment in education and health (Astri et al., 2013).
The HDI was introduced by the United Nations through the United Nations Development Programme (UNDP), which in its 1996 publication wrote that human development is a process to improve aspects of people's lives. These aspects are an adequate level of education, a healthy life and a decent standard of living. The HDI is an important indicator of progress in human resource development and of the government's performance in improving the quality of education, health and other services. As a country committed to increasing its HDI, Indonesia uses HDI data as a reference for determining general allocation funds in all sectors that affect human development.
The indicators used in measuring HDI are the expected length of schooling and the average length of schooling as indicators of knowledge. For the decent standard of living indicator, per capita GNI data are used. For the long and healthy life indicator, life expectancy at birth is used. Meirianti (2016) analyzed HDI data for 38 regencies/cities in East Java Province in 2010-2014 using a fixed effect model. The results show that the poverty level, health spending and economic spending have a positive and significant effect, while education spending has a positive but not significant effect on the HDI.

Regression Analysis
Regression analysis is a statistical method used to identify variables that affect other variables. Regression analysis distinguishes between a dependent variable and independent variables. Cases with one dependent variable and one independent variable can be analysed using simple linear regression, while cases with two or more independent variables can be analysed using multiple regression.
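The multiple linear regression model is
$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon.$$
The regression coefficients can be estimated using the ordinary least squares (OLS) method, which in matrix form gives
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y},$$
where the first column of $\mathbf{X}$ is a column of ones for the intercept. For two independent variables, with $x_1 = X_1 - \bar{X}_1$, $x_2 = X_2 - \bar{X}_2$ and $y = Y - \bar{Y}$ denoting deviations from the means, $\beta_1$ and $\beta_2$ are calculated using OLS as
$$\hat{\beta}_1 = \frac{\sum x_2^2 \sum x_1 y - \sum x_1 x_2 \sum x_2 y}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}, \qquad \hat{\beta}_2 = \frac{\sum x_1^2 \sum x_2 y - \sum x_1 x_2 \sum x_1 y}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2},$$
and $\alpha$ is then calculated as
$$\hat{\alpha} = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2.$$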
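To illustrate the OLS formulas, the following minimal Python sketch estimates the coefficients from the normal equations. It then checks the two-predictor deviation formulas against that result on a small synthetic data set. It assumes NumPy is available. The function names `ols_fit` and `ols_two_predictors` are hypothetical and are not part of the study, which used Minitab.

```python
import numpy as np

def ols_fit(X, y):
    """Return (alpha, betas) from OLS: b = (X'X)^{-1} X'y, with an intercept column added."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # leading column of ones gives alpha
    # Solve the normal equations (X'X) b = X'y instead of forming the inverse explicitly.
    b = np.linalg.solve(design.T @ design, design.T @ y)
    return b[0], b[1:]

def ols_two_predictors(x1, x2, y):
    """Deviation-form OLS formulas for the two-predictor case: returns (alpha, beta1, beta2)."""
    x1, x2, y = (np.asarray(v, dtype=float) for v in (x1, x2, y))
    d1, d2, dy = x1 - x1.mean(), x2 - x2.mean(), y - y.mean()  # deviations from the means
    s11, s22, s12 = d1 @ d1, d2 @ d2, d1 @ d2
    s1y, s2y = d1 @ dy, d2 @ dy
    den = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / den
    b2 = (s11 * s2y - s12 * s1y) / den
    a = y.mean() - b1 * x1.mean() - b2 * x2.mean()
    return a, b1, b2
```

Both functions should agree on any data set where $\mathbf{X}^\top\mathbf{X}$ is invertible. When the predictors are nearly collinear, this system becomes ill-conditioned, which is the problem the following sections address.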

Multicollinearity
Multicollinearity is the occurrence of a linear relationship between the independent variables in a regression equation. Multicollinearity causes the estimators to have large variances, making it difficult to obtain accurate estimates (Faizia et al., 2019). To check whether multicollinearity occurs in a regression model, a multicollinearity test is performed to see whether there is a high or perfect correlation between the independent variables. A good regression model should have no correlation between the independent variables. If there is a high correlation between the independent variables, the estimated relationship between the independent variables and the dependent variable becomes unreliable.
Multicollinearity between independent variables can be detected using the Variance Inflation Factor (VIF) values under the following conditions:
1. If the VIF value > 10 and the Tolerance value < 0.10, there is multicollinearity.
2. If the VIF value < 10 and the Tolerance value > 0.10, there is no multicollinearity.
The VIF value is calculated as
$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad \mathrm{Tolerance}_j = 1 - R_j^2 = \frac{1}{\mathrm{VIF}_j},$$
where $R_j^2$ is the coefficient of determination obtained by regressing the independent variable $X_j$ on the other independent variables. If $X_j$ is not correlated with the other independent variables, $R_j^2$ will be small and the VIF value will be close to 1 (Widiyawati, 2021).
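The VIF rule can be sketched in Python as follows. This is a minimal illustration that assumes NumPy, and `vif` is a hypothetical helper name. Each $R_j^2$ is computed by regressing $X_j$ on the remaining columns with an intercept, and the VIF is then $1/(1 - R_j^2)$.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (n observations x k independent variables)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress X_j on an intercept plus all other independent variables.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)  # R_j^2
        out[j] = 1.0 / (1.0 - r2)                           # VIF_j; Tolerance_j = 1 / VIF_j
    return out
```

A perfectly collinear column ($R_j^2 = 1$) makes the division blow up, giving a very large or infinite VIF. In that case the cut-off of 10 is exceeded trivially.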

Principal Component Analysis
Principal Component Analysis (PCA) is a statistical method used to reduce or simplify data to facilitate interpretation (Maubanu and Kartiko, 2018). The advantage of PCA is that it can be used for all conditions of research data, and there is no need to discard any of the original variables. Note, however, that PCA is usually not an end in itself: it is often used as input to other methods such as multiple regression and cluster analysis.
Algebraically, principal components are particular linear combinations of the $p$ random variables $X_1, X_2, X_3, \dots, X_p$. Geometrically, these linear combinations represent a new coordinate system obtained by rotating the original system, which has $X_1, X_2, X_3, \dots, X_p$ as its coordinate axes. The new axes are the directions of maximum variability and give a simpler description of the covariance structure.
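Principal components depend entirely on the covariance matrix $\Sigma$ or the correlation matrix $\rho$ (Tazliqoh et al., 2015). Suppose the random vector $\mathbf{X}' = [X_1, X_2, \dots, X_p]$ has covariance matrix $\Sigma$ with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$. Consider the linear combinations
$$Y_i = \mathbf{a}_i'\mathbf{X} = a_{i1}X_1 + a_{i2}X_2 + \dots + a_{ip}X_p, \qquad i = 1, 2, \dots, p.$$
The development of principal components does not require a multivariate normal assumption. However, principal components derived from multivariate normal populations have useful interpretations.
The variance of $Y_i$ and the covariance of $Y_i$ and $Y_k$ are
$$\mathrm{Var}(Y_i) = \mathbf{a}_i'\Sigma\mathbf{a}_i, \qquad \mathrm{Cov}(Y_i, Y_k) = \mathbf{a}_i'\Sigma\mathbf{a}_k, \qquad i, k = 1, 2, \dots, p.$$
The principal components are the uncorrelated linear combinations $Y_1, Y_2, \dots, Y_p$ whose variances are as large as possible. Suppose $\Sigma$ has eigenvalue-eigenvector pairs $(\lambda_1, \mathbf{e}_1), (\lambda_2, \mathbf{e}_2), \dots, (\lambda_p, \mathbf{e}_p)$ with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$. Then the $i$-th principal component is
$$Y_i = \mathbf{e}_i'\mathbf{X} = e_{i1}X_1 + e_{i2}X_2 + \dots + e_{ip}X_p, \qquad i = 1, 2, \dots, p,$$
with
$$\mathrm{Var}(Y_i) = \mathbf{e}_i'\Sigma\mathbf{e}_i = \lambda_i, \qquad \mathrm{Cov}(Y_i, Y_k) = \mathbf{e}_i'\Sigma\mathbf{e}_k = 0, \quad i \ne k.$$
If some $\lambda_i$ are equal, the choice of the corresponding coefficient vectors $\mathbf{e}_i$, and hence $Y_i$, is not unique. These principal components are uncorrelated and have variances equal to the eigenvalues of the covariance matrix.
$Y_1$ is the first principal component. It is the linear combination $\mathbf{a}_1'\mathbf{X}$ that maximizes $\mathrm{Var}(\mathbf{a}_1'\mathbf{X}) = \mathbf{a}_1'\Sigma\mathbf{a}_1$ subject to $\mathbf{a}_1'\mathbf{a}_1 = 1$, and the maximum is $\mathbf{e}_1'\Sigma\mathbf{e}_1 = \lambda_1$. $Y_2$ is the second component. While uncorrelated with $Y_1$, it captures as much as possible of the variance remaining after the first component, with $\mathbf{e}_2'\Sigma\mathbf{e}_2 = \lambda_2$. In general, $Y_p$ is the $p$-th component, which captures the variance not explained by $Y_1, Y_2, \dots, Y_{p-1}$, with $\mathbf{e}_p'\Sigma\mathbf{e}_p = \lambda_p$. The order $Y_1, Y_2, \dots, Y_p$ must satisfy $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$.
The total population variance of the principal components is
$$\sigma_{11} + \sigma_{22} + \dots + \sigma_{pp} = \sum_{i=1}^{p}\mathrm{Var}(X_i) = \lambda_1 + \lambda_2 + \dots + \lambda_p = \sum_{i=1}^{p}\mathrm{Var}(Y_i),$$
so the proportion of the total population variance explained by the $k$-th principal component is
$$\frac{\lambda_k}{\lambda_1 + \lambda_2 + \dots + \lambda_p}, \qquad k = 1, 2, \dots, p.$$
Suppose most of the total population variance (for example, more than 80%) for a large number of variables $p$ can be attributed to the first one, two or three components. These components can then replace the original $p$ variables without losing much information.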
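To make the eigenvalue-based selection concrete, here is a minimal sketch that assumes NumPy and a data matrix with observations in rows. `pca_correlation` is a hypothetical helper, and the 80% threshold follows the rule stated above. The sketch standardizes the variables and takes the eigen-decomposition of the correlation matrix. It then keeps the smallest number of components whose cumulative proportion of variance exceeds the threshold.

```python
import numpy as np

def pca_correlation(X, threshold=0.80):
    """PCA on the correlation matrix of X (n observations x p variables)."""
    X = np.asarray(X, dtype=float)
    # Standardize each column: Z_j = (X_j - mean_j) / s_j, with the sample standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]      # reorder so that lambda_1 >= ... >= lambda_p
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    proportion = eigvals / eigvals.sum()   # lambda_k / (lambda_1 + ... + lambda_p)
    cumulative = np.cumsum(proportion)
    # Smallest number of components whose cumulative proportion exceeds the threshold.
    above = np.nonzero(cumulative > threshold)[0]
    m = int(above[0]) + 1 if above.size else len(eigvals)
    scores = Z @ eigvecs[:, :m]            # column i holds the scores of component i + 1
    return {"eigenvalues": eigvals, "eigenvectors": eigvecs, "proportion": proportion,
            "cumulative": cumulative, "m": m, "scores": scores}
```

For a correlation matrix, the eigenvalues sum to $p$, and the sample covariance of the scores is diagonal with the retained eigenvalues on the diagonal. Eigenvectors are only determined up to sign, so a package may report a component with all loadings negated. This does not change the proportions of variance.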

Principal Component Regression
Principal Component Regression (PCR) is a combination of regression analysis and PCA. The data are first analyzed using PCA. The analysis yields new variables that are linear combinations of the original variables. These new variables are free from multicollinearity, that is, they are uncorrelated with one another.
The principal component regression equation can be based on either the covariance matrix or the correlation matrix. When the correlation matrix is used, the variables $X_1, X_2, \dots, X_k$ are replaced with the standardized variables $Z_1, Z_2, \dots, Z_k$, where $Z_j = (X_j - \bar{X}_j)/s_j$ and $s_j$ is the sample standard deviation of $X_j$. In both cases, the resulting equation can be related back to the observed variables.
If the principal components formed are denoted $K_1, K_2, \dots, K_m$, the general form of the principal component regression equation is
$$\hat{Y} = \beta_0 + \beta_1 K_1 + \beta_2 K_2 + \dots + \beta_m K_m.$$
Each principal component is a linear combination of the standardized variables $Z$:
$$K_i = a_{i1}Z_1 + a_{i2}Z_2 + \dots + a_{ik}Z_k, \qquad i = 1, 2, \dots, m.$$
Substituting the second equation into the first gives the regression equation in terms of the standardized variables:
$$\hat{Y} = \beta_0 + w_1 Z_1 + w_2 Z_2 + \dots + w_k Z_k, \qquad w_j = \sum_{i=1}^{m} \beta_i a_{ij}.$$

B. RESEARCH METHOD
This study used secondary data for 2021 obtained from the South Sulawesi Statistics Agency (Badan Pusat Statistika, 2022). The study uses one dependent variable and nine independent variables. The dependent variable is the Human Development Index ($Y$). The independent variables are the expected length of schooling ($X_1$), average length of schooling ($X_2$), percentage of the population whose highest education is a diploma, undergraduate or postgraduate degree ($X_3$), school participation rate of the population aged 7-24 years ($X_4$), Gini ratio ($X_5$), percentage of poor people ($X_6$), per capita expenditure ($X_7$), labor force ($X_8$) and life expectancy ($X_9$).
The steps of data analysis carried out in this study were:
1. Collecting the data to be used in the research.
2. Determining the dependent and independent variables.
The results of the multiple regression analysis show a coefficient of determination of 99.7%. This means that the independent variables explain 99.7% of the variation in the dependent variable.

Detect Multicollinearity
Multicollinearity in the independent variables can be detected by looking at the VIF value of each variable. If a VIF value > 10, it can be concluded that there is multicollinearity between the independent variables. The VIF values obtained with the Minitab application are shown in Table 1. Based on Table 1, the VIF values of the variables $X_1$ and $X_2$ are > 10, so it can be concluded that there is multicollinearity between the independent variables. The correlation analysis between the independent variables gives more detail: several variables have very strong correlation coefficients, indicating linear relationships between them. The variable pairs with a correlation value > 0.8 are $X_1$ and $X_2$, $X_1$ and $X_3$, and $X_2$ and $X_3$. This multicollinearity can be overcome using principal component regression.
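The correlation screen can be reproduced with a few lines of Python. This sketch assumes NumPy, and `strongly_correlated_pairs` is a hypothetical helper. It compares the absolute correlation with the 0.8 cut-off so that strong negative correlations are also flagged. That choice is an assumption, since the text only mentions values > 0.8.

```python
from itertools import combinations

import numpy as np

def strongly_correlated_pairs(X, names, cutoff=0.8):
    """List (name_i, name_j, r_ij) for every pair of columns with |r_ij| > cutoff."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return [(names[i], names[j], float(R[i, j]))
            for i, j in combinations(range(len(names)), 2)
            if abs(R[i, j]) > cutoff]
```

Applied to the nine independent variables, the helper returns each flagged pair once, together with its Pearson correlation.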

Principal Component Analysis
Principal component analysis aims to reduce the data without losing much of the information in the independent variables. The data reduction forms new components, each of which involves all the variables. Before the PCA, the data of the original variables were standardized using the Minitab application to remove differences in scale between the variables.
Table 3. Principal components formed
Next, the principal components selected for further analysis are determined from the eigenvalues and the proportion of variance, retaining components until the cumulative proportion is > 80%. The eigenvalues and cumulative proportions are presented in the following table. From the principal component equations, the component scores for each component are obtained as shown in the following table.
Based on the results of data processing in Table 6, the constant coefficient $\beta_0$ is 0.0000, $\beta_1$ is -0.44882, $\beta_2$ is -0.13075, $\beta_3$ is 0.24051 and $\beta_4$ is -0.02954. Thus, the regression equation obtained with the OLS method is
$$\hat{Y} = 0.0000 - 0.44882K_1 - 0.13075K_2 + 0.24051K_3 - 0.02954K_4.$$
The coefficient of determination of the principal component regression model is 0.957. This means that the independent variables, through the principal components, explain 95.7% of the variation in the dependent variable, while 4.3% is explained by other variables not included in the study. Table 8 shows that all independent variables have a significant influence on the dependent variable, since their significance values (p-values) are smaller than the significance level $\alpha = 0.05$.
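The procedure in this section can be sketched as follows: standardize, retain components up to a cumulative proportion > 80%, then fit the component scores by OLS. This is a minimal illustration that assumes NumPy, and `pcr_fit` is a hypothetical helper. It also assumes the dependent variable is standardized. That assumption is consistent with the reported constant of 0.0000 but is not stated in the text.

```python
import numpy as np

def pcr_fit(X, y, threshold=0.80):
    """Principal component regression on standardized data (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Assumption: Y is standardized too, so the fitted constant is (numerically) zero.
    zy = (y - y.mean()) / y.std(ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    above = np.nonzero(cumulative > threshold)[0]
    m = int(above[0]) + 1 if above.size else len(eigvals)
    A = eigvecs[:, :m]                         # column i holds the loadings of component K_{i+1}
    K = Z @ A                                  # component scores K_1, ..., K_m
    design = np.column_stack([np.ones(len(zy)), K])
    coef, *_ = np.linalg.lstsq(design, zy, rcond=None)
    resid = zy - design @ coef
    r2 = 1.0 - (resid @ resid) / (zy @ zy)     # zy has mean 0, so zy @ zy is the total SS
    return {"m": m, "A": A, "beta0": coef[0], "beta": coef[1:], "r2": r2,
            "cumulative": cumulative}
```

The component scores are uncorrelated and centered. As a result, each $\beta_i$ equals the coefficient from a simple regression of the standardized $Y$ on $K_i$ alone, and adding or dropping a component does not change the other coefficients.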
Substituting the principal component equations (each $K_i$ expressed in terms of $Z_1, \dots, Z_9$) into the principal component regression equation gives the regression model in terms of the standardized variables. In this model, several independent variables have a positive effect on the dependent variable, namely $Z_1$, $Z_2$, $Z_3$, $Z_4$, $Z_7$ and $Z_9$: when the value of one of these variables increases, the value of the dependent variable also increases. The variables $Z_5$, $Z_6$ and $Z_8$, on the other hand, have a negative effect on the dependent variable. For example, if the value of $Z_5$ increases, the value of the dependent variable decreases.
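The substitution step can be written as a matrix product. If the retained eigenvectors are stacked as columns of a $k \times m$ matrix $A$, the coefficients of the standardized variables are $w = A\beta$, so that $w_j = \sum_i \beta_i a_{ij}$. The sketch below assumes NumPy, and the helper names are hypothetical. The loading matrix is also a hypothetical stand-in, because the study's loadings (Table 3) are not reproduced here. The four $\beta$ values are those reported in the text. With real loadings, the signs of the resulting $w_j$ are what the text reads as positive or negative effects.

```python
import numpy as np

def standardized_coefficients(A, beta):
    """w = A @ beta: A is k x m (column i = loadings of K_i), beta has length m."""
    A = np.asarray(A, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return A @ beta

def effect_directions(w, names):
    """Label each standardized coefficient as a positive, negative or zero effect."""
    return {name: ("positive" if wj > 0 else "negative" if wj < 0 else "none")
            for name, wj in zip(names, w)}

# Hypothetical loadings: an arbitrary orthonormal 9 x 4 matrix, NOT the study's Table 3,
# so the signs of w below say nothing about the actual South Sulawesi results.
rng = np.random.default_rng(3)
A_hyp, _ = np.linalg.qr(rng.normal(size=(9, 4)))
beta_reported = np.array([-0.44882, -0.13075, 0.24051, -0.02954])  # beta_1..beta_4 from the text
w = standardized_coefficients(A_hyp, beta_reported)
```

Flipping the sign of an eigenvector flips the sign of its $\beta_i$ as well, so $w$ does not depend on the sign convention a package uses for the loadings.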
The results of this study indicate that the independent variables with a significant positive effect on the human development index are the expected length of schooling, average length of schooling, percentage of residents whose highest education is a diploma, bachelor's or postgraduate degree, school enrollment rate of residents aged 7-24 years, per capita spending and life expectancy. Judging by the variable coefficients, the most influential is the percentage of residents with diploma, undergraduate or postgraduate education. Therefore, improving the quality of life of the community must start with the development of education.
Several other studies have also shown that educational indicators have a very significant effect. One example is the study by Mahya and Widowati (2021) on the human development index in the province, which involved the variables average length of schooling and expected length of schooling. Likewise, Nurkuntari et al. (2016) found that the educational indicators, namely the average length of schooling and the expected length of schooling, have a significant effect on increasing the human development index.

D. CONCLUSION AND SUGGESTION
Nine variables were used in this study, but they can be replaced by four new principal component variables, namely $K_1$, $K_2$, $K_3$ and $K_4$, which together explain 88.1% of the total variance. The regression on these new variables has a coefficient of determination of 95.7%, which indicates the magnitude of the influence of the independent variables on the dependent variable. Based on the principal component regression model, the following have a positive effect on the human development index: the average length of schooling, the percentage of residents with Diploma, Bachelor or Masters education, the school enrollment rate for residents aged 7-24 years, expenditure per capita and life expectancy. The percentage of poor people, by contrast, has a negative effect on the HDI in South Sulawesi. This is in line with previous research on the human development index by Winanda (2021), which found that the percentage of poor people has a significant negative effect on the human development index in South Sulawesi. This means that reducing poverty through improving education and the economy is needed to improve the quality of life of the people.