Algorithm Errors in the VisualGSCA Program

ABSTRACT


A. INTRODUCTION
There are two approaches to the analysis of Structural Equation Models (SEM), namely Covariance Based SEM and Component Based SEM (Tenenhaus, 2008). Component Based SEM, which is often identified with Partial Least Squares (PLS), has the weakness that it cannot provide a global measure of the goodness of fit of the model, whereas Covariance Based SEM, as implemented in the AMOS and LISREL programs, does provide goodness-of-fit criteria for the model. Generalized Structured Component Analysis (GSCA) is a further development of PLS that provides goodness-of-fit criteria. To obtain these criteria, GSCA uses the Alternating Least Squares (ALS) algorithm, while PLS uses a fixed-point algorithm (Schlittgen, 2018).
The GSCA method was first implemented as a program by Hwang and Takane under the name VisualGSCA (Hwang, 2008). The advantage of the VisualGSCA program is that it can estimate the same model under both the covariance-based and the component-based approach. Hwang et al. state that if the structural model is misspecified, other SEM programs should be avoided and GSCA should be adopted instead (Hwang et al., 2010). This suggestion received sharp criticism from Jörg Henseler, who argued that the GSCA program uses an incorrect algorithm that treats the scales of observed and latent variables inconsistently: the observed variables are standardized, while the latent variables are normalized. As a result, the parameter estimates are wrong and the goodness-of-fit values FIT and AFIT are inaccurate (Henseler, 2012a).
Realizing this, Hwang replaced his originally desktop-based program (VisualGSCA) with the web-based GeSCA, available at http://sem-gesca.org/gsca.php (Jung et al., 2012). Although the GSCA algorithm used in the program has changed, many researchers in Indonesia still process their research data with GSCA programs whose algorithm has not been corrected, which makes the results of those studies less accurate. To date, no study has discussed the continued use of this outdated software; this is the research gap addressed here.
This study aims to show that the algorithm used in the VisualGSCA program contains an error, so that the output of the program is inaccurate. To do so, research data that had previously been processed with the VisualGSCA program are reprocessed with the GeSCA program. The GeSCA results are then compared with the previous results to see whether they differ.

B. LITERATURE REVIEW
GSCA can be viewed as Component Based SEM in which each latent variable is defined as a component, or weighted composite, of the observed variables, as in equation (1):

$$\gamma_i = W z_i \tag{1}$$

where $z_i$ and $\gamma_i$ are the $J \times 1$ vector of observed (indicator) variables and the $d \times 1$ vector of latent variables (constructs) for observation $i$ ($i = 1, \ldots, N$), and $W$ is a $d \times J$ matrix of component weights. GSCA also includes a measurement model that describes the relationship between indicators and constructs, as well as a structural model that connects the constructs. The measurement model is written as equation (2):

$$z_i = C \gamma_i + \varepsilon_i \tag{2}$$

where $C$ is the $J \times d$ loading matrix and $\varepsilon_i$ is the $J \times 1$ residual vector for $z_i$. The structural model is stated as equation (3):

$$\gamma_i = B \gamma_i + \xi_i \tag{3}$$

where $B$ is the $d \times d$ path coefficient matrix and $\xi_i$ is the $d \times 1$ residual vector for $\gamma_i$. GSCA integrates the three equations above into a single equation (4):

$$\begin{bmatrix} z_i \\ \gamma_i \end{bmatrix} = \begin{bmatrix} C \\ B \end{bmatrix} \gamma_i + \begin{bmatrix} \varepsilon_i \\ \xi_i \end{bmatrix}, \quad \text{that is,} \quad u_i = A \gamma_i + e_i \tag{4}$$

In the GSCA model all indicators and constructs are collected in $u_i$, and their interdependence is expressed by $A$. The GSCA parameters ($W$ and $A$) are estimated so that the sum of squares of all residuals $e_i$ is as small as possible over all observations. This is the same as minimizing the least-squares criterion

$$\phi = \sum_{i=1}^{N} e_i^{\top} e_i = \sum_{i=1}^{N} (u_i - A W z_i)^{\top} (u_i - A W z_i) \tag{5}$$

with respect to $W$ and $A$, subject to the identification constraint $\sum_{i=1}^{N} \gamma_{ik}^2 = 1$, where $\gamma_{ik}$ is the $k$-th element of $\gamma_i$.
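To make the roles of the weights, loadings and path coefficients concrete, the following is a minimal sketch in plain Python, not the code of VisualGSCA or GeSCA. It forms composites from fixed weights, rescales them to satisfy the identification constraint, performs the least-squares update of the loadings and the path coefficient for those fixed weights (one half of an ALS cycle), and evaluates the criterion. The data, the weights, the two-construct model and all function names are hypothetical. As a further assumption, the indicators are centered and scaled to unit sum of squares so that they share the scale of the composites.

```python
from math import sqrt

def unit_normalize(column):
    """Center a column and rescale it so its sum of squares equals 1."""
    mean = sum(column) / len(column)
    centered = [x - mean for x in column]
    length = sqrt(sum(x * x for x in centered))
    return [x / length for x in centered]

def composite(Z, weights):
    """gamma_ik = sum_j w_kj * z_ij, rescaled so that sum_i gamma_ik^2 = 1."""
    raw = [sum(w * z for w, z in zip(weights, row)) for row in Z]
    length = sqrt(sum(g * g for g in raw))
    return [g / length for g in raw]

def slope(y, x):
    """Least-squares slope of y on x (no intercept; both are centered)."""
    return sum(a * b for a, b in zip(y, x)) / sum(b * b for b in x)

# Hypothetical raw data: N = 6 observations, J = 4 indicators.
raw = [
    [1.0, 2.0, 2.0, 1.0],
    [2.0, 2.0, 3.0, 3.0],
    [3.0, 4.0, 3.0, 4.0],
    [4.0, 3.0, 5.0, 4.0],
    [5.0, 5.0, 4.0, 6.0],
    [6.0, 5.0, 6.0, 5.0],
]
columns = [unit_normalize(list(col)) for col in zip(*raw)]
Z = [list(row) for row in zip(*columns)]  # N x J, each row is one z_i

# Hypothetical fixed weights W (d = 2 constructs):
# indicators 1-2 form construct 1, indicators 3-4 form construct 2.
W = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5]]
gamma = [composite(Z, w) for w in W]      # d lists of N composite scores

# Least-squares update of A = [C; B] for fixed W.
block = [0, 0, 1, 1]                      # construct measured by each indicator
loadings = [slope(columns[j], gamma[block[j]]) for j in range(len(block))]
path = slope(gamma[1], gamma[0])          # construct 2 regressed on construct 1

# Criterion phi: squared residuals of the measurement and structural parts.
phi = 0.0
for j in range(len(block)):
    phi += sum((z - loadings[j] * g) ** 2
               for z, g in zip(columns[j], gamma[block[j]]))
# Construct 1 has no predictors, so its structural residual is gamma_1 itself.
phi += sum(g1 * g1 for g1 in gamma[0])
phi += sum((g2 - path * g1) ** 2 for g1, g2 in zip(gamma[0], gamma[1]))

print("loadings:", [round(c, 3) for c in loadings])
print("path coefficient:", round(path, 3))
print("phi:", round(phi, 3))
```

Because indicators and composites are on the same scale in this sketch, each loading equals the correlation between an indicator and its composite. A full ALS procedure would alternate this step with an update of the weights until the criterion stops decreasing.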
Criterion (5) is minimized by the Alternating Least Squares (ALS) algorithm until it converges (Suk & Hwang, 2016). GSCA produces a measure of overall model fit called FIT, which is calculated by the formula:

$$\mathrm{FIT} = 1 - \frac{\sum_{i=1}^{N} e_i^{\top} e_i}{\sum_{i=1}^{N} u_i^{\top} u_i}$$

The FIT value ranges from 0 to 1. The greater the FIT value, the greater the share of the variance of the data that is explained by the model. However, the FIT value is influenced by the complexity of the model, so the Adjusted FIT (AFIT) was developed to take model complexity into account:

$$\mathrm{AFIT} = 1 - (1 - \mathrm{FIT}) \frac{d_0}{d_1}$$

where $d_0 = NJ$ is the degrees of freedom of the null model ($W = 0$ and $A = 0$), $d_1 = NJ - P$ is the degrees of freedom of the model being tested, and $P$ is the number of independent parameters. GSCA also provides two additional measures of model fit: (1) the unweighted least-squares GFI and (2) the SRMR (standardized root mean square residual). GFI values close to 1 and SRMR values close to 0 indicate good fit (Hwang & Takane, 2014).
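The effect of the complexity adjustment can be seen in a small numerical sketch. The code below assumes that FIT is one minus the ratio of the residual to the total sum of squares and applies the AFIT formula with d0 = NJ and d1 = NJ - P. The function names and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def fit(ss_residual, ss_total):
    """FIT = 1 - (residual sum of squares) / (total sum of squares)."""
    return 1.0 - ss_residual / ss_total

def afit(fit_value, n_obs, n_indicators, n_params):
    """AFIT = 1 - (1 - FIT) * d0 / d1, with d0 = N*J and d1 = N*J - P."""
    d0 = n_obs * n_indicators
    d1 = d0 - n_params
    return 1.0 - (1.0 - fit_value) * d0 / d1

# Hypothetical setting: N = 100 observations, J = 10 indicators.
N, J = 100, 10
simple = {"fit": fit(ss_residual=4.0, ss_total=10.0), "P": 25}
larger = {"fit": fit(ss_residual=3.95, ss_total=10.0), "P": 50}

for model in (simple, larger):
    model["afit"] = afit(model["fit"], N, J, model["P"])

# When comparing models, the one with the larger AFIT is preferred.
preferred = max((simple, larger), key=lambda m: m["afit"])

print("simple:", round(simple["fit"], 4), round(simple["afit"], 4))
print("larger:", round(larger["fit"], 4), round(larger["afit"], 4))
```

In this made-up comparison the larger model has the slightly higher FIT but the lower AFIT, because its 25 additional parameters reduce d1 by more than the improvement in FIT can compensate.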
A program to estimate the GSCA model was developed by Hwang under the name VisualGSCA 1.0 (Hwang, 2008). This software was the first program to implement the GSCA algorithm (Hwang & Takane, 2004). VisualGSCA 1.0 could be downloaded free of charge from http://www.psych.mcgill.ca/perpg/fac/hwang/software.html. The initial version of the program provides a Graphical User Interface (GUI) whose design follows that of VisualPLS. The software estimates the parameters using MATLAB, while the GUI was developed in C++.
Three years later, Henseler stated that the algorithm used in the GSCA application was in error. The error arises because the algorithm used is not the pure GSCA algorithm but a reduced GSCA algorithm that ignores the structural model, resulting in an incorrect FIT value. According to Henseler, the first indication of the error can be seen in Tenenhaus's research, which reports that the results of the VisualGSCA program are almost the same as those of a series of principal component analyses, which are a reduced form of GSCA (Tenenhaus, 2008). Henseler then strengthened his argument by re-analyzing the simulation model of Hwang et al. using GeSCA (version 9 December 2009), reduced GSCA (rGSCA) and the pure GSCA algorithm (Henseler, 2012b).
In response to Henseler's study, Hwang et al. corrected the algorithm of the web version of the GeSCA program and no longer made the VisualGSCA program available for download. The 24 June 2013 version of the GeSCA program can be run through the website at http://sem-gesca.org/gsca.php or http://sem-gesca.com/gsca.php. Although the GSCA algorithm used in the program has changed, many researchers in Indonesia still process their research data with GSCA programs whose algorithm has not been corrected. Table 1 lists some studies that appear to still use the reduced GSCA algorithm (rGSCA). Based on Table 1, the articles do not explicitly state the web address of the program used, and they cite references to the GeSCA algorithm prior to its improvement (Bharata & Widyaningrum, 2017; Ekasari, 2018; Ristianto & Fauziah, 2016; Wahyu Widyaningrum, 2017).

C. RESEARCH METHODS
To achieve the objective of this study, namely to show that the VisualGSCA program contains an algorithm error that makes its output inaccurate, the simulation model that previous researchers analyzed with VisualGSCA and with GeSCA version 9 December 2009 is re-analyzed with GeSCA version 24 June 2013. The GeSCA results are then compared with the previous results to see whether they differ. Following Henseler's recommendation, the quantities compared between the two applications are the loading factors and the FIT and AFIT values.
The loading factor is the correlation between an indicator and its latent construct. Indicators with high loading factors contribute more to explaining the latent construct, whereas indicators with low loading factors contribute only weakly. The analysis of the loading factors is part of the assessment of convergent validity, which tests the validity of the indicators. The FIT value, in turn, is used to evaluate the structural model.
FIT measures how much of the variance of the data is explained by the model and ranges from 0 to 1. The closer the FIT value is to 1, the better the model. However, FIT is very sensitive to the complexity of the model, so the adjusted FIT (AFIT) must also be considered. AFIT is used to compare models: the model with the larger AFIT value is preferred. Because none of the articles presented in the literature review provides raw data for reprocessing, the data processed in this study come from an earlier direct data collection, which had been analyzed using VisualGSCA in 2011.

Previous Research
The earlier study is in the field of Information Systems and uses SEM (Structural Equation Modeling). It aimed to analyze the factors that influence student acceptance of e-Learning using a modified UTAUT (Unified Theory of Acceptance and Use of Technology) model. The sample consisted of 281 respondents from the academic community of the University of Indonesia Vocational Program in 2011, and the SEM analysis was carried out with the VisualGSCA software. Figure 1 shows the research model. The model in Figure 1 was developed by determining the relationships between the UTAUT factors and their SEM indicator variables. The indicator variables, which form the basis of the questionnaire given to the respondents, are mostly taken from the indicators in the UTAUT model (Venkatesh et al., 2012). These indicators were then modified and adapted to the e-Learning system. Some indicators for the variables PE, EE, BI, and AU are taken from other studies of systems that are similar to and compatible with the e-Learning system. This was done to select indicators that have been proven and recognized on closely comparable systems, with the aim of improving the closeness and accuracy of the resulting model.
Performance Expectancy (PE) is described by 5 (five) indicators that measure the level of learning performance of the respondents when using the system. The indicators measure: (1) an increase in the chance of getting high scores, (2) faster completion of assignments, (3) an increase in the amount of material studied every day, (4) efficiency in learning and (5) a decrease in learning load. The Effort Expectancy (EE) variable is elaborated into 4 (four) indicators that determine how much effort the user needs in order to use the system. The indicators used as references are: (1) ease of use, (2) ease of learning the system, (3) speed of mastery in system use and (4) clarity and ease of understanding of the system. Social Influence (SI) is the influence of the social environment surrounding the respondents. SI in this study is measured by 4 (four) indicators: (1) influence from the parties, (2) influence from the family, (3) support from friends, and (4) campus support.
The facilities that users need in order to access the system are covered by 5 (five) indicators for the FC (Facilitating Conditions) variable. These indicators measure the extent to which support for students to access the system is available. They cover: (1) availability of connections and other facilities, (2) availability of help if needed, (3) availability of instructions (guides and/or tutorials), (4) compatibility of the system with other systems used, and (5) the readiness of the knowledge possessed by users. The use of the system is measured by 2 (two) variables, namely BI (Behavioral Intention) and AU (Actual Use). The BI indicators focus on user behavior towards the system: (1) intended use, (2) usage plans, (3) certainty of use and (4) recommending e-Learning to other students. The AU (Actual Use) indicators capture the actual use of the system over time, namely: (1) frequency of use and (2) duration of use of or access to the system when (3) downloading material, (4) uploading assignments and (5) doing quizzes.

Comparison of VisualGSCA and GeSCA Result
The first analysis compares the loading factor values from VisualGSCA and GeSCA. Table 2 juxtaposes the VisualGSCA results reported in the previous research with the GeSCA output. Based on Table 2, almost all outer loadings agree to one decimal place. Different values were obtained for PE3, FC5, AU1 and AU4. For the indicators Performance Expectancy 3 and Actual Use 4, the VisualGSCA results are slightly above those of GeSCA. The opposite holds for the indicators Facilitating Conditions 5 and Actual Use 1, where the VisualGSCA results are slightly below those of GeSCA. In the SEM analysis this resulted in FC5 and AU1 being dropped under VisualGSCA but not under GeSCA and, conversely, PE3 being dropped under GeSCA but not under VisualGSCA.
If testing with the software shows that the model does not fit the data, the model is modified so that the user acceptance model conforms more closely to the data. The modification consists of removing indicators with a loading factor below 0.5, in accordance with previous research. Table 2 shows that, according to the VisualGSCA results, four (4) indicators have a loading factor below 0.5, namely PE5, SI4, FC5 and AU1. According to the GeSCA results, three (3) indicators have a loading factor below 0.5, namely PE3, PE5 and SI4. These indicators are excluded from the analysis so that the modified model has a better fit.
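The screening rule can be written as a short sketch. The loading values below are placeholders invented for illustration; they are not the values in Table 2 and were chosen only so that the screening outcome matches the one described above. The function name and the dictionaries are likewise hypothetical.

```python
THRESHOLD = 0.5  # indicators with a loading factor below this are removed

def indicators_to_drop(loadings, threshold=THRESHOLD):
    """Return the sorted names of indicators whose loading is below threshold."""
    return sorted(name for name, value in loadings.items() if value < threshold)

# Placeholder loadings (NOT the values in Table 2), for the five
# indicators named in the text as falling below 0.5 in at least one program.
visual_gsca = {"PE3": 0.52, "PE5": 0.41, "SI4": 0.45, "FC5": 0.48, "AU1": 0.47}
gesca = {"PE3": 0.49, "PE5": 0.40, "SI4": 0.44, "FC5": 0.51, "AU1": 0.52}

dropped_visual = indicators_to_drop(visual_gsca)
dropped_gesca = indicators_to_drop(gesca)

# Indicators on which the two programs lead to different decisions.
only_visual = sorted(set(dropped_visual) - set(dropped_gesca))
only_gesca = sorted(set(dropped_gesca) - set(dropped_visual))

print("dropped under VisualGSCA:", dropped_visual)
print("dropped under GeSCA:", dropped_gesca)
print("dropped only under VisualGSCA:", only_visual)
print("dropped only under GeSCA:", only_gesca)
```

The sketch illustrates why small differences in loadings matter: a value just above or below the threshold changes which indicators remain in the modified model.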
The second analysis compares the FIT and AFIT values from VisualGSCA and GeSCA. Table 3 juxtaposes the VisualGSCA results reported in the previous research with the GeSCA output. Based on Table 3, most of the fit measures agree to one decimal place. The exception is FIT, for which the VisualGSCA program produced a value of 0.547 while GeSCA produced a value of 0.469. This reinforces Henseler's view that the error in the VisualGSCA program occurs because the algorithm used is not the pure GSCA algorithm but a reduced GSCA algorithm that ignores the structural model, resulting in an incorrect FIT value. Similarly, Henseler's re-analysis of the earlier simulation model produced a FIT value of .606 with GeSCA version 9 December 2009, but a FIT value of .557 with the pure GSCA algorithm. In the SEM analysis, the differences lead to two indicators (FC5 and AU1) being dropped under VisualGSCA but not under GeSCA and, conversely, one indicator (PE3) being dropped under GeSCA but not under VisualGSCA.
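The size of the two FIT discrepancies can be checked with a few lines of arithmetic. The sketch below uses only the four FIT values reported above; the function name and the choice to express the gap relative to the corrected value are illustrative assumptions.

```python
def fit_gap(fit_old, fit_new):
    """Return the absolute gap and the gap relative to fit_new."""
    absolute = fit_old - fit_new
    return absolute, absolute / fit_new

# FIT values reported in the text.
this_study = fit_gap(fit_old=0.547, fit_new=0.469)  # VisualGSCA vs GeSCA 2013
henseler = fit_gap(fit_old=0.606, fit_new=0.557)    # GeSCA 2009 vs pure GSCA

for label, (absolute, relative) in (("this study", this_study),
                                    ("Henseler", henseler)):
    print(f"{label}: FIT higher by {absolute:.3f} ({relative:.1%})")
```

In both cases the older program reports the higher FIT: by 0.078 (about 17% of the corrected value) in this study and by 0.049 (about 9%) in Henseler's re-analysis.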

E. CONCLUSION AND SUGGESTION
This study found errors in the GSCA algorithm used by the VisualGSCA program. These errors cause the loading factor and FIT values generated by the VisualGSCA program to be incorrect. In a re-analysis with the GeSCA program version 24 June 2013 (http://sem-gesca.org/gsca.php), some loading factors are higher and some are lower than the values generated by the VisualGSCA program. In addition, the FIT value generated by the VisualGSCA program is higher than that produced by the GeSCA program version 24 June 2013. This finding reinforces the similar findings put forward by Henseler. The error in the VisualGSCA program is due to the use of the reduced GSCA algorithm, which ignores the structural model and therefore yields an incorrect FIT value.
Furthermore, the reduced GSCA algorithm was also used in the 9 December 2009 version of the GeSCA program (http://sem-gesca.org/), so the results of the old GeSCA program are also inaccurate. Based on these findings, researchers in Indonesia should no longer use the VisualGSCA program or the 9 December 2009 version of the GeSCA program, but should use the 24 June 2013 version of the GeSCA program instead. GSCA research published after 2013 that still relies on the uncorrected GSCA program should be revised using the results of a re-analysis with the pure GSCA algorithm.