Support Vector Machine for Predicting Candlestick Chart Movement on Foreign Exchange

Foreign Exchange, commonly called Forex, is a form of non-real-sector investment in great demand. Forex is a marketplace that specializes in foreign exchange trading. Technological advances have made it easy to monitor investment conditions in real time and present them in an easy-to-understand graphical form. Prediction is closely tied to investment and draws on market sentiment, economic conditions, and technical indicators. One of the Artificial Intelligence methods that can be used for classification is the Support Vector Machine (SVM). SVM is a machine learning classification method based on the Structural Risk Minimization (SRM) principle. It finds the best hyperplane separating two classes in the input space, and this hyperplane determines the classification decision function by minimizing empirical risk. This study used candlestick patterns to predict foreign exchange chart movements with the SVM classification method. Its purpose was to measure the accuracy of SVM predictions based on candlestick patterns so that the method can assist traders in making forex trading decisions. The classification results reached an accuracy of 90.72% with a precision of 87.69%. With this relatively good level of accuracy, the SVM method can be used to predict chart movements in foreign exchange, using candlesticks to indicate the direction of the current trend.


INTRODUCTION
Investment in the non-real sector has recently started to attract attention. Many vendors have appeared, offering forms of investment intended to attract investors with high or low financial capacity. Foreign Exchange, commonly called Forex, is a form of non-real-sector investment in great demand. Forex is a marketplace specialized in foreign exchange trading. Its main characteristic is high risk paired with high return: it can provide extraordinary benefits if investors manage risk and make good predictions [1,2]. On the other hand, the inability of investors to manage risk can result in significant losses. Therefore, investors need to analyze indications of the direction of the current trend in order to minimize the risk of possible losses.
Technological advances have made real-time conditions of the investment environment easy to monitor and to present in an easy-to-understand graphical form. One example is the MetaTrader 4 application, which displays changing currency exchange conditions as a candlestick chart. The candlestick chart offers many useful visual patterns and summarizes prices during a session simply, so it is easy to interpret in decision-making. As a result, the candlestick chart is one of the most widely used charts [3,4].
Prediction is closely tied to investment and draws on market sentiment, economic conditions, and technical indicators. These indicators have been widely used to predict when to act and what to do as an investor. Investors need accurate predictions when making decisions in order to reduce the risks to their investments. In the Forex prediction process, candlestick charts make it easier for users to extract the information they need. Predictions can be made using Machine Learning or Data Mining techniques. Data mining techniques can produce added value from a data set in the form of previously unknown knowledge, such as information about relationships between variables, by predicting patterns, identifying characteristics, and classifying data [5,6]. Support Vector Machine is a data mining technique that can predict accurately [7]. According to L. Auria and R. A. Moro, cited in [8], Support Vector Machine is widely used in various studies to perform automatic classification, including image recognition and medical analysis, and to make predictions. The SVM algorithm is based on statistical theory and gives better results than other methods because it works effectively when irrelevant features are removed [9].
Research [10,11] in 2020 created a prediction model for foreign currency exchange rates using the previous year's data as input. Three machine learning models, namely Long Short Term Memory (LSTM), Simple Recurrent Neural Network (SRNN), and Gated Recurrent Unit (GRU), were compared on their performance in simultaneously predicting the exchange rates of the currencies of 22 countries against the United States Dollar (USD). The best results were obtained from the LSTM model with 200-100-30 layer units. Similar results were obtained with the three activation functions: Linear, Sigmoid, and Tanh. This research indicates that Neural Networks can predict many foreign exchange rates simultaneously and very efficiently.
Stock price movements were predicted by [4] in 2021. This study proposes a Deep Predictor for Price Movement (DPP) using candlestick charts on the proposed stock history data. First, DPP decomposes the given candlestick chart into several sub-charts. Then, the best representation of the sub-charts is obtained using the CNN autoencoder, and GRU makes price movement predictions. The DPP-based model performance assessment uses trading data from the Taiwan Stock Exchange Capitalization-Weighted Stock Index and the stock market index, Nikkei 225, for the Tokyo Stock Exchange. The experimental results show that DPP is superior to other methods.
Foreign exchange prediction was carried out by [12] in 2021 using Digital Signal Processing. A digital processing model was developed to predict foreign exchange using the ARIMA algorithm and an Artificial Neural Network. The data spans 20 years and covers five currencies: USD, Pound, Yen, Euro, and Swiss Franc. This study produces a minimum error value of 0.7 within 26 seconds, measured using the Root Mean Square Error (RMSE).
M.S. Islam and E. Hossain conducted research [13] in 2021 that combined Gated Recurrent Unit (GRU) and Long Short Term Memory (LSTM) to predict the future closing price of FOREX currencies. The currency pairs used in this study are EUR/USD, GBP/USD, USD/CAD, and USD/CHF. Model performance is validated using MSE, RMSE, MAE, and R2. Based on MSE, RMSE, and MAE, the combined GRU-LSTM model gives the best results for the GBP/USD and USD/CAD currency pairs. In terms of R2, the model built by M.S. Islam and E. Hossain shows the best performance of all the systems compared, which the authors present as evidence that it is the model with the least risk among them.
In 2022, the ability of candlestick patterns to determine trading positions was tested by [14]. The Gramian Angular Field (GAF) represents candlestick patterns as images so that a Convolutional Neural Network (CNN) can identify six candlestick patterns over 3-hour and 5-hour windows. Long Short-Term Memory (LSTM) is then used to predict the closing price. This study shows that CNN successfully detected 3-hour and 5-hour candlestick patterns with accuracy rates of 90% and 93%. LSTM can predict closing price trends with an RMSE score of 155.458 and a MAPE of 0.9754% with a 10-hour look-back window. With a holding duration of 3 hours, CNN-LSTM as an additional model recognizes 85 candlestick patterns from the test data with an accuracy of up to 82.7%, better than candlestick pattern recognition using CNN alone.
Previous research has thus shown that candlesticks can indicate the current trend direction using the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) methods. Building on this, the present study uses candlestick patterns to predict chart movement on foreign exchange with the Support Vector Machine (SVM) classification method. Furthermore, this study aims to measure the accuracy of the Support Vector Machine method in making predictions using candlestick patterns in the MetaTrader 4 application so that it can help traders make decisions in foreign exchange trading.

RESEARCH METHOD
This research went through several systematic stages, from data collection to obtaining foreign exchange prediction results, as seen in Figure 1. These stages consist of data retrieval, literature study, data preprocessing, data processing, system testing, and analysis. The data used in this study is a collection of candlestick patterns from 2017 to 2021, which can be accessed freely in the MetaTrader 4 application. The data consists of 7 attributes, all numerical, with a total of 1,835 records before the preprocessing stage. This study uses the currency pair Great Britain Pound Sterling and United States Dollar (GBP/USD) with a timeframe of four hours (4h).

Preprocessing
This stage cleans the data of missing values, because they can negatively affect the performance of the method used. Irrelevant data is also removed at this stage [15]. Redundant data, repeated data, and outliers are removed as well, and the values of each attribute are rescaled to the range 0 to 1 so as not to reduce testing accuracy (see Equation (1)). Data from the MetaTrader 4 application is selected based on the shape of the candlestick and then converted into data with three attributes: body, top wick, and bottom wick.
Normalization in this study uses Min-Max Normalization. According to T. T. Hanifa and S. Al-Faraby in Nasution et al. [16], Min-Max Normalization is a normalization method that applies a linear transformation to the original data and preserves the balance of comparison values between the data before and after processing. The formula for determining the normalized value can be seen in Equation (1).
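As a minimal sketch of Min-Max Normalization, the standard form of the transformation is x' = (x - min) / (max - min), computed per attribute. The function name, the handling of a constant attribute, and the sample values below are illustrative assumptions, not taken from the study's data.

```python
def min_max_normalize(values):
    """Linearly rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute is mapped to 0.0 to avoid
        # dividing by zero; the study does not describe this case.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical candlestick body sizes (price units), for illustration only.
body_sizes = [0.0012, 0.0030, 0.0021, 0.0048]
normalized = min_max_normalize(body_sizes)
print(normalized)
```

The smallest value maps to 0 and the largest to 1, and every other value keeps its relative position between them.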

Classification
Classification is a data mining technique that groups data based on its closeness or similarity to sample data. A model is learned from a data set in which each record consists of several features and a class label [17]. In [18], Derek and Schnyer stated that most existing supervised classification methods are based on traditional statistics, which can provide ideal results if the sample size is not limited. However, in practice, the number of samples obtained is limited.
Classification can be done using several methods, one of which is the Support Vector Machine (SVM). SVM is a machine learning classification method that works based on the Structural Risk Minimization (SRM) principle to find the best hyperplane that separates two classes in the input space that determines the classification decision function by minimizing empirical risk [19,20].

Figure 2. Defining hyperplane
Figure 2 shows several patterns belonging to two classes, +1 and -1. Patterns in class -1 are denoted by squares, while patterns in class +1 are denoted by circles. The classification problem can be interpreted as finding a line (hyperplane) that separates the two groups [21]. The hyperplane can be formulated using Equation (2), in which l and f represent the sample size and the classification decision function, respectively. Every SVM aims to find the optimal separating hyperplane between the classes so that classification errors are low. However, in some cases, the classification decision function has problems with linear separation. A hyperplane is needed to determine the minimum boundary with Equation (3).
The minimization problem can be modified with Equation (4) to allow data points to be misclassified in cases that cannot be separated linearly. SVM can also be applied to multiclass classification by combining several binary SVMs.
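For reference, the decision function, the margin problem, and its relaxation described above are commonly written in the following textbook forms. This is a sketch only: the symbols w, b, ξ_i, and C are assumptions, and the notation may differ from the study's Equations (2) to (4). Here l is the sample size, x_i a training pattern, and y_i ∈ {+1, -1} its class.

```latex
% Decision function defined by the separating hyperplane
f(x) = \operatorname{sign}\left( w \cdot x + b \right)

% Linearly separable case: maximize the margin
\min_{w,\,b} \; \frac{1}{2} \lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w \cdot x_i + b \right) \ge 1, \qquad i = 1, \dots, l

% Non-separable case: slack variables \xi_i permit misclassification,
% weighted by the box constraint C
\min_{w,\,b,\,\xi} \; \frac{1}{2} \lVert w \rVert^{2} + C \sum_{i=1}^{l} \xi_i
\quad \text{subject to} \quad
y_i \left( w \cdot x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0
```

A larger C penalizes misclassified training points more heavily, while a smaller C favors a wider margin.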

Performance Evaluation
Performance evaluation is carried out to see the error rate of the classification method. The performance of the model is measured using a confusion matrix, whose entries are defined as follows:
True Positive (TP): positive data that is classified as positive data
True Negative (TN): negative data that is classified as negative data
False Positive (FP): negative data that is classified as positive data
False Negative (FN): positive data that is classified as negative data
Accuracy is the degree of closeness between actual data and predicted data. Accuracy can also be defined as the proportion of data correctly classified by the system [22,23]. The level of accuracy can be calculated using Equation (5). Precision is the degree of agreement between the requested information and the answers given by the system. Precision can be calculated using Equation (6).
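The two measures can be sketched from confusion matrix counts using their standard definitions, accuracy = (TP + TN) / (TP + TN + FP + FN) and precision = TP / (TP + FP). The counts below are made-up toy values chosen for easy arithmetic; they are not the study's results.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all records that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Share of records predicted positive that are truly positive."""
    return tp / (tp + fp)


# Toy counts for illustration only (not taken from the study).
tp, tn, fp, fn = 40, 50, 5, 5
print(f"accuracy  = {accuracy(tp, tn, fp, fn):.2%}")
print(f"precision = {precision(tp, fp):.2%}")
```

Accuracy counts both classes, whereas precision looks only at the records the model labeled positive, so the two can diverge when the classes are imbalanced.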

RESULT AND ANALYSIS
Data taken from the MetaTrader 4 application is processed using MATLAB 2016a. After the preprocessing stage, 970 records with three numeric attributes remain usable. The data is divided into training and test data in proportions of 70% and 30%.
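A 70/30 split of 970 records yields 679 training and 291 test records. The sketch below shows one way to produce such a split. Shuffling with a fixed seed is an assumption, since the study does not state whether the split was random or chronological, and the integer placeholders stand in for real records.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into train and test parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder records: indices 0..969 stand in for the 970 candlesticks.
records = list(range(970))
train, test = train_test_split(records)
print(len(train), len(test))
```

For time-ordered market data, a chronological cut (the earliest 70% for training) is a common alternative that avoids training on records later than those in the test set.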

Preprocessing
Data from MetaTrader 4 is first processed into data that represents the candlestick shape. As shown in Figure 3, the data is selected based on the candlestick shape and converted into three attributes: body, top wick, and bottom wick. This stage computes the differences between the opening price, the highest price, the lowest price, and the closing price of each candlestick. The resulting attribute representation of the candlesticks used in this study can be seen in Table 2. Normalization is then carried out to change the range of values for each attribute to 0 to 1 using Equation (1). An example of data before and after the normalization stage can be seen in Table 3. In addition, the target is encoded as 0 for the "No" class and 1 for the "Yes" class.
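The conversion from prices to the three attributes can be sketched as follows. The exact formulas are an assumption based on the usual candlestick definitions, since the study only says that differences between the four prices are taken, and the function name and sample prices are hypothetical.

```python
def candle_features(open_, high, low, close):
    """Return (body, top_wick, bottom_wick) for one candlestick.

    Assumed definitions (not confirmed by the study):
      body        = |close - open|
      top wick    = high - max(open, close)
      bottom wick = min(open, close) - low
    """
    body = abs(close - open_)
    top_wick = high - max(open_, close)
    bottom_wick = min(open_, close) - low
    return body, top_wick, bottom_wick


# Hypothetical GBP/USD candle, prices expressed in tenths of a pip
# (integers) so that the arithmetic is exact.
features = candle_features(open_=130000, high=130500, low=129800, close=130300)
print(features)  # (300, 200, 200)
```

Taking the absolute value discards whether the candle is bullish or bearish. If that direction matters for the target, a signed body (close minus open) would be the alternative.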

Classification
The classification method used in this research is Support Vector Machine (SVM). Classification is performed on training data that has gone through the preprocessing stage, as shown in Figure 4. The classification stage searches for the hyperplane using Equation (2). The classification was carried out using 679 training records and 291 test records. The training data is used to build the model, and the test data is used to test the performance of the resulting model. The output of the SVM is analyzed to identify the parameters that affect the method's performance. The SVM output is also used to evaluate the resulting model.
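The train-then-predict workflow can be illustrated with a small linear soft-margin SVM trained by batch subgradient descent on the hinge loss. This is not the study's MATLAB implementation. The learning rate, epoch count, C value, and zero-centred toy points are all assumptions made for illustration, and labels use +1 and -1 rather than the 1 and 0 encoding of the study.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Minimize 0.5*||w||^2 + C*sum(max(0, 1 - y_i*(w.x_i + b)))."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = w[:]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # point violates the margin
                grad_w = [g - C * yi * v for g, v in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


# Toy, zero-centred 3-attribute points (hypothetical, not the study's data).
positives = [[0.8, -0.3, -0.2], [0.6, -0.1, -0.4], [0.7, -0.4, -0.1]]
negatives = [[-v for v in x] for x in positives]
X = positives + negatives
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

A kernel SVM, as typically fitted by library routines, replaces the dot product with a kernel function so that the separating surface can be nonlinear in the original attributes.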

Analysis
Several experiments were carried out in this study to get the best performance from SVM. Figure 5 visualizes the training process used to obtain optimal hyperparameters and minimize errors during cross-validation.
Figure 5. SVM training results in 3D model
The fitcsvm function helps to find the optimal box constraint and kernel function so that the observation points obtained are more accurate. Figure 5 shows that the best observation point was obtained at X with a value of 10.9854, Y with a value of 59.6362, and Z with a value of 0.20419. A 2D visualization of the hyperplane fitted to the input training data can be seen in Figure 6.
Performance evaluation uses a confusion matrix to see the accuracy and precision on the data being tested. The results of the classification can be seen in Table 4. From the classification results in Table 4, accuracy can be calculated with Equation (5) and precision with Equation (6). The classification results have an accuracy rate of 90.72% and a precision of 87.69%.
The accuracy of the SVM method in this study was also compared with the accuracy of CNN and CNN-LSTM reported in [14], as can be seen in Figure 7. Figure 7 shows that SVM has a better accuracy rate than both CNN, with an accuracy rate of 60%, and the combination of CNN and LSTM, with an accuracy rate of 82.7%, as reported in [14].
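Cross-validation of the kind mentioned above repeatedly holds out part of the training data to score each candidate hyperparameter setting. The sketch below only builds the index partitions for k-fold cross-validation. The choice of k = 5 is an assumption, since the study does not report the number of folds, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k (train, validation) pairs."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        training = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((training, validation))
        start += size
    return folds


# 679 training records, as in the study; k = 5 is an assumed value.
folds = k_fold_indices(679, 5)
print([len(validation) for _, validation in folds])
```

Each candidate setting, for example a box constraint and kernel scale pair, would be trained on every training part and scored on the matching validation part, and the setting with the lowest average validation error would be kept.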

CONCLUSION
The accuracy level obtained from the data classification results is 90.72%, with a precision of 87.69%. This value is obtained from testing 291 records using three attributes, namely body, top wick, and bottom wick. Therefore, the Support Vector Machine (SVM) method can be used to predict the movement of charts in foreign exchange, using candlesticks to indicate the current trend's direction, so that it can help traders make decisions in foreign exchange trading. SVM can also be used to create system models for classification. Further research can add a feature selection method to achieve better accuracy.