Performance Prediction of Airport Traffic Using LSTM and CNN-LSTM Models

During the COVID-19 pandemic, airports faced a significant drop in passenger numbers, affecting a vital hub of the air transportation industry. This study evaluated whether a Long Short-Term Memory network (LSTM) or a Convolutional Neural Network - Long Short-Term Memory network (CNN-LSTM) offers more accurate predictions of airport traffic during the COVID-19 pandemic, from March to December 2020. The study involved data filtering, min-max scaling, and dividing the dataset into 80% training and 20% testing sets. Parameter tuning was performed with different optimizers: RMSProp, Stochastic Gradient Descent (SGD), Adam, Nadam, and Adamax. Performance was evaluated using Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and R-squared (R2). The best LSTM model achieved a MAPE of 0.0932, while the CNN-LSTM model had a slightly higher MAPE of 0.0960. In particular, the inclusion of a balanced dataset representing a percentage of the base period for each airport had a significant impact on improving prediction accuracy. This research contributes to providing stakeholders with valuable insights into the effectiveness of predicting airport


INTRODUCTION
Airports serve as crucial terminals for aircraft takeoff, landing, and passenger transfer, playing a pivotal role in the transportation industry [1]. However, the COVID-19 pandemic in 2020 profoundly affected the aviation sector, resulting in a significant decrease in passenger numbers and an estimated 17,000 aircraft being out of operation during the peak period from March to June [2]. Such fluctuations in air traffic are characteristic of the aviation industry [3] and are time-dependent, making them suitable for analysis as time series data [4]. The air transport industry has focused on traffic forecasting methodologies for several decades, while formal academic research on the topic emerged more recently, around three decades ago [5]. Various forecasting techniques have been developed to analyze time series data, including statistical methods, computational intelligence, or a combination of both [6]. The primary goal of time series analysis is to use historical observations to develop accurate models that reflect the underlying structure of the series. These models enable the prediction and classification of future events [7]. In recent years, deep learning methods have garnered significant attention in academic research and have been successfully applied to real-world prediction problems, including time series analysis. Among these techniques, Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNN) have emerged as popular and effective deep learning models [8]. CNNs and Recurrent Neural Networks (RNN), including LSTM, have demonstrated considerable improvements in handling time series data and have shown superior performance, particularly when dealing with raw data [9].
Previous research has addressed airport analysis and prediction. One study [10] used a Bayesian Network (BN) approach to predict the level of congestion at airports and the reliability of flight delays, achieving a performance rate of 70%. Another study [11] employed a machine learning approach using LightGBM, Multilayer Perceptron (MLP), and Random Forest (RF) classifiers to evaluate the accuracy of predicting departure delays, arrival delays, and cancellations. The obtained accuracy score was above 0.794. Furthermore, [12] used a deep graph-embedded LSTM (DGLSTM) model, which demonstrated higher prediction accuracy and robustness than existing mainstream methods. Another study [13] focused on accurately predicting flight delays using a CNN-LSTM model. This model achieved a classification accuracy of 92.39% and an overall accuracy rate of 84%, surpassing several benchmark models. Additionally, [14] sought to reduce airport analysis prediction errors. They compared the prediction error of a Conv-LSTM model with that of a single LSTM and found that the Conv-LSTM model reduced the prediction error by 11.41%. The study also introduced an Att-Conv-LSTM model, which further reduced the prediction error by 10.83% compared to the Conv-LSTM model.
Our main contribution is a comparison of two models, LSTM and CNN-LSTM, for predicting airport traffic. Our analysis focuses on multivariate time series data, specifically the traffic to and from the airport expressed as a percentage of the traffic volume during a reference period. The evaluation uses Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and R-squared (R2). By contrasting our models with previous approaches that used autoregressive moving average (ARMA) models, we aim to assess their effectiveness thoroughly. According to [15], the CNN-LSTM hybrid model is superior to the LSTM model, with an average reduction of 21.62% in prediction error. Similarly, [16] reports that the CNN-LSTM hybrid model achieves an 18% decrease in Mean Squared Error (MSE) compared to the LSTM model. According to [17], the CNN-LSTM model outperforms other models, including the LSTM model, with an MSE of 0.37. Other research highlights that both the CNN-LSTM and LSTM models outperform the CNN and the back-propagation neural network (BPNN). On the other hand, [18] indicates that the performance of CNN or LSTM models falls short of the traditional SARIMA model.
While previous research has compared LSTM and CNN-LSTM models, our study applies them to a distinct dataset to forecast the airport percentage of baseline during the COVID-19 period, with the aim of determining which model provides more accurate predictions for this application. One challenge in this research is the imbalanced nature of the airport dataset across countries. To address this issue, we computed the daily average airport percentage of baseline for four representative countries: the USA, Canada, Chile, and Australia. We employed various optimizers, including RMSProp, Stochastic Gradient Descent (SGD), Adam, Nadam, and Adamax, to tune the model parameters, since the choice of optimizer can significantly affect predictive accuracy. The findings have implications for airport management and other stakeholders making informed choices during these unprecedented times.

RESEARCH METHOD
This is a quantitative study in which data are analyzed with numerical methods. It comprises data collection from Kaggle repositories; data preprocessing, including data filtering and MinMaxScaler normalization; data partitioning into an 80% training set and a 20% test set; data modeling using LSTM and CNN-LSTM; parameter tuning using RMSProp, Stochastic Gradient Descent (SGD), Adam, Nadam, and Adamax; and performance evaluation using Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and R-squared (R2). The process and outcomes of these stages are illustrated in Figure 1.

Dataset
In this study, we use a dataset covering March 16, 2020, to December 12, 2020, which is readily accessible on Kaggle [19]. The dataset comprises a single numerical attribute that can be transposed and includes a total of 7247 baseline data points with daily aggregation, as shown in Table 1.

Data Preprocessing
This stage focuses on removing irrelevant data through data filtering and on normalizing the remaining data with feature scaling (a min-max scaler). Normalization adjusts the range of the independent variables or features so that they fall between 0 and 1 [20]. This ensures that the values of each attribute lie in a standardized and consistent range, as in Equation (1), where $x$ is an original value, $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the attribute, and $x'$ is the scaled value:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$
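The min-max normalization described above can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the study's code: the function names `min_max_scale` and `inverse_min_max` and the sample values are assumptions made for the example.

```python
def min_max_scale(values):
    """Rescale a sequence of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map it to 0.0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def inverse_min_max(scaled, lo, hi):
    """Map scaled values back to the original range."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical percent-of-baseline readings, not taken from the dataset.
raw = [40.0, 55.0, 70.0, 100.0]
scaled = min_max_scale(raw)
restored = inverse_min_max(scaled, min(raw), max(raw))
```

The inverse mapping matters when predictions are reported in the original units. A common practice, which the text does not confirm was followed here, is to compute the minimum and maximum on the training portion only and reuse them for the test portion.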

Train/Test Split
The dataset used in this study is divided into training data and testing data in an 80/20 ratio. Because the airport dataset is imbalanced across countries, we computed the daily average airport baseline for the USA, Canada, Chile, and Australia.
ISSN: 2476-9843
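As an illustration of the 80/20 split, the sketch below cuts an ordered series at the 80% mark. It assumes the split preserves time order, which is a common choice for time series but is not stated in the text; the function name `chronological_split` and the sample values are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split an ordered series into a leading train part and a trailing test part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Hypothetical daily averages for one country (ten days of data).
daily_avg = [62.0, 60.5, 58.0, 57.5, 55.0, 56.0, 59.0, 61.0, 63.5, 64.0]
train, test = chronological_split(daily_avg)
```

Keeping the test set at the end of the series means the model is evaluated only on dates later than those it was trained on.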

Data Modeling
The LSTM component of the LSTM model consists of two layers. A dropout layer plays a crucial role in mitigating overfitting. In the final stage, the fully connected (dense) layer uses the spatial correlation patterns extracted by the previous layers to predict future values, as illustrated in Figure 2.
The CNN-LSTM model comprises a Conv1D layer and an AveragePooling1D layer, whose purpose is to capture relevant patterns and features in the input data. To prevent overfitting, a Dropout layer is included. The LSTM component of the model consists of two layers. In the final stage, the fully connected (dense) layer uses the spatial correlation patterns extracted by the previous layers to predict the future values of the time series, as illustrated in Figure 3.

Parameter Tuning
We use several optimization algorithms to improve the performance of the deep learning models. RMSProp (Root Mean Square Propagation) adjusts the learning rate based on the magnitude of recent gradients. Stochastic Gradient Descent (SGD) is a fundamental optimization algorithm that updates the model parameters by taking small steps proportional to the negative gradient of the loss function. Adam (Adaptive Moment Estimation) maintains running averages of past gradients and their squares and uses this information to update the model parameters. Nadam is an extension of Adam that incorporates Nesterov accelerated gradient (NAG) into its update rule, calculating the gradient using an intermediate point ahead of the current parameters, which helps the algorithm converge faster. Adamax is another variant of Adam that uses the exponentially weighted infinity norm (maximum absolute value) of the gradients to update the parameters.
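To make the differences between the five update rules concrete, the sketch below applies each one to a toy one-dimensional loss, f(x) = (x - 3)^2. It is an illustration under stated assumptions, not the training code of the study: the learning rate, decay constants, step count, and toy loss are all chosen for the example, and the Nadam rule is written in a common simplified form.

```python
import math

LR, B1, B2, EPS = 0.05, 0.9, 0.999, 1e-8  # assumed hyperparameters


def grad(x):
    """Gradient of the toy loss f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


def run(update, steps=1000):
    """Apply an update rule repeatedly, starting from x = 0."""
    x, state = 0.0, {}
    for t in range(1, steps + 1):
        x = update(x, grad(x), state, t)
    return x


def sgd(x, g, state, t):
    # Step proportional to the negative gradient.
    return x - LR * g


def rmsprop(x, g, state, t, rho=0.9):
    # Scale the step by a running average of squared gradients.
    state["v"] = rho * state.get("v", 0.0) + (1 - rho) * g * g
    return x - LR * g / (math.sqrt(state["v"]) + EPS)


def _moments(g, state):
    # Running averages of the gradient and of its square, as used by Adam.
    state["m"] = B1 * state.get("m", 0.0) + (1 - B1) * g
    state["v"] = B2 * state.get("v", 0.0) + (1 - B2) * g * g
    return state["m"], state["v"]


def adam(x, g, state, t):
    m, v = _moments(g, state)
    m_hat, v_hat = m / (1 - B1 ** t), v / (1 - B2 ** t)
    return x - LR * m_hat / (math.sqrt(v_hat) + EPS)


def nadam(x, g, state, t):
    # Adam with a Nesterov-style look-ahead on the first moment.
    m, v = _moments(g, state)
    m_hat, v_hat = m / (1 - B1 ** t), v / (1 - B2 ** t)
    look_ahead = B1 * m_hat + (1 - B1) * g / (1 - B1 ** t)
    return x - LR * look_ahead / (math.sqrt(v_hat) + EPS)


def adamax(x, g, state, t):
    # Replace the squared-gradient average with an infinity-norm estimate.
    state["m"] = B1 * state.get("m", 0.0) + (1 - B1) * g
    state["u"] = max(B2 * state.get("u", 0.0), abs(g))
    return x - (LR / (1 - B1 ** t)) * state["m"] / (state["u"] + EPS)


results = {f.__name__: run(f) for f in (sgd, rmsprop, adam, nadam, adamax)}
```

All five rules end near the minimum at x = 3 on this toy problem. RMSProp with a fixed learning rate tends to hover around the minimum rather than settle exactly on it, which is one reason the choice of optimizer can change the final scores.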

Performance Metric
To evaluate and compare the performance of the implemented methods, Equation (2) is used to calculate the MAE from a sample of N data points, where $y_i$ and $\hat{y}_i$ are paired observations that express the same phenomenon. Equation (3) is used to calculate the MAPE, a measure of the prediction accuracy of a forecasting method, for example in trend prediction, where $y_i$ is the actual airport baseline value, $\hat{y}_i$ is the predicted value, and N is the number of data points.
Equation (4) is used to calculate the RMSE, where $\hat{y}_i$ is the vector of predictions generated from a sample of N data points on all airport baseline variables and $y_i$ is the vector of observed airport baseline values being predicted.
Equation (5) is used to calculate R-squared, the coefficient that represents how well the predicted values $\hat{y}_i$ fit the original values $y_i$. A value from 0 to 1 is interpreted as a percentage; the higher the value, the better the model.
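The four metrics can be sketched directly from their standard textbook definitions. The code below assumes those definitions match Equations (2) to (5), and it returns MAPE as a fraction rather than a percentage, which is an assumption suggested by the scale of the scores reported later; the sample values are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    # Returned as a fraction (0.10 means 10%); assumes no actual value is 0.
    return sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """One minus the ratio of residual to total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted values, not taken from the study.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [1.0, 5.0, 6.0, 9.0]
scores = {
    "MAE": mae(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "R2": r_squared(y_true, y_pred),
}
```

With this definition, R-squared can fall below 0 when the predictions are worse than always predicting the mean of the actual values.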

RESULT AND ANALYSIS

Dataset
Of the 7247 data entries, 4441 are from the United States of America (USA), 2311 from Canada, 257 from Australia, and 238 from Chile. These numbers are visualized in Figure 4.

Data Preprocessing
To ensure the relevance and accuracy of the analysis, a filtering process was applied to eliminate irrelevant data. The results of this filtering process are presented in Table 2, which contains only the data deemed relevant for the analysis. The MinMaxScaler rescales the data features to a specified range, typically between 0 and 1. This scaling preserves the relative relationships between values, so the data retain the shape of their original distribution. The scaled data are presented in Table 3.

Train/Test Split
Because data entries are unevenly distributed across countries in the airport dataset, we computed a daily average airport baseline using the Pandas dataframe.groupby().mean() function. The dataset was divided in an 80/20 ratio, and mean values were calculated for the USA, Canada, Chile, and Australia. The resulting average baseline values are presented in Table 4.
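The daily averaging step can be illustrated without any dependencies. The sketch below mirrors what a grouped mean over country and date would produce; it is not the study's code, and the record layout, the function name `daily_country_mean`, and the sample values are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (country, date, percent_of_baseline), one per airport.
records = [
    ("USA", "2020-03-16", 80.0),
    ("USA", "2020-03-16", 60.0),
    ("USA", "2020-03-17", 50.0),
    ("Chile", "2020-03-16", 90.0),
    ("Chile", "2020-03-17", 70.0),
    ("Chile", "2020-03-17", 30.0),
]


def daily_country_mean(rows):
    """Average percent of baseline over all airports per (country, date)."""
    groups = defaultdict(list)
    for country, date, pct in rows:
        groups[(country, date)].append(pct)
    # Sort keys so each country's series comes out in date order.
    return {key: mean(vals) for key, vals in sorted(groups.items())}


averaged = daily_country_mean(records)
```

Averaging this way yields at most one value per country per day, so a country with many airports no longer contributes more rows than a country with only a few.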

Data Modeling
In the LSTM model, we implemented a 2-layer architecture. The first layer consisted of 64 units with return sequences set to True, allowing it to produce outputs for each time step. To mitigate overfitting, we incorporated a dropout rate of 0.2, which randomly set a portion of the input units to 0 during training. The second LSTM layer consisted of 32 units with return sequences set to False. We used the Rectified Linear Unit (ReLU) activation function, which helps the model capture non-linear relationships in the data. We trained the models for a total of 60 epochs, with a batch size of 64. For a visual representation of the USA LSTM model architecture, please refer to Figure 5.
In the CNN-LSTM model, we used a Conv1D layer with 64 filters, a kernel size of 3, padding set to same, and strides of 1. This convolutional layer helped extract relevant features from the input data. We also included an AveragePooling1D layer with a pool size of 32, strides of 1, and padding set to same. This pooling layer helped down-sample the data and capture important temporal patterns. To prevent overfitting, we incorporated a dropout layer, which randomly sets a portion of the input units to 0 during training. The first LSTM layer had 64 units with return sequences set to True, and the second LSTM layer had 32 units with return sequences set to False. We employed the ReLU activation function throughout. We trained the models for a total of 60 epochs, with a batch size of 64. For a visual representation of the USA CNN-LSTM model architecture, please refer to Figure 6.
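To give a sense of the size of the two architectures, the sketch below evaluates the standard trainable-parameter formulas for LSTM, one-dimensional convolution, and dense layers using the layer sizes given above. Two values are assumptions not stated in the text: one input feature per time step and a single output unit in the dense layer. The dropout and pooling layers add no trainable parameters.

```python
def lstm_params(units, input_dim):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * units * (input_dim + units + 1)


def conv1d_params(filters, kernel_size, in_channels):
    # One kernel of size kernel_size x in_channels plus a bias per filter.
    return filters * (kernel_size * in_channels + 1)


def dense_params(units, input_dim):
    # One weight per input plus a bias for each output unit.
    return units * (input_dim + 1)


N_FEATURES = 1  # assumption: one input feature per time step
N_OUTPUTS = 1   # assumption: a single predicted value

lstm_total = (
    lstm_params(64, N_FEATURES)      # first LSTM layer, 64 units
    + lstm_params(32, 64)            # second LSTM layer, 32 units
    + dense_params(N_OUTPUTS, 32)    # dense output layer
)

cnn_lstm_total = (
    conv1d_params(64, 3, N_FEATURES)  # Conv1D, 64 filters, kernel size 3
    + lstm_params(64, 64)             # first LSTM layer sees 64 channels
    + lstm_params(32, 64)             # second LSTM layer, 32 units
    + dense_params(N_OUTPUTS, 32)     # dense output layer
)
```

Under these assumptions the CNN-LSTM variant is the larger model, mainly because its first LSTM layer receives 64 convolutional channels instead of a single input feature.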

Performance Metric
The performance of the models was evaluated using Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and R-squared. Table 5 presents the performance metric scores for the USA, Canada, Chile, and Australia, calculated with the RMSProp, Stochastic Gradient Descent (SGD), Adam, Nadam, and Adamax optimizers. These scores indicate how effectively the models predict the airport percentage of baseline for each country.
The results of the LSTM and CNN-LSTM models with the best-performing optimizers, selected by R-squared test scores, are depicted in Figure 7 to Figure 10. In these figures, the red line represents the actual airport percentage of baseline, the blue line the LSTM predictions, and the green line the CNN-LSTM predictions. Comparing the predicted lines with the actual values shows how well the models capture the underlying patterns and trends in the data.
The USA dataset yielded our highest-performing models: the LSTM model achieved a MAPE of 0.0932, and the CNN-LSTM model had a slightly higher MAPE of 0.0960. The Chile dataset yielded the weakest performance: the CNN-LSTM model achieved a MAPE of 0.4382, and the LSTM model had a slightly higher MAPE of 0.4440. Table 6 compares our results with previous research using the MAPE metric.
A lower MAPE score suggests more accurate predictions.
Table 6. Comparison of MAPE with previous research
Model              MAPE
LSTM [17]          35.78
CNN-LSTM [17]      31.84
LSTM [21]          20.2824
CNN-LSTM [21]      20.8055
LSTM [22]          44.06
CNN-LSTM [22]      40.38
LSTM (USA)         0.0932
CNN-LSTM (USA)     0.096

CONCLUSION
After analyzing a total of 7,247 recorded baseline data points, we calculated the average for each country on the corresponding dates, resulting in 1,019 baseline data points. Prediction performance varied across the four countries: the USA, Canada, Chile, and Australia. This variation can be attributed to the imbalanced dataset and the limited data availability, with only around ten months of data for each airport. The USA showed the best prediction performance among the countries analyzed, while Chile showed the lowest. Our best LSTM model achieved a MAPE of 0.0932, while the CNN-LSTM model achieved a slightly higher MAPE of 0.0960. These scores indicate a relatively low average percentage error in the predictions. To improve the accuracy of future predictions, we recommend further research that addresses the data imbalance. A more balanced dataset for each country, with a more comprehensive and representative sample, could lead to more reliable and precise predictions.