LeNet Convolutional Neural Network for Face Mask Usage Classification Using a Low-Cost Device

Background: One of the efforts to prevent the spread of the COVID-19 virus is wearing a face mask in public places. However, many people still wear masks incorrectly, and some do not wear masks at all in public. An image-based classification system is therefore needed to identify face mask use. The system must run on a low-cost device so that it is affordable to a wide range of users. Previous studies designed face mask classification systems using various methods, each with limitations. The Convolutional Neural Network (CNN) method provides high accuracy, but it is computationally heavy and cannot be used in real time on low-cost devices. In contrast, the Haar cascade method provides a fast processing time but is less accurate than the CNN method. Objective: This article reports the development of an image processing algorithm for classifying face mask use on low-cost devices. Methods: The method used was a CNN with the LeNet architecture, which has a light computational load. The machine learning process used a dataset of 400 images, split into 240 images for training and 160 images for validation. Result: This study produced a classification accuracy of 98.75%. Prediction on a low-cost device required an average time of 0.235 seconds. Conclusion: This research showed that the system can be run in real time.


INTRODUCTION
The COVID-19 pandemic that started at the end of 2019 not only increased death rates but also caused an economic crisis on a global scale. The first case was found in December 2019 in the city of Wuhan, Hubei Province, China [1]. The pandemic is caused by a new type of coronavirus, SARS-CoV-2, which spreads quickly and on a massive scale [2]. The spread of the coronavirus can be limited through vaccination and health protocols [3]. Vaccination forms antibodies that can suppress the growth of the virus in the body. Health protocols are implemented in public places by keeping a distance, washing hands frequently, and using face masks [4,5].
The use of face masks in public places can prevent the spread of the coronavirus through the air [6].
Many people still wear face masks incorrectly or do not use face masks at all in public places. This makes the recommended health protocols ineffective and keeps the transmission rate of the coronavirus high.

How to Cite: R.P.M.D. Labib, A. Soetedjo, S. Hadi, and P. D. Widayaka, "LeNet Convolutional Neural Network for Face Mask Usage Classification using a Low-Cost Device," Jurnal Bumigora Information Technology (BITe), vol. 5, no. 1, pp. 9-16, 2023. This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/).
To address these problems, a system is needed that can classify whether a face mask is worn correctly so that it can give a warning when the mask is worn inappropriately.
In previous studies, some classification systems for face mask use were designed using various methods, but there were limitations [7]. The CNN method provides a high level of accuracy, but it is computationally heavy, so it cannot be used in real time on low-cost devices. The Haar cascade method works by detecting two facial features: the nose and lips. If one or both features are detected, it is concluded that the face mask is not being used properly. The system has an accuracy rate of 88.89% and can run on low-cost devices. Its main drawback is that it cannot distinguish whether the nose and lips are covered by a mask or by another object, such as a hand. It provides a fast processing time but has lower accuracy than the CNN method. Further research was carried out by Ahmadi and his team [8] in 2021 on detecting mask use with the Haar cascade classifier method. That system reached an accuracy of 90.5%, but the study was limited to faces turned sideways or looking up by at most 30 degrees. Subsequent research was conducted by Dicky Giancini and his team [9] on identifying mask use with the CNN YOLOv3-Tiny algorithm. That system detected masks with an accuracy of 90% and can be used in real time.
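The Haar cascade decision rule described above, and the drawback it leads to, can be illustrated with a minimal sketch. The function name `mask_worn_properly` and the example cases are hypothetical; the detector outputs are supplied by hand here rather than produced by an actual cascade classifier.

```python
def mask_worn_properly(nose_detected: bool, lips_detected: bool) -> bool:
    """Rule from the text: a visible nose or visible lips means improper mask use."""
    return not (nose_detected or lips_detected)


# Hypothetical detector outputs (nose_detected, lips_detected) for four situations.
cases = {
    "mask covers nose and mouth": (False, False),
    "mask pulled below the nose": (True, False),
    "no mask": (True, True),
    # No mask is worn, but a hand hides both features, so nothing is detected.
    "hand covers nose and mouth": (False, False),
}

for description, (nose, lips) in cases.items():
    print(f"{description}: proper use = {mask_worn_properly(nose, lips)}")
```

The last case is reported as proper use even though no mask is worn, which is the limitation noted for this method: the rule only sees that the features are hidden, not what hides them.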
Based on previous research, the novelty of our research is that mask use is detected with a LeNet-based CNN method. This method can increase the system's accuracy when detecting mask use. The system also works in real time and is low-cost. The purpose of using the Convolutional Neural Network (CNN) method with a LeNet network structure is to keep the computational load low while retaining accuracy high enough for the system to be embedded in low-cost devices.

METHODS

Research Stages
The stages involved in this research include literature study, dataset collection, algorithm design, implementation, and evaluation. The stages of the research can be seen in Figure 1. The literature study is needed so that the programming algorithm is designed in line with existing theory. The dataset is needed as learning data for the system, as well as material for evaluating and validating the designed algorithm. The algorithm was written in the Python programming language.

LeNet
The LeNet network structure is a CNN model originally used as an image-based handwriting classification system [10]. It has since also been used as a classification system for various objects such as diseases, odor types, and cataract maturity levels [11][12][13][14]. This network structure consists of a convolution layer, a pooling layer, and a fully-connected layer [15][16][17][18]. The LeNet network structure used in this study is shown in Figure 2.

Journal Homepage: https://journal.universitasbumigora.ac.id/index.php/bite

Figure 2. LeNet Network Structure
The LeNet network structure is built with seven layers. The first layer is the input, which contains an image matrix with three channels representing the basic colors red, green, and blue (RGB). The second layer is a feature map measuring 24 × 24 pixels with a depth of 20 channels, which is the convolution result of the first layer. The third layer is a feature map measuring 12 × 12 pixels with a depth of 20 channels, resulting from max pooling of the second layer. The fourth layer is a feature map measuring 8 × 8 pixels with a depth of 50 channels, which is the convolution result of the third layer. The fifth layer is a feature map measuring 4 × 4 pixels with a depth of 50 channels, resulting from max pooling of the fourth layer. Feature extraction takes place from the second to the fifth layer. A dense operation is then applied to the feature map in the fifth layer to produce 500 hidden units in the sixth layer. Finally, a fully connected operation produces the prediction results in the seventh layer.
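The feature-map sizes listed above can be checked with a short calculation. The sketch below assumes a 28 × 28 input, 5 × 5 convolution kernels with stride 1 and no padding, and 2 × 2 max pooling; none of these values is stated in the text, but they are the classic LeNet settings and they reproduce the 24, 12, 8, and 4 pixel sizes given for layers two to five.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, window: int = 2) -> int:
    """Spatial size after non-overlapping max pooling."""
    return size // window


INPUT_SIZE = 28  # assumption: input resolution is not stated in the text
KERNEL = 5       # assumption: classic LeNet kernel size

size = INPUT_SIZE
layers = [("input", size, 3)]
size = conv_out(size, KERNEL)
layers.append(("convolution 1", size, 20))
size = pool_out(size)
layers.append(("max pooling 1", size, 20))
size = conv_out(size, KERNEL)
layers.append(("convolution 2", size, 50))
size = pool_out(size)
layers.append(("max pooling 2", size, 50))

for name, side, depth in layers:
    print(f"{name}: {side} x {side} x {depth}")

# Number of values fed into the 500-unit dense layer.
flattened = size * size * 50
print("flattened features:", flattened)
```

Under these assumptions the fifth layer flattens to 4 × 4 × 50 = 800 values, which feed the 500 hidden units of the sixth layer.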

Dataset
The dataset is a collection of data used in the machine learning process of an artificial neural network system [19]. This study used a dataset of 400 images, consisting of 200 positive images and 200 negative images. Positive data consists of images of various faces wearing a face mask correctly. Negative data consists of images of various faces wearing a face mask incorrectly and images of various faces without a face mask. The images were captured manually with a laptop's internal webcam. The dataset is divided into 60% for training (240 images) and 40% for validation (160 images). Some of the dataset images can be seen in Figure 3.
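The 60/40 split described above can be sketched as follows. The file names, the shuffling step, and the fixed seed are assumptions for illustration; the text does not say how images were assigned to each subset.

```python
import random


def split_dataset(samples, train_fraction=0.6, seed=0):
    """Shuffle the samples and cut them into training and validation subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names; label 1 = correct mask use, label 0 = incorrect or no mask.
samples = [(f"positive_{i:03d}.jpg", 1) for i in range(200)]
samples += [(f"negative_{i:03d}.jpg", 0) for i in range(200)]

train_set, val_set = split_dataset(samples)
print(len(train_set), "training images,", len(val_set), "validation images")
```

A plain shuffle does not guarantee that both subsets keep the exact 50/50 class balance; splitting each class separately would be one way to enforce that.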

Machine Learning
Artificial neural networks are designed to mimic the work of the human brain through a combination of input data, weights, and biases [20]. These elements work together to identify, classify, and accurately describe objects. A deep neural network consists of several layers of interconnected nodes. An artificial neural network model requires a learning process based on the prepared dataset to produce the desired predictions. The stages of machine learning can be seen in Figure 4.

These results show that a classification system built with the LeNet network structure can be used to identify correct face mask use. This system also has a higher accuracy than the previous study, which had an accuracy rate of 88.89%. In Figure 6, the accuracy value improves and the loss value decreases. Therefore, it can be concluded that the artificial neural network model has good classification quality, because no under-fitting or over-fitting occurred during the machine learning process.
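The accuracy comparison above can be made concrete with a small calculation. The sketch assumes that the 98.75% accuracy reported in the abstract was measured on the 160 validation images; under that assumption it corresponds to 158 correct predictions.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Classification accuracy as a percentage."""
    return 100.0 * correct / total


VALIDATION_IMAGES = 160     # from the dataset split
REPORTED_ACCURACY = 98.75   # percent, from the abstract
PREVIOUS_ACCURACY = 88.89   # percent, Haar cascade system cited in the text

# Assumption: the reported accuracy refers to the validation set.
correct = round(REPORTED_ACCURACY / 100 * VALIDATION_IMAGES)
print("correct predictions:", correct, "of", VALIDATION_IMAGES)
print("accuracy:", accuracy_percent(correct, VALIDATION_IMAGES), "%")
print(f"gain over previous study: {REPORTED_ACCURACY - PREVIOUS_ACCURACY:.2f} percentage points")
```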

Classification Results
System testing is carried out by applying image processing algorithms combined with the LeNet network model that has been built.The input is a static image of the face with various face mask usage conditions.
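The abstract reports an average prediction time of 0.235 seconds on the low-cost device, which is roughly four predictions per second. One way such an average could be measured is sketched below. `dummy_predict` is a hypothetical stand-in for the trained model's prediction call, and the synthetic inputs are not real images, so the printed timing says nothing about the actual system.

```python
import time


def average_prediction_time(predict, inputs):
    """Average wall-clock seconds per call of predict over the given inputs."""
    start = time.perf_counter()
    for item in inputs:
        predict(item)
    elapsed = time.perf_counter() - start
    return elapsed / len(inputs)


def dummy_predict(image):
    # Hypothetical placeholder: a real system would run the LeNet model here.
    return sum(image) % 2


# Synthetic inputs shaped like flattened RGB images (size is an assumption).
images = [[(i + k) % 256 for i in range(28 * 28 * 3)] for k in range(10)]

avg_seconds = average_prediction_time(dummy_predict, images)
print(f"average time per prediction: {avg_seconds:.6f} s")

# Throughput implied by the figure reported in the abstract.
REPORTED_SECONDS = 0.235
print(f"reported rate: {1 / REPORTED_SECONDS:.2f} predictions per second")
```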

Figure 3. Dataset: a) Negative Data, b) Positive Data

The first stage is the initialization of the current epoch, the max epoch, and the CNN model to be used. The second stage is preparing the dataset by loading it from memory. The third stage is splitting the dataset into sixty percent for training and forty percent for validation. The next stage is training the model, which consists of two processes: forward propagation and backpropagation. Training is repeated for the number of epochs given by the max epoch set in the first stage. The trained CNN model is then stored in memory. The machine learning algorithm is built in the Python programming language combined with the open-source TensorFlow library.
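The stages above can be illustrated with a toy example that uses only the Python standard library. This is not the LeNet model or the TensorFlow code used in the study: a single-neuron logistic classifier and synthetic two-feature data stand in for the CNN and the images, purely to show the order of the stages and the forward propagation and backpropagation steps. All names, the learning rate, and the epoch count are assumptions.

```python
import math
import pickle
import random


def forward(weights, bias, features):
    """Forward propagation: weighted sum followed by a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(train_set, max_epoch, learning_rate=0.1):
    # Stage 1: initialize the epoch counter, max epoch, and model parameters.
    weights, bias = [0.0] * len(train_set[0][0]), 0.0
    for epoch in range(max_epoch):
        for features, label in train_set:
            prediction = forward(weights, bias, features)
            # Backpropagation: gradient of the cross-entropy loss w.r.t. z.
            error = prediction - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def evaluate(model, dataset):
    """Fraction of samples whose thresholded prediction matches the label."""
    weights, bias = model
    hits = sum((forward(weights, bias, f) >= 0.5) == bool(y) for f, y in dataset)
    return hits / len(dataset)


# Stage 2: prepare the dataset (synthetic stand-in for the 400 images).
rng = random.Random(0)
dataset = []
for _ in range(400):
    label = rng.randint(0, 1)
    center = 1.0 if label else -1.0
    dataset.append(([rng.gauss(center, 0.5), rng.gauss(center, 0.5)], label))

# Stage 3: split into 60% training and 40% validation.
rng.shuffle(dataset)
train_set, val_set = dataset[:240], dataset[240:]

# Stage 4: train for max_epoch passes over the training set.
model = train(train_set, max_epoch=20)

# Stage 5: store the trained model (serialized to bytes here).
stored_model = pickle.dumps(model)

print("validation accuracy:", evaluate(model, val_set))
```

In a TensorFlow implementation the per-sample update loop would typically be replaced by the library's own training routine, which performs forward propagation and backpropagation internally.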

Figure 4. Stages of The Machine Learning Process