Automated Detection of Breast Cancer in Histopathology Images Using Convolutional Neural Networks and Transfer Learning

Breast cancer is the most common cancer in women and a leading cause of death worldwide, accounting for 2.3 million new cases and 685,000 deaths in 2020. Histopathology analysis is one of the tests used to determine a patient's prognosis; however, it is a time-consuming and demanding process. With advances in deep learning methods, computer vision can be used to detect cancer in medical images, which is expected to improve the accuracy of prognosis. This study aimed to apply Convolutional Neural Network (CNN) and Transfer Learning methods to classify breast cancer histopathology images for diagnosing breast tumors. Three models were used: a CNN and two transfer learning architectures, Visual Geometry Group (VGG16) and Residual Network (ResNet50). Data augmentation was applied to all models, and the dataset was balanced using an undersampling technique. The dataset used for this study was "The BreakHis Database of microscopic biopsy images of breast tumors (benign and malignant)," with 1,693 images classified into two categories: Benign and Malignant. The results of this study were based on recall, precision, and accuracy values. CNN accuracy was 94%, VGG16 accuracy was 88%, and ResNet50 accuracy was 72%. The conclusion was that the CNN method is recommended for detecting breast cancer in histopathology images.


INTRODUCTION
Breast cancer is the most prevalent cancer in women and a leading cause of death worldwide. Cancer is abnormal cell growth that can spread to other body parts and even cause death [1]. There were 2.3 million breast cancer cases and 685,000 deaths in 2020, with incidence rates of <40 per 100,000 women [2]. According to the World Health Organization's (WHO) Global Burden of Cancer Study (Globocan) report, 396,914 new cancer cases were recorded in the Indonesian population in 2020. Breast cancer is the most common type of cancer in Indonesia, accounting for 65,858 cases, or 16.6% of all cancer cases in the country. Cervical cancer came second with 36,633 cases (9.2% of all cancer cases in the country), and lung cancer third with 34,189 cases (8.8%). Colorectal cancer followed with 34,189 cases (8.6%), and liver cancer with 21,392 (5.4%). Meanwhile, other types of cancer account for the remaining 204,059 cases, or 51.4% of all cancer instances in the country [3]. The breast consists of two types of tissue: glandular tissue and stromal (supporting) tissue. Glandular tissue includes the mammary glands (lobes) and their ducts, while supporting tissue includes fatty and connective tissue. The breast also has lymph flow, which is often associated with the onset of breast cancer and its spread (metastasis) [4]. Tumors are new tissues (neoplasms) that appear in the body when, under the influence of various tumor-causing factors, local tissues lose normal control over their growth at the gene level. Tumors, or neoplasms, can be classified into two types: benign and malignant. Macroscopically and microscopically, benign tumors are well differentiated, grow slowly, do not infiltrate the surrounding tissue, and do not metastasize.
Meanwhile, malignant (cancerous) tumors tend to be poorly differentiated or anaplastic, have a faster growth rate, and infiltrate and damage the surrounding tissue while metastasizing [5].
Histopathology, the study of tissue, contributes significantly to determining cancer prognosis at the tissue level. Quantitative analysis of histopathological images can be used to determine the treatment to be carried out [6]. In establishing a prognosis for the spread or detection of cancer, examination of patient tissue is generally carried out with the help of medical images [7]. Histopathological examination is performed manually by a histopathologist with the aid of a microscope. Many imaging methods are used to detect cancer in medical images; for breast cancer these include mammography, ultrasound (US), computed tomography (CT), and magnetic resonance imaging (MRI). Although such diagnostic methods are widely used, greater accuracy is required. The quality of the manually gathered feature set directly affects diagnostic accuracy, which makes experienced doctors vital to the manual feature extraction process [8]. However, histopathological analysis is a laborious process due to the complexity of the structures in different tissues, and pathologists need considerable expertise and skill to carry it out. Qualitative examination of large amounts of image data is an uphill task that demands hours of expert effort. Moreover, owing to subjective human nature, visual inspection of images by professionals can lead to inaccurate conclusions [9]. Computer vision has been applied to disease detection in medical images up to the present [10]. This is expected to help improve the accuracy of prognosis and the speed of identification performed by pathologists and histopathologists.
The Convolutional Neural Network (CNN) is an image-processing algorithm whose weight parameters can be pre-trained on large datasets and then fine-tuned, reducing overfitting on small datasets so that previously unseen data can be classified with the expected accuracy [11]. CNN is a deep learning method whose advantage is automatic feature extraction. One use of CNNs in agriculture, for example, is weed mapping [12].
Previous research on the automatic detection of breast cancer using deep learning employed a ResNet-based convolutional neural network to perform end-to-end segmentation of breast cancer. Based on the segmentation results and extracted images, classification of breast cancer as malignant or benign can be used to identify tumors. The results showed that the ResNet-based convolutional neural network could identify tumors with an accuracy of 98.82% [13]. Another study, on boosting breast cancer detection with convolutional neural networks, used the CNN method to identify breast cancer automatically and compared it with other machine learning algorithms. That study used three CNN models, Model 1 (59%), Model 2 (76%), and Model 3 (87%), while the standard machine learning classifiers achieved LR (71.80%), KNN (71.26%), and SVM (78.56%). Thus, the third CNN model obtained higher accuracy than the best machine learning result (78%), an increase of 9% [14]. The purpose of this research is to detect breast cancer in histopathology images automatically. This makes it easier to determine whether a person has cancer and can also assist medical personnel with faster, more accurate, and more efficient health screening. This research uses CNN and transfer learning methods to obtain maximum accuracy; accuracy is the main parameter for assessing the results of this study.
In a breast cancer prediction study, the performance of a simpler convolutional neural network called VGG-7 was analyzed and compared with VGG-16 and VGG-19. VGG-7 obtained 98% accuracy, VGG-16 obtained 97%, and VGG-19 obtained 96% [15], indicating that VGG-7 performed better than VGG-16 and VGG-19. In this research, the CNN model is combined with data augmentation, data balancing, and layer adjustments, manipulating the images in the dataset and modifying the pooling-layer architecture, to prevent the problems found in the testing processes of previous studies. To prevent class imbalance, an undersampling technique is used: undersampling reduces the majority classes of the dataset to align with the minority classes. In addition to the CNN model, this research also uses transfer learning to make prediction time more efficient. Transfer learning is a technique that reuses a pre-trained convolutional neural network model. Transfer learning models are typically trained on large datasets, and the trained result is applied to new datasets so that training from scratch is not required. Transfer learning is used in this study because it reduces training time and does not require much data.
Building on previous research, this study performs classification on the BreakHis dataset, which consists of benign and malignant breast cancer images at 400X magnification. The images are classified using a CNN and the transfer learning models VGG16 and ResNet50, with data balancing and augmentation applied via the undersampling method. The difference from previous research, and the novelty of this study, is the use of a sampling method on the dataset. A sampling method aims to equalize the number of samples across classes; the two common approaches are reducing the majority classes to the size of the minority class (undersampling) or expanding the minority classes to the size of the majority class (oversampling). This research uses the undersampling method, which generalizes the dataset based on the smallest class. Undersampling was chosen over oversampling because the dataset is imbalanced, i.e., one class has far more samples than another; the researchers therefore balance the data by reducing the amount of majority-class data until it equals the amount of minority-class data.
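The undersampling step described above can be sketched as follows. The helper function and the placeholder file names are hypothetical, since the paper does not show its implementation:

```python
import random

def undersample(benign, malignant, seed=42):
    """Balance two classes by randomly trimming the larger class
    down to the size of the smaller one (undersampling)."""
    rng = random.Random(seed)
    n = min(len(benign), len(malignant))
    return rng.sample(benign, n), rng.sample(malignant, n)

# Class sizes in the BreakHis subset used here: 547 benign, 1,146 malignant.
# The file names below are placeholders for the actual image paths.
benign = [f"benign_{i}.png" for i in range(547)]
malignant = [f"malignant_{i}.png" for i in range(1146)]
benign_bal, malignant_bal = undersample(benign, malignant)
print(len(benign_bal), len(malignant_bal))  # 547 547
```

A fixed seed keeps the subset reproducible across runs, which matters when comparing models trained on the same balanced data.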

RESEARCH METHOD
The method used is quantitative, utilizing Convolutional Neural Network (CNN) technology and transfer learning with the VGG16 and ResNet50 model architectures. The data come from a Kaggle dataset consisting of 1,693 breast cancer histopathology images, divided into 70% training data, 20% test data, and 10% validation data.

Research Stages
The research stages used in this study can be seen in Figure 1. The first stage is gathering the dataset of breast cancer histopathology images from Kaggle. Next, a three-step preprocessing stage is carried out. The first step is dataset balancing: because the dataset is unbalanced, balancing is performed using undersampling techniques [16]. The second step divides the data into train, test, and validation folders. The final step of preprocessing is data augmentation.

Dataset
Microscopic biopsy images of benign and malignant tumors are available in the BreaKHis database. Images were collected through P&D Laboratories in Brazil, and participants with clinical indications of breast cancer were invited to take part in the study. Samples were generated from breast-tissue biopsy slides stained with hematoxylin and eosin (HE). Pathologists from P&D Laboratories collected the samples through surgical (open) biopsy (SOB), prepared them for histological study, and labeled them. As shown in Table 1, the Breast Cancer Histopathology Image Classification (BreakHis) dataset comprises 1,693 microscopic images of breast tumor tissue collected from 82 patients, of which 547 are benign and 1,146 are malignant (700X460 pixels, RGB 3 channels, 8-bit depth per channel, PNG format). An example histopathology image of breast cancer is shown in Figure 2, classified into two classes, benign and malignant. All information was anonymized. The dataset has been divided into training and test folders, each containing different slide images. All samples in this dataset were taken at 400x optical magnification [17].

Architecture Model
This research uses and later compares three models: a CNN and the VGG16 and ResNet50 methods.

CNN Model
The first model used by the researchers is a CNN, a deep learning algorithm belonging to the feedforward neural networks with many dimensions [18]. The input layer is 150 × 150 pixels, resized from 224 × 224 pixels. The reduction to 150 × 150 pixels focuses the image on objects that help determine whether it is benign or malignant, and speeds up the computation to distinguish the two image classes. Figure 3 depicts the architecture of the first model, which implements max pooling with 2 × 2 filters using three pooling layers and three convolutional layers. The three convolutional layers have 128, 64, and 32 filters respectively, each with a 3 × 3 convolutional kernel and ReLU activation. These are followed by a fully connected stage consisting of a flatten layer, a dense layer with ReLU activation, and a dense layer with softmax activation. Lastly, a dropout layer set at 0.5 prevents overfitting during training.
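Under the layer configuration above, the first model can be sketched in Keras roughly as follows. The width of the hidden dense layer and the choice of optimizer are not stated in the paper and are assumptions here:

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(150, 150, 3), num_classes=2):
    """Sketch of the handcrafted CNN: three conv layers (128, 64, 32 filters,
    3x3 kernels, ReLU), each followed by 2x2 max pooling, then a fully
    connected head with dropout 0.5 and a softmax output."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),  # hidden width is an assumption
        layers.Dropout(0.5),                   # as stated in the paper
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",            # optimizer is an assumption
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Because the conv filters shrink from 128 to 32 while pooling halves the spatial size, the flattened feature vector stays small enough to keep the dense head compact.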

ResNet50 Model
Transfer learning from the Residual Network 50 (ResNet50), which is 50 layers deep, provides the third model architecture in this research. ResNet50 is a CNN architecture that introduces a new concept called the shortcut connection, which skips layers in the forward pass of the input [21]. The model shown in Figure 5 is divided into three blocks: the leftmost block illustrates the ResNet-50 design, the center block shows a convolution block that alters the input dimension, and the rightmost block defines an identity block that does not modify the input dimension [22].
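A transfer-learning setup along these lines can be sketched with Keras. The classification head (global average pooling plus a softmax layer) and the frozen pre-trained base are assumptions, as the paper does not detail them:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

def build_resnet50(input_shape=(150, 150, 3), num_classes=2,
                   weights="imagenet"):
    """Sketch of ResNet50 transfer learning: reuse the pre-trained
    convolutional base as a frozen feature extractor and train only
    a small classification head on the histopathology images."""
    base = ResNet50(weights=weights, include_top=False,
                    input_shape=input_shape)
    base.trainable = False  # keep pre-trained weights fixed
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",  # optimizer is an assumption
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Freezing the base is what makes the approach data-efficient: only the final dense layer's weights are learned from the 1,094 balanced images.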

Data Augmentation
Data augmentation modifies and transforms the images in the dataset. To a human observer the augmented images appear the same, yet to the model they differ, allowing it to view each image from a different perspective. This, in turn, further improves the model's performance.
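An augmentation pipeline of this kind can be sketched with Keras' `ImageDataGenerator`. The specific transformation parameters below are illustrative assumptions, since the paper does not list the ones it used:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation settings (assumed, not from the paper):
# small rotations, shifts, zooms, and flips produce variants that look
# the same to a human but differ pixel-wise for the model.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalize pixel values to [0, 1]
    rotation_range=20,        # random rotation up to 20 degrees
    width_shift_range=0.1,    # random horizontal shift
    height_shift_range=0.1,   # random vertical shift
    zoom_range=0.1,           # random zoom
    horizontal_flip=True,     # mirror images left-right
)
```

Because the transformations are applied on the fly as batches are drawn, the number of stored images stays the same while every epoch sees slightly different variants.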

Testing Scenario
This research involves two categories: benign and malignant tumors. The undersampling technique is applied to the dataset to balance the quantity of data in both categories. Figure 6 depicts the quantity of data in each category before undersampling: 1,693 images, consisting of 547 benign and 1,146 malignant images. Figure 7 depicts the quantity of data in each category after undersampling: 1,094 images, consisting of 547 benign and 547 malignant images. The dataset is then separated into three subsets for each class, with 70% training data, 20% test data, and 10% validation data drawn from all the data. The number of images before and after the augmentation procedure is the same, since the augmentation technique transforms each existing image into numerous shapes based on the augmentation parameters rather than adding new images. Table 2 displays the number of training, validation, and test images prior to augmentation and undersampling; this is the initial dataset obtained from the official Kaggle web page, and it is unbalanced. The researchers therefore adjust the data by applying the undersampling approach, balancing the inequitably distributed dataset by decreasing the majority-class data to match the minority class. Table 3 shows the training, validation, and test data after the undersampling and augmentation processes, with the quantity of data in all categories equalized to that of the minority class. The dataset is then examined using three scenarios: scenario 1 uses the CNN, scenario 2 the VGG16 model, and scenario 3 the ResNet50 model.
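The 70/20/10 split described above can be sketched as follows. The helper is hypothetical, and the exact rounding used by the authors is not stated:

```python
import random

def split_dataset(items, train=0.7, test=0.2, val=0.1, seed=42):
    """Shuffle and split a list of samples into train/test/validation
    subsets by the given fractions (remainder goes to validation)."""
    assert abs(train + test + val - 1.0) < 1e-9
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_test = int(len(items) * test)
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])

# 1,094 balanced images -> roughly 765 train, 218 test, 111 validation.
train_set, test_set, val_set = split_dataset(range(1094))
print(len(train_set), len(test_set), len(val_set))
```

In practice this split would be applied per class so that each subset keeps the benign/malignant balance produced by the undersampling step.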

RESULTS AND ANALYSIS
The research results follow the testing scheme described in the research method. The three proposed models are tested at this stage, and their accuracy, precision, and recall values are compared. A confusion matrix is also used as supplementary information when comparing each model's predicted results to the actual classification.

Handcrafted CNN Scenario Model
The dataset is tested with the Convolutional Neural Network (CNN) model. Results were obtained through graphs displaying accuracy and loss during training. Figures 8 and 9 show graphical plots indicating that this model achieved a validation accuracy of 0.92 and a validation loss of 0.14. The model is then evaluated using the confusion matrix in Figure 10, which shows 102 correctly and 9 incorrectly predicted images in the Benign class, while the Malignant class has 106 correctly and 5 incorrectly predicted images. Further testing is done with test data, displaying images of the CNN model's prediction results. In Figure 11, the Benign class is predicted with a confidence of 0.99 in an estimated time of 0.072 seconds, and the Malignant class with a confidence of 0.91 in an estimated time of 0.078 seconds.
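As a consistency check, the reported overall accuracy can be reproduced directly from the confusion-matrix counts in Figure 10:

```python
# Confusion-matrix counts reported for the CNN (Figure 10):
# Benign: 102 correct, 9 misclassified; Malignant: 106 correct, 5 misclassified.
benign_correct, benign_wrong = 102, 9
malignant_correct, malignant_wrong = 106, 5

total = benign_correct + benign_wrong + malignant_correct + malignant_wrong
accuracy = (benign_correct + malignant_correct) / total

# Precision: of all images predicted as a class, how many truly belong to it.
precision_benign = benign_correct / (benign_correct + malignant_wrong)
precision_malignant = malignant_correct / (malignant_correct + benign_wrong)

print(round(accuracy, 2),
      round(precision_benign, 2),
      round(precision_malignant, 2))
# -> 0.94 0.95 0.92, matching the reported 94% accuracy and the
#    per-class precision values given in the model comparison.
```

The same arithmetic applied to the VGG16 and ResNet50 matrices yields their reported accuracies as well.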

VGG16 Scenario Model
The VGG16 model was used to test the dataset. Results were obtained through graphs displaying accuracy and loss during training. Figures 12 and 13 show graphical plots indicating that this model achieved a validation accuracy of 0.80 and a validation loss of 0.52. The model is then evaluated using the confusion matrix in Figure 14, which shows 105 correctly and 6 incorrectly predicted images in the Benign class, while the Malignant class has 91 correctly and 20 incorrectly predicted images. Further testing is carried out using test data; images of the VGG16 model's prediction results are shown in Figure 15.

ResNet50 Scenario Model
The ResNet50 model was used to test the dataset. Results were obtained through graphs displaying accuracy and loss during training. Figures 16 and 17 show graphical plots indicating that this model achieved a validation accuracy of 0.59 and a validation loss of 0.68. The model is then evaluated using the confusion matrix in Figure 18, which shows 88 correctly and 23 incorrectly predicted images in the Benign class, while the Malignant class has 71 correctly and 40 incorrectly predicted images. Further testing is done with test data displaying images of the ResNet50 model's prediction results. In Figure 19, the Benign class is predicted with a confidence of 0.82 in an estimated time of 0.106 seconds, and the Malignant class with a confidence of 0.96 in an estimated time of 0.098 seconds. Table 4 shows that scenario 1 has the highest accuracy (94%) and scenario 3 the lowest (72%); scenario 3 is the only one with precision and recall below 80%.

Comparison of Best Model Performance with Previous Research
Following the testing procedure, the best model's performance is evaluated against earlier research findings. Based on the data in Figure 20, the scenario 1 CNN achieves 94% accuracy, with 95% precision for the Benign class and 92% for the Malignant class. The previous research [14] also used a CNN; although the method is the same, the model structure differs, producing different outcomes. According to Table 5, scenario 1 is the best model in this research, showing a 7% increase in accuracy compared to previous studies. CNN has an advantage in accuracy over other methods: it is oriented toward computer vision tasks, with feature extraction from the image as its main capability [23]. The previous research uses the same CNN method but a different model structure, which produces different results. Accuracy is also affected by how the algorithm reduces the dimensions of the image matrix without discarding the informative points in the image, so that the classification process can produce correct results [24].

CONCLUSION
Based on the results of this research, the CNN model is the best model for predicting the images, with 94% accuracy, while the ResNet50 model is the lowest of the three, with 72% accuracy. It can be concluded that the number of layers, kernel size, and data conditions affect a model's performance during data processing. Proper augmentation techniques are needed to achieve good accuracy, and the size of the dataset also affects the performance of the model built. To advance this research, it is recommended to use different dataset conditions (adding or reducing data) and to conduct experiments with methods other than CNN, or with other transfer learning architectures such as AlexNet and GoogLeNet, in order to find out how the accuracy, precision, and recall values change.