Hypothesis – This study’s hypothesis is that a CNN system can effectively grade the relevant Gleason Grade (GG) patterns. Accordingly, we combined computer engineering methods with high-quality pathological reports to construct, train, and evaluate a dedicated CNN system to classify Gleason patterns 3 (G3), 4 (G4), and 5 (G5).
Design – The study design is divided into two main workflows, Clinical Actions and Computational Actions, with corresponding procedures.
Clinical Actions: The laboratory of medical investigations of the Medical School of the University of Sao Paulo (FMUSP) collected 32 previously reported radical prostatectomy specimens. The specimens were stained with hematoxylin and eosin (H&E) and scanned with an Aperio® scanner; the slide images were then analyzed, and Gleason patterns 3, 4, and 5 were delineated by the corresponding specialists. Additionally, images from the Prostate Cancer Grade Assessment (PANDA) Challenge were added to the dataset, providing a richer training base that improves model performance, robustness, and generalization capacity. PANDA includes two open-access datasets: Karolinska Institute (images divided into background, benign, and cancerous tissue) and Radboud University Medical Center (13 images divided into background, stroma, benign tissue, and Gleason patterns 3, 4, and 5). To ensure methodological similarity, only the Radboud images were used to enlarge the initial training sample. All samples underwent the same screening process previously presented19.
Computational Actions
The computational procedures comprise a Patch Extraction Step and a Deep Learning Step (Fig. 1).
Patch Extraction Step – This step consists of building a dataset of extracted patches: small sample images of 256x256 pixels taken at 20x magnification from the previously marked regions. The magnification and patch size were chosen because they are adequate for identifying clinical elements both individually and in combination. With these parameters, we obtained a total of 6982 patches (5036 from FMUSP prostatectomy samples and 1946 from the PANDA Challenge dataset). As a result, patches of Gleason patterns 3, 4, and 5 were obtained, each identifiable by its corresponding slide (Fig. 2).
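The tiling described above can be sketched as follows. This is a minimal illustration that assumes the annotated region is already available as an RGB array; the study worked with Aperio whole-slide images, and the function name and interface here are hypothetical:

```python
import numpy as np

def extract_patches(region, size=256):
    """Tile an annotated region (H x W x 3 array) into non-overlapping
    size x size patches. A hypothetical sketch of the extraction step;
    the actual study extracted patches at 20x magnification from
    specialist-delineated regions of whole-slide images."""
    h, w = region.shape[:2]
    patches = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            patches.append(region[y:y + size, x:x + size])
    return patches
```

Edge regions smaller than the patch size are simply discarded in this sketch; how the study handled region borders is not specified.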
Deep Learning Step – This step included the construction of a topology combining multiple blocks. Previous architectures, their characteristics, and important elements were considered to establish the proposed structure. Several experiments were performed using features of complex neural nets, combined blocks, and learning methods, yielding a high-performing architecture for this purpose, as shown in Fig. 3. The neural net input starts with two convolutional layers containing 32 filters with a 5x5 kernel and 64 filters with a 5x5 kernel, respectively. The number of filters determines the diversity of features extracted from the input – the more filters, the more complementary features are extracted and considered to support the decision. The batch normalization layer standardizes the output values of the corresponding layer, decreasing the chance of value-range saturation. Max pooling reduces the feature-map dimensions, allowing only the strongest activations to proceed; the first sequence ends in a dropout layer. The other two sequences work similarly, except for the number of filters (64 and 128 in the second, 128 and 256 in the third). Lastly, the information goes through a fully connected layer containing 512 neurons in the hidden layer, followed by batch normalization and a dropout of 0.5. SoftMax was used as the activation function of the output layer, and RMSprop was chosen as the training optimizer; thus, the neural net outputs the images classified as Gleason patterns 3, 4, or 5.
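Under assumptions not fully specified in the text (2x2 max pooling after each block, "same" convolution padding, three output classes, and batch-normalization parameters ignored), the feature-map sizes and trainable-parameter counts implied by this topology can be tallied in plain Python:

```python
def conv_params(kernel, in_ch, out_ch):
    # weights (kernel * kernel * in_ch per filter) plus one bias per filter
    return (kernel * kernel * in_ch + 1) * out_ch

# (input channels, filters of first conv, filters of second conv) per block,
# following the text: 32/64, then 64/128, then 128/256; all kernels are 5x5
blocks = [(3, 32, 64), (64, 64, 128), (128, 128, 256)]

size, total = 256, 0              # 256x256 input patches
for in_ch, f1, f2 in blocks:
    total += conv_params(5, in_ch, f1) + conv_params(5, f1, f2)
    size //= 2                    # assumed 2x2 max pooling per block

flat = size * size * blocks[-1][2]            # features entering the head
total += (flat + 1) * 512 + (512 + 1) * 3     # 512-neuron hidden layer, 3-class softmax
```

With these assumptions the fully connected head dominates the parameter count, which is why the 0.5 dropout there matters for controlling overfitting.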
Study setting – The extracted patch images (Fig. 2) were applied to the described architecture (Fig. 3) for training and evaluation. The training, validation, and test groups were separated using 80%, 10%, and 10% ratios, respectively. In addition, to obtain the most from our image set, we carried out k-fold cross-validation with k = 3, as shown in Fig. 4.
Inclusion criteria – Specifically, this cross-validation took the patches from slides SA, SB, and SC (Figs. 2 and 4) to be used individually as validation and test sets, while the patches from the remaining slides composed the training data. This prevented patches from the same patient and slide from appearing simultaneously in the training, validation, and test groups, providing wider context variation and a more reliable, unbiased outcome, and thereby supporting sounder interpretation. Slides SA, SB, and SC were chosen because each had the most balanced distribution of patches across Gleason patterns 3, 4, and 5. The distribution and number of patches used for each k of the k-fold can be seen in Table 1. Finally, to improve model accuracy and robustness while minimizing potential overfitting, data augmentation was applied before the patches were fed to the neural net; specifically, this process included random rotations, brightness changes, and zoom.
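The slide-level separation can be sketched as below. The patch records and the helper function are hypothetical, but the (validation, test) slide pairing per fold follows Table 1:

```python
# Each patch record is tagged with its source slide; slides SA, SB, SC rotate
# through the validation/test roles while every remaining slide always trains,
# so no slide (and hence no patient) ever spans two groups.
def split_fold(patches, val_slide, test_slide):
    train = [p for p in patches if p[0] not in (val_slide, test_slide)]
    val = [p for p in patches if p[0] == val_slide]
    test = [p for p in patches if p[0] == test_slide]
    return train, val, test

# (validation, test) slides for k = 1, 2, 3, as in Table 1
folds = [("SB", "SA"), ("SA", "SC"), ("SC", "SB")]
```

Because the split is keyed on slide identity rather than on a random shuffle of patches, the 80/10/10 ratios are approximate, which is consistent with the "approximately" caveat in Table 1.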
Table 1
Dataset separation considering, approximately, 80%, 10%, 10% ratio, for training, validation, and test of each corresponding k, respectively.
| k-fold | Class     | Train (n) ~80% | Val (n) ~10% | Test (n) ~10% |
| k = 1  | Slides    | SC + SD + ...  | SB           | SA            |
|        | Gleason 3 | 614            | 73           | 69            |
|        | Gleason 4 | 2205           | 271          | 258           |
|        | Gleason 5 | 2801           | 348          | 349           |
| k = 2  | Slides    | SB + SD + ...  | SA           | SC            |
|        | Gleason 3 | 593            | 69           | 94            |
|        | Gleason 4 | 2224           | 259          | 251           |
|        | Gleason 5 | 2821           | 352          | 325           |
| k = 3  | Slides    | SA + SD + ...  | SC           | SB            |
|        | Gleason 3 | 589            | 94           | 73            |
|        | Gleason 4 | 2185           | 281          | 268           |
|        | Gleason 5 | 2807           | 339          | 352           |
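The augmentation operations named in the text (random rotations, brightness changes, and zoom) can be illustrated as below. The specific parameter ranges are assumptions for the sketch, as the study does not report them:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(patch):
    """Hypothetical augmentation sketch: random 90-degree rotation,
    brightness scaling, and center-crop zoom with nearest-neighbour
    resize. Parameter ranges are illustrative, not the study's."""
    out = np.rot90(patch, k=rng.integers(4))             # random rotation
    out = np.clip(out * rng.uniform(0.8, 1.2), 0, 255)   # brightness jitter
    if rng.random() < 0.5:                               # random zoom-in
        h, w = out.shape[:2]
        m = h // 8
        out = out[m:h - m, m:w - m]                      # center crop
        yi = np.arange(h) * out.shape[0] // h            # nearest-neighbour
        xi = np.arange(w) * out.shape[1] // w            # resize indices
        out = out[yi][:, xi]
    return out
```

Applying such transforms on the fly effectively enlarges the training set without adding new slides, which is the overfitting-mitigation role the text assigns to augmentation.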
Primary and secondary outcome measures, and analysis – The evaluation occurred in two steps for each k of the proposed k-fold cross-validation (Fig. 4). The first was the training and validation step; the second was the test step, in which typical and additional performance parameters were computed for better interpretation. The training and validation step evaluates the model's potential to learn the current application; accordingly, accuracy and loss were computed to validate this step. Once high accuracy and low loss were achieved, the corresponding trained CNN topology was saved and submitted to the test step. The test step corroborates the previously obtained accuracy, as well as measuring robustness and the ability to generalize the classification. During the test step, the corresponding image samples were applied to the trained CNN topology for classification, generating the corresponding confusion matrix, from which precision, sensitivity, and specificity were computed for interpretation of possible consequences.
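From a 3x3 confusion matrix over the three Gleason classes, the reported per-class measures follow the standard definitions; a generic sketch (not the study's actual evaluation code) is:

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision, sensitivity (recall), and specificity from a
    confusion matrix cm, where cm[i][j] counts samples of true class i
    predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                    # correct predictions per class
    fp = cm.sum(axis=0) - tp            # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp            # class samples predicted as others
    tn = cm.sum() - tp - fp - fn        # everything else
    return {
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

Reporting all three measures per class, rather than overall accuracy alone, is what lets the authors discuss the clinical consequences of each type of misclassification.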