Computer-Aided Decision-Making System for Endometrial Atypical Hyperplasia based on Multimodal & Multi-Instance Deep Convolution Networks

Pathological diagnosis is the gold standard for neoplasms and their precursors and is highly relevant to treatment planning and prognostic analysis. Deep learning networks have recently been used for computer-assisted pathological diagnosis and treatment decision-making. However, because whole slide images (WSIs) of pathological slides are extremely large, prevailing deep learning models cannot be applied directly to WSI analysis. Moreover, precisely excluding blank and interference regions, and manually annotating the various lesioned and normal regions in very large WSIs, are infeasible in clinical practice. To address these problems, we develop a computer-aided decision-making system based on multimodal and multi-instance deep convolutional neural networks (CNNs) to assist in the diagnosis and treatment of endometrial atypical hyperplasia (AH)/endometrial intraepithelial hyperplasia (EIH). First, we set up the framework of the computer-aided decision-making system based on the WSI image patterns of AH/EIH. We then transform the large-scale WSI analysis into a small-scale analysis of multiple suspected lesion regions, which can be accomplished by mainstream computer vision models. Finally, a decision-support algorithm based on cognitive intelligence summarizes the prognostic results of the multiple small-scale suspected lesion regions to obtain the prognostic result for each WSI. We validate the method through an experimental analysis of 102 patients with endometrial atypical hyperplasia at the West China Second University Hospital of Sichuan University. For endometrial AH/EIH prognostic analysis, the method achieves an accuracy of 85.3%, a precision of 84.6%, and a recall of 86.3%.
Moreover, the method outperforms the prognostic judgment of a single pathologist and approximates the results obtained by majority voting among three pathologists.


Introduction
Endometrioid adenocarcinoma (EC) is a gynecological malignancy with a high incidence worldwide (Carlson and Mccluggage 2019). The precancerous lesion of EC is atypical hyperplasia of the endometrium (AH)/endometrial intraepithelial hyperplasia (EIH) (Downing et al. 2020). Pathological diagnosis is the gold standard for AH/EIH and is highly relevant to treatment planning and prognostic analysis. However, precise differential diagnosis of AH/EIH and EC remains problematic, even for experienced pathologists, because AH/EIH and EC commonly coexist in pathological slides and the diagnostic concordance for AH/EIH in biopsies is low.
In recent years, computer vision has achieved great success in computer-assisted medical diagnosis and decision-making, owing to the development of deep convolutional neural networks (CNNs), which automatically learn excellent feature representations from image data (Guo et al. 2020). Meanwhile, with progress in microscopy and whole-slide scanning technology, pathological slides can now be retained as digital images, enabling wide application of computer vision to computer-assisted pathological diagnosis. However, although deep learning has shown great promise in image analysis, it faces a series of unique challenges in pathological image analysis (Pinckaers et al. 2020; Rijthoven et al. 2020; Xing et al. 2021).
First, a histopathological whole slide image (WSI) is too large to be input directly into a convolutional neural network (Ciresan et al. 2013; Shahzad et al. 2020). Second, precise annotations of lesioned and normal regions are infeasible in clinical practice, because manually labeling a large number of digital image patches is time-consuming and laborious, especially when lesioned and normal regions are intermixed throughout the image (Am Mendola et al. 2020; Hashimoto et al. 2020). Furthermore, each application domain poses its own challenges. For the analysis of WSI patterns from endometrial biopsies, there are two main problems: (1) uterine curettage breaks up the endometrial tissue, leaving many blank areas without endometrium, which must be excluded prior to analysis; (2) blood and mucus are inevitably present in endometrial biopsy tissue, so blood and mucus interference areas must also be excluded from the image.
To solve these problems, we propose an auxiliary diagnosis and decision-making system, termed EndometrialPrognosisNet, based on multimodal and multiple instance CNNs. We further validate the prognostic value of the method using 102 cases of endometrial AH/EIH diagnosed at the West China Second University Hospital of Sichuan University.
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the methodology. Section 4 presents the experiments and results. Section 5 concludes the paper.

Related works
2.1 EC and AH/EIH
Endometrial adenocarcinoma is a gynecological malignancy with a high incidence worldwide, of which endometrioid carcinoma (EC) represents the largest proportion (Carlson and Mccluggage 2019). The precancerous lesion of EC is atypical hyperplasia of the endometrium (AH)/endometrial intraepithelial hyperplasia (EIH), whose characteristic image pattern is defined as a crowded aggregation of tubular or branching endometrial glands with cytological atypia (Downing et al. 2020). However, the variability of glandular and cytological atypia patterns, coupled with the complexity of glandular structure across the menstrual cycle, inevitably introduces subjectivity and lowers inter- and intra-observer reliability.
Meanwhile, AH/EIH and EC have been found to coexist in around 25-40% of cases, and around one third of all AH/EIH cases diagnosed on biopsy are eventually diagnosed as EC at immediate hysterectomy or within 1 year of follow-up (Elke et al. 2010). This coexistence, together with the low diagnostic concordance for AH/EIH in biopsies, makes precise differential diagnosis of AH/EIH and EC difficult even for experienced pathologists. To date, two main treatment options for AH/EIH have been described: hysterectomy and conservative hormone therapy (Papke et al.). Hysterectomy is recommended for patients over 40 years old, for patients who do not require or do not respond to hormone therapy, and for AH/EIH cases that clinicians find hard to distinguish from EC on biopsy (Papke et al.). However, hysterectomy is not suitable for patients of reproductive age, who therefore require conservative treatment, which carries the risk of EC and rapid progression. This poses a great challenge for gynecologists in selecting the optimal therapy for each patient.

Problem setting
This section introduces the problem setting; multimodal and multiple instance learning are then used to develop the method. For an arbitrary natural number $N$, define $[N] := \{1, \ldots, N\}$. A vector $\mathbf{p} = (p_1, \ldots, p_N)$ whose elements are non-negative and sum to one, i.e. $p_i \ge 0$ for all $i \in [N]$ and $\sum_{i \in [N]} p_i = 1$, is referred to as a probability vector. Given two probability vectors $\mathbf{p}$ and $\mathbf{q}$, their cross entropy is
$$H(\mathbf{p}, \mathbf{q}) = -\sum_{i \in [N]} p_i \log q_i .$$
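As a minimal illustration of these definitions, the Python sketch below checks the probability-vector conditions and evaluates the cross entropy $H(\mathbf{p}, \mathbf{q})$. The function names, the tolerance, and the small floor `eps` used to avoid $\log 0$ are our own assumptions rather than part of the original method.

```python
import math
from typing import Sequence

def is_probability_vector(p: Sequence[float], tol: float = 1e-9) -> bool:
    """True if every element is non-negative and the elements sum to one (within tol)."""
    return all(x >= 0 for x in p) and abs(sum(p) - 1.0) <= tol

def cross_entropy(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """H(p, q) = -sum_{i in [N]} p_i * log(q_i); q_i is floored at eps to avoid log(0)."""
    if len(p) != len(q):
        raise ValueError("p and q must both have length N")
    if not (is_probability_vector(p) and is_probability_vector(q)):
        raise ValueError("p and q must be probability vectors")
    return -sum(p_i * math.log(max(q_i, eps)) for p_i, q_i in zip(p, q))
```

For a one-hot true label, e.g. `cross_entropy([1.0, 0.0], [0.8, 0.2])`, the value reduces to $-\log 0.8$ (about 0.223), the negative log-probability assigned to the true class.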

Multiple Instance Learning (MIL)
In multiple instance learning (MIL), a multiple instance classifier is learned from labeled bags and then applied to predict the labels of unseen bags. A bag is a set of multiple instances (samples); the label is attached to the bag as a whole, while the individual instances are unlabeled. Various MIL models and algorithms have been described (Campanella et al. 2018; Shamsolmoali et al. 2021), and MIL has recently been applied to pathological WSI analysis. In endometrial H&E slides, normal and lesioned regions are intermixed, and not all image patches from the WSIs of AH cases contain lesioned regions (Lotter et al. 2021; Vu et al. 2020; Yao et al. 2020a; Zhang et al. 2020). In practice, pathologists observe multiple areas of a slide, comprehensively analyze the lesioned and normal regions, and then reach a diagnostic conclusion. To simulate this diagnostic process, we design the following multiple instance deep learning algorithm. Specifically, the set of image patches from the WSI of one case forms a bag, whose label is the prognosis of that case (cancer or cancer-free). A bag contains image patches of both lesioned and normal regions, and the prognosis of the case is determined by jointly considering these two types of image patches.
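The bag construction described above can be sketched as follows. This is a hypothetical illustration: the `Bag` record, the `build_bags` helper, the 1/0 label coding, and the optional `bag_size` chunking (which yields several bags per case, as the later case-level aggregation over bags implies) are assumptions, and patches are represented only by identifiers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Bag:
    case_id: str
    label: int            # case prognosis used as bag label: 1 = cancer, 0 = cancer-free (assumed coding)
    instances: List[str]  # patch identifiers; instances carry no labels of their own

def build_bags(patches: List[Tuple[str, str]], prognosis: Dict[str, int],
               bag_size: Optional[int] = None) -> List[Bag]:
    """Group (case_id, patch_id) pairs by case; each bag inherits its case's prognosis label.

    With bag_size=None every case yields exactly one bag; otherwise its patches are split
    into consecutive bags of at most bag_size patches (hypothetical option).
    """
    by_case: Dict[str, List[str]] = {}
    for case_id, patch_id in patches:
        by_case.setdefault(case_id, []).append(patch_id)
    bags: List[Bag] = []
    for case_id, ids in by_case.items():
        size = bag_size or len(ids)
        for start in range(0, len(ids), size):
            bags.append(Bag(case_id, prognosis[case_id], ids[start:start + size]))
    return bags
```

Only the bag label is supervised; whether a given patch shows a lesioned or a normal region is never required, which is the property that makes MIL attractive for WSIs without region-level annotation.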

Multimodal Learning (MML)
Multimodal deep learning synthesizes information obtained from two or more modalities during analysis, so that the modalities complement each other and precision improves; previous work has affirmed the relevance of different modal information in diagnostic and prognostic analyses (Hamdi et al. 2021). Our model broadens the information contained in the input by integrating different modalities, such as pathological slides and physiological characteristics, to simulate the analytical process that pathologists follow during prognosis. Overall, this approach improves the precision and robustness of the predictions.
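The text does not specify here how the two modalities are combined. One common option, shown below purely as an assumption, is late fusion by concatenating an image feature vector with a standardized age value; `fuse_modalities` and its normalization parameters are hypothetical and would be estimated on the training cases.

```python
import numpy as np

def fuse_modalities(image_features: np.ndarray, age_years: float,
                    age_mean: float, age_std: float) -> np.ndarray:
    """Concatenate a flattened image feature vector with a standardized age value.

    age_mean and age_std are assumed to be estimated on the training cases only.
    """
    if age_std <= 0:
        raise ValueError("age_std must be positive")
    age_feature = np.array([(age_years - age_mean) / age_std], dtype=float)
    return np.concatenate([np.asarray(image_features, dtype=float).ravel(), age_feature])
```

Standardizing age keeps it on a scale comparable to typical CNN activations, so a downstream linear layer does not have to compensate for a raw value in the tens.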

Deep-learning-based Cognitive Intelligence
Cognitive intelligence is defined as the artificial intelligence of computer systems that simulate the human brain and possess certain human cognitive capabilities to perform specific cognitive tasks, for instance, feature learning, understanding, reasoning and

Data Acquisition
We experimentally validated the network using 102 cases diagnosed with AH/EIH on biopsy between 2019 and 2020 at the pathology department of the West China Second University Hospital. Each case had one formalin-fixed paraffin-embedded (FFPE) slide, and all 102 slides were collected and reviewed by two experienced pathologists to confirm the AH/EIH diagnosis. The slides were then scanned at ×20 magnification (0.5 µm/pixel) on a Motic® EasyScan system (Motic Electric Group Co., Ltd) to generate high-quality WSIs. Each patient's age, as well as the final diagnosis after hysterectomy performed within the 1-year follow-up period, was collected and documented. The study group comprised 51 cancer and 51 cancer-free patients.

The Proposed Method
The prognosis of an endometrial AH/EIH case is predicted by summarizing the class labels of the bags extracted from its WSI and combining them with physiological state information. Specifically, given a test WSI $I_n$ and the physiological information $A_n$ at diagnosis, the class label probability of the case is predicted from the class label probabilities of its bags $b_n$. The proposed network, EndometrialPrognosisNet, which outputs class label probabilities for the case prognosis, comprises the following three parts, as shown in Fig. 3.
(1) Image Preprocessing Unit. This unit evenly tiles the WSI into image patches, discards the patches at the WSI edges, and then uses a U-Net (Ronneberger et al., 2015) to semantically segment the foreground areas (stained tissue areas, blood interference areas, and mucus interference areas) from blank areas. It then discards image patches whose foreground proportion is below a threshold T (set empirically to T = 0.5).
(2) Staining Tissue Region Detector. Its network structure is shown in Fig. 4. It classifies image patches into blood interference areas, mucus interference areas, and stained tissue regions (including both lesioned and normal regions).
(3) Prognostic State Predictor. This is a multimodal convolutional network with an attention mechanism, whose structure is shown in Fig. 5.
In the EndometrialPrognosisNet network, the modules are trained separately: (1) training the Tissue Detector (the left network in Fig. 3); (2) training the multimodal Prognostic Analyzer (the right network in Fig. 3).
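The patch filtering performed by the Image Preprocessing Unit can be sketched as below, assuming a boolean foreground mask (stained tissue, blood or mucus) has already been produced by the segmentation network at WSI resolution. Interpreting "patches at the edges" as incomplete tiles along the right and bottom borders is our assumption, and `tile_and_filter` is a hypothetical helper. A patch is kept when its foreground proportion reaches T, mirroring the rule that patches below T are discarded.

```python
import numpy as np
from typing import List, Tuple

def tile_and_filter(fg_mask: np.ndarray, patch_size: int,
                    threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return the (row, col) origins of the patches that are kept.

    fg_mask: 2-D boolean array, True where segmentation marks foreground
    (stained tissue, blood or mucus) and False on blank areas.
    Incomplete tiles along the bottom and right borders are skipped.
    """
    h, w = fg_mask.shape
    kept = []
    for r in range(0, h - patch_size + 1, patch_size):
        for c in range(0, w - patch_size + 1, patch_size):
            fg_ratio = fg_mask[r:r + patch_size, c:c + patch_size].mean()
            if fg_ratio >= threshold:    # patches with fg_ratio < T are discarded
                kept.append((r, c))
    return kept
```

Working on the mask rather than on pixel intensities keeps the rule independent of staining variation; blood and mucus patches survive this step on purpose, because they are removed later by the Staining Tissue Region Detector.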

Staining Tissue Region Detector
The loss function is defined as the cross entropy between the true and predicted class label probabilities of an image patch.
The first term in Eq. (3) is the loss on the predicted image-patch class label, whereas the second term is a small positive number that keeps the loss from being zero, thus preventing gradient disappearance.
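A per-patch version of this loss, together with one way the trained detector's output might be used to keep only stained-tissue patches for the prognostic stage, is sketched below. The class order in `DETECTOR_CLASSES`, the default value of `eps1`, and both helper names are assumptions; the text only states that the second term is a small positive number.

```python
from typing import Callable, List, Sequence
import numpy as np

# Assumed output order of the three-class detector.
DETECTOR_CLASSES = ("stained_tissue", "blood", "mucus")

def detector_loss(y_true: np.ndarray, y_pred: np.ndarray, eps1: float = 1e-6) -> float:
    """Cross entropy H(y_true, y_pred) for one patch plus the small positive second term eps1."""
    y_pred = np.clip(y_pred, 1e-12, 1.0)  # avoid log(0)
    # eps1 is a constant, so it shifts the loss value but not its gradient w.r.t. y_pred
    return float(-np.sum(y_true * np.log(y_pred)) + eps1)

def select_tissue_patches(patches: Sequence,
                          detector: Callable[[object], Sequence[float]]) -> List:
    """Keep the patches whose most probable detector class is stained tissue."""
    tissue = DETECTOR_CLASSES.index("stained_tissue")
    return [p for p in patches if int(np.argmax(detector(p))) == tissue]
```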

Multimodal Prognostic State Predictor
A multimodal prognosis analysis network is trained to predict class labels of the bags.
Specifically, each bag contains multiple image patches from one case, together with the physiological state information of that case. The prognosis of the case is used as the bag label and takes one of two values: cancer or cancer-free.
The prediction rule is as follows: among all bag class predictions of one case, the class with the largest weighted sum is taken as the prognosis of the case. Training of the parameter set $\theta_P$ (the set of trainable parameters of the multimodal prognosis analysis network) can be represented by the minimization problem in Eq. (5). In Eq. (5), the first term is the loss on the predicted bag class labels, in which the attention weights of the samples introduced in Eq. (2) are applied, so that only the samples with high attention are used to predict the prognosis class label of the bag. The second term, $\epsilon_2$, is a small positive number that keeps the loss from being zero and prevents gradient disappearance. The prognostic analysis network is a multimodal convolutional network with an attention mechanism, whose structure is shown in Fig. 5.
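The bag-level prediction (attention weights, keeping only high-attention instances) and the case-level rule (largest weighted sum over bag predictions) can be sketched in NumPy as follows. This is not the paper's Eq. (2) or Eq. (5): the attention form is borrowed from attention-based deep MIL pooling (Ilse et al., 2018) as a stand-in, and the parameters `V`, `w`, `W_c`, `b_c`, the `top_m` cut-off, concatenating the physiological vector `extra` after pooling, and the uniform default bag weights are all assumptions.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - np.max(z)                  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def bag_class_probs(H, V, w, W_c, b_c, top_m, extra=None):
    """Class probabilities for one bag.

    H: (K, d) instance (patch) features; V: (L, d); w: (L,);
    W_c: (C, d) or (C, d + len(extra)); b_c: (C,).
    """
    a = softmax(np.tanh(H @ V.T) @ w)  # attention weight of each instance, shape (K,)
    m = min(top_m, H.shape[0])
    keep = np.argsort(a)[::-1][:m]     # indices of the m highest-attention instances
    a_keep = a[keep] / a[keep].sum()   # renormalize over the kept instances
    z = a_keep @ H[keep]               # pooled bag embedding, shape (d,)
    if extra is not None:              # assumed fusion point for physiological data
        z = np.concatenate([z, np.asarray(extra, dtype=float)])
    return softmax(W_c @ z + b_c)

def bag_loss(y_true, p_bag, eps2=1e-6):
    """Cross entropy between the bag label and the prediction, plus the small constant eps2."""
    return float(-np.sum(y_true * np.log(np.clip(p_bag, 1e-12, 1.0))) + eps2)

def predict_case(bag_probs, bag_weights=None) -> int:
    """Class index with the largest weighted sum of bag probabilities (uniform weights by default)."""
    P = np.asarray(bag_probs, dtype=float)  # (num_bags, C)
    wts = np.ones(P.shape[0]) if bag_weights is None else np.asarray(bag_weights, dtype=float)
    return int(np.argmax(wts @ P))
```

Restricting pooling to the highest-attention instances lets a few lesion-like patches dominate the bag embedding instead of being diluted by many normal-looking patches.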
Fig. 5 The prognostic analysis network and its backbone structure. (a) Profile of the prognosis analysis network, alongside its feature extraction network (Fm). M ranges from 5 to 20, with an optimal value of 10.

Algorithms
The algorithms for the modules of the EndometrialPrognosisNet network are described in Tables 1 and 2. Table 1 outlines the algorithm for the tissue region detection module, whose parameters are updated using a single image patch as a mini-batch. Table 2 describes the algorithm for the prognostic prediction network, whose parameters are updated using the instances (image patches) of each bag as a mini-batch.
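The two mini-batch schemes can be illustrated with two hypothetical generator helpers: one yields a single patch per update for the tissue detector, and the other yields all instances of one bag per update for the prognostic network. The optional seeded shuffling of the visiting order is an added assumption.

```python
import random
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")

def detector_minibatches(patches: Sequence[T], seed: Optional[int] = None) -> Iterator[List[T]]:
    """Tissue detector: every mini-batch is exactly one image patch."""
    order = list(range(len(patches)))
    if seed is not None:
        random.Random(seed).shuffle(order)
    for i in order:
        yield [patches[i]]

def prognosis_minibatches(bags: Sequence[Sequence[T]],
                          seed: Optional[int] = None) -> Iterator[List[T]]:
    """Prognostic predictor: every mini-batch holds all instances (patches) of one bag."""
    order = list(range(len(bags)))
    if seed is not None:
        random.Random(seed).shuffle(order)
    for i in order:
        yield list(bags[i])
```

Keeping a whole bag in one mini-batch is what allows attention weights to be normalized across all instances of that bag within a single forward pass.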

Experimental validation of the network
4.1 Experimental Settings
The

Results and Discussions
Prognosis analysis results are presented in Table 3. The first column indicates the method type: Human Expert 1, Human Expert 2 and Human Expert 3 denote the analyses of the three individual pathologists, and Human Expert Majority denotes the result of majority voting among the three pathologists. MIL-MM-1 denotes the proposed method without the exclusion of blank areas and removal of interference areas in the WSI, MIL-MM-2 denotes the proposed method without age information, and MIL-MM-3 denotes the complete proposed method. Table 3 reports the mean and standard deviation (mean ± std) of each index over three-fold cross validation (Efron and Bradley, 1983).
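The three-fold cross-validation protocol and the mean ± std summary might be organized as below. The stratified fold assignment (keeping the 51/51 cancer versus cancer-free balance in every fold), the fixed seed, and the use of the sample standard deviation (`ddof=1`) are assumptions; the text does not state how folds were formed.

```python
import numpy as np
from typing import Sequence, Tuple

def stratified_folds(labels: Sequence[int], n_folds: int = 3, seed: int = 0) -> np.ndarray:
    """Assign each case a fold index in [0, n_folds) with class proportions balanced across folds."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)                              # random order within each class
        fold_of[idx] = np.arange(len(idx)) % n_folds  # deal cases round-robin into folds
    return fold_of

def mean_std(fold_scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (ddof=1) of per-fold scores."""
    s = np.asarray(fold_scores, dtype=float)
    return float(s.mean()), float(s.std(ddof=1))
```

Folds are formed at the case level, so all patches and bags of one patient fall into the same fold and no patient contributes to both training and test data.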
The evaluation indexes ACCURACY, PRECISION and RECALL (Townsend, 1971) were used to compare the performance of the above methods; they are defined in Eq. (6):
$$\mathrm{ACCURACY} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{PRECISION} = \frac{TP}{TP + FP}, \quad \mathrm{RECALL} = \frac{TP}{TP + FN}, \tag{6}$$
where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. MIL-MM-3 performed similarly to Human Expert Majority in terms of the mean and standard deviation of RECALL, ACCURACY and PRECISION obtained by three-fold cross validation (Table 3).
In addition, these two methods outperformed the other methods and the analyses of the individual pathologists (Table 3). MIL-MM-3 also outperformed MIL-MM-1 and MIL-MM-2, which provides experimental support for removing blank and interference areas and for using multimodal inputs in pathological image analysis.
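For reference, the indexes in Eq. (6) can be computed from per-case labels as below. Treating cancer (label 1) as the positive class and reporting 0.0 when a denominator is zero are assumptions; `binary_metrics` is a hypothetical helper.

```python
from typing import Dict, Sequence

def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """ACCURACY, PRECISION and RECALL from per-case labels; a 0/0 ratio is reported as 0.0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

In this setting RECALL measures the fraction of patients who progressed to cancer that are flagged, which is why it is reported alongside ACCURACY.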
We also analyzed and compared the intermediate steps of the proposed method, which are necessary for accurately predicting the prognosis of patients with endometrial AH/EIH. Fig. 6 shows the different types of image patches obtained from a WSI: mucus interference area patches, blood interference area patches, and stained tissue area patches (including lesioned and normal regions). Table 4 compares region classification performance: the first column lists the model types (e.g., VGG16), and the remaining columns give the RECALL and ACCURACY of each model. All classification models were initialized with ImageNet pre-trained weights (Jia et al., 2009; Russakovsky et al., 2014). The values in Table 4 are the means and standard deviations of each evaluation index over three-fold cross validation. The tissue region detector $D_{\text{tissue}}$ proposed in the current study achieved the best RECALL and ACCURACY (Table 4).
Table 5 presents the prognosis analysis results of six endometrial AH/EIH cases. The first column shows the H&E-stained pathological WSIs of these cases; the second column shows the patients' ages at diagnosis; the third column shows the prognosis of each case; the fourth to sixth columns show the prognostic judgments of the three pathologists; the seventh column shows the majority-vote result of the three pathologists; and the eighth column shows the prognosis predicted by the proposed method. Under the experimental conditions described in Section 4.2, the proposed method achieved higher ACCURACY and RECALL than the prognostic judgment of a single pathologist, and performed similarly to the majority vote of the three pathologists.
These findings suggest that the proposed method is accurate and effective for prognosis analysis of endometrial AH/EIH cases.
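The Human Expert Majority baseline combines the three pathologists' per-case judgments by majority vote. A minimal sketch, with a hypothetical helper name and integer labels assumed (1 = cancer, 0 = cancer-free), is:

```python
from collections import Counter
from typing import List, Sequence

def majority_vote(*judgments: Sequence[int]) -> List[int]:
    """Per-case majority label across raters; an odd number of raters avoids ties for binary labels."""
    if len({len(j) for j in judgments}) != 1:
        raise ValueError("every rater must judge the same cases")
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*judgments)]
```

With three raters and two classes a strict majority always exists, so no tie-breaking rule is needed.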

Conclusion
The current study presents a computer-aided system for prognostic analysis of endometrial AH/EIH based on deep convolutional networks, developed by effectively combining cognitive intelligence with a CNN framework. The system extracts image features of endometrial AH/EIH directly from hematoxylin-eosin (H&E)-stained WSIs and requires only simple labeling of blank areas, blood and mucus interference areas, and stained tissue areas in a small number of image patches, without precise annotation of normal and lesioned regions in every image patch. Prognostic analysis is performed by a multimodal prognosis analyzer that combines the image patterns of stained tissue areas with information on the patient's physiological state. The system has the following advantages. First, we set up the framework of a computer-aided decision-making system based on WSI image patterns and transform the large-scale WSI analysis into a small-scale analysis of multiple suspected lesion regions, so that WSI image patterns can be analyzed with mainstream computer vision models. Moreover, with the Staining Tissue Region Detector and the Prognostic State Predictor, the system can train the prognosis analysis network effectively without precise labeling of lesioned regions on WSIs. Applied to the prognostic analysis of 102 endometrial AH/EIH cases, the method achieved higher ACCURACY, RECALL and PRECISION than the prognostic judgments of individual pathologists, and its performance was similar to the majority vote of the three pathologists. These findings provide a basis for prognostic analysis of endometrial AH/EIH using deep learning.
The system could be integrated into a clinical decision-making system to provide pathologists and gynecologists with an important reference for the diagnosis and treatment of endometrial AH/EIH, which may help optimize therapeutic strategies and improve patient prognosis.