Predicting Early Allograft Dysfunction after Liver Transplantation from Post-Reperfusion Donor Liver Image

BACKGROUND: To explore the relationship between early allograft dysfunction (EAD) and post-reperfusion liver appearance, and to develop image-based models that predict EAD and short-term mortality. METHODS: A total of 351 liver transplant recipients were enrolled and divided into a training set and a testing set. Post-reperfusion images of the donor livers and clinical information were collected. All images were preprocessed. Support vector machine (SVM) and convolutional neural network (CNN) models based on texture analysis of post-reperfusion liver RGB images were constructed to predict EAD. The model with the better performance was then selected to construct further predictive models with additional inputs of clinical information. In addition, a score, namely the image score, was assigned to each liver image based on the prediction probability from the CNN model, and outcomes were compared among the different image scores.

RESULTS: The further predictive model was built on the framework of the CNN model and achieved an AUC of 0.727. Moreover, a larger image score was found to be related to more postoperative infusion, more postoperative complications, and longer ICU and hospital stays.

CONCLUSION:
The post-reperfusion appearance of donor liver was associated with the occurrence of EAD. Moreover, it was feasible to predict EAD and patient outcomes through the texture analysis of post-reperfusion liver RGB images.

Background
Currently, liver transplantation (LT) is the optimal treatment for patients with end-stage liver disease. Early allograft dysfunction (EAD), a type of primary graft dysfunction after LT, is a condition in which the graft shows varying degrees of liver damage but still exhibits sufficient function to support life. The occurrence of EAD has been attributed to donor characteristics, recipient factors, and intraoperative risk factors. For example, the use of marginal donor livers to expand the donor pool and alleviate the shortage of donated organs has been regarded as a predictor of EAD. Meanwhile, EAD has been reported to be associated with increased susceptibility to sepsis, prolonged stay in the intensive care unit (ICU), and increased recipient morbidity and mortality [1][2][3].
Therefore, rapid postoperative assessment of allograft function and accurate prediction of EAD will benefit LT patients by allowing advance preparation for adverse events.
In this context, research on reliable, practical, and cost-effective methods to help transplant doctors distinguish patients at high risk of EAD from those at low risk has advanced. For instance, a method combining point shear wave elastography with a sonographic grading system has been proposed, which achieved an area under the receiver operating characteristic curve of 0.935 [4].
Moreover, a predictive model based on donor information alone has been constructed, with a concordance index of 0.622 [5]. However, despite the strength of texture analysis (TA) in inferring diagnostic and prognostic information, no attempts have been made to exploit TA of liver images for the prediction of EAD.
Notably, TA has transformed the subjective visual evaluation of liver texture into a quantitative and objective method [6]. Sara et al. [7] performed TA of liver RGB images using machine learning algorithms and eventually developed an artificial intelligence tool for the evaluation of graft hepatic steatosis [6]. With the ubiquity of smartphones, TA of RGB images is widely used for the diagnosis of diseases or the classification of tissues in other medical fields, including the diagnosis of skin cancer [8]. RGB optical images acquired with smartphones in the operating room are important and readily available information carriers during LT, with no additional equipment required. Herein, we speculated that liver appearance was associated with EAD, inspired by the realization that the post-reperfusion appearance of the donor liver might partly reflect allograft quality and ischemia-reperfusion injury. This study focused on exploring the relationship between EAD and post-reperfusion liver appearance using TA.
Further, we developed image-based models that predict EAD and short-term mortality. Several strategies exist for texture analysis, whose primary function is to measure and describe differences in the pixel levels of different images. Conventional methods, including gray level co-occurrence matrices and the histogram of local binary patterns, combined with machine learning approaches [e.g., support vector machines (SVM), random forests], have shown encouraging performance in the image recognition of lesions [9,10]. Since 2012, deep learning approaches have demonstrated sustained improvements in medical image recognition, and their performance has now surpassed that of traditional methods [11]. Deep learning methods, particularly convolutional neural networks (CNN), automatically extract features from images, pass them through successive layers, and improve prediction results through backpropagation. This paper also compares the performance of different approaches for texture analysis and selects the preferred method for further construction of a predictive model.

Materials And Methods
Patient population and data preparation
The data of patients who underwent LT between January 2017 and December 2019 at the First Affiliated Hospital, Sun Yat-sen University were evaluated. Only LTs using grafts from brain-dead donors were included in this study. Pediatric LTs, multiple organ transplants, retransplants, and split or living donor LTs were all excluded. Of 386 patients eligible for EAD assessment, 1 with unclear data and 3 with intraoperative death were not considered. Among the remaining patients, 31 were excluded due to missing image data. Eventually, a total of 351 patients remained for analysis. They were divided into two data sets based on the year of LT: the training set comprised all eligible LT recipients from 2017 and 2019, while the testing set contained the LT recipients from 2018. The flow chart of patient selection is depicted in Fig 1. Images of the post-reperfusion liver were routinely captured with smartphones in our transplant center before abdominal closure for further analysis, as illustrated in Fig 2. The data of donors and recipients were extracted by experienced research assistants from the electronic medical record system. This study was performed following the Declaration of Helsinki and was approved by the institutional review board at Sun Yat-sen University. Moreover, the need for written informed consent was waived because of the retrospective and observational nature of the study. No organs from executed prisoners were used.

Definitions
Based on the criteria suggested by Olthoff et al. [1], EAD is identified if one or more of the following abnormalities occur: (1) total bilirubin ≥ 171 μmol/L (10 mg/dL) on postoperative day (POD) 7; (2) international normalized ratio ≥ 1.6 on POD 7; and (3) serum alanine aminotransferase or aspartate aminotransferase > 2000 IU/L within the first 7 days. The balance of risk (BAR) score, applied as a predictor of LT patient survival, was calculated using the calculator available at https://www.assessurgery.com/barscore/bar-score-calculator/.
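The three EAD criteria above can be written as a simple rule. The following is a minimal sketch, not the authors' code: the class and field names are hypothetical, and it assumes the bilirubin threshold of 171 is expressed in μmol/L (equivalent to 10 mg/dL).

```python
from dataclasses import dataclass


@dataclass
class PostoperativeLabs:
    """Hypothetical container for the laboratory values used by the criteria."""
    bilirubin_pod7_umol_per_l: float  # total bilirubin on POD 7 (assumed unit: umol/L)
    inr_pod7: float                   # international normalized ratio on POD 7
    peak_alt_iu_per_l: float          # highest ALT within the first 7 days
    peak_ast_iu_per_l: float          # highest AST within the first 7 days


def meets_ead_criteria(labs: PostoperativeLabs) -> bool:
    """Return True if at least one of the three criteria is met."""
    high_bilirubin = labs.bilirubin_pod7_umol_per_l >= 171.0
    high_inr = labs.inr_pod7 >= 1.6
    # The aminotransferase criterion uses a strict inequality (> 2000 IU/L).
    high_aminotransferase = max(labs.peak_alt_iu_per_l, labs.peak_ast_iu_per_l) > 2000.0
    return high_bilirubin or high_inr or high_aminotransferase
```

A recipient with normal bilirubin and INR but a peak AST of 2500 IU/L in the first week would therefore be labeled as EAD.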
Regarding patient outcomes within 3 months, post-LT hydrothorax was assessed based on ultrasound or thoracentesis results. Hepatic artery thrombosis is the disruption of blood flow to the allograft through the hepatic artery, as confirmed by angiography, Doppler ultrasound, or surgical exploration. Pulmonary infection is diagnosed when the pathogens listed by Singh Nina [12] are isolated and detected in the pleural fluid or respiratory secretions (bronchoalveolar lavage or sputum).
Chest roentgenograms and arterial oxygenation were used to evaluate pulmonary edema based on established methods [13,14]. Intra-abdominal abscess is characterized by the presence of a fluid collection on CT imaging or ultrasonography, coupled with the detection of organisms in the fluid, or with systemic or local signs of infection after excluding other sources. A bleeding event is the occurrence of one of the following: (1) surgical bleeding requiring reoperation; or (2) anatomical bleeding (requiring transfusion but not surgical intervention).

Image data preprocessing
For each liver RGB image, the first step was manual segmentation, which separated the liver tissue from the background. All liver contours were drawn, and the image space outside the marked hepatic tissue was filled with black. The second step varied according to the TA approach. The pipelines of the two methods are depicted in Fig 3. For the classical machine learning approach, each image was resized to 1000×1000 pixels and then divided evenly into 25 non-overlapping patches (each 200×200 pixels). Only patches in which hepatic tissue occupied at least 90% of the area were considered valid for further analysis. The minimum number of valid patches that could be obtained from each image was set at 5. To ensure that the number of patches for each patient was similar, 5 valid patches were randomly selected for each patient. Consequently, the dataset for the classical machine learning approach comprised 1,145 patches from the training set and 610 patches from the testing set.
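The patch-selection step can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: the function name, the boolean `mask` marking segmented liver tissue, the fixed random seed, and the choice to return an empty list when fewer than 5 valid patches exist are all hypothetical.

```python
import numpy as np


def select_valid_patches(image, mask, grid=5, min_tissue=0.9, n_select=5, seed=0):
    """Split a segmented image into grid x grid patches and sample valid ones.

    image: (H, W, 3) array, assumed already resized (e.g. to 1000x1000).
    mask:  (H, W) boolean array, True where the pixel is liver tissue.
    """
    height, width = mask.shape
    patch_h, patch_w = height // grid, width // grid
    valid = []
    for row in range(grid):
        for col in range(grid):
            ys = slice(row * patch_h, (row + 1) * patch_h)
            xs = slice(col * patch_w, (col + 1) * patch_w)
            # A patch is valid when at least 90% of its pixels are liver tissue.
            if mask[ys, xs].mean() >= min_tissue:
                valid.append(image[ys, xs])
    if len(valid) < n_select:
        return []  # assumption: such an image contributes no patches
    rng = np.random.default_rng(seed)
    chosen = rng.choice(len(valid), size=n_select, replace=False)
    return [valid[i] for i in chosen]
```

With a 1000×1000 input and `grid=5`, each returned patch is 200×200 pixels, so 229 training images yield 1,145 patches and 122 testing images yield 610, matching the counts in the text.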
For the deep learning approach, each image was resized to 512×512 pixels. Data augmentation was applied to balance the number of images in the EAD and non-EAD classes. The ImageDataGenerator class in the Keras module for Python (version 3.7) was used to double the images of the EAD class in the training set (details shown in Supplementary Table 1).
In addition, sample-wise normalization, a function provided by the Keras module, was applied to all images before feature extraction, regardless of the TA approach.
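Sample-wise normalization rescales each image using its own statistics, which partly compensates for differences in illumination between photographs. The sketch below is a plain NumPy stand-in for that idea (subtract the image mean, divide by the image standard deviation); the function name and the small `eps` constant are assumptions, and it is not a call into Keras itself.

```python
import numpy as np


def samplewise_normalize(image, eps=1e-6):
    """Center one image to zero mean and scale it to unit standard deviation.

    The statistics are computed over all pixels and channels of this image only.
    """
    x = np.asarray(image, dtype=np.float64)
    x = x - x.mean()
    return x / (x.std() + eps)  # eps guards against division by zero
```

Because the statistics come from the image itself, two photographs of the same liver taken under different overall brightness map to similar normalized values.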

Classical machine learning model
The uniform rotation-invariant local binary patterns were computed for each RGB channel of the image, and the histogram of local binary patterns was created to obtain a multi-scale and accurate description of the texture. The feature descriptors based on gray level co-occurrence matrices, including contrast, correlation, energy, and homogeneity, were also calculated. Intensity-based features, namely the image mean and standard deviation for each RGB channel, were computed as well. The feature descriptors mentioned above were obtained using scikit-image. The technical details are shown in Supplementary Table 2. All descriptors were concatenated, yielding a data set with 195 features for each patch.
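To make the four co-occurrence descriptors concrete, the following sketch computes them in plain NumPy for a single horizontal neighbor offset. It is an illustration only: the study used scikit-image with settings given in Supplementary Table 2, whereas the function name, the number of gray levels, and the single offset used here are assumptions.

```python
import numpy as np


def glcm_descriptors(gray, levels=8):
    """Contrast, correlation, energy and homogeneity of a symmetric, normalized
    gray level co-occurrence matrix for the offset (0, 1).

    gray: 2-D array of integer gray levels in the range [0, levels).
    """
    gray = np.asarray(gray, dtype=np.intp)
    left = gray[:, :-1].ravel()
    right = gray[:, 1:].ravel()
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (left, right), 1.0)  # count horizontal neighbor pairs
    glcm = glcm + glcm.T                 # make the matrix symmetric
    p = glcm / glcm.sum()                # normalize to joint probabilities

    i, j = np.indices((levels, levels))
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    if sd_i * sd_j > 0:
        correlation = (((i - mu_i) * (j - mu_j)) * p).sum() / (sd_i * sd_j)
    else:
        correlation = 1.0  # convention for a perfectly uniform patch
    return {
        "contrast": (((i - j) ** 2) * p).sum(),
        "correlation": correlation,
        "energy": np.sqrt((p ** 2).sum()),
        "homogeneity": (p / (1.0 + (i - j) ** 2)).sum(),
    }
```

A uniform patch gives zero contrast and maximal homogeneity, while a patch of alternating gray levels gives high contrast and negative correlation, which is the kind of difference these descriptors are meant to capture.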
The synthetic minority over-sampling technique (SMOTE) was used to balance the number of samples in the different groups of the training set. With SMOTE, samples in the minority group are synthesized by linear interpolation. The data were equalized, yielding a training set in which patches from EAD and non-EAD patients were in a 1:1 ratio. Based on the equalized training set, a supervised machine learning approach, SVM, was used to build the predictive model. The optimal hyperparameters of the SVM were determined via grid search: in each round of cross-validation, one third of the training data was randomly selected for validation, and the hyperparameter combination with the highest mean validation accuracy over 20 rounds of cross-validation was regarded as optimal.
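The linear interpolation at the core of SMOTE can be sketched in a few lines. This is a simplified NumPy illustration, not the implementation used in the study: the function name, the number of neighbors `k`, and the seed are assumptions.

```python
import numpy as np


def smote_sketch(minority, n_new, k=5, seed=0):
    """Synthesize n_new samples by interpolating between minority-class
    samples and their nearest minority-class neighbors.

    minority: (n_samples, n_features) array with at least 2 rows.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=np.float64)
    n = len(minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances; a sample is never its own neighbor.
    dist = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                      # sample to start from
    partner = neighbors[base, rng.integers(0, k, size=n_new)]  # one of its neighbors
    gap = rng.random((n_new, 1))                               # position along the segment
    return minority[base] + gap * (minority[partner] - minority[base])
```

Each synthetic sample lies on the line segment between a real minority sample and one of its neighbors, so the new feature vectors stay within the region already occupied by the minority class.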

Convolutional neural network approach
Inspired by the Google Inception-Net, this study proposed a predictive model based on the CNN architecture, named CNN model 1, to classify EAD and non-EAD patients. The process described in the "Image data preprocessing" section ensured that the input for each image in the training and testing sets had the same dimensionality. Global and local features were extracted from the images by a max-pooling layer and by 2D convolutional layers with a 1×1 kernel size. Subsequently, these extracted features were convolved and then concatenated as the input to a 2D global average pooling layer. The final dense layer, followed by a softmax activation function, yielded the prediction result. The details of the architecture of CNN model 1 are presented in Supplementary Figure 1.
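The data flow of such an Inception-style block can be illustrated with a forward pass in plain NumPy. This is a deliberately reduced sketch and not CNN model 1 itself: the branch widths, the ReLU activations, the 3×3 pooling window, the random weights, and all names are assumptions, and the intermediate convolutions applied before concatenation are omitted.

```python
import numpy as np


def conv1x1(x, weights):
    """1x1 convolution with ReLU: a per-pixel linear map over channels."""
    return np.maximum(x @ weights, 0.0)


def max_pool_3x3_same(x):
    """3x3 max pooling with stride 1 that keeps the spatial size."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)), constant_values=-np.inf)
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.max(windows, axis=0)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def inception_like_forward(x, params):
    """Two parallel branches -> concatenate -> global average pool -> dense -> softmax."""
    local_branch = conv1x1(x, params["w_local"])
    pooled_branch = conv1x1(max_pool_3x3_same(x), params["w_pooled"])
    merged = np.concatenate([local_branch, pooled_branch], axis=-1)
    features = merged.mean(axis=(0, 1))  # global average pooling over height and width
    return softmax(features @ params["w_dense"] + params["b_dense"])


# Usage with random weights and a small random stand-in for a normalized image.
rng = np.random.default_rng(0)
params = {
    "w_local": rng.normal(size=(3, 4)),
    "w_pooled": rng.normal(size=(3, 4)),
    "w_dense": rng.normal(size=(8, 2)),
    "b_dense": np.zeros(2),
}
probabilities = inception_like_forward(rng.random((64, 64, 3)), params)
```

The output is a pair of probabilities that sum to one, corresponding to the non-EAD and EAD classes; in a trained network the weights would be learned by backpropagation rather than drawn at random.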

Model evaluation
The receiver operating characteristic (ROC) curve was used, and the area under the ROC curve (AUC) was adopted to quantitatively assess the discrimination capability of the proposed models. The model with the higher AUC on the testing set was selected to construct a further predictive model with additional inputs of clinical information. Confusion matrices were also used to compute the sensitivity and specificity. Since sensitivity and specificity each reflect only one side of model performance, the F1 score was calculated according to the following formula:

$$F_1 = \frac{2 \times \text{sensitivity} \times \text{specificity}}{\text{sensitivity} + \text{specificity}}$$

Score based on prediction probability from CNN
For simplicity, a score named the image score was assigned to each liver image according to the prediction probability from CNN model 1: a prediction probability of 0 to 0.3 (including 0.3) was marked as 1, 0.3 to 0.5 (including 0.5) was marked as 2, and a probability above 0.5 was marked as 3. We compared the outcomes of patients with different image scores, including the dose of postoperative infusion (red blood cells, plasma, and platelets), postoperative complications (hydrothorax, hepatic artery thrombosis, pulmonary infection, pulmonary edema, intra-abdominal abscess, and bleeding event), the time to resume eating, and the length of ICU and hospital stay. As a reference, outcomes were also compared between the EAD and non-EAD groups.
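The mapping from prediction probability to image score is a simple binning rule with inclusive upper bounds. A minimal sketch (the function name is hypothetical):

```python
def image_score(probability: float) -> int:
    """Map the CNN prediction probability of EAD to an image score of 1, 2 or 3."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability <= 0.3:   # 0 to 0.3, including 0.3
        return 1
    if probability <= 0.5:   # above 0.3 up to 0.5, including 0.5
        return 2
    return 3                 # above 0.5
```

For example, a liver image with a predicted EAD probability of 0.5 receives a score of 2, while 0.51 receives a score of 3.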

Statistical analysis
Quantitative variables were described by mean ± SD or median (IQR), while frequencies and percentages were used to describe qualitative variables. Comparisons between groups were made with the Chi-square test for qualitative variables and with the Student t-test or rank test for quantitative variables. Additionally, multivariate analysis was performed using stepwise logistic regression with the Wald method to select clinical variables for CNN model 2. Variables were retained if P < 0.05. The comparisons of outcomes among different image scores, and between the EAD and non-EAD groups, were both performed on the total data set (training set + testing set) without interpolation. SPSS Statistics for Windows (Version 24.0, IBM Corporation) was used to perform the statistical analyses. A P-value < 0.05 was considered statistically significant.
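As an illustration of the Chi-square comparison for a qualitative variable, the sketch below tests whether EAD frequency differs between the training and testing sets, using the counts reported in this paper (67 of 229 and 41 of 122). It is a pure-Python stand-in, not SPSS output; the function name is hypothetical, and the Pearson statistic without continuity correction is an assumption.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the upper-tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Rows: training set, testing set. Columns: EAD, non-EAD.
statistic, p_value = chi_square_2x2(67, 229 - 67, 41, 122 - 41)
```

Under these assumptions the P-value is well above 0.05, so the year-based split does not produce a significantly different EAD rate between the two sets.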

Results
Patient characteristics
A total of 351 consecutive patients [257 males, 94 females; mean age = 51.2 years ± 10.8 (SD)] were enrolled in the study, of whom 108 suffered from EAD and 59 died within POD 90. Of the 351 patients, 229 were in the training set and 122 in the testing set. The number of EAD patients in the training and testing sets was 67 (29.3%) and 41 (33.6%) respectively, and the number of deaths within POD 90 was 37 (16.2%) and 22 (18.0%) respectively. Supplementary Table 3 shows the comparison of donor and recipient information, operative characteristics, and postoperative outcomes between the training and testing sets. The table reveals differences in the antibody status of the donor, indication for transplant, preoperative blood tests, anhepatic time, etc. Moreover, Supplementary Table 4 reveals differences between the EAD and non-EAD groups in terms of the age, gender, and BMI of the donor, as well as surgery time, cold ischemia time, POD 90 death, etc.
The performance of predictive model for EAD
Fig. 4 shows the performance of the SVM predictive model on the training set and testing set, with an AUC of 0.670 and 0.661 respectively. Calculated from the confusion matrix in Fig. 5, the SVM model achieved a sensitivity, specificity, and F1 score of 67%, 42%, and 51% respectively on the testing set, and 64%, 60%, and 62% respectively on the training set. CNN model 1 achieved an AUC, sensitivity, specificity, and F1 score of 0.709, 49%, 50%, and 49% respectively on the testing set. For the training set, the values were 0.710, 56%, 65%, and 60% respectively. Due to its higher AUC, CNN model 1 was selected to construct the further predictive model, i.e., CNN model 2. Only 3 clinical variables, i.e., donor age, surgery time, and cold ischemia time, were included in CNN model 2. CNN model 2 yielded an AUC of 0.727, a sensitivity of 54%, a specificity of 50%, and an F1 score of 52% on the testing set, while an AUC of 0.78, a sensitivity of 66%, a specificity of 66%, and an F1 score of 66% were obtained on the training set.
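The reported F1 scores are numerically consistent with the harmonic mean of sensitivity and specificity. The short check below assumes that definition, which is an inference from the reported numbers, and recomputes the score for each model and data set; the variable and function names are illustrative.

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two positive values."""
    return 2.0 * a * b / (a + b)


# (sensitivity %, specificity %, reported F1 %) as stated in the text.
reported = {
    "SVM, testing": (67, 42, 51),
    "SVM, training": (64, 60, 62),
    "CNN model 1, testing": (49, 50, 49),
    "CNN model 1, training": (56, 65, 60),
    "CNN model 2, testing": (54, 50, 52),
    "CNN model 2, training": (66, 66, 66),
}

recomputed = {
    name: harmonic_mean(sensitivity, specificity)
    for name, (sensitivity, specificity, _) in reported.items()
}
```

Each recomputed value falls within one percentage point of the reported F1 score; the residual differences are compatible with rounding of the published sensitivity and specificity.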
The relationship between image score and outcomes
The distribution of image scores was as follows: patients with an image score of 1 made up 35.0% of the total, those with an image score of 2 made up 30.8%, and those with an image score of 3 made up 34.2%. Examples of donor liver images with different image scores are presented in Supplementary Figure 3. The differences in patient outcomes among image scores are shown in Fig. 6, where the dose of postoperative infusion (red blood cells, plasma, and platelets), postoperative complications (bleeding event, hydrothorax, pulmonary infection), the length of ICU stay, and the length of hospital stay were found to differ. As a reference, the differences between the EAD and non-EAD groups are shown in Supplementary Figure 4.

Discussion
Texture analysis is one of the research fields that has been profoundly impacted by deep learning (especially CNN). It might therefore improve the decisions made by clinicians as well as advance the diagnosis and treatment of various diseases. To our knowledge, this is the first effort to explore TA of the post-reperfusion appearance of the donor liver for predicting EAD and short-term survival in patients, and the attempt achieved preliminary success. Although it has yet to reach the stage of practical application, this preliminary work provides additional insight into EAD prediction. Once completed and applied in clinical practice, such predictive models will help transplant doctors predict the possibility of EAD and adverse outcomes immediately after LT surgery is completed.
In this study, we extracted features for prediction from post-reperfusion liver RGB images. These images were captured with smartphone cameras, which are easy to use and widely available. Nonetheless, the quality of the liver RGB images varied because of differences in illumination conditions, phone distance, and smartphone type. Thus, to partly control the effects of these differences, each image was normalized. In addition, the training of prediction models was frequently disrupted by the class-imbalance problem, a common phenomenon in modeling, particularly in the field of multiclass modeling. The class-imbalance problem has been reported to limit the practicality of machine learning models, since it causes the model to predict the majority class and ignore the minority class [15]. Therefore, to alleviate the negative impact of class imbalance, SMOTE and data augmentation were applied to the classical machine learning approach and the convolutional neural network approach respectively.
As a result, CNN model 1 outperformed the SVM model, again demonstrating the effectiveness of the deep learning approach for texture analysis of medical images. The AUC of CNN model 1 on both the training set and the testing set (0.710 and 0.709 respectively) indicated a relationship between EAD and post-reperfusion donor liver appearance, thereby confirming the feasibility of predicting EAD from post-reperfusion liver RGB images. Based on the images and clinical information, CNN model 2 achieved an AUC of 0.727 on the testing set, illustrating relatively good discrimination. Although the sensitivity (54%) and specificity (50%) were not ideal, there is considerable potential for improvement through expansion and standardization of the training dataset.
To further explore the feasibility of using post-reperfusion donor liver appearance to predict LT patient outcomes (postoperative complications, length of ICU stay, etc.), we built a scoring system based on the EAD prediction probability from CNN model 1. This procedure allowed us to avoid cumbersome multiple modeling and greatly simplified the process. The results (Fig 6) showed differences in patient outcomes (the dose of postoperative infusion, postoperative complications, and the length of ICU and hospital stay) among image scores, which suggests a promising prospect for applying TA of post-reperfusion liver images to outcome prediction.
Our study had limitations worth mentioning. First, the study was single-centered and the models were not validated in other centers, which might limit the generalizability of the prediction models. Second, the sample size was small (351), which limited the improvement of model performance. Third, the liver images were not captured according to a uniform standard, causing heterogeneity in image quality and consequently affecting model performance adversely. Lastly, the proposed SVM model predicts whether a patch came from a patient with EAD, so different prediction results might be obtained for patches from the same patient. Although previous research used the SVM-SIL method to resolve this problem [7], this study did not utilize that method. Despite these limitations, we are confident that our research has merit, considering that it revealed the relationship between the post-reperfusion appearance of the donor liver and EAD, and confirmed the feasibility of applying post-reperfusion liver RGB images to EAD and outcome prediction.

This study was performed following the Declaration of Helsinki and was approved by the institutional review board at Sun Yat-sen University. The need for written informed consent was waived because of its retrospective and observational nature.

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Fig 1. The flow chart on the selection of the patient population.

Fig 3. The flow chart of the study design of the early allograft dysfunction (EAD) prediction model. (A) The building process of the classical machine learning model, which referred to support vector machines (SVM), and (B) the building process of the convolutional neural network (CNN) model, which was named CNN model 1. LBP, local binary patterns. GLCM, gray level co-occurrence matrices. INT, intensity-based feature. SMOTE, synthetic minority over-sampling.

Fig 6. Differences in patient outcomes among image scores. The differences in the occurrence of (A) hydrothorax, (B) hepatic artery thrombosis, (C) pulmonary infection, (D) pulmonary edema, (E) intra-abdominal abscess, and (F) bleeding event are presented in the upper part, while the differences in the length of (G) ICU stay and (H) hospital stay, (I) the time interval from LT surgery to resumption of diet, and the postoperative infusion volume of (J) plasma, (K) red blood cells, and (L) platelets are presented in the lower part.
