Tensor Decomposition of Largest Convolutional Eigenvalues Reveals Pathological Predictive Power of RhoB in Rectal Cancer Biopsy.

RhoB protein belongs to the Rho GTPase family, which plays an important role in governing cell signaling and tissue morphology. RhoB expression is known to have implications in the pathological processes of diseases. Investigating the regulation and communication of this protein, as detected by immunohistochemical staining under the microscope, may yield information that helps identify optimal disease treatment options. In particular, the role of RhoB in rectal cancer is not well understood. Here, we report that deep learning-based image analysis combined with the decomposition of multiway arrays reveals the predictive factor of RhoB in two cohorts of rectal cancer patients with survival of less than and more than 5 years. The results show distinctions between the tensor decomposition factors of the two cohorts.


Introduction
Rectal cancer is one of the most common cancers worldwide 1 . Preoperative radiotherapy (RT) is a standard neoadjuvant or palliative therapy for patients with advanced cancer, but many cancers have no or little response to RT, leading to patients' poor prognosis [2][3][4] . One of the main reasons is that currently used clinicopathological factors, even tumor stage, cannot precisely provide evidence for clinicians to design an efficient RT strategy. Thus, it is important to find promising biological factors that can be used to predict a cancer patient's response to a specific treatment, or the patient's outcome of developing the disease.
The Rho protein family plays a key role in signal transduction or cell signaling that regulates the actin cytoskeleton and phagocyte NADPH (nicotinamide adenine dinucleotide phosphate) oxidase 5 . It was found that RhoA, RhoB, and RhoD have effects on the membrane trafficking. However, knowledge about the underlying molecular mechanisms for these influences of Rho proteins is incomplete 6 . Growing interests in studying the role of RhoB in cancer have been reported in literature [7][8][9][10][11][12][13] .
RhoB is a distinct anti-apoptotic protein. It was demonstrated that the p53 isoform delta133p53β is involved in the regulation of the apoptotic response in colorectal cancer cell lines 14 . The role of RhoB protein in the radioresistance of colorectal cancer has recently been investigated 15 . That work found that the overexpression of RhoB is associated with poor overall survival in rectal-cancer patients undergoing radiotherapy (RT), indicating the prognostic value of the protein in relation to RT. However, the work did not establish evidence of the predictive factor of RhoB in rectal-cancer patients before RT.
Building on feature transformation developed for deep learning in artificial intelligence (AI) and on multidimensional data analysis, this study shows for the first time that RhoB expression on immunohistochemistry (IHC) imaging obtained from biopsy samples of rectal-cancer patients without RT can differentiate between groups with less and more than 5 years of overall survival. Without relying on subjective assessment of the protein expression, the current approach objectively extracts hidden and deep features of the biopsy samples from the three color-image channels and integrates the information from multiple samples and multiple channels by means of tensor decomposition, which can be viewed as an extension of principal component analysis. Tensor decomposition is capable of discovering unique latent components from complex data of multiple dimensions 16 .
In fact, deep learning in AI has recently been recognized as a very useful method for characterizing nanobiological structures on microscopy and a much more effective approach than traditional image analysis algorithms 17 . The work reported in 18 used deep learning and Gaussian mixture modeling to characterize variability in cryogenic electron microscopy and determine protein conformational landscapes. The application of deep neural networks presented in that work had certain advantages over conventionally adopted methods, such as manifold learning. A study on imaging cellular nanostructures addressed in 19 introduced a deep-learning model to overcome a limitation encountered with standard methods. The results showed that deep learning could accurately localize single emitters at high density in three dimensions for a variety of imaging modalities and conditions. Tensor decomposition 20,21 has been increasingly applied to challenging problems in medicine and health over recent years [22][23][24][25][26][27][28] . In cancer, tensor decomposition has recently been applied to identify microRNA (miRNA) and mRNA expression profiles as potential prognostic biomarkers for kidney renal clear cell carcinoma, where genes involved in cancer-related pathways were found to be associated with miRNAs 29 . Most recently, tensor decomposition has been reported as a useful computational tool for the integrative analysis of epigenomic data. The tensor decomposition and classification of epigenomic tensors could differentiate epigenomic features between normal myometrium and leiomyoma subtypes, and identified the HOXA13 gene as a potential tumorigenic factor in leiomyoma 30 .
Based on promising biomedical results recently achieved with applications of advanced AI and data science tools, it is important to explore these methods for discovering disease biomarkers, including molecular, histologic, physiologic, and radiative attributes. Here, we show that the novel combination of deep learning-based feature extraction of RhoB expression on IHC imaging obtained from the Swedish Rectal Cancer Trial 31 , and the tensor decomposition of these features in a multidimensional setting can reveal the predictive factor of this protein in rectal-cancer patients.

Patient samples
This study included 91 patients (51 male and 40 female) from the South-East Swedish Health Care region who participated in the randomized Swedish Rectal Cancer Trial of preoperative RT between 1987 and 1990 31 . Each participant signed the informed consent. Among the patients, 38 received 25 Gy of radiotherapy in 5 fractions followed by surgery within 2 weeks (median 6 days, range 0-13 days), and the remaining 53 patients had surgery alone. None of the patients received preoperative or adjuvant chemotherapy. The two groups had no statistically significant differences in clinical and pathological characteristics such as gender, age, TNM stage, and grade of differentiation (p > .05). All patients were followed up for more than 5 years. All 120 biopsy samples (1-3 samples from each of the 91 patients) were taken before surgery; of these tissue microarray samples with IHC staining, 45 and 75 were obtained from rectal-cancer patients having less and more than 5-year disease-free survival, respectively.
The study was conducted in accordance with the Declaration of Helsinki. The experimental protocols were approved by the Institutional Ethics Committee of Linkoping University, Sweden (Dnr-2012-107-31). The informed consent was signed by each participant.

Immunohistochemistry (IHC)
The biopsy samples were fixed with formalin, embedded in paraffin, and assembled into a tissue microarray (TMA). TMA sections (4 µm) were deparaffinized in 100% xylene, incubated in 100% ethanol, and rehydrated with decreasing concentrations of ethanol. After antigen retrieval, the sections were incubated with anti-RhoB mouse monoclonal antibody (sc-8048) overnight, and then incubated with a secondary antibody, Envision System Labelled Polymer-HRP Anti-Mouse, for 25 minutes. After reacting with Liquid DAB+ (Dako), the sections were lightly counterstained with hematoxylin. The slides were scanned using a Leica Aperio CS2 scanner. Figure 1 shows some IHC biopsy samples of RhoB expression obtained from rectal cancer patients whose survival was less or more than 5 years. Image sizes are about 3000 × 3000 × 3. To discover the predictive factor of RhoB, the image color is conventionally examined by pathologists to determine whether the expression is negative or positive, or alternatively weak, medium, or strong. However, due to the complexity of the texture and color distribution of these images, human expert perception can be biased and subject to a high degree of uncertainty in assessing the expression.

Convolutional eigenvalues of an image
Using the idea of convolutional neural networks in deep learning, a sequence of maximum convolutional eigenvalues of a 2-D image or a color-image channel is computed in this study and considered as a kind of "deep" feature of the image. Convolutional eigenvalues of a fuzzy recurrence plot (FRP) have recently been introduced in the literature 32 . The main difference between the convolutional eigenvalues of an image and those of an FRP is that the former are extracted sequentially from each of the successive convolved images, whereas the latter are computed only from the final convolved matrix of the FRP. The motivation for extracting sequential convolutional eigenvalues of an image is to capture complex RhoB expression on IHC. A sequence of maximum eigenvalues can capture image features from low to high levels as the kernel filter is applied iteratively to the image. These features include intensity distribution, edges, and texture in the early stages of image convolution, and more abstract properties of the image in later stages, which can provide distinctive features for effective pattern discrimination.
An image convolution is performed by multiplying the values of a pixel and its neighbors with a small matrix, which is called a convolution matrix, mask, or kernel, for achieving different purposes, such as smoothing, sharpening, enhancing, and edge detection.
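The convolution of a kernel k(m, n) and an image f (x, y) results in a filtered or convolved image g(x, y), which can be generally expressed as

$$g(x, y) = k(m, n) \circledast f(x, y) = \sum_{m} \sum_{n} k(m, n)\, f(x - m, y - n),$$

where ⊛ denotes the convolution operator and the sums run over the support of the kernel.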
The aim is to emphasize transitions in the image intensity of the protein expression. Therefore, the Laplacian kernel, which uses the second derivative for image sharpening, was adopted in this study. The Laplacian operator, which takes partial derivatives along the two spatial axes of a 2-D image f (x, y), is defined as 33

$$\nabla^2 f(x, y) = \frac{\partial^2 f(x, y)}{\partial x^2} + \frac{\partial^2 f(x, y)}{\partial y^2},$$

which highlights sharp intensity transitions and reduces the effect of regions having slowly varying gray levels, resulting in the following 3 × 3 Laplacian mask

$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}.$$

Furthermore, to rectify the effect of the featureless background while still keeping the sharpened result, the Laplacian image is subtracted from the original as follows:

$$g(x, y) = f(x, y) - \nabla^2 f(x, y),$$

which yields the 3 × 3 composite sharpening kernel

$$\begin{bmatrix} 0 & -1 & 0 \\ -1 & 5 & -1 \\ 0 & -1 & 0 \end{bmatrix}.$$

After the image is resized to a square image of a certain size and convolved, the rectified linear unit (ReLU) is applied to eliminate negative values of the convolved image, and then the largest eigenvalue of the square image is computed. The ReLU function returns either a zero or positive value to each filtered pixel as follows:

$$\mathrm{ReLU}(g(x, y)) = \max(0, g(x, y)).$$
The next step is to reduce the size of the convolved image while still keeping useful information by using a pooling operator. There are different types of image pooling, such as the max, average, and sum operators. In deep learning, the most widely used is the max pooling operator. Mathematically, let $\Omega = (\Omega_1, \Omega_2, \ldots, \Omega_Q)$, where $\Omega_q = (\omega_{q,1}, \omega_{q,2}, \ldots, \omega_{q,m \times m})$, q = 1, 2, . . . , Q, be a set of pooling regions. The number of pooling regions Q within a convolved image is determined by the pool size m × m and the stride. The max pooling operating on a pooling region of size m × m, denoted as $P_{\max}$, is defined as

$$P_{\max}(\Omega_q) = \max_{1 \le i \le m \times m} \omega_{q,i}.$$

The process of convolution, ReLU, calculation of the largest eigenvalue, and pooling is then repeated until the convolved image reaches a specified size. The procedure is outlined in the following steps.
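Procedure for Computing a Sequence of Largest Eigenvalues of an Image:
1. Input: An N × N image, or resize the image to N × N if it is not square.
2. Given a convolution kernel, a pool size m × m, and a stride.
3. Given n as the stopping value for the iterative convolution.
4. While the image size is larger than n × n: convolve the image with the kernel, apply the ReLU, compute the largest eigenvalue of the result, and apply max pooling.
5. Output: The sequence of largest eigenvalues.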
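The iterative procedure described above can be illustrated with a minimal NumPy sketch. It is not the authors' MATLAB implementation. Several details are assumptions made for illustration: the function names are hypothetical, the convolution is zero-padded so that it preserves the image size, the pooling is unpadded, the kernel is the standard 4-neighbor composite sharpening kernel, and the "largest eigenvalue" is taken as the eigenvalue with the largest real part. For a nonnegative matrix, such as an image after ReLU, that eigenvalue is real and equals the spectral radius.

```python
import numpy as np

# Assumed composite sharpening kernel (original minus 4-neighbor Laplacian).
SHARPEN = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]], dtype=float)


def convolve_same(image, kernel):
    """2-D convolution with zero padding; output has the input's shape."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    rows, cols = image.shape
    out = np.zeros((rows, cols), dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + rows, j:j + cols]
    return out


def max_pool(image, size=3, stride=1):
    """Max pooling over size x size windows without padding."""
    windows = np.lib.stride_tricks.sliding_window_view(image, (size, size))
    return windows[::stride, ::stride].max(axis=(2, 3))


def largest_eigenvalue(matrix):
    """Largest real part among the eigenvalues of a square matrix."""
    return float(np.max(np.linalg.eigvals(matrix).real))


def eigenvalue_sequence(image, n, kernel=SHARPEN, pool=3, stride=1):
    """Repeat convolution, ReLU, eigenvalue, pooling while the side exceeds n."""
    current = np.asarray(image, dtype=float)  # assumed square, single channel
    values = []
    while current.shape[0] > n:
        current = np.maximum(convolve_same(current, kernel), 0.0)  # ReLU
        values.append(largest_eigenvalue(current))
        current = max_pool(current, pool, stride)
    return np.array(values)
```

For a color image, the same function would be applied to each of the three channels separately, giving one sequence per channel.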

PARAFAC (parallel factor) tensor decomposition of largest convolutional eigenvalues of RhoB in rectal cancer
A tensor is a multidimensional, multiway, or n-way array. A tensor of order one, order two, or order three is a vector, matrix, or volume, respectively. An n-th order or n-way tensor takes the form of an n-hypershape. In many applications, data often represent a tensor with n > 2. In this study, a 3-way tensor is adopted to model the integration of the IHC images, color channels, and their sequences of maximum convolutional eigenvalues of RhoB expression in rectal cancer.
In general, the elements of a 3-way tensor, denoted as $\underline{T}$ (the underline indicates a tensor), can be expressed as

$$t_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf},$$

where F is the number of factors, $t_{ijk}$ are the elements of $\underline{T}$, and $a_{if}$, i = 1, . . . , I, $b_{jf}$, j = 1, . . . , J, and $c_{kf}$, k = 1, . . . , K, are elements of three loading matrices A, B, and C, respectively. Alternatively, the tensor can be expressed as 34,35

$$\underline{T} = \sum_{f=1}^{F} a_f \otimes b_f \otimes c_f,$$

where ⊗ denotes the outer product, and $a_f$, $b_f$, and $c_f$ are the f-th column vectors of A, B, and C, respectively. A solution for computing the loading matrices A, B, and C can be obtained using the PARAFAC decomposition model 36,37 . The PARAFAC method can be considered as a generalization of bilinear PCA (principal component analysis). The PARAFAC model has a certain advantage over other tensor-decomposition methods: it provides unique solutions to the tensor decomposition by using the method of alternating least squares (ALS) to minimize the error of approximation with the model T = a(b ⊗ c), where T is the tensor unfolded into an array of size I × JK.
For more than one factor, let b ⊗ c = Y. Then the expression for T can be rewritten as

$$T = A Y.$$

If Y is given by initializing b and c, then A can be estimated as

$$A = T Y^{+},$$

where $Y^{+}$ denotes the Moore-Penrose pseudoinverse of Y. The estimates of B and C can be obtained using a similar procedure. To compute B, the tensor is unfolded to a matrix T of size J × IK, and Y is determined from A and C. Likewise, to estimate C, the tensor is unfolded to a matrix T of size K × IJ, and Y is determined from A and B. The general procedure for estimating A, B, and C can be outlined as follows 35 .
PARAFAC Algorithm for a 3-Way Tensor:
1. Given F, initialize B and C.
2. Estimate A using T, B, and C.
3. Compute B and C likewise.
4. Stop if the solution converges or the change between iterations is within tolerance. Otherwise, return to step 2.
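The alternating least squares steps above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the random initialization, the convergence test on the change in fitting error, and the use of `numpy.linalg.pinv` for each least-squares update are all choices made here for brevity.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product; rows are ordered over (u, v) index pairs."""
    return np.einsum("uf,vf->uvf", U, V).reshape(U.shape[0] * V.shape[0], -1)


def reconstruct(A, B, C):
    """Rebuild the tensor as a sum over factors of outer products."""
    return np.einsum("if,jf,kf->ijk", A, B, C)


def parafac_als(T, F, iters=500, tol=1e-10, seed=0):
    """Fit T[i, j, k] ~ sum_f A[i, f] B[j, f] C[k, f] by alternating least squares."""
    I, J, K = T.shape
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((J, F))               # step 1: initialize B and C
    C = rng.standard_normal((K, F))
    T1 = T.reshape(I, J * K)                      # unfolding of size I x JK
    T2 = np.moveaxis(T, 1, 0).reshape(J, I * K)   # unfolding of size J x IK
    T3 = np.moveaxis(T, 2, 0).reshape(K, I * J)   # unfolding of size K x IJ
    prev = np.inf
    for _ in range(iters):
        A = T1 @ np.linalg.pinv(khatri_rao(B, C).T)   # step 2: update A
        B = T2 @ np.linalg.pinv(khatri_rao(A, C).T)   # step 3: update B, then C
        C = T3 @ np.linalg.pinv(khatri_rao(A, B).T)
        err = np.linalg.norm(T1 - A @ khatri_rao(B, C).T)
        if abs(prev - err) < tol:                     # step 4: stopping rule
            break
        prev = err
    return A, B, C
```

Calling `parafac_als(T, F=3)` on a samples × channels × eigenvalues tensor would return loading matrices of sizes I × 3, J × 3, and K × 3. ALS can converge slowly or to a local minimum, so in practice several random initializations may be compared.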
Having described the general procedure of the PARAFAC decomposition of a 3-way tensor, the 3-way tensor for the cohort of patients having the survival rate of less than 5 years, denoted as $\underline{T}_L$, can be modeled as

$$\underline{T}_L \in \mathbb{R}^{I_L \times C_L \times E_L},$$

where the subscript L stands for "less than 5 years", and $I_L$, $C_L$, and $E_L$ are the number of samples, the number of image channels, and the length of the sequences of the maximum convolutional eigenvalues of the image channels, respectively, belonging to the cohort having the survival rate of less than 5 years. Likewise, the 3-way tensor for the cohort of patients having the survival rate of more than 5 years, denoted as $\underline{T}_M$, can be expressed as

$$\underline{T}_M \in \mathbb{R}^{I_M \times C_M \times E_M},$$

where the subscript M stands for "more than 5 years", and $I_M$, $C_M$, and $E_M$ are the number of samples, the number of image channels, and the length of the sequences of the maximum convolutional eigenvalues of the image channels, respectively, belonging to the cohort having the survival rate of more than 5 years. Figure 2 (B) shows the 3-way tensor model and its PARAFAC decomposition addressed in this paper for the multidimensional data analysis of RhoB expression in rectal cancer.

Results
To carry out the data analysis, all IHC biopsy samples were resized to 500 × 500 images. The sharpening kernel, pool size = 3 × 3, stride = 1, and terminating value n = 100 were used for computing the sequence of largest convolutional eigenvalues of each IHC image channel by using the iterative image convolution procedure (Figure 2(A)). The computation resulted in sequences of 200 largest eigenvalues for each of the three color-image channels. However, it was observed that the largest eigenvalues after the first 12 points were very similar or the same for all images. Therefore, only the first 12 points of each largest-eigenvalue sequence were used in the analysis. To construct the previously described 3-way tensor models, the number of factors F = 3 was specified. The following results were obtained from the PARAFAC tensor decomposition. Table 1 shows the means, standard deviations, p-values, and 95% confidence intervals of the three tensor-decomposition factors (F1, F2, and F3) of the two groups of patients. Figures 3 and 4 show the 2D and 3D plots of F1, F2, and F3 obtained for the two groups of patients, respectively. Figure 5 shows the distributions of F1, F2, and F3 with respect to the two groups of patients.
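The reported sequence length of 200 is consistent with a simple size calculation, sketched below under two assumptions that the text does not state explicitly: the convolution preserves the image size, and the 3 × 3 pooling with stride 1 is unpadded, so each iteration shrinks the image side by 2. The helper names are hypothetical, and the array of zeros is a placeholder rather than real data.

```python
import numpy as np


def sequence_length(side, stop, pool=3, stride=1):
    """Count iterations until a side x side image shrinks to `stop` or below.

    Assumes size-preserving convolution and unpadded pooling, so each
    iteration maps side -> (side - pool) // stride + 1.
    """
    count = 0
    while side > stop:
        side = (side - pool) // stride + 1
        count += 1
    return count


def build_tensor(sequences, keep=12):
    """Truncate a samples x channels x length array to its first `keep` points."""
    sequences = np.asarray(sequences, dtype=float)
    return sequences[:, :, :keep]


print(sequence_length(500, 100))  # 200

# Placeholder values only; real entries would be the computed eigenvalues.
placeholder = np.zeros((45, 3, 200))
print(build_tensor(placeholder).shape)  # (45, 3, 12)
```

With 45 and 75 samples in the two cohorts, three color channels, and 12 retained points, the two tensors would have sizes 45 × 3 × 12 and 75 × 3 × 12.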

Discussion
The differences in the means and 95% confidence intervals of the 3 tensor-decomposition factors of the largest eigenvalues of IHC biopsy images of RhoB expression between the two groups of patients having survival of < 5 years and > 5 years were found to be statistically significant (p-values ≪ .05). The plots of the factors in both 2 and 3 dimensions (Figures 3 and 4), where the factors of the two groups are well separated from each other, strongly suggest the predictive factor of RhoB in rectal cancer. The predictive factor of RhoB is further supported by the distributions of the tensor-decomposition factors F1, F2, and F3 along the corresponding biopsy samples of the two rectal-cancer patient groups, which are well defined as shown in Figures 5(a) and (b) for F1 and F2, respectively. Although the plots of the F3 factor for both groups partially overlap (Figure 5(c)), their means are distinct (Table 1).
While manual analysis of RhoB expression in IHC biopsy samples of rectal-cancer patients carried out by pathologists could not discover the predictive power of this protein-coding gene (our group's unpublished data), the tensor decomposition of largest convolutional eigenvalues of RhoB expression objectively suggests that its measurement is associated with patients' survival. Due to the highly intricate color and texture of protein expression on images, it is extremely difficult, if not impossible, for the human eye to discern many parts of a sample that are spatially arranged in very complicated or delicate structures.
To allow timely and efficient processes of biomarker discovery in cancer research, information about the spatial cellular composition and heterogeneity of tissues needs to be accurately identified. This is because insightful knowledge of cell subpopulations and the tumor microenvironment through TMAs and IHC is important to elucidate the biology and development of cancer 38 . The conventional assessment of protein expression on IHC imaging in pathology is not only prone to qualitative and quantitative errors and disagreements, but its results are also difficult to reproduce, particularly for a large amount of data. The limitations and imprecision of manual scoring of protein expression on TMAs are already well recognized, and several methods for automated image analysis have been developed [39][40][41] . The combination of advanced AI and computational data analysis methods presented in this study has enabled the discovery by objectively showing the predictive factor of RhoB expression. First, the presented approach provides a rigorous and effective image feature extraction of the protein expression by using a deep learning-like procedure for computing "deep" convolutional eigenvalues of the image. The largest eigenvalue is equivalent to the first principal component that accounts for the largest variance of the data. By considering the largest eigenvalues determined after a series of image convolutions, both spatial transformation and temporal information of complex color distribution and texture can be captured in the process. Second, the multidimensional data are modeled as a multiway array, and tensor decomposition is used to analyze the fusion of latent information. Finally, the analysis results are represented with compact informative properties as components that can precisely differentiate subtle differences in the protein expression between patient groups.
Technical advantages of the data processing of the presented methodology are of two aspects. First, because the AI-based image processing is capable of handling samples of large sizes by downsizing to produce output images that are much smaller than the input, the method is thus scalable. Second, because sequences of the largest eigenvalues of convolved images can be truncated at some much shorter length due to the quick convergence of the values, the computation of decomposition factors can be more efficient in both time and storage, reducing computational complexity.
Applications of state-of-the-art AI and data science in precision or personalized medicine optimistically hold promises for medical revolution aiming to achieve effective treatments and cures of human diseases that are leading causes of death, such as cancer, cardiovascular diseases, neurodegenerative disorders, and rare genetic conditions [42][43][44][45][46] , all of which depend on the accuracy of biomarker assessment. In addition to AI, in fact, tensor decomposition or factorization has been realized as a potential method for precision medicine in studying multiple modalities of biomedical data to mine useful information hidden from the original data 47 . Similarly, the method of coupled matrix and tensor decomposition was utilized as an information fusion scheme for discovering diagnostic biomarkers of schizophrenia using EEG, functional MRI, and structural MRI data 48 .
In summary, RhoB expression in a randomized preoperative clinical trial of rectal-cancer patients with long follow-up data has been addressed with the use of the unique combination of AI-based and multidimensional data analysis methods. This automated identification of the predictive power of RhoB expression in rectal cancer is promising for contributing to leveraging the accuracy and efficiency of biomarker discovery and facilitating novel developments of clinical trials.

Conclusions
Methods for extracting deep image features of RhoB expression in rectal-cancer patients and tensor decomposition of these features, known as largest convolutional eigenvalues, have been presented and discussed in the foregoing sections. The results obtained from this study demonstrate the potential power of RhoB as a predictive factor for rectal cancer. Furthermore, the methods presented herein are not limited to the pathological image analysis of the protein expression in rectal cancer but can be utilized for biomarker discovery in other types of cancer using microscope-based screening data.
The increasing availability of "big data" in biomedicine offers tremendous potential for the development and application of personalized medicine aiming at optimal treatment and effective care of patients [49][50][51] . However, there are computational challenges in the use of big data analytics, such as the preprocessing and analysis of whole slide imaging in digital pathology, which need to be overcome before personalized medicine can be put into clinical settings. The scalability and computational efficiency of the methods addressed in this study make them a potentially useful tool for extracting meaningful insights from very large amounts of biomedical data.

Software availability
MATLAB codes and data used in this paper are available at the first and corresponding author's personal homepage: https://sites.google.com/view/tuan-d-pham/codes under the title "Tensor decomposition of RhoB eigenvalues".