Integrative Analysis of Histopathological Images and Genomic Data in Colon Adenocarcinoma

doi:10.21203/rs.3.rs-96224/v1

Research

This work is licensed under a CC BY 4.0 License

Journal Publication: published 27 Sep, 2021 in Frontiers in Oncology

Version 1

Background: Colon adenocarcinoma (COAD) is among the cancers with the highest morbidity worldwide. Its 5-year survival rate does not exceed 60%, even in the European countries with the highest survival rates. Histopathological information is crucial for the prognosis and therapy of COAD. Digital whole slide imaging systems allow histopathological sections to be read digitally. In addition, cancer genomics is an important source of prognostic factors.

Methods: To identify prognostic biomarkers of COAD, we downloaded whole-slide histopathological images from The Cancer Imaging Archive (TCIA) database. After processing these images, histopathological features were extracted with CellProfiler. Least Absolute Shrinkage and Selection Operator (LASSO) and Support Vector Machine Recursive Feature Elimination (SVM-RFE) were then applied, yielding 5 prognosis-related features. Weighted gene co-expression network analysis (WGCNA) was performed to find the co-expression gene module correlated with the prognosis-related features. The samples were divided into a training set and a testing set at a ratio of 70% to 30%. Random forest was applied to construct a histopathologic-genomic prognosis factor (HGPF) from the prognosis-related features and genomic data. We then combined HGPF and clinical characteristics in a nomogram and verified its predictive efficacy.

Results: A time-dependent ROC curve was drawn to evaluate the efficacy of the prognostic model. In the training set, the 1-year, 3-year and 5-year AUCs were 0.948, 0.916 and 0.933, respectively. In the testing set, the 1-year, 3-year and 5-year AUCs were 0.913, 0.894 and 0.924, respectively. In addition, patients were separated into a high-risk group and a low-risk group by HGPF. Survival analysis indicated that the survival of low-risk patients was significantly better than that of high-risk patients in both the training set and the testing set. These results suggest that histopathological image features have some ability to predict COAD survival, which can be further improved by multi-omics combination.

Conclusions: This study constructs an integrative prognosis model based on histopathological and genomic features, which may help guide prognosis and clinical decision-making for COAD patients. The biological mechanisms underlying this multi-omics model require further study.

Translational Medicine

Colon adenocarcinoma

Histopathological features

Genomic data

Random forest

Prognosis

Colon adenocarcinoma (COAD) is the second most frequent malignancy in developed countries.[1] Its incidence has been rising around the world, and it is more common in high-income countries and cities. Although survival rates for colon adenocarcinoma have improved greatly in recent years, the 5-year survival rate does not exceed 60% even in the European countries with the highest survival rates.[2] Currently, the most effective and widely accepted therapy for COAD is radical resection. Adjuvant treatment is designed to support radical surgery, reduce the risk of recurrence and death, and improve patient survival.[3] Many factors affect the prognosis of colon cancer. The most significant are the depth of tumor infiltration into the intestinal wall and the involvement of lymph nodes, which also form the basis of the clinical staging system.[4] The histologic staging of the tumor, determined from pathological sections, is therefore a critical factor in the prognosis and therapy of colon adenocarcinoma.

Histopathological images contain a great deal of information about tumors, including the nature of the lesions, histological classification and grade of malignancy, and they contribute to both clinical diagnosis and prognosis. Histopathological findings are often regarded as the gold standard for diagnosis, so the pathological diagnosis of tumors has an irreplaceable status.[5] Nevertheless, in many regions of the world, the number of pathologists and the services they can provide cannot meet clinical demand.[6] The development of digital whole slide imaging (WSI) systems enables pathological sections to be read digitally, overcoming the limitations of traditional microscopes. In addition, applying computer aided diagnosis (CAD) to histopathological images supports more intelligent pathological diagnosis, which helps to improve the efficiency and accuracy of clinical diagnosis.[7] Computerized histopathologic image analysis has been applied to breast,[8] lung,[9, 10] and prostate[11] cancers because of its potential to discover new tumor biomarkers.

The value of histopathological images in predicting tumor prognosis has been widely recognized. However, because the molecular mechanisms that affect cancer prognosis are complex, single-source predictors cannot meet the needs of cancer prognosis modeling, and researchers have attempted to combine predictors from multiple sources to determine prognostic biomarkers. The widespread application of high-throughput sequencing technology has promoted research on Serial Analysis of Gene Expression (SAGE), so that gene expression characteristics can also be used for the clinical prediction of cancer prognosis.[12, 13] The information revealed by cancer omics profiles and by histopathological images is partly independent and partly shared. Histopathological images reflect the morphological features of tumor cells and the histological structure of their microenvironment, which can be influenced by individual immune function and environment in addition to molecular changes.[14, 15] For instance, a previous study[16] found a significant correlation between TP53 mutation in lung adenocarcinoma and the pathological characteristics of tumor cells. Another study provided evidence of a correlation between amplifications of PDGFRA, EGFR and MDM2 and specific image features in glioblastoma.[17] It is therefore feasible to combine pathologic features with cancer omics to optimize prognostic models. Prognostic models built from genomic data and histopathological image features have already been applied to renal cell carcinoma,[18] breast cancer[19] and other early-stage cancers,[20] and superior prediction models have been obtained.

In this study, all whole-slide histopathological images were downloaded from The Cancer Imaging Archive (TCIA, http://www.cancerimagingarchive.net/) database and cropped into 1000 × 1000 pixel sub-images. We then extracted pathological features from each sub-image and averaged each feature across sub-images for further analysis. To study the relationship between these texture features and the prognosis of colon adenocarcinoma, we used both Least Absolute Shrinkage and Selection Operator (LASSO) and Support Vector Machine (SVM) models to screen for pathologic features correlated with prognosis. Intersecting the top 19 features selected by SVM with those selected by LASSO yielded 5 features significantly correlated with prognosis. To further explore the potential biological mechanisms of the prognostic pathologic features, we performed weighted gene co-expression network analysis (WGCNA) to identify the gene co-expression module most closely related to the 5 image features, and used it together with the pathological features to establish a prediction model. Finally, we integrated the pathological features and genomic data in a random forest to estimate each patient’s prognosis and achieved good performance on the independent testing set.

Data source and downloads

We obtained whole slide histopathological images of colon adenocarcinoma (COAD) patients from The Cancer Imaging Archive (TCIA, http://cancerimagingarchive.net/), a public repository of cancer images. TCIA collects, manages and provides abundant oncology image data, is supported by 28 agencies, and offers researchers publicly available cancer imaging data and unique imaging resources.[21, 22]

A total of 218 whole slide histopathological images were downloaded from TCIA. All tissue slides were formalin-fixed and paraffin-embedded to preserve cell morphology as much as possible, which makes them suitable for image feature recognition.

The mRNA expression data and clinical information for these COAD patients were downloaded from The Cancer Genome Atlas database (TCGA, https://portal.gdc.cancer.gov/). TCGA is one of the largest and richest publicly funded projects, designed to build a comprehensive genetic map of the cancer genome.[23]

For mRNA sequencing data, we obtained a total of 478 COAD samples from TCGA. Clinical information for the corresponding patients was downloaded at the same time and ultimately covered 459 COAD samples. We used the R package DESeq2 to normalize the mRNA sequencing data.

Extraction of histopathological imaging features

Extracting imaging features from the whole slide histopathological images involved three steps. First, because each pathological image is too large to be used directly for analysis and feature extraction, we cropped each image evenly into sub-images of 1000 × 1000 pixels and saved them in TIFF format using the OpenSlide Python library.[24] Sub-images containing more than 50% white margin were discarded. To limit sample selection bias and reduce computation, we randomly selected 20 representative sub-images from the remaining ones for the next step. Cropping and random selection of sub-images are widely used in studies that process whole slide images.[9, 16, 19]
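
The cropping, background filtering and random selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the image is a plain grayscale array (a list of rows) rather than a slide opened with OpenSlide, and the function names, the white cutoff of 220 and the fixed random seed are hypothetical choices that do not appear in the original text.

```python
import random

def tile_origins(width, height, tile=1000):
    """Top-left corners of non-overlapping tile x tile windows that fit fully inside the image."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

def white_fraction(pixels, white_threshold=220):
    """Fraction of grayscale pixels (0-255) at or above the assumed white cutoff."""
    flat = [p for row in pixels for p in row]
    return sum(p >= white_threshold for p in flat) / len(flat)

def select_tiles(image, tile=1000, max_white=0.5, n_keep=20, seed=0):
    """Crop the image, drop tiles that are more than max_white background,
    then sample up to n_keep of the remaining tile origins at random."""
    height, width = len(image), len(image[0])
    kept = []
    for x, y in tile_origins(width, height, tile):
        crop = [row[x:x + tile] for row in image[y:y + tile]]
        if white_fraction(crop) <= max_white:
            kept.append((x, y))
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility only
    return rng.sample(kept, min(n_keep, len(kept)))
```

With a real slide, each selected origin would identify one 1000 × 1000 region to export for feature extraction.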

Second, we applied CellProfiler[25] to extract features from each sub-image. CellProfiler is open-source modular analysis software for processing cell images. It can measure a number of features, including size, shape, intensity, and texture, for each identified cell or subcellular region. Hematoxylin-eosin staining gives the histopathological images their different colors. After the color images are converted to grayscale, cell and tissue features can be extracted.

A total of 656 features were output for each sub-image during the preliminary extraction. These features differ from well-known classic pathological characteristics such as cellular basophilia, eosinophilia, nuclear atypia and mitotic counts, and cannot be recognized by pathologists with the naked eye. After irrelevant features such as file sizes and execution information were removed, 590 features per sub-image remained for the following workflow.

Third, we calculated the average of each of the 590 features over the representative sub-images from the steps above and took it as the feature value of the corresponding slide. When a subject had more than one slide, the mean over those slides was then calculated.
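
The two-level averaging (sub-images to slide, then slides to subject) can be sketched as below. This is an illustrative sketch only: the record layout of (patient, slide, feature vector) and the function names are assumptions introduced here, and short vectors stand in for the 590 CellProfiler features.

```python
from collections import defaultdict
from statistics import fmean

def mean_vector(vectors):
    """Element-wise mean of equal-length feature vectors."""
    return [fmean(column) for column in zip(*vectors)]

def patient_features(tile_features):
    """tile_features: iterable of (patient_id, slide_id, feature_vector) records.
    Returns {patient_id: mean over slides of (mean over that slide's sub-images)}."""
    by_slide = defaultdict(list)
    for patient, slide, vector in tile_features:
        by_slide[(patient, slide)].append(vector)
    by_patient = defaultdict(list)
    for (patient, _slide), vectors in by_slide.items():
        by_patient[patient].append(mean_vector(vectors))
    return {patient: mean_vector(slide_means)
            for patient, slide_means in by_patient.items()}
```

Averaging per slide first means that every slide of a subject contributes equally, regardless of how many sub-images it supplied.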

It should be emphasized that the purpose of our study is not to interpret the specific relationship between these imaging features and COAD, but to search for the optimal combination of features for an independent prognostic model of COAD. Therefore, the lack of a definite biological interpretation does not prevent further reasonable analysis.

Acquisition of prognosis-related features

The least absolute shrinkage and selection operator (LASSO) obtains a relatively refined model by adding a penalty function that shrinks the coefficients of insignificant variables to 0, thereby achieving variable selection. By setting the value of the parameter lambda (λ), the user controls the balance between sparsity (how many features are retained) and prediction accuracy. Support vector machine recursive feature elimination (SVM-RFE) is another machine learning method, based on the support vector machine, that seeks the best group of variables by ranking features according to the SVM-derived weights and iteratively eliminating the lowest-ranked features until all features have been removed. To determine the relationship between imaging features and the prognosis of COAD, SVM-RFE and LASSO logistic regression were applied to select prognostic pathological features from the imaging features obtained with CellProfiler. SVM-RFE and LASSO analyses were performed in R version 3.6.3. For SVM-RFE with 5-fold cross-validation, feature selection was performed with high-risk patients (survival time less than 12 months) and low-risk patients (survival of more than 60 months) as training samples. In the SVM-RFE model, the maximal cross-validated accuracy was adopted as the evaluation index to identify the optimal feature subset related to prognosis. The optimal subset of features obtained by SVM-RFE was intersected with the results of LASSO regression to obtain the pathological features most relevant to prognosis.
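
For reference, LASSO logistic regression is commonly written as the penalized optimization problem below. The notation is an assumption introduced here for illustration and does not come from the original text: y_i ∈ {0, 1} is the risk label of patient i, x_i is that patient's vector of p image features, and n is the number of training samples.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0, \beta}
\left\{ -\frac{1}{n} \sum_{i=1}^{n}
\left[ y_i \left( \beta_0 + x_i^{\top}\beta \right)
- \log\left( 1 + e^{\beta_0 + x_i^{\top}\beta} \right) \right]
+ \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
```

Larger values of λ push more coefficients β_j exactly to zero, which is the sparsity and accuracy trade-off described above.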

Co-expression gene module analysis

Weighted gene co-expression network analysis (WGCNA)[26] identifies sets of co-expressed genes, called modules. Correlation analysis can then be conducted between modules and phenotype data to explore potential marker genes. We applied WGCNA to construct the co-expression gene network in the COAD samples and to find the co-expressed gene module most associated with the prognostic pathological features defined in the previous step. The interaction coefficients between genes were calculated, and the topological overlap measure (TOM) was then computed from the adjacency matrix. The co-expression network was constructed based on the W matrix to determine co-expression gene modules. Modules with module significance < 0.05 were regarded as prognosis-related modules.
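
As a rough illustration of how a topological overlap matrix is derived from an adjacency matrix, the sketch below uses the standard unsigned soft-threshold adjacency and TOM definitions. It is not the WGCNA R package and not the authors' code: the function names and the soft-threshold power of 6 are assumptions made for this example, and in practice the power is chosen from the data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def adjacency(expr, power=6):
    """Unsigned soft-threshold adjacency a_ij = |cor(i, j)| ** power, with a zero diagonal.
    expr is a list of gene profiles; power=6 is an assumed value."""
    g = len(expr)
    return [[0.0 if i == j else abs(pearson(expr[i], expr[j])) ** power
             for j in range(g)] for i in range(g)]

def tom(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), with TOM_ii = 1,
    where k_i is the connectivity (row sum) of gene i."""
    g = len(adj)
    k = [sum(row) for row in adj]
    out = [[1.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(g):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(g))
                out[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return out
```

Genes with high pairwise TOM share many strong neighbors, which is why clustering on 1 − TOM tends to group them into the same module.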

Establishment of integrated prognostic model

To establish a prognostic prediction model based on the histopathological features and gene expression of COAD patients, we used random forest to construct the integrated model with the R package randomForestSRC. A random forest (RF) is an ensemble of decision trees, each built on an independent bootstrap sample of the training set. For classification, the output category is the mode of the categories predicted by the individual trees. RF has advantages over other algorithms for high-dimensional data: it can process such data without deleting variables and can evaluate the predictive ability of each feature. Meanwhile, the unbiased estimate of generalization error produced by internal cross-validation supports high accuracy, and the two sources of randomness in the algorithm help to avoid overfitting. The samples were divided into 10 parts, with seven parts (140 samples) forming the training set and three parts (59 samples) forming the independent testing set. Ten-fold cross-validation was applied to construct the prognostic model, and a time-dependent receiver operating characteristic (ROC) curve was plotted using the average of the 10 accuracies in order to calculate the area under the curve (AUC). The RF model estimates the survival risk of each patient; that is, the steps above produce a survival risk score for each patient, which we named the histopathologic-genomic prognosis factor (HGPF). Using the median risk score as the cutoff, the training set and testing set were each divided into a high-risk group and a low-risk group.
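
The sample split and the median-based risk grouping can be sketched as follows. This is a hypothetical illustration rather than the authors' procedure: the function names, the random seed and the rule that scores equal to the median fall in the low-risk group are assumptions. Dealing 199 samples into 10 near-equal parts and pooling seven of them does reproduce the 140 and 59 sample counts reported above.

```python
import random
from statistics import median

def split_into_parts(ids, n_parts=10, n_train_parts=7, seed=0):
    """Shuffle the ids, deal them into n_parts near-equal parts, and pool the
    first n_train_parts parts as the training set and the rest as the testing set."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)  # hypothetical fixed seed
    parts = [ids[i::n_parts] for i in range(n_parts)]
    train = [pid for part in parts[:n_train_parts] for pid in part]
    test = [pid for part in parts[n_train_parts:] for pid in part]
    return train, test

def split_by_median(risk_scores):
    """Label each patient 'high' if the risk score exceeds the cohort median, else 'low'.
    Ties with the median are assigned to 'low' (an assumption of this sketch)."""
    cutoff = median(risk_scores.values())
    return {pid: ("high" if score > cutoff else "low")
            for pid, score in risk_scores.items()}
```

The median cutoff is computed separately within each set, matching the statement that the training and testing sets were each divided into two groups.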

After univariate Cox regression, we incorporated the significant factors (p < 0.05) into multivariate Cox regression analysis. A nomogram was drawn to verify the predictive ability of the prediction model. Two predictive factors, the HGPF obtained from the RF model and the patient’s tumor stage, were used for evaluation. Each predictive factor was assigned a score according to its degree of influence on survival outcome (the value of its regression coefficient) in the Cox regression model. The scores of the two factors were added to give a total score for each patient, and this total score was then converted into the probability of survival at different time points, so as to predict patient survival.
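
The conversion from the two factors to a survival probability follows from the general form of the Cox proportional hazards model, written here for the two predictors. The symbols are assumptions introduced for illustration: h_0(t) and S_0(t) are the baseline hazard and baseline survival functions, and β_1 and β_2 are the fitted regression coefficients, whose values are not given in the text.

```latex
h(t \mid x) = h_0(t)\, \exp\left( \beta_1 \,\mathrm{HGPF} + \beta_2 \,\mathrm{Stage} \right),
\qquad
S(t \mid x) = S_0(t)^{\exp\left( \beta_1 \,\mathrm{HGPF} + \beta_2 \,\mathrm{Stage} \right)}
```

A nomogram rescales each term of the linear predictor to points, so that the total score maps monotonically to S(t | x) at a chosen time t.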

Figure 1 illustrates the workflow for processing histopathological images, extracting imaging features and establishing an integrated prognostic model for COAD. The results of each part are described in detail in the following sections.

Patient characteristics

A total of 199 COAD patients (112 male and 87 female) with histopathology, mRNA expression and clinical data from the TCIA and TCGA datasets were included in our study. Patient characteristics are shown in Table 1. The median age at first diagnosis was 71.0 years (range 36–89 years). According to follow-up, the median survival time was 735 days. At last follow-up, 167 patients were alive and 32 had died.

Table 1

Demographic and clinical characteristics of patients.
Characteristic	Total (n = 199)	Train (n = 140)	Test (n = 59)	P value
Age: median (range)	71.0 (36–89)	72.0 (36–89)	68.0 (41–86)	0.599
Gender
Male	112 (56.3%)	78 (55.7%)	34 (57.6%)
Female	87 (43.7%)	62 (44.3%)	25 (42.4%)	0.876
T classification
T1-T2	39 (19.6%)	25 (17.9%)	14 (23.7%)
T3-T4	160 (80.4%)	115 (82.1%)	45 (76.3%)	0.336
N classification
N0	124 (62.3%)	84 (60.0%)	40 (67.8%)
N1-N2	75 (37.7%)	56 (40.0%)	19 (32.2%)	0.339
M classification
M0	151 (75.9%)	105 (75.0%)	46 (78.0%)
M1	29 (14.6%)	20 (14.3%)	9 (15.3%)
Mx	15 (7.5%)	12 (8.6%)	3 (5.1%)	0.690
NA	4 (2%)	3 (2.1%)	1 (1.7%)
TNM stage
Ⅰ-Ⅱ	115 (57.8%)	78 (55.7%)	37 (62.7%)
Ⅲ-Ⅳ	78 (39.2%)	57 (40.7%)	21 (35.6%)	0.523
NA	6 (3.0%)	5 (3.6%)	1 (1.7%)
OS(d): median	735.0	737.5	731.0	0.448
Event
Alive	167 (83.9%)	114 (81.4%)	53 (89.8%)	0.204
Dead	32 (16.1%)	26 (18.6%)	6 (10.2%)

Acquisition of histopathological images features

CellProfiler converts the color images stained with hematoxylin and eosin into grayscale images and measures image features in 10 categories: correlation between intensities in different images, image area occupied, image granularity, image intensity, image quality, object intensity, object neighbors, object radial distribution, object size and shape, and texture. Texture quantifies the degree and nature of the textures in an image or object by measuring variation in the grayscale image. Image granularity is a texture measurement that outputs a spectrum describing how well structuring elements of different sizes fit the image texture. Object size and shape covers several area and shape features of each identified object in the image, such as area, perimeter, form factor, solidity, Euler number and orientation. Form factor measures object shape with the formula 4π × Area / Perimeter². Zernike shape features are a series of 30 shape features based on Zernike polynomials from order 0 to order 9.
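
To make the form factor concrete, the short sketch below evaluates the formula for two idealized outlines. The function name and the example shapes are illustrative assumptions and are not taken from CellProfiler itself.

```python
from math import pi

def form_factor(area, perimeter):
    """4 * pi * Area / Perimeter**2: equals 1 for a perfect circle and
    decreases as the outline becomes more irregular or elongated."""
    return 4 * pi * area / perimeter ** 2

# A circle of radius 5 and a 10 x 10 square, as idealized objects.
circle = form_factor(pi * 5 ** 2, 2 * pi * 5)
square = form_factor(10 * 10, 4 * 10)
```

The circle gives a form factor of 1 and the square gives π/4, so lower values indicate outlines that depart further from circularity.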

In total, we extracted 590 imaging features for each sub-image and then averaged the values over the 20 representative sub-images to obtain the feature values for each corresponding slide.

Prognosis-related features identification and co-expression gene module selection

The results of dimension reduction of the 590 sub-image features with LASSO and SVM-RFE are shown in Fig. 2. After feature elimination with the SVM-RFE algorithm, the optimal subset determined by the maximal cross-validated accuracy contained 19 features. LASSO regression selected 8 prognostic features. Intersecting the results of the two algorithms yielded 5 features (2 Zernike shape features, 2 granularity features and 1 form factor feature), which were defined as the prognosis-related features of the COAD samples. To identify a co-expression gene module with independent prognostic ability based on the prognosis-related imaging features, WGCNA was applied to generate a heat map of the relationships between the 5 prognosis-related features and the co-expression gene modules, with the strength of each relationship represented by color (Fig. 3). The brown module, containing 372 genes, showed the strongest association with the imaging features. We therefore selected the brown module as the key prognostic module for building the integrated prognostic model.

Enrichment analysis of the key gene module

To investigate the potential biological functions of the genes in the co-expression gene module, we conducted Gene Ontology (GO) enrichment analysis with Metascape (http://metascape.org) to identify biological pathways. Figure 4A shows the top 20 significantly enriched GO terms, and the interrelationships among the genes and their respective pathways are shown in Fig. 4B. The results indicate significant intrinsic correlations among the biological functions of these genes, which were most enriched in biological processes including blood vessel development, heart development, skeletal system development and tissue morphogenesis. Several terms related to cellular components were also enriched, such as extracellular matrix organization, ECM proteoglycans and supramolecular fiber organization.

Construction and validation of integrated prognostic model

To establish a prognostic model from the prognosis-related features and the prognostic gene module and to verify its predictive value, the tumor samples were divided proportionally into a training set of 140 samples and a testing set of 59 samples. A time-dependent ROC curve was drawn to evaluate the efficacy of the prognostic model. Because survival outcomes involve two variables, survival status and survival time, the time-dependent ROC curve describes the predictive ability of the model over time more fully. In the training set, the 1-year, 3-year and 5-year AUCs were 0.948, 0.916 and 0.933, respectively. In the testing set, the 1-year, 3-year and 5-year AUCs were 0.913, 0.894 and 0.924, respectively. Predictive accuracy thus remained high in the testing set. Patients were then divided into a high-risk group and a low-risk group by the median value of the predicted histopathologic-genomic prognosis factor (HGPF). Kaplan-Meier analysis showed that the survival of low-risk patients was significantly better than that of high-risk patients in both the training set (p < 0.0001) and the testing set (p = 0.00018).
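
For readers unfamiliar with the Kaplan-Meier estimator used to compare the two risk groups, a minimal sketch is given below. It is an illustration only, with a hypothetical function name and toy input; the study's own survival curves were presumably produced with standard statistical software.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.
    events[i] is 1 if the patient died at times[i] and 0 if the patient was censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

Applying the function separately to the high-risk and low-risk groups gives the two curves that a log-rank test would then compare.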

Moreover, decision curve analysis (DCA) was used to evaluate the predictive benefit of each model: the integrated prognostic model (which combines the risk score and tumor stage), the HGPF model and the clinical model. The integrated model had a better net benefit than the others in DCA (Fig. 7C). A nomogram scoring system incorporating HGPF and tumor stage was constructed using the Cox regression model (Fig. 7A). Each patient was scored according to the weights of the two factors, and the corresponding progression-free survival probability at 3 and 5 years was predicted. Figure 7B shows the calibration curves of the nomogram for 3-year and 5-year overall survival prediction.
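
The net benefit plotted in a decision curve can be illustrated with the simplified binary-outcome form below. This is a sketch under stated assumptions: the function names and toy inputs are hypothetical, and for censored survival data the true-positive and false-positive rates would instead be estimated at a fixed time point, which this example does not do.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt.
    y_true holds 1 for an event and 0 otherwise; risk holds predicted probabilities."""
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy that classifies every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all reference and zero, the net benefit of treating no one.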

In this study, we extracted image features from whole-slide histopathological images of colon adenocarcinoma. Features of prognostic significance were selected with integrated machine learning algorithms. We combined prognosis-related histopathological features, transcriptomic data and clinical information to construct a prediction model for patient survival, which had better prediction accuracy than models based on a single source of information. In addition, we explored potential molecular biological mechanisms correlated with HGPF through enrichment analysis. In summary, histopathological image features have some ability to predict COAD survival, which can be further improved by multi-omics combination.

Our study identified several image features associated with the prognosis of colon adenocarcinoma using integrated machine learning algorithms. These prognostic features included Zernike shape features of the nuclei, granularity features and a form factor feature. Differences in image texture and pathological morphology of COAD may influence prognosis to some extent. Apart from prognosis prediction, the differences in cell structure revealed by these image features may, to some extent, lead to differences in the invasive activity of tumor cells. In bladder cancer, a staging diagnostic model based on tumor invasiveness was developed with histopathological image features.[27] This approach can also be applied to other cancers for more accurate grading.[28] These image features are difficult for histopathologists to distinguish with the naked eye. Therefore, applying computer algorithms to identify histopathological features related to prognosis could reveal more of the biological mechanisms underlying tumor development and progression in COAD.

After defining the pathological features most relevant to prognosis, we used co-expression gene modules and enrichment analysis to explore the potential molecular pathways and mechanisms by which these features affect the prognosis of COAD. Among the enriched signaling pathways, tumor microenvironment-related pathways such as extracellular matrix (ECM) organization and blood vessel development dominated. The ECM is a key regulator of the TGF-β signaling pathway. In addition to regulating the initiation of TGF-β signaling, it can also determine the outcome of the cytokine’s action, one of which is induction of the epithelial-mesenchymal transition (EMT).[29, 30] EMT is regarded as a pivotal step by which cancer cells acquire the ability to migrate and invade.[31, 32] Moreover, changes in ECM organization may also be vital in tumor recurrence.[33] The rapid proliferation of cancer cells leads to the formation of hypoxic areas in the tumor center. The tumor promotes angiogenesis to support further growth, which in turn increases the need for new blood vessels.[34] Epithelial cells undergoing EMT show reduced cell junctions, reorganization of the cytoskeleton and changes in cell polarity and cell shape, which may lead to characteristic changes in histopathological images.[35] Considering the correlation between the 5 prognosis-related features and the co-expression gene module, the signaling pathways enriched among the module genes may be biological mechanisms correlated with the prognosis-related histopathological image features.

Our research established a prognostic model using prognosis-related features, a prognostic co-expression gene module and the clinical characteristics of COAD patients. Combining them gave robust results in both the training set and the testing set. Many previous studies have conducted extensive modeling using single omics, such as gene signatures of COAD.[36–38] In this study, we integrated pathological images and genomics of colon adenocarcinoma for the first time, improving prediction performance. Combining pathological images with genes to predict survival has also been studied in other tumors.[18, 39] Some studies have also found that integrated models combining genes with images such as magnetic resonance imaging and computerized tomography can improve predictive power.[40–42] Because changes in cancer genetic mechanisms are often reflected in cell morphology, pathological images offer better insight than other medical images.

To the best of our knowledge, this is the first time that histopathological images and genomics have been integrated to predict the prognosis of COAD patients. Our research demonstrates the feasibility of establishing prognosis models for colon adenocarcinoma with multi-omics information and makes fuller use of histopathological image information. In addition, the analysis of prognostic signaling pathways suggests new directions for studying the biological mechanisms behind pathological morphological changes in colon adenocarcinoma and provides a reference for clinical prognosis and treatment strategies.

Our work still has some limitations that need further study. First, although the prognostic model showed significant predictive value in our validation, its accuracy and practicability for clinical survival prediction still need to be verified in large-scale, multi-center studies. Second, the specific mechanism connecting the enriched signaling pathways with the prognostic model is still unclear and needs further study.

In conclusion, our study constructs a robust integrative prognosis model based on multi-omics features for predicting survival outcomes of colon adenocarcinoma patients. This model deepens the understanding of histopathological image information and may provide additional information for the prognosis and clinical decision-making of COAD patients. Moreover, the potential biological mechanism by which histopathological image features affect survival risk needs further study.

Ethics approval and consent to participate:

Not applicable

Consent for publication:

Not applicable

Competing interests:

The authors declare that they have no competing interests.

Funding:

Not applicable

Authors' contributions:

Hao Zeng and Hui Li were responsible for the conception and design of the research. Linyan Chen was responsible for data downloading and sorting. Hao Zeng conducted the data processing. Hui Li was responsible for editing and formatting the article. Qimeng Liao was in charge of data interpretation. Jianrui Ji was in charge of submission and manuscript revision.

Acknowledgements:

Not applicable

Miller KD, et al. Cancer treatment and survivorship statistics, 2019. CA Cancer J Clin. 2019;69(5):363–85.
Labianca R, et al. Colon cancer. Crit Rev Oncol Hematol. 2010;74(2):106–33.
Dienstmann R, Salazar R, Tabernero J. Personalizing colon cancer adjuvant therapy: selecting optimal treatments for individual patients. J Clin Oncol. 2015;33(16):1787–96.
Steinberg SM, et al. Prognostic indicators of colon tumors. The Gastrointestinal Tumor Study Group experience. Cancer. 1986;57(9):1866–70.
Wilson ML, Fleming KA. Global Cancer Care: The Role of Pathology. Am J Clin Pathol. 2016;145(1):6–7.
Nelson AM, et al. Oncologic Care and Pathology Resources in Africa: Survey and Recommendations. J Clin Oncol. 2016;34(1):20–6.
Hipp J, et al. Computer aided diagnostic tools aim to empower rather than replace pathologists: Lessons learned from computational chess. J Pathol Inform. 2011;2:25.
Beck AH, et al. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci Transl Med. 2011;3(108):108ra113.
Yu KH, et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat Commun. 2016;7:12474.
Luo X, et al. Comprehensive Computational Pathological Image Analysis Predicts Lung Cancer Prognosis. J Thorac Oncol. 2017;12(3):501–9.
Wilkins A, Dearnaley D, Somaiah N. Genomic and Histopathological Tissue Biomarkers That Predict Radiotherapy Response in Localised Prostate Cancer. Biomed Res Int. 2015;2015:238757.
Taherian-Fard A, Srihari S, Ragan MA. Breast cancer classification: linking molecular mechanisms to disease prognosis. Brief Bioinform. 2015;16(3):461–74.
Visser E, et al. Prognostic gene expression profiling in esophageal cancer: a systematic review. Oncotarget. 2017;8(3):5566–77.
Xu Y, et al., Histopathological Imagingâ࿽»Environment Interactions in Cancer Modeling. Cancers (Basel), 2019. 11(4).
Zhong T, Wu M, Ma S. Examination of Independent Prognostic Power of Gene Expressions and Histopathological Imaging Features in Cancer. Cancers (Basel), 2019. 11(3).
Yu KH, et al. Association of Omics Features with Histopathology Patterns in Lung Adenocarcinoma. Cell Syst. 2017;5(6):620–7.e3.
Cooper LA, et al. Novel genotype-phenotype associations in human cancers enabled by advanced molecular platforms and computational analysis of whole slide images. Lab Invest. 2015;95(4):366–76.
Cheng J, et al. Integrative Analysis of Histopathological Images and Genomic Data Predicts Clear Cell Renal Cell Carcinoma Prognosis. Cancer Res. 2017;77(21):e91–100.
Sun D, et al. Integrating genomic data and pathological images to effectively predict breast cancer clinical outcome. Comput Methods Programs Biomed. 2018;161:45–53.
Shao W, et al. Integrative Analysis of Pathological Images and Multi-Dimensional Genomic Data for Early-Stage Cancer Prognosis. IEEE Trans Med Imaging. 2020;39(1):99–110.
Prior FW, et al., TCIA: An information resource to enable open science. Annu Int Conf IEEE Eng Med Biol Soc, 2013. 2013: p. 1282-5.
Clark K, et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J Digit Imaging. 2013;26(6):1045–57.
Tomczak K, Czerwińska P, Wiznerowicz M. The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemp Oncol (Pozn). 2015;19(1a):A68–77.
Goode A, et al. OpenSlide: A vendor-neutral software foundation for digital pathology. J Pathol Inform. 2013;4:27.
Soliman K. CellProfiler: Novel Automated Image Segmentation Procedure for Super-Resolution Microscopy. Biol Proced Online. 2015;17:11.
Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559.
Yin PN, et al. Histopathological distinction of non-invasive and invasive bladder cancers using machine learning approaches. BMC Med Inform Decis Mak. 2020;20(1):162.
Niazi MKK, et al. Visually Meaningful Histopathological Features for Automatic Grading of Prostate Cancer. IEEE J Biomed Health Inform. 2017;21(4):1027–38.
Cichon MA, Radisky DC. Extracellular matrix as a contextual determinant of transforming growth factor-β signaling in epithelial-mesenchymal transition and in cancer. Cell Adh Migr. 2014;8(6):588–94.
Stallings-Mann ML, et al. Matrix metalloproteinase induction of Rac1b, a key effector of lung cancer progression. Sci Transl Med. 2012;4(142):142ra95.
Yang J, Weinberg RA. Epithelial-mesenchymal transition: at the crossroads of development and tumor metastasis. Dev Cell. 2008;14(6):818–29.
Strizzi L, et al. Development and cancer: at the crossroads of Nodal and Notch signaling. Cancer Res. 2009;69(18):7131–4.
Zhai X, et al. Colon cancer recurrence–associated genes revealed by WGCNA co–expression network analysis. Mol Med Rep. 2017;16(5):6499–505.
Batlle R, et al. Regulation of tumor angiogenesis and mesenchymal-endothelial transition by p38α through TGF-β and JNK signaling. Nat Commun. 2019;10(1):3071.
Lamouille S, Xu J, Derynck R. Molecular mechanisms of epithelial-mesenchymal transition. Nat Rev Mol Cell Biol. 2014;15(3):178–96.
Xu G, et al. A 15-gene signature for prediction of colon cancer recurrence and prognosis based on SVM. Gene. 2017;604:33–40.
Yang H, et al. Association of a novel seven-gene expression signature with the disease prognosis in colon cancer patients. Aging. 2019;11(19):8710–27.
Gao P, et al. Integrated analysis of gene expression signatures associated with colon cancer from three datasets. Gene. 2018;654:95–102.
Hao J, et al. PAGE-Net: Interpretable and Integrative Deep Learning for Survival Analysis Using Histopathological Images and Genomic Data. Pac Symp Biocomput. 2020;25:355–66.
Lee J, et al. A Quantitative CT Imaging Signature Predicts Survival and Complements Established Prognosticators in Stage I Non-Small Cell Lung Cancer. Int J Radiat Oncol Biol Phys. 2018;102(4):1098–106.
Toledano MN, et al. Combination of baseline FDG PET/CT total metabolic tumour volume and gene expression profile have a robust predictive value in patients with diffuse large B-cell lymphoma. Eur J Nucl Med Mol Imaging. 2018;45(5):680–8.
Shu C, et al. The TERT promoter mutation status and MGMT promoter methylation status, combined with dichotomized MRI-derived and clinical features, predict adult primary glioblastoma survival. Cancer Med. 2018;7(8):3704–12.

Table 1. Demographic and clinical characteristics of patients.

Characteristic	Total (n=199)	Train (n=140)	Test (n=59)	P value
Age: median (range)	71.0 (36-89)	72.0 (36-89)	68.0 (41-86)	0.599
Gender				0.876
Male	112 (56.3%)	78 (55.7%)	34 (57.6%)
Female	87 (43.7%)	62 (44.3%)	25 (42.4%)
T classification				0.336
T1-T2	39 (19.6%)	25 (17.9%)	14 (23.7%)
T3-T4	160 (80.4%)	115 (82.1%)	45 (76.3%)
N classification				0.339
N0	124 (62.3%)	84 (60.0%)	40 (67.8%)
N1-N2	75 (37.7%)	56 (40.0%)	19 (32.2%)
M classification				0.690
M0	151 (75.9%)	105 (75.0%)	46 (78.0%)
M1	29 (14.6%)	20 (14.3%)	9 (15.3%)
Mx	15 (7.5%)	12 (8.6%)	3 (5.1%)
NA	4 (2.0%)	3 (2.1%)	1 (1.7%)
TNM stage				0.523
Ⅰ-Ⅱ	115 (57.8%)	78 (55.7%)	37 (62.7%)
Ⅲ-Ⅳ	78 (39.2%)	57 (40.7%)	21 (35.6%)
NA	6 (3.0%)	5 (3.6%)	1 (1.7%)
OS (days): median	735.0	737.5	731.0	0.448
Event				0.204
Alive	167 (83.9%)	114 (81.4%)	53 (89.8%)
Dead	32 (16.1%)	26 (18.6%)	6 (10.2%)
