Additional file 1
Supplementary Tables
Table S1: The composition and number of features in each feature set.
Table S2: Summary of C-index and time-dependent AUC.
Table S3: The names and coefficients of the features selected in the final image-based model.
Table S4: The likelihood ratio (LR) and corresponding p value for each model. “AJCC stage”, “cln” and “cln_im” represent models based on AJCC tumor pathologic stage, baseline variables, and the combination of baseline variables and WSI features, respectively. Abbreviations: “All”, all patients; “AJCC stage<III”, patients with AJCC tumor pathologic stage<III; “AJCC stage≥III”, patients with AJCC tumor pathologic stage≥III; “Metastatic”, patients with metastatic tumors; “Locoregional”, patients with locoregional tumors.
Table S5: The median survival time of the higher- and lower-risk subgroups in each pathologically defined group of patients.
Supplementary Figures
Figure S1: Three examples of nucleus segmentation results. Panel A shows an image block with a small number of nuclei; panel B shows a block with a higher number of nuclei; panel C shows a block almost entirely filled with nuclei.
Figure S2: The RFS probability curve of the 152 patients enrolled in this study.
Figure S3: Variation of the cross-validation C-index with the penalty (log-transformed). Panel A shows the models developed from baseline variables; panel B shows the models developed from both baseline variables and WSI features.
Figure S4: The overall survival probability of subgroups stratified by the risk score. A represents all the patients; B represents the patients in AJCC stage<III; C represents the patients in AJCC stage≥III; D represents the patients with metastatic tumors; E represents the patients with locoregional tumors.
Figure S5: Dot plot of the top 20 gene ontology (GO) terms in the biological process (BP) category identified by the GOseq package. The DE Ratio is the ratio of differentially expressed genes to all genes in a specific GO category. The GO Description displays the ID and a brief description of each GO term. The color of each dot shows the adjusted p value of the GO term. The size of each dot represents the number of differentially expressed genes.
Figure S6: Dot plot of the top 20 gene ontology (GO) terms in the cellular component (CC) category identified by the GOseq package. The DE Ratio is the ratio of differentially expressed genes to all genes in a specific GO category. The GO Description displays the ID and a brief description of each GO term. The color of each dot shows the adjusted p value of the GO term. The size of each dot represents the number of differentially expressed genes.
Figure S7: Dot plot of the top 20 gene ontology (GO) terms in the molecular function (MF) category identified by the GOseq package. The DE Ratio is the ratio of differentially expressed genes to all genes in a specific GO category. The GO Description displays the ID and a brief description of each GO term. The color of each dot shows the adjusted p value of the GO term. The size of each dot represents the number of differentially expressed genes.
Figure S8: Directed acyclic graph of the enriched GO terms in the biological process category identified by the clusterProfiler package. The color of each node represents the significance of the GO term (increasing from yellow to red). Each arrow represents the hierarchical relationship between two terms. The shape of each node distinguishes the top 10 most significant GO terms (rectangles) from the others (ellipses). Each node displays the GO ID, a brief description, the FDR, the number of differentially expressed genes and the number of all genes.
Figure S9: Directed acyclic graph of the enriched GO terms in the cellular component category identified by the clusterProfiler package. The color of each node represents the significance of the GO term (increasing from yellow to red). Each arrow represents the hierarchical relationship between two terms. The shape of each node distinguishes the top 10 most significant GO terms (rectangles) from the others (ellipses). Each node displays the GO ID, a brief description, the FDR, the number of differentially expressed genes and the number of all genes.
Figure S10: Directed acyclic graph of the enriched GO terms in the molecular function category identified by the clusterProfiler package. The color of each node represents the significance of the GO term (increasing from yellow to red). Each arrow represents the hierarchical relationship between two terms. The shape of each node distinguishes the top 10 most significant GO terms (rectangles) from the others (ellipses). Each node displays the GO ID, a brief description, the FDR, the number of differentially expressed genes and the number of all genes.
Supplementary Methods
The computational formulas of the texture features selected in the final model.