5.1 Expression of GPX3 in normal human tissues
The Human Protein Atlas (HPA) (https://www.proteinatlas.org/) is a comprehensive repository of protein expression profiles in tissues, cells and blood, together with their metabolic and pathologic roles in the body. The expression of GPX3 in normal tissues was analyzed using HPA RNA-seq tissue data [26]. GPX3 protein expression in major cancer tissues (colorectal, prostate, breast, lung and liver cancer) and in normal kidney tissue was analyzed using immunohistochemical (IHC) tissue images from HPA.
5.2 Gene expression analysis
The expression of GPX3 mRNA in 33 malignant tumor types and corresponding normal tissues was assessed using RNA-sequencing data from the TCGA and GTEx databases [27]. Differential expression between cancerous and corresponding normal tissues was analyzed using t-tests, and the expression differences between tissue sets were presented as violin plots. Before plotting, the expression data were transformed to log2 [TPM (transcripts per million) + 1]. P < 0.05 was considered statistically significant.
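The transformation and group comparison described above can be sketched as follows. This is a minimal illustration only: the TPM values are hypothetical, and Welch's unequal-variance form of the t-test is an assumption, since the text does not state which variant was used. The P-value is omitted because it requires a t-distribution function from a statistics library.

```python
import math
from statistics import mean, variance


def log2_tpm(tpm_values):
    """Transform raw TPM values to log2(TPM + 1)."""
    return [math.log2(v + 1.0) for v in tpm_values]


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical TPM values for illustration; not real TCGA or GTEx data.
tumor_tpm = [3.0, 7.0, 1.0, 15.0, 0.0]
normal_tpm = [63.0, 127.0, 31.0, 255.0, 95.0]

tumor = log2_tpm(tumor_tpm)
normal = log2_tpm(normal_tpm)
t_stat, df = welch_t(tumor, normal)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A negative t statistic here means lower expression in the tumor group than in the normal group.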
Cancer omics data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) were analyzed using the UALCAN platform (http://ualcan.path.uab.edu/analysis-prot.html) [28]. The analysis focused on the expression of GPX3 protein in several malignant tumors, including breast cancer, ovarian cancer, colon cancer, clear cell renal cell carcinoma (clear cell RCC), uterine corpus endometrial carcinoma (UCEC) and lung adenocarcinoma (LUAD). Differences in GPX3 protein expression between cancer and adjacent normal tissues were expressed as Z-values, with the median protein expression level used as the reference point.
GPX3 expression at different cancer stages was analyzed using the "Expression DIY" module of the GEPIA2 platform (http://gepia2.cancer-pku.cn/#index). The corresponding violin plots were constructed after transforming the expression data to log2 (TPM + 1). GPX3 expression was compared across cancer stages to understand the role of the protein in cancer pathology [29].
5.3 Prognostic utility of GPX3
We assessed the predictive potential of GPX3 for overall survival (OS), disease-specific survival (DSS), disease-free interval (DFI) and progression-free interval (PFI) in different tumors in the TCGA database. The median GPX3 expression was used as the cutoff between the high- and low-expression groups. The predictive utility of GPX3 for OS, DSS, DFI and PFI was assessed using Kaplan-Meier curves and the log-rank test.
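The median split and the Kaplan-Meier estimate can be illustrated with a short sketch. The expression values, follow-up times and event indicators below are hypothetical. Assigning samples exactly at the median to the low group is an assumption. The log-rank comparison of the two curves is not shown.

```python
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' if above the cohort median, else 'low'."""
    cutoff = median(expression)
    return ["high" if v > cutoff else "low" for v in expression]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event was observed and 0 if the case was censored.
    """
    curve = []
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical cohort: expression, follow-up in months, event indicator.
expression = [1.2, 3.4, 2.2, 5.1, 0.7, 4.0]
times = [30, 12, 25, 8, 40, 15]
events = [1, 1, 0, 1, 0, 1]

groups = split_by_median(expression)
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    print(label, curve)
```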
Further analyses were performed to assess the epidemiological implications of GPX3 expression in 33 tumor types in the TCGA database. The effect of GPX3 expression on overall survival (OS), disease-free survival (DFS), progression-free survival (PFS) and disease-specific survival (DSS) in different cancers was assessed using R software V. 4.0.3. The relationship between GPX3 expression and OS, DFS, PFS and DSS was analyzed using univariate Cox regression, reported as hazard ratios (HR) with 95% confidence intervals (CI), together with the log-rank test. P < 0.05 was considered statistically significant [30].
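For readers unfamiliar with how a univariate hazard ratio and its 95% CI arise, the following sketch fits a one-covariate Cox model by Newton-Raphson on the partial likelihood. It is an illustration under stated assumptions, not the R code used in this study: the Breslow handling of ties, the Wald-type interval and the toy data are all assumptions.

```python
import math


def cox_univariate(times, events, x, iters=25):
    """Fit a one-covariate Cox model (Breslow ties) by Newton-Raphson.

    Returns (hazard_ratio, ci_lower, ci_upper) using a 95% Wald interval.
    """
    beta = 0.0
    info = 0.0
    for _ in range(iters):
        score = 0.0
        info = 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if e_i != 1:
                continue
            # Risk set: every subject still under observation at time t_i.
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(w)
            s1 = sum(w_j * x_j for w_j, x_j in zip(w, risk))
            s2 = sum(w_j * x_j * x_j for w_j, x_j in zip(w, risk))
            score += x_i - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    se = 1.0 / math.sqrt(info)
    return (math.exp(beta),
            math.exp(beta - 1.96 * se),
            math.exp(beta + 1.96 * se))


# Hypothetical data: 1 = high-expression group, 0 = low-expression group.
times = [2, 3, 5, 7, 4, 6, 8, 9]
events = [1, 1, 1, 0, 1, 1, 0, 1]
high_expr = [1, 1, 1, 1, 0, 0, 0, 0]

hr, ci_low, ci_high = cox_univariate(times, events, high_expr)
print(f"HR = {hr:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

An HR above 1 in this toy example corresponds to a higher hazard in the high-expression group. With so few samples the interval is wide.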
5.4 Genetic alteration in tumor cells
RNA-seq data for patients with 33 cancer types in the TCGA database were downloaded from the Genomic Data Commons (GDC) portal (https://portal.gdc.cancer.gov/). Tumor mutation burden (TMB), defined as the number of mutations (insertions/deletions) per megabase in the exonic coding region, was analyzed as previously described [31]. TMB is directly proportional to the expression of neoantigens recognizable by T cells, which influences the immune response. Microsatellite instability (MSI) is any change in microsatellite length caused by the insertion or deletion of a repeat unit in tumor tissue relative to normal tissue [32], which generates new microsatellite alleles. TMB and MSI are often used to assess prognosis and the effect of immunotherapies. The association between GPX3 expression and TMB and MSI in cancerous tissues was assessed. The relevant data were obtained from the TCGA database, and the analysis was performed using R software V. 4.0.3, with statistical significance set at P < 0.05.
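As a worked example of the TMB definition, the sketch below divides a mutation count by the size of the coding region in megabases. The 38 Mb exome size and the mutation count are assumptions chosen for illustration; the actual denominator depends on the capture kit and on the pipeline in [31].

```python
def tumor_mutation_burden(mutation_count, coding_region_bp):
    """Return mutations per megabase of coding sequence."""
    return mutation_count / (coding_region_bp / 1_000_000)


# Assumed exome size; a commonly quoted approximation, not taken from the text.
EXOME_SIZE_BP = 38_000_000

# Hypothetical sample carrying 190 coding mutations.
tmb = tumor_mutation_burden(190, EXOME_SIZE_BP)
print(f"TMB = {tmb:.1f} mutations/Mb")
```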
5.5 Infiltration of immune cells
Tumor Immune Estimation Resource 2 (TIMER2) is a database for the systematic analysis of immune infiltration (B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages and dendritic cells) across different cancer types. In this study, infiltrating immune cell scores for 33 cancers were downloaded from the TIMER2 database. Spearman correlation analysis was used to evaluate the correlation between GPX3 expression and the scores of B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages and dendritic cells [33].
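Spearman's coefficient is the Pearson correlation of the ranks, with tied values sharing their average rank. A minimal sketch follows; the expression values and infiltration scores are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical GPX3 expression and immune cell infiltration scores.
gpx3 = [2.1, 3.5, 1.0, 4.2, 2.8]
score = [0.10, 0.30, 0.05, 0.25, 0.20]
print(f"rho = {spearman(gpx3, score):.2f}")
```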
The relationship between GPX3 expression and the infiltration level of cancer-associated fibroblasts (CAFs) was analyzed using the TIMER2 platform (http://timer.cistrome.org/). CAFs regulate the function of immune cells in the tumor microenvironment (TME). Infiltration in the TME was estimated using the EPIC, MCPCOUNTER, XCELL and TIDE algorithms. Since most immune cell types are negatively correlated with tumor purity, P-values and correlation coefficients were obtained using Spearman’s rank correlation test after purity adjustment. The results were presented as a heat map and scatter plots. Scatter plots were constructed for the cells exhibiting the strongest correlation (P < 0.05) [29].
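One common way to adjust a correlation for a third variable such as tumor purity is the first-order partial correlation, computed from the three pairwise coefficients. The sketch below shows that formula only. Whether TIMER2 applies exactly this computation is an assumption, and the coefficients used are hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z.

    All three inputs are pairwise correlation coefficients
    (here assumed to be Spearman coefficients computed on ranks).
    """
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical coefficients: x = GPX3 expression, y = CAF score, z = purity.
r_adjusted = partial_correlation(r_xy=0.55, r_xz=-0.30, r_yz=-0.40)
print(f"purity-adjusted rho = {r_adjusted:.2f}")
```

When neither variable correlates with purity, the adjusted value equals the unadjusted coefficient.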
5.6 Enrichment analysis
The top 100 genes associated with GPX3 expression in the TCGA and GTEx databases were identified based on Pearson’s correlation coefficient (PCC). The correlations between GPX3 and the top 3 most dysregulated genes were also assessed using GEPIA2, and a scatter plot was constructed for each of these genes.
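Ranking genes by PCC against a target gene can be sketched as below. The gene names and expression vectors are hypothetical placeholders, and ranking by signed (rather than absolute) correlation is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_correlated(target, candidates, n):
    """Return the n (gene, r) pairs with the highest Pearson r to target."""
    scored = [(pearson(target, values), gene)
              for gene, values in candidates.items()]
    scored.sort(reverse=True)
    return [(gene, r) for r, gene in scored[:n]]


# Hypothetical expression vectors across four samples.
gpx3 = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0],
    "GENE_B": [4.0, 3.0, 2.0, 1.0],
    "GENE_C": [1.0, 3.0, 2.0, 4.0],
}
for gene, r in top_correlated(gpx3, candidates, 2):
    print(gene, round(r, 2))
```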
A protein-protein interaction (PPI) network associated with GPX3 expression in tumor tissues was constructed using the STRING platform (https://string-db.org/). The minimum required interaction score was set to low confidence (0.150), and the maximum number of interactors to show was set to no more than 50 in the 1st shell. Finally, the available experimentally determined GPX3-binding proteins were obtained.
KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analysis was performed on 150 genes, combining the top 100 GPX3-similar genes and the 50 GPX3-interacting genes, to identify pathways regulated by the corresponding proteins. The gene list was uploaded to the DAVID database with the identifier type set to "OFFICIAL_GENE_SYMBOL" and the species set to "Homo sapiens". GO (Gene Ontology) enrichment analysis was also performed for the Biological Process (BP), Cellular Component (CC) and Molecular Function (MF) categories associated with these genes, and the results were plotted using the cnetplot function (circular = F, colorEdge = T, node_label = T). KEGG and GO analyses were performed using R software. Statistical significance for both analyses was set at two-tailed P < 0.05 [29].
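Over-representation analyses of this kind rest on a hypergeometric (one-sided Fisher exact) test. The sketch below computes the plain upper-tail probability for one pathway. It is a stand-in for illustration: DAVID reports its own modified Fisher exact statistic, so its values would differ, and the pathway and background sizes used here are hypothetical.

```python
from math import comb


def enrichment_p(hits, list_size, pathway_size, background_size):
    """Upper-tail hypergeometric probability P(X >= hits).

    hits: genes from the query list annotated to the pathway
    list_size: number of genes in the query list
    pathway_size: background genes annotated to the pathway
    background_size: total number of background genes
    """
    total = comb(background_size, list_size)
    upper = min(list_size, pathway_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(background_size - pathway_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical pathway: 8 of the 150 query genes fall in a 200-gene pathway,
# against an assumed background of 20,000 genes.
p = enrichment_p(hits=8, list_size=150, pathway_size=200,
                 background_size=20_000)
print(f"P = {p:.3g}")
```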
5.7 Construction and validation of the nomogram of GPX3 for STAD
The results above indicated that GPX3 expression had an important impact on survival in numerous malignant tumors, such as BLCA, COAD, PAAD and STAD. OS, DSS, PFS, DFS, DFI and PFI all strongly indicated that the prognosis of STAD worsened as GPX3 levels increased. Therefore, a nomogram of GPX3 for STAD was established and validated to further analyze the predictive significance of GPX3 for the OS of patients with STAD. First, univariate and multivariate Cox regression analyses were used to identify independent prognostic factors for STAD, reported as hazard ratios (HR) with the corresponding 95% confidence intervals (CI). Then, based on the multivariate Cox regression model, a prognostic nomogram was established on the TCGA training dataset to predict the OS probability of STAD patients at 1, 2, 3 and 5 years, using the rms package in R. The concordance index (C-index), where 0.5 indicates discrimination no better than chance and 1.0 indicates perfect discrimination, was used to assess the performance of the nomogram; the higher the C-index, the better the prognostic accuracy. Finally, calibration and validation of the nomogram were performed using the R packages “rms” and “cmprsk”. P < 0.05 was considered statistically significant [34, 35].
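The C-index can be illustrated with a direct pairwise count in the style of Harrell's estimator. This is a sketch under stated assumptions rather than the rms implementation: pairs with tied survival times are skipped, tied risk scores count as half-concordant, and the data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher risk score
    belongs to the subject with the shorter observed survival."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical follow-up (months), event indicators and nomogram risk scores.
times = [5, 12, 20, 30, 42]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.6, 0.7, 0.3, 0.2]
print(f"C-index = {concordance_index(times, events, risk):.2f}")
```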