GEO data identified 30054 abnormally expressed genes in OSCC comparing to normal oral epithelium
Four GEO transcriptome profiles were combine used to explore the abnormally expressed genes in OSCC comparing to normal oral epithelium, and 18981, 3932, 5695 and 11506 genes were identified in GSE30784, GSE78060, GSE31056 and GSE37991 respectively. Besides the genes that were shared by different profiles, a total of 30054 genes were eventually revealed to be differently expressed in OSCC vs normal control (Table S2).
The genes were then divided into 4 groups based on the expression discrepancy level, considering the heterogeneity of different samples and the possible data deviation caused by different GEO platforms, the 4 profiles genes were analyzed separately. And the result showed that in GSE30784, the expression change of 15326 genes were < 2 fold, 2686 genes were 2 ~ 4 fold, 664 genes were 4 ~ 8 fold and 305 genes were > 8 fold in OSCC comparing to normal oral epithelium tissues. In GSE78060, the gene number was 1024, 2013, 668 and 227 in each group respectively. In GSE31056, the gene number was 3609, 1362, 444 and 280 in each group. Meanwhile, In GSE37991, the gene number was 9173, 1844, 389 and 100 in < 2 fold, 2 ~ 4fold, 4 ~ 8 fold and > 8 fold group respectively (Fig. 1A-1D).
Above all, besides the genes that were shared in multiple profiles, the analysis of 4 GEO profiles indicated that the expression of 23671, 4285, 1491 and 607 genes were credible revealed to be < 2 fold, 2 ~ 4fold, 4 ~ 8 fold and > 8 fold changed in OSCC comparing to normal oral epithelium tissues (Fig. 1E, 1F).
GO/KEGG preliminary interpretation of the OSCC abnormal expressed genes with different changing level
To preliminary interpret the biological functions of the abnormally expressed genes with different expression changing level, the four groups of genes with expressions changed < 2 fold, 2 ~ 4 fold, 4 ~ 8 fold and > 8 fold were processed with GO/KEGG analysis independently. Interestingly, all four groups of genes in four GEO profiles displayed a same trend that more different the genes expression are in cancer comparing to normal control, their cellular location tend to be more outwards from cell nuclear. To be more specific, the < 2 fold group of genes were mainly located in nuclear, 2 ~ 4 fold genes were mostly in the cytoplasm, meanwhile, the 4 ~ 8 fold and > 8 fold genes were tend to locate in cell membrane and outside in extracellular region (Fig. 2A-2D). Actually, this phenomenon has been observed in many other cancers[40–42], and it’s logical considering the biological fact that most function genes are synthesised in nuclear and regulated by various modular factors, the slight change in nuclear protein might result in massive extra nuclear proteins changes.
PPI network and function module analysis revealed a 33-genes containing gene cluster
To next step deeply analyze the abnormally expressed genes in OSCC, the connection among each gene was observed by PPI network, and for the convenience of further immunohistochemistry (IHC) experiment validation and potential clinical use, we mainly focused on the high level expression changed > 8 fold group of genes.
A total of 607 > 8 fold genes were screened by four GEO profiles and the PPI network was constructed based on which three top gene clusters were identified (Fig. 3A). And the first module which posses highest module score contains 33 genes that were mainly related with extracellular matrix structural constituent regulation as well as cell growth and maintenance related biological processes (Fig. 3B). Meanwhile, the next module which score was second to the top module contains 39 genes, and the genes were mostly associated with cellular structural molecular activity and chemokine activity (Fig. 3C). Moreover, the third module contains 9 genes which were mainly enriched in catalytic activity and energy metabolism related biological processes (Fig. 3D).
Given the top module possess the highest score and the genes were mostly extracellular matrix constituent regulation related which is consistent with previous GO/KEGG analysis that high level gene clusters tend to locate distal from cell nuclear, we mainly focused on the 33 genes in the first module for further analysis.
Risk score assessment of the 33-genes containing module
Before deeply interpretation of each of the 33 genes in first module, SurvExpress was in advance used to evaluate the prognostic indicating value of the module as a whole, and the result supported that based on the module genes expression, OSCC patients could be significantly divided into two groups with high and low prognostic risks respectively. The prediction concordance index equals 66.86, risk group hazard ratio equals 2.71 between high and low survival patients (Fig. 4A-4D). Moreover, the relative expression of 33 genes expression in low and high risk patient groups were also explored (Fig. 4E).
KM survival combine with Cox regression analysis revealed SPP1 and PLAU as two credible OSCC prognostic indicators
To further scale down the candidate genes and identify credible key genes during cancer development, KM survival which is based on GEO, EGA and TCGA data, UALCAN database which is based on TCGA, MET500 and CPTAC data and multivariate cox regression analysis were combine used to select credible survival related key genes from the 33 module genes. And the results showed that 7 of 33 genes were supported by both KM analysis and Ualcan data to be associated with OSCC survival, namely SPP1, SERPINH1, PLAU, THBS1, CXCL13, CSF2 and CXCL1. Further, the 7 genes were processed for multivariate Cox regression analysis which result indicated that SPP1 and PLAU genes were the only two genes could be regarded as independent prognostic indicators in OSCC development (Table 1), and both genes relate with not only patients overall survival but also progress free survival (Fig. 5A-5D).
Table 1
Univariate combine with multivariate Cox Regression analysis of the 33 module genes to identify credible key genes
Module Genes | Survival analysis (p value) |
KM survival | UALCAN survival | COX Regression |
SPP1 | 0.0039 | 0.028 | 0.048* |
SERPINH1 | 0.00014 | 0.142 | 0.559 |
PLAU | 0.00005 | 0.001 | 0.021* |
THBS1 | 0.0014 | 0.009 | 0.122 |
CXCL13 | 0.00045 | 0.397 | 0.275 |
CSF2 | 0.0058 | 0.136 | 0.769 |
CXCL1 | 0.0035 | 0.238 | 0.475 |
Basic physicochemical properties of SPP1 and PLAU genes
ProtParam, ProtScale and GeneCards analysis were successively used to interpret the physiochemical properties of SPP1 and PLAU genes. The results indicated that SPP1 which is short for Secreted Phosphoprotein 1, locates in 4q22.1 and encodes a protein composed of 314 amino acids including 75 negatively charged amino acid residues (ASP + Glu) and 29 positively charged amino acid residues (Arg + Lys), the estimated molecular weight of the protein is 35.4KD with theoretical isoelectric point computed as 4.37. Moreover, the estimated instability index of the protein is 53.37 and grand average of hydrophobic value is -1.10 indicating SPP1 works as a cellular unstable and hydrophilic protein which is consistent with the ProtScale analysis result of SPP1 structure showing that the protein harbors more hydrophilic regions than hydrophobicity regions (Fig. 5F). Meanwhile, the prediction result of GeneCards indicated that SPP1 locates in cellular Golgi apparatus or to be secreted in the extracellular region (Fig. 5E).
As for PLAU gene whose full name is Urokinase-type plasminogen activator, the gene locates in 10q22.2 and encodes a protein composed of 431 amino acids including 39 negatively charged amino acid residues (ASP + Glu) and 52 positively charged amino acid residues (Arg + Lys), the estimated molecular weight of the protein is 48.5KD with theoretical isoelectric point computed as 8.78. The estimated instability index of the protein is 42.71 and grand average of hydrophobic value is -0.470 indicating SPP1 also works as a cellular unstable and hydrophilic protein (Fig. 5H). Similar to SPP1 gene, the prediction of GeneCards also indicated that PLAU locates in cellular Golgi apparatus or to be secreted in extracellular region (Fig. 5G).
Aberrant changed expression of SPP1 and PLAU in human cancers including OSCC
To validate the expression change of SPP1 and PLAU genes in OSCC comparing to normal oral epithelium, both online database and local hospital samples were used. As for SPP1 gene, online GEPIA analysis revealed that although the gene expression various in broad spectrum human cancers, for instance, the expression was lower in sarcoma comparing to control samples, its expression in most other types of tumors including HNSCC was significantly higher than matched normal tissues, especially in glioblastoma multiforme and cholangio carcinoma, the expression was more than 50 times higher in cancer comparing to corresponding normal control samples (Fig. 6A), and in HNSCC, SPP1 expression was nearly 20 times higher in cancer versus normal oral epithelium (Fig. 6B). Meanwhile, as for PLAU gene, a similar trend was observed that although the expression various (Fig. 6C), it expresses almost 10 times significantly higher in HNSCC comparing to normal oral epithelium samples (Fig. 6D).
Besides online GEPIA analysis which was based on TCGA data, the result of IHC experiment which was conducted using 304 local hospital patients samples (Table S3) also validated the aberrant gain of expression of SPP1 and PLAU in OSCC. IHC experiment result showed that the positive ratio of both SPP1 and PLAU were much higher in cancer (69.7% and 54.8% respectively) than matched normal oral epithelium samples (both less than 5%), supporting the aberrant gain of expression of SPP1 and PLAU in OSCC (Fig. 6E-6H).
Association between SPP1 and PLAU expression with OSCC clinical pathological parameters
To access the association between SPP1 and PLAU expression with OSCC clinical parameters, both TCGA online data and local hospital patients information were used. Firstly, UALCAN information which is based on TCGA data revealed that not only SPP1 and PLAU genes express markedly higher in cancer comparing to normal control, but also both genes expression keep increasing as the cancer stage and grade advancing. More interestingly, both genes expression were associated with HPV infection, as a matter of fact, the genes tend to express higher in patients with no HPV infection indicating their potential roles as gene targets in HPV(-) OSCC patients. Additionally, PLAU expression tends be higher in patients with TP53 mutation. Meanwhile, no significance relationship was found between neither SPP1 nor PLAU expression with patients age, gender, race and nodal metastasis (Fig. 7A-7P).
Meanwhile, the information of 304 local hospital patients which were previously used for IHC experiment revealed that besides the keeping increasing genes expression as the cancer stages advancing which is consistent with above UALCAN result, the genes expression were also associated with cancer differentiation as well as chemo and radiotherapy history. Moreover, no association was found between neither genes expression and patients smoking, drinking, tobacco chewing habit nor with tumor location, intravascular tumor thrombus existence and tumor nerve invasion (Table 2, 3).
Table 2
The association between SPP1 and OSCC clinical pathological features
Parameters | OSCC(%) | P Value |
- (%) | + (%) |
Gender | | | |
| female | 34 (33.0%) | 69 (67.0%) | 0.275 |
male | 54 (27.0%) | 146 (73.0%) |
Age | | | |
| ≤ 60 | 62 (29.1%) | 151 (70.9%) | 0.969 |
> 60 | 26 (28.9%) | 64 (71.1%) |
Smoking habit | | | |
| No | 63 (31.3%) | 138 (68.7%) | 0.216 |
Yes | 25 (24.5%) | 77 (75.5%) |
Drinking alcohol | | | |
| No | 76 (31.1%) | 168 (68.9%) | 0.101 |
Yes | 12 (20.3%) | 47 (79.7%) |
Chewing tobacco | | | |
| No | 83 (29.1%) | 202 (70.9%) | 0.903 |
Yes | 5 (27.8%) | 13 (72.2%) |
Tumor site | | | |
| Left | 42 (30.2%) | 97 (69.8%) | 0.486 |
Right | 43 (29.5%) | 103 (70.5%) |
Radiation history | | | |
| No | 85 (32.%) | 180 (67.9%) | 0.002 |
Yes | 3 (7.9%) | 15 (83.3%) |
Intravascular tumor cells | | | |
| No | 84 (29.8%) | 198 (70.2%) | 0.296 |
Yes | 4 (19.0%) | 17 (81.0%) |
Nerve affection | | | |
| No | 64 (31.5%) | 139 (68.5%) | 0.175 |
Yes | 24 (24.0%) | 76 (76.0%) |
Lymph node infiltration | | | |
| No | 86 (29.7%) | 204 (70.3%) | 0.426 |
Yes | 2 (15.4%) | 11 (84.6%) |
Differentiation | | | |
| Well | 32 (21.3%) | 118 (78.7%) | 0.004 |
Moderate | 39 (32.5%) | 81 (67.5%) |
Poor | 14 (50.0%) | 14 (50.0%) |
Stage | | | |
| cT1 | 62 (41.6%) | 87 (58.4%) | < 0.001 |
cT2 | 20 (18.7%) | 87 (81.3%) |
cT3 | 3 (8.8%) | 31 (91.2%) |
cT4 | 1 (12.5%) | 7 (87.5%) |
Table 3
The association between PLAU and OSCC clinical pathological features
Parameters | OSCC(%) | P Value |
- (%) | + (%) |
Gender | | | |
| female | 30(29.1%) | 73(70.9%) | 0.982 |
male | 58(29.0%) | 142(71%) |
Age | | | |
| ≤ 60 | 57(26.8%) | 156(73.2%) | 0.178 |
> 60 | 31(34.4%) | 59(65.6%) |
Smoking habit | | | |
| No | 64(31.8%) | 137(68.2%) | 0.132 |
Yes | 24(23.5%) | 78(76.5%) |
Drinking alcohol | | | |
| No | 74(30.3%) | 170(69.7%) | 0.316 |
Yes | 14(23.7%) | 45(76.3%) |
Chewing tobacco | | | |
| No | 85(29.8%) | 200(70.2%) | 0.233 |
Yes | 3(16.7%) | 15(83.3%) |
Tumor site | | | |
| Left | 46(33.1%) | 93(66.9%) | 0.128 |
Right | 42(27.4%) | 122(72.6%) |
Radiation history | | | |
| No | 80(30.2%) | 185(69.8%) | 0.246 |
Yes | 8(21.1%) | 30(78.9%) |
Intravascular tumor cells | | | |
| No | 79(28.0%) | 203(72.0%) | 0.148 |
Yes | 9(42.9%) | 12(57.1%) |
Nerve affection | | | |
| No | 57(28.1%) | 146(71.9%) | 0.598 |
Yes | 31(31.0%) | 69(69.0%) |
Lymph node infiltration | | | |
| No | 84(29.0%) | 206(71.0%) | 0.889 |
Yes | 4(30.8%) | 9(69.2%) |
Differentiation | | | |
| Well | 33(22.0%) | 117(78.0%) | 0.002 |
Moderate | 37(30.8%) | 83(69.2%) |
Poor | 15(53.6%) | 13(46.4%) |
Stage | | | |
| cT1 | 57(38.3%) | 92(61.7%) | 0.001 |
cT2 | 25(23.4%) | 82(76.6%) |
cT3 | 3(8.8%) | 31(91.2%) |
cT4 | 1(12.5%) | 7(87.5%) |
Construction of a nomogram based on SPP1 and PLAU expression as well as OSCC clinical parameters
For the clinical convenience of patients survival prediction, a nomogram combining gene indicators and OSCC clinical features was constructed. Firstly, based on multivariate cox regression analysis, three clinical parameters including patients age, extracapsular invasion and radiation therapy history were identified to be independent risk indicators in OSCC development (Table 4).
Table 4
Association between OSCC key parameters with patients overall survival
OSCC parameters | P Value | B value | HR | 95% CI |
Univariate analysis | Multivariate analysis |
SPP1 expression | 0.001 | 0.021* | 0.435 | 1.545 | 1.067–2.236 |
PLAU expression | 0.028 | 0.048* | 0.319 | 1.376 | 0.998–1.897 |
Age | 0.014 | 0.007* | 0.678 | 1.970 | 1.205–3.219 |
Extracapsular | 0.001 | 0.001* | 0.965 | 2.624 | 1.501–4.588 |
Perineural | 0.002 | 0.101 | 0.407 | 1.502 | 0.924–2.441 |
Radiations | 0.002 | < 0.001* | −0.963 | 0.382 | 0.231–0.630 |
Grade | 0.043 | 0.800 | 0.056 | 1.057 | 0.687–1.628 |
Stage | 0.012 | 0.043* | 0.354 | 1.424 | 0.968–2.095 |
* Represents the parameters that were supported by both univariate and multivariate survival analysis to be independent survival indicators. |
And as for the gene indicators, besides the previous observation that both SPP1 and PLAU genes associated with OSCC survival, great utility of the two genes were found in potential further OSCC clinical medical use. Firstly, both online data and IHC experiment showed no obvious correlation between the two genes expression indicating they regulate cancer development via independent and most likely different signaling pathways (Fig. 8B). And what’s inspiring is that based on the two genes expression, OSCC patients survival could be divided into four groups: the patients with both SPP1 and PLAU high expression harbor the worst survival, next the patients with either SPP1 or PLAU gene high expression and the other gene low, meanwhile, the survival of patients with both genes low expression was the best in four groups of patients, supporting the potential combine use of two genes as survival indicators in clinical medical treatment (Fig. 8C).
Moreover, a genes signature was at start constructed based on the two genes expression: genes score = 0.0405*SPP1 expression + 0.179*PLAU expression, and the signature was calculated by LASSO analysis to be the best genes equation of the two genes which posses highest prognosis prediction value among other equations (Fig. 8D-8G). However, considering the inconvenience that a complicate genes calculation might caused to clinical medical use, we compared the gene signature with an easier signature that: gene score = median SPP1 expression + median PLAU expression, surprisingly, only little difference was found between the two gene signatures indicating the potential utility of median SPP1 + median PLAU which will be of much more convenience in further clinical medical use (Fig. 8H, 8I).
Further, to combine evaluate the survival prediction value of gene indicators and clinical features in OSCC, a nomogram was constructed. And in the nomogram, a point scale was assigned for each of above variables, and the sum of the all the variable points equal the final score of each patient, and the 1, 3 and 5 years of survival could be predicted by drawing a vertical line from the total point axis straight downward to the outcome axis. Since for different patients the points of multiple variables vary, the nomogram could be used to predict the survival risk of different OSCC individuals (Fig. 8A). For example, a OSCC patient who is younger than 65 years old, without extracapsular invasion and no radiation therapy history, meanwhile both of the SPP1 and PLAU gene were high expressed, the total points of the patient would be scored as 58 + 40 + 0 + 0 = 98, then the potential of 1-year survival was less than 20%, 3 years survival rate was much less that 10% and 5 years survival probability < 5%.
Other genetic alterations of SPP1 and PLAU in OSCC
Besides mRNA expression, other types of gene alterations including mutation ratio, amplification, deletion and protein structure variant were also preliminary explores based on cBioPortal database. The results revealed that as for both SPP1 and PLAU genes, multiple types of gene alterations exist in human tumors, a certain percent of gene mutation, deletion and amplification occurs in various tumors. In HNSCC, besides gene amplification, a certain percent of cases were diagnosed with SPP1 or PLAU gene mutations, mostly missense mutations and nonsense mutations including SPP1(E282*) nonsense mutation as well as SPP1(A213V), PLAU(R130Q) and PLAU(S376L) missense mutations (Fig. 9A, 9B).
Association between SPP1 and PLAU expression with different types of immune cells infiltration
Immune cells infiltration has been a characteristic feature of tumor microenvironment, which not only relates with the ability of cancer initiation, progression and metastasis, but also associates with the effect of immune targeting therapy. Given that SPP1 and PLAU were supported by IMMPORT data which has been a worldwide commonly used immune analyzing database to be associated with cancer immune regulation, TIMER and TISIDB database were combine used to evaluate the potential association between SPP1 and PLAU expression with OSCC immune cell infiltration. And the result revealed that mild positive association was found between SPP1 expression and cancer associated fibroblasts as well as macrophages infiltration, meanwhile, slight negative association was discovered between the gene expression and B cells infiltration (Fig. 9C, 9D). As for PLAU gene, moderate association was found between gene expression and myeloid dendritic cells, CD4 + T cells as well as cancer associated fibroblasts infiltration (Fig. 9E, 9F).
What’s worth of emphasizing is that besides the association between genes expression with immune cells infiltration, SPP1 and PLAU expression was also preliminary revealed to be different in multiple OSCC immune subtypes. Based on TISIDB analysis result, SPP1 expresses the lowest in inflammatory subtype and highest in lymphocyte depleted subtype. Meanwhile, the expression of PLAU was the lowest in lymphocyte depleted subtype and highest in IFN-gamma dominant type (data not shown) indicating their potential roles in OSCC immune microenvironment regulation, although the mechanism remains to be deeper researched.
SPP1 and PLAU potentially involved biological processes and signaling pathways
To preliminary explore the potential biological functions of SPP1 and PLAU genes in OSCC and the probable signaling pathways involved, STRING was used to construct the SPP1 and PLAU centering PPI network respectively for revealing the surrounding potentially connecting genes followed by GO and KEGG analyzing the probable signaling pathways these genes enriched in. And GO results showed that the biological processes SPP1 gene participated in were mainly focused on extracellular matrix organization and cell matrix adhesion, and KEGG analysis revealed the signaling pathways SPP1 gene involved were mostly ECM-receptor interaction, PI3K-AKT signaling pathway and human papillomavirus infection related signaling pathways (Fig. 10A, 10B).
Meanwhile, the biological processes that PLAU gene participated in were mainly focused on cell migration and protein metabolic process regulation, and the signaling pathways PLAU gene involved were mostly proteoglycans metabolism, ErbB signaling pathway and human papillomavirus infection related signaling pathways (Fig. 10C, 10D).
Although deeper analysis are needed to validate the association between SPP1 and PLAU with above signaling pathways, current results shall provide promising directions for further exploring the mechanism behind the genes regulation on OSCC development. It’s of encouraging insight to better understand the genetic events during OSCC development and discovering promising disease indicators and probable drug targets.