Pan-cancer survey and evaluation of the oncogenic role of NF-κB1

Although emerging cells or animals based evidence supports an association between nuclear factor kappa-B1 (NF-κB1) cells and cancers, there has no pan-cancer analysis. Therefore, based on TCGA (The Cancer Genome Atlas) and GEO (Gene Expression Omnibus) data sets, we rst studied the potential carcinogenic effect of NF-κB1 in 33 tumors. As we not only found high expression of NF-κB1 in most tumors, but also found that NF-κB1 expression is closely related to the prognosis of tumor patients. Enhanced phosphorylation of S893 was observed in several tumors, such as breast cancer, uterine corpus endometrial carcinoma or lung adenocarcinoma. In thymoma, NF-κB1 expression was relevant to CD8 + T-cell inltration levels, and tumor-associated broblast inltration has also seen in other tumors, such as uterine corpus endometrial carcinoma or glioblastoma multiforme. In addition, the functional mechanism of NF-κB1 also involves the related functions of protein processing and RNA metabolism. In this study, NF-κB1 was pan-cancer study in order to have a systematic and comprehensive understanding of the carcinogenic effect of NF-κB1 in different tumors.


Gene expression analysis
We are on timer2 (tumor immune estimation resource, version 2) website (http://timer.cistrome.org) Gene of_ Enter NF-κB1 in the de module. To observe the expression difference of NF-κB1 in speci c tumors or different tumor subtypes of TCGA project, adjacent normal tissues and tumors. For certain tumors with highly limited or without normal tissues, such as TCGA-DLBC (Lymphoid Neoplasm Diffuse Large B-cell Lymphoma), TCGA-TGCT (Testicular Germ Cell Tumors), etc. We used the GEPIA2 (Gene Expression Pro ling Interactive Analysis, version 2) "Expression Analysis-Box Plots" module web server (http://gepia2.cancer-pku,cn/#analysis) (14). If the P-value cutoff=0.01, log 2 FC (fold change) cutoff=1, "Match TCGA normal and GTEx data", the difference in expression between these tumor tissues and corresponding normal tissues in the GTEx (Genotype-Tissue Expression) database was obtained by box plots. In addition, through the "pathological stage map" module of depia2, we obtained the ddle map of NF-κB1 expression in different pathological stages (stages I, II, III and IV) of all TCGA tumors. The expression data of the log2 [TPM (Transcripts per million) +1] transformation is used for the box or ddle diagram.

Survival prognosis analysis
We used the survival map module of GEPIA2 (16) to obtain the OS (overall survival) and DFS (disease free survival) signi cance map data of NF-κB1 in all TCGA tumors. Cutoff-high (50%) and cutoff-low (50%) values were used as expression thresholds to split the high-expression and low-expression cohorts. The hypothesis test adopts log rank test, and the survival map is also from GEPIA2 survival analysis module.

Genetic alteration analysis
Login cBioPortal website (https://www.cbioportal.org/) (17,18), in the quick select section, select TCGA pan cancer atlas studies, enter NF-κB1, and query the genetic variation characteristics of NF-κB1. In the cancer type summary module, the change frequency, mutation type and can (copy number change) results of all TCGA tumors can be observed. The NF-κB1 mutation site information can be displayed in the protein structure diagram or three-dimensional (3D) structure through the mutation module.

Immune in ltration analysis
We used the immune gene module of TIMER2 web server to explore the relationship between NF-κB1 expression and immune in ltration of all TCGA tumors, and selected immune cells of tumor associated broblasts and CD8 + T cells. The TIMER, CIBERSORT, CIBERSORT-ABS, QUANTISEQ, XCELL, MCPCOUNTER and EPIC apply algorithms to estimate immune penetration. P-value and partial derivative correlation (COR) value were adjusted by purity Spearman rank correlation. The data are visualized as heat maps and scatter diagrams.

NF-κB1-related gene enrichment analysis
We start by searching the STRING website (http://string-db.org/) for the name of a single protein ("NF-κB1") and the name of an organism ("Homo sapiens"). Then, we set the following main parameters: minimum interaction score ["Low con dence (0.150)"], network edge meaning ("evidence"), maximum number of interactive users to display ("not more than 50 interactive users" in the rst shell), and active each other sources ("experiments"). Last, the useable NF-κB1 conjugated proteins were gained.
We used the "similar gene detection" module of GEPIA2 to obtain the rst 100 targeted genes related to NF-κB1 based on all normal tissue and TCGA tumor data sets. We also applied GEPIA2's "correlation analysis" module to NF-κB1 was paired with the selected gene for Pearson correlation analysis. Log2 TPM is used for point graphs. P-values and correlation coe cients (R) are given. In addition, the "gene_corr" module of TIMER2 was used to provide the heat map data of the selected genes, including partial correlation (COR) and purity adjusted p value of Spearman rank correlation test. We used the JVENN interactive Venn diagram viewer (19) for cross analysis to compare the NF-κB1 binding and interacting genes. In addition, we combined the two sets of data for KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis. In short, we uploaded the list of genes to DAVID (a database for annotation, visualization, and integrated discovery), set the selected identi ers ("OFFICIAL_GENE_SYMBOL") and species ("Homo sapiens"), and obtained the data for the functional annotated map. Finally, the enrichment path was visualized through "tidyr" and "ggplot2" R packets. In addition, the R package of "clusterPro ler" was used for GO (Gene Ontology) enrichment analysis. BP (Biological Process), CC (Cellular Component) and MF (Molecular Function) were visualized as Cnetplot.

Gene expression analysis data
In this study, we aimed to investigate the carcinogenic effect of NF-κB1 (NM_001165412 for mRNA, NP_001158884.1 Fig. S1a) in human. As shown in Fig S1b, the structure of NF-κB1 protein is conserved between different species (such as H. sapiens, P. troglodytes, M. mulatta, etc.), and is usually composed of Death_ NF-κB1_p105 (cd08797) domain, RHD-n (c108275) domain, DD (cl14633) and Ank_2 (pfam12796) domain, etc. Phylogenetic tree data (Fig. S2) showed the evolutionary relationships of NF-κB1 proteins among different species.
We rst analyzed the expression pro le of NF-κB1 in nontumor tissues and different cells. As shown in Fig. S3a, combined with HPA (Human protein atlas), GTEx and Fantom5 (Function annotation of the Mammalian genome 5) data set, the expression of NF-κB1 is the highest in lymph nodes, followed by bone marrow, appendix and thymus (Fig. S3a). However, NF-κB1 was expressed in all the tested tissues (all of which were consistent with the normalized expression value >1), showing a low RNA tissue speci city. When NF-κB1 expression was analyzed in different blood cells, low RNA blood cell type speci city also appeared in the HPA/Monaco/Schmiedel dataset (Fig. S3b).
We analyzed the expression status of NF-κB1 in different types of TCGA cancer using the TIMER2 method. As shown in Fig Results from the CPTAC dataset showed that total NF-κB1 protein was highly expressed in primary tissues of breast cancer, UCEC (Uterine Corpus Endometrial Carcinoma), ovarian cancer, LUAD (Lung adenocarcinoma) colon cancer, colon cancer, clear cell RCC, and LUAD (Lung adenocarcinoma) colon cancer (Fig. 1c, P<0.001) compared with normal tissues.
We also used the GEPIA2 "Pathological Stage Plot" module to observe the correlation between NF-κB1 expression and tumor pathological stage, including BRCA (Breast invasive carcinoma), KIRC (Kidney renal clear cell carcinoma) (Fig. 1d, P<0.05).

Survival analysis data
We divided tumor cases into high expression group and low expression group according to the expression level of NF-κB1, and mainly studied the correlation between NF-κB1 expression and prognosis of patients with different tumors using TCGA and GEO data sets. As shown in Fig. 2a, high expression of NF-κB1 in TCGA was associated with poor overall OS (Survival) outcomes for CESC (P=0.02), LGG (P=0.019), LUSC (P=0.033), OV (P=0.026). Data from DFS (disease-free survival) analysis (Fig. 2b) showed that high NF-κB1 expression was not associated with poor prognosis for all types of tumors. In addition, low NF-κB1 gene expression was associated with poor OS prognosis for ACC(P=0.037), KIRC(P=0.00004), READ (P=0.035) (Fig. 2a, P=0.012) and DFS prognosis for KIRC (P=0.0014) (Fig. 2b, P=0.014).
Moreover, using Kaplan Meier mapping tool to analyze survival data, it was found that low expression of NF-κB1 was correlated with breast cancer OS (overall survival) (Fig. S7a, (Tables S1-S5). The above data suggested that the expression of NF-κB1 was different from the prognosis of patients with different tumors.

Genetic alteration analysis data
We observed genetic alterations in NF-κB1 in different tumor samples from the TCGA cohort. As shown in Figure 3a, the "mutant" uterine tumor patients had the highest frequency of NF-κB1 change (>6%). The "ampli ed" type of CNA was the dominant type of Pheochromocytoma and Paraganglioma, with a frequency of change of about 1% (Fig. 3a). Figure 3b further shows the types, loci and number of cases of NF-κB1 gene alteration. We found that missense mutations in NF-κB1 were the main type of genetic alterations. A520T/V alterations in the ANK_2 domain were detected in 1 UCEC cases, 1BRCA cases and 1 COAD cases (Fig. 3b), which can induce frame shift mutations in NF-κB1 gene, translating 520 NF-κB1 proteins from A (Alanine) to T/V (Threonine/valine), and subsequent NF-κB1 protein truncation. We can observe the A520T/V site in the 3D structure of NF-κB1 protein (Fig. 3c). In addition, we also explored the potential association between c gene alterations and clinical survival outcomes in patients with different types of cancer. The data in Fig. 3d shows that the prognosis of CUEC cases with NF-κB1 alteration is better in terms of disease-speci c survival rate (P = 0.0811) and disease-free survival rate (P =0.096) than that of cases without NF-κB1 alteration. Additionally, we analyzed the association between NF-κB1 expression and TMB (tumor mutation burden)/MSI (microsatellite instability) in all TCGA tumor. As shown in Fig. S6, we observed that NF-κB1 expression of LUAD, BLCA, LIHC, BRCA, THCA and UVM was negatively correlated with TMB, while UCEC, COAD, STAD and LGG were positively correlated. The expression of NF-κB1 was also negatively correlated with PAD, BRCA, SKCM, HNSC, and DLBC (P<0.05), and positively correlated with COAD, KIRC, and LAML (P<0.05) (Fig. S7). This result deserves further investigation.

DNA methylation analysis data
In the TCGA project, we used the MEXPRESS method to study the potential relationship between NF-κB1 DNA methylation and different tumor pathogenesis. For TGCT cases, we observed a signi cant negative correlation between DNA methylation and gene expression in NF-κB1 non promoter region, such as cg06501333 (P<0.001, R=0.527), as shown in Figure S8.

Protein phosphorylation analysis data
We also compared the phosphorylation levels of NF-κB1 in normal tissues and primary tumor tissues. CPTAC data sets were used to analyze ve types of tumors (breast cancer, ovarian cancer, LUAD, UCEC and clear cell RCC,). Fig. 4a summarizes the NF-κB1 phosphorylation sites and their signi cant differences. Compared with normal tissues, S893 within the NF-κB1 DEATH domain showed higher phosphorylation levels in all primary tumor tissues ( Fig. 4a-g, all P <0.05), the next is the increase of phosphorylation level of S892 locus in the breast cancer death area. (Fig. 4b, P=0.00000049), clear cell RCC (Fig. 4c, P=0.1), LUAD (Fig. 4d, P=0.033). We also analyzed NF-κB1 phosphorylation identi ed by CPTAC using the PhosphoNET database, and found that the NF-κB1 phosphorylation of S893 in cell cycle was supported by a published article (21).

Immune in ltration analysis data
As an important part of tumor microenvironment, tumor in ltrating immune cells are closely related to tumor occurrence, progression or metastasis (22,23). Tumor associated degmacyte in tumor microenvironment have been reported to be involved in adjusting the function of various tumor soaking immunocyte (24,25). Here, we used TIMER, CIBERSORT, CIBERSORT-ABS, QUANTISEQ, XCELL, MCPCOUNTER and EPIC algorithms to investigate the potential relationship between different levels of immune cell in ltration and NF-κB1 gene expression in different types of TCGA tumors. Through a series of analyses, we nd that the immune in ltration of CD8 + T cells is negatively correlated with the expression of NF-κB1 in THYM (Thymoma) (Fig.S9a-b). In addition, we observed that the expression of NF-κB1 was positively correlated with the invasion value of cancer-associated broblasts in TCGA tumors of LGG, LIHC, LUSC, RAAD and TGCT (Fig. 5). The above tumor scatter plot data obtained by an algorithm are shown in Fig. 5 and Fig. S9. For instance, because of the MCPCOUNTER algorithm, expression levels NF-κB1 in TGCT is positively correlated with the in ltration level of cancer-associated broblasts (Fig. 5, Rho= -0.621, P=5.12E-17)

Enrichment analysis of NF-κB1-related partners
In order to better understand the molecular mechanism of NF-κB1 gene in tumorigenesis. Through a series of pathway enrichment analysis, we tried to screen the target proteins bound to NF-κB1 and the related genes expressed by NF-κB1. We used the string tool to obtain a total of 50 NF-κB1 binding protein was sustained by experimental evidence. The interaction network between these proteins is shown in Figure 6a. Using GEPIA2 tool in combination with all tumor expression data of TCGA, we obtained the rst 100 genes expressed by NF-κB1. In Fig. 6b, the expression level of NF-κB1 was positively correlated with UBE2D3 (R=0.63), IRF2 (R=0.63), ELF1 (R=0.58), ERAP1 (R=0.56), SMNDC1 (R=0.56) genes (all P<0.001). The corresponding heat map data also demonstrated that NF-κB1 was positively correlated with the above 5 genes in most explicit cancer types (Fig. 6c). By cross analysis, the two groups had 6 members, namely UBE2D3, SP1, STAT3, TNFAIP3, CNOT6L, RIPK1 (Fig. 6d).
We combined these two data sets for KEGG and GO enrichment analysis. The KEGG data in Figure 6e suggest that "endoplasmic reticulum protein processing" and "metabolic pathway" may be participated in NF-κB1 effect on tumor etiopathogenesis.

Discussion
NF-κB1 has been reported in almost all animal cell types and is an important regulator of autoimmune diseases, cell proliferation, apoptosis and stress response (26). In this study, our "HomoloGene" and phylogenetic tree analysis data also showed that NF-κB1 protein structure was conserved among different species, indicating that there may be a similar mechanism for the normal physiological function of NF-κB1. Newly published publications report a functional association between NF-κB1 and clinical diseases, especially tumors (11)(12)(13)(27)(28)(29). It is not clear whether NF-κB1 plays a role in the pathogenesis of different tumors through some common molecular mechanisms. Through the search of network literature, we were unable to retrieve any publications that performed pan-cancer analysis of NF-κB1 from the perspective of the whole tumor. Therefore, based on TCGA, CPTAC and GEO database data, we combined the molecular characteristics of gene expression, gene alteration, DNA methylation or protein phosphorylation to comprehensively detect NF-κB1 gene in 33 different tumors.
NF-κB1 is highly expressed in most tumors. However, NF-κB1 gene survival analysis data have yielded different conclusions for different tumors. For liver cancer, we performed a set of survival analyses using the Kaplan Meier mapping method (30), including liver cancer cases in the GSE20017/GSE9843 cohort. The high expression of NF-κB1 is related to the clinical prognosis of overall survival, progression free survival, relapse free survival and Disease speci c survival. However, the role of NF-κB1 expression on clinical prognosis of hepatocellular carcinoma needs more clinical big data to con rm.
With regard to lung cancer, we analyzed the data sets of TCGA-LUSC (n=483) and TCGA-LUAD (n=483) projects and found that NF-κB1 over-expression was associated with poor overall survival and prognosis in lung squamous cell carcinoma (P=0.033), but not in lung adenocarcinoma (Fig. 2a). However, after analyzing 1926 lung cancer cases of CAARRAY, GSE14814, GSE19188, and GSE29013, NF-κB1 overexpression was associated with overall survival, rst progression, and post-progression survival prognosis, especially in lung adenocarcinoma cases (Table S2). Consequently, a larger sample capacity is demanded to con rm the role of NF-κB1 in the subsist and prognosis of patients with different types of carcinoma of the lungs.
In ovarian cancer, based on GEO data (GSE14764, GSE15622, GSE18520, GSE19829, etc.), we have observed that low NF-κB1 expression was associated with adverse clinical outcomes in progression-free survival of ovarian cancer, especially in "stage3/4", "grade 3", "TP53 mutation Mutated" and "Debulk" (Table S3). Similarly, it has been reported that NF-κB1 can advance the invasion and remove of cervical cancer cells (31). Our survival analysis based on TCGA shows that the low expression of NF-κB1 is associated with adverse clinical prognosis of cervical cancer. Since the TCGA-TCGT cohort only contains tumor data, we used normal testicular tissues from GTEx data set as controls and found that the level of NF-κB1 in TGCT tissues was higher than that in normal tissues. However, NF-κB1 overexpression seems to be associated with good clinical outcomes in patients with TGCT.
Some studies have reported the role of high expression of NF-κB1 in regulating the occurrence of breast cancer (11,12,32,33). Surprisingly, survival data of Kaplan-Meier plotters based on Affymetrix HGU133A and HGU133+2 microarrays (34). we discover that the low expression of NF-κB1 was interrelated with poor overall survival, distance metastasis free survival and recurrence free survival of mammary gland cancer patients.
In this study, we present for the rst time evidence of a potential association between NF-κB1 expression and TMB or MSI in all TCGA tumors. In addition, we integrated information on NF-κB1 combine ingredients and NF-κB1 expression related genes in all tumors. Then a series of enrichment analyses were carried out to determine the potential effects of "metabolic pathway", "endoplasmic reticulum protein processing" and RNA metabolism in tumor etiology or pathogenesis. Through a variety of immune deconvolution methods, we observed that there was a statistically negative correlation between the level of CD8 + T cell in ltration and the expression of NF-κB1 in thymic tumors. Our results suggest for the rst time that the expression of NF-κB1 is relevant to the level of tumor in ltrating broblasts in some tumors. In order to explain the difference of NF-κB1 overexpression in the above-mentioned tumors, in contrast, this high expression status is related to the favorable prognosis of the patients. First of all, it is noteworthy that the NF-κB1 high expression group and NF-κB1 low expression group, including TGCT and UCEC, did not exceed 100 cases. A larger sample may be needed to testify the above-mentioned conclusion. Second, further molecular experimental evidence is needed to judge whether the high expression of NF-κB1 plays an important role in the occurrence of these tumors, or whether it is only the result of tumor drug resistance in normal tissues.
In TGCT patients of TCGA, we observed a potential correlation between the decrease of DNA methylation status and the high expression of NF-κB1 in the non-promoter region. Different methylation sites also have NF-κB1 methylation differences in TGCT tissue matching normal tissue. More evidence is needed for the potential role of DNA methylation of NF-κB1 in the pathogenesis of TGCT.
We rst studied the molecular mechanism of NF-κB1 protein in breast cancer, lung adenocarcinoma, endometrial carcinoma, ovarian cancer and clear cell renal cell carcinoma from the perspective of total protein and phosphoprotein by CPTAC data set. The results of this study showed that compared with normal controls, the expression level of total NF-κB1 protein was higher in primary tumors, and the phosphorylation level of S893 in the DEATH domain was higher. However, we still cannot rule out that the high-level phosphorylation of S893 NF-κB1 is a by-product indicated by tumor cells and has no functional meaning in tumor cells. The role of NF-κB1 phosphorylation at S893 site and its related cell cycle regulation in tumorigenesis needs to be further studied.
In conclusion, our rst pan cancer analysis of NF-κB1 showed that NF-κB1 expression was signi cantly correlated with clinical prognosis, DNA methylation, protein phosphorylation, immune cell in ltration, tumor mutation burden or microsatellite instability in a variety of tumors. This helps to understand the role of NF-κB1 in tumorigenesis from the perspective of clinical tumor samples.

Declarations Data Availability
The datasets generated/analyzed during the current study are available.  Mutation characteristics of NF-κB1 in different tumors of TCGA. We used the cBioPortal tool to analyze the mutation characteristics of NF-κB1 in TCGA tumors. Displays the frequency of mutation type (a) and mutation site (b). 3D structure of NF-κB1 (c). We also analyzed the potential association between mutat status and overall, disease-speci c, disease-free and progression-free survival of UCEC (d) using the cBioPortal tool.  Correlation analysis of NF-κB1 expression and immune in ltration in cancer-associated broblasts. The potential correlation between the expression level of NF-κB1 gene in TCGA and the in ltration level of cancer-associated broblasts in all types of cancer was investigated using different algorithms.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. supplementarydocument.docx tablessupplementarydocument.docx S1.tif S20.85.tif