Potential Role of The Upregulation of The Zinc Finger Protein of The Cerebellum 2 In Nasopharyngeal Carcinoma

Background: Nasopharyngeal carcinoma (NPC) is a common type of cancer mainly caused by EB virus infection in southern China and Southeast Asia. Zinc nger protein of the cerebellum 2 has been reported to be dysregulated in numerous cancers. However, its role in NPC has not yet been completely investigated. Methods: ZIC2 mRNA expression and clinical role of NPC was explored by Gene Expression Omnibus (GEO) databases and tissue microarrays. Immunohistochemical (IHC) staining was used to assessed the protein expression level of ZIC2 in NPC. Moreover, functional enrichment analyses were performed by the overlapping genes of differentially expressed genes in NPC and ZIC2 related co-expressed genes. Cistrome Data Browser (Cistrome DB) and scatter diagrams displayed the possibility of ATAD2 as a target gene of ZIC2. Last, characteristic phenotype was identied by weighted gene co-expression network analysis (WGCNA). Results: The mRNA expression levels of ZIC2 were signicantly increased in NPC within genechips and tissue microarrays (SMD = 2.14; 95% CI = 1.85–2.44). Increased ZIC2 correlated with poor prognosis in NPC patients. The IHC results also showed that ZIC2 protein expression in NPC tissues was remarkably higher than in control tissues. Enrichment analysis showed that genes positively related to ZIC2 are mainly enriched in cell division and in the cell cycle. WGCNA analysis discovered that NPC dataset GSE12452 may be positive correlated with N stage and nagetive correlated with grade. Conclusion: ZIC2 upregulation may promote NPC tumorigenesis through cancer cell proliferation and cell cycle regulation. ATAD2 may produce a marked effect as the downstream target of ZIC2.

discovered the overexpression of ZIC2 in NPC through bioinformatics and immunohistochemistry, respectively. Yu et al. (20) clari ed that miRNA-129-5p suppresses NPC by targeting ZIC2, and Lv et al. (21) demonstrated that tumor-suppressive miRNA-873 acted as an upstream factor of the downregulation of ZIC2 in NPC. Moreover, Shen et al. (22) explored the regulative effect of HOXA10 on ZIC2 expression in NPC. Although these studies were conducted to investigate the role of ZIC2 in NPC, the mechanisms of such role are not yet precise. In this study, we used bioinformatics analysis to identify new target markers of ZIC2. We also included more samples for immunohistochemistry to enhance our understanding of NPC carcinogenesis and progression.

Materials And Methods
High-throughput data mining and analysis We obtained microarray datasets, which could be used to compare the ZIC2 expression pro le data between normal nasopharynx tissues and NPC tissues, from the GEO datasets ArrayExpress, Oncomine, and SRA. We selected the included microarray datasets according to the following three criteria: (1) All the specimens came from humans; (2) Each gene chip contained the transcription pro le of the messenger RNA (mRNA) expression; and (3) Both the NPC and the non-cancerous nasopharynx tissues were provided.
We performed the background correction, normalization, and log2 transformation of the preceding data for processing data with the "limma" package in the R 4.0.3 software environment in order to eliminate the in uence of the sequencing depth and the gene length on the expression level. Moreover, we used the NPCrelated microarrays for differentially expressed genes (DEGs) analysis through nonspeci c ltration via the empirical Bayesian method, in order to facilitate the borrowing of information (|log2 Fold Change| > 1 and adj. P < 0.05). From the upregulated or downregulated DEGs that we collected, we chose those that appeared in two or more independent datasets as the DEGs in this study. Similarly, we identi ed genes with a ZIC2 |correlation coe cient|>0.3, a P-value < 0.05, and a repeat number larger than or equal to 2 as signi cant coexpressed genes (CEGs). We obtained the overlapping genes of the DLX2 positively correlated CEGs and upregulated DEGs (positive gene set), of the DLX2 negatively related CEGs, and of the downregulated DEGs (negative gene set) for our follow-up research.

In-house immunohistochemical evaluation
The tissue microarrays (TMAs) that contained 154 NPC and 49 chronic nasopharyngeal mucositis tissue specimens (NPC131, NPC241, and NPC482) were provided by Pantomics, Inc. (Richmond, CA). At the same time, the NPC tissues and mucosal in ammation tissues that we derived from 12 patients had been diagnosed with nasopharyngeal carcinoma, and we collected 14 cases of chronic nasopharyngitis from the First A liated Hospital of Guangxi Medical University. We assayed the protein level of ZIC2 in the tissues via immunohistochemistry (IHC). First, we prepared the para n-embedded tissue specimens into 4µm-thick tissue sections, and then we dewaxed and hydrated them. We boiled these slides for 3 minutes in a pressure cooker to recover the antigen. Then we incubated the samples in a diluted rabbit polyclonal ZIC2 antibody (Biorbyt) at room temperature for 1 hour, and the secondary antibody at 37℃ for 1 hour. We performed the rest of the measure operation steps according to the instructions. To ensure the accuracy of the experiment, two pathologists separately determined the nal results of the IHC staining. We evaluated the nal regional differences in the staining through the immunoreactivity scores (IRSs), which ranged from 0 to 12 were calculated by multiplying two parameters: the staining intensity and the proportion of the stained tumor cells.
We evaluated the staining intensity as follows: 0 for no staining, 1 for weak staining, 2 for moderate staining, and 3 for strong staining. We de ned the frequency of the stained cells as follows: 0 for < 10% positive cells; 1 for 10-25% positive cells; 2 for 26-50% positive cells; 3 for 51-75% positive cells; and 4 for more than 75% positive cells. After calculating the IRS value, we assessed the relationship between the ZIC2 protein expression and the clinicopathological features of NPC.
Clinical signi cance of the statistical analysis of ZIC2 in NPC After comparing the expression of ZIC2 in a single NPC database, we used the same chip platform to enlarge the sample size for comprehensive analyze of the ZIC2 expression. We evaluated the ordinary ZIC2 expression level and the clinical value of gender and age using the standard mean difference (SMD) strategy with STATA 15.1 software. We used the chi-square and I 2 tests to detect the heterogeneity, and while I 2 > 50% (P < 0.05), a random-effects model would be adopted. Otherwise, a xed-effect model may be suitable. We assessed the publication bias and the combined quality using Begg's funnel plots. Subsequently, we calculated continuous variables of the ZIC2 expression value in each dataset to true-positive, false-positive, false-negative, and truenegative counts, and discriminated the cut-off values. A summary of the receiver operating characteristic (sROC) curves is presented to thoroughly evaluate the general discriminatory power of ZIC2 between the NPC patients and the control group sample. In addition, we computed the sensitivity and speci city to determine the sensitivity of the positive diagnostic samples and to make speci c judgments of the negative samples.

Prediction of the ZIC2 target genes
The upregulated CEGs may increase the expression level with the raised expression of ZIC2 upwards. Therefore, the correlation between ZIC2 and ATAD2 is presented with a correlation scatter diagram generated using GraphPad Prism software 8.0.To determine the target gene of ZIC2, we intersected the score that was greater than or equal to 1 in the Cistrome Data Browser (Cistrome DB; http://cistrome.org/db/#/, a database for analyzing human and mouse genome data using ChIP-seq data and chromatin accessibility data) and the upregulated CEGs that appeared in at least ve databases. We used the Integrative Genomics Viewer (IGV) software 2.9.4 to visualize the peak calling results.

Functional enrichment analysis
We analyzed the conjunction gene sets of the differential expression-and ZIC2-related genes via the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. First, we completed the GO and KEGG pathway analysis using R packages ("clusterPro ler," "enrichplot," and "ggplot2"). GO analysis has three main parts: analysis of the biological process (BP), analysis of the cellular component (CC), and analysis of the molecular function (MF). Likewise, we submitted the above two gene sets to the Search Tool for the Retrieval of Interacting Genes (STRING) (http://string-db.org/) database for protein-protein interaction (PPI) network analysis. STRING calculated the tab-separated value between the proteins and referred it to the Cytoscape 3.7.2 software plug-in cytoHubba of the top 10 hub genes. We also performed Gene Set Enrichment Analysis (GSEA) of gene expression pro le data to group and classify the known genes according to multiple functional gene sets in order to understand their expression status in speci c functional gene sets. We put the gene expression pro le data in the GSE12452 chipset into the GSEA 4.1.0 software for analysis. We used the Gene matrices c5.go.v7. 4

WGCNA
Weighted correlation network analysis (WGCNA) can identify highly collaborative gene sets and candidate biomarker genes or therapeutic targets according to the interlinkages of the gene sets to explore the association between the gene network and the phenotypes concerned. We rst selected the GSE12452 expression pro le data as analysis objects, and then recognized the outlier sample, using the screening condition height = 90. We used the dynamic cut tree algorithm to cluster the topological gene matrix and identify the network module. After removing this sample, based on the soft threshold power β, we constructed the adjacency matrix to build gene co-expression networks. Next, we transformed the topological overlap matrix (TOM), which compared the weighted correlations between the nodes, using the preceding adjacency matrix to represent the hierarchical clustering results that contained at least 30 genes (minModuleSize = 30).
After performing the dynamic cut tree method, we merged these similar modules (abline = 0.25) and nally turned them into gene network modules. We used the module eigengenes (MEs), which is the rst representative component of the module, to calculate the correlation with the clinical features in each module.
We implemented all these using the WGCNA package of the R 4.0.3 software.

ZIC2 mRNA and protein expression levels in NPC
We included eight gene chip datasets (GSE12452, GSE34573, GSE40290, GSE53819, GSE61218, GSE64634, GSE68799, and GSE126683) in this study that comprised 57 control samples and 157 NPC samples. The TMAs contained 63 chronic nasopharyngeal mucositis tissues and 166 NPC tissues. Violin plots differentiated the NPC tissues from the non-NPC tissues based on their expression values and ROC curves, as shown in Figs.
1 and 2, respectively. To verify the results of the ZIC2 expression level from public databases and microarrays, we evaluated the expression of the ZIC2 protein in a wide variety of NPC tissues and non-NPC tissues to con rm their expression in situ via IHC (Fig. 3). Both of them showed that the expression level of ZIC2 was enhanced in NPC. The basic information for each dataset are shown in Table I. In addition, we screened the DEGs and CEGs of each dataset. Altogether, there were 1,047 upregulated and 1,280 downregulated DEGs of NPC that had no fewer than two gene chip datasets. We also identi ed 4,321 CEGs that were positively related to DLX2 and 2,906 CEGs that were negatively related to DLX2 in no less than two datasets. The intersection of the DEGs and CEGs separately produced 808 positive gene sets and 859 negative gene sets.  (Table ). Moreover, our meta-analysis showed that the ZIC2 expression was upregulated in the NPC tissues, unlike in the normal nasopharynx tissues [SMD = 2.14, 95% con dence interval (CI): 1.85-2.44], and there was pronounced heterogeneity (I 2 = 87.9%, P < 0.0001), so we used the random-effect model (Fig. 4A).
However, when there was publication bias, the funnel plot was asymmetric, and the distribution was skewed.
Begg's funnel plot analysis gave an indication of no signi cant publication bias, with the P-value = 0.244 (Fig.   4B). Furthermore, sensitivity analysis clearly showed the degree of detection bias (Fig. 4C). Multiple diverse trials with the same index were represented by an ROC curve based on the weight of their odds ratio (OR) in the meta-analysis, called "sROC," the AUC value of which was 0.93 (95% CI: 0.91-0.95) (Fig. 5A). In the Fagan plot, the pretest probability, posttest probability positive, and post-test probability negative were 20%, 51%, and 1%, respectively (Fig. 5B). The corresponding sensitivity and speci city were 0.97 (95% CI: 0.91-0.99) and 0.77 (95% CI: 0.63-0.86), respectively (Fig. 5C), which suggest that ZIC2 demonstrated a signi cant discriminatory capacity for NPC screening. The target gene of the transcription factor ZIC2 As a transcription factor, ZIC2 binding to the ABCC4 promoter regulates prostate cancer proliferation (23). This suggests that ZIC2 may also be involved in transcriptional regulation as a transcription factor in NPC. Overall, there were six overlapping genes: RACGAP1, PCNA, ATAD2, RBBP8, HSPE1, and UHRF1. Next, after the presence of DNA peaks in the ZIC2 transcription regulatory regions was re-analyzed, we chose the ATPase family AAA domain containing protein 2 (ATAD2) as the putative target gene (Fig. 6). Then we transferred their correlation in each gene set into a scatter plot.

Enrichment analysis and PPI network construction
We used GO and KEGG to analyze the overall function of ZIC2 in NPC. Regarding the positive gene set related to ZIC2, our GO analysis revealed the importance of the nuclear division, chromosomal region, and ATPase activity in terms of the BP, CC, and MF, respectively, and our KEGG pathway enrichment analysis mainly investigated the human papillomavirus infection (Fig. 7A-B). For the negative gene set, microtubule-based movement, motile cilium, and peptidase regulator activity were the most clustered in the BP, CC, and MF terms, respectively. Our KEGG pathway enrichment analysis showed that the pathways were mainly enriched in the chemokine signaling pathway (Fig. 8A-B). To further explore the potential mechanism of ZIC2, we used the genes in the rst three KEGG signaling pathways for the PPI analysis. The analysis identi ed the top 10 positively related core genes as CDK1, CCNB1, CCNA2, CCNE2, CDC6, MCM2, CDC45, CCNB2, MCM4, and BUB1B (Fig. 9A), and the top 10 negatively related core genes as CD79A, CD22, CD5, CD19, BTK, CD79B, CR2, MS4A1, CD37, and CD72 (Fig. 9B). The results of the GSEA enrichment analysis were similar to those of GO and KEGG. The most signi cant pathways in the BP, CC, and MF were the ribonucleoprotein complex biogenesis, chromosomal region, and single-stranded DNA binding, respectively. The GSEA analysis revealed that highly expressed genes are mainly enriched in the ribonucleoprotein complex biogenesis, chromosomal region, and single-stranded DNA binding in the GO gene feature set database and in the cell cycle in the KEGG gene feature set database (Table III).

WGCNA analysis
First, we constructed a sample clustering tree and ruled out a sample of GSM312909 because it was above the threshold, as shown in Fig. 11A. In our results, we set the soft threshold at 10 to be able to construct the scale-free network (R 2 = 0.94) (Fig. 11B). Finally, a total of 17 modules were generated based on the 0.25 cutoff value of dissimilarity of the modules and on average hierarchical clustering (Fig. 11C). The relationship between the modules and the clinical traits is shown in Fig. 11D. The red module denotes the positive correlation with the corresponding clinical trait, and the blue module denotes the opposite. The stronger the relevance is, the darker the color is. The illustration shows that the tan module that was positively associated with the N stage was the deepest (cor = 0.38, P = 0.04) (the grey module represents the genes that were not divided into any module), and the green module that was negatively associated with the grade was the deepest (cor = -0.36, P = 0.05).

Discussion
The occurrence and development of malignant tumors involve multiple signaling molecular pathways. In recent years, the genomic changes that are key to the progression of NPC have been revealed as the multiple functional loss mutations in the negative regulator of NF-κB; recurring gene damage, including deletion of the CDKN2A/CDKN2B locus, ampli cation of CCND1, TP53 mutation, and changes in the PI3K/MAPK signaling pathway; chromatin modi cation; and DNA repair (24)(25)(26)(27). In addition, some studies have explored molecular markers related to nasopharyngeal carcinoma. For example (28-30), CKS1, P27, and CENP-F have been found to be involved in the regulation of an abnormal cell cycle. Some oncoproteins and tumor suppressor genes, such as RAP2A, PTP4A2, and ECRG4 (31)(32)(33), are closely related to cell proliferation. Although many oncogenes and carcinogenic factors have been found to affect the occurrence and development of NPC, the exact mechanism is still unknown.
Many studies recently reported that ZIC2 might regulate the development of various tumors, including prostate cancer (23,(34)(35)(36), lung adenocarcinoma (17), breast cancer, clear cell renal cell carcinoma (15,37,38), colorectal cancer (14,(39)(40)(41), hepatocellular carcinoma (13,(42)(43)(44), and cervical cancer (10,45). All these tumors, except breast cancer, were upregulated. In this study, a total of 7 microarrays, 3 TMEs, and in-house tissues were incorporated, including 120 control samples and 323 NPC samples. ZIC2 was excessively expressed in NPC than in the normal tissues according to the gene chip, tissue chip data, and IHC results of the tissue samples. These results are consistent with those of related studies, and certainly add to our understanding of the expression of ZIC2 in NPC at the human tissue level, which no relevant study had previously investigated. Moreover, by mining data from the microarrays data as mentioned above, we observed that ATAD2 expression is positively correlated to ZIC2 expression. Interestingly, when we searched for the binding position of the ChIP-seq data in the Cistrome DB, we found that ZIC2 and ATAD2 could combine and showed peaks. ATAD2-or ANCCA, CT137, or PRO2000, as it is also called-is a member of the ATPase family, which is associated with various cellular activities by regulating protein complexes and is responsible for ATP binding and hydrolysis (46). Members of the ATPase family of proteins participate in diverse cellular processes that include cell cycle regulation, protein proteolysis and decomposition, organelle biogenesis, and intracellular tra cking (47). Being de cient in normal regulation, either due to the ATAD2 locus ampli cation or to speci c changes in the core members of the transcription mechanism, may cause ATAD2 abnormal activation, which will eventually lead to oncogenesis (48, 49). Moreover, ATAD2 showed expression disturbance in hepatocellular carcinoma, ovarian cancer, stomach cancer, and other cancers (50)(51)(52). The discovery of Liu et al., through bioinformatics analysis, that ATAD2 may take part in the tumorigenesis of NPC is noteworthy (53). This also agrees with our ndings. Besides, our GO and KEGG enrichment analysis showed that ZIC2 positive relation genes are mainly enriched in cell division and in the cell cycle. Likewise, our GSEA analysis revealed that the upregulation of ZIC2 is associated with the proliferation and division of cells and DNA synthesis, which is consistent with the results of our GO and KEGG analysis. Another crucial nding is that highly expressed genes are also enriched in DNA-dependent ATPase  The expression data of ZIC2 and corresponding ROC curves in NPC tissues and normal tissues from GEO datasets and in-house tissues and TME.