Identification of the innate normal tissue specific genes and acquired tumor specific genes in determining the tumor transcriptional profiles

Background: For a specific cancer type, the transcriptional profile is determined by the combination of the innate transcriptional features of the original normal tissue and the acquired transcriptional characteristics mediated by genomic and epigenetic aberrations during tumor development. However, the classification of innate normal tissue specific genes and acquired tumor specific genes has not been studied in a pan-cancer manner. Methods: The innate and acquired gene expression profiles in each tumor type were studied using The Cancer Genome Atlas (TCGA) RNA-seq dataset. The prognostic effects of the tumor acquired genes were determined with the "survival" package in R. The methylation of the tumor acquired genes was delineated using TCGA HumanMethylation450 microarray data. Results: 90% of liver hepatocellular carcinoma (LIHC) specific genes are derived from innate normal liver specific genes. In contrast, 90.3% of kidney clear cell carcinoma (KIRC) specific genes and 90.9% of lung squamous cell carcinoma (LUSC) specific genes are acquired during tumor development. The innate normal tissue specific genes are downregulated in tumor tissues, whereas the tumor acquired specific genes are upregulated. Both the innate normal tissue specific genes and the tumor acquired specific genes are associated with overall survival in some tumor types. DNA hyper-methylation of normal tissue specific genes contributes to the inhibition of their expression in cancer cells, and the tumor acquired specific genes are activated by DNA hypo-methylation and genomic aberrations. Conclusions: Our results describe the specific transcriptional features across cancer types and suggest that the tumor acquired specific genes are potential targets for anti-cancer therapy.


Background
Cancer is usually classified by the normal tissue from which the tumor cells are derived. Because of their different cells of origin, each tumor type has a distinctive transcriptional profile [1][2][3][4][5]. However, the expression characteristics of the tissue of origin only influence, but do not fully determine, tumor classification [6]. These observations highlight the contributions of genetic and epigenetic changes in determining the distinctive transcriptional profiles of tumor cells.
Genetic changes include genomic rearrangements, gene amplifications or deletions, and specific gene mutations. These genomic aberrations are highly important for tumor therapy response and overall survival [7]. Epigenetic changes such as DNA methylation and chromatin modification are also critical to cancer development and progression [8][9][10]. These genetic and epigenetic changes are ultimately reflected in the transcriptome, as deregulation of mRNAs, microRNAs and lncRNAs, and in the proteome, as alterations in protein expression and modification in cancer cells.
So, for a specific cancer type, the ultimate transcriptional profile is determined by the combination of the innate transcriptional features of the original normal tissue and the transcriptional characteristics acquired through genomic and epigenetic aberrations.
However, which factor is more important in determining the different transcriptional features across tumor types is not clear. Moreover, because cancer driver alterations vary among tumor types, the acquired tumor transcriptional characteristics may differ dramatically.
With the advances of the TCGA project, the genetic and epigenetic changes as well as the transcriptional alterations within each tumor type and across tumor types are well illustrated [11,12]. Moreover, some normal tissue expression and DNA methylation data are deposited in the TCGA project [13,14]. All of these data are openly accessible, which allows us to address how genetic and epigenetic changes and the innate transcriptional differences of normal tissues influence the tumor transcriptional features across cancer types.
Here, the malignant and normal tissue specific genes are identified across 14 tumor types. The normal tissue specific genes are downregulated in tumor tissues. After

Data collection
Gene expression profiles across cancer types were analyzed using RNA-seq data (TCGA HiSeqV2 data). The DNA methylation profiles were analyzed through HumanMethylation450 microarray data. All the datasets were downloaded from the TCGA hub (https://tcga.xenahubs.net).

The average log2 count of each gene across the samples of each malignant tissue type was calculated. A gene with an average count > 512 (log2 count > 9) and expression 1.5-fold higher than in any other tissue was classified as a malignant tissue specific gene. The same selection thresholds were used to identify the normal tissue specific genes.
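
The selection rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name, the hypothetical gene names and toy values, and the reading of "1.5-fold higher" as a ratio on the linear count scale (a log2 difference greater than log2(1.5)) are all assumptions.

```python
import math

def tissue_specific_genes(mean_log2, min_log2=9.0, fold=1.5):
    """Return {tissue: sorted list of tissue specific genes}.

    mean_log2 maps gene -> {tissue: mean log2 count}. A gene is assigned to
    the tissue with the highest mean if that mean exceeds min_log2 and is
    more than `fold` times every other tissue on the linear scale
    (assumed interpretation of the threshold).
    """
    margin = math.log2(fold)
    result = {}
    for gene, by_tissue in mean_log2.items():
        ranked = sorted(by_tissue.items(), key=lambda kv: kv[1], reverse=True)
        (top_tissue, top_value), (_, runner_up) = ranked[0], ranked[1]
        # Beating the runner-up by the margin means beating all other tissues.
        if top_value > min_log2 and top_value - runner_up > margin:
            result.setdefault(top_tissue, []).append(gene)
    return {tissue: sorted(genes) for tissue, genes in result.items()}

# Toy values (hypothetical, for illustration only).
toy = {
    "GENE_A": {"LIHC": 12.0, "KIRC": 8.0, "LUSC": 7.5},  # passes both rules
    "GENE_B": {"LIHC": 10.0, "KIRC": 9.8, "LUSC": 6.0},  # fails the fold rule
    "GENE_C": {"LIHC": 5.0, "KIRC": 8.5, "LUSC": 4.0},   # fails the count rule
}
specific = tissue_specific_genes(toy)
print(specific)
```

Comparing the top tissue only against the runner-up is sufficient, because any gene that clears the second-highest tissue by the margin clears all the others as well.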

Heatmap
Heatmaps were created with the R package "pheatmap", which is available from CRAN. Hierarchical clustering used the "average" linkage method.
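
To make the "average" linkage choice concrete, the following Python sketch implements average-linkage agglomerative clustering on a few toy points. It is an illustration only and does not reproduce the "pheatmap" internals; the Euclidean distance, the function name and the toy coordinates are assumptions.

```python
import math

def average_linkage(points):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are tuples of indices into `points`.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def dist(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Pick the closest pair of clusters (first pair wins on ties).
        a, b = min(
            ((x, y) for k, x in enumerate(clusters) for y in clusters[k + 1:]),
            key=lambda pair: dist(*pair),
        )
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Toy expression vectors (hypothetical): two tight pairs far apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
history = average_linkage(toy_points)
for left, right, d in history:
    print(left, right, round(d, 3))
```

Average linkage scores two clusters by the mean of all pairwise distances between their members, which makes it less sensitive to single outlying samples than complete or single linkage.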

Survival analysis
The Kaplan-Meier estimator from the "survival" package in R was applied to test the association between the tumor acquired specific genes and overall survival. The "survival" package and its documentation are available from CRAN. Log-rank P values were determined.
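
For readers unfamiliar with these two statistics, the sketch below computes a Kaplan-Meier curve and a two-group log-rank P value in plain Python. It is not the R "survival" implementation; the function names and the toy follow-up times are hypothetical, and the P value assumes the usual chi-square approximation with one degree of freedom.

```python
import math

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)        # at risk, both groups
        n_a = sum(1 for x in times_a if x >= t)    # at risk, group A
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Toy cohorts (hypothetical): 1 = death observed, 0 = censored.
high_t, high_e = [5, 8, 12, 15, 20], [1, 0, 1, 0, 0]
low_t, low_e = [2, 3, 4, 6, 7], [1, 1, 1, 1, 1]
curve = kaplan_meier([1, 2, 3], [1, 0, 1])
chi2, p_value = logrank(high_t, high_e, low_t, low_e)
print(curve, chi2, p_value)
```

In practice patients would first be split into two groups by the expression of the gene set of interest; how that split is made is not specified in this section, so it is left out of the sketch.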

Analysis of genomic alterations
Amplification data for the LUSC and LIHC acquired specific genes were downloaded from cBioPortal (http://www.cbioportal.org/index.do). The genomic locations of the genes were annotated according to hg19.

Tumor acquired specific core transcription factors network
The networks of tumor acquired specific core transcription factors were created with the Cytoscape GeneMANIA app. The first-degree neighbors of the core transcription factors are shown.

The box plots were generated with GraphPad Prism 5.0. Statistical analysis was performed using Student's t test.

Identification of the malignant tissue specific genes from TCGA dataset.
To identify the tumor tissue specific genes, we evaluated all the tumor samples in the TCGA collection for which RNA-seq data were available. Only samples with There were 579 LIHC specific genes, whereas there were only 36 STAD specific genes and 36 BLCA specific genes. LUSC had the fewest, with 11 specific genes (Fig. 1).
Although these tumor specific genes were highly expressed in the corresponding malignant tissues, we found that the COAD and ESCA specific genes were also highly expressed in STAD tissues (Fig. 1). These results further highlight the similar functions and tissue origins of colon, esophagus and stomach [15]. Another interesting finding was that HNSC specific genes were also highly expressed in LUSC tissues (Fig. 1). This phenomenon is examined further below.

Identification of the normal tissue specific genes from TCGA dataset.
Using the same selection strategy, normal tissue specific genes were identified from the TCGA dataset. In total, 645 normal samples from 11 different tissue types, including bladder, breast, colon, esophagus, head and neck, kidney, liver, lung, prostate, stomach and thyroid, were studied (Fig. 2). Normal kidney samples were combined from the KICH, KIRP and KIRC datasets. Normal lung samples were combined from the LUAD and LUSC datasets. This resulted in 3368 normal tissue specific genes. As illustrated in the heatmap (Fig. 2), these genes were highly expressed in the corresponding tissues. The number of tissue specific genes also varied considerably between tissue types. For instance, there were 1089 liver specific genes, whereas there were only 13 stomach specific genes and 67 bladder specific genes (Fig. 2). Notably, because of their functional similarity, stomach shared similar transcriptional features with the gastrointestinal tract and head and neck tissues [15] (Fig. 2).

The overlapping between normal tissue specific genes and malignant tissue specific genes.
Having obtained both the normal tissue specific genes and the corresponding tumor specific genes, we then determined their overlap. The Venn diagrams depict the common and unique genes between normal and tumor specific genes across 14 tumor types (Fig. 3). For the majority of tumor types, the tumor tissue specific genes were fewer than the normal tissue specific genes (Fig. 3). We reasoned that a higher percentage of tumor specific genes shared with the normal tissue indicates a greater importance of the innate transcriptional profile of the normal tissue in determining the transcriptional features of the tumor cells. We found that 90% of LIHC specific genes were derived from normal liver specific genes, suggesting that the innate transcriptional profile of normal liver was more important in determining the transcriptional features of LIHC (Fig. 3). 57.38% of COAD, 57% of HNSC and 67.5% of LUAD specific genes were derived from normal colon, head and neck, and lung specific genes, respectively (Fig. 3). In contrast, only 9.7% of KIRC specific genes and 9.1% of LUSC specific genes were derived from normal kidney and lung specific genes, respectively, suggesting that genomic aberrations and DNA methylation were more important in determining the transcriptional features of KIRC and LUSC (Fig. 3).
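
The overlap percentages reported above amount to simple set operations, sketched here in Python. The function name and the gene identifiers are hypothetical; "innate" denotes tumor specific genes shared with the normal tissue, "acquired" denotes tumor specific genes absent from the normal list, and "lost" denotes normal specific genes absent from the tumor list.

```python
def innate_acquired_split(tumor_specific, normal_specific):
    """Partition gene lists and report the innate share of tumor genes (%)."""
    tumor, normal = set(tumor_specific), set(normal_specific)
    innate = tumor & normal      # common region of the Venn diagram
    acquired = tumor - normal    # unique to the tumor
    lost = normal - tumor        # unique to the normal tissue
    pct_innate = 100.0 * len(innate) / len(tumor) if tumor else 0.0
    return innate, acquired, lost, pct_innate

# Toy gene lists (hypothetical identifiers).
tumor_genes = ["G1", "G2", "G3", "G4", "G5"]
normal_genes = ["G1", "G2", "G3", "G4", "G6", "G7"]
innate, acquired, lost, pct_innate = innate_acquired_split(tumor_genes, normal_genes)
print(sorted(innate), sorted(acquired), sorted(lost), pct_innate)
```

The percentage is taken relative to the tumor specific list, which matches how the figures for LIHC, KIRC and LUSC are phrased in the text.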

The normal tissue specific genes are decreased in the tumor samples.
A tumor is an abnormal growth of tissue that has lost its original specialized functions. This collapse of specialized functions may be caused by the loss of tissue specific gene expression in tumor cells. We therefore compared the expression of tissue specific genes in normal tissues and the corresponding tumor tissues. We found that the tissue specific genes were inhibited in BLCA, BRCA, COAD, ESCA, HNSC, KIRC, KIRP, LIHC, LUAD, LUSC and THCA tumor samples (Fig. 4). In particular, compared to the normal samples, nearly all the colon specific genes were inhibited in COAD, kidney specific genes were inhibited in KIRC and KIRP, liver specific genes were inhibited in LIHC, and lung specific genes were inhibited in LUAD and LUSC (Fig. 4). However, most of the prostate specific genes were not inhibited in PRAD tumor samples (Fig. 4).

The normal tissue specific genes are inhibited in cancer by hyper-DNA methylation.
Next, we sought to determine the mechanisms underlying the decrease of tissue specific genes during tumor development. The first candidate was DNA methylation.
Tissue specific genes are tightly controlled by DNA methylation [16]. The alterations of methylation profiles across cancer types, relative to normal samples, have been studied in TCGA [13,14]. Using the normal samples in TCGA for which DNA methylation data were available, we showed that the tissue specific genes had low DNA methylation in the corresponding tissues (Fig. 5a).
During tumor development, the inhibited expression of tissue specific genes may be controlled by DNA hyper-methylation, and the increased DNA methylation may in turn result from DNA methyltransferase hyper-activity [8][9][10]. DNMT1 is a DNA methyltransferase responsible for maintaining DNA methylation patterns [17]. We found that, compared to the normal samples, DNMT1 was upregulated in nearly all the tumor types except PRAD (Fig. 5b).
The elevated DNMT1 expression in tumors may increase the DNA methylation of the tissue specific genes and thus inhibit their expression. We therefore compared the DNA methylation profiles of normal and tumor samples in different tumor types.
Compared to the normal samples, nearly half of the tissue specific genes had higher DNA methylation in the BRCA, KIRC, KIRP, LIHC, LUAD and LUSC tumor types (Fig. 5c). These observations suggest that DNA methylation partially contributes to the decrease of tissue specific genes during tumor development.
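
One simple way to express such a comparison in code is to contrast the mean methylation beta value of each gene between tumor and normal samples, as in the Python sketch below. The criterion used in the analysis is not stated in this section, so the minimum difference of 0.1, the function name and the toy beta values are all assumptions made for illustration.

```python
from statistics import mean

def methylation_shift(normal_beta, tumor_beta, min_delta=0.1):
    """Classify genes by the change in mean beta value (tumor minus normal).

    Both arguments map gene -> list of per-sample beta values in [0, 1].
    Returns (hyper_methylated_genes, hypo_methylated_genes).
    """
    hyper, hypo = [], []
    for gene in sorted(set(normal_beta) & set(tumor_beta)):
        delta = mean(tumor_beta[gene]) - mean(normal_beta[gene])
        if delta >= min_delta:
            hyper.append(gene)
        elif delta <= -min_delta:
            hypo.append(gene)
    return hyper, hypo

# Toy beta values (hypothetical).
normal = {"G1": [0.2, 0.3], "G2": [0.8, 0.7], "G3": [0.5, 0.5]}
tumor = {"G1": [0.6, 0.7], "G2": [0.3, 0.2], "G3": [0.52, 0.5]}
hyper, hypo = methylation_shift(normal, tumor)
fraction_hyper = len(hyper) / len(normal)
print(hyper, hypo, fraction_hyper)
```

The same function covers both directions, so it applies equally to hyper-methylation of normal tissue specific genes and to hypo-methylation of tumor acquired specific genes.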

The tumor acquired specific genes are increased in the tumor samples.
The results above show that the normal tissue specific genes were decreased in the tumor samples. We propose that, in the Venn diagrams (Fig. 3), the genes unique to the normal tissue were completely lost owing to dedifferentiation during tumor development. Although the common genes were also decreased in the tumor samples, they still maintained tissue specificity in tumor cells. The genes unique to the tumor were newly acquired genes that were upregulated by genomic aberrations or mis-regulation of DNA methylation.
To test this proposition, we compared the expression of tumor acquired specific genes in normal tissues and the corresponding tumor tissues. We found that most of the tumor acquired specific genes were more highly expressed in BLCA, BRCA, COAD, ESCA, HNSC, KICH, KIRC, KIRP, LUAD, LUSC, PRAD and THCA tumor samples than in the normal samples (Fig. 6).

The KIRC acquired specific genes are activated by hypo-DNA methylation.
Next, we sought to determine the mechanisms that activate the tumor acquired specific genes during tumor development. As shown above, DNA hyper-methylation partially accounts for the decrease of normal tissue specific genes in tumors (Fig. 5c). In contrast to the normal samples, some of the tumor acquired specific genes had DNA hypo-methylation in the BLCA, BRCA, COAD, HNSC, KIRC, LIHC, LUAD and THCA tumor types (Fig. 7a). In particular, in KIRC, more than 80% of the tumor acquired specific genes were hypo-methylated in tumor samples (Fig. 7a).
These observations suggest that DNA methylation partially contributes to the activation of tumor acquired specific genes in tumor cells.

amplifications.
Another factor contributing to the activation of tumor acquired specific genes in tumor cells was genomic aberrations, particularly DNA amplification. We found that the LUSC acquired specific genes showed significant DNA amplification (Fig. 7b).
There were 57 LIHC acquired specific genes. Among them, 18 genes were amplified in more than 10% of LIHC patients (Fig. 7c). These 18 genes were located in two genomic regions, 8q11.21 and 1q12 (Fig. 7c). DNA amplification of the acquired specific genes was also examined in the other tumor types, and no significant amplification was observed.
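
The "amplified in more than 10% of patients" filter can be sketched in Python as follows. The input layout (one boolean amplification call per patient and gene), the function name and the toy calls are assumptions; they do not reflect the cBioPortal export format.

```python
def frequently_amplified(calls, min_fraction=0.10):
    """Return genes amplified in more than `min_fraction` of patients.

    calls maps gene -> list of per-patient booleans (True = amplified).
    """
    selected = []
    for gene, per_patient in calls.items():
        if per_patient and sum(per_patient) / len(per_patient) > min_fraction:
            selected.append(gene)
    return sorted(selected)

# Toy calls for 20 hypothetical patients.
toy_calls = {
    "GENE_X": [True] * 3 + [False] * 17,  # 15% amplified
    "GENE_Y": [True] * 1 + [False] * 19,  # 5% amplified
    "GENE_Z": [True] * 2 + [False] * 18,  # exactly 10%, not "more than"
}
amplified = frequently_amplified(toy_calls)
print(amplified)
```

The strict inequality mirrors the wording "more than 10%", so a gene amplified in exactly 10% of patients is excluded.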

Acquisition of head and neck normal specific genes in LUSC development is mediated by SOX2 amplification.
SOX2 plays important roles in embryonic development, in maintaining pluripotent stem cell identity, and in cell differentiation [18][19][20]. It has previously been reported that SOX2 is associated with increased cancer aggressiveness [21,22] and therapy resistance [23]. We found that SOX2 was amplified and acquired in LUSC (Fig. 7b) and was involved in the regulation of LUSC specific genes. We also showed that SOX2 was specifically expressed in head and neck tissue and that SOX2-associated regulatory networks were important for maintaining head and neck functions (Fig. 7d).
LUAD and LUSC are both derived from normal lung tissue. Further analysis suggested that, in LUSC, the original normal lung expression profile was completely lost. Instead, LUSC highly expressed head and neck specific genes (Fig. 7d).
These observations help explain why HNSC specific genes were also highly expressed in LUSC tissues (Fig. 1).

The normal tissue specific genes and the tumor acquired specific gene expression are associated with the tumor outcomes.
Finally, we tested whether the downregulation of tissue specific genes and the upregulation of the acquired specific genes were associated with tumor progression.
Tumor expression data with clinical overall survival information in the TCGA dataset were studied. Kaplan-Meier survival analysis showed that the tissue specific genes distinguished a cluster of patients with a high probability of overall survival in BLCA (P=2e-04), HNSC (P=0.002), LIHC (P=0.03), KIRC (P=4e-05) and KIRP (P=0.004) (Fig. 8a). However, the tissue specific genes were not associated with overall survival in the LUAD, BRCA, COAD and STAD tumor types (Fig. 8a).
Kaplan-Meier survival analysis also showed that the acquired specific genes distinguished a cluster of patients with a low probability of overall survival in KICH (P=0.07), KIRC (P=0.008) and KIRP (P=0.03) (Fig. 8b). However, the tumor acquired specific genes were not associated with overall survival in the other tumor types.

Discussion
Here, normal and malignant tissue specific genes are identified from the TCGA dataset (Fig. 1 and 2), and these genes are further studied in tumor samples. We suggest that the downregulation of normal tissue specific genes in cancer contributes to the collapse of normal tissue functions (Fig. 4), and that the inhibition of normal tissue specific genes in cancer is caused in part by DNA hyper-methylation (Fig. 5).
However, how genomic alterations induce the downregulation of tissue specific genes and the collapse of normal tissue functions is still not known, and is difficult to address with the TCGA dataset. We did try to determine whether DNA deletion was an underlying mechanism for the inhibition of tissue specific genes in cancer.
However, almost none of the tissue specific genes showed DNA deletion. Whether cancer driver mutations influence tissue specific gene expression is even harder to determine, because the driver mutations differ across tumor types.
TP53 is the most common driver mutation [24], and loss of TP53 induces multiple types of cancer in TP53 knockout mice [25]. Next, we will test tissue specific gene expression in TP53 wild-type and knockout mice to determine whether loss of TP53 function inhibits tissue specific gene expression.
Tumor development is a dedifferentiation process [26,27]. Normal cells lose their original specialized functions and dedifferentiate to a relatively primitive state. During tumor development, cancer cells also acquire tumor hallmarks through the accumulation of genetic and epigenetic alterations [28,29].
However, in response to the genomic variations in tumor cells, each tissue has a different ability to maintain its original characteristics (Fig. 3). Compared to other tumor types, LIHC has the greatest ability to maintain its original characteristics.
Although our results showed a decrease of liver specific genes in LIHC (Fig. 4), the remaining transcriptional features still distinguish LIHC from other tumor types.
KIRC and LUSC have the least ability to maintain their original characteristics (Fig. 3).
LUSC acquires partial head and neck transcriptional features (Fig. 7d). The acquired LUSC specific genes are activated by DNA amplification; in particular, the LUSC acquired specific gene SOX2 is amplified in 40% of LUSC patients (Fig. 7b). The acquired KIRC specific genes are activated by DNA hypo-methylation. More than 80% of the KIRC acquired specific genes have low DNA methylation (Fig. 7a).
The BLCA, BRCA, COAD, HNSC, LIHC, LUAD and THCA acquired specific genes are also partially determined by DNA hypo-methylation or DNA amplification (Fig. 7a). However, the underlying mechanisms for the ESCA, KICH, PRAD and STAD acquired specific genes are not yet determined. We found no DNA hypo-methylation or DNA amplification in the ESCA, KICH, PRAD and STAD acquired specific genes. We therefore hypothesize that these genes are indirectly activated by specific genetic changes. For example, the TP53 signaling pathway is mutated in more than 50% of ESCA, KICH and STAD patients [7], and the MYC-mediated transcription network is amplified in ESCA and STAD [30]. Next, we will validate this hypothesis experimentally.
In our paper, we analyzed the normal tissue specific genes across cancer types, and

Figure 3 The overlapping between normal tissue specific genes and malignant tissue specific genes.
Figure 4 The normal tissue specific genes are decreased in tumor samples.
Figure 5 The normal tissue specific genes are inhibited in cancer by hyper-DNA methylation.
Figure 6 The tumor acquired specific genes are increased in the tumor samples.
Figure 7 The expressions of tumor acquired specific genes are increased by hypo-DNA methylation an
Figure 8 The normal tissue specific genes and the tumor acquired specific gene expression are associ