A New Sight In The Key Prognosis-Related Proto- Oncogene FYN In Hepatocellular Carcinoma Based On WCGNA Hub-Gene Screening Strategy


 Background: Hepatocellular Carcinoma (HCC) is the third deadly cancer worldwide. More breakthroughs are needed in the clinical practice of liver cancer, and new treatment strategies are required to carry out to benefit patients. This study aims to screen out other significant differences in genes associated with LIHC and analyze its prognostic value further. Methods: Here, we use the TCGA-LIHC database and the profiles of GSE25097 from GEO to explore the differential co-expression genes in HCC tissues compared with para-tumor (or healthy) tissues. Then, we utilized WGCNA to screen out differential co-expression genes. Finally, we explored the function of FYN in HCC cells and xenograft tumor models.Results: We identified ten hub genes in the protein-protein interaction (PPI) network, but only three (COLEC10, TGFBR3, and FYN) of them seem to have a close-related to the prognosis. The expression of FYN was negatively correlated with the prognosis of HCC patients. The xenograft model showed that overexpression of FYN could significantly inhibit malignant tumor behaviors and promote tumor cell apoptosis.Conclusion: Thus, FYN may be central to the development of LIHC and maybe a novel biomarker for clinical diagnosis and treatment.


Introduction
According to the 2019 data of the National Cancer Center [1] , by the end of 2015, the number of malignant tumors nationwide was about 3.929 million, and the deaths were approximately 2.338 million. Data shows that, on average, more than 10,000 people are diagnosed with cancer every day, and 7.5 people are diagnosed with cancer every minute. Among them, lung cancer, liver cancer, upper digestive system tumors, and colorectal cancer are the primary malignant tumors in China [2] .
Liver cancer is known for its aggressiveness and high mortality. Most liver cancer patients in China were already in an advanced stage when diagnosed, and their treatment is minimal [2,3] . At the same time, accompanied by blood metastasis and tumor thrombus formation, the ve-year survival rate after liver cancer surgery in China is lower than that of developed countries in Europe and America [1] . Therefore, realizing the early diagnosis of liver cancer is particularly important for treating liver cancer and improving the prognosis [4] . At present, the most used clinical diagnostic marker for liver cancer is alphafetoprotein (AFP) [5] . However, the sensitivity and speci city of AFP in diagnosing liver cancer are not very satisfactory [6] . Therefore, we still need to screen out some more biomarkers closely related to liver cancer progression to assist clinical diagnosis and treatment.
WGCNA (short for Weighted Gene Co-expression Network Analysis) aims to establish a multi-gene coexpression model used to cluster gene expression similarly and analyze the relationship between differences and phenotypes within the module [7] . Compared with traditional methods, WGCNA establishes a more signi cant relationship between the gene mRNA expression and prognosis data.
Finally, we combined WGCNA and differential gene expression analysis to identify differential genes further, and these genes with signi cant correlation can be treated as candidate biomarkers [8][9][10][11] .
Here we have screened out three hub genes (TGFBR3, COLEC10, and FYN) with potential value in clinical practice. As one of the Src family protein and tumor suppressor genes, FYN has little research on liver cancer. Finally, we explored the biological functions of FYN in liver cancer cells.

Data source
All gene expression pro les of LIHC (Liver hepatocellular carcinoma) are obtained from the TCGA database (https://portal.gdc.cancer.gov/) and GEO database (https://www.ncbi.nlm.nih.gov/gds). In the TCGA database. Furthermore, LIHC data and corresponding clinical information were downloaded by R package TCGAbiolinks for free as pervious discribed [12,13] . As data showed, 424 samples (since some clinical information is missing, we use "Not Available (N/A)" to replace the missing values for further WGCNA), including 374 LIHC tissues and 50 healthy tissues and RNAseq count data on 19,601 genes. All genes with low abundance were excluded following the edgeR package recommendation.
Also, R package GEOquery was used to normalize the expression pro les of GSE25097 from the GEO database. GSE25097 consisted of 268 tumor samples and 243 paired normal tissues from patients with LIHC and 6 healthy liver samples. As a result, about 2,887 genes were selected for further analysis.

WGCNA Analysis
Gene expression data pro les of TCGA and GSE25097 were constructed to gene co-expression networks using the WGCNA package in R to explore the modules of positively correlated genes among samples for relating modules to external sample traits. The adjacency transformed into a TOM by using dissTOM. Coexpressed genes were aggregated into different MEs by the dynamic tree cut method. We decided on a cut line (< 0.25) for merging highly similar modules to make the modules more compact and the most signi cant and correlated modules with LIHC selected. Limma in R was applied to screen out DEGs. The DEGs of the TCGA-LIHC and GSE25097 datasets were visualized as a volcano plot using the R package ggplot2. Subsequently, identify potential prognostic genes by overlapping genes between DEGs and coexpression genes, presented as a Venn diagram using VennDiagram in the R package.

Functional Annotation and Construction of PPI and Screening of Hub Genes
Use the R package cluster pro le package to explore the functions among genes of interest. GO annotation that contains the three sub-ontologies biological process (BP), cellular component (CC), and molecular function (MF)-can identify the natural properties of genes and gene sets for all organisms.
Then the STRING online tool was used to construct a PPI network of hub genes. The expression level of each hub gene is plotted as a box plot graph.

Evaluation of HCC malignant behaviors
For proliferation curves: HCC cells (Hep3B and Huh-7) were plated into 96-well plates at 2.5 × 10 3 /well and cultured in medium. The number of cells was determined using the CCK8 (Dojindo, Tokyo, Japan) every day for seven days.

Apoptotic Analysis of Terminal Dexynucleotidyl Transferase (Tdt)-Mediated dUTP Nick End Labeling (TUNEL) Staining
Use 4% paraformaldehyde to x the cell slide after the drug treatment for 30 minutes and wash it three times with PBS. Add 0.3% Triton X-100, incubate at room temperature for 5 minutes, and rinse three times with PBS. Prepare an appropriate amount of TUNEL test solution according to the TUNEL instructions and mix thoroughly. Add 100 µL of TUNEL detection solution to the sample, incubate at 37°C in the dark for 60 minutes, and wash three times with phosphate buffer saline (PBS). After mounting with antiuorescence quenching mounting solution, observe and take pictures under a uorescence microscope.

Immunohistochemical Staining
Dewaxing and hydration for sections. Use 0.3% H 2 O 2 to eliminate endogenous peroxidase, followed by antigen retrieval. Slides were incubated with primary antibodies overnight at 4°C and set with a secondary antibody at room temperature for 30 min. Then use neutral gum to seal the sections. Take pictures and select ve random visions for statistics.

Human HCC xenograft model
BALB/c nude mice (male, 5-6 weeks old) purchased and maintained under speci c pathogen-free conditions with a 12 h on/off light cycle. One million Huh-7 cells inoculated subcutaneously in the right ank. The mice randomly divided into two groups (8 animals each) when tumors reached an average volume of approximately 60 mm 3 . Tumor volumes are calculated according to the following equation: Volume (mm 3 ) = Length×(Width) 2 ×π/6. No animals' been excluded. Continue feeding for 1-week. Mice were sacri ced by cervical dislocation, and tumors' volume were measured and weighed.

Statistical analysis
The work ow of this study was shown in Figure1. All statistical analyses performed using SPSS V23.0 software. Comparisons of data with normal distribution between two groups using Student's t-tests. Otherwise, comparisons using the non-parametric Mann-Whitney test or the Wilcoxon matched-pairs test.
Kaplan-Meier analysis and log-rank test to compare the survival probabilities between different groups. Statistical tests were two-tailed, and a P of 0.05 was statistically signi cant.

Identi cation of Differentially Expressed Genes (DEGs) in Both TCGA and GEO database
For data collection, after analyses on RNA-Sequencing and microarray data and GES25097 dataset to screen out DEGs between LIHC and normal tissues, there were 2704 and 252 DEGs in TCGA and GES25097, respectively. The heatmap plots were shown in Supplementary Fig. 1A, B. The volcano plot of genes was shown in Fig. 2A, B. Subsequently, we identify candidate biomarkers and therapeutic targets to screen out co-expression networks. This study constructed gene co-expression networks of the gene expression data pro les of TCGA-LIHC and GSE25097 by the WGCNA package in R (Fig. 2C, D). By gathering similarly expressed genes in TCGA, a total of 9 modules were identi ed, whereas there are three modules in the GEO dataset (Fig. 2E, F). The MEs in the TCGA yellow and GEO blue modules showed a higher correlation with the histologic grade of HCC. Therefore, we chose these two modules for further research.

The GO and KEGG Pathway Enrichment Analysis of DEGs
At the same time, we put these DEGs into DAVID to analyze signi cant GO and KEGG pathways. GO annotation contains the three sub-ontologies, biological process (BP), cellular component (CC), and molecular function (MF), which can identify the biological properties of genes and gene sets for all organisms (Fig. 3A, C). We observed several enriched genes sets combined with the KEGG enrichment analysis shown in Fig. 3B, D, for instance, complement activation, protein activation cascade, and alcohol responding pathway in the TCGA database, prion disease, and sphingolipid signaling pathway in the GEO dataset.

Identi cation of Hub Genes from DEGs
As shown in Fig. 2G, in total, the 23 overlapping genes were extracted for validating the genes of coexpression modules. Cytoscape constructed the PPI networks of all genes in the four modules. We used the "cytoHubba app" to calculate the degree of each node. Ten top hub-genes showed in Fig. 3E, in which PHLDA1 seemed to act as the central role in the PPI network. Furthermore, all these ten hub-genes were taking into further analysis.

Veri cation of the Prognostic Values of Hub Genes
To identify whether these hub genes have a prognostic value, the overall survival (OS) and disease-free survival (DFS) analyses of the ten hub genes performed by Kaplan-Meier plotter using the R survival package GEPIA2 database. As shown in Fig. 4, COLEC10, TGFBR3, and FYN, three genes have a close relative to the survival rate of HCC patients. Among these three genes, the lower expression level of FYN was signi cantly associated with worse OS and DFS of the HCC patients (Fig. 4G, H), while CLOEC10 and TGFBR3 only have worse OS (Fig. 4A, B, D, E). Unfortunately, both COLEC10, TGFBR3, and FYN expressions were not related to the staging phase of HCC patients (Fig. 4C, F, I). Since the FYN gene is rarely studied in HCC, we will do further research on FYN to clarify its role in liver cancer progression.

Overexpression of FYN inhibited the malignant behaviors of HCC cells
It is generally believed that FYN is a proto-oncogene, which plays a vital role in metabolism and physiological activities. However, its effect on HCC has not yet been clari ed. To investigate whether FYN may act as cancer suppressor genes, we then detected the migration and invasion abilities in FYN overexpression Hep3B and Huh-7 cells. As shown in Fig. 5A-C, when compared with the control group, FYN-overexpression may inhibit the proliferation and, also, migration, invasion abilities of Hep3B and Huh7 cells. Moreover, FYN overexpression could promote TUNEL-positive stained HCC cells (Fig. 5D).
Collectively, these results con rmed that the FYN might act as a tumor suppressor gene that takes part in the malignant behaviors of HCC cells.

Overexpression of FYN suppressed the tumor growth in vivo
Next, we explored whether FYN has some effects on the HCC xenograft model. As expected, the FYN overexpression group signi cantly reduced the tumor volume and weight compared with the control group ( Fig. 6A-C).
Consistently, immunohistochemical staining and TUNEL assay of tumor tissues also con rmed that the FYN overexpression group resulted in a remarkable decrease of Ki67 positive cells in parallel with an increasing amount of apoptotic cells compared with the control group (Fig. 6D).
From the UALCAN database, we found out that primary HCC tissues have a lower FYN expression than normal tissues (Fig. 6E). Tissue microarrays containing 164 pairs of HCC tissue samples and para-tumor tissue samples were examined for FYN expression by IHC. As shown in Fig. 6F, FYN

Discussion
Recent data showed, the World Health Organization (WHO) announced that primary liver cancer caused 745,517 deaths worldwide, of which more than 50% were from China [1] . Due to over 120 million chronic hepatitis B virus (HBV) carriers at high risk of liver cancer in our country, it is a long-term and arduous task to achieve early diagnosis and early treatment, effective prevention and treatment, and precise treatment of liver cancer [4,14] . Better biomarkers for predictive and prognostic molecules for HCC are urgently required. The RNA-seq-expression and clinical information of HCC from the TCGA database are correlated by WGCNA, which allows us to screen out genes closely related to the occurrence and development of cancer from the existing data.
In this study, through the identi cation of DEGs and WGCNA, we constructed a protein-protein network to show the top 10 genes in the different modules by calculating degree in Cytoscape, which screened out three hub genes (TGFBR3, COLEC10, and FYN) on which have intimately related to the survival outcome of HCC patients. Form the data shown, among these 3 genes, the lower expression level of FYN was signi cantly associated with both worse OS and DFS of the HCC patients. In contrast, CLOEC10 and TGFBR3 only have worse OS. Besides, a previous study showed that TGFBR3 and COLEC10 were rmly related to the occurrence and development of HCC [15][16][17] , but the biological functions of FYN, a member of Src family kinases (SFKs), were rarely reported in HCC.
The SFKs include Src, Lyn, Fyn, Yes, Lck, Blk, and Hck, which play an essential role in regulating the growth and differentiation of eukaryotic cells [18,19] . FYN is a 59 kDa protein, which its carboxy terminus has extensive amino acid sequence homology with Src, but it is very different at 81 amino acid residues at the amino terminus. The FYN protein is synthesized on the cytosolic polyribosome and is Noctadecylated, and then rapidly targets to the plasma membrane for palmitoylation. The corresponding sequences around Src Tyr416 and Tyr527 are conserved in FYN and, therefore, may be similarly regulated by phosphorylation. Double-acetylated FYN aggregates in caveolae-like membrane microdomains, which can interact with various other signal transduction molecules. FYN has various biological functions, including signal transduction through T cell receptors, regulation of brain function, and adhesionmediated signal transduction. Changes in FYN levels in the corresponding target tissues may lead to better treatment of certain related diseases. [19][20][21][22] . FYN is considered to be a proto-oncogene, but its role in tumors, especially hepatocellular carcinoma cells, has not been clari ed, even contradiction.
Interestingly, when we overexpressed FYN in HCC cells, it seems FYN may act as a tumor suppressor gene since it suppressed the migration and invasion ability and the proliferation of Hep3B and Huh-7 cells.
Furthermore, FYN overexpression remarkably inhibited the colonization and growth of the tumor. Again, in the tissue microarrays of HCC patients, we found that the expression level of FYN in tumor tissues was lower than that in the adjacent tissues in most HCC patients, and the lower the expression level of FYN, the worse the prognosis of patients.

Conclusion
In conclusion, our study aimed to identify hub genes involved in HCC by WGCNA of data from the TCGA database and concluded that three functional genes (TGFBR3, COLEC10, and FYN) are highly differentially expressed in tumor and non-tumor tissues. Finally, according to further study in vitro and in vivo, we preliminarily con rmed the anticancer effect of FYN, and it seemed FYN might be a promising biomarker for predictive and prognostic molecules for HCC. Moreover, the molecular mechanisms and the diagnostic value of FYN for HCC patients should be further validated through a series of experiments. All methods involving animal experiments were performed in accordance with the relevant guidelines and regulations. We con rmed the study was carried out in compliance with the ARRIVE guidelines (details in our ARRIVE checklist in Additional le 1).

Consent for publication
Not applicable.

Availability of data and materials
All gene expression pro les of LIHC are obtained from the TCGA database (https://portal.gdc.cancer.gov/) and GEO database (https://www.ncbi.nlm.nih.gov/gds). Other data used to support the ndings of this study are within this article.  respectively. The column corresponds to a clinical trait (HCC and normal tissues). Each module contains the corresponding correlation and P-value. (G) The Venn diagram of genes among DEGs. About 23 overlapping genes are listed in the intersection of DEG lists and two co-expression modules in total.

Figure 3
Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) enrichment analysis of all differentially expressed genes (DEGs) and the PPI network of hub-genes. (A) (B) KEGG and GO analysis of the TCGA-LIHC database. (C) (D) KEGG and GO analysis of the GEO dataset. The color represents the adjusted p-values, and the size of the spots represents the gene number. (E) The protein-protein interaction (PPI) network and the candidate hub genes. Edges represent the protein-protein associations. The red nodes represent genes with a high MCC sore, while the yellow node represents genes with a low MCC sore. Figure 4