Development of a Prognostic Signature Based on Hypoxia-related Genes for Thyroid Cancer

Background. We aimed to establish a model to predict the prognosis of patients with thyroid cancer based on differentially expressed hypoxia-related genes. Methods. By comparing the genes in TCGA database and hypoxiaDB database, we obtained differentially expressed genes (DEGs) related to hypoxia in thyroid cancer. Gene function enrichment analysis was performed, and a protein-protein interaction network was constructed using the STRING database. Univariate Cox regression were used to screen hypoxia-related genes with prognostic value. Subsequently, multivariate Cox analysis was used to determine prognostic markers based on thyroid cancer, a prognosis model based on these genes was established. The Kaplan-Meier analysis, Receiver operating characteristic (ROC) analysis and The Harrell’s concordance indexes in the training set and the validation set were used to evaluate the performance of the model. Finally, we conducted univariate analyses of the prognostic value of clinical data (including risk scores) of thyroid cancer patients. Results. 326 hypoxia-related thyroid cancer genes were found. Functional enrichment analysis demonstrated they were mainly involved in regulating biological functions. 23 genes have been proved to be associated with the prognosis of thyroid cancer with univariate Cox regression, among them, 11 marker genes were used to construct a new prognosis model by multivariate Cox analysis. Accordingly, the system of risk scores was constructed, patients with high-risk scores (P <0.005) had shorter overall survival than those with low-risk scores. The ROC curve indicated good performance of the eleven-gene signature at predicting overall survival. The Harrell’s concordance indexes in the internally validated for the 11-gene prognostic signature was 0.881. Moreover, univariate analysis showed that the risk score and age were signicantly associated with patient overall survival. The model we created was signicantly associated with patient overall survival. Conclusions. The model we established had excellent performance in the prognosis of thyroid cancer.


Introduction
The most common form of endocrine cancer is thyroid cancer, which occurs in all age groups, i.e., children to the elderly [1] Due to the rapid development of detection technology, more cases of thyroid cancer have been clinically diagnosed, which is more concerning. At present, the diagnosis of thyroid cancer mainly relies on the ultrasound and percutaneous ne-needle aspiration (pFNA), while the treatment mainly remains to be surgery and I 131 radiotherapy. Such mature diagnosis and treatment programs are widely being recognized. Meanwhile, the prognosis of thyroid cancer has become a hot topic of discussion among scholars.
It is proven that the microenvironment in tumor cells is chaotic and complex, which is caused by insu cient oxygen supply from hypoxia. Under a hypoxia environment, gene and protein expression could undergo regulation, while hypoxia can also play a role in genetic instability, tumorigenesis, and progression. For example, hypoxia causes changes in the tumor microenvironment, promoting in ammation, immune suppression, and treatment resistance, inducing lung cancer [2] Recently, scientists have made great efforts to use hypoxia-related markers to evaluate the prognosis of tumors. For example, three hypoxia-related genes (PDSS1, CDCA8, and SLC7A11) were used to construct a prognosis, recurrence, and diagnosis model for hepatocellular carcinoma (HCC) [3] Based on hypoxiarelated genes, a hypoxia risk model was developed to evaluate the prognosis of glioma patients [4]. The establishment of these hypoxia-related prognostic independent models has signi cantly contributed to the comprehensive treatment of cancer patients. Our study intended to provide novel and reliable prognostic markers for a comprehensive treatment of throid cancer patients.

Materials And Methods
Acquisition of hypoxia-related genes According to FDR (false discovery rate) <0.05 and | log 2 FC | >1, we obtained 2215 differentially expressed genes, combined with the differentially expressed genes downloaded from hypoxia DB database, which provided a complete and up-to-date database for the study of human hypoxia-related regulatory proteins, we obtained 373 differentially expressed hypoxia-related genes.
All differentially expressed genes and differentially expressed hypoxia-related genes are shown in raw data.
Hypoxia-related gene database A total of 373 differentially expressed genes were selected from the hypoxia-related gene database. The database, HypoxiaDB, was based on the literature, which provided a complete and up-to-date database for the study of human hypoxia-related regulatory proteins.

Thyroid cancer transcriptome data and clinical data
The transcriptome data was downloaded from the o cial website of the Cancer Genome Atlas of Thyroid Cancer, which also included 510 thyroid cancer specimen data and 58 adjacent to cancer specimen data for further analysis.

Functional Analyses
In this study, the clusterPro ler was used for GO and KEGG analyses to study the functional correlation between the differentially expressed hypoxia-related genes. The value of P<0.05 was considered statistically signi cant.

Construction of protein interaction network
Hypoxia-related genes were mapped to the STRING database to create an interactive network to clarify the association between differentially expressed genes. Then, the Cytoscape software was adopted to visualize the protein-protein interaction network.

Construction and validation of a prognostic model of hypoxia-related markers
Univariate Cox proportional hazard regression analyses were used to identify differentially expressed hypoxia-related genes associated with overall survival time, where P<0.05 was considered to be statistically signi cant. After combining the regression coe cient (β) of each speci c gene, the patient's risk score formula was constructed by weighing the estimated regression coe cient. The risk score of each patient was found according to the following formula: Risk score=β gene(1) × expression gene(1) + β gene(2) × expression gene(2) + ···+ β gene(n) × expression gene(n).In the testing set and the validation set, according to the risk score formula, with the median risk score as the critical point, patients were divided into low-risk groups and high-risk groups. The Kaplan-Meier analysis was used to assess the survival difference between the two groups in two sets using the log-rank as the comparison. Receiver operating characteristic (ROC) analysis and The Harrell's concordance indexes was used to assess the accuracy of the model predictions. Univariate Cox regression was used to identi ed the independent prognostic factors among risk scores, age, tumor TNM, stage, Focus and gender.

Internal validation of the prediction model
To further verify the predictive power of this model, we used the 50% thyroid cancers samples randomly selected from the entire TCGA database as internal validation dataset (n = 255). The C-index was used to assess the performance of the established model.

Statistical Analyses
The TCGA database was analyzed using the R language. The survival curve was made using Kaplan-Meier and compared by the log-rank test. Multivariate Cox analysis of risk factors was used to establish a prognostic prediction model for hypoxia-related genes. All statistical analyses were conducted using the R language. All statistical tests were two-sided with P<0.05 being statistically signi cant.
Functional enrichment analysis of differentially expressed hypoxia-related genes The functional enrichment analysis of differentially expressed hypoxia-related genes helped in their biological understanding. Figure 2 summarizes the rst ten biological processes and ten pathways of GO and KEGG enrichment, respectively. GO enrichment showed that the biological processes of differential genes mainly involved 'extracellular structure organization', 'collagen-containing extracellular matrix' and 'extracellular matrix structural constituent' (Fig. 2A). while the KEGG enrichment showed the following pathways of differential genes, such as: 'PI3K-Akt signaling pathway', 'Cytokine-cytokine receptor interaction' and ' MAPK signaling pathway' (Fig.2B).

PPI network construction and important gene modules
To fully understand the differentially expressed hypoxia-related genes, we utilized Cytoscape software to construct an interactive PPI network (Fig.3). As shown in the gure, the red color represents up-regulated hypoxia-related genes, while the green color indicates down-regulated ones. Molecular Complex Detection (MCODE) tool was used to identify signi cant gene modules, and three signi cant gene modules were screened out ( Supplementary Fig.1).
Identi cation of hypoxia-related genes for prognosis Univariate Cox regression analysis was performed to determine the hypoxia-related genes associated Based on the prognosis formula of hypoxia-related genes, we determined the distribution of these genes in different risk populations and the survival rate of patients in the testing set and validation set. To determine the role of 11 hypoxia-related genes in predicting the clinical prognosis of thyroid cancer patients, a K-M (Kaplan-Meier) survival curve was plotted to analyze different survival times between high-risk and low-risk groups in the two sets. The K-M analyses informed us that the survival rate of patients in the high-risk group was signi cantly lower than that of the low-risk group (Fig.5).
Later, 11 genes related to hypoxia were used to construct ROC curves for survival rates of thyroid cancer patients for 1 year, 3 years, and 5 years in the two sets to evaluate their predictive performance (Fig.6). Subsequently, we conducted univariate (Table 1) analyses of the prognostic value of clinical data (including risk scores) of thyroid cancer patients, and it turned out that the risk score we constructed exhibited a high value in the prognostic evaluation.

Discussion
Thyroid cancer incidence has been showing an increasing trend in recent years, posing a major threat to human health [5]. In the United States, the detection rate of thyroid cancer has increased in both males and females, i.e., 4.9 cases per 100,000 people in 1975 to 14.3 cases per 100,000 people in 2014 [6] Although most of the pathological types of thyroid cancer have a high survival rate, there are poorly differentiated thyroid cancer (PDTC) with a ve-year speci c survival rate of 66% [7], which has attracted the attention of many scholars. In this study, we hope to nd a new prognostic plan for thyroid cancer patients as well.
With the in-depth understanding of the hypoxia-induced changes in the tumor microenvironment along with the improvement of TCGA and HypoxiaDB database, scholars have paid more attention to the prediction of tumor prognosis based on hypoxia-related factors [8,9]. Recently, emerging evidence has emphasized the importance of the immune and in ammatory microenvironment in thyroid cancer [10,11]. However, none of the studies have analyzed the prognosis of thyroid cancer patients using hypoxiarelated factors.
At present, most studies on the hypoxic microenvironment in thyroid cancer are still focused on a single gene. In this study, 23 differentially expressed genes were obtained using the thyroid cancer hypoxiarelated data from the TCGA database and hypoxia-related genes data from the HypoxiaDB database. Besides being related to hypoxia, the enrichment analysis of GO and KEGG showed that the biological process of these hypoxia-related differential genes also involved 'extracellular structure organization', 'collagen-containing extracellular matrix' and 'extracellular matrix structural constituent' with the following pathways, such as: 'PI3K-Akt signaling pathway', 'Cytokine-cytokine receptor interaction' and ' MAPK signaling pathway'. After multivariate survival analyses, we identi ed,11 hypoxia-related gene, including SLC6A8, STC1, PTGIS, SLC24A3, DPP4, ANK2, APOE, PIM1, PHKG1, AKR1C3, NDR-G1, which were closely related to the prognosis. Moreover, a risk scoring system was constructed to provide clinicians with new tools for analyzing the prognosis of patients. This scoring tool showed greater advantages compared to the patients' clinical data (such as gender, age, TNM stage) during the prognostic analysis. Among 11 hypoxia-related genes that we have identi ed, many of them were closely related to the prognosis of tumors in previous studies as well. In the previous animal experiments, it was con rmed that knockout of the CrT (SLC6A8) gene lead to the insu cient uptake of creatine while impairing the immunity of anti-tumor T cells. It was also proved that the supplementation of creatine signi cantly inhibited tumor growth, which also synergized with PD-1/PD-L1 blockers to inhibit the tumors [12]. Apolipoprotein E (ApoE), the main cause of hyperlipidemia, was also shown to improve the hypoxic microenvironment of tumors in hyperlipidemic models through exercise while also slowing down the formation of primary and secondary EO771 breast tumors [13].
Further, PTGIS and DPP4 have also been con rmed in the bioinformatics research. In the previous study of novel biomarkers related to liver hepatocellular carcinoma (LIHC), PTGIS was one of the 21 hub genes identi ed by mRNA expression network analyses, which might have been a potential therapeutic target for inhibiting LIHC cells [14] while the dipeptidyl peptidase-4 (DPP4) was shown to be the target gene of NF-κB [15].
Stanniocalcin-1 (STC1), PIM1, and NDRG1 are other target genes that are studied in emerging literature, where STC1 is thought to play a carcinogenic role in hypoxic gastric cancer by causing an imbalance of Bcl-2. Thus, it is regarded as a potential therapeutic target for gastric cancer [16]. Down-regulation of miR-124 and miR-144 in the hypoxic microenvironment can increase the risk of prostate cancer by weakening the inhibitory effect of PIM1, which eventually promotes hypoxia and radiation in the prostate cancer cells [17]. As a key gene in regulating lipid metabolism, NDRG1 may promote the aggressiveness of breast cancer, and the close relationship between NDRG1 and poor prognosis of breast cancer makes NDRG1 a promising therapeutic target for breast cancer [18].
In short, in this study, a variety of prognostic markers for thyroid cancer was identi ed based on the comprehensive analysis of expression pro les of differentially expressed hypoxia-related genes and corresponding clinical characteristics, By constructing a new risk scoring model, the prognosis of patients with thyroid cancer can be effectively evaluated. However, the limitation lies in that it is a retrospective study. More prospective studies should be conducted to verify the prognostic function of hypoxia-related genes. Multi-center data was required to con rm our ndings. In short, we hope to contribute to the prognostic analysis of thyroid cancer and the determination of treatment targets.

Conclusions
A variety of prognostic markers for thyroid cancer were identi ed in this study based on the comprehensive analysis of expression pro les of differentially expressed hypoxia-related genes and corresponding clinical characteristics. By constructing a new risk scoring model, the prognosis of patients with thyroid cancer could be effectively evaluated. However, the limitation is that it is a retrospective study. Thus, more prospective studies should be conducted to verify the prognostic function of hypoxiarelated genes. Also, multi-center data are required to con rm our ndings. In short, we hope to contribute to the prognostic analysis of thyroid cancer.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The author declares that he has no competing interests.      Univariate and multivariate analysis of differentially expressed genes. According to (A)univariate analysis, a total of 23 hypoxia-related genes were signi cantly associated with overall survival. Red represents up-regulated genes and green represents down-regulated genes. According to (B) multivariate analysis, a total of 11 hypoxia-related genes were signi cantly associated with overall survival.  Prognostic indicators based on hypoxia-related genes showed good predictive performance. (A)ROC analysis of overall survival after 1-year, 3 years,5-years for 11 gene markers in the testing set. (B) ROC analysis of overall survival after 1-year, 3 years,5-years for 11 gene markers in the validation set. Table 1 Univariate analyses of potential markers for thyroid cancer patients. Univariate analysis was used to determine the assessment of patient outcomes with different clinical variables.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. Table1.docx