Identification of the collagen family as prognostic biomarkers in papillary thyroid carcinoma

The aim of this study was to construct a collagen-related prognostic model for thyroid cancer and to investigate prognostic value of collagen family genes for thyroid cancer. A LASSO Cox regression model for thyroid cancer was developed based on the expression profiles of collagen-related genes. Kaplan–Meier survival analysis was performed for high and low risk groups. The ROC method was used to assess its predictive performance. Predictive independence was verified by multivariate Cox regression analysis. The relationship between this feature and immune cell infiltration was analyzed by tumor microenvironment. COL18A1 was validated by immunohistochemistry and RT-PCR in thyroid cancer tissues. The effect of COL18A1 on cell proliferation, migration and invasion ability of tumor cells were further valuated by CCK-8 assay and transwell assay. The effect of COL18A1 on the immune escape ability of tumor cells was further valuated by cytotoxicity assays. A model including 4 collagen family genes was developed to predict thyroid cancer prognosis. Patients with high-risk score had a poorer prognosis than those with low-risk scores for 1-, 2-, 3-, and 5- year survival. The model independently predicted prognosis after adjusting for other prognostic factors. A nomogram combining risk score and age was constructed with high sensitivity and specificity. This feature was significantly associated with immune cell infiltration. COL18A1 was aberrantly over-expressed in thyroid cancer compared with control tissues and significantly increased proliferative capacity, migration capacity, invasion capacity, and immune escape ability of tumor cells. Our findings establish a signature associated with collagen family genes that can be a promising tool to predict the prognosis of thyroid cancer. High COL18A1 expression significantly correlates with the poor prognosis of patients and enhances the immune escape ability of tumor cells.


Introduction
Thyroid cancer is one of the malignancies with the fastest growth rate in recent years and is the most common malignant tumor of the endocrine system. Of these, papillary thyroid carcinoma is the most common type of pathology [1,2]. Most papillary thyroid carcinomas have a good prognosis, with a cure rate of over 90% [3]. However, some patients will have a poor prognosis due to the occurrence of distant metastases and recurrence. There is a need to develop new predictive models that will in turn improve the diagnosis rate and prognosis of papillary thyroid cancer (PTC).
The extracellular matrix (ECM) forms the scaffolding of the tumor microenvironment and may regulate cancer behavior [4][5][6]. The structure and composition of the ECM is remodeled during cancer development, and changes in the expression pattern of ECM macromolecules can alter the motility, invasion and metastasis of cancer cells [7,8]. In addition, it is recently reported that ECM-cell interactions can mediate cellular functional properties and cell phenotype [9][10][11], which regulate cellular signaling to promote cancer cell proliferation, migration, apoptosis, cell survival, angiogenesis and epithelial mesenchymal transition [8,10]. As a major component of the extracellular matrix, collagen accounts for its primary function. Alterations in collagen in the tumor microenvironment have been found to correlate with cancer spread and prognosis [12][13][14][15]. Increased collagen density around cancer cells directs local invasion and metastasis [12]. Collagen alignment and orientation have been shown to be indicators of tumor metastasis in breast cancer [13,14], glioblastoma [16], and prostate cancer [17]. However, relevant study about the role of collagen in PTC has rarely been reported [18]. The aim of this study was to investigate the impact of collagen family genes on the prognosis of PTC and to construct a predictive model for the prognosis of PTC patients. We obtained clinical data and mRNA expression levels of PTC patients from the TCGA and GEO databases. Differentially expressed genes (DGEs) associated with collagen proteins were used to construct a prognostic model. The prognostic model was then validated utilizing TCGA and ICGC database, providing a theoretical basis for clinical application. Besides, we examined the expression of COL18A1 in thyroid cancer and its effect on thyroid cancer by cytological assays. Due to the critical role of collagen family in cancer progression, they may be promising diagnostic markers and pharmacological targets.
Using the "venn" package in R software, the obtained collagen-related DEGs were intersected with the prognosisrelated collagen family genes to obtain the collagen-related DEGs associated with PTC prognosis, and venn plot was drawn. The "survival" package in R software was used to draw forest plot of the collagen-related DEGs associated with PTC prognosis. The "heatmap" package in R software was used to draw heatmap of the collagen-related DEGs associated with PTC prognosis. The protein interactions network of prognosis-associated DEGs was constructed by string online network tool, as well as correlation network, using the "igraph" package and "reshape2" package in R software. Construction and validation of the prognostic model The least absolute contraction and selection operator (LASSO) penalized Cox regression analysis was performed to select optimal prognostic genes related to overall survival (OS). Prognostic risk scores were determined by a linear combination of regression coefficients (β) and gene expression levels in a LASSO Cox regression model. We calculated the risk score using the following formula: risk score = sum (the normalized expression level regarding each gene×its regression coefficient). PTC patients were divided into high and low risk groups according to the median of risk score. Kaplan-Meier (K-M) survival curves and timedependent receptor operating characteristic (ROC) curves were used to assess the predictive ability of prognostic characteristics on OS. PCA and t-SNE analyses were performed using the "Rtsne" and "ggplot" packages in R software to visualize 4 collagen family genes by dimensionality reduction.

Construction and validation of the prognostic nomogram
Univariate and multivariate Cox regression analyses were used to determine whether prognostic models could predict OS in patients with PTC independently of other traditional clinical characteristics (including age, sex and TNM stage). Hazard ratios (HRs) and 95% confidence intervals (CIs) were calculated for each variable. P < 0.05 was considered a statistically significant difference. Based on the results of the multivariate Cox regression analysis, a prognostic nomogram was constructed to predict the OS of PTC patients at 1, 2, and 5 years across the TCGA dataset. The predicted OS of nomogram against observed survival rates was plotted using the calibration curve.

Functional enrichment analysis
The "limma" package in R soft was used to identify collagen-related DEGs between the high-risk and low-risk groups using | Log2FC | > 1 and FDR < 0.05 as criteria. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using the "clusterProfiler".

Immune microenvironment and drug sensitivity analysis
The proportion of 20 tumor-infiltrating immune cells in patients with thyroid cancer was calculated based on the CIBERSORT score. ESTIMATE immune, stromal score for analysis of immune and stromal cell infiltration levels and tumor purity. In addition, the relationship between the four collagen-related DGEs and drugs was also explored. Drug data were obtained from the cellminer database (https://discover.nci.nih.gov/cellminer/loadDownload.do).
Pearson's correlation test was used in R soft to explore the correlation between the expression of collagen-related genes and drugs.

Immunohistochemistry
Human PTC tissues were collected from patients who underwent thyroidectomy at The Second Hospital of Tianjin Medical University. The study was approved by the Ethics Committee of The Second Hospital of Tianjin Medical University and written informed consent was obtained from these patients. The clinicopathological data of the patients are listed in Table 2, and a total of 119 patients were included in the study. 119 tumor tissue samples and 15 normal paracancerous tissue samples were selected for immunohistochemical staining. The tissues were fixed in 4% paraformaldehyde, embedded using paraffin, sectioned into 5 μm thick sections and dewaxed. After antigen repair and endogenous peroxidase disruption, sections were incubated with Anti-COL18A1(Abcam, ab275746) antibody overnight, followed by incubation with secondary antibody. Sections were treated with horseradish peroxidase and 3,3'-diaminobenzidine and then stained with hematoxylin and dehydrated. Light microscopy was used to capture images and immunohistochemical scoring using the 12-point scale [19].

Western blot
Equal amounts of protein samples (30 μg) were extracted with RIPA lysis buffer (Sigma, USA) and uploaded into 10% sodium dodecyl sulfate polyacrylamide gels for protein separation. The proteins were transferred to polyvinylidene difluoride membranes (Millipore, USA) and then sealing the membranes with 5% skimmed milk, the membranes were incubated with primary anti-COL18A1 (1:1000, Abcam, ab275746) and anti-rabbit secondary antibody (1:5000, Thermo Fisher). Immune binding was detected using an ECL detection system (Shanghai Peiqing Technology Co., Ltd., China).

CCK-8 assay
Cell viability was evaluated by Cell Counting Kit-8 (CCK-8) assay in BCPAP and TPC-1 cells. COL18A1 downregulated and control cells (1000 cells/well) were respectively seed in 96-well plates. After indicated time, 10 μL CCK-8 solution was added to each well, and cultured at 37˚C for 2 h. The OD value of each well was measured at 450 nm. All values are repeated at least three times and presented as the mean ± standard deviation.

Transwell assay
The BCPAP and TPC-1 cell line were chosen for the cell migration and invasion assay. Cell invasion was assessed by using the Matrigel invasion system. Transwell chambers (8 μm pore size, polycarbonate filters, 6.5 mm diameter; Corning Costar) in 24-well plates were coated with 50 μL of polymerized Matrigel (BD Biosciences, Franklin Lakes, NJ, USA). To assess effect of COL18A1 on PTC cell lines migration and invasion ability, BCPAP and TPC-1 cells were placed into the upper chamber with 200 μL serum-free medium and loaded on the lower chamber with 500 μL complete medium. After incubation at 37˚C for 48 h, the invaded cells in the lower chamber were obtained by staining and counted. All values are repeated at least three times and presented as the mean ± standard deviation.

Cytotoxicity assay
We isolated peripheral blood mononuclear cells (PBMCs) using the Ficoll-Paque centrifugation method from normal person. The isolated PBMCs were activated by IL-2 and cultured for 3-5 days, and then co-cultured with CFSE(5,6carboxyfluorescein diacetate, succinimidyl ester) -labeled tumor cells at a ratio of 1:5 for 10 h. Acquisition is performed using a FACSAria instrument with FACSDiVa software (BD Biosciences, San Jose, CA, USA).

Statistical analysis
SPSS version 22.0 (IBM Corporation, Armonk, NY) and GraphPad Prism version 8.0 (La Jolla, CA, USA) were used for statistical analyses. Chi-square test was performed for comparison of patients' clinicopathological characteristics. Shapiro-Wilk test was performed to test data normality before using student's t test for analysis. All data which was involved in student's t test accorded with the normal distribution pattern. The 2-tailed Student t test was used for the two groups in the in vitro data, and each experiment was repeated three times. Cox regression was performed to analyze the relationship between the IHC markers and patients' prognosis. Kaplan-Meier analysis was performed to investigate the relationship between the IHC markers and patients' prognosis. FlowJo version 10.0.7 (Tree Star, Inc., Ashland, OR, USA) was used to analyze cytotoxicity assay data. P-values indicating significant differences were shown as follows: ***P < 0.001, **P < 0.01, *P < 0.05.

Results
Identification of collagen-related DEGs with predictive value We identified 33 and 32 collagen-related differentially expressed genes between tumor and adjacent normal tissues from the TCGA database and GEO database, respectively, of which 6 collagen-related DEGs were significantly associated with OS ( Fig. 1A, P < 0.05). Forest plots showed the correlation between gene expression and OS (Fig. 1B). Heat maps showed that tumors and adjacent normal tissues could be distinguished by DEGs (Fig. 1C, from TCGA database; Fig. 1D, from GEO database). Protein-protein interaction network showed DEGs interactions at the protein level (Fig. 1E), Fig. 1F showed the association between these genes.

Construction of a prognostic model based on 6 collagen-related DGEs associated with PTC prognosis
Based on collagen-family DEGs associated with PTC prognosis, we developed a prognostic model for PTC by LASSO Cox regression analysis. Prognostic index (PI) = (0.031 * expression level of COL5A1) + (0.954 * expression level of COL18A1) + (−0.656 * expression level of COL13A1) + (0.203 * expression level of COL11A1). We used the median risk score as the standard for classifying patients into group with high risk and group with low risk ( Fig. 2A). Patients in the high-risk group exhibited higher mortality rates relative to the low-risk group (Fig. 2B). Kaplan-Meier curves were plotted and OS was found to be lower for patients divided into the high-risk group than the low-risk group (P = 1.752e-03) (Fig. 2C). Figure 2D showed the OS predictive power of the risk score, with the area under the curve (AUC) of 0.901, 0.917, 0.812, and 0.788 at 1 year, 2 years, 3 years, and 5 years, respectively. PCA and t-SNE analysis showed that PTC patients were distributed between the two groups in two directions. The results of PCA and t-SNE analysis showed that this prognostic model could clearly distinguish patients in the highrisk group from those in the low-risk group (Fig. 2E, F). We used the same approach to analyze the data in the ICGC database to validate the findings and reached a consistent conclusion (Fig. 3A-F).

Construction and validation of the nomogram in the PTC cohort
Cox regression analysis was used to confirm whether risk scores independently predicted OS. Univariate Cox regression analysis found a significant association between risk scores and OS (HR = 3.117, 95% CI = 1.778-5.466, P < 0.001) (Fig. 4A). Consistently, multivariate Cox regression analysis found a significant association between risk score and OS (HR = 3.871, 95% CI = 1.865-8.035, P < 0.001) (Fig. 4B). Based on the 2 independent predictors, we constructed a prognostic nomogram to quantify the predicted individual survival probabilities of PTC patients at 1-, 2-, and 5-years (Fig. 4C). There was substantial agreement between the OS rates for 1-, 2-, and 5-years predicted by the nomogram and the actual observations (Fig. 4D). We used time-dependent ROC analysis for internal validation, and the results showed that the area under the curve (AUC) was 0.987, 0.993, and 0.916 for 1-, 2-, and 5-years, respectively (Fig. 4E), The nomogram further improved prediction accuracy over the use of a single gene model (Fig. 2D).

Functional analysis of collagen family DEGs in the TCGA
We performed GO enrichment and KEGG pathway analysis to identify the biological functions and pathways associated with risk scores. Figure 5A showed the top 10 BP terms, CC terms and MF terms. The main enriched GO terms were extracellular matrix organization, extracellular structure organization, collagen fibril organization, collagen−containing and extracellular matrix structural constituent. Figure 5B showed the first 18 KEGG pathways, mainly including Protein digestion and absorption, ECM − receptor interaction, PI3K − Akt signaling pathway and Focal adhesion. These functions play a key role in tumorigenesis and development. These results suggested that collagenassociated DEGs were closely associated with the development and biological features of cancer.
Correlation of collagen-related DGEs with tumor immune microenvironment and drug sensitivity   scores. ESTIMATE scores, a combination of stromal and immune scores, indicated that high expression levels of COL5A1, and COL11A1 were significantly associated with lower tumor purity (Fig. 6B). In addition, the relationship between four collagen family genes and drugs were explored (Fig. 6C). The top 16 most relevant ones were visualized.

Validation of the collagen -related genes in thyroid cancer cells
Among the four collagen-related DGEs, we selected COL18A1 with the highest risk on OS evaluation in all four genes to validate its function in PTC. First, we verified the expression of COL18A1 in PTC tissues and adjacent normal thyroid tissues. Immunohistochemical results showed that COL18A1 was differentially elevated in PTC tissues compared to normal thyroid tissues (Fig. 7A, B). High expression of COL18A1 were significantly related with larger tumor size, worse T stage and TNM stage (Table 3). In addition, we examined the correlation between the COL18A1 immunohistochemistry score and OS in patients with PTC, suggesting that patients with higher immunohistochemistry scores have a worse prognosis. (Fig. 7C). The results of the Cox proportional risk model analysis were shown in Table 4. In the multifactorial analysis, lymph node metastasis (P = 0.019, HR = 0.118), TNM stage (P = 0.029, HR = 10.862), and COL18A1 expression (P = 0.012, HR = 5.156) were the factors that significantly correlated with poor progression free survival of PTC patients. In addition, lymph node metastasis (P = 0.010, HR = 0.136), TNM stage (P < 0.001, HR = 33.508), and COL18A1 expression (P = 0.004, HR = 5.092) were the factors that significantly correlated with poor overall survival of PTC patients. These results indicated that lymph node metastasis, TNM stage and COL18A1 expression were independent risk factors for overall survival and progression free survival of PTC patients. We also examined the mRNA expression levels of COL18A1 in different thyroid cancer cell lines by RT-qPCR (Fig. 7D). The results showed that COL18A1 was significantly highly expressed in thyroid cancer tissues. In addition, the expression level of COL18A1 was significantly reduced by siCOL18A1 in TPC-1 and BCPAP cells (Fig. 7E). We also found that the killing ability of PBMCs was significantly enhanced after knockdown of COL18A1 in TPC-1 and BCPAP cells. These data suggested that COL18A1 might be involved in the progression of thyroid cancer (Fig. 7F). Then, the CCK-8 assay indicated that COL18A1 knockdown repressed the proliferation of thyroid cancer cells (Fig. 7G). Finally, we found the migration and invasion ability of thyroid cancer cell was significantly reduced after COL18A1 knockdown (Fig. 7H).

Discussion
Collagen family genes play an important role in the formation of the extracellular matrix. A growing number of studies have identified collagen family genes as playing an important role in the development of various cancers.
Recent study showed that the distribution and structural characterization of collagen within the PTC nodule capsules could be used to differentiate PTC from follicular adenomas nodules [18]. However, it is not clear whether collagen family genes expression level affect the clinical prognosis of patients with PTC. Studying the molecular mechanisms and signaling pathways of collagen family genes can provide a theoretical basis for the development of new therapeutic targets for thyroid cancer. In this study, we constructed a 4 genes model associated with collagen family genes, which could be a promising tool to predict the prognosis of thyroid cancer.
We constructed a model of 4 genes associated with collagen family genes to predict thyroid prognosis by LASSO Cox regression analysis. For patients with thyroid cancer, the high-risk score suggested a lower prognosis than the low-risk score. ROC curve confirmed that the model had accurate and sensitive predictive efficacy. Multivariate COX regression analysis showed that the signature  B Correlation of target gene expression with stromal score, immune score and ESTIMATE score, respectively. Spearman's correlation analysis was used to calculate the association. C Correlation between gene expression levels and drugs. The 16 drugs most related to the COL18A1, COL5A1, COL11A1, and COL13A1 were visualized independently predicted survival in thyroid cancer. Of these, COL5A1, COL18A1, COL13A1, COL11A1 were risk factors for thyroid cancer prognosis. Collagen type V (COL5) is classified as a regulatory fibril-forming collagen and consists of a combination of three different chains, namely COL5A1, COL5A2, and COL5A3 [20,21]. COL5 H Migration/invasion assay in BCPAP and TPC-1 cells after pre-treatment with siCOL18A1 transfection is found in most connective tissue stroma and has been found to play a functional role in a number of cancers, including breast, colon and pancreatic ductal adenocarcinoma [22][23][24]. Previous study found COL5A1 might be a new prognostic factor for patients with lung adenocarcinoma [25]. In addition, high levels of COL5A1 accelerate the growth and progression in renal cell carcinoma and breast cancer [26,27]. COL18A1, also known as endothelial inhibitor, inhibits the migration and proliferation of endothelial cells [28,29] and induces apoptosis [30]. It has been shown to inhibit the recruitment of peripheral cells into capillaries in vivo, thereby reducing the number of mature blood vessels [31]. It also inhibits growth factor-induced entry of endothelial progenitor cells into the circulatory system, thereby inhibiting angiogenesis [32]. Interestingly, our study on the role of COL18A1 in thyroid cancer has come to a different conclusion. We found that COL18A1 is activated in thyroid cancer, which was further confirmed by IHC assay and cell culture models. Collagen type XIII (COL13) is a transmembrane protein that is located at the cell-cell and cell-ECM junctions [33,34]. Previous studies have found that downregulation of COL13A1 resulted in the observation of an E-calmodulin to N-calmodulin switch, suggesting an association between COL13A1 and the promotion of extracellular matrix. COL13A1 may be an activator of the PI3K/AKT pathway, which plays a key role in regulating a variety of cellular functions including metabolism, growth, proliferation, survival, transcription and protein synthesis [35]. COL11A1 is expressed and secreted primarily by a subset of cancer-associated fibroblasts and regulates tumor-stromal interactions and the mechanical  properties of the extracellular matrix. COL11A1 also promotes cancer cell migration, metastasis and treatment resistance by activating pro-survival pathways and regulating tumor metabolic phenotypes [36]. High levels of COL11A1 are often associated with an aggressive tumor phenotype and poor prognosis in a variety of solid tumors including ovarian, breast, pancreatic and colorectal cancers [37,38]. However, the function of these 4 collagen family genes in thyroid cancer remains to be clarified. Collagen I, code by COL1A1 and COL1A2, is the main component of stroma matrix. Collagen I processing and remodeling significantly affect tumor ECM topography to promote tumor progression and metastasis, and have the potential to contribute to predicting patient outcomes in various cancers, including lung adenocarcinoma, breast cancer, and pancreatic cancer [39,40]. Our study also included COL1A1 and COL1A2 (Table 1) to analyze the potential clinical value of both genes. Although Cox analysis suggested that the HR value of COL1A1 (HR = 1.165, 95%CI: 0.908-1.496, P = 0.230) and COL1A2 (HR = 1.267,95% CI: 0.942-1.703, P = 0.118) was greater than 1, the correlation between the mRNA expression levels and the prognosis of TC patients was not significant. Future studies should include the structural parameters and accumulation level to further analyze the role of Collagen I in thyroid cancer progression. Multivariate Cox regression analysis has shown that age is an independent risk factor for thyroid cancer. Consistently, age has been commonly applied as a risk factor for prognostic stratification of thyroid cancer [41]. To extend the clinical application of the model, we combined it with age to construct a nomogram that could be easily calculated for the expected survival of individual patients. Calibration curves confirmed that the 1-, 2-, and 5-year survival times predicted by the nomogram were close to the actual survival times. This suggests that the nomogram has the potential to be an important tool for predicting the prognosis of thyroid cancer in clinical practice.
Functional enrichment analysis of collagen-related differential genes revealed that extracellular matrix organization, extracellular structure organization, collagen fibril organization, collagen−containing and extracellular matrix structural constituent, as well as protein ECM-receptor interactions, PI3K-Akt signaling pathway and focal adhesion were all enriched. This is consistent with previous findings. Previous studies have found that collagen plays a regulatory role in tumor growth and invasion. Collagen, the most abundant component of the ECM, is excessively deposited in the extracellular matrix, forming the tumor microenvironment and regulating the polarity and migratory capacity of cancer cells [42]. The massive deposition of collagen enhances tumor cell proliferation and immune cell infiltration [43][44][45].
Many patients with thyroid cancer not associated with thyroiditis are found to have immune infiltration postoperatively, suggesting that alterations in the immune microenvironment are associated with the progression of thyroid cancer [46]. Our findings suggested that this model was significantly associated with aDC cell as well as NK cell infiltration. Immunotherapy has been a hot topic of research in tumor-targeted therapy in recent years, but its role in thyroid cancer is unclear. The relationship between collagen gene-related models and the tumor microenvironment needs further analysis.
In addition, we further analyzed the correlation between COL5A1, COL18A1, COL13A1, and COL11A1 gene expression and drugs. Most of the drugs related with collagen family genes expression are involved in PI3K-AKT pathway signaling inhibition, indicating a relationship between collagen family DEGs expression and AKT pathway activation, which also exist in KEGG analysis. It provides a theoretical basis for the combined application of drug therapy and collagen-targeted therapy. While we report the potential correlation between collagen family and AKT pathway, we cannot directly infer causality from above results. Future studies using cell culture or animal models would be useful in evaluating the impact of collagen family on mentioned drug's efficacy and the directionality of associations between collagen family expression and AKT pathway activation.
In many types of cancer, stromal fibroblasts and/or myofibroblasts are the main cell types responsible for increased collagen production in solid tumors. Cancer cells produce factors that induce stromal cell proliferation and/or deposition of extracellular matrix. It has also been found that tumor cells, especially undifferentiated thyroid cancer cells, also secrete collagen and are associated with an aggressive biological behavior in recent years. In the univariate Cox regression analysis between prognostic collagen-related DEGs expression and OS, COL18A1 high expression significantly correlated with poor patient outcome and have the highest HR value. In the following LASSO regression analysis, COL18A1 has the largest weight as one of the candidate genes constituting the prediction model, and until now, the role and function of COL18A1 in thyroid cancer remains unclear, so we choose it for research. Our study is the first to analyze the expression of the COL18A1 gene and to demonstrate that abnormally high expression of this gene promotes tumor progression in PTC cells. We confirmed that COL18A1 was activated in thyroid cancer and predicted poor prognosis by immunohistochemistry assay. In addition, we determined the higher mRNA expression levels of COL18A1 in different thyroid cancer cell lines than thyroid cell line by RT-qPCR. Reducing the expression level of COL18A1 in PTC cell line significantly suppressed the proliferation ability, migration ability, invasion ability and immune escape ability of PTC cells. The results were consistent with their mRNA levels in TCGA sequencing data, supporting our bioinformatic analysis. This study may provide new clues to explore the biological role and clinical significance of these collagen family genes. The results suggest thated high expression of COL18A1 indicated poor prognosis and increases immune escape ability in thyroid cancer.

Conclusion
In conclusion, we constructed a 4 collagen family genes model and showed good predictive performance. The model was significantly correlated with the tumor immune microenvironment. The model combining the collagen risk score and patient age could offer new possibilities for individualized treatment of thyroid cancer patients. Collagen family genes, especially COL18A1, significantly predict patient prognosis and may be promising therapeutic targets for PTC treatment.

Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.