Data preparation
The gene expression profiles and clinical information of osteosarcoma patients (TARGET-OS) was downloaded as training set from the TCGA database (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga) (7, 8). Clinical information including survival time, survival state, gender, age, disease at diagnosis, primary tumor site, specific tumor region and definitive surgery. The autophagy information used for selecting autophagy relative genes (ARGs) was downloaded from Human Autophagy Database (HADb) (http://www.autophagy.lu/index.html) (9). In order to verify the accuracy of risk model and nomogram prediction, two datasets (GSE16091, GSE39058) were downloaded as validation set from the GEO database (https://www.ncbi.nlm.nih.gov/geo). Since the two datasets come from different platforms, we first use the normalizeBetweenArrays function to correct the datasets, then merge the two datasets and use "ComBat" package of R to perform batch corrections between the datasets.
Construction of autophagy prediction of risk score model
Univariate Cox regression analysis was conducted for OS related ARGs, and the clinical factor was taken into consideration. Then, a multivariate Cox regression analysis model along with the LASSO method for variable selection and shrinkage was applied to build a autophagy prediction of risk model by using the GLMNET package (https://CRAN.R-project.org/package=glmnet) (5). The penalty regularization parameter k was determined via the cross-validation routine cv. glmnet before running the main algorithm with an n-fold value equal to 10. The k value was finalized by using lambda. min, which is the value of lambda giving minimum mean cross-validated error (10, 11). Genes with corresponding efficiencies were screened from the TARGET-OS training set and used to construct a risk score model for autophagy prediction. According to the model, the risk score of each person was calculated. The algorithm of the risk score was as follows: ∑nx=1(coefx×Exprx),in which coefx was the regression coefficient of key ARGs in patient x obtained by multivariate Cox proportional risk regression analysis, and Exprx represents the expression level of one of the key ARGs in patient x (12). Patients were assigned to high- and low- risk groups according to the median of risk scores. Then, we used the "Survival" package of R to draw a Kaplan–Meier survival curve. In order to avoid the mutual influence between risk factors, we carried out principal component analysis (PCA) and performed dimensionality reduction. According to several key ARGs and risk scores in the risk model, we drew a nomogram with R's "Reglot" package to predict the 1-, 3- and 5-year survival rates of OS patients. The scale on the nomogram line represents the range of values for each variable, and the total score calculated for each variable could be used to predict survival (13).
Validation of the risk score model
We used the receiver operating characteristic curve (ROC) drawn by " Survival ROC" package and the calibration curve obtained by "RMS" package to evaluate the accuracy of their prediction of 1-, 3- and 5-year survival rates, and used ROC curve to verify each grouping variable (14). In addition, diagnostic value of key ARGs were validated by a dataset from GEO databases, and the diagnostic value of each key ARGs was evaluated by AUC.
Gene enrichment analysis (GO and KEGG)
We investigate molecular function (MF), biological process (BP) and cellular components (CC) of the ARGs in GO database. And selected ARGs were utilized in the functional pathway analysis of KEGG. R software and ClusterProfiler package were used to conduct the results of functional enrichment analysis. The correlation between ARGs was analyzed by Pearson's correlation coefficient using the corrplot package.
Gene set enrichment analysis (GSEA)
GSEA analysis was performed to identify the potential function of selected hub genes. Genome‑wide expression profile datasets and corresponding grouping files determined by the expression of ARGs were uploaded to GSEA4.0.3 software (15) for enrichment analysis with database C2 and C5 of the Molecular Signatures Database (MSigDB) (16). A set of genes with |normal enrichment score| (|NES|)>1, false discovery rate (FDR) <0.25 and P<0.05 was considered to be statistically significant.
Immune infiltrating
The proportion of the 29 immune signatures was quantified (16 immune cell and 13 immune-associated function signatures) in each OS sample. The ESTIMATE algorithm was applied to evaluate the immune cell infiltration level (immune score) and the stromal content (stromal score) for each OS sample. The ConsensuClusterPlus R package was used to perform consensus clustering of each OS sample based on the 29 immune signature.
Prediction of response to targeted therapy drugs
Half maximal inhibitory concentration (IC50) of common targeted therapy drugs was calculated to evaluate clinical responses to OS treatment using pRRophetic [13] and ggplot2 packages in R. The relationship between IC50s and high- and low-risk groups was represented by box plots.
Immunohistochemistry
We obtained 5 OS and 5 paracancerous samples from Guangxi Medical University First Affiliated Hospital for immunohistochemistry. All patients were diagnosed as OS by pathology. The clinicopathological data, such as gender and age were collected.
The tissue slices were placed in xylene for 20min, then replaced with fresh xylene and repeated once. The dewaxed tissue slice was soaked in 100% ethanol for 5min twice, 95% ethanol, 80% ethanol and distilled water for 5min respectively. The alkaline antigen repair solution (Tri-EDTA, pH=9) was heated to the boil in a pressure cooker, the tissue slice was placed in, and the time was timed for 2 min, then cooled to room temperature naturally (17). Incubate with 3% H2O2 at room temperature for 10 min in dark, and then block with normal sheep serum solution at room temperature for 30min. The primary antibody RP215 was added at 4℃ overnight and HRP-labeled secondary antibody was treated at room temperature for 30min. DAB color, hematoxylin redyeing (17).
Ten high-power (400×) visual fields were randomly selected, and two researchers independently read the images. The nucleus of hematoxylin stained is blue, and the positive expression of DAB is brownish yellow.
Statistical analysis.
All data were statistically analyzed using SPSS22.0(IBM) and R 3.6.2 (https://www.r project.org/). Risk ratio (HR) and 95% confidence interval (CI) were used to represent the relative risk between each variable and the prognosis of OS patients. All results P <0.05 were considered statistically significant (6).