1.1 Materials
1.1.1 GEO database
The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/)[8] is the world's largest public repository of high-throughput molecular abundance data, mainly gene expression data. Users can submit, store, and retrieve data in multiple formats and use them free of charge. We searched GEO using "cervical cancer" as the keyword, restricting the study type to "expression profiling by array" and the species to Homo sapiens, to retrieve the reported gene expression profile datasets related to radiotherapy for cervical cancer. From these, we selected controlled studies that compared radiotherapy-sensitive with radiotherapy-resistant cervical cancer.
Relevant data were retrieved from the GEO database up to December 2015. Selected datasets had to meet the following criteria: (1) the dataset must be genome-wide mRNA expression microarray data for cervical cancer; (2) the data must come from controlled studies of radiotherapy sensitivity versus radiotherapy resistance in squamous cervical cancer; (3) the case (radiotherapy-resistant) and control (radiotherapy-sensitive) groups must each contain 3 or more samples; (4) clear information on the sensitivity or resistance of each sample must be given. Datasets meeting these criteria were included in this study. GSE19526, an in vitro study of radiotherapy sensitivity, was excluded because it had only 1 sample, and GSE6213 was excluded because it lacked radiotherapy-sensitive samples. Only 2 datasets finally met our criteria: GSE56303[9] and GSE56363[4] (see Table 1). Both are grouped gene expression datasets for evaluating the short-term outcome after radiotherapy in patients with intermediate to advanced cervical cancer; the difference is that GSE56303 includes some adenocarcinoma specimens, whereas the GSE56363 specimens are all squamous carcinomas.
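The four inclusion criteria can be read as a simple filter over dataset metadata. The following is a minimal Python sketch of that reading, not the procedure actually used to screen GEO. The record fields, the `meets_criteria` helper, and the placeholder series are all assumptions made for illustration; none of the values describe real GEO series.

```python
from dataclasses import dataclass

MIN_SAMPLES_PER_GROUP = 3  # criterion (3): each group needs 3 or more samples


@dataclass
class Dataset:
    """Hypothetical metadata record for one series (field names are assumptions)."""
    accession: str
    genome_wide_mrna_array: bool    # criterion (1)
    sensitive_vs_resistant: bool    # criterion (2)
    n_resistant: int                # size of the case group
    n_sensitive: int                # size of the control group
    labels_complete: bool           # criterion (4)


def meets_criteria(ds: Dataset) -> bool:
    """Return True only if all four inclusion criteria hold."""
    return (
        ds.genome_wide_mrna_array
        and ds.sensitive_vs_resistant
        and ds.n_resistant >= MIN_SAMPLES_PER_GROUP
        and ds.n_sensitive >= MIN_SAMPLES_PER_GROUP
        and ds.labels_complete
    )


# Placeholder records, invented for this sketch (not real GEO series).
candidates = [
    Dataset("SERIES_A", True, True, 10, 12, True),
    Dataset("SERIES_B", True, True, 1, 0, True),    # too few samples
    Dataset("SERIES_C", True, False, 8, 0, True),   # no sensitive group
]

selected = [ds.accession for ds in candidates if meets_criteria(ds)]
print(selected)
```

Only the first placeholder record passes, because the other two each fail at least one criterion.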
1.1.2 proteinatlas data
The proteinatlas data, published in Science by Mathias Uhlen et al.[10] in 2017, are transcriptome-wide data from a systems-level analysis of protein-coding genes associated with clinical outcome in 17 major cancer types, including cervical cancer. All data are available at http://www.proteinatlas.org/humanproteome/pathology and provided the data for our analysis of prognosis-related indicators.
1.1.3 STRING database
STRING (Search Tool for the Retrieval of Interacting Genes/Proteins, https://string-db.org/) is a database of known and predicted protein-protein interactions and is currently one of the largest resources of protein interaction information. It contains experimental data, text-mining results from PubMed, and data integrated from other databases, and it also provides interactions predicted by bioinformatics methods, including analyses based on microarray data. Studying protein interactions through this database helps to identify core regulatory genes.
1.1.3 CCDB data
The Cervical Cancer Gene Database (CCDB, http://crdd.osdd.net/raghava/ccdb/index.php)[11] is a manually curated catalog of experimentally validated genes involved in the different stages of cervical carcinogenesis. Each entry contains information about the gene and protein sequences, their location, structure, function, chromosomal position, accession number, gene and CDS size, etc. In addition, the database is richly cross-referenced with databases such as Unigene, HPRD, HGNC, Ensembl, and OMIM. CCDB also provides references to the relevant literature for the genes it includes.
1.1.4 OncoLnc data
OncoLnc (http://www.oncolnc.org/) is an online analysis tool built on TCGA data that allows survival analysis for mRNAs, miRNAs, or lncRNAs and includes survival information for cervical cancer. Genes of interest can be analyzed online at this URL to determine whether their expression is associated with survival.
1.1.5 Immunohistochemistry
1.1.5.1 Experimental objects
Patients with intermediate to advanced cervical squamous carcinoma were selected. A total of 87 specimens preserved in the Department of Pathology of the Cancer Hospital of Guangxi Medical University from January 2011 to June 2015 were included: 44 cases in the radiotherapy-sensitive group and 43 cases in the radiotherapy-resistant group. From each paraffin block, a pathologist cut 5 sections of 5 μm each, which were mounted on anti-detachment slides, baked in a constant-temperature oven at 60°C for 2 hours, cooled, and stored in slide boxes until use.
1.1.5.2 Main reagents
1) Mouse anti-human ASPH polyclonal antibody (IgG), purchased from Bio-Swamp
2) Universal secondary antibody, Beijing Lambert Biotechnology Co.
3) DAB staining solution, Nanjing Jiancheng Biotechnology Co.
4) Other reagents, such as sodium citrate buffer, PBS, hydrogen peroxide, ethanol at different concentrations, xylene, hematoxylin, and neutral resin, were provided by the laboratory of the Department of Pathology, Southwest Medical University Hospital.
1.1.5.3 Main experimental apparatus
General microscopes, Olympus microscopes, autoclaves, refrigerators, thermostats, spikers, etc. were provided by the laboratory of the Department of Pathology, Southwest Medical University Hospital.
1.2 Method
1.2.1 Batch effect processing
In sequencing, microarray, RNA-seq, DNA methylation, proteomics, and other omics studies, samples are often collected and processed in multiple batches and at different times, and fluorescent signals must be converted into digital information. Technical factors such as the analysis platform, laboratory, experimenters, programmers, or reagents can therefore introduce systematic differences during processing. These differences are called batch effects, and they can bias downstream analyses and lead to incorrect results. Batch effects therefore need to be assessed and corrected, using approaches that include data visualization, hierarchical clustering, principal component analysis, and analysis of variance.
In practice, one must first decide whether batch correction is needed, and then how to correct and which tool to use. BatchQC[12] is an R package with an interactive Shiny app that addresses these questions by providing interactive diagnostics, visualization, and statistical analysis. BatchQC can also apply several existing batch correction methods, so that the user can compare their advantages and disadvantages interactively and choose the final batch-corrected result. Its output is organized into multiple tabs and can be compiled into a report containing a summary, differential expression analysis, significance of results, median correlations, heat maps, expression correlation between samples, and PCA, which makes the tool a good choice.
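To make the idea of an analysis-of-variance check for batch effects concrete, the following minimal Python sketch computes a one-way ANOVA F statistic for a single gene across batches. It is not BatchQC code and does not reproduce its diagnostics; the function name and the expression values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (one group per batch)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of batches
    n = len(all_values)    # total number of samples
    # Variation of the batch means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own batch mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        raise ValueError("no within-batch variation; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical log-expression values of one gene in two batches.
batch_1 = [5.0, 5.2, 4.8]
batch_2 = [7.0, 7.1, 6.9]

f_stat = one_way_anova_f([batch_1, batch_2])
print(round(f_stat, 2))
```

A large F statistic for many genes, with batch as the grouping factor, suggests that batch membership explains much of the variation and that correction should be considered.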
1.2.2 Screening for differentially expressed genes
Microarrays and related technologies yield large amounts of gene expression data, but these data are characterized by small sample sizes, high dimensionality, and strong correlation. Genes with small changes may still differ significantly, so a statistical method is needed to identify the genes with meaningful, significant differences. Commonly used methods include ANOVA and the t-test, usually combined with a doubling or halving of the expression level as the criterion for differential expression.
Differentially expressed genes were screened using the limma package. limma is an R/Bioconductor package that covers every major step of gene expression analysis, from data import, preprocessing, and normalization to differential expression analysis and gene characterization. We first converted the datasets from GEO into gene expression matrices, then preprocessed and normalized them with the limma package, specified the study and control groups, and finally identified differentially expressed genes using the lmFit function. In this study, P≤0.05 was taken as the threshold and combined with fold change (FC) analysis: a gene was considered up-regulated when FC≥2 and down-regulated when FC≤0.5.
The limma package is available from the Bioconductor web page at http://www.bioconductor.org/. The expression profile datasets downloaded from GEO were preprocessed and normalized with the algorithms of the limma package, the study and control groups were set, and differentially expressed genes were extracted using thresholds on the P-value and logFC. P<0.05 was used as the criterion for screening differentially expressed genes. In the fold change (FC) analysis, logFC<0 indicated down-regulated expression and logFC>0 indicated up-regulated expression.
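The thresholding step can be illustrated with a short Python sketch. It only shows how a P-value cut-off and a fold-change cut-off combine into an up/down call; it is not the limma workflow itself, which runs in R. The sketch assumes P≤0.05, FC≥2, and FC≤0.5 as cut-offs and assumes that logFC is a base-2 logarithm, as limma reports it. The gene names and values are hypothetical.

```python
import math

P_THRESHOLD = 0.05   # significance cut-off (assumed inclusive)
FC_UP = 2.0          # fold change at or above this: up-regulated
FC_DOWN = 0.5        # fold change at or below this: down-regulated


def classify_gene(p_value: float, fold_change: float) -> str:
    """Label a gene as 'up', 'down', or 'unchanged' from P-value and fold change."""
    if p_value > P_THRESHOLD:
        return "unchanged"
    if fold_change >= FC_UP:
        return "up"
    if fold_change <= FC_DOWN:
        return "down"
    return "unchanged"


# On the log2 scale the two fold-change cut-offs are symmetric around zero.
log2_up = math.log2(FC_UP)       # 1.0
log2_down = math.log2(FC_DOWN)   # -1.0

# Hypothetical (P-value, fold change) pairs, invented for this sketch.
genes = {
    "GENE_A": (0.01, 3.0),
    "GENE_B": (0.03, 0.4),
    "GENE_C": (0.20, 4.0),
    "GENE_D": (0.01, 1.2),
}

results = {name: classify_gene(p, fc) for name, (p, fc) in genes.items()}
print(results)
```

Under these assumptions, FC≥2 corresponds to logFC≥1 and FC≤0.5 to logFC≤-1, so the sign of logFC gives the direction of regulation and its magnitude gives the size of the change.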
1.2.3 Gene function annotation
DAVID (Database for Annotation, Visualization and Integrated Discovery, https://david.ncifcrf.gov/)[5, 7] is a bioinformatics resource that integrates a large body of biological data with analytical tools. Given a large list of genes or proteins, DAVID provides systematic and comprehensive annotation of their biological functions. It is mainly used for gene probe conversion, gene functional classification, gene enrichment analysis, and functional annotation clustering.
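The idea behind gene enrichment analysis can be sketched with a plain hypergeometric test: given a gene list, how surprising is it to see at least k genes carrying a given annotation? The Python sketch below is not DAVID's own calculation (DAVID reports a modified Fisher exact P-value, the EASE score); the function name and the counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N background genes,
    of which K carry the annotation of interest."""
    upper = min(n, K)
    total = comb(N, n)
    # comb() returns 0 for impossible combinations, so no extra bounds check is needed
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts: 20 background genes, 5 of them annotated,
# and a list of 4 genes containing 2 annotated ones.
p = hypergeom_upper_tail(k=2, n=4, K=5, N=20)
print(round(p, 4))
```

A small P-value indicates that the annotation is over-represented in the gene list compared with the background.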
1.2.4 Immunohistochemistry
1.2.4.1 Experimental methods and procedures
1) Dewaxing and hydration of slices
The prepared paraffin sections were placed in xylene I for 10 min, xylene II for 10 min, 100% ethanol for 5 min, 100% ethanol for 54 min, 95% ethanol for 5 min, 85% ethanol for 2 min, 70% ethanol for 2 min, and 50% ethanol for 2 min, then washed by immersion in distilled water twice for 5 min each, and finally placed in 0.01 M PBS solution for 5 min.
2) Antigen retrieval
Sections were immersed in 0.01 M citrate buffer (pH 6.0), heated to boiling in an autoclave, and then cooled naturally for 35 min, followed by rinsing with PBS 3 times for 2 min each.
3) Blocking of endogenous peroxidase
Add 100 μL of endogenous peroxidase blocker, incubate for 10 min at room temperature, and rinse 3 times with PBS for 2 min each time.
4) Addition of primary antibody
After blotting the blocking solution around the section dry with filter paper, add the primary antibody (mouse anti-human ASPH polyclonal antibody; PBS as negative control) at a dilution of 1:200, incubate in a 37°C incubator for 1 hour, and then rinse with PBS 3 times for 3 minutes each time.
5) Add secondary antibody
After absorbing the excess liquid on the slide with filter paper, add the universal secondary antibody to each slide in turn, incubate for 30 minutes at 27°C, and then rinse 3 times with PBS for 3 minutes each time.
6) DAB color development
Prepare DAB chromogenic solution, add 100 μL of freshly prepared DAB chromogenic solution to each section and incubate for 5 min at room temperature.
7) Counterstaining
After rinsing the sections under running water, 100 μL of hematoxylin solution was added for counterstaining, and the sections were rinsed with distilled water for 5 minutes.
8) Dehydration, clearing, and mounting
Place the rinsed sections in 70%, 80%, 90%, 95%, 100%, and 100% ethanol for 2 minutes each, then immerse them in xylene I for 2 minutes and xylene II for 2 minutes. Wipe off the xylene around the sections, add a drop of neutral resin next to the tissue, cover with a coverslip, label the slide, and let it dry naturally at room temperature.
1.2.4.2 Determination of immunohistochemical results
Positive protein expression was located in the cytoplasm or cell membrane of cervical tissues; brownish-yellow granules were considered positive and no staining was considered negative. Results were scored using the semi-quantitative integration method. Five fields of each section were randomly selected at 400× magnification, and the average percentage of positive cells across these fields was calculated. The percentage of positive cells was scored as follows: 0%-5% was scored 0, 6%-30% was scored 1, 31%-50% was scored 2, 51%-75% was scored 3, and >75% was scored 4. Staining intensity was scored as 0 for no staining, 1 for light yellow, 2 for yellow, and 3 for reddish-brown. The percentage score and the intensity score were multiplied to give a total score, graded as follows: 0 as -, 1 to 4 as +, 5 to 8 as ++, and 9 to 12 as ++++, with + considered as low expression and +++ to ++++ as high expression. All 87 specimens were scored independently by 2 pathologists.
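The semi-quantitative scoring above can be sketched in a few lines of Python. This is an illustrative reading of the scheme, not software used in the study. Two points are assumptions: averaged percentages that fall between the stated integer bands are assigned using the cut-offs >5, >30, >50, and >75, and percentages of 5% or less are scored 0. The function names and field values are hypothetical, and the total score is reported as a numeric band rather than as a plus-sign grade.

```python
from statistics import mean


def percentage_score(percent_positive: float) -> int:
    """Map the mean percentage of positive cells to a score of 0-4."""
    if percent_positive > 75:
        return 4
    if percent_positive > 50:
        return 3
    if percent_positive > 30:
        return 2
    if percent_positive > 5:   # assumption: 5% or less is scored 0
        return 1
    return 0


def total_score(field_percentages, intensity: int) -> int:
    """Multiply the percentage score (0-4) by the intensity score (0-3)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    return percentage_score(mean(field_percentages)) * intensity


def score_band(total: int) -> str:
    """Return the band of the total score used for grading."""
    if total == 0:
        return "0"
    if total <= 4:
        return "1-4"
    if total <= 8:
        return "5-8"
    return "9-12"


# Hypothetical section: five fields at 400x, yellow staining (intensity 2).
fields = [40, 55, 60, 45, 50]
score = total_score(fields, intensity=2)
print(score, score_band(score))
```

With two independent raters, each would produce such a total score per specimen; how disagreements between raters were resolved is not described here.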
1.2.4.3 Statistical analysis
All data were analyzed using R version 3.4.3 with the survival package (https://CRAN.R-project.org/package=survival). Survival curves and survival rates were estimated using the Kaplan-Meier method, differences in survival between groups were tested using the log-rank test, and P-values were obtained using the chi-square test. Univariate and multivariate analyses were performed using the Cox proportional hazards regression model, with hazard ratios reported with 95% confidence intervals (95% CI). All tests were two-sided, and P<0.05 was considered statistically significant.
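For readers unfamiliar with the Kaplan-Meier method, the following minimal Python sketch shows how the survival estimate is built up at each event time. It is not the R survival package used for the analysis and omits confidence intervals and the log-rank test. The function name and the follow-up data are hypothetical; subjects censored at an event time are assumed to be still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  -- follow-up time of each subject
    events -- 1 if the event was observed, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) and event indicators.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 2))
```

Each step multiplies the running estimate by the fraction of at-risk subjects who survive that event time, which is why censored subjects lower the number at risk without producing a drop in the curve.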