Subjects
This cross-sectional study included 111 healthy controls and 287 patients diagnosed with lung adenocarcinoma at the Department of Respiratory Medicine of the Second Affiliated Hospital of Harbin Medical University (Harbin, Heilongjiang, China) between 2013 and 2015. Diagnoses were established by surgery and pathological assessment. The 111 healthy controls were randomly selected, age-matched subjects without any familial or personal history of autoimmune diseases or malignancies, who received annual physical examinations at the same hospital. Histological classification and stage were assessed according to the lung cancer classification developed by the WHO in 2009. Patient samples were collected prior to treatment, since different treatments may affect the clinical indicator measurements. All patients and controls were recruited from the northeastern Chinese Han population. Table 1 lists the clinicopathological features of the patients and the controls. Each participant provided written informed consent for blood collection and subsequent analysis. The ethics committee of the same hospital approved the study.
Six clinical indicators, namely carcinoembryonic antigen (CEA), neutrophilic granulocytes (GRAN), lactate dehydrogenase (LDH), lymphocytes (LYM), neutrophil-to-lymphocyte ratio (NLR) and white blood cells (WBC), were measured in all samples (Chen et al. 2018).
DNA extraction and genotyping
Genomic DNA was extracted from 500 μl of EDTA-anticoagulated venous blood using the TIANamp Blood DNA Kit (TIANGEN, Beijing, China). Four SNPs of the PD-1 gene were genotyped: rs2227981, rs2227982, rs36084323 and rs7421861. Genotyping was performed with the SNaPshot Multiplex Kit (PE Applied Biosystems, Warrington, UK and Foster City, CA, USA). Primer 3.0 was used to design the primers for PCR amplification. The primer sequences for each SNP were as follows:
rs2227981: 5’-TCTCCTGAGGAAATGCGCTGAC-3’ (forward) and 5’-TGGTGTCCCCAGATCACACAGA-3’ (reverse);
rs2227982: 5’-TCTCCTGAGGAAATGCGCTGAC-3’ (forward) and 5’-TGGTGTCCCCAGATCACACAGA-3’ (reverse);
rs36084323: 5’-CTCCCATTCTGTCGGAGCCTCT-3’ (forward) and 5’-GAAGGGGAGGTCAGCCTCACAG-3’ (reverse);
rs7421861: 5’-CCCAGCTGGAATGTCATTGAGAA-3’ (forward) and 5’-TTACACTCCCCTGTGCCAGAGC-3’ (reverse).
PCR was performed with 1 μL of DNA sample, 3.0 mM Mg2+, 1× GC-I buffer (Takara), 1 μL multiplex PCR primers, 0.3 mM dNTP and 1 unit HotStarTaq polymerase (Qiagen, Inc.) in a total volume of 20 μL. The PCR cycling program was as follows: 95°C for 2 min; 11 cycles of 94°C for 20 sec, 65°C (decreased by 0.5°C per cycle) for 40 sec and 72°C for 90 sec; 24 cycles of 94°C for 20 sec, 59°C for 30 sec and 72°C for 90 sec; and a final extension at 72°C for 2 min, followed by a hold at 4°C. For purification, 5 units of shrimp alkaline phosphatase and 2 units of Exonuclease I were added to the PCR product, which was incubated at 37°C for 1 h and then inactivated at 75°C for 15 min. The SNaPshot multiplex single-base extension reaction was performed using 5 μL SNaPshot Multiplex Kit (Applied Biosystems), 0.5 μL 5’ ligase primer mixture (1.2 µM), 0.5 μL 3’ ligase primer mixture (1.6 µM), 2 μL ddH2O and 2 μL purified PCR product in a final volume of 10 μL. The reactions were cycled as follows: 28 cycles of 96°C for 10 sec, 55°C for 5 sec and 60°C for 30 sec, after which the products were kept at 4°C. Next, 0.5 μL of purified extension product was combined with 0.5 μL Liz120 Size Standard and 9 μL Hi-Di, denatured at 95°C for 5 min, and then sequenced and analyzed using an ABI 3730XL DNA Analyzer and GeneMapper 4.1 (Applied Biosystems Co., Ltd., USA). The nucleotide at each SNP site was identified and recorded.
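The touchdown cycling program can be written out step by step to check the annealing temperatures and the total programmed time. The following Python sketch is illustrative only: the function name and step labels are hypothetical, and it assumes that the first touchdown cycle anneals at 65°C and that each later cycle is 0.5°C lower.

```python
def touchdown_program():
    """Return the thermal cycling steps as (label, temperature_C, seconds)."""
    steps = [("initial denaturation", 95.0, 120)]
    # Touchdown phase: 11 cycles; annealing starts at 65 C (assumption)
    # and drops by 0.5 C in each subsequent cycle.
    for cycle in range(11):
        steps.append(("denature", 94.0, 20))
        steps.append(("anneal", 65.0 - 0.5 * cycle, 40))
        steps.append(("extend", 72.0, 90))
    # Fixed phase: 24 cycles with annealing held at 59 C.
    for _ in range(24):
        steps.append(("denature", 94.0, 20))
        steps.append(("anneal", 59.0, 30))
        steps.append(("extend", 72.0, 90))
    steps.append(("final extension", 72.0, 120))
    return steps


program = touchdown_program()
anneal_temps = [temp for label, temp, _ in program if label == "anneal"]
total_seconds = sum(seconds for _, _, seconds in program)
```

Under these assumptions the last touchdown cycle anneals at 60°C, one degree above the fixed-phase annealing temperature, and the programmed hold times sum to 5250 sec (87.5 min), excluding ramp times and the final 4°C hold.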
Statistical analysis
1) Hardy-Weinberg equilibrium
Testing for Hardy-Weinberg equilibrium has long been an important tool for detecting genotyping errors and remains important for the quality control of next-generation sequencing data. A chi-square test, performed in Excel, was used to assess whether the four SNPs were in Hardy-Weinberg equilibrium.
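As an illustration of this check, the Python sketch below computes the chi-square goodness-of-fit statistic for one biallelic SNP from its three genotype counts. It is not the Excel procedure used in the study: the function name and the example counts are hypothetical, and the P value uses the closed form for one degree of freedom.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    n_aa, n_ab, n_bb are observed genotype counts for a biallelic SNP.
    Both alleles must be present, otherwise an expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 df:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: too few heterozygotes relative to expectation.
chi2, p_value = hwe_chi_square(30, 40, 30)
```

For these hypothetical counts the allele frequencies are both 0.5, the expected counts are 25, 50 and 25, and the statistic is 4, which corresponds to a P value just below 0.05.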
2) Chi-square test
The chi-square test assesses whether there is evidence of an association. It compares the observed frequencies with the frequencies expected under no association by means of the χ² statistic (Pandis 2016). The statistical formula for this test is given as Formula 1 in the supplementary files,
where O is the observed cell frequency and E is the expected cell frequency. The P value of the test was calculated in R.
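The computation behind the statistic can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the R code used in the study: the function names and the example table are hypothetical, expected counts are derived from the row and column totals in the usual way, and no continuity correction is applied (R's chisq.test applies Yates' correction to 2 × 2 tables by default, so its output can differ).

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def chi_square_p_value(chi2, df):
    """Upper-tail P value; closed forms are implemented for 1 and 2 df only."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("only df = 1 or df = 2 is supported in this sketch")


# Hypothetical genotype counts (columns) in two groups (rows).
table = [[10, 20, 30],
         [30, 20, 10]]
chi2, df = chi_square_statistic(table)
p_value = chi_square_p_value(chi2, df)
```

In this hypothetical table every expected count is 20, so the statistic is 20 with 2 degrees of freedom.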
3) Wilcoxon rank sum test
The Wilcoxon rank sum test is often used in statistical practice to compare measures of location when the underlying distribution is far from normal or not known in advance (Rosner et al. 2003). We used the Wilcoxon rank sum test to analyze differences in the six clinical indicators between TNM stages. Differences in the six indicators between early-stage (TNM I and TNM II) and late-stage (TNM III and TNM IV) samples were then tested separately in the combined-gender and single-gender samples.
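A minimal Python sketch of the rank sum test is given below for illustration. The software used for this test is not stated in the text, so this is not the authors' implementation: the function names and example values are hypothetical, and the two-sided P value relies on the normal approximation with a tie correction and without a continuity correction, which is only rough for very small groups.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Return (W, z, two-sided P) where W is the rank sum of group x."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n + 1) / 2.0
    # Each tie group shares one midrank, so counting ranks gives group sizes.
    sizes = {}
    for r in ranks:
        sizes[r] = sizes.get(r, 0) + 1
    tie_term = sum(t ** 3 - t for t in sizes.values())
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w, z, p_value


# Hypothetical indicator values for early-stage and late-stage samples.
early = [1.0, 2.0, 3.0]
late = [4.0, 5.0, 6.0]
w, z, p_value = wilcoxon_rank_sum(early, late)
```

With real data, one call per clinical indicator and per sample subset (combined, male only, female only) would reproduce the structure of the comparisons described above.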
4) Logistic regression analyses
Associations between lung adenocarcinoma stage and SNP genotypes were analyzed by logistic regression under dominant and recessive genetic models. Logistic regression analyses were performed using IBM SPSS Statistics (Statistical Package for the Social Sciences) ver. 17.0.
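The two genetic models differ only in how a genotype is coded as a binary predictor. The Python sketch below illustrates the conventional coding and is not the SPSS procedure used in the study: the function names, the choice of minor allele and the example data are hypothetical. It also relies on the fact that, for a single binary predictor with no covariates and no empty cells, the odds ratio estimated by logistic regression equals the odds ratio of the 2 × 2 table.

```python
def encode_genotype(genotype, minor_allele, model):
    """Code a genotype string such as 'AG' as 0 or 1 under a genetic model."""
    copies = genotype.count(minor_allele)
    if model == "dominant":
        # At least one copy of the minor allele versus none.
        return int(copies >= 1)
    if model == "recessive":
        # Two copies of the minor allele versus one or none.
        return int(copies == 2)
    raise ValueError("model must be 'dominant' or 'recessive'")


def odds_ratio(exposure, outcome):
    """Odds ratio of outcome = 1 for exposure = 1 versus exposure = 0."""
    pairs = list(zip(exposure, outcome))
    a = pairs.count((1, 1))  # coded 1, late stage
    b = pairs.count((1, 0))  # coded 1, early stage
    c = pairs.count((0, 1))  # coded 0, late stage
    d = pairs.count((0, 0))  # coded 0, early stage
    return (a * d) / (b * c)


# Hypothetical data: 'A' is assumed to be the minor allele; 1 = late stage.
genotypes = ["GG", "AG", "AA", "GG", "AG", "AA", "AG", "GG"]
late_stage = [0, 1, 1, 0, 0, 1, 1, 1]
dominant = [encode_genotype(g, "A", "dominant") for g in genotypes]
recessive = [encode_genotype(g, "A", "recessive") for g in genotypes]
or_dominant = odds_ratio(dominant, late_stage)
```

Any covariate adjustment would require fitting the full regression model, which this sketch does not attempt.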
5) Classification efficacy assessment
To evaluate how well the genes and the clinical indicator-encoding genes classified early-stage and late-stage samples, a Support Vector Machine (SVM) classifier was constructed. Leave-one-out cross-validation (LOOCV) was carried out to assess its performance. Receiver operating characteristic (ROC) curves were plotted and the areas under the curves (AUC) were computed. Features with better classification efficiency were considered to have greater potential as biomarkers.
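The evaluation loop can be sketched in Python as follows. To keep the example self-contained, a simple nearest-centroid scorer is swapped in for the SVM, so this shows only the LOOCV and AUC mechanics and not the classifier used in the study. All function names and data are hypothetical, and the SVM kernel, parameters and software are not specified in the text.

```python
def auc(labels, scores):
    """AUC: probability that a late-stage sample outscores an early-stage one."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


def centroid_score(train_x, train_y, x):
    """Stand-in scorer (not an SVM): higher means closer to the late-stage centroid."""
    def centroid(label):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

    return distance(x, centroid(0)) - distance(x, centroid(1))


def loocv_scores(X, y, scorer):
    """Score each sample with a model built from all the other samples."""
    scores = []
    for i in range(len(X)):
        train_x = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        scores.append(scorer(train_x, train_y, X[i]))
    return scores


# Hypothetical single-feature data: 0 = early stage, 1 = late stage.
X = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
y = [0, 0, 0, 1, 1, 1]
scores = loocv_scores(X, y, centroid_score)
loocv_auc = auc(y, scores)
```

Replacing `centroid_score` with a function that fits an SVM on the training fold and returns its decision value for the held-out sample would give the procedure described above. Each class needs at least two samples so that its centroid is still defined when one sample is held out.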