The average copy number variation (CNVA) of chromosome fragments is a potential surrogate marker for TMB in predicting responses to immunotherapy in Non-small cell lung cancer


 Background: Genomic instability plays a large role in the process of cancer. Tumor mutational burden (TMB) is closely related to immunotherapy outcome and is an important manifestation of genomic instability. However, the cost of TMB detection is extremely high, which limits the use of TMB in clinical practice. Another new indicator of genome instability, CNVA (the average copy number variation) which calculates the changes of 0.5 Mb chromosomal fragments, requires extremely low sequencing depth, and is expected to replace TMB as a new marker of immune efficacy.Methods: A total of 50 samples (23 of which came from patients who received immunotherapy) were subjected to low-depth (10X) chromosome sequencing on the MGI platform. CNVA was calculated by the formula avg (abs (copy number-2)). Then, we analyzed the relationship between CNVA and immune infiltration or immunotherapy efficacy. In addition, through the analysis of whole genome sequencing data of 509 lung adenocarcinoma in the TCGA database, we compared CNVA with classic marker TMB to evaluate the value of CNVA as an immune evaluation index.Results: Compared with the low CNVA group, the high CNVA group had higher expression of PD-L1, CD39 and CD19, and more infiltration of CD8 + T cells and CD3 + T cells. Among the 23 patients treated with immunotherapy, the average CNVA value of the SD (stable disease)/PR (partial response) group was higher than that of the PD (progressive disease) group (P <0.05). The data of whole genome sequencing data of 509 lung adenocarcinomas from TCGA and real-time quantitative PCR results of 22 frozen specimens found that CNVA was more correlated with CD8 and PD-L1 than TMB. In addition, CNVA showed a specific positive correlation with TMB (r = 0.2728, p < 0.0001).Conclusion: CNVA can be a good indicator of immune infiltration and predicting immunotherapy efficacy. With its low cost and potential clinical application for testing, it is expected to become a substitute for TMB.


Background
The advent of immunotherapy based on checkpoint inhibitors (ICIs) targeting the programmed cell death 1 (PD-1) axis has brought breakthrough progress to the treatment of non-small cell lung cancer. However, due to tumor heterogeneity and the dynamics of PD-L1 expression, only a subset of patients showed a durable clinical benefit from such therapies. The precise application of ICI therapy requires reasonable predictive indicators. According to reports, patients with a PD-L1 tumor proportion score (TPS) > 50% benefit more than those with a TPS < 50% [1,2].
In addition to the TPS, one of the most important predictors of immune efficacy is tumor mutational burden (TMB). High TMB is associated with poor prognosis of various tumors [3][4][5]. More importantly, TMB is an independent biomarker of ICI response [6],as high TMB is associated with high neoantigen burden and high T-cell infiltration. Initially, the TMB test required whole exome sequencing (WES) of the specimen. It reflects the number of somatic nonsynonymous mutations including coding SNVs (single nucleotide variation) and indels and is reported in numbers per Mb. However, the detection cost of WES is particularly high and the data are relatively difficult to analyze which is not suitable for large-scale analysis. Recently, some reports have simplified this process to estimate TMB with targeted next-generation sequencing (NGS) panels [7]. The most used are two assays: MSKIMPACT™ [8]and DFCI OncoPanel [9]. Compared that detected by WES, the number of genes detected by targeted NGS sequencing is small, which reduces the cost of testing to a certain extent. However, WES is only concerned with nonsynonymous mutations (leading to changes in the amino acid sequence and numbers of potential neoantigens within a tumor genome), but to make up for the insufficient number of genes, targeted NGS focuses on both synonymous and nonsynonymous mutations, and the depth of sequencing is also far higher than that for WES, making it difficult to analyze the data. TMB is essentially caused by genomic instability. There are different causes of genomic instability, among which chromosomal instability (CIN), microsatellite instability (MIN) and DNA damage response (DDR) have attracted much attention. The relationship between microsatellite instability and cancer has been confirmed. For example, high MSI is closely related to the efficacy of immunotherapy. Researchers found that the majority (82.1%) of MSIhigh tumors were TMB-high; however, only 18.3% of TMB-high tumors were MSI-H [10]. This shows that TMB is not caused only by MSI. In addition, the combination of MSI and TMB detection is better at predicting the efficacy of immunotherapy than TMB alone [10][11][12]. Specifically in colorectal cancer patients with high MSI, TMB is a better indicator of the efficacy of ICIs [13,14]. In addition, in small cell lung cancer (SCLC), large number of DDR phenomena have been discovered and the DDR pathway alterations have a positive correlation with high TMB [15]. Since the cost of DDR testing is much lower than that of TMB text, it is expected to become a substitute for TMB [16]. However, the direct relationship between chromosomal instability and cancer is different in different situations. CIN is mainly reflected in the increase or deletion of fragments. The sequencing depth required for detection is also far lower than that of genome TMB detection, which greatly reduces the detection cost and analysis difficulty. The deletion or increase of the entire chromosome will lead to the generation of aneuploidy. Compared with normal diploid cells, most tumor cells are in an aneuploid state [17,18]. Different types of ploidy have different effects on the occurrence and development of tumors [19]. The overall prognosis of patients with polyploid tumors is poor [20,21], and polyploid tumors are more malignant than haploid or diploid tumors [22,23]. However, the analysis of the entire chromosome can be used only for qualitative analysis of tumor specimens. It cannot provide specific values like TMB for quantitative analysis of different specimens, but it can be good for predicting the therapeutic effect.
In this article, to solve this problem, we broke the chromosome into small 0.5 Mb fragments, calculated the copy number changes of all fragments, averaged the absolute value (CNVA) and then evaluated the correlation between CNVA and tumor immune infiltration or immunotherapy effect; in addition, we compared the utility of CNVA with that of TMB. We found that specimens with high CNVA had more chromosomes alterations and, more obvious immune infiltration and were more sensitive to immunotherapy drugs than specimens with low CNVA.

Patient samples
A total of 23 patients with non-small cell lung cancer (NSCLC) treated with a PD-1 or PD-L1 inhibitor after surgery or biopsy were retrospectively identified from the Cancer Hospital of the Chinese Academy of Medical Sciences from April 2017 to June 2019. The fresh and frozen specimens used for flow sorting and real-time quantitative PCR came from another 27 surgical patients from the Cancer Hospital of the Chinese Academy of Medical Sciences from March 2019 to February 2020. Patient clinical characteristics, tumor genomics, and outcome data were collected. Responses to PD-1/L1 targeted therapy were assessed using RECIST 1.1 guidelines and confirmed by the corresponding institute investigator. Ethics approval was granted by the Committee for the Ethics Review of Research Involving Human Subjects of the Cancer Hospital of the Chinese Academy of Medical Sciences.

CNVA analysis
First, the collected paraffin-embedded samples were dissolved, DNA was extracted, and then the DNA library was constructed with the MGI platform. High-quality genomic DNA with A260 / A280 = 1.8 ~ 2.0 was selected for fragmentation. After the fragmentation, the range of DNA distribution was wide, and we used magnetic bead to select specific fragments to control the final library fragment concentration. Those samples with concentrated DNA and high purity after physical fragmentation could be directly subject to end repair. If the DNA concentration range of the sample after physical fragmentation was wide, the fragments were sorted by length. Next, the ligated products were purified and PCR amplified. After constructing the library, we carried out subsequent chromosome low-depth(10X) sequencing. During data analysis, the entire chromosome was cut into t fragments of 0.5 Mb, and each chromosome was cut into more than six thousand fragments. Raw reads were aligned with Burrows-Wheeler Aligner software to determine the chromosomal origin of each sequenced DNA fragment. To calculate the copy number of each position (0.5 Mb), we subtracted 2 from the copy number, then took the absolute value and calculated the average number of all positions. CNVA was calculated with the formula avg (abs (copy number-2)).

Noninvasive prenatal testing (Nipt) analysis
The sequencing platform and database building method for Nipt analysis are the same as those for CNVA analysis. Unlike CNVA, which only focuses on copy number changes of 0.5 Mb chromosome fragments, Nipt focuses on changes in the entire chromosome, that is, macroscopic increases or deletions affecting the copy number of the entire chromosome. The specific data analysis method is described in the literature [24].

Analysis of TMB using the TCGA database
TCGA data were downloaded from the Genomic Data Commons Data Portal (GDC Data Portal, https://portal.gdc.cancer.gov/). In total, 509 lung adenocarcinoma (LUAD) patients from the TCGA database were analyzed. Somatic mutations were identified using the ecTMB algorithm. The correlation analysis between CNVA of selected samples and whole exome mutation burden was performed using Pearson correlation coefficient analysis.

Immunohistochemistry
The paraffin sections were gradually dewaxed and hydrated with xylene and alcohol. Antigens retrieval was performed in a microwave. After washing 3 times with PBS buffer, goat serum was applied to tissue slides to cover the whole section. The slides were incubated for 30 minutes at room temperature, taking care to prevent the surrounding tissue from drying out. After the goat serum had been used to block the slides, the excess serum was shaken off and diluted primary antibodies against PD-L1 (CST 13683S), CD8 (ab93278) and CD3 (PTG 17617-1-AP) were added to the specimen and incubated at 4 °C overnight. After incubating at room temperature for half an hour the next day, the secondary antibody was added, and the sample was incubated and finally developed by the BCA method.

Flow cytometry
Flow cytometric analysis was undertaken on representative fresh samples of lung cancer obtained at surgical resection. After mechanical disaggregation to obtain monodispersed cell suspensions, the tissue samples were treated with collagenase (Sigma-aldrich, 35285124) and DNase I (0.33 U/ml, Roche) and stained with anti-PD-L1 antibody and DAPI (KeyGEN BioTECH, KGA215-50).

Real-time quantitative PCR
Total RNA was isolated with the standard TRIzol-based protocol (Invitrogen). RT-PCR was performed on an ABI 7900HT Real-Time PCR thermocycler (Life Technologies). After the PCR was completed, the base and threshold of the amplification curve were manually set. Data were analyzed using the2 −ΔΔCT method. For the primer sequences used in the article, please refer to the Supplementary Materials.

Statistical analysis
Data analysis was performed using GraphPad Prism 6 (GraphPad Software, Inc.) or SPSS 24 (IBM, USA). Comparisons between two groups were performed using Student's t-test (two-tailed). Pearson correlations were used to evaluate the significance of the association between TMB and immune infiltration or between TMB and CNVA. Differences with p-values < 0.05 were considered significant.

Distribution of chromosomal differences in patients treated with different immunotherapies
In this study, we included a total of 23 patients receiving immunotherapy and 27 non-immunotherapy patients undergoing thoracic surgery. The clinical information is shown in Table 1. According to the method mentioned above, we performed chromosomal variation detection with paraffin-embedded samples of these 50 specimens. Low-depth sequencing (10 X) was performed to analyze the copy numbers of these small fragments (0.5 Mb). The visualization of the fluctuation of these small fragments of chromosomes is shown in Fig. 1A-1C. We observed that chromosomes 1-23 showed different degrees of fluctuation in different specimens, and the copy number of the small fragments (0.5 Mb) of each chromosome fluctuated around the baseline. To quantify this fluctuation state, we proposed a new marker CNVA, which is the average of the copy numbers of all these small 0.5 Mb fragments, and is calculated with formula avg (abs (copy number-2)). When the CNVA value was large (0.4032) (Fig. 1A), the fluctuation degree of the chromosome was significantly increased, and the copy value of the 0.5 Mb fragment was randomly distributed around the baseline; however, when the CNVA value was small (0.1471) or moderate (0.238), the fluctuation of chromosomes was low ( Fig. 1B and 1C). The above results indicated that the CNVA value was a good indicator of chromosome fluctuations.

Distribution of chromosomal differences in patients treated with different immunotherapies
In this study, we included a total of 23 patients receiving immunotherapy and 27 non-immunotherapy patients undergoing thoracic surgery. The clinical information is shown in Table 1. According to the method mentioned above, we performed chromosomal variation detection with paraffin-embedded samples of these 50 specimens. Low-depth sequencing (10 X) was performed to analyze the copy numbers of these small fragments (0.5 Mb). The visualization of the fluctuation of these small fragments of chromosomes is shown in figure 1A-1C. We observed that chromosomes 1-23 showed different degrees of fluctuation in different specimens, and the copy number of the small fragments (0.5 Mb) of each chromosome fluctuated around the baseline. To quantify this fluctuation state, we proposed a new marker CNVA, which is the average of the copy numbers of all these small 0.5 Mb fragments, and is calculated with formula avg (abs (copy number-2)). When the CNVA value was large (0.4032) (Fig. 1A), the fluctuation degree of the chromosome was significantly increased, and the copy value of the 0.5 Mb fragment was randomly distributed around the baseline; however, when the CNVA value was small (0.1471) or moderate (0.238), the fluctuation of chromosomes was low ( Fig. 1B and 1C). The above results indicated that the CNVA value was a good indicator of chromosome fluctuations.

High CNVA value indicates high immune infiltration status
Among these 50 specimens, we found that CNVA values were mainly distributed between 0.1 and 0.5 ( Fig. 2A). The CNVA value was not related to age, sex, tumor size and stage (Table 1). To further prove that the CNVA value could indeed reflect the degree of chromosome fluctuations, in addition to focusing on the copy number of 0.5 Mb fragments, we used the Nipt algorithm[24] to analyze amplification or deletion of the entire chromosome. The value returned by the algorithm was defined as the Z value, Z value> 3 indicated a triploid or higher state and Z value <-3 indicated a haploid or lower state. According to the CNVA level, we divided the 50 specimens into high, moderate and low groups. Then we drew a heat map based on the CNVA and Z values. Green represents the group with low CNVA value, orange represents the group with high CNVA value, and blue represents the group with moderate CNVA value. The lightest color in the heat map, that is, the part with the least degree of chromosome aneuploidy, was basically concentrated in the group with low or moderate CNVA value. Conversely, the high CNVA group was particularly dark in color, and the chromosome aneuploidy state was more obvious (Fig. 2B). This proved that CNVA could indeed reflect the aneuploid or stable state of the entire chromosome through the fluctuation of 0.5 Mb fragments.
For the 50 specimens, there were also 27 nonimmunotherapy frozen tissue. To prove the relationship between CNVA and immune infiltration, we extracted RNA from these 27 frozen tissues specimens. Five of the specimens were of poor quality and were discarded. After that, we detected the expression of immune-related genes via immunohistochemistry in the remaining 22 samples by real-time quantitative PCR. The results showed that compared with the CNVA low group, the high CNVA group had significantly upregulated levels of CD8 (p=0.0067), PD-L1 (0.0477), CD39 (0.0337) and the B cell marker gene CD19 (0.0464). The Average expression level of immunosuppressive genes Foxp3 and CTLA-4 were down-regulated in the CNVA high group but there was no statistical difference ( Fig. 2C and Fig. 7E). In addition, we also detected the expression of CD3, CD8 and PD-L1 in 50 immunotherapy and nonimmunotherapy samples by immunohistochemistry. The results also showed that, compared with the group with low CNVA, the high CNVA had higher expression of CD3 (0.0370), CD8 (0.0495) and PD-L1 (Fig. 2D-2J). The above results indicate that high CNVA is positively correlated with immune infiltration.

Correlation between the effect of immunotherapy and immune infiltration
We retrospectively collected clinical information on patients receiving immunotherapy at the Cancer Hospital of the Chinese Academy of Medical Sciences. Among them, 21 patients underwent surgery before receiving immunotherapy and had paraffin-embedded samples available. The other two patients underwent biopsy before immunotherapy, and paraffin samples were also available. The clinical information is shown in Table 2. Of these 23 patients, 14 were lung adenocarcinoma patients and 9 were squamous cell carcinoma patients. All patients received second-, third-, or fourth-line immunotherapy after surgery or biopsy. Based on the evaluation of the efficacy after 12 weeks of immunotherapy, of these 23 patients, 12 were judged as achieving SD or a PR and 11 were judged having PD (efficacy assessment was completed by the clinician). Patients with SD or PR were classified into a single group with good efficacy, and patients with PD was classified as a group with poor efficacy. We performed immunohistochemical analysis on the paraffin samples of these 23 patients. The results showed that compared with the group with poor efficacy, group with good efficacy showed increased expression of the classic immune infiltration marker CD3 (Fig. 3A-3C) (p<0.05), CD8 (Fig. 3D-3F) (There was no statistical difference which might be due to the small sample size) and PD-L1 (Fig. 3G-3I). This result is consistent with literature reports [25].

High CNVA value may predict a superior immunotherapy effect
After proving that CNVA had a specific correlation with immune infiltration, we next compared the relationship between immunotherapy efficacy and CNVA in the 23 patients. The CNVA values of these patients were mainly between 0.1 and 0.4 (Fig. 4A). We divided these 23 patients into two groups according to efficacy, patients with PD as one group and patient with PR/SD as another group. It can be seen from the scatter plot that the average CNVA value of the PD group was lower than that of the PR/SD group with statistically significant difference (p = 0.0204) (Fig. 4B).However, we only included a small number of immunotherapy patients and these patients are not strictly first-line immunotherapy patients, as many of them have undergone radiotherapy and chemotherapy or targeted therapy. These treatment methods may affect certain chromosomes fragments. Therefore, we further analyzed the copy number (aneuploidy) of the whole chromosome of chromosomes 1-23 in different therapeutic groups by the Nipt method. The frequency of deletion or amplification of the entire chromosome is very likely to be lower than that of the small 0.5 Mb fragment, and it will be more stable under the influence of various treatment methods. Indeed, we found that the PR/SD group was mainly located in the lighter color area of the heatmap, while the PD group was basically in the darker color area, that is, the area where the chromosome aneuploidy phenomenon was more obvious (Fig. 4C). All the result above confirmed that the degree of chromosome fluctuation was indeed related to the therapeutic effect.

Verification of the correlation between chromosome copy number changes in terms of DNA content and immune infiltration
The most intuitive results from changes in the copy number of chromosomes is changes in the DNA content. According to change in DNA content, it can be roughly speculated whether there is chromosome aneuploid or a major change in chromosome copy number. However, such changes will be noted only when the difference in DNA content caused by the change in the number of chromosomes can be detected. Therefore, a normal DNA content cannot exclude the existence of malignant or chromosomal abnormalities, but an abnormal DNA content always indicates chromosomal abnormalities. In other words, changes in DNA content will be more specific in reflecting chromosomal instability. Therefore, we can indirectly confirm the relationship between chromosome copy number abnormalities and immune infiltration by analyzing the relationship between DNA content and immune infiltration. We detected the relationship between DNA content (fluorescence value of DAPI) and PD-L1 in a series of lung cancer cell lines and lung cancer tissue samples by flow cytometry. A schematic diagram of the PD-L1 isotype control and positive specimen is shown in figure 5A-5D. We observed that in 15 lung cancer cell lines, the fluorescence intensity of DAPI was basically negatively correlated with the expression of PD-L1 (Fig.  5E). However, in lung cancer tissue specimens, the optimal curve of the trend line was a polynomial of two terms (Fig. 5F). These results showed that when a certain baseline was used as the standard (probably the DNA content of normal diploid cells), the DNA content changed towards one of two extremes,, which could have either very small or very large numbers of chromosome copies (predicting chromosome fluctuations), PD-L1 expression would increase and the result was indeed similar to the CNVA effect (with an emphasis on the absolute value of copy number changes). But the result of the cell lines did not develop in two extreme directions, we speculated that it might be related to the small number of cell lines and the relative concentration of DNA content.

The state of immune infiltration is related to the level of TMB
TMB is a widely studied indicator related to the efficacy of immunotherapy. We further analyzed the correlation between TMB and various infiltrating immune cells in lung adenocarcinoma. We downloaded the information of 509 lung adenocarcinoma patients with both whole genome and transcriptome data from the TCGA database. Whole-genome data were used to calculate the TMB value by ecTMB and the transcriptome data were used to analyze the infiltration of various immune cells by CIBERSORT (https://cibersort.stanford.edu/). According to the TMB value, these 509 cases were divided into two groups: TMB high and TMB low. The results of the heat map clustering showed that the group with high TMB (blue) was roughly clustered together and had higher levels of CD8 + T cells and M1 type macrophages and lower M2 type macrophages than the low TMB group (Fig. 6A).To observe the difference in infiltrating lymphocytes between the TMB high and TMB low groups more clearly, we subdivided the 509 samples into three groups, TMB high, TMB moderate and TMB low, and plotted histograms. Compared with the group with low TMB, we found that the TMB high group, in addition to having obvious increases in CD8 + T cells (p < 0.0001 ) (Fig. 6B and Fig. 7C), also had significant increases in plasma cells and activated CD4 + T cells (p < 0.0001) (Fig. 6B). In addition, the ratio of M1 to M2 macrophages (which serves as an index of immune activity) in the group with high TMB was much higher than that in the group with low TMB (p < 0.0001) (Fig. 6C).

Comparison of the utility of CNVA and TMB in predicting response to immunotherapy
Both CNVA and TMB essentially reflect genome instability. The former is concerned with the copy number changes of chromosome fragments of 0.5 Mb at the macro level, while TMB is only concerned with the number of mutations per 1 Mb DNA. In theory, the two values should be correlated. The cost of TMB testing is extremely high and the analysis is difficult. We wondered whether CNVA could replace TMB as an indicator of immunotherapy efficacy. First, we compared the changes in the expression levels of two indexes PD-L1 and CD8, which were closely related to immunotherapy efficacy under different TMB or CNVA states. The TMB value was extracted from 509 samples from the TCGA database. The results showed that the expression of PD-L1 and the infiltration of CD8 + T cells in the group with high TMB were significantly increased (p < 0.01) ( Fig. 7A and 7C), and both had a certain correlation with TMB (although the correlation coefficient was not large) ( Fig. 7B and 7D).
The CNVA data came from the abovementioned 27 nonimmunotherapy specimens, 22 of which had high-quality RNA for real-time quantitative PCR. We also found high expression of PD-L1 and CD8 in samples with high CNVA (p< 0.05) and this correlation was stronger than the correlation between high TMB and these two parameters ( Fig. 7E and 7G). In addition, the correlation coefficient was also higher than that for TMB, although the P value of PD-L1 is not as significant as TMB due to the lack of samples ( Fig. 7F and 7H).
Finally, to further verify that there was a correlation between CNVA and TMB, we reanalyzed the whole genome data of 509 samples from TCGA. Because the whole genome sequencing process does not include break the chromosome into small fragments of 0.5 Mb, and the data present only changes in mutations or copy number throughout the genome, we analyzed the changes in the copy number of all exons and introns, and the average value was calculated and used to replace the CNVA value. Pearson correlation showed that there was indeed a clear correlation between TMB and CNVA values (r = 0.2728, p < 0.0001) (Fig. 7I). These results indicated that CNVA had the potential to replace TMB.

Conclusions
CNVA is an excellent marker to indicate immune infiltration and predict immune efficacy, and is expected to become a potential substitute for TMB.

Declarations
Ethics approval and consent to participate Ethics approval was granted by the Committee for the Ethics Review of Research Involving Human Subjects of the Cancer Hospital of the Chinese Academy of Medical Sciences.

Consent for publication
Not applicable.

Availability of data and materials
The analyzed data sets generated during the study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.  Table 1 Clinical information for immunotherapy and non-immunotherapy patients    Correlation between CNVA score and immune efficacy (A). Histogram showed the distribution of CNVA values in 23 immunotherapy specimens. (B). The scatterplot showed the distribution of CNVA score in the PR/SD group and PD group. (C). Heat map displayed complete deletion or increase of chromosomes 1-23 in PR group, SD group and PD group. Z value> 3 meant chromosome is triploid or more, Z value <-3 meant chromosome is haploid or less. Data are presented as the mean ± SD, *p< 0.05, **p <0.01, ***p < 0.001. ns means no significance.  Relationship between TMB and immune infiltration (A). Heat map showed the infiltration of 19 immune cell subclasses in the 509 TCGA specimens in the group with high TMB and low TMB. (B). The histogram displayed the infiltration of 19 immune cells after further subdividing TMB into high, moderate and low groups. (C). Scatterplot showed the ratio of M1 and M2 macrophages at different TMB levels in the 509 specimens. Data are presented as the mean ± SD, *p< 0.05, **p <0.01, ***p < 0.001, ****p < 0.0001. ns means no significance. Correlation between the average copy number change of all genes (including introns and exons) in 507 TCGA specimens and TMB. Data are presented as the mean ± SD, *p< 0.05, **p <0.01, ***p < 0.001, ****p < 0.0001. ns means no significance.