Construction of an immune-related lncRNA signature pair for predicting oncologic outcomes and the sensitivity of drugs to treat adverse reactions to immunotherapy in lung adenocarcinoma

Based on The Cancer Genome Atlas (TCGA) LUAD dataset, IRLPs were identied to construct an IRLPs scoring system by Cox regression and were validated in the Gene Expression Omnibus (GEO) dataset. Next, immune and molecular characteristics were explored in different IRLP subgroups. The “pRRophetic” package was used to predict the sensitivity of drugs used to treat ir AEs.


Background
Lung cancer is one of the most common malignant tumors globally, with a high incidence of 11.4% in 2020. Approximately 40% of the primary lung tumors are lung adenocarcinomas (LUAD) (1,2). LUAD, which is common in females and non-smokers, is characterized by high mortality and metastasis rates (3). Although great improvement has been made in the clinical diagnosis and treatment, the 5-year survival rate in LUAD patients is only 18% (4). Therefore, the identi cation of new biomarkers to help in the prognosis of LUAD is of great signi cance. Recently, the emergence of immunotherapy has brought unprecedented levels of survival to lung cancer patients, especially those with advanced or metastatic LUAD (5,6). However, immunotherapy brings not only considerable therapeutic effects but also immunerelated adverse events (ir AEs). In addition, although the incidence of ir AEs is lower than the incidence of the adverse effects of chemotherapy, due to the mechanism of action of immune drugs, ir AEs may occur in all of the body (7,8). Furthermore, ir AEs may even lead to systemic infections and death, and an increase in fatal events has been reported (9). Corticosteroid therapy can successfully treat most ir AEs, but a combination of immunosuppressors is needed to combat more serious adverse reactions (10,11). However, since it is unclear whether a patient can undergo immunosuppressive therapy of ir AEs safely, there is an urgent need to nd some biomarkers to predict the drug sensitivity of immunosuppressors.
LUAD is an immune-sensitive cancer. Studies have shown that the immunotherapy response may be predicted by tumor-immune cell in ltration and an immune score (12). Long non-coding RNAs (lncRNAs) are RNAs without protein-coding capacity and greater than 200 nucleotides in length (13). Studies showed that lncRNAs can regulate the immune response and immune cell development (14,15). Several studies have proposed immune-related lncRNA signatures to help in the prognostic of LUAD. However, the results cannot be directly generalized to all patients due to the use of different chip sequencing protocols, different platforms, and different testing times for gene expression (16-18). These shortcomings could be overcome by combining two or more biomarkers, which work better than a single prediction criterion in cancer prediction models, and immune-related gene pairs (IRGPs) were reported to have accurately predicted the LUAD prognoses (19,20). However, these studies have focused on mRNAs rather than lncRNAs, which play an important role in the immune system. Therefore, the clinical relevance and prognostic signi cance of immune-related lncRNAs pairs (IRLPs) are currently unclear.
In this study, we constructed an individualized signature of IRLPs that works as an independent and predictive factor of overall survival (OS) for LUAD patients. Furthermore, the IRLP model also helps distinguishing the LUAD patients responsive to immunotherapy and predicts the sensitivity of drugs used in ir AEs therapy.

Data source
The RNA-seq data of 515 LUAD cases (including 535 tumor samples and 59 normal samples) and 569 LUAD cases of genetic alteration data were downloaded from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) (October 10th, 2020). In addition, the normalized data of RNA expression matrix of GSE30219, GSE37745, and GSE50081 were downloaded from Gene Expression Omnibus (GEO https://www.ncbi.nlm.nih.gov/geo/); the platform of these datasets was GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array). The relevant clinical characteristics of patients were also downloaded, and the patients without the information on survival time and survival status were excluded from our study.

Construction of a prognostic IRLP signature
To ensure that the immune-related lncRNAs could be measured on all platforms included in this study, the intersect function was used to identify the common immune-related lncRNAs in the TCGA and GEO datasets. We only selected the lncRNAs with a relatively high variation in expression levels (median absolute deviation >0.5). Next, the immune-related lncRNAs were paired randomly to construct a collection of lncRNA pairs. For each LUAD sample, the IRLPs were computed by pairwise comparison of the expression level. The output is one if the expression of the rst lncRNA is higher than that of the second one; otherwise, the output is zero. After removing IRLPs with small variation and imbalanced distribution (MAD=0), the remaining ones were selected as candidate IRLPs. Finally, lasso Cox proportional with 10-fold cross-validation (glmet, caret, and survival package) was used to establish the nal model of an IRLP risk score to predict the prognosis of LUAD in the TCGA dataset.

Validation of IRLPs signature in the GEO data set
The IRLP model was further evaluated in the LUAD patients from the GEO dataset by the log-rank test. We also accessed the prognosis value of the IRLP risk score based on other clinical factors in univariate and multivariate Cox regression analysis.
2.5 Comprehensive analysis of immune characteristic and molecular variation in different IRLP risk score subgroups Data on the in ltration of immune cells found in the TCGA dataset were downloaded from TIMER2.0 (http://timer.comp-genomics.org). Spearman's correlation analysis was performed to analyze the relationship between the immune cell in ltrates and the IRLPs risk score. In addition, Student's t-test was used to compare the different levels of immune cell in ltrates between the high-risk and low-risk groups de ned by the IRLP risk score.
Differential expression analysis was performed on all genes between the high-risk group and low-risk group of TCGA samples. In addition, enrichment analysis was used to determine the signaling pathways based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) gene set (GSEA software).
In the gene mutation analysis, gene mutation quantity and quality were analyzed in two subgroups of LUAD patients (Maftools package). In addition, we also analyzed the relationship between tumor mutation burden (TMB) and IPLP risk score subgroup using a Student's t-test.

Predicting the drug sensitivity of the immunosuppressors
To identify which ir AEs drugs might be useful, we used the "pRRophetic" R package to predict drug sensitivity from tumor gene expression levels (21).

Statistical analysis
All statistical analyses were performed in the R 4.0.5 software. Student's t-test was used to compare the differences between two subgroups. The Kaplan-Meier method was used to analyze the differences in survival curves using the log-rank test.

Dataset of LUAD patients
After excluding the patients for whom the survival status and survival time were missing, a total of 795 patients (TCGA: 477 cases; GEO: 318 cases) were included in our study. The ow diagram of this study was shown in Fig. 1.

Construction and validation of a prognostic IRLP signature
A total of 105 immune-related lncRNAs were found in all platforms of the dataset, and 773 IRLPs were paired. First, univariate Cox regression identi ed 53 IRLPs that were related to the OS of LUAD patients in the TCGA train dataset (P<0.01). Then, the Least Absolute Shrinkage and Selection Operator (LASSO) regression analysis with iteration=1000 selected 18 IRLPs (Fig. 2a) for the multivariate Cox regression analysis, and, nally, eight IRLPs were identi ed to calculate the IRLP risk score (Fig. 2b). Furthermore, we compared the survival curves of the TCGA train dataset (P<0.001), TCGA test dataset (P=0.017), and GEO dataset (P=0.027) (Fig. 3a-c). These results all showed that high-risk LUAD patients exhibited a poorer prognosis than low-risk LUAD patients.

Relationship between the IRLP subgroups and clinical characteristics
We evaluated the correlation between the IRLP subgroups and clinical characteristics by heatmap. The results showed that the distributions of gender, stage, T stage, and N stage were signi cantly different between the high-risk and low-risk groups of the TCGA total dataset (Fig. 6a), and the GEO dataset showed that the distributions of the T and M stage were signi cantly different (Fig. 6b).

Relationship between the IRLPs and immune cell in ltrates
Spearman's rank correlation analysis showed that the immune score (Cor=-0.18893, P<0.001), stoma score (Cor=-0.24804, P<0.001), and microenvironment score (Cor=-0.22338, P<0.001) were signi cantly decreased in the group with the higher IRLP risk score (Fig. 7). The distributions of immune cell in ltrates were different in the two IRLP subgroups (Fig. 8).

Molecular characteristics of different IRLP subgroups
The gene sets of the high-IRLP subgroup were most enriched in KEGG_DNA_REPLICATION (enrich score=0.76), GOBP_ATTACHMENT_OF_MITOTIC_SPINDLE_MICROTUBULES_TO_KINETOCHORE (enrich score=0.92), GOMF_SINGLE_STRANDED_DNA_HELICASE_ACTIVITY (enrich score=0.86), and GOCC_CONDENSED_NUCLEAR_CHROMOSOME_KINETOCHORE (enrich score=0.83) (Fig. 9a). The gene sets of low-IRLPs subtype were most enriched in KEGG_ASTHMA (enrich score=0.70), GOMF_ATP_DEPENDENT_MICROTUBULE_MOTOR_ACTIVITY_MINUS_END_DIRECTED (enrich score=0.81), GOCC_AXONEMAL_DYNEIN_COMPLEX (enrich score=0.79), and GOBP_SODIUM_ION_EXPORT_ACROSS_PLASMA_MEMBRANE (enrich score=0.76) (Fig. 9b). GO analysis showed that the differentially expressed genes between the IRLP subgroups were enriched in neutrophil activation involved in immune response (BP), the cell-cell junction (CC), and metal ion transmembrane transporter activity (MF; Fig. 9c-e). KEGG analysis showed that the differentially expressed genes were enriched in Herpes simplex virus 1 infection (Fig. 9f). Then, we analyzed gene mutations to gain further biological insight into the immunological nature of the IRLPs subgroups. The high-risk group had the highest mutation rate (the top 20 genes), and missense variations were the most common mutation type in the two subgroups ( Fig. 10a and b). The TTN mutation was the highest in the high-risk group. Next, we explored the relationship between the TTN mutation and IRLP subgroups. The TTN mutation was signi cantly more frequent in the high-risk group (Fig. 10c). Finally, we compared the tumor mutation burden (TMB) between these two subgroups. As a result, TMB was higher in the IRLPs of the high-risk group (Fig. 10d).

Prediction of drug sensitivity to ir AEs therapy
We identi ed three immunosuppressors (methotrexate, parthenolide, and rapamycin) in the "pRRophetic" R package. Methotrexate had higher sensitivity in the IRLP high-risk group (P=0.0052, Fig. 11a), whereas parthenolide (P<0.001) and rapamycin (P=0.013) showed lower sensitivity in IRLP high-risk group (Fig. 11b and c).

Discussion
With the development of sequencing technology, more and more attention in cancer research has been paid to bioinformatics methods. In our present study, we constructed a risk scoring system based on eight IRLPs in the TCGA dataset, and the patients were divided into high-risk and low-risk groups according to the cut-off of risk score. Survival analysis showed that the high-risk group had a poor prognosis. The IRLP-risk score was an independent risk factor in our Cox regression analysis combined with clinical characteristics (age, gender, and stage). These results were also proven in the GEO dataset. Furthermore, our results also showed that the IRLP risk score was related to immune cell in ltration. Next, we explored the gene functional enrichment and gene mutation in two IRLPs subgroups. The high-risk group was found to be enriched in molecular changes in DNA and chromosomes, and to have a higher TMB than the low-risk group. Finally, the drug sensitivity of immunosuppressors was predicted to nd the most suitable ir AEs therapy for each group.
The tumor microenvironment (TME) is correlated with cancer prognosis, supports cancer cells to replicative proliferation, and affects the malignant phenotypes (22,23). Many immune cells are present in the TME, modulating tumor cell migration, invasion, metastasis, and anticancer drug sensitivity (24). The relationship between the IRLP score and in ltrating immune cells was analyzed in our study, and we found that they were signi cantly correlated. These results indicated that our IRLP risk score might allow the prognosis of LUAD by being sensitive to the functional status of immune cells. The immune score re ected the in ltration of immune cells in the tumor tissue based on the algorithm. A study found that patients with medium and high immune scores had a longer OS time than those in the low immune score group in lung cancer (25). This means that a higher immune score may be bene cial for survival in lung cancer patients. The IRLP risk score was found to be negatively correlated with the immune score in our current results. These results demonstrated that a high immune activity might play an important role in the increased survival time of LUAD patients.
To gain further biological insight into the IRLP subgroups, we studied the functional enrichment and gene mutations in these two subgroups. Functional enrichment analysis found that molecular changes in DNA and chromosomes were most enriched in the high-risk subgroup. As previously reported, our results also showed that missense mutations are the most common type of mutations in LUAD(26). The TTN mutation was found to be more frequent in the high-risk group than in the low-risk group and showed a signi cant difference between the high-risk and low-risk groups. The TTN mutation was reported as a potential biomarker associated with a better response to immune checkpoint blockade in solid tumors (27). A study based on the TCGA dataset reported that the TTN missense mutation correlates with favorable prognosis in lung squamous cell carcinoma (LUSC) but not in LUAD(28). Our results are also in agreement with the notion that TTN mutation plays a different role in LUAD.
Next, the relationship between the IRLP score and TMB was explored. Not only a high TMB was found to re ect worse clinical outcomes in non-small cell lung cancer (29), but also patients with high TMB (TMB-H) achieved good results in immunotherapy of solid tumors (30). In this study, the high-risk subgroup had the highest TMB. Thus, the TMB may explain why IRLPs are correlated with the prognosis of LUAD, and the IRLP score may also help explain the immunotherapy response. However, other possible mechanisms involved in this relationship still need to be further studied.
Nowadays, there is evidence that immune checkpoint inhibitors effectively improve the OS time in various cancers (31). However, immunotherapy-mediated ir AEs such as pneumonitis and thyroid dysfunction were frequently reported because of their speci city and severity (32,33). Most mild ir AEs can be treated with glucocorticoids, while immunosuppressors treat severe ir AEs (34,35). In the clinic, the management of ir AEs is still di cult because they may occur at any time during the therapy but also at the end of treatment (36). Hence, we explored the drug sensitivity of three immunosuppressors: methotrexate, parthenolide, and rapamycin. Methotrexate sensitivity was higher in the high-risk group, whereas the parthenolide and rapamycin sensitivity was lower in the high-risk group. Methotrexate is usually used for autoimmune disease therapy, and a single-center analysis reported that methotrexate has a good curative effect in rheumatic ir AEs (10). Parthenolide is one of the biologicals that play an anti-in ammatory role by inhibiting nuclear factor kappa B (NF-κB) and cytokine tumor necrosis factor (TNF)-α(37, 38). The same pharmacological effect in inhibiting TNF-α was found in in iximab and could be helpful in the treatment of steroid-refractory ir AEs(39). These immunosuppressors (methotrexate, parthenolide, and rapamycin) have a different mechanism of action, and patients also had different drug sensitivities. Thus, our IRLPs scores may help identify the patients who would bene t from ir AEs therapy, but the mechanisms of drug action in these two subgroups still need to be clari ed.
Although we have constructed an IRLP risk scoring system that showed a good predictive performance for LUAD patients and overcame the inconsistent sequencing platforms, there were still some noteworthy limitations. First, the patients included in the training set were downloaded from TCGA, which mainly includes white race patients. Thus, other ethnic groups still need to be evaluated. However, the results showed that the IRLP score constructed in TCGA also applies to the Asian GEO dataset. Second, we intersected the lncRNAs from two public datasets to overcome the differences in the sequencing platforms, and some important lncRNAs may have been ignored or contributed to selection bias. Third, our prediction on drug sensitivities was not validated in an immunotherapy cohort. There is no complete cohort data on ir AEs at present because of the di culty in collecting ir AEs data and treatment outcomes. Finally, we used the "pRRophetic" R package to explore the drug sensitivity of immunosuppressors, which includes a limited set of drugs and did not allow us to address the sensitivity of many commonly used drugs.

Conclusion
In summary, we built a risk model based on IRLPs. This signature had a good predictive accuracy and effectiveness for LUAD. Furthermore, our IRLPs score signi cantly correlated with TME and TMB, indicating that these molecular changes might explain the different clinical outcomes. Importantly, our IRLPs may enhance the identi cation of the patients who can bene t from ir AEs therapy. Availability of data and materials

Abbreviations
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Competing interests
The authors declare that the research was conducted in the absence of any commercial or nancial relationships that could be construed as a potential con ict of interest.  Figure 1 The ow diagram of this study