Prognostic Values of Nucleotide Polymorphism in ARDS: A Whole Exome Sequencing Association Study

Background
Genetic loci associated with ARDS outcome have been identified. Our goal was to explore the associations between genetic variants and the outcome of ARDS, and the prognostic value of nucleotide polymorphisms in ARDS.

Methods
This was a single-center, prospective trial enrolling adult ARDS patients. After baseline data were collected, blood samples were drawn for whole exome sequencing, and single nucleotide polymorphisms (SNPs) and insertion-deletions (InDels) were analyzed to explore the quantitative and functional associations between genetic variants and ICU outcome. The lung injury burden (LIB), defined as the number of nonsynonymous SNPs per megabase (MB) of DNA, was then evaluated as a predictor of ARDS outcome.

Results
A total of 105 ARDS patients were enrolled in the study, including 70 survivors and 35 nonsurvivors. Based on the analysis of a total of 65542 nonsynonymous SNPs, LIB was significantly higher in survivors than in nonsurvivors [1892 (1848-1942) /MB versus 1864 (1829-1910) /MB, p = 0.018]. GO analysis showed that 60 functions were correlated with ARDS outcome, and KEGG enrichment analysis showed that SNP/InDels were enriched in 13 pathways. Several new SNPs potentially associated with ARDS outcome were found. The area under the ROC curve of LIB for predicting outcome was only 0.6103, and increased to 0.712 when LIB was combined with the APACHE II score.


Conclusions
Genetic variants are associated with ARDS outcome; however, their prognostic value still needs to be verified by larger trials.
Trial registration
Clinicaltrials.gov NCT02644798. Registered 20 April 2015.

Background
Acute respiratory distress syndrome (ARDS) is characterized by acute lung injury associated with increased pulmonary vascular permeability and reduced aerated lung tissue [1]. Hospital mortality remains extremely high, at 35-46% [2]. Current therapeutic strategies to increase ARDS survival consist of supportive care to prevent ventilator-induced lung injury and to improve oxygenation and gas exchange, so advances in understanding the etiology and pathology of ARDS are urgently needed. Clinical factors and protocolized therapeutic strategies poorly explain the outcome of ARDS, and the role of genetic loci in the pathogenesis of ARDS is increasingly recognized [3][4][5][6].
Numerous genetic variants associated with the outcome of ARDS have been identified. Morrell [7] found that genetic variation in MAP3K1 was associated with ventilator-free days in ARDS, while Wei [8] showed that a missense genetic variant in LRRC16A/CARMIL1 improved survival by attenuating platelet count decline in ARDS patients. However, because ARDS is a heterogeneous disease with multiple interacting pathogenic processes, genetic factors contribute differently [7][8][9][10][11], and racial and ethnic differences in mortality also exist [12,13]. More than five categories of genes have been found to be associated with the outcome of ARDS: genes influencing immune regulation, endothelial barrier function, respiratory epithelial function, coagulation, and injury and oxidative stress, among others. A few genetic risk factors have been discovered by large-scale genotyping approaches and from in vivo or in vitro models of lung injury, which highlights the importance of identifying genetic biomarkers of outcome for ARDS to further improve stratification. The mutational landscape of ARDS and the relationship between variability at single nucleotide polymorphisms (SNPs) and ARDS outcome are unknown. Using a whole exome sequencing association study, our goal was to explore the associations between genetic variants and the outcome of ARDS, and the prognostic value of nucleotide polymorphisms in ARDS.

Setting
This was an investigator-initiated, single-center, prospective trial that was conducted in the intensive care unit of a tertiary care teaching hospital. The study protocol was approved by the Ethics Committee (Approval Number: 2015ZDSYLL014.0) of Zhongda Hospital, School of Medicine, Southeast University, and written informed consent was obtained from each patient or their next of kin. Trial registration: Clinicaltrials.gov NCT02644798. Registered 20 April 2015.

Patients
Adult patients with ARDS (according to the Berlin definition) were enrolled in the trial. The diagnostic criteria included (a) onset within one week of a known clinical insult or new or worsening respiratory symptoms; (b) chest imaging showing bilateral opacities not fully explained by effusions, lobar/lung collapse, or nodules; (c) respiratory failure not fully explained by cardiac failure or fluid overload; and (d) a ratio of arterial partial pressure of oxygen to fraction of inspired oxygen (PaO2/FiO2 ratio, P/F ratio) less than or equal to 300 mmHg.

Data collection
Baseline data, including demographic characteristics, comorbidities, and the origin and etiology of ARDS, were collected by trained investigators. Predicted body weight was calculated from sex and height. Severity of illness was assessed with the Acute Physiology and Chronic Health Evaluation (APACHE) II score within 24 hours of enrollment. The Sequential Organ Failure Assessment (SOFA) score and the Murray lung injury score within 24 hours of enrollment were also calculated.
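The text does not state which formula was used for predicted body weight. A minimal sketch, assuming the widely used ARDS Network formula (an assumption, not confirmed by the study) and a hypothetical function name, could look like this:

```python
def predicted_body_weight_kg(sex: str, height_cm: float) -> float:
    """Predicted body weight from sex and height.

    Assumes the ARDS Network formula; the study itself only says
    that sex and height were used.
    """
    base_kg = {"male": 50.0, "female": 45.5}
    key = sex.strip().lower()
    if key not in base_kg:
        raise ValueError("sex must be 'male' or 'female'")
    if height_cm <= 0:
        raise ValueError("height_cm must be positive")
    # 0.91 kg per centimetre above 152.4 cm (60 inches)
    return base_kg[key] + 0.91 * (height_cm - 152.4)


# Illustrative heights only, not patient data from the study
for sex, height in [("male", 175.0), ("female", 160.0)]:
    print(sex, height, round(predicted_body_weight_kg(sex, height), 1))
```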
Predisposing conditions of ARDS were collected, and subphenotypes of ARDS were determined. Patients were divided into a severe ARDS group and a non-severe group according to the severity of lung injury (Berlin definition). Patients with risk factors of pneumonia (pulmonary sepsis), pulmonary contusion, inhalation and drowning were categorized as having pulmonary ARDS, whereas patients with risk factors of non-pulmonary sepsis or pancreatitis were categorized as having extrapulmonary ARDS. Patients with sepsis on enrollment were recorded as ARDS with sepsis. Patients with shock on enrollment were recorded as ARDS with shock. Sepsis was defined according to Sepsis 3.0.
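The severity grouping above can be sketched with the oxygenation thresholds of the Berlin definition (mild, moderate, severe). The function names are hypothetical, the PEEP requirement of the Berlin definition is ignored for brevity, and treating "severe" as P/F ≤ 100 mmHg is an assumption about how the two study groups were formed:

```python
def berlin_severity(pf_ratio_mmhg: float) -> str:
    """Map a P/F ratio (mmHg) to a Berlin oxygenation category."""
    if pf_ratio_mmhg <= 0:
        raise ValueError("P/F ratio must be positive")
    if pf_ratio_mmhg <= 100:
        return "severe"
    if pf_ratio_mmhg <= 200:
        return "moderate"
    if pf_ratio_mmhg <= 300:
        return "mild"
    return "oxygenation criterion not met"


def is_severe_group(pf_ratio_mmhg: float) -> bool:
    """Assumed split: severe versus everything else (non-severe)."""
    return berlin_severity(pf_ratio_mmhg) == "severe"


# Illustrative values only
for pf in (85, 150, 260):
    print(pf, berlin_severity(pf), is_severe_group(pf))
```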
Peripheral blood samples were drawn. Prognosis was recorded as ICU survivor or non-survivor.

Methods
Whole-exome sequencing was performed on the Illumina sequencing platform, and the data were compared with the UCSC hg19 reference genome. First, genomic DNA was isolated from the peripheral blood samples following the manufacturer's standard procedure using QIAamp DNA Blood kits (Qiagen, Hilden, Germany). Exome capture was then performed with SureSelect Human All Exon V6 (Agilent).
The DNA library was subjected to 2 × 150 bp paired-end massively parallel sequencing using a Hiseq2000 Sequencing System (Illumina, San Diego, CA, USA). Before variant calling, sequence alignment files were subjected to duplicate removal, local realignment around known Indels and base quality recalibration using the Genome Analysis Toolkit (GATK) [14]. Variants, including single-nucleotide variants (SNVs) and small insertions or deletions (Indels), were identified using both the VarScan 2.2.7 software package (http://www.ncbi.nlm.nih.gov/pubmed/22300766) [15] and the variant quality score recalibration (VQSR) protocol in GATK, and were further filtered using recommended threshold values (mapping quality > 30, base quality > 15, and read numbers > 3). Then, SNPs available in dbSNP130 (hg19) as well as those reported by the 1000 Genomes Project were filtered out from the output files using ANNOVAR (http://nar.oxfordjournals.org/content/38/16/e164) [16].
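The threshold filter described above can be illustrated with a small sketch. It is not the GATK, VarScan or ANNOVAR interface; the `VariantCall` record and its field names are hypothetical, and only the three cut-offs come from the text:

```python
from dataclasses import dataclass

# Cut-offs quoted in the text (all applied as strict "greater than")
MIN_MAPPING_QUALITY = 30
MIN_BASE_QUALITY = 15
MIN_READS = 3


@dataclass
class VariantCall:
    """Hypothetical record for one called variant."""
    chrom: str
    pos: int
    mapping_quality: float
    base_quality: float
    supporting_reads: int


def passes_filters(call: VariantCall) -> bool:
    """True only when the call exceeds all three thresholds."""
    return (
        call.mapping_quality > MIN_MAPPING_QUALITY
        and call.base_quality > MIN_BASE_QUALITY
        and call.supporting_reads > MIN_READS
    )


# Illustrative calls only
calls = [
    VariantCall("chr1", 1000, 60, 30, 12),  # passes all thresholds
    VariantCall("chr1", 2000, 30, 30, 12),  # mapping quality not above 30
    VariantCall("chr2", 3000, 45, 10, 8),   # base quality too low
    VariantCall("chr2", 4000, 45, 20, 3),   # read count not above 3
]
kept = [c for c in calls if passes_filters(c)]
print([(c.chrom, c.pos) for c in kept])
```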
After a number of new coding SNPs potentially associated with ARDS were identified, SNP/InDels were tested with PLINK to compare the outcomes and subphenotypes of ARDS. In addition to the number and function of nonsynonymous SNVs, the lung injury burden (LIB) was calculated as the number of nonsynonymous SNPs per megabase (MB) of DNA. The area under the receiver operating characteristic (ROC) curve was used to evaluate the value of LIB in predicting the outcome and subphenotype of patients with ARDS.
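The LIB definition is a simple ratio and can be sketched as follows. The text does not state which region size was used as the denominator, so `covered_bases` is a hypothetical parameter and the numbers in the example are purely illustrative:

```python
def lung_injury_burden(nonsyn_snp_count: int, covered_bases: int) -> float:
    """Nonsynonymous SNPs per megabase (MB) of sequenced DNA.

    covered_bases is an assumed denominator (number of bases
    evaluated); the study does not specify how it was chosen.
    """
    if nonsyn_snp_count < 0:
        raise ValueError("nonsyn_snp_count must be non-negative")
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    megabases = covered_bases / 1_000_000
    return nonsyn_snp_count / megabases


# Illustrative input chosen to land on a value of the reported magnitude
print(lung_injury_burden(113_520, 60_000_000))
```

Because the ratio depends directly on the denominator, LIB values are only comparable between patients when the same region size is used for all of them.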

Statistics
Data are presented as number (%) for categorical variables and median (interquartile range) for continuous variables. Fisher's exact test or the χ2 test was used for categorical variables, and Student's t-test or the Mann-Whitney U test was used for continuous variables, as appropriate. Predictive ability was evaluated by the area under the curve (AUC) in receiver operating characteristic (ROC) analysis. A p value < 0.05 was considered statistically significant. Statistical analyses were carried out with SPSS 16.0 software (IBM, Somers, NY).
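The AUC used here has a direct rank interpretation: it equals the probability that a randomly chosen patient from one group has a higher marker value than a randomly chosen patient from the other group, with ties counted as one half (the Mann-Whitney U statistic divided by the number of pairs). The sketch below is a plain-Python illustration of that identity, not the SPSS procedure used in the study; the function name and sample values are hypothetical:

```python
from typing import Sequence


def auc_from_scores(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """AUC for 'higher score indicates group A', computed pairwise.

    Equivalent to the Mann-Whitney U statistic divided by
    len(group_a) * len(group_b); ties count as 0.5.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


# Illustrative LIB-like values only, not study data
survivors = [1892.0, 1910.0, 1942.0, 1848.0]
nonsurvivors = [1864.0, 1829.0, 1910.0]
print(auc_from_scores(survivors, nonsurvivors))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.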

Results
A total of 105 patients were enrolled in the study, including 70 survivors and 35 non-survivors. Patient characteristics by outcome and subphenotype are presented in Table 1. The median age was 59 years, the median APACHE II score was 23, and the median SOFA score and Murray lung injury score were 9 and 2.7, respectively.
Among them, 91 patients were categorized as having pulmonary ARDS, 89 patients were diagnosed with sepsis and 66 patients with shock on enrollment.

SNP/InDel data by whole-exome sequencing
Whole-exome sequencing identified 471131 SNP/InDels (Table 2), of which 120830 were in exonic regions. There were 65542 nonsynonymous SNVs, and the InDels included 436 frameshift insertions and 897 frameshift deletions. GO analysis showed that 52 functions were correlated with ARDS development (p < 0.01), and KEGG enrichment analysis showed that these SNP/InDels were in 10 pathways, such as the cGMP-PKG signaling pathway and platelet activation (p < 0.05). GO analysis showed that 60 functions were correlated with ARDS outcome (p < 0.01) (Fig. 1), and KEGG enrichment analysis showed that these SNP/InDels were in 13 pathways (Table 3), such as the ECM-receptor interaction pathway, the platelet activation pathway and the cGMP-PKG signaling pathway (p < 0.01).

Association of genetic polymorphisms with ARDS outcome
To identify novel SNPs associated with ARDS outcome, the genotype distributions of different genes were summarized in Table 4; they conformed to Hardy-Weinberg equilibrium. Although no strong evidence of stratification has been reported, several SNPs potentially associated with ARDS outcome were found (Fig. 2).
Patients were divided into a severe ARDS group and a non-severe group according to the severity of lung injury. Compared with the non-severe group, LIB was lower in the severe ARDS group, with an area under the ROC curve of 0.727 (p < 0.0001).
GO analysis showed that 25 functions were correlated with ARDS severity (p < 0.01), and KEGG enrichment analysis showed that these SNP/InDels were in 4 pathways, such as the PI3K-Akt signaling pathway and ECM-receptor interaction (p < 0.05).
ARDS patients were divided into pulmonary and extrapulmonary ARDS groups. LIB did not differ significantly between pulmonary and extrapulmonary ARDS. GO analysis showed that 19 functions were correlated with pulmonary versus extrapulmonary ARDS (p < 0.01), and KEGG enrichment analysis showed that these SNP/InDels were in 8 pathways, such as ECM-receptor interaction (p < 0.05).
ARDS patients were divided into those with and without sepsis on enrollment. Compared with patients without sepsis, LIB was lower in ARDS combined with sepsis, with an area under the ROC curve of 0.6803 (p = 0.0084). GO analysis showed that 24 functions were correlated with ARDS combined with sepsis (p < 0.01), and KEGG enrichment analysis showed that these SNP/InDels were in 3 pathways, such as ECM-receptor interaction and focal adhesion (p < 0.05).
ARDS patients were divided into those with and without shock on enrollment. Compared with patients without shock, LIB was lower in ARDS combined with shock, with an area under the ROC curve of 0.6915 (p = 0.0008). GO analysis showed that 46 functions were correlated with ARDS combined with shock (p < 0.01), and KEGG enrichment analysis showed that these SNP/InDels were in 10 pathways, such as the cAMP signaling pathway and ECM-receptor interaction (p < 0.05).
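The Hardy-Weinberg equilibrium check applied to the genotype distributions in this section can be illustrated with a chi-square goodness-of-fit test for a biallelic SNP. This is a minimal sketch with a hypothetical function name and made-up genotype counts; the study does not state which test or software setting was used:

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic comparing observed genotype counts with
    the counts expected under Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n <= 0 or min(n_aa, n_ab, n_bb) < 0:
        raise ValueError("genotype counts must be non-negative and not all zero")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic sites)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# With 1 degree of freedom, a statistic above 3.841 rejects HWE at p < 0.05
CRITICAL_VALUE_DF1 = 3.841

# Illustrative counts only, not genotypes from Table 4
stat = hwe_chi_square(30, 50, 25)
print(round(stat, 3), "deviates" if stat > CRITICAL_VALUE_DF1 else "consistent with HWE")
```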

Discussion
In this single-center, prospective trial enrolling adult ARDS patients, whole exome sequencing was performed to understand the differences in ICU prognosis of ARDS. The highlight of the study is the integrated framework of genetic variability in ARDS, displayed through the comparison of survivors and non-survivors. As defined by lung injury burden, the mutational landscape of ARDS showed the overall genetic variability between survivors and non-survivors, while specific genetic polymorphisms that influence outcome were also detailed. Together, these findings show that genetic factors play a role in the outcome of ARDS.
As the role of genetics in the pathogenesis of ARDS is increasingly recognized, numerous genes and genetic variants have been reported to be associated with the outcome of ARDS. However, most were single genetic polymorphisms, and few studies have focused on the whole mutational landscape and its influence on ARDS outcome. To build an integrated framework, we classified genes into different categories and tried to observe their association with the outcome of ARDS: genes influencing immune regulation, endothelial barrier function, respiratory epithelial function, coagulation, and injury and oxidative stress, among others. However, because ARDS involves multiple different pathogenic processes, all these genes could interrelate.
Tumor mutation burden (TMB) is a marker calculated as the number of nonsynonymous mutations per MB of DNA in tumor tissue [17]. High TMB often correlates with a higher probability of tumor neoantigens, which can be recognized by lymphocytes [18][19], so it is hypothesized that tumors with the highest TMB might be more likely to respond to immune checkpoint blockade therapy. Previous studies showed that patients with high TMB respond better to immune checkpoint blockade therapy [20][21][22] and might have a better outcome [23]. However, few data are available on mutation burden in ARDS, which might provide a rough estimate of the whole mutational landscape of ARDS. In this study, we found that, when combined with clinical characteristics, lung injury burden could predict the prognosis of ARDS.
We acknowledge some limitations in our study. First, there was no validation group in which to confirm the association with ARDS outcome of the functional SNPs found by whole exome sequencing. Second, functional studies are needed to evaluate the mechanisms that underlie the associations between the genetic variants and ARDS outcome, and the mediating pathways. Third, there was no healthy control group. In addition, the findings were mainly pertinent to patients in a single center who developed ARDS, and should be validated in other cohorts before generalization.

Conclusions
Genetic variants are associated with ARDS outcome; however, their prognostic value still needs to be verified by larger trials.