Establishing a prediction model of severe acute mountain sickness using deep learning of support vector machine recursive feature elimination

doi:10.21203/rs.3.rs-2435892/v1


Research Article


This work is licensed under a CC BY 4.0 License

Version 1


Background

Severe acute mountain sickness (sAMS) can be life-threatening, but little is known about its genetic basis. Using microarray gene expression data and phenotype data for deep learning, we aimed to explore the genetic susceptibility to sAMS for the purpose of prediction.

Methods

The study was based on microarray data from 112 peripheral blood mononuclear cell (PBMC) samples from 21 subjects exposed at various time points to very high altitude (5260 m), low barometric pressure (406 mmHg), and hypobaric hypoxia (VLH). The interplay between multiple phenotypic risk factors was investigated, and the underlying risk genes were identified and used to establish a prediction model of sAMS with the support vector machine recursive feature elimination (SVM-RFE) method.

Results

Exposure to VLH activated gene expression in leukocytes and was accompanied by an inverted CD4/CD8 ratio, which interacted with other phenotypic risk factors at the genetic level (P < 0.001). A total of 2291 underlying risk genes were input to the SVM-RFE system for deep learning, and a prediction model for sAMS was established using ten featured genes with significant predictive power (P < 0.05); the model showed perfect discrimination in the training data (C-index = 1) and clinical applicability. Five featured genes (EPHB3, DIP2B, RHEBL1, GALNT13, and SLC8A2) were identified as upstream regulators of hypoxia- and/or inflammation-related pathways mediated by microRNAs, and as potential biomarkers for sAMS.

Conclusions

The established prediction model holds promise as a genetic screening tool for sAMS in clinical practice. Further studies are needed to establish the role of the featured genes as biomarkers for sAMS.

Severe acute mountain sickness (sAMS)

genetic susceptibility

SVM-RFE

hypobaric hypoxia

CD4/CD8 ratio

Acute mountain sickness (AMS) is considered a self-limiting syndrome of non-specific symptoms, including fatigue, headache, nausea, and dizziness, which may occur in non-acclimatized individuals acutely exposed to altitudes above 2500 m[1]. Without medical care, and/or in certain high-risk groups, AMS may progress to severe AMS (sAMS), sometimes accompanied by life-threatening conditions such as cerebral edema and/or pulmonary edema[2]. Symptoms of mild to moderate AMS generally appear early and peak within 24 to 72 hours after high-altitude exposure, a window that typically overlaps with the onset of cerebral or pulmonary edema[1, 3, 4]. This potential continuum from AMS to sAMS, cerebral edema, or pulmonary edema suggests that preventing sAMS may help avoid the related events at high altitude. The occurrence of these severe disorders is largely determined by the planned altitude, the ascent rate, and individual susceptibility, so their reported incidence varies considerably across studies[3]. Accordingly, it is usually difficult to predict, for preventive purposes, who is at risk of developing sAMS.

The partial pressure of oxygen (PaO₂)[5], the partial pressure of carbon dioxide (PaCO₂)[6], oxygen saturation (SaO₂)[7, 8], arterial oxygen content (CaO₂)[9], the oxygen tension at 50% haemoglobin saturation (P50)[10], and hemoglobin[11] are considered hypoxia-sensitive and have been reported either as independent predictors of AMS or as factors related to its subsequent development; however, their importance in predicting AMS remains uncertain. To date, studies of the predictive value of blood gas testing are limited, and blood gas findings are often inconsistent because of interference from field or laboratory conditions or from individual factors. Moreover, statistically significant differences usually require large-scale and/or randomized controlled trials, which are currently almost impossible to complete at high altitude. Besides, pulmonary function testing[12], cardiopulmonary exercise testing[13], and hypoxic exercise testing[14] have been used to assess the risk of hypoxemia, but the applicability of these measurements to high-altitude exposure has not been fully established[2].

Despite the variety of ascent plans and individual baseline medical conditions, genetic susceptibility has been proposed to explain why AMS and related events still occur in certain groups[15]; however, evidence for genetic susceptibility to AMS is very scarce, and even more so for sAMS. In this study, the microarray data of GSE103297[16] were explored for the genetic background of AMS, and a prediction model of sAMS was established by deep learning with the support vector machine recursive feature elimination (SVM-RFE) method[17]. The model's clinical applicability was tested within the timeline of the GSE103297 cohort, and the model was validated in an independent cohort, GSE52209[18]. Five featured genes (EPHB3, DIP2B, RHEBL1, GALNT13, and SLC8A2) were identified as important regulators of hypoxia-related processes, including erythrocyte differentiation, alpha-beta T cell differentiation, and histamine secretion by mast cells. The study is a preliminary attempt to explore the genetic susceptibility to sAMS, which occurred in almost half of the GSE103297 subjects exposed to very high altitude (5260 m), low barometric pressure (Pb, 406 mmHg), and hypobaric hypoxia (VLH).

Collection and preprocessing of VLH microarray data

VLH microarray data were searched in the Gene Expression Omnibus[19] using the keywords “AMS” and “high altitude”, and were collected from the GPL6244 platform in MINiML format under accession numbers GSE103297 and GSE52209. GSE103297 comprises 112 peripheral blood mononuclear cell (PBMC) samples from 21 subjects studied at sea level and then under VLH at seven time points: baseline, day 1 at noon (d1noon) or in the afternoon (post meridiem, d1pm), day 7 (d7), day 16 (d16), and post-descent day 7 (post7) or day 21 (post21). Information on age, sex, height, PaO₂, PaCO₂, SaO₂, CaO₂, P50, hemoglobin, Lake Louise Questionnaire (LLQ)-AMS score, and AMS-C-Composite score was included (Table S1). Data from d1 were used to train and establish the model; data from baseline, d7, d16, and post7 were used to test it; and independent data from GSE52209 were used to validate it. All raw data were extracted and preprocessed using the package in RStudio (2022.07.1 + 554), then log2-transformed and quantile-normalized using the normalize.quantiles function of the preprocessCore package. The normalized data were annotated with GPL6244 to convert all probes into gene symbols. Probes mapping to multiple genes were filtered out, and for genes detected by multiple probes, the mean value was taken as the final expression value. The removeBatchEffect function of the limma package was applied to remove batch effects, and principal component analysis (PCA) was performed before further analysis of all data.
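The preprocessing chain described above (log2 transformation, quantile normalization, collapsing probes to genes, and PCA) can be sketched in Python as follows. This is a minimal NumPy sketch, not the authors' R pipeline: `quantile_normalize` only approximates preprocessCore's normalize.quantiles (ties are broken by sort order), the `"///"` separator for multi-gene probes is an assumption about the annotation format, the toy intensities are hypothetical, and batch correction with removeBatchEffect is omitted.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize the columns (samples) of a probes x samples matrix.

    Every column receives the same distribution: the mean of the sorted columns.
    Ties are broken by sort order, a simplification of preprocessCore.
    """
    order = np.argsort(x, axis=0)
    mean_quantiles = np.sort(x, axis=0).mean(axis=1)
    out = np.empty(x.shape, dtype=float)
    for j in range(x.shape[1]):
        out[order[:, j], j] = mean_quantiles
    return out


def collapse_probes(x, probe_symbols):
    """Drop probes mapped to several genes, then average the probes of each gene.

    Assumes multi-gene annotations are joined by '///' (an assumption here).
    """
    rows_by_gene = {}
    for i, symbol in enumerate(probe_symbols):
        if not symbol or "///" in symbol:
            continue  # unannotated or ambiguous probe
        rows_by_gene.setdefault(symbol.strip(), []).append(i)
    genes = sorted(rows_by_gene)
    values = np.vstack([x[rows_by_gene[g]].mean(axis=0) for g in genes])
    return genes, values


def pca_scores(expr, n_components=2):
    """Sample scores on the first principal components (genes x samples input)."""
    samples = expr.T - expr.T.mean(axis=0)  # center each gene across samples
    u, s, _ = np.linalg.svd(samples, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data: 4 probes x 3 samples of raw intensities (hypothetical values).
raw = np.array([[5.0, 4.0, 3.0],
                [2.0, 1.0, 4.0],
                [3.0, 4.0, 6.0],
                [4.0, 2.0, 8.0]])
normalized = quantile_normalize(np.log2(raw))
genes, gene_expr = collapse_probes(normalized, ["A", "B /// C", "A", "D"])
scores = pca_scores(gene_expr)
print(genes, scores.shape)
```

After quantile normalization every sample shares the same sorted distribution, which is what makes the later PCA comparisons across time points meaningful.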

Identification of differentially expressed genes (DGs)

DGs across baseline, d1pm, and d7 in the training cohort were identified using the limma package (version 3.52.3) in R. P values were adjusted to control false positives. The screening thresholds for DG mRNAs were P < 0.05 and |log2 fold change (FC)| > 1.3, that is, log2FC > 1.3 or log2FC < −1.3.
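The DG screening rule can be expressed as a simple filter. The paper does not name its P-value adjustment method; this sketch assumes Benjamini-Hochberg (the default `adjust.method` of limma's `topTable`), and the function names and toy gene list are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values (step-up, kept monotone)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values never increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_dgs(genes, log2fc, pvals, fc_cut=1.3, p_cut=0.05, adjust=True):
    """Keep genes with P < p_cut and |log2FC| > fc_cut (thresholds from the text)."""
    p_used = benjamini_hochberg(pvals) if adjust else list(pvals)
    return [g for g, fc, p in zip(genes, log2fc, p_used)
            if p < p_cut and abs(fc) > fc_cut]


genes = ["G1", "G2", "G3", "G4"]  # hypothetical genes
log2fc = [2.0, -1.5, 0.5, 1.4]
pvals = [0.01, 0.04, 0.03, 0.2]
print(select_dgs(genes, log2fc, pvals))
print(select_dgs(genes, log2fc, pvals, adjust=False))
```

The two calls show why the adjustment matters: G2 passes the raw threshold but not the adjusted one.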

Function and pathway enrichment

Gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, Reactome pathway, and WikiPathways analyses, together with visualization, were performed using the ClueGO application in Cytoscape (version 3.9.1)[20]. Various types of evidence (experimental, computational, author statements from publications, and curator statements) were used in the analysis. Ontologies, pathways, and annotation files were updated before each analysis using the GO annotation database (UniProt-GOA)[21]. To identify representative pathways, medium network specificity was selected with GO levels from 3 to 8, and terms were retained only if at least 3 genes and at least 4% of the term's associated genes were mapped. The GO term fusion threshold for group merging was 50%. Only terms with P < 0.05 were displayed as statistically significant.
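ClueGO offers hypergeometric tests for term enrichment. The sketch below shows a right-tailed hypergeometric P value together with the minimum-gene and minimum-percentage filter described above. It is a simplified illustration under stated assumptions (hypothetical function names and term sizes, no multiple-testing correction, no GO-level or kappa-based grouping), not ClueGO's implementation.

```python
from math import comb


def hypergeom_enrichment_p(k, n_list, k_term, n_universe):
    """Right-tailed P(X >= k) for X ~ Hypergeometric(n_universe, k_term, n_list).

    k: list genes annotated to the term; n_list: size of the gene list;
    k_term: genes annotated to the term in the universe; n_universe: universe size.
    """
    upper = min(n_list, k_term)
    hits = sum(comb(k_term, i) * comb(n_universe - k_term, n_list - i)
               for i in range(k, upper + 1))
    return hits / comb(n_universe, n_list)


def passes_term_filter(k, k_term, min_genes=3, min_percent=4.0):
    """At least `min_genes` mapped genes and `min_percent` % of the term's genes."""
    return k >= min_genes and 100.0 * k / k_term >= min_percent


# Hypothetical term: 40 of 20000 genes annotated, 5 of them in a 512-gene list.
p = hypergeom_enrichment_p(5, 512, 40, 20000)
print(p, passes_term_filter(5, 40))
```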

Cluster analysis

DGs across baseline, d1pm, and d7 in the training cohort were explored for the functional differentiation of leukocytes using fuzzy c-means clustering. The mean expression value of DGs at each time point was calculated using the avereps function of the limma package, and expression patterns along the exposure timeline were detected with the Mfuzz package. The filter threshold for the expression value was 0.25. Based on multiple rounds of training, the number of clusters was set to 6. Each cluster was compared with respect to immune signatures using GO analysis and the leukocyte 22 data matrix (LM22), a leukocyte gene signature matrix containing 547 leukocyte markers[22]. Pearson correlation was used to estimate the timeline correlation of each cluster with the LM22 signatures.
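Mfuzz performs soft (fuzzy c-means) clustering of standardized expression profiles. A minimal NumPy sketch of that technique follows; the fuzzifier m = 2, the convergence settings, the function names, and the toy profiles are illustrative assumptions (Mfuzz can estimate m from the data), and the 0.25 expression filter from the text is not modelled. Genes with constant profiles must be removed before standardization.

```python
import numpy as np


def standardize_rows(x):
    """Scale each gene's time profile to mean 0 and SD 1 (rows must vary)."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, ddof=1, keepdims=True)
    return (x - mu) / sd


def fuzzy_cmeans(x, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Fuzzy c-means; returns (centers, memberships) with membership rows summing to 1."""
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        w = u ** m
        centers = (w.T @ x) / w.sum(axis=0)[:, None]  # membership-weighted means
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return centers, u


# Hypothetical profiles at baseline, d1pm and d7: 10 rising and 10 falling genes.
rng = np.random.default_rng(1)
rising = np.array([0.0, 1.0, 2.0]) + 0.1 * rng.normal(size=(10, 3))
falling = np.array([2.0, 1.0, 0.0]) + 0.1 * rng.normal(size=(10, 3))
profiles = standardize_rows(np.vstack([rising, falling]))
centers, memberships = fuzzy_cmeans(profiles, c=2)
labels = memberships.argmax(axis=1)
print(labels)
```

Unlike k-means, each gene keeps a graded membership in every cluster, which is why Mfuzz is suited to noisy time-course profiles.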

CIBERSORT algorithm for the abundance of leukocyte types in PBMC samples

Coupled with LM22, the CIBERSORT algorithm distinguishes 22 types of leukocytes, including B cells, plasma cells, T cells, NK cells, monocytes, macrophages, dendritic cells, mast cells, eosinophils, neutrophils, and their subtypes[23]. Timeline expression data from the 112 samples were extracted for analysis. To improve the accuracy of the deconvolution, 1000 permutations based on the default signature matrix were used to compute the P value and root-mean-square deviation for samples at each time point. The scores for each signature were summarized and median-centered to permit timeline comparisons, and the paired t test was used to assess the significance of comparisons. The CD4/CD8 ratio was calculated from the summed estimates of CD4 T cells (naive, memory resting, and memory activated) and CD8 T cells, and the total proportion of T cells was estimated from all T cell subtypes. The area under the receiver operating characteristic curve (ROC AUC) was used to determine the best thresholds of SaO₂, P50, hemoglobin, CaO₂, and the estimated proportion of T cells as binary classifiers of the CD4/CD8 ratio and of sAMS (LLQ-AMS score ≥ 6).
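The derived quantities in this paragraph can be sketched in plain Python: the CD4/CD8 ratio from CIBERSORT fractions, the ROC AUC computed as a Mann-Whitney statistic, and a best threshold. The paper does not say how the best threshold was chosen; this sketch assumes the Youden index (sensitivity + specificity − 1). The cell-type keys are assumed to follow LM22 naming, and the marker values and labels are hypothetical. For markers that are lower in the positive class (for example SaO₂), negate the values before calling the functions.

```python
CD4_SUBSETS = ("T cells CD4 naive", "T cells CD4 memory resting",
               "T cells CD4 memory activated")


def cd4_cd8_ratio(fractions):
    """CD4/CD8 ratio from CIBERSORT-style fractions keyed by LM22 cell-type name."""
    cd4 = sum(fractions.get(name, 0.0) for name in CD4_SUBSETS)
    cd8 = fractions.get("T cells CD8", 0.0)
    return cd4 / cd8 if cd8 > 0 else float("inf")


def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(scores, labels):
    """Cut-off t maximizing sensitivity + specificity - 1, calling score >= t positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


sample = {"T cells CD4 naive": 0.10, "T cells CD4 memory resting": 0.15,
          "T cells CD4 memory activated": 0.05, "T cells CD8": 0.20}
print(cd4_cd8_ratio(sample))  # about 1.5; values below 1 indicate an inverted ratio
marker = [0.9, 0.4, 0.5, 0.2]  # hypothetical marker values
is_sams = [1, 1, 0, 0]         # hypothetical labels (1 = sAMS)
print(roc_auc(marker, is_sams), youden_threshold(marker, is_sams))
```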

Risk gene mapping

The interplay between the estimated CD4/CD8 ratio, LLQ-AMS score, SaO₂, P50, hemoglobin, CaO₂, the estimated proportion of T cells, and the underlying hub genes was visualized using nine-quadrant diagrams and Venn diagrams[24]. Weighted correlation network analysis (WGCNA)[25] was conducted to discover relationships between gene expression patterns and phenotypes. Genes with expression values above 1 were retained for further analysis. The soft-thresholding power was set at 7. An unsigned scale-free co-expression network was constructed with a minimum module size (minModuleSize) of 100 and a module-merging threshold of 0.25. Pearson's correlation was used to build the similarity matrix, the adjacency matrix, and the topological overlap matrix for each pair of genes across all samples. Gene co-expression modules were detected using the dynamic tree cut algorithm with a cut height of 0.975.
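The network construction step can be illustrated with the standard unsigned WGCNA formulas: the adjacency a_ij = |cor(i, j)|^β with β = 7 as in the text, and the topological overlap TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 − a_ij), where l_ij sums the products of shared-neighbor adjacencies and k is connectivity. This NumPy sketch is an illustration under those textbook definitions, with hypothetical data and function names; it is not the WGCNA package's code, and module detection by dynamic tree cut is not shown.

```python
import numpy as np


def unsigned_adjacency(expr, beta=7):
    """a_ij = |cor(gene_i, gene_j)|**beta for a samples x genes matrix."""
    adj = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
    np.fill_diagonal(adj, 0.0)  # no self-connections
    return adj


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), l_ij = sum_u a_iu a_uj."""
    shared = adj @ adj  # shared-neighbor term (diagonal of adj is zero)
    k = adj.sum(axis=1)  # connectivity of each gene
    tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom


rng = np.random.default_rng(1)
expr = rng.normal(size=(20, 5))  # hypothetical: 20 samples x 5 genes
expr[:, 1] = 2.0 * expr[:, 0] + 1.0  # gene 1 perfectly tracks gene 0
adj = unsigned_adjacency(expr, beta=7)
tom = topological_overlap(adj)
dissimilarity = 1.0 - tom  # input to hierarchical clustering
print(np.round(tom, 3))
```

Raising correlations to a power suppresses weak links, so the overlap of the perfectly correlated pair stands out against the near-zero values of the random genes.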

Deep learning to establish the prediction model of sAMS

The phenotype-validated DGs were input to the SVM-RFE system for deep learning, run with the e1071 and msvmRFE packages. SVM-RFE is an SVM-based iterative algorithm that works backward from an initial set of features, repeatedly eliminating the lowest-ranked features to find the optimal subset of hub genes. The input data comprised all samples of the d1 training cohort: 19 observations (individual subjects) and 2291 features (DGs). We used k = 15 for k-fold cross-validation (CV) and halve.above = 30, which cuts the features in half in each round until fewer than 30 remain. The entire feature selection and generalization error estimation process was wrapped in five-fold CV. Features were ranked with the lapply function according to their average rank across the five folds, with accuracy and error estimated in each fold. Univariate Cox hazard analysis was applied to assess the predictive performance of the selected features. Multinomial logistic regression, nomograms, survival analysis, AUC, and calibration curves were used to establish, test, and validate the model, and decision curve analysis (DCA) was used to assess its clinical applicability. The model was tested within the timeline of the training cohort (baseline, d7, d16, and post7) and validated in an independent cohort (GSE52209) comprising 17 subjects who developed high-altitude pulmonary edema within 48 to 72 hours of VLH exposure, 14 normal controls, and 14 high-altitude natives[18]. R packages including Hmisc, lattice, survival, Formula, ggplot2, rmda, ggDCA, rms, SparseM, caret, and pROC were used in model development.
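The elimination loop of SVM-RFE can be sketched as follows. msvmRFE fits its SVMs with e1071; to stay dependency-light, this sketch substitutes a hand-written linear SVM trained by full-batch subgradient descent on the hinge loss (a swapped-in trainer, with hypothetical hyperparameters `lam`, `lr`, and `epochs`). Features are ranked by the squared weight w_n², the weaker half is removed while more than `halve_above` features remain, and then one feature at a time. The k = 15 inner CV and the five-fold outer wrapper are omitted, and the data are simulated.

```python
import numpy as np


def train_linear_svm(x, y, lam=0.01, epochs=500, lr=0.1):
    """Soft-margin linear SVM via full-batch subgradient descent; y in {-1, +1}.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (x @ w + b))).
    """
    n, d = x.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, epochs + 1):
        active = y * (x @ w + b) < 1  # samples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * x[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        step = lr / np.sqrt(t)  # decaying step size
        w -= step * grad_w
        b -= step * grad_b
    return w, b


def svm_rfe_ranking(x, y, halve_above=30):
    """Rank features best-first by recursive elimination on w_n**2.

    While more than `halve_above` features remain, the weaker half is removed
    per round (mirroring msvmRFE's halve.above); afterwards one at a time.
    """
    remaining = list(range(x.shape[1]))
    eliminated = []  # weakest features are appended first
    while remaining:
        w, _ = train_linear_svm(x[:, remaining], y)
        order = np.argsort(w ** 2)  # weakest first
        n_drop = len(remaining) // 2 if len(remaining) > halve_above else 1
        dropped = [remaining[i] for i in order[:n_drop]]
        eliminated.extend(dropped)
        dropped_set = set(dropped)
        remaining = [f for f in remaining if f not in dropped_set]
    return eliminated[::-1]


# Hypothetical data: 40 subjects x 50 genes; only gene 0 separates the classes.
rng = np.random.default_rng(0)
y = np.array([-1.0, 1.0] * 20)
x = rng.normal(size=(40, 50))
x[:, 0] += 3.0 * y
ranking = svm_rfe_ranking(x, y)
print(ranking[:5])
```

Retraining after every elimination is the key design choice: a feature's weight is re-evaluated in the context of the features that survive, rather than ranked once from a single fit.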

Exposure to VLH activated the gene expression in leukocytes

The GSE103297 microarray data were derived from 112 PBMC samples collected from 21 subjects at seven time points of VLH exposure. At baseline, subjects were studied at sea level (130 m, Pb = 749 mmHg). At d1noon or d1pm, d7, d16, post7, or post21, subjects ascended or re-ascended to the target altitude (5260 m, Pb = 406 mmHg), where they were sampled and studied (Fig. 1A). PaO₂ (mmHg), PaCO₂ (mmHg), SaO₂ (%), CaO₂ (mL/dL), P50 (mmHg), hemoglobin (g/dL), LLQ-AMS score, and AMS-C-Composite score were measured and recorded. PCA was conducted to determine timeline differences in gene expression (Fig. 1B). Gene expression patterns at d1pm and d7 were clearly distinct from baseline (shifting along the PC1 axis), but this shift diminished along both the PC1 and PC2 axes at d16 and post7, suggesting that acute VLH exposure may change the gene expression pattern of PBMCs. DGs across baseline, d1pm, and d7 were then determined by P value (< 0.05) and log2FC (> 1.3 or < −1.3). A total of 512 overlapping genes showed consistent significance along the exposure timeline from d1pm to d7 (Fig. 1C, Table S2). GO analysis of these 512 DGs indicated that most were involved in leukocyte activation (Fig. 1D). The expression levels of the 12 DGs enriched in the GO term “leukocyte activation involved in immune response” were further examined along the timeline (Fig. 1E). These leukocyte activation-related genes were up-regulated along the timeline with a peak at d7, suggesting that leukocytes were activated upon VLH exposure but recovered as exposure continued.

T cells dominated the genetic responses upon VLH exposure

To investigate the functional differentiation of leukocytes upon VLH exposure, 1644 genes (Table S3) that differed significantly from baseline were clustered functionally along the timeline, yielding six clusters (Fig. 2A). Cluster 1 (299 genes) and cluster 6 (384 genes) changed in opposite directions across the timeline, with a peak decrease in cluster 1 and a peak increase in cluster 6 at d7. Different expression dynamics were also observed in cluster 2 (213 genes), cluster 3 (281 genes), cluster 4 (123 genes), and cluster 5 (344 genes). The function of each cluster was annotated using LM22[22] (Fig. 2B). As expected, each cluster was associated with different leukocyte types: CD4/CD8 T cells dominated clusters 1, 2, 4, and 5; regulatory T cells, clusters 4 and 5; gamma delta T cells, clusters 2, 4, 5, and 6; resting NK cells, cluster 6; monocytes, clusters 2 and 6; activated dendritic cells, cluster 3; and neutrophils, cluster 6. We then compared the biological functions of the clusters using the GO, KEGG, Reactome, and WikiPathways databases (Fig. 2C). No significant terms were enriched in cluster 5, and no relationship was observed among the expression patterns of clusters 2, 3, and 4. Clusters 1 and 6 overlapped in T cell-associated functions, oxidative stress, and blood coagulation. Cluster 1 was specifically enriched in cell death in response to hydrogen peroxide and regulation of the response to reactive oxygen species, whereas cluster 6 was specifically enriched in platelet activation. Both clusters were highly sensitive to altitude or hypoxia and were functionally associated with T cell activities, suggesting that T cells play a dominant role in the genetic responses to VLH exposure.

Inverted CD4/CD8 ratio may function as a risk factor of sAMS

To investigate the timeline abundance of leukocyte types in subjects exposed to VLH, the CIBERSORT algorithm[23] was applied in combination with LM22 to the expression profiles at baseline, d1pm, d7, d16, and post7 (Fig. 3A and Table S4). CD8 T cells, CD4 T cells, resting NK cells, and monocytes were the major cell types in the PBMC samples, and the estimated proportion of CD4 cells decreased significantly at d1pm and d7 vs. baseline, resulting in an inverted CD4/CD8 ratio at d1pm and d7 (Fig. 3B, Table S5). In ROC analysis, SaO₂ (AUC = 0.833), P50 (AUC = 0.833), and blood hemoglobin level (AUC = 0.792) performed well in differentiating subjects with normal or inverted CD4/CD8 ratios (Fig. 3C), suggesting that they may affect the CD4/CD8 balance. SaO₂ (AUC = 0.617), CaO₂ (AUC = 0.783), P50 (AUC = 0.667), hemoglobin (AUC = 0.850), and the estimated proportion of T cells (AUC = 0.633) were also identified as possible binary classifiers of sAMS (sAMS or non-sAMS) (Fig. 3D). The interplay between the CD4/CD8 ratio, SaO₂, P50, hemoglobin, the estimated proportion of T cells, and CaO₂ implies that an inverted CD4/CD8 ratio may be a potential risk factor for sAMS.

Genetic profiling for sAMS

To uncover the gene signature underlying the sAMS phenotypes, subjects in the training cohort were divided into binary subgroups using the best predicted thresholds of the various classifiers. The interplay between classifiers was investigated with respect to the expression patterns of DGs across the binary subgroups (Fig. 4A). Significant correlations were observed between SaO₂, P50, hemoglobin, and the CD4/CD8 ratio in the expression patterns of related DGs (P < 0.001); meanwhile, the CD4/CD8 ratio, the estimated proportion of T cells, SaO₂, P50, hemoglobin, and CaO₂, as potential risk factors of sAMS, correlated significantly with the LLQ-AMS score (P < 0.001). In total, 2291 risk factor-related DGs were identified: 328 in the CaO₂ set, 346 in the CD4/CD8 ratio set, 682 in the hemoglobin set, 964 in the P50 set, 515 in the SaO₂ set, 263 in the estimated proportion of T cells set, and 508 in the LLQ-AMS score set (Table S6), with 2- to 6-way intersections in the Venn analysis (Fig. 4B). To further identify functional modules among the 2291 DGs and their relationships with the risk factors of sAMS, WGCNA was performed using PaO₂, PaCO₂, SaO₂, CaO₂, P50, hemoglobin, LLQ-AMS score, AMS-C-Composite score, CD4/CD8 ratio, and the estimated proportion of T cells as traits. A gene co-expression network was constructed, and 7 modules were identified using hierarchical clustering, dynamic tree cut, and merged dynamic tree cut (Fig. 4C). Next, we established the module-trait relationships (Fig. 4D). The royalblue module correlated negatively with SaO₂ but positively with the CD4/CD8 ratio. Both the purple and pink modules correlated negatively with hemoglobin but positively with the LLQ-AMS score and AMS-C-Composite score. Moreover, the purple module appeared sensitive to CaO₂, the blue module to hemoglobin, the turquoise module to P50, the tan, yellow, and turquoise modules to the CD4/CD8 ratio, and the red and cyan modules to the LLQ-AMS score.
The results suggest that SaO₂, CaO₂, P50, hemoglobin, LLQ-AMS score, AMS-C-Composite score, CD4/CD8 ratio, and the estimated proportion of T cells, as potential risk factors of sAMS, may affect disease outcome at the genetic level.

Deep learning to establish the prediction model of sAMS

To identify the marker genes of sAMS, we constructed a prediction model of sAMS using deep learning with SVM-RFE, which wraps a feature selection algorithm around a classification algorithm: features are selected or removed from the high-dimensional feature set to obtain the optimal subset among the various candidate subsets generated. The underlying SVM seeks the hyperplane with the maximum margin, that is, the best separation between the two categories of the dataset, represented by the weight vector W^T, the feature vector X, and the threshold b as W^TX + b = 0 (Fig. 5A). The closest samples on either side (the support vectors) lie on the two parallel hyperplanes W^TX + b = 1 and W^TX + b = −1, and the separating hyperplane is chosen so that the sum of the marginal distances from it to these closest samples (D1 + D2) is maximized. In deep learning with SVM-RFE (Fig. 5B), the initial features (M = 2291) were input for classifier training, with the relevance of the n-th entry of X determined by the corresponding value W_n in W^T (n = 1, 2, …, M). Then, in each fold (k = 15) of CV, the features with the lowest absolute values of W_n were eliminated, halving the feature set in each round until fewer than τ = 30 features remained. The maximum accuracy was determined by the entire feature selection and error estimation process (five-fold CV). The top 14 ranked features (Table S7), with the highest five-fold CV accuracy (Fig. 5C) or the lowest error (Fig. 5D), were selected for further analysis. The 10 of these 14 features with significant predictive power (P < 0.05) (Table 1) were used to build the model (C-index = 1, P < 0.01) (Table S8) and nomogram (Fig. 5E). In the d1 training cohort, the model showed excellent predictive performance for sAMS as assessed by ROC analysis (AUC = 1) (Fig. S1A), the calibration curve (Fig. 5F), and survival analysis (R² = 5.011, P = 0.024) (Fig. 5G).
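The margin geometry described above corresponds to the textbook hard-margin linear SVM. Written in the notation of the text, with X_i the feature vector of subject i, y_i ∈ {−1, +1} its class label, and N the number of subjects (y_i and N are added symbols):

```latex
\begin{aligned}
&\text{Separating hyperplane:} && W^{T}X + b = 0\\
&\text{Supporting hyperplanes:} && W^{T}X + b = \pm 1\\
&\text{Margin:} && D_1 + D_2 = \frac{2}{\lVert W \rVert}\\
&\text{Training problem:} && \min_{W,\,b}\ \tfrac{1}{2}\lVert W \rVert^{2}
  \quad \text{s.t.}\quad y_i\,(W^{T}X_i + b) \ge 1,\ \ i = 1,\dots,N\\
&\text{RFE ranking criterion:} && c_n = W_n^{2},\quad n = 1,\dots,M
\end{aligned}
```

Maximizing the margin is therefore equivalent to minimizing ‖W‖, and RFE removes the features with the smallest c_n, which is the same as ranking by |W_n|. e1071 fits a soft-margin SVM, which relaxes the constraints with slack terms controlled by a cost parameter.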
When tested within the timeline of the training cohort at baseline (AUC = 0.600), d7 (AUC = 0.691), d16 (AUC = 0.673), and post7 (AUC = 0.633), and validated in the validation cohort (AUC = 0.626) (Fig. S1B-F), the model retained moderate discriminative ability. To assess the clinical applicability of the model, we also established a single-gene (OR10G8) model (C-index = 0.764) using the baseline data of the training cohort (Fig. S2, Table S9), and a three-gene model (B4GALT4, DIP2B, GALNT13) (C-index = 0.897) (Fig. S3) based on the validation cohort data (Tables S10 and S11). When assessed using DCA, all models showed overall net benefits ranging from 53% to 100% (Fig. 5H).
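Decision curve analysis rests on the net-benefit formula NB = TP/n − FP/n × p_t / (1 − p_t), evaluated across threshold probabilities p_t and compared with the treat-all and treat-none (NB = 0) strategies. The plain-Python sketch below computes that quantity; it is an independent illustration of the standard formula, not the rmda or ggDCA code, and the predicted risks and outcomes are hypothetical.

```python
def net_benefit(pred_probs, outcomes, threshold):
    """Net benefit of treating subjects whose predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold).
    """
    n = len(outcomes)
    tp = sum(1 for p, y in zip(pred_probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(pred_probs, outcomes) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def treat_all_net_benefit(outcomes, threshold):
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


probs = [0.9, 0.8, 0.3, 0.2]  # hypothetical predicted sAMS risks
outcomes = [1, 0, 1, 0]       # hypothetical observed sAMS status
for pt in (0.25, 0.5):
    print(pt, net_benefit(probs, outcomes, pt), treat_all_net_benefit(outcomes, pt))
```

The weight p_t / (1 − p_t) converts false positives into the same units as true positives, expressing how many unnecessary interventions a clinician would accept per case correctly identified.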

Table 1

Univariate logistic regression analysis
Variable (gene)	Coefficient	Standard error	Wald z	OR	P value
ACSM1	2.639	1.126	2.340	14.001	0.019
B4GALT4	4.277	1.495	2.860	72.002	0.004
DHX58	2.639	1.126	2.340	14.001	0.019
DIP2B	2.639	1.126	2.340	14.001	0.019
EPHB3	-3.466	1.323	-2.620	0.031	0.009
GALNT13	2.639	1.126	2.340	14.001	0.019
IFNA5	-1.099	0.957	-1.150	0.333	0.251
MARVELD3	4.277	1.495	2.860	72.002	0.004
OR10G8	-2.100	1.058	-1.990	0.122	0.047
OR5B3	4.277	1.495	2.860	72.002	0.004
RHEBL1	-3.466	1.323	-2.620	0.031	0.009
SLC8A2	1.540	0.988	1.560	4.666	0.119
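The columns of Table 1 are mutually consistent with a standard Wald test: the statistic is the coefficient divided by its standard error, OR = exp(coefficient), and the P value is two-sided from the standard normal distribution. The short check below recomputes these columns from the coefficients and standard errors for a subset of rows; it illustrates the arithmetic and is not the authors' code.

```python
from math import erfc, exp, sqrt


def wald_summary(coef, se):
    """Wald z, odds ratio and two-sided P for one logistic-regression coefficient."""
    z = coef / se
    odds_ratio = exp(coef)
    p_value = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, odds_ratio, p_value


# Coefficient and standard error pairs copied from Table 1.
table1 = {"ACSM1": (2.639, 1.126), "B4GALT4": (4.277, 1.495),
          "EPHB3": (-3.466, 1.323), "IFNA5": (-1.099, 0.957),
          "OR10G8": (-2.100, 1.058), "SLC8A2": (1.540, 0.988)}
for gene, (coef, se) in table1.items():
    z, odds_ratio, p_value = wald_summary(coef, se)
    print(f"{gene}\t{z:.3f}\t{odds_ratio:.3f}\t{p_value:.3f}")
```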

MicroRNAs (miRs) mediated the effects of the featured genes in the development of sAMS

To further explore the roles of the 14 featured genes in the development of sAMS, miRs were predicted using the miR function of FunRich (version 3.1.3). A total of 29 Homo sapiens (hsa) miRs were identified for five featured genes (2 for EPHB3, 3 for DIP2B, 8 for RHEBL1, 3 for GALNT13, and 13 for SLC8A2), with 3710 miR targets in total (Fig. 6, Table S12). We next examined the biological functions of the miR targets. As expected, most targets were enriched in GO terms related to lymphocyte activities (73.69%) (Fig. 7). Furthermore, 5.26% were enriched in histamine secretion, 2.56% in erythrocyte differentiation, and 15.79% in the regulation of myeloid cell differentiation (Table S13). Accordingly, several meaningful pathways of the featured genes were identified, including GALNT13-(hsa-miR-124-3p/506-3p)-RCOR1 (Fig. 8A), SLC8A2/DIP2B-(hsa-miR-133a-3p/133b)-RVAMP2/SLC4A1 (Fig. 8B and C), RHEBL1-(hsa-miR-19a/b-3p)-HIF1A, and EPHB3-(hsa-miR-149-5p)-IL6, which are functionally related to erythrocyte differentiation, alpha-beta T cell differentiation, and histamine secretion by mast cells (Fig. 9). These results suggest that the featured genes play important roles in sAMS, mediated by miRs and their downstream targets.

The study was based on the microarray dataset extracted from GSE103297[16], a well-established dataset of 112 PBMC samples from 21 subjects who were rapidly exposed or re-exposed to a very high altitude of 5260 m after multiple periods of hypoxia acclimatization ranging from 48 h to 21 days. Data at baseline, d1pm, d7, d16, and post7 from the 21 subjects who completed all planned tests were extracted for further analysis. On the first day of exposure, 10 of the 21 subjects were diagnosed with sAMS (LLQ-AMS score ≥ 6), a severe condition implying the possibility of life-threatening events. Although all of them recovered after 3 nights of acclimatization at 3800 m followed by a prolonged stay at 5260 m for 13 days (d16), we wondered why some of them were at risk of sAMS and others were not, even under almost the same exposure conditions. In this study, the genetic basis underlying the pathological and physiological responses to VLH exposure was investigated, with the aim of identifying individuals vulnerable to sAMS or related events.

The gene expression patterns at d1pm and d7 differed from baseline but recovered after several days of acclimatization at d16 and post7 (Fig. 1B), indicating genetic responses to acute VLH exposure. Most DGs across baseline, d1, and d7 (Fig. 1C) were involved in immune cell activation (Fig. 1D), and those related to leukocyte activation involved in immune response were continuously upregulated from baseline to a peak at d7 (Fig. 1E). Similar patterns of immune activation triggered by high-altitude exposure (3232 m) were observed in another study[26], in which immune responses were sensitized in the early phase of high-altitude exposure. Furthermore, peak changes in clustered DGs were observed at d7 (Fig. 2A), with functions related to T cells (gamma delta, CD8, CD4 naive, CD4 memory resting, and CD4 memory activated) dominating the DG clusters (Fig. 2B). Accordingly, these up- or down-regulated DGs were functionally related to platelet activation, oxidative stress, and/or T cell differentiation (Fig. 2C), which are considered essential to the occurrence of AMS, the subsequent development of sAMS, and/or related events[27–31]. These results imply that T cells dominated the genetic responses to VLH exposure.

The proportion of CD4 T cells decreased significantly from baseline to d7 (Fig. 3A), resulting in an inverted CD4/CD8 ratio (Fig. 3B), which has also been reported in other high-altitude populations[31] and points to imbalanced immunity and susceptibility to sAMS. Interestingly, similar timeline changes were observed in the CD4/CD8 ratio, LLQ-AMS score, and other laboratory values, all peaking at d1pm or d7 and then recovering from d7 to d16 (Fig. S4), implying that they contribute to sAMS as risk factors. The CD4/CD8 ratio, SaO₂, CaO₂, P50, hemoglobin, and the estimated proportion of T cells were therefore further investigated for their interplay, with the aim of identifying the underlying risk genes of sAMS (Fig. 3C and D). In total, 2291 risk genes were mapped (Fig. 4) for classifier training, and 14 gene classifiers were identified to establish the model using SVM-RFE (Fig. 5A-D). We established a ten-gene model of genetic susceptibility to sAMS (Fig. 5E) with excellent discrimination (C-index = 1, AUC = 1) and satisfactory predictive accuracy as assessed by the calibration curve and survival analysis (Fig. 5F, G). We also constructed a one-gene model and a three-gene model. All models showed good clinical applicability as assessed by overall net benefit (Fig. 5H), suggesting that the modeled genes may serve as predictive markers for sAMS.

Limited evidence has indicated that certain miRs may function as biomarkers for AMS[32, 33] or play roles in acute hypoxia and hypoxia-induced pulmonary vascular leakage[34]. In subjects exposed to an altitude of 3100 m, miR-424 was overexpressed in a HIF1A-dependent manner and can in turn stabilize HIF1A. In our study, 29 miRs and 3710 miR targets were identified from five genes (EPHB3, DIP2B, RHEBL1, GALNT13, and SLC8A2) (Fig. 6), which were associated with multiple biological processes as shown by GO analysis (Fig. 7). We identified 260 important miR-mediated signaling pathways concerning erythrocyte differentiation, alpha-beta T cell differentiation, and histamine secretion (Fig. 9). SLC8A2, a member of the sodium-calcium exchanger family, has previously been shown to regulate the nuclear translocation of HIF1A[35], and HIF1A was significantly down-regulated upon SLC8A2 overexpression[36]. Our study indicates that SLC8A2 acts upstream of multiple hypoxia- and/or altitude-sensitive miR targets, such as RCOR1 (a transcriptional rheostat essential for normal myeloerythroid lineage differentiation)[37, 38]; PRDM1 and LDB1 (both involved in high-altitude adaptation)[39, 40]; and CASP3 (a member of the hypoxia-activated mitochondrial apoptosis pathway)[41]. Notably, both hsa-miR-133a-3p and hsa-miR-133b-3p mediate signals from SLC8A2 or DIP2B to SLC4A1, a biomarker of AMS that correlates with various AMS symptoms and plays important roles in CO₂ transport in erythrocytes[42]. The shared miR targets of SLC8A2 and DIP2B also include TMOD3, which has been identified as a candidate biomarker for high-altitude pulmonary hypertension in Kyrgyz highlanders[43]. Interestingly, GALNT13 has previously been identified as a risk gene for sickle cell disease-associated pulmonary hypertension and may play roles in endothelial permeability[44, 45].
Our results showed that GALNT13 interacts with multiple miR targets related to histamine secretion and to hypoxia-induced activities in erythrocytes and T cells, suggesting potential effects on pulmonary vascular and ventricular injuries. More importantly, both hsa-miR-19a-3p and hsa-miR-19b-3p, identified as miR products of RHEBL1 (a member of the Ras superfamily)[46], have been indicated as mediators of HIF1A[47, 48], suggesting important roles in hypoxia-related biological processes. Furthermore, the EPHB3 (a proliferation suppressor in ambient and hypoxic environments)[49]-(hsa-miR-149-5p)-IL6 pathway may also be important for hypoxic responses through its associations with hypoxia and inflammation. We did not identify miR products or miR-mediated signals related to the other 9 featured genes, although some of them have previously been described as hypoxia-sensitive, such as ACSM1, a member of the lipoic acid salvage pathway controlling HIF1 activation[50, 51]. As potential predictors or biomarkers of sAMS, the roles and mechanisms of the 14 featured genes in the development of sAMS remain largely unknown.

Because of practical constraints, it is almost impossible to conduct large-scale trials at high altitude, especially under extreme conditions such as VLH. This study was based on microarray data from 112 PBMC samples of 21 subjects exposed to VLH and aimed to explore the genetic susceptibility to sAMS for preventive purposes. Using deep learning with SVM-RFE, we identified 14 classifier genes and established a prediction model of sAMS that performed well in predicting or differentiating subjects with sAMS. However, more studies are needed to corroborate the predictive or differentiating power of the model and to establish the role of the modeled genes as biomarkers for sAMS.

Abbreviations

AMS	Acute mountain sickness
AUC	Area under the receiver operating characteristic curve
CaO₂	Arterial oxygen content
CV	Cross validation
DCA	Decision curve analysis
d1noon	Day 1 noon
d1pm	Day 1 post meridiem
DGs	Differentially expressed genes
FC	Fold change
GO	Gene ontology
hsa	Homo sapiens
KEGG	Kyoto Encyclopedia of Genes and Genomes
LM22	Leukocyte 22 data matrix
LLQ	Lake Louise Questionnaire
miRs	MicroRNAs
PaCO₂	Arterial partial pressure of carbon dioxide
PaO₂	Arterial partial pressure of oxygen
Pb	Barometric pressure
PBMC	Peripheral blood mononuclear cell
PCA	Principal component analysis
P50	Oxygen tension at 50% haemoglobin saturation
post7	Post-descent day 7
ROC	Receiver operating characteristic curve
sAMS	Severe acute mountain sickness
SaO₂	Arterial oxygen saturation
SVM-RFE	Support vector machine recursive feature elimination
VLH	Very high altitude, low barometric pressure, and hypobaric hypoxia
WGCNA	Weighted correlation network analysis

Supplementary information

Supplementary materials are available in the online version of the article.

Funding and acknowledgements

This work was supported by the Natural Science Foundation of Si Chuan Province (2023NSFSC0528) and the Institute Management Project (2021-XZYG-B34) of the General Hospital of Western Theater Command. We thank all contributors to GSE103297 and GSE52209 for making their data available. We thank Dr. Haijing Wang from the Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, for her helpful advice on the use of R packages.

Author contributions

M.Y., Y.W. and XB.Y. conceived and designed the study. M.Y., Y.L. and MY.G. acquired funding for the study and provided the necessary support. M.Y., J.T., T.L., W.L. and J.Y. collected and analyzed the microarray data. M.Y., Y.W., Y.Z., YUE.Z. and Y.L. conducted the image analysis and interpretation, and wrote the manuscript. All authors edited the manuscript.

Availability of data and materials

Data generated and described in this article are available from the corresponding web servers, and are freely accessible to any scientist wishing to use them noncommercially. On reasonable request, the corresponding author can provide further information.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

References

Berger MM, Sareban M, Bärtsch P. Acute mountain sickness: Do different time courses point to different pathophysiological mechanisms? J Appl Physiol 2020; 128: 952-9.
Luks AM, Hackett PH. Medical conditions and high-altitude travel. N Engl J Med 2022; 386: 364-73.
Turner RE, Gatterer H, Falla M, Lawley JS. High-altitude cerebral edema: its own entity or end-stage acute mountain sickness? J Appl Physiol 2021; 131: 313-25.
Swenson ER. Early hours in the development of high-altitude pulmonary edema: time course and mechanisms. J Appl Physiol 2020; 128: 1539-46.
Cobb AB, Levett DZ, Mitchell K, Aveling W, Hurlbut D, Gilbert‐Kawai E, et al. Physiological responses during ascent to high altitude and the incidence of acute mountain sickness. Physiol Rep 2021; 9: e14809.
Douglas DJ, Schoene RB. End-Tidal Partial Pressure of Carbon Dioxide and Acute Mountain Sickness in the First 24 Hours Upon Ascent to Cusco Peru (3326 meters). Wilderness Environ Med 2010; 21: 109-13.
Burtscher M, Philadelphy M, Gatterer H, Burtscher J, Faulhaber M, Nachbauer W, Likar R. Physiological Responses in Humans Acutely Exposed to High Altitude (3480 m): Minute Ventilation and Oxygenation Are Predictive for the Development of Acute Mountain Sickness. High Alt Med Biol 2019; 20: 192-7.
Mazur K, Machaj D, Jastrzębska S, Płaczek A, Mazur D. Prediction of the development and susceptibility to acute mountain sickness (AMS) by monitoring oxygen saturation (SpO2) – literature review. J Educ Health Sport 2020; 10: 79-84.
Duffin J, Hare GM, Fisher JA. A mathematical model of cerebral blood flow control in anaemia and hypoxia. J Physiol 2020; 598: 717-30.
Dominelli PB, Baker SE, Wiggins CC, Stewart GM, Sajgalik P, Shepherd JR, et al. Dissociating the effects of oxygen pressure and content on the control of breathing and acute hypoxic response. J Appl Physiol 2019; 127: 1622-31.
Zubieta-Calleja GR, Zubieta-DeUrioste N. High Altitude Pulmonary Edema, High Altitude Cerebral Edema, and Acute Mountain Sickness: an enhanced opinion from the High Andes–La Paz, Bolivia 3,500 m. Rev Environ Health in press. doi:10.1515/reveh-2021-0172.
Small E, Juul N, Pomeranz D, Burns P, Phillips C, Cheffers M, et al. Predictive capacity of pulmonary function tests for acute mountain sickness. High Alt Med Biol 2021; 22: 193-200.
Minder L, Schwerzmann M, Radtke T, Saner HE, Eser PC, Wilhelm M, et al. Cardiopulmonary Response to Exercise at High Altitude in Adolescents with Congenital Heart Disease. Congenit Heart Dis 2021; 16: 597-608.
Georges T, Menu P, Le Blanc C, Ferreol S, Dauty M, Fouasson-Chailloux A. Contribution of Hypoxic Exercise Testing to Predict High-Altitude Pathology: A Systematic Review. Life 2022; 12: 377.
MacInnis MJ, Koehle MS. Evidence for and against genetic predispositions to acute and chronic altitude illnesses. High Alt Med Biol 2016; 17: 281-93.
Subudhi AW, Bourdillon N, Bucher J, Davis C, Elliott JE, Eutermoster M, et al. AltitudeOmics: the integrative physiology of human acclimatization to hypobaric hypoxia and its retention upon reascent. PLoS One 2014; 9: e92191.
Sanz H, Valim C, Vegas E, Oller JM, Reverter F. SVM-RFE: selection and visualization of the most relevant features through non-linear kernels. BMC Bioinformatics 2018; 19: 432.
Tomar A, Malhotra S, Sarkar S. Polymorphism profiling of nine high altitude relevant candidate gene loci in acclimatized sojourners and adapted natives. BMC Genet 2015; 16: 112.
Chen G, Ramírez JC, Deng N, Qiu X, Wu C, Zheng WJ, et al. Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis. Database 2019; 2019: 1-8.
Mlecnik B, Galon J, Bindea G. Automated exploration of gene ontology term and pathway networks with ClueGO-REST. Bioinformatics 2019; 35: 3864-6.
Courtot M, Shypitsyna A, Speretta E, Holmes A, Sawford T, Wardell T, et al. UniProt-GOA: A central resource for data integration and GO annotation. SWAT4LS 2015; 2015: 227-8.
Newman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods 2015; 12: 453-7.
Newman AM, Steen CB, Liu CL, Gentles AJ, Chaudhuri AA, Scherer F, et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol 2019; 37: 773-82.
Jia A, Xu L, Wang Y. Venn diagrams in bioinformatics. Brief Bioinform 2021; 22: bbab108.
Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008; 9: 1-13.
Feuerecker M, Crucian BE, Quintens R, Buchheim JI, Salam AP, Rybka A, et al. Immune sensitization during 1 year in the Antarctic high‐altitude Concordia Environment. Allergy 2019; 74: 64-77.
Lackermair K, Schüttler D, Kellnar A, Schuhmann CG, Weckbach LT, Brunner S. Combined Effect of Acute Altitude Exposure and Vigorous Exercise on Platelet Activation. Physiol Res 2022; 71: 171.
Lackermair K, Schuhmann CG, Mertsch P, Götschke J, Milger K, Brunner S. Effect of acute altitude exposure on serum markers of platelet activation. High Alt Med Biol 2019; 20: 318-21.
Pena E, El Alam S, Siques P, Brito J. Oxidative Stress and Diseases Associated with High-Altitude Exposure. Antioxidants 2022; 11: 267.
Liu B, Chen J, Zhang L, Gao Y, Cui J, Zhang E, et al. IL-10 dysregulation in acute mountain sickness revealed by transcriptome analysis. Front Immunol 2017; 8: 628.
Bai J, Li L, Li Y, Zhang L. Genetic and immune changes in Tibetan high-altitude populations contribute to biological adaptation to hypoxia. Environ Health Prev Med 2022; 27: 39.
Liu B, Huang H, Wu G, Xu G, Sun B-D, Zhang E-L, et al. A signature of circulating microRNAs predicts the susceptibility of acute mountain sickness. Front Physiol 2017; 8: 55.
Huang H, Dong H, Zhang J, Ke X, Li P, Zhang E, et al. The role of salivary miR-134-3p and miR-15b-5p as potential non-invasive predictors for not developing acute mountain sickness. Front Physiol 2019; 10: 898.
Tsai S-H, Huang P-H, Hsu Y-J, Chen Y-W, Wang J-C, Chen Y-H, et al. Roles of the Hypoximir microRNA-424/322 on Acute Hypoxia and Hypoxia-Induced Pulmonary Vascular Leakage. FASEB J 2019; 33: 1-9.
Liu H, Yu J, Yang L, He P, Li Z. NCX2 Regulates Intracellular Calcium Homeostasis and Translocation of HIF-1α into the Nucleus to Inhibit Glioma Invasion. Biochem Genet in press. doi:10.1007/s10528-022-10274-9.
Qu M, Yu J, Liu H, Ren Y, Ma C, Bu X, et al. The Candidate Tumor Suppressor Gene SLC8A2 Inhibits Invasion, Angiogenesis and Growth of Glioblastoma. Mol Cells 2017; 40: 761-72.
Rivera C, Lee H-G, Lappala A, Wang D, Noches V, Olivares-Costa M, et al. Unveiling RCOR1 as a rheostat at transcriptionally permissive chromatin. Nat Commun 2022; 13: 1-15.
Yao H, Goldman DC, Fan G, Mandel G, Fleming WH. The corepressor rcor1 is essential for normal myeloerythroid lineage differentiation. Stem Cells 2015; 33: 3304-13.
Stobdan T, Akbari A, Azad P, Zhou D, Poulsen O, Appenzeller O, et al. New insights into the genetic basis of Monge’s disease and adaptation to high-altitude. Mol Biol Evol 2017; 34: 3154-68.
Jin M, Lu J, Fei X, Lu Z, Quan K, Liu Y, et al. Selection Signatures Analysis Reveals Genes Associated with High-Altitude Adaptation in Tibetan Goats from Nagqu, Tibet. Animals 2020; 10: 1599.
Hou Y, Wang X, Chen X, Zhang J, Ai X, Liang Y, et al. Establishment and evaluation of a simulated high‑altitude hypoxic brain injury model in SD rats. Mol Med Rep 2019; 19: 2758-66.
Yang J, Jia Z, Song X, Shi J, Wang X, Zhao X, et al. Proteomic and clinical biomarkers for acute mountain sickness in a longitudinal cohort. Commun Biol 2022; 5: 548.
Iranmehr A, Stobdan T, Zhou D, Poulsen O, Strohl KP, Aldashev A, et al. Novel insight into the genetic basis of high-altitude pulmonary hypertension in Kyrgyz highlanders. Eur J Hum Genet 2019; 27: 150-9.
Desai AA, Zhou T, Ahmad H, Zhang W, Mu W, Trevino S, et al. A novel molecular signature for elevated tricuspid regurgitation velocity in sickle cell disease. Am J Respir Crit Care Med 2012; 186: 359-68.
Maron BA, Machado RF, Shimoda L. Pulmonary vascular and ventricular dysfunction in the susceptible patient (2015 Grover conference series). Pulm Circ 2016; 6: 426-38.
Zhang Z, Ma L, Fan X, Wang K, Liu L, Zhao Y, et al. Targeted Sequencing Identifies the Genetic Variants Associated with High-altitude Polycythemia in the Tibetan Population. Indian J Hematol Blood Transfus 2022; 38: 556-65.
Tian H, Qiang T, Wang J, Ji L, Li B. Simvastatin regulates the proliferation, apoptosis, migration and invasion of human acute myeloid leukemia cells via miR-19a-3p/HIF-1α axis. Bioengineered 2021; 12: 11898-908.
Liu H, Shi C, Deng Y. MALAT1 affects hypoxia-induced vascular endothelial cell injury and autophagy by regulating miR-19b-3p/HIF-1α axis. Mol Cell Biochem 2020; 466: 25-34.
Assis-Nascimento P, Tsenkina Y, Liebl DJ. EphB3 signaling induces cortical endothelial cell death and disrupts the blood–brain barrier after traumatic brain injury. Cell Death Dis 2018; 9: 1-15.
Bailey PS, Hiltunen JK, Dieckmann CL, Kastaniotis AJ, Nathan JA. Different opinion on the reported role of Poldip2 and ACSM1 in a mammalian lipoic acid salvage pathway controlling HIF-1 activation. Proc Natl Acad Sci USA 2018; 115: E7458-E9.
Paredes F, Williams H, Martin AS. 258 - Poldip2 is an Oxygen-sensitive Mitochondrial Protein that Controls Oxidative/glycolytic Metabolism Balance and Proteasome Activity. Free Radic Biol Med 2017; 112: 173-4.
