2.1. Data selection and processing
The sepsis RNA expression data were downloaded from the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/). Data from GSE65682 [18] (platform: GPL13667, Affymetrix Human Genome U219 Array) were used as the training set for differential expression analysis. This dataset was part of the Molecular Diagnosis and Risk Stratification of Sepsis (MARS) project, conducted between January 2011 and December 2013 in the mixed intensive care units (ICUs) of two tertiary teaching hospitals in the Netherlands [19-22]. A total of 802 samples were collected, including 760 samples from ICU patients with sepsis and 42 from healthy controls. We extracted four subgroups for differentially expressed gene (DEG) analysis: community-acquired pneumonia (CAP), hospital-acquired pneumonia (HAP), abdominal sepsis (AS) and healthy controls (HC) (Table 1). An advantage of this method is that normalization occurs at the probe level (rather than at the probeset level) across all of the selected hybridizations [23].
GSE134364 [24] (platform: GPL17586, [HTA-2_0] Affymetrix Human Transcriptome Array 2.0), another large sepsis dataset downloaded from GEO, contained a total of 334 samples, including 215 samples with sepsis and 83 healthy controls. This dataset was used as the validation set for expression analysis of the genes of interest (GOIs).
Table 1. Characteristics of each subgroup in the GSE65682 dataset.
Group | Male, n | Male, age (y) | Female, n | Female, age (y)
CAP | 63 | 60.8±14.9 | 45 | 61.2±18.7
HAP | 54 | 61.6±15.2 | 30 | 61.4±15.6
AS | 27 | 63.2±9.3 | 24 | 60.3±14.4
HC | 18 | 48±22.3 | 24 | 44.8±17.7
These datasets were selected from the available datasets according to the following inclusion criteria: i) the dataset provided information on gender and age; ii) corresponding controls (healthy individuals or individuals scheduled for elective surgery) were present in the same dataset; iii) processed data were available; and iv) samples were collected within 24 h after ICU admission. The exclusion criteria were: i) animal and pediatric studies and ii) datasets with a small number of samples.
2.2. Identification of sex-related DEGs
R version 4.1.3 (R Foundation for Statistical Computing, Vienna, Austria) was used for data extraction and sorting to obtain the gene expression matrices. The limma package (version 3.52.1) in R was used to identify DEGs between male and female samples in each of the four subgroups of the GSE65682 transcriptome data. P-values were adjusted (adj.P.value) with the Benjamini and Hochberg false discovery rate procedure to balance the discovery of statistically significant genes against the number of false positives. Genes with |log2 fold change (FC)| > 0.5 and adj.P.value < 0.05 were considered statistically significant. The Venn App (Origin 2022, OriginLab Corporation, Northampton, USA) was used to overlap the lists of sex-related DEGs from the four subgroups, and a set of sex-related DEGs was selected based on this overlap. In addition, only genes classified as “protein-coding” were retained for further analysis.
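The thresholding and overlap steps described above can be illustrated with a minimal Python sketch. It is not the authors' R/limma code: it takes per-gene log2FC and raw P-values as given (limma's moderated statistics are not reproduced), and the gene names, values, and the helper names `benjamini_hochberg`, `sex_related_degs` and `overlap` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Walk from the largest raw P-value down to the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this P-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def sex_related_degs(
    results: Dict[str, Tuple[float, float]],
    lfc_cutoff: float = 0.5,
    alpha: float = 0.05,
) -> Set[str]:
    """Keep genes with |log2FC| > lfc_cutoff and adjusted P < alpha.

    `results` maps gene -> (log2FC male vs female, raw P-value).
    """
    genes = list(results)
    adj = benjamini_hochberg([results[g][1] for g in genes])
    return {
        g for g, q in zip(genes, adj)
        if abs(results[g][0]) > lfc_cutoff and q < alpha
    }


def overlap(per_subgroup: Dict[str, Set[str]]) -> Set[str]:
    """Genes called as DEGs in every subgroup (the Venn intersection)."""
    return set.intersection(*per_subgroup.values())


# Toy input with hypothetical genes; real input would cover CAP, HAP, AS and HC.
toy = {
    "CAP": {"gene1": (-3.0, 1e-6), "gene2": (4.0, 1e-7),
            "gene3": (0.2, 0.001), "gene4": (0.9, 0.6)},
    "HC": {"gene1": (-2.8, 1e-5), "gene2": (3.9, 1e-6),
           "gene3": (0.7, 0.01), "gene4": (0.1, 0.9)},
}
degs_by_group = {name: sex_related_degs(res) for name, res in toy.items()}
shared_degs = overlap(degs_by_group)
```

In this toy input, gene3 passes both cutoffs only in HC, so it is excluded from the shared set.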
2.3. Correlation enrichment analyses of DEGs
All sex-related DEGs were selected for enrichment analysis to reveal their biological functions and signaling pathways. Gene Ontology (GO) term enrichment analysis was performed with the PANTHER classification system [25] (http://www.pantherdb.org), searching the following categories: GO Biological Process (BP), GO Molecular Function (MF), GO Cellular Component (CC), and PANTHER pathway [26]. FDR < 0.05 was considered statistically significant. GO annotation is a major bioinformatics tool for annotating genes and analyzing the biological processes of DEGs. PANTHER is a unique resource that classifies genes into canonical pathways to predict function [27].
2.4. Statistical analysis of gene expression in subgroups
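As an illustration of what an over-representation test computes, the sketch below evaluates a one-sided hypergeometric P-value for a single GO term. The statistical test configured in PANTHER is not stated here, so the hypergeometric form is an assumption, and the function name and counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(
    n_background: int, n_term: int, n_selected: int, n_overlap: int
) -> float:
    """P(X >= n_overlap) for a one-sided hypergeometric test.

    n_background: genes in the reference list
    n_term:       reference genes annotated to the term
    n_selected:   genes in the DEG list
    n_overlap:    DEGs annotated to the term
    """
    upper = min(n_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts (hypothetical): 3 of 10 background genes carry the term,
# and all 3 selected genes are among them.
p_toy = hypergeom_enrichment_p(n_background=10, n_term=3, n_selected=3, n_overlap=3)
```

The resulting per-term P-values would still need a multiple-testing correction before applying the FDR < 0.05 cutoff.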
To further study the expression changes of sex-related DEGs across subgroups, we extracted the mean expression values of males and females in each of CAP, HAP, AS and HC and plotted them as a heat map. The mean expression values were extracted with the R package “tidyverse”. The heat map and cluster analysis of DEG distribution were produced with the R package “pheatmap”. One-way ANOVA was used for the intergroup test, and the Tukey test was used for comparison of means. Log2FC was calculated as the base-2 logarithm of the ratio of the mean gene expression in the sepsis group to that in the healthy control group. P values < 0.05 were considered statistically significant. Sex-related DEGs with |log2FC| > 0.4 were defined as our genes of interest (GOIs). Statistical analysis was performed with Origin 2022 (OriginLab Corporation, Northampton, USA).
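The log2FC definition and the GOI cutoff can be written as a short Python sketch. It assumes positive, linear-scale expression values and omits the ANOVA/Tukey significance step; the gene names, values and function names are hypothetical.

```python
from math import log2
from statistics import mean
from typing import Dict, List, Tuple


def log2_fold_change(sepsis: List[float], control: List[float]) -> float:
    """log2 of the ratio of mean expression: sepsis over healthy control."""
    return log2(mean(sepsis) / mean(control))


def select_gois(
    expression: Dict[str, Tuple[List[float], List[float]]],
    cutoff: float = 0.4,
) -> Dict[str, float]:
    """Return gene -> log2FC for genes with |log2FC| > cutoff.

    `expression` maps gene -> (sepsis values, healthy control values).
    """
    lfc = {g: log2_fold_change(s, c) for g, (s, c) in expression.items()}
    return {g: v for g, v in lfc.items() if abs(v) > cutoff}


# Toy values (hypothetical genes, linear-scale expression assumed).
toy_expression = {
    "gene_up": ([8.0, 8.0], [4.0, 4.0]),    # ratio 2   -> log2FC = 1
    "gene_flat": ([5.0, 5.0], [5.0, 5.0]),  # ratio 1   -> log2FC = 0
    "gene_down": ([2.0, 2.0], [4.0, 4.0]),  # ratio 0.5 -> log2FC = -1
}
gois = select_gois(toy_expression)
```

If the expression matrix is already log2-transformed, the difference of group means would be the usual equivalent; which form applies depends on the input scale.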
2.5. Gene Set Enrichment Analysis
GSEA was performed using the R package clusterProfiler in the four groups (AS, CAP, HAP and HC) to identify significant functional differences between males and females. Significant pathway enrichment was defined by a normalized enrichment score |NES| > 1, P value < 0.05, and FDR q value < 0.05 [28]. The top five terms of the HALLMARK analyses from MSigDB are shown for each group. A Venn diagram (http://bioinformatics.psb.ugent.be/webtools/Venn/) was used to depict the unions, intersections and distinctions among the HALLMARK gene sets of the different groups, and lists of the intersections across all groups and across the sepsis groups were generated.
2.6. Validation of GOIs
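The significance filter and the group intersections can be sketched in Python as follows. This operates on already computed GSEA results and does not reproduce clusterProfiler. The term names and values are placeholders, and ranking the "top" terms by |NES| is an assumption, since the ranking criterion is not specified.

```python
from typing import Dict, List, NamedTuple, Set


class GseaRow(NamedTuple):
    term: str
    nes: float
    pvalue: float
    qvalue: float


def significant_terms(rows: List[GseaRow]) -> List[str]:
    """Terms with |NES| > 1, P < 0.05 and q < 0.05, sorted by |NES| (assumed)."""
    kept = [r for r in rows
            if abs(r.nes) > 1 and r.pvalue < 0.05 and r.qvalue < 0.05]
    kept.sort(key=lambda r: abs(r.nes), reverse=True)
    return [r.term for r in kept]


def shared_terms(per_group: Dict[str, List[str]], groups: List[str]) -> Set[str]:
    """Terms significant in every listed group (one Venn intersection)."""
    return set.intersection(*(set(per_group[g]) for g in groups))


# Toy results with placeholder term names.
toy_gsea = {
    "CAP": [GseaRow("TERM_A", 2.1, 0.001, 0.01), GseaRow("TERM_B", -1.6, 0.01, 0.03),
            GseaRow("TERM_C", 0.8, 0.20, 0.40)],
    "HAP": [GseaRow("TERM_A", 1.8, 0.002, 0.02), GseaRow("TERM_B", -1.9, 0.004, 0.02)],
    "AS": [GseaRow("TERM_A", 1.7, 0.003, 0.02), GseaRow("TERM_B", -1.4, 0.02, 0.04),
           GseaRow("TERM_D", 1.5, 0.01, 0.03)],
    "HC": [GseaRow("TERM_A", 1.9, 0.002, 0.02), GseaRow("TERM_B", -1.2, 0.20, 0.30)],
}
sig = {group: significant_terms(rows) for group, rows in toy_gsea.items()}
top_five = {group: terms[:5] for group, terms in sig.items()}
in_all_groups = shared_terms(sig, ["CAP", "HAP", "AS", "HC"])
in_sepsis_groups = shared_terms(sig, ["CAP", "HAP", "AS"])
```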
The GSE134364 dataset was used for validation analysis. First, the expression values of the genes in the list of sex-related DEGs were extracted, and then the samples were divided into four groups according to sex within the sepsis and healthy groups. One-way ANOVA was performed across the groups, and P values < 0.05 were considered statistically significant.
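For reference, the one-way ANOVA F statistic for a single gene across the four groups can be computed as in the Python sketch below. The group labels and expression values are hypothetical, and the P-value is left out because it requires the F distribution (available, for example, through `scipy.stats.f_oneway`).

```python
from statistics import mean
from typing import List, Tuple


def one_way_anova_f(groups: List[List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    group_means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy expression values for one hypothetical gene in the four groups.
male_sepsis = [5.0, 6.0, 7.0]
female_sepsis = [6.0, 7.0, 8.0]
male_hc = [1.0, 2.0, 3.0]
female_hc = [2.0, 3.0, 4.0]
f_stat, df_b, df_w = one_way_anova_f(
    [male_sepsis, female_sepsis, male_hc, female_hc]
)
```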
2.7. Expression of GOIs in normal tissues
RNA-seq data for the GOIs were downloaded from the HPA RNA-seq normal tissues project through the NCBI Gene database (National Library of Medicine, NIH, USA) (https://www.ncbi.nlm.nih.gov/gene). In that project, RNA-seq was performed on tissue samples from 95 human individuals representing 27 different tissues to determine the tissue specificity of all protein-coding genes.