Study search and selection
A systematic review and selection of studies were conducted in September of 2022. The search was performed in two public databases for transcriptomic data - GEO [10] and ArrayExpress [11] - according to PRISMA [12] statement guidelines and carried out with the keywords "spinal cord injury." The inclusion criteria were: (1) organism, Rattus norvegicus; (2) tissue, thoracic spinal cord; and (3) RNA sequencing or microarray data from Affymetrix or Agilent used as gene expression platforms. The exclusion criteria were: (1) studies without control samples (sham or naive rats); (2) individual samples from rats that had received any type of treatment; and (3) samples rostral or caudal to the epicenter of the injury.
We classified injury severity into moderate and severe injuries. The severe group included lesions resulting from a spinal cord impact of ≥ 200 kdyn, a 10 g weight drop from 50 mm, or a complete transection of the spinal cord; meanwhile, we considered injuries not meeting these criteria as moderate (Additional File 2). Additionally, we defined four phases of injury based on the timeline post-injury and when samples were collected: the acute phase (0–3 days, T1), the subacute phase (4–14 days, T2), the early chronic phase (15–35 days, T3), and the late chronic phase (over 35 days, T4). Control groups included samples from rats either sham-operated or naïve rats.
Data preprocessing
For microarray studies, normalized data were downloaded with the GEOquery package and probes were annotated to their corresponding gene symbol using platform-specific annotation packages. For RNA-seq studies, raw count matrices were manually downloaded from GEO and identifiers were converted to gene symbols using the org.Rn.eg.db package. In all cases, the median of the expression values of duplicated identifiers was calculated.
Exploratory analysis
An exploratory analysis was performed (including boxplot, PCA, and clustering analysis) to observe sample distribution in the groups of interest and evaluate anomalous behavior and possible batch effects. Samples from the GSE183591 study were taken in two different series, separated into two blocks in the PCA. This batch effect was taken into consideration for DGE analysis. The negative values detected in boxplots for the GSE2599 and GSE464 datasets were normalized by adding the minimum of all values followed by a logarithmic transformation of the data.
Differential gene expression analysis
Each injury group was compared with its corresponding control within the study to identify DEGs. DGE analysis of microarray platforms was carried out using the limma package [66]. To ensure comparable results between technologies, RNA-seq datasets were analyzed using the limma-voom pipeline, which involves the removal of genes with low counts and a logarithmic transformation of the expression matrix. p-values were corrected with the Benjamini–Hochberg (BH) method [67].
Gene expression meta-analysis
DGE results were integrated to obtain a transcriptional consensus signature for each comparison. A meta-analysis was performed for each gene in at least two studies in each comparison. The random-effects model proposed by Der Simonian & Laird [68], implemented in the metafor package [69], was used to evaluate the combined effect. This method weighs the heterogeneity of each study and incorporates their inherent variability into the overall estimate of the measure of effect. Consequently, studies with greater variability will have lower weight in this measure. In this case, logFC was used to measure the effect, while the variance was used to measure variability. Thus, values for the p-value, logFC, and 95% confidence interval (CI) were calculated for each gene evaluated in the meta-analysis. Since multiple meta-analyses were performed, p-values were adjusted using the BH method. Adjusted p-values lower than 0.1 were considered significant. Funnel and forest plots were used to assess the variability and measure each study's contribution to the meta-analysis.
Clustering analysis
All clustering analyses were performed on the gene consensus signature matrix. We constructed this matrix with consensus logFC values from the meta-analysis, selecting genes evaluated in all groups and significant in at least one group. The mixOmics package for PCA was used [70]. For hierarchical clustering analysis, the Euclidean distance was first calculated as input for the hclust function.
Biomarker gene identification
In this context, biomarker genes were defined as those displaying more significant deregulation in certain groups of interest than others, allowing the stratification of different groups based on their expression pattern. A limma test was applied to identify possible biomarker genes that could classify samples according to severity/phase. The four severe were compared against the four moderate groups of samples to elucidate severity-specific biomarker genes. Phase-specific biomarker genes common to both severities and specific to moderate or severe injury were also explored. Detailed comparisons are listed in Additional File 4. Subsequently, the top 10 genes with the lowest p-values from each comparison were selected to assess their classification ability via clustering analysis comprising PCA and hierarchical clustering.
Gene set analysis
A functional enrichment analysis was performed for each gene consensus signature with the GSA method implemented in the mdgsa R package [71]. A gene ranking and functional annotation are required as input for GSA. The ranking is made by ordering all genes according to the p-value obtained in meta-analysis and the logFC value sign. For functional enrichment annotation, three annotation databases have been included: the biological processes of Gene Ontology [72], KEGG pathways [73], and Reactome pathways [74]. Gene sets with less than 10 or more than 500 genes were excluded. P-values were corrected with the BH method.
Transcription factors activities
The identification of differentially active TFs used the msviper function of the viper package [75] with each of the gene consensus signatures as input. Mouse regulons from the DoRothEA package [76] with a confidence level of A, B, C, or D were selected, excluding those with less than 10 genes (Additional File 11). The p-values were corrected with the BH method.
Construction of gene co-expression networks
First, in order to find genes with similar expression patterns we used the clust software [77] with the following arguments: -n 0 -t 100 -cs 4. Gene consensus signatures matrix from moderate and severe injuries were used separately as input. A PPI was then constructed for each cluster of co-expressed genes obtained from clust software. The STRINGdb package with the interactions of STRING version 11.5 [78] and medium confidence of 0.400 were used to build PPI networks. The PPI network of each cluster was divided into different subnetworks of more closely related genes using the fastgreedy algorithm implemented in the get_clusters function.
Each subnetwork was then classified into manually pre-defined functional categories of interest. The get_enrichment function was first used to obtain overrepresented functional terms in each subnetwork. Subsequently, the biological processes were selected from the Gene Ontology, KEGG, and Reactome pathways, ad hoc classifying them into pre-established general categories. Finally, the frequency distribution of each category was calculated for each subnetwork, assigning the category with the highest frequency as that which best defines the network.
Bioinformatic validation datasets
For the mouse transcriptomic dataset, a normalized count matrix provided from the supplementary material provided by Li et al. [28] (https://doi.org/10.6084/m9.figshare.17702045) was downloaded. Samples representative of the four temporal phases established in our study (1, 7, 28, and 42 dpi) and controls were selected (Additional File 19). For the GSE218088 dataset, the normalized expression matrix was downloaded using GEOquery, and the probes were annotated to the gene symbol. Those genes that displayed significance in at least one comparison and were present in all groups of our meta-analysis were selected for both datasets. A DGE analysis using the limma package was performed, comparing each injured group against uninjured controls.
Experimental validation (qPCR)
A severe traumatic SCI model in adult female Sprague Dawley rats was employed for RNA isolation and qPCR in silico data validation. The animals were housed at the Animal Experimentation Unit of the Research Institute Príncipe Felipe (Valencia, Spain) under standard conditions. All experimental procedures adhered to the guidelines established by the European Communities Council Directive (86/609/ECC), the Spanish Royal Decree 53/2013, and the Animal Care and Use Committee of the Research Institute Príncipe Felipe (2021/ VSC/PEA/0032).
Rats were subcutaneously pre-medicated with morphine (2.5 mg/kg) and anesthetized with 2% isoflurane in a continuous oxygen flow of 1 L/min. Laminectomy was performed on thoracic vertebrae 8–9 to expose the spinal cord, and severe SCI was induced at the thoracic vertebrae 8 level by contusion, applying a force of 250 kdyn using the Infinite Horizon Impactor, as previously described (13, 94). Post-surgery care included manual bladder drainage twice a day until vesical reflex recovery and subcutaneous administration of 5 mg/kg of Enrofloxacin (Alsir) for 7 days, as well as 0.1 mg/kg of Buprenorphine twice a day for 4 days after each intervention.
At each endpoint: 1 dpi (n = 3), 7 dpi (n = 3), 28 dpi (n = 4), and 56 dpi (n = 4) after injury to represent (T1, T2, T4, and T8, respectively), animals were overdosed of sodium pentobarbital (100 mg/kg) and transcardially perfused with a 0.9% saline solution. The spinal cord tissue was extracted, and the injury epicenter was immediately frozen in liquid nitrogen and stored at – 80 ºC until use. RNA extraction was performed using the TriZol standard method, followed by an additional cleanup step using RNeasy MinElute Cleanup (Qiagen, Germany) to ensure sample quality (A260/280 ≈ 2 and A260/230 ≥ 1.8). Reverse transcription was performed using the high-capacity RNA-to-cDNA™ kit (Applied Biosystems, Massachusetts, USA).
Specific primers (Additional File 20) for each gene of interest were designed using primer-BLAST (NCBI, Maryland, USA) and validated by efficiency curve performance. qPCR was conducted in triplicate using AceQ SYBR qPCR Master Mix (ThermoFisher) in the Light-Cycler 480 detection System (Roche, Basel, Switzerland). Ct data were calculated with the LightCycler 480 relative quantification software (Roche, Basel, Switzerland). GAPDH mRNA levels served as an internal control for normalization.
Analysis of human blood samples
The GSE151371 normalized gene expression matrix was downloaded from GEO, and the expression values were log-transformed. The samples corresponding to the AIS A, AIS D, and the healthy control groups were then selected for further analysis (Additional File 21). DGE analysis was then performed to compare AIS A vs. AIS D with the limma package following the same pipeline as the previous analyses. Finally, significantly altered genes (FDR < 0.1) were selected to compare with the list of rat biomarker genes. The list of intersecting genes was used for the clustering analysis of both biological systems. The healthy controls from the GSE151371 dataset were also included for visualization and clustering analysis.
Meta-SCI app
Meta-SCI app is powered by RStudio Shiny package and deployed on a shinyapps.io server, available at https://metasci-cbl.shinyapps.io/metaSCI. Plots are generated using ggplot, plotly, and heatmaply. All data processing and analysis are carried out using R.