Transcriptomic Time Series Analysis in Wound Healing: Challenges and Perspectives on Data Interpretation


 Background: Wound transcriptomic analysis can be used to quantify wound healing stages and identify leverage points for wound healing intervention. However, individual gene signatures corresponding to wound healing stages vary from one experiment to another and are highly dependent on both experimental setup and bioinformatics methods. Methods: We develop a systematic approach to informatively compare time series from publicly available wound transcriptomic datasets, including mouse and human wounds, and identify consistent gene expression patterns. Results: We reveal the limitations of gene expression data collection, interpretation, and comparison. For example, the sample rate of wound transcriptomic sample collection must be higher than the rate of changes in the wound healing processes, otherwise, important changes in gene expression may be missed. This may lead to mis finding the most significant genes, as peaks of expression for highly differentially expressed genes are lost. Nevertheless, we derived a short list of genes highly differentially expressed in all datasets under consideration. After clustering and normalization, these genes clearly demonstrate similarly changing dynamics of expression between the wounds and may be used for wound healing stage detection.Conclusions: A list of genes that may be used for transcriptomics-based wound healing stage detection is provided. In addition, we suggest experimental approaches that could help researchers to extract more meaningful results.


28
Wound healing is a dynamic process involving a series of coordinated biological processes 29 partitioned by scientist into four distinct stages [Canedo-Dorantes 2019]: hemostasis, 30 inflammation [Kim, 2008;Krzyszczyk, 2018], proliferation and remodeling [Bainbridge, 2013;31 Pastar, 2014]. Many cell types participate in wound healing [Wilkinson 2020], giving rise to 32 expression of different genes [Brant 2015;Deonarine 2007]. 33 Transcriptome microarrays detect expression of tens of thousands of genes. [Blumenberg 34 2019; Tachibana 2015]. Transcriptomic analysis of cutaneous wound biopsies is particularly 35 promising because the rapidly changing wound environment produces many activation stimuli 36 that induce cellular activation reactions that are not fully understood [Deonarine 2007 wounds. Comparison of transcriptomic data from different wounds was made previously as a 42 meta-analysis of many datasets from different tissue wounds [Sass, 2018]. The researchers 43 identified several groups of genes that changed expression in most wounds, but not in all. 44 In this manuscript, we look for genes showing universal expression patterns in different 45 wounds that could serve as wound stage indicators. We restrict ourselves to comparisons 46 between skin wounds. Corresponding datasets are [GSE460, GSE8056, GSE23006] -human and 47 mice burn wounds and mice surgical wounds. For translational work, we present a method to 48 identify patterns in gene expression that are consistent across varying wound experimental 49 conditions and species. This work assess the potential of transcriptomes for data-based 50 predictive models. 51 To fairly compare across studies, a systematic and consistent approach to identifying 52 differentially expressed genes is applied to all datasets. We identify orthologous genes between 53 mouse and human, only genes with orthologs in all three datasets are considered. We come out 54 with a short list of 58 genes highly differentially expressed in all 3 datasets under consideration. 55 After clustering and normalization, these genes clearly demonstrate similarly changing dynamics 56 of expression between the wounds. 57 Additionally, we make note of methodological problems in wound transcriptomic analysis 58 and emphasize which shortcomings should be resolved in future experiments to improve the 59 power of this experimental tool for wound investigation. 60 61 Data preparation 62 Consider 3 transcriptomic datasets from wounds: GSE8056 contains human skin burn wound. 63 GSE460 contains mouse skin burn wound. GSE23006 [Chen 2010] is mouse surgical 1mm wound 64 dataset containing both skin and tongue, of which we consider only the skin data. 65

66
To arrive at a list of genetic biomarkers associated with each wound healing stage we search for 67 genes that are reliably highly differentially expressed in all wounds. The following requirements 68 are imposed in our data pre-processing: 69 • Experimental error must be minimized. That is, we rely on genes with low biological 70 dispersion. 71 • A link between mouse and human genes must be found. 72 • To make a direct comparison, each gene must be presented only once in each dataset. 73 All replicates and repeated measurements must be analyzed, and one replicate or mean 74 value accepted. 75 • Only genes that are present in all datasets may be compared. 76

77
To compare gene expression between wounds, we apply several filtering procedures described 78 in "Methods". The number of genes left in each dataset at each filtering step is presented in

Intersections of filtered datasets 86
The 3 datasets contain 7937 (GSE23006), 4441 (GSE460) and 9249 (GSE8056) genes after filtering. 87 The intersections contain even less genes, see Table 2. 88 89  To account for variations in baseline expressions across datasets. To find common and unique 104 features of the wounds considered, we try to differentiate genes that are commonly differentially 105 expressed in all wounds from those with high variation across wounds. In other words, we ask 106 how many genes have approximately the same value of fold change in all wounds. 107 Figure 1 shows the distribution of genes by the value of fold change ∆ . As seen from 108  Spearman rank correlation coefficients calculated in Table 1.  The distribution of ∆ is similar for all 3 datasets (Figure 1), however, the correlation between 129 datasets is weak (Table 3) wounds, similarly to mouse surgical wound, this cannot be seen in the burn wounds. 143 The same is observed for many other genes; in this work they are filtered out. 144 145

Figure 3. (a) Plots of Il1a expression change vs time: comparison between wound datasets. (b) 146
The same gene dynamics along with possible gene expression plots (fake plots) corresponding 147 to data but not captured due to lack of time points (dashed line). 148 149 Commonly highly expressed genes 150 We search the genes having high ∆ in each dataset. As described above, there are 1622 151 genes that appear in all three datasets after filtration. As there is no natural cutoff for ∆ values 152 ( Figure 1) we select the first 300 genes with the highest ∆ from each dataset. The intersection 153 of 3 subsets of 300 genes is 58 genes only. The plots of these 58 genes' dynamics in 3 wound 154 datasets are presented in Figure 4. 155 The genes were divided into 5 clusters (see Figure 4) [Raudys, 1991]. However, following [Sass 204 2017], we compared existing wound data, but focused on finding genes with similar dynamics. 205

. Gene expression dynamics of commonly highly expressed genes in the 3 datasets. Each 176 row represents the same dataset and each column represents the same cluster of genes listed 177 in the legend under each column. Bold blue line in each plot corresponds to the mean value of 178 gene intensity within each cluster (calculated for each dataset separately). Vertical axis
We found 5 clusters of genes whose dynamics were similar in 3 available datasets. These are 206 strong candidates to universal biomarkers of known wound healing stages. 207 During our comparison analysis we found several methodological problems that hopefully 208 will be solved in the future. 209 Wound healing is a dynamical process. Experimental data collection, such as gene 210 expression intensity, should be done with appropriate frequency (sampling rate). In signal 211 processing the theorem of Nyquist-Shannon limits the frequency of discrete sampling to obtain 212 satisfactory information from a continuous signal [Dieter, 1999]. The sampling rate frequency 213 must be twice larger than the highest frequency of the signal. In many existing wound 214 transcriptomic datasets, the sampling rate doesn't satisfy this condition (see Figure 3). It is traditional in this field to assume high fold change in gene expression as high 221 significance of the gene. However low sample rate may lead to missing gene expression peak 222 (burn wounds in Fig 3) and misinterpretation of its significance. On the other hand, in the fold-223 change values of genes there is no "bi-modal" distribution (highly changing and no-changing 224 dynamics) (Figure 1). Whether or not two-fold change or three-fold change expression is 225 considered significant is an arbitrary choice of the researcher. If high fold change is equivalent to 226 the "significance", then the significance of genes decreases gradually, there are no "significant" 227 and "non-significant" genes. Probably, the choice of the first N genes with the highest fold-228 change in expression is more reasonable option to select the most significant genes. 229 The development of transcriptome technology is important for understanding wound 230 healing and for the improvement of intelligent methods of wound treatment. However, much 231 work is still to be done. For instance, transcriptomic data are usually not accompanied by 232 additional data, such as histology/morphology of the wound tissue. In fact, transcriptomic signals 233 are taken from many cells at once, and any information about the cell types in the wound at each 234 time would give much more insight into understanding of gene expression in wound healing. 235 In this work we performed wound transcriptomic dataset comparisons without any 236 reference to cells or biological processes and found a set of 58 genes that could serve as 237 indicators of wound healing stage. We used a filtering approach which allowed for comparisons 238 between species and was agnostic to experimental set up. The results were a set of 58 genes 239 which may be used as a basis for future analysis. Potentially this set of biomarkers, or an 240 improved set pending further data collection and analysis, can be used in biomarkers toolkit for 241 gene-expression based wound stage detection. 242 243

244
In each dataset the intensity of gene expression is measured at multiple time points. In GSE23006 245 the intensities are represented as log2(intensity) while in GSE460 and GSE8056 datasets original 246 intensities are shown. To make datasets comparable, we return GSE23006 to initial intensity 247 before filtering by taking power of 2 of the numbers represented in the GSE23006 array. 248

Orthologous genes between mouse and human 249
To compare particular genes between human and mouse, we find orthologs -homologous genes 250 between species. All orthologous genes in mouse and human were matched by gene symbol to 251 their homologene ID in the Human and Mouse Homology Class report. Now we compute the threshold for the maximum relative error, which will determine which data 281 is kept and which is discarded based on an acceptable value for relative error. The threshold for 282 each time point is chosen to be as follows 283 where we take four standard deviations above the mean which is inclusive of 99.98% percent of 286 data assuming a normal distribution (Note that three standard deviations is inclusive of 99.72%). 287 We find the maximum relative error across the three samples, where the new matrix = 288 ( 1 , 2 , 3 ) ∈ × . The maximum is taken element wise across the three matrices, that is 289 Some genes are mentioned in the dataset several times (several repetitions or several 300 transcripts). In addition, for some genes the repeated rows contain too different dynamics. To 301 leave "one gene -one row" we make filtering based on correlation between repeated gene rows. In this work we use the threshold = 0.9. If the condition (*) is satisfied, we take one of highly 311 correlated gene intensity rows ⃗ , ⃗ (we can take the mean between the intensities of two 312 highly correlated genes). If the condition (*) is not satisfied, the gene is not included in further 313 analysis. 314

315
Several genes corresponding to the same homologene number (4) 316 For some genes it may happen that the same homologene number corresponds to two genes. 317 For example, homologene number corresponding to gene X in mouse corresponds to genes X1 318 and X2 in human. In this case we check if there is high correlation between homologenes X1 and 319 X2 ( ( ⃗ 1 , ⃗ 2 ) > ) and take one of them. Otherwise, these homologenes are not included 320 in further analysis. 321

322
Ethics approval and consent to participate 323 All procedures were performed in accordance with relevant guidelines.