Keloid is a clinical disease that is difficult to cure. Although a keloid is a benign mass, it often grows beyond the boundary of the original injury and invades the surrounding normal skin in a crab-claw pattern, which seriously affects the appearance of the healed skin9. Keloids occur mostly in people aged 10–30 years, especially in African, Hispanic, and Asian ethnic groups, and often develop on the chest, earlobes, shoulders, and back9. Despite the availability of treatments such as radiotherapy, hormone therapy, and surgical resection, keloid has a relatively high recurrence rate10. Many researchers consider fibroblasts to be one of the main participants in the occurrence and development of keloid. After skin injury, a complex signalling network is activated to control the proliferation, migration, and secretion of fibroblasts. Therefore, the biological behaviour of fibroblasts is a major focus in the study of the mechanism of keloid formation11.
Gene microarrays and sequencing are important means of studying gene expression profiles and transcription levels and are widely used in fields such as regenerative medicine12 and in the study of diseases9 and tumours7. In recent years, the focus of transcriptome research has gradually shifted from protein-coding genes to the epigenetic field involving non-coding RNAs (ncRNAs). ncRNAs are a class of RNAs that are not directly translated into polypeptides and were once considered non-functional components in the process of gene expression and transcription13. With the development of epigenetics and of genomic and proteomic research methods in recent years, ncRNAs have gradually been found to be related to gene expression. ncRNAs not only regulate gene transcription, post-transcriptional modification, and translation but also form regulatory networks of competing endogenous RNA (ceRNA), thereby affecting the biological functions of cells, tissues, and organisms14. Based on the length of the RNA molecules, ncRNAs are divided into small non-coding RNAs (sncRNAs), long non-coding RNAs (lncRNAs), and circular RNAs (circRNAs). At present, the functions of most lncRNAs remain unclear. Wang et al. investigated the expression and effect of lncRNA-H19 in keloid fibroblasts and found that H19 regulates the viability and apoptosis of fibroblasts through the miR-29a/COL1A1 axis13. Therefore, we used existing gene expression data on keloid fibroblasts to identify new lncRNA molecules and to enable further exploration of epigenetic regulatory mechanisms in keloid.
A large amount of sample data can be obtained from large online databases such as the GEO database or The Cancer Genome Atlas (TCGA). However, older microarray data can often be used only for transcriptome or microRNA analysis because of limitations in the number and type of probes, so analyses of non-coding RNAs often require a new round of sample collection and sequencing. Nevertheless, some microarrays, such as the Affymetrix Human Genome U133 Plus 2.0, allow certain ncRNA data to be recovered. In these cases, the probe annotation must be updated, and secondary data mining is required. Based on the method proposed by Yang et al.3, we combined the R language with online databases and selected the *.CEL files of GSE7890 (raw Affymetrix data) for secondary analysis. In the present study, we found that a total of 155 mRNAs were differentially expressed in the keloid group compared with the control group. Among these mRNAs, 31 were upregulated and 124 were downregulated. The GSE7890 dataset has also been analysed by other researchers. Wang et al. reported in 2017 that the expression of 67 genes was changed in the keloid group compared with the normal group (15 genes upregulated and 52 genes downregulated)5. However, Zhang et al. reported in 2019 that the expression of 832 genes was changed in the keloid group (269 upregulated and 563 downregulated genes)6. We believe that the discrepancy between our results and those of previous studies is due to differences in algorithms and inclusion criteria. By analysing the KEGG enrichment data, we found significant gene enrichment in the p53, TNF, and cell cycle signalling pathways, which differed from the findings of the above articles5–6.
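The dependence of the reported gene counts on inclusion criteria can be illustrated with a minimal Python sketch. The gene names, log2 fold changes, p-values, and cut-offs below are hypothetical and are not taken from GSE7890 or from the cited studies, and `call_degs` is an illustrative helper rather than the pipeline used in the present study.

```python
# Toy illustration: the same result table yields different numbers of
# differentially expressed genes (DEGs) under different inclusion criteria.
# All gene names and values are made up for illustration only.

def call_degs(results, min_abs_log2fc, max_p):
    """Return (upregulated, downregulated) gene lists under the given cut-offs.

    results maps a gene name to a (log2 fold change, p-value) pair.
    A gene is kept only if p < max_p and |log2FC| >= min_abs_log2fc.
    """
    up, down = [], []
    for gene, (log2fc, p_value) in results.items():
        if p_value >= max_p or abs(log2fc) < min_abs_log2fc:
            continue  # fails at least one inclusion criterion
        (up if log2fc > 0 else down).append(gene)
    return sorted(up), sorted(down)


toy_results = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (1.2, 0.020),
    "GENE_C": (-1.6, 0.004),
    "GENE_D": (-0.8, 0.030),
    "GENE_E": (-2.4, 0.0005),
    "GENE_F": (0.4, 0.600),
}

# Stricter criteria keep fewer genes than more lenient ones.
strict = call_degs(toy_results, min_abs_log2fc=2.0, max_p=0.01)
lenient = call_degs(toy_results, min_abs_log2fc=0.58, max_p=0.05)

print("strict :", len(strict[0]), "up,", len(strict[1]), "down")
print("lenient:", len(lenient[0]), "up,", len(lenient[1]), "down")
```

In this toy table, the stricter cut-offs retain two genes and the more lenient cut-offs retain five, although the underlying data are identical. Differences in normalization or in the statistical test would add further variation that this sketch does not model.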
In addition, we identified eight lncRNAs that were differentially expressed in keloid: RP11-420A23.1, RP11-522B15.3, RP11-706J10.1, LINC00511, LINC00327, HOXB-AS3, RP11-385N17.1, and RP3-428L16.2. When we compared the expression of these DElncRNAs between keloid and normal skin fibroblasts by qPCR, all DElncRNAs except RP11-385N17.1 showed increased expression in the keloid group compared with the control group. The differences in LINC00511 and RP11-706J10.1 expression were statistically significant. However, only the change in RP11-706J10.1 was consistent with the trend obtained from microarray mining. The expression trend and effect of RP11-706J10.1 still require confirmation with more specimens and animal experiments. The differences between the qPCR and microarray results may be related to the choice of threshold during microarray analysis and the small number of clinical samples. Nonetheless, the results indicate, to a certain extent, that further lncRNA mining is possible by combining secondary analysis of microarray data in open databases with bioinformatic methods.
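A comparison of expression trends between two assays can be sketched as a simple check of whether the sign of the change agrees. The following Python sketch uses hypothetical identifiers (`LNC_1` to `LNC_3`) and made-up log2 fold changes; it does not reproduce the measurements of the present study, and `split_by_concordance` is an illustrative helper.

```python
# Toy illustration: check whether the direction of change measured by
# microarray agrees with the direction measured by qPCR.
# Identifiers and values are hypothetical.

def sign(value):
    """Return 1 for an increase, -1 for a decrease, and 0 for no change."""
    return (value > 0) - (value < 0)


def split_by_concordance(array_log2fc, qpcr_log2fc):
    """Split transcripts measured in both assays by agreement of direction."""
    concordant, discordant = [], []
    shared = sorted(array_log2fc.keys() & qpcr_log2fc.keys())
    for name in shared:
        same = sign(array_log2fc[name]) == sign(qpcr_log2fc[name])
        (concordant if same else discordant).append(name)
    return concordant, discordant


toy_array = {"LNC_1": -1.4, "LNC_2": 1.1, "LNC_3": -0.9}
toy_qpcr = {"LNC_1": 0.8, "LNC_2": 1.6, "LNC_3": 0.3}

concordant, discordant = split_by_concordance(toy_array, toy_qpcr)
print("concordant:", concordant)
print("discordant:", discordant)
```

A check of this kind compares direction only. It says nothing about statistical significance, which still has to be assessed separately for each assay.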
Given that database information is constantly updated, old microarray data can be transformed into a powerful tool using bioinformatic methods. More information can be mined from these data, which expands their use. The data included in the present study were obtained from few sources, and future work could include the same type of data from multiple sources. Provided that the data are appropriately normalized, they can be analysed together, which would greatly reduce sample consumption, manpower, time, and cost and quickly yield the required information. However, the present study has limitations. First, microarrays have a relatively small data capacity, and the amount of data obtained depends on the type and number of probes designed. Therefore, only a small amount of data could be mined, which is not comparable with the amount obtained using other methods such as whole transcriptome sequencing, and the results can only be used as a reference. Second, because of the uncertainty of purely bioinformatic analysis, the results of secondary mining need to be combined with more clinical samples and animal experiments to verify their accuracy and to clarify the specific biological effects of the predicted molecules.
In summary, the present study achieved in-depth mining of lncRNA information from GEO microarray data using bioinformatic methods and identified potential epigenetic regulatory mechanisms affecting keloid formation using existing databases.