A Study on the Expression of mRNAs and lncRNAs in Keloid Fibroblasts Based on GEO Microarray Data Mining

Objective The purpose of this study was to nd the coding RNA (messenger RNA, mRNA) and long non-coding RNA (lncRNA) expressed in keloid through the analysis of GEO microarray chip of keloid broblasts. Method: GEO database GSE7890 database was downloaded with selection of keloids and normal scar group data. The data were analyzed by R language combined with online database. The log2FC > 1, p < 0.01 was chosen as screening criteria, and the differentially expressed mRNAs were screened for GO and KEGG function analysis. Results were including (Rp11-420a23.1, and and 5 down-regulated (LINC00511, LINC00327, Rp11-385n17.1 and Rp3-428l16.2). qPCR analysis of in keloid broblasts showed that the expression of all DElncRNAs except for RP11-385N17.1 was increased in the keloid group compared to the control group. Moreover, the differences in LINC00511 and RP11-706J10.1 were statistically signicant.


Conclusion
The non-coding RNA information of GEO chip data can be deeply mined through bioinformatics, and the potential epigenomic mechanism affecting keloid formation can be found from the existing database.

Introduction:
Keloid is a type of pathological scar, and the pathogenesis involves abnormal hyperplasia of dermal brous tissue, which leads to excessive collagen deposition in the process of wound healing 1 . Although a large number of studies have investigated the mechanism of keloid occurrence and development and treatments for keloid, the outcomes of keloid treatment remain unsatisfatory 2 . Gene microarray is a powerful tool for studying gene expression pro les. However, due to limitations of the production process and algorithm of early gene microarrays, the required data cannot be obtained completely 3 . With the rise of bioinformatic analytical methods, many scholars have attempted to mine new transcriptome information from old microarray or sequencing data through improved algorithms, providing novel research ideas for clinical and scienti c research 3-7 . This study performed a secondary analysis on microarray data related to keloid formation downloaded from the Gene Expression Omnibus (GEO), an open source database of the National Center for Biotechnology Information (NCBI), using a combination of R language and bioinformatic analytical technology in the network database. The aim was to identify the coding RNA (messenger RNA, mRNA) and long non-coding RNA (lncRNA) expressed in keloid and to predict their expression status. First, this study veri ed the possibility of lncRNA retrieval by conducting a secondary analysis on microarray data with bioinformatic methods. Second, this study identi ed transcription information related to the pathophysiological process of keloid from these data, providing a novel research method for better exploring the pathological mechanism of and treatments for keloid.

Acquisition of GEO microarray results
The microarray dataset GSE7890 was obtained from the NCBI GEO database (http://www.ncbi.nlm.nih.gov/geo). This dataset mainly includes a keloid broblast group without prednisone treatment (n = 5), a normal scar broblast group without prednisone treatment (n = 5), a keloid broblast group after prednisone treatment (n = 5), and a normal scar broblast group after prednisone treatment (n = 4) 4 . Based on the needs of the experiment, we downloaded the raw data for the keloid and the normal scar groups without prednisone treatment (including the Affymetrix CEL le and the probe annotation le) and performed a further analysis. The platform of the microarray was GPL570 [Affymetrix Human Genome U133 Plus 2.0 (Affymetrix, USA)].

Processing and analysis of the keloid microarray data
After the data were successfully downloaded, we adopted the method reported by Yang et al. 3 . Namely, the custom chip description les (CDFs) for GENCODE genes were download from the BRAINARRAY website (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp), and the data were subjected to preprocessing (including background correction and normalization) and log2 transformation using Affymetrix Power Tools (APT). The gene-level probe sets were mapped to the human GENCODE annotation (version 28) using custom Perl scripts. Only the RNAs in the GENCODE database whose probe sets were annotated as "LNCRNA" and "PROTEIN-CODING RNA" were retained, while all other genes were ltered out. The validity of data normalization was con rmed by a straight line in the log2PM box plot. Differentially expressed genes were de ned as genes with |logFC|>1 and an adjusted p < 0.01, including mRNA and lncRNA. Thus, the differentially expressed mRNA (DEmRNA) and lncRNA (DElncRNA) data were obtained.

Gene Ontology (GO)/Kyoto Encyclopedia of Genes and
Genomes (KEGG) functional enrichment analysis of DEmRNA DEmRNAs were subjected to GO and KEGG functional enrichment analyses using the DAVID database (Annotation, Visualization and Integrated Discovery, http://david.ncifcrf.gov/) 8 . The false discovery rate (FDR) was used as the inspection index. An FDR < 0.05 was considered statistically signi cant. GO and KEGG enrichment charts were created.
2.4 Quantitative polymerase chain reaction (qPCR) analysis of changes in DElncRNA expression in keloid and normal skin broblasts Pathological tissues obtained from keloid patients and normal skin obtained from patients with a nonpathological scar constitution during initial surgery were randomly selected (n = 3). The entire process was in line with the ethical standards of the First A liated Hospital of Sun Yat-Sen Medical University, and informed consent was obtained from all patients. The detailed treatment method of the specimens is described in previous literature published by our research group 1 .
qPCR was performed to examine DElncRNA expression in the two groups of broblasts, and the method is described in detail in previous literature 1 . Brie y, the two groups of broblasts were digested and resuspended, and total RNA was extracted from these broblasts using TRIzol. The concentration and purity of the RNAs were examined using a NanoDrop 2000, and the nal concentration of the RNAs was adjusted to 200 ng/µl. RNA samples (1 µg) were reverse transcribed using the RevertAid First Strand cDNA Synthesis Kit (Thermo, Massachusetts, USA). Appropriate amounts of cDNAs were then ampli ed in a uorescent quantitative PCR instrument (StepOnePlus, Thermo, Massachusetts, USA) using FastStart Universal SYBR Green Master Mix (Roche, Basel, Switzerland). The speci c steps were performed in accordance with the manufacturer's instructions. Each sample was subjected to three experimental repeats. GAPDH expression was used to normalize the expression of mRNAs. Information for the genes tested and their primers is shown in Supplemental Table 1. The relative expression levels of the genes were calculated using the 2 −ΔΔCt method.
The data were analysed using SPSS V. 15 software. Comparisons between the groups were performed using the t test. The signi cance level was set to 0.05. A p value less than 0.05 indicated that a difference was statistically signi cant.

GO and KEGG enrichment analyses of DEmRNAs
DEmRNAs were subjected to GO enrichment analysis ( Fig. 2A). In the biological process subclass, DEmRNAs were found to be mainly enriched in the "single organization process" category ( Fig. 2B). In the cell component subclass, DEmRNAs were mainly enriched in the "intracellular", "intracellular parts", "intracellular organelle", "organelle", "membrane-bound organelle", and "intracellular membrane-bound organelle" categories (Fig. 2C). In the molecular function subclass, DEmRNAs were mainly enriched in the "binding" category (Fig. 2D).
The KEGG enrichment analysis showed that the DEmRNAs were mainly enriched in the p53 signalling pathway, sphingolipid metabolism, protein processing in the endoplasmic reticulum, the tumour necrosis factor (TNF) signalling pathway, the cell cycle, and neuroactive ligand-receptor interaction (Supplemental Table 3).

qPCR examination of DElncRNA expression in keloid broblasts
qPCR analysis of DElncRNAs in keloid broblasts showed that the expression of all DElncRNAs except for RP11-385N17.1 was increased in the keloid group compared to the control group. Moreover, the differences in LINC00511 and RP11-706J10.1 were statistically signi cant (Fig. 4).

Discussion:
Keloid is a clinical disease that is di cult to cure. Although keloid is a benign mass, it often grows beyond the boundary of the injury and invades the surrounding normal skin with the appearance of a crab claw, which seriously affects the appearance of the healed skin 9 . Keloids are mostly found in populations aged 10-30 years, especially in African, Hispanic, or Asian ethnic groups. Keloids often occur on the chest, earlobes, shoulders, and back 9 . Despite the existence of different treatment methods such as radiotherapy, hormone therapy, and surgical resection, keloid has a relatively high recurrence rate 10 . Many scholars believe that broblasts are one of the main participants in the occurrence and development of keloid. After skin injury, a complex signal regulatory network is activated to control the proliferation, migration, and secretion of broblasts. Therefore, the biological behaviour of broblasts is a hot spot in the study of the mechanism of keloid formation 11 .
Gene microarray and sequencing are important means to study gene expression pro les and transcription levels and are widely used in elds such as regenerative medicine 12 and in the study of diseases 9 and tumours 7 . In recent years, the research focus on transcriptomes has gradually shifted from protein-coding genes to the epigenetic eld involving non-coding RNAs (ncRNAs). ncRNAs are a class of RNAs that are not directly translated into polypeptides and were once considered ineffective components in the process of gene expression and transcription 13  A large amount of sample data can be obtained through big network databases such as the GEO database or The Cancer Genome Atlas (TCGA). However, previous microarray data can often be used only for transcriptome or microRNA analysis due to limitations of the number and type of probes. When conducting analyses on non-coding RNAs, a new round of sample collection and sequencing analysis is often required. In addition, some microarrays, such as the Affymetrix Human Genome U133 Plus 2.0, allow the acquisition of certain ncRNA data. In these cases, the probe annotation must be updated, and secondary data mining is required. Based on the method proposed by Yang et al. 3 , we combined R language with network databases and selected the *.CEL le of GSE7890 (raw Affymetrix data) for secondary analysis. In the present study, we found that a total of 155 mRNAs exhibited changes in the keloid group compared to the control group. Among these mRNAs, the expression of 31 mRNAs was and 563 downregulated gene) 6 . We believe that the discrepancy between our results and the results of previous studies is due to differences in algorithms and inclusion criteria. By analysing the KEGG enrichment data, we found signi cant gene enrichment in the p53, TNF, and cell cycle signalling pathways, which was different from the ndings of the above articles 5-6 . In addition, we identi ed eight lncRNAs that were differentially expressed in keloid, including RP11-420A23.1, RP11-522B15.3, RP11-706J10.1, LINC00511, LINC00327, HOXB-AS3, RP11-385N17.1, and RP3-428L16.2. By comparing the expression of DElncRNAs between keloid and normal skin broblasts, we found that all DElncRNAs except for RP11-385N17.1 had increased expression in the keloid group compared to the control group. The differences in LINC00511 and RP11-706J10.1 expression were statistically signi cant. However, only the change in rp11-706j10.1 was consistent with the trend obtained from microarray mining. The expression trend and effect of rp11-706j10.1 still require con rmation with more specimens and animal experiments. Although differences were found between the qPCR and the microarray results, these differences may be related to the choice of threshold during microarray analysis and the small number of clinical samples. Moreover, the differences also indicated the possibility of further lncRNA mining through a combination of secondary analysis of the microarray data in open databases and bioinformatic methods to a certain extent.
Given the constantly updated database information, the old microarray data can be transformed into a powerful tool using bioinformatic methods. More information can be extensively mined from these old microarray data, which expands their use. The data included in the present study were obtained from few sources, and we can try to include the same type of data from multiple sources. These data can be analysed under the premise of satisfying normalization analysis, which would greatly reduce sample consumption, manpower, and time and money investments and quickly yield the required information. However, the present study still had limitations: First, microarrays have a relatively small data capacity, and the amount of data obtained was related to the type and number of probes designed. Therefore, only a small amount of data could be mined, which was not comparable with the amount of data obtained using other methods such as whole transcriptome sequencing. The results obtained can only be used as a reference. Second, due to the uncertainty of simple bioinformatic analysis, the results of secondary mining needed to be combined with more clinical samples and animal experiments, which would allow a better veri cation of the accuracy of the results and the clari cation of the speci c biological effects of the predicted molecules.
In summary, the present study achieved in-depth mining of lncRNA information from GEO microarray data using bioinformatic methods and identi ed the potential epigenetic regulatory mechanisms affecting keloid formation using existing databases.

Declarations
Acknowledgement: This research was supported by the Natural Science Foundation of Guangdong Province (2020A151501159) and China postdoctoral science foundation (2020M672990).
Con ict of interest: The authors have no con icts of interest to declare.
Contributions ZL and SX downloaded the data and analysed them. ZL and ZL interpreted the experimental data. YD and YZ helped draw the gures. BH was responsible for the statistical analyses. YX and XL designed the research. ZZ edited the manuscript. YX and XL revised the manuscript. All authors read and approved the nal manuscript.

Consent to Participate:
This study conformed to the guidelines established by the Ethics Committee of the First A liated Hospital of Sun Yat-sen University, and written informed consent was obtained from all included patients.

Consent to Publish
The Authors con rm that the work described has not been published before; that it is not under consideration for publication elsewhere; that its publication has been approved by all co-authors, if any; that its publication has been approved by the responsible authorities at the institution where the work is carried out. Charts of the normalization analysis of the raw data for the keloid and control groups and a volcano plot and heat map of differentially expressed mRNAs. A) Normalization chart of the raw data; B) Volcano plot of the differentially expressed mRNAs between the two groups; C) Heat map of the differentially expressed mRNAs between the two groups. In the grouping bar, blue-green represents the keloid group, and pink represents the normal control group. In the heat map, blue represents genes with downregulated expression, while red represents genes with upregulated expression. Darker colour corresponds to a greater difference in gene expression.  Heat map of the differentially expressed lncRNAs between the keloid group and the control group. In the grouping bar, blue-green represents the keloid group, and pink represents the normal control group. In the heat map, blue represents genes with downregulated expression, while red represents genes with upregulated expression. Darker colour corresponds to a greater difference in gene expression.