Prediction of Differentially Expressed miRNAs as Potential Biomarkers for Aniridia-Associated Keratopathy based on Expression Prole Analyses

Background: Aniridia is a rare hereditary disorder that affects most structures of the eyes. This autosomal dominant disorder is caused by haploinsuciency of Pax6, a critical gene for proper development of the eye. This study attempted to identify novel diagnostic differentially expressed miRNAs and related mRNAs to develop a deeper understanding of the molecular mechanisms and to provide new ideas for the diagnosis and treatment of aniridia-associated keratopathy (AAK). Methods: The miRNA and mRNA expression data were downloaded from GEO for differential expression analysis. R programs, WGCNA, and miRNA targets were used to identify differentially expressed genes (DEGs). The R package was used to screen candidate miRNAs as potential biomarkers, and predicted targets and DEG intersections were determined. A regulatory network between optimal differentially expressed miRNA and DEGs was then constructed. Function analysis and pathway enrichment of miRNA and mRNA were both performed. In addition, transcription factors (TFs) of differential miRNAs and molecular compounds that may be ecient were predicted. Results: We used three methods to identify DEGs: 509 differential genes were screened by R, 1522 by WGCNA, and 732 by prediction of different miRNA targets. In total, 18 DEGs were found, encompassing 9 upregulated genes and 9 downregulated genes. Eight differentially expressed miRNAs were identied using the R package: ve were upregulated and three were downregulated. Among them, three miRNAs (miR-204-5p, miR-224-5p, and miR-30a-5p) were considered optimal potential biomarkers, and their regulatory network with DEGs was created by Cytoscape. IL-4-mediated signaling events were the most enriched signaling pathways. Based on these DEGs, CHL1 and SOCS3 were most closely associated with clinical characteristics, which related to sex and stage separately. Conclusions: This study identied novel diagnostic differentially expressed miRNAs and related mRNAs by developing a deeper understanding of the molecular mechanisms, based on that we provided new ideas for the diagnosis and treatment of AAK.


Introduction
Aniridia is a rare hereditary disorder that affects most structures of the eyes. This autosomal dominant disorder is caused by haploinsu ciency of Pax6, a critical gene for proper development of the eye [1], usually resulting from mutations in the PAX6 gene or its regulatory components or chromosomal defects [2]. Aniridia is characterized by the partial or near complete absence of the iris, but the disease is not just con ned to the iris; nearly half of aniridia cases have related glaucoma [3]. In addition, aniridia involves other ocular abnormalities; the most common ocular features are nystagmus, foveal hypoplasia, cataract, glaucoma, and aniridia-associated keratopathy (AAK). Of these progressive pathologies, the ocular surface can be impaired particularly severely by AAK during eye development. This can occur at a wide range of ages and is characterized by progressive conjunctivitis with chronic irritation and photophobia.
According to the recommendations of some relevant researchers, AAK subtypes could be considered separate diseases, thereby facilitating treatment decisions and patient strati cation for future clinical studies and trials [4].
Although "aniridia" was originally de ned as a hereditary disease caused by a PAX6 abnormality, noncoding regions of PAX6 are affected too, and hence the de nition of aniridia was expanded. Phenotypes of epithelial, neural, immune, and limbal stem cell status have recently been studied in detail in phenotypic AAK [5]. Some researchers believe that more than 400 unique mutations in the PAX6 gene may lead to a series of clinical phenotypes [6]. As most studies considered mutations in PAX6 as a homogeneous group, other genes may interact with it, or the variation of downstream genes of PAX6 may cause speci c phenotypes [7]. As clinical genetic analysis becomes more complicated and widespread, heterogeneous gene mutations require more detailed attention. Because of the signi cant individual differences in the clinical manifestation, progression, and prognosis of AAK, it would be di cult to develop general guidelines for treatment. Our study is particularly helpful for its diagnosis and differentiation. The purpose of this study was therefore to perform a detailed cytogenetic analysis to explore the genetic variation and its potential association with clinical traits. Since few studies have focused on AAK, we hope that our work will lead to deeper study of the diagnosis, treatment, and prevention of this debilitating ocular surface condition.

Patients and samples
Aligned les and clinical information on GSE137996 [8] and GSE137997 [8] for 20 AAK patients and 20 healthy control subjects were retrieved from the GEO dataset (http://www.ncbi.nlm.nih.gov/geo/). The study was performed on bulbar conjunctival cells that were sampled from subjects for whom microRNA (miRNA), mRNA expression, and clinical information were available.

Differential expression analysis
Using the analysis program R, a differential expression analysis was performed; the expression pro les of 20 patients and 20 normal samples in GSE137996 were compared to identify candidate differentially expressed genes (DEGs). Genes were retained under the rule of a |log2 (fold-change)| > 1 and an adjusted P < 0.05.
Weighted gene co-expression network analysis (WGCNA) for mRNA A gene co-expression network analysis was speci cally performed using mRNA data of the 20 microarray-measured samples from GSE137996 using the R package WGCNA [9]. A hierarchical clustering analysis of AAK and normal samples was performed, based on the expression of AAK, to remove outlier samples. An adjacency matrix was transformed from the correlation matrix using the adjacency function (ai, j = | Cor(Xi, Xj)|β). The t soft threshold power (β) was screened to ensure the construction of scale-free networks, based on the Pearson's correlation coe cient between two groups.
Topology overlap measurement and robust network measurement were calculated in pairs based on the adjacency matrix. β = 6 was selected to construct a scale-free network in this study. Then, the dissimilarity based on topological overlap was used as the input for unsupervised hierarchical clustering using the dynamic tree cutting algorithm. As a result of the TOM-based dissimilarity measure, average linkage hierarchical clustering was implemented, and genes with similar expression modes were classi ed into the same modules by one-step network construction and module detection. The module eigengenes (MEs) represents the rst principal component-related module, which is considered to represent all genes in the module. Eigengenes is performed to identify modules that are signi cantly associated with a disease. The processes of WGCNA were performed using the R program, and two MEs were selected as candidate DGEs [10].

Data processing and analysis of miRNAs
The data expression pro les of AAK were searched in the GEO database, and the dataset GSE137997 was included in our study. The R package (limma) was used to screen out genes between the control and AAK groups according to the criteria P < 0.05 and absolute log fold change greater than 1. Transcription factors (TFs) of differential miRNA and Gene Ontology (GO) annotation were realized by Funrich software (version 3.1.3), which revealed the TF enrichment analysis of differentially expressed miRNAs and biological processes (BPs), cellular components (CCs), and molecular functions (MFs) of the miRNAs separately.
Analyses of miRNA-mRNA targets Investigating the target genes of miRNAs is crucial for identifying their regulatory mechanisms and functions. Here, screened miRNAs were input to search for mRNA targets employing the miRNA-target tool, Funrich. DEGs were determined by the intersection of the results of differential expression analysis, WGCNA, and target mRNAs. Relying on DEGs, the regulatory networks of the miRNA-mRNA pairs were extracted (based on an expression fold change > 1 and a false discovery rate < 0.05) and visualized using Cytoscape software (version 3.7.1).

Analyses of the DEGs
GO annotation and visualization of the DEGs were performed with R packages ("clusterPro ler," "org.Hs.eg.db," "enrichplot," and "ggplot2"). Pathway enrichment analysis was accomplished using the website tool ConsensusPathDB (http://cpdb.molgen.mpg.de), which currently includes 32 public resources for interactions of Homo sapiens.

Expression levels and clinical characters of the DEGs
We determined the expression levels of the DEGs with age, sex, and stage of AAK and screened out genes that related to clinical characters. A perl program was employed to translate the IDs of clinical-related genes into probe names, as an input to CMAP (https://portals.broadinstitute.org/cmap/). The website based on experiments was designed to predict which molecular compounds are likely to act on target genes. The inclusion criteria were the rst ten drugs with a mean value greater than 0.5 and an adjusted P < 0.01.

Sample traits and data
Two mRNA and miRNA expression pro les for conjunctival cells (GSE137996 and GSE137997) abstracted from donors with or without AAK were downloaded from GEO in July 2020. Both of these datasets included original expression pro les derived from microarrays that already had been background-corrected and standardized. We compared 40 samples, including 20 patients and 20 normal controls.

Differential expression analysis
Using the R program, 509 DEGs were screened from 34,730 in total, including 278 upregulated genes and 231 downregulated genes.

AAK-related WGCNA modules and genes
To identify groups of genes with highly similar binding "signatures," we adapted WGCNA to describe the correlation patterns of AAK, as it was one of the best methods for the construction of large networks in an unsupervised manner. The WGCNA package in R was applied to construct a co-expression network using the expression values of all the genes included in the 40 samples from the GSE137996 dataset. No sample was excluded from subsequent analysis.
The scale-free topology network model was built to study gene expression networks. Based on the correlation coe cients for genes in the cohort, the adjacency matrix was transformed from the correlation matrix, with its power value of 7 as the soft threshold (Fig. 1A). The scale independence was 0.90, and the mean connectivity of the co-expressed network was high enough, ensuring a scale-free network (Fig. 1B). Thirteen non-overlapping modules were constructed, and two highly AAK-correlated modules were detected (Fig. 1C). As shown in Fig. 1D, the blue (r = -0.89, P = 0.01) module and the black (r = 0.68, P = 0.01) module were strongly associated with AAK. We obtained 1522 genes through this step, 189 from the black module and 1333 from the blue one. Scatterplots of module membership versus gene signi cance were performed, as core genes were selected with an absolute module membership score of > 0.8 and gene signi cance score of > 0.5.

Analyses of miRNAs
We ltered 2449 miRNAs in 40 samples from the GSE137997 dataset by adapting the R program and retained 8 miRNAs under the rule of a |log2 (fold-change)| > 1 and an adjusted P < 0.05. Five of them were upregulated and three were downregulated. The TF enrichment analysis result is presented in Fig. 2, where the blue bar means the percentage of genes that miRNA enriched in TFs, with the red bar representing the p-value; the top ten TFs with a percentage of enriched genes greater than 10% are shown. An enrichment analysis of the miRNAs based on GO revealed the ten most signi cant functional enrichments in BP, CC, and MF (Fig. 3). Speci cally, genes were mostly enriched in the regulation of nucleobase, nucleoside, nucleotide, and nucleic acid metabolism and transport on BP and almost uniformly distributed in the calcineurin complex, collagen type XIII, nuclear euchromatin, the clathrin coat of the trans-Golgi network vesicle, cell fraction and actomyosin, and the actin part related to CC; as for MF, genes were particularly enriched in GTPase activity, TF activity, ubiquitin-speci c protease activity, and RNA binding. miRNA target prediction Considering interactions that can actually regulate the expression of genes, we predicted target genes based on the similarity between miRNAs and mRNAs. A total of 723 target genes were detected. DEGs were determined as the intersection of differential expression analysis, MEs of WGCNA, and target genes of miRNA, encompassing nine upregulated genes and nine downregulated genes (Table 1).

Regulatory network of the miRNAs and mRNAs
Relying on the variation in DEGs and differential miRNAs, we obtained their relationship in AAK. mRNAs and miRNAs (acting as nodes) and their regulatory relationships (acting as lines) constitute a regulatory network, which was visualized by Cytoscape. Triangles and circles represent miRNAs and mRNAs, respectively. Red means patients had increased expression, and green means reduced, compared to the control group (Fig. 4). Both positive regulation and negative regulation of miRNA and mRNA are shown in this network.

Function annotation and pathway enrichment
An enrichment analysis of the DEGs based on GO is shown as emapplots of BP, CC, and MF (Fig. 5). The result of pathway enrichment revealed that they were signi cantly enriched in genes involved in immunity, lipometabolism, and glycometabolism ( Table 2). The inclusion criterion was a q-value < 0.05.

Gene expression and clinical signi cance
Gene expression levels are shown in Fig. 6. None of the genes showed signi cant differences with age (y = 40), while CHL1 was related to sex and SOCS3 had a correlation with stage. The expression of CHL1 in AAK was lower than that in healthy people and particularly low in female patients. SOCS3 did not involve age or sex but was only associated with stage. It showed higher expression during the mild stage, suggesting that it may become an effective biomarker for early diagnosis. Therefore, we adapted CMAP to predict drug molecules that may target these two genes as acting sites. As shown in Table 3, we provided the cmap name, mean, experiment times, enrichment value, and p-value. Four of these molecular compounds may take effect by increasing gene expression, and the others may play a role by decreasing expression.

Discussion
We compared the expression of mRNAs and miRNAs in 20 AAK patients and 20 healthy subjects and identi ed novel potential biomarkers of this rare and challenging disease. We focused not only on genes that may associate with AAK but also on those related to clinical features of the disease. We hope our study can provide a new method to help evaluate and treat patients.
We applied WGCNA to the analysis of non-tumor disease, identi ed DGEs with the intersection of multiple approaches, and presented their a liation with miRNAs, trying to identify the most quali ed genes to be biomarkers. The regulation of differential miRNAs and DEGs was shown to elucidate their expression level change and the regulatory relationship between them in AAK. Although there were studies showing that AAK deteriorated with age [11], we didn't nd a speci c age with a high incidence of the disease. Based on function analysis and pathway enrichment, we predicted the small molecule active drugs most likely to be helpful and demonstrated their possible effectiveness.
A few highly conserved signaling pathways are vitally important for eye development. Known as a vital tumor suppressor in carcinogenesis, CHL1 works by repressing the PI3K/AKT signaling pathway [12]. As a neurotransmitter, it is involved in cognitive activities [13] and some neurological diseases. It guides neuronal survival and growth in the nervous system and can regulate the regeneration of axons [14]. It is signi cantly lower in female than in male patients and may become a potential biomarker for the diagnosis of female patients. We found that the regulator of CHL1, miR-30a-5p, was decreased in AAK patients, further demonstrating that CHL1 may play a critical role in the process. SOCS family proteins are mainly in charge of excessive cytokine signaling restraint, and because of the ability to regulate certain allergic autoimmune diseases, SOSC3 gets special attention [15]. Experimental results indicate that SOSC3 is a key regulator of physiology, also participating in immune homeostasis. Dysregulation of SOSC3 can cause a variety of diseases, including cancer, autoimmune diseases, and neurodegenerative disorders. A study showed that SOCS3 induced regrowth of retinal axons and the formation of functional synapses in the superior colliculus after optic nerve injury [16]. Although the upregulation of miR-30a-5p was able to decrease the SOCS3 protein level, the promotion of miR-19a-3p [17] or miR-185 [3] was also associated with reduced SOCS3 expression. It was shown that miR-204-5p can inhibit cell proliferation, migration, invasion, and metastasis [18]. Our study found it was downregulated in AAK, which may be related to the process of corneal conjunctivation, in which the conjunctival epithelium invades the cornea and causes the formation of new blood vessels. We found that miR-224-5p, upregulated in AAK, may contribute to understanding the disease mechanism, while other researchers found that it was increased in the pathogenesis of hepatocellular carcinoma [19]. In total, these factors may be optimal diagnostic biomarkers and inform the development of a therapeutic strategy for this disease.
Balancing anti-and pro-angiogenic factors helps maintain the avascular environment of the corneal limbus in AAK. Some researchers believe that the conjunctiva can invade the cornea and strengthen conjunctival cell signaling, which leads to clinical symptoms of neovascularization [20]. The classic symptoms always appear after childhood, including corneal pannus and corneal conjunctivation [21].
Existing clinical treatments depend on the severity of the ocular surface signs; arti cial tears can relieve slight symptoms, while severe cases need amniotic membrane transplantation, keratoplasty, or even limbo-keratoplasty [22]. By screening for genes related to AAK, our study may help with early diagnosis and differential diagnosis to prevent serious consequences and extend the treatment window, working as an accessory examination.
Unfortunately, there were some certain limitations in this study. The sample size was insu cient, the clinical information was inadequate, and there was a lack of datasets for validation. The expression validation of the ve miRNAs was consistent with the present bioinformatics analysis. Additional in vitro and in vivo experiments, such as cell culture and establishment of an animal model, are required to further investigate the potential mechanisms underlying AAK. Furthermore, applying only conjunctival cells was not an overwhelming support, so using these results as diagnostic methods requires further examination. The present study may provide a research basis for the diagnosis and treatment of AAK.

Declarations
The author wanted to thank all the patients who provided samples for scienti c study and all the researchers who collected, settled and uploaded these data. The author also would like to thank the colleagues for their help.

Authors Contributions
Daowei Zhang conceived and designed the experiments, performed the experiments, analyzed the data, wrote the program, prepared gures and tables and authored drafts of the paper.
Shenghai Zhang conceived and designed the experiments, reviewed drafts of the paper and approved the nal draft. With 0.90 was decided as scale independence, we selected power value of 7 as the soft threshold of the adjacency matrix (Fig.1A). In the meanwhile, the mean connectivity of the co-expressed network was high enough to ensure a scale-free network (Fig.1B). The branches of the cluster dendro-gram correspond to the 13 gene modules, each piece of the leaves on the cluster dendrogram corresponding to a different gene module (Fig.1C). Among them, the blue (r = -0.89, P = 0.01) module and the black (r = 0.68, P = 0.01) module were associated most strongly ones with AAK (Fig.1D). With 0.90 was decided as scale independence, we selected power value of 7 as the soft threshold of the adjacency matrix (Fig.1A). In the meanwhile, the mean connectivity of the co-expressed network was high enough to ensure a scale-free network (Fig.1B). The branches of the cluster dendro-gram correspond to the 13 gene modules, each piece of the leaves on the cluster dendrogram corresponding to a different gene module (Fig.1C). Among them, the blue (r = -0.89, P = 0.01) module and the black (r = 0.68, P = 0.01) module were associated most strongly ones with AAK (Fig.1D).     Network of optimal differential miRNAs and DEGs. Triangles represent miRNAs and circles represent mRNAs, red color means the expression level increased while green means expression reduced in AAK patients.

Figure 4
Network of optimal differential miRNAs and DEGs. Triangles represent miRNAs and circles represent mRNAs, red color means the expression level increased while green means expression reduced in AAK patients.   Expression level of CHL1 and SOSC3. CHL1 was related with gender signi cantly (Fig.6A). The expression of whom is lower in female(F) than male(M), with both of them expressing higher in control group(W). SOSC3 was associated with stage of AAK closely (Fig.6B), which expressed higher in the mild stage than in the severe stage.