Differential Analysis on the Granulosa Cells Microarray Data in Identifying HA and NA PCOS Specific Genes
The workflow for the first part of our study is shown in Fig. 1a. We first performed co-occurrence analysis on the abstracts of papers published in PubMed to quantify the association between PCOS and four types of ovarian cells: granulosa cells, cumulus cells, theca cells and stromal cells. Of the four cell types, granulosa cells showed the strongest text-mining association with PCOS (Fig S2a). Microarray analysis was then performed on granulosa cells from 12 individuals. According to the clinical information, these individuals fell into three groups: normal (NM) individuals, HA PCOS individuals and NA PCOS individuals (Table S1). Groups were assigned on the basis of clinical symptoms and testosterone level, after which granulosa cells from all groups were collected and assayed. Pearson’s correlation analysis of all genes across the expression values of the 12 samples indicated that the microarray data were of high quality (Fig S2b). We further validated the microarray results by quantitative real-time PCR (RT-qPCR) on the 14 top genes in the microarray data; the RT-qPCR results agreed with the microarray data and with some published work (Fig. 1c). Box-and-whisker plots, matrix plots, principal component analysis (PCA) and sample clustering analysis further confirmed the quality of the microarray data (Fig S1a-d).
The microarray data were then preprocessed and normalized. Differential analysis identified 615 genes that were differentially expressed among the HA PCOS, NA PCOS and NM groups, including a series of known PCOS markers and female tumor markers (Fig. 1b). These results supported the clinical research value of our microarray data. We then selected the genes whose expression was highest in HA PCOS or in NA PCOS and differed significantly from both of the other 2 groups; these were named HA PCOS specific or NA PCOS specific genes. This filtering identified 130 NA PCOS specific genes and 43 HA PCOS specific genes. Functional analysis was then applied to the specific genes to narrow them down to potential NA PCOS and HA PCOS marker genes.
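The specific-gene filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the statistical test is not named in this section, so an exact two-sided permutation test on the difference in means is assumed here, and the function names (`perm_pvalue`, `specific_group`), the cutoff `alpha` and the four-samples-per-group layout are hypothetical.

```python
from itertools import combinations
from statistics import mean


def perm_pvalue(a, b):
    """Exact two-sided permutation p-value for a difference in group means.

    Assumption: the study's actual test is not stated in this section.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    total = sum(pooled)
    hits = n_perm = 0
    for idx in combinations(range(len(pooled)), len(a)):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / len(a) - (total - sum_a) / len(b))
        hits += diff >= observed - 1e-9  # tolerance for float rounding
        n_perm += 1
    return hits / n_perm


def specific_group(expr, alpha=0.05):
    """Return the group a gene is specific to, or None.

    expr maps a group name ("HA", "NA", "NM") to that gene's expression values.
    A gene is called specific when its mean is highest in one group and that
    group differs significantly from each of the other groups.
    """
    top = max(expr, key=lambda g: mean(expr[g]))
    others = [g for g in expr if g != top]
    if all(perm_pvalue(expr[top], expr[g]) < alpha for g in others):
        return top
    return None
```

With four samples per group, complete separation between two groups gives the smallest attainable p-value of 2/70, so a 0.05 cutoff is reachable; with fewer samples per group this particular test could never pass, and a parametric test would be needed instead.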
Functional Analysis on HA PCOS Specific Genes
DAVID Gene Ontology (GO) analysis was performed on the 43 HA PCOS specific genes (Fig. 1d) [29]. Given the close link between PCOS and insulin, we were most interested in the terms associated with glucose metabolism, which accounted for approximately 25% of all terms (Fig S2c). The DAVID KEGG pathway (KEGG) results for HA PCOS showed no pattern of interest. We therefore extracted the genes annotated to these glucose metabolism biological process (BP) terms and counted the number of terms in which each gene appeared (Fig. 1e). CASR and SOX4 each appeared in 2 terms and were selected. STRING protein-protein interaction (PPI) analysis was then performed on CASR and SOX4 using the multiple-protein function, pressing “+” once to display the intermediate proteins between CASR and SOX4 (Fig S2d). CASR had more PPI interactions than SOX4. In addition, co-occurrence analysis showed that CASR co-occurred with PCOS, HA PCOS, NA PCOS and androgen receptor, whereas SOX4 co-occurred only with androgen receptor (Fig. 1f, Fig S2e). On the basis of these results, SOX4 was removed and CASR was selected as a new HA PCOS functional marker.
Functional Analysis on NA PCOS Specific Genes
DAVID GO analysis was performed on the 130 NA PCOS specific genes [29]. Most of the top 20 BP terms were immunity or immunity-related terms (Fig. 2a,b), as were most of the BP terms overall (Fig S3a,b). Likewise, most of the top 20 KEGG pathways were immunity or immunity-related pathways (Fig S3c,d), as were most of the KEGG pathways overall (Fig S3e,f). These results indicated a close relationship between NA PCOS and immunity, so NA PCOS functional markers should also be immunity or immunity-related genes. We extracted the 100 genes that appeared in immunity or immunity-related terms or pathways under all 4 conditions (BP top 20, BP all, KEGG top 20, KEGG all).
For further filtering, we counted how often each potential marker gene appeared across all BP terms (Fig S4a). Of the 100 genes, those with a frequency of more than 5 were retained. Subsequent STRING protein-protein interaction (PPI) analysis of the 100 genes, using experimental and database links, found that 80 of the 100 genes were connected to one another (Fig S4b). The same procedure was applied to the top 20 BP terms, where 86 of the 100 genes had a frequency > 2 and were connected to one another (Fig S4c,d). Fifty-five genes appeared in both the all-BP results and the top 20 BP results (Fig. 2c) and were regarded as potential marker genes. Co-occurrence analysis with the word “PCOS” retained 26 of the 55 genes, namely those with a co-occurrence count > 0 (Fig. 2d). We then selected the GO BP terms that appeared at least 2 times among the 26 genes and performed co-occurrence analysis with the word “PCOS” on these terms. The term “cytokine” co-occurred most often (Fig. 2e). The 7 of the 26 genes annotated to “cytokine” terms were selected (Fig. 2f). Among all “cytokine” terms, “response to cytokine” was more closely related to immunity than the others. For this reason, the genes annotated to this term, IL6R and CD274, were selected as new NA PCOS functional markers.
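The frequency filter and the overlap between the all-BP and top 20 BP results can be sketched as below. This is a minimal illustration under stated assumptions: the function name `frequent_genes` and the toy annotations are hypothetical, the thresholds are passed in rather than fixed, and the STRING connectivity step is left out.

```python
from collections import Counter


def frequent_genes(term_to_genes, min_count):
    """Return genes annotated to more than `min_count` terms.

    term_to_genes maps a GO BP term to the genes annotated to it; a gene is
    counted once per term even if it is listed twice.
    """
    counts = Counter(g for genes in term_to_genes.values() for g in set(genes))
    return {g for g, c in counts.items() if c > min_count}


# Toy annotations (hypothetical), standing in for the DAVID output.
all_bp = {
    "term_a": ["IL6R", "CD274", "TLR4"],
    "term_b": ["IL6R", "CD274"],
    "term_c": ["IL6R", "TLR4"],
}
top20_bp = {k: all_bp[k] for k in ("term_a", "term_b")}

# Genes passing the filter in both views are the candidate markers.
candidates = frequent_genes(all_bp, 1) & frequent_genes(top20_bp, 1)
```

In the study the thresholds were a frequency of more than 5 for all BP terms and more than 2 for the top 20 terms; the toy call uses 1 only because the example has three terms.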
Further Validation for New NA PCOS and HA PCOS Markers
Of the new NA PCOS markers, IL6R (interleukin 6 receptor) encodes a subunit of the interleukin 6 (IL6) receptor complex. Its ligand, IL6, is a potent pleiotropic cytokine that regulates cell growth and differentiation and plays an important role in the immune response. IL6R has been reported to be associated with PCOS. In comparisons of PCOS patients with normal individuals, the inflammatory status associated with IL6R was found to result mostly from relative obesity or insulin resistance rather than to be an independent feature of PCOS [35,36]. A difference in IL6R between NA and HA PCOS has not been reported. CD274 (CD274 molecule), also named PDL1, encodes an immune inhibitory receptor ligand that is involved in tumor immune escape. CD274 expression in tumor cells is regarded as prognostic in many types of human malignancies, such as colon cancer and ovarian cancer [37–39]. However, a relationship between CD274 and PCOS has not yet been reported. As for the new HA PCOS marker, CASR (calcium sensing receptor) is a plasma membrane G protein-coupled receptor that senses small changes in circulating calcium concentration. Previous studies have suggested that the pathological mechanism of insulin resistance in PCOS is related to calcium homeostasis, and that CASR, as an important calcium regulator, may play an important role in PCOS pathogenesis [40–42]. Taken together, these studies indicated that IL6R, CD274 and CASR might participate in important biological processes in NA PCOS or HA PCOS.
RT-qPCR was then used to confirm the expression of IL6R, CD274 and CASR in human granulosa cells from an HA PCOS group and an NA PCOS group. Notably, the expression levels of IL6R and CD274 were significantly higher in NA PCOS than in HA PCOS (Fig. 3a), in accordance with our earlier analysis identifying IL6R and CD274 as new NA PCOS markers. RT-qPCR also showed that the expression level of CASR was higher in HA PCOS than in NA PCOS (Fig. 3a), consistent with the identification of CASR as a new HA PCOS marker.
The protein levels of the NA PCOS specific genes IL6R and CD274 were further measured by Western blotting in an HA PCOS group and an NA PCOS group. IL6R and CD274 protein levels were significantly higher in NA PCOS than in HA PCOS (Fig. 3b), and quantification of band intensity gave similar results (Fig. 3c). Taking the RT-qPCR and Western blotting results together, IL6R and CD274 were validated as NA PCOS specific genes and new markers, and CASR was validated as an HA PCOS specific gene and new marker.
Identifying Classification Markers for HA PCOS and NA PCOS
Identifying markers that can classify unsorted PCOS samples into HA and NA PCOS is of high clinical importance. We used a random sampling method to identify classification markers (see Methods; Fig. 4a). First, genes were assigned to HA PCOS, NA PCOS and NM groups according to their expression values in each group. Second, 50 million iterations were run, each drawing a random set of 10–50 genes from each group. Samples were classified by the mean expression rank of each group's gene set. If the classification of every sample matched its original group, the random gene sets were retained. For instance, suppose 30 NA PCOS genes were randomly picked and their mean rank in one sample was 17,000 out of 20,000 genes, higher than that of the HA PCOS set (11,000) and the NM set (9,000); this sample would then be classified as NA PCOS. If the sample was also NA PCOS according to the clinical information, and the same held for all samples, this random set of genes was a candidate set of classification markers.
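The mean-rank classification and the random search can be sketched as follows. This is a minimal illustration assuming a rank of 1 for the lowest expressed gene in a sample; the function names, the `size_range` parameter and the fixed seed are hypothetical, and the full procedure is the one described in Methods.

```python
import random
from statistics import mean


def rank_genes(sample):
    """Map each gene to its expression rank within one sample (1 = lowest)."""
    ordered = sorted(sample, key=sample.get)
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def classify(sample, gene_sets):
    """Assign a sample to the group whose gene set has the highest mean rank."""
    ranks = rank_genes(sample)
    return max(gene_sets, key=lambda grp: mean(ranks[g] for g in gene_sets[grp]))


def search_marker_sets(samples, labels, pools, n_iter, size_range=(10, 50), seed=0):
    """Keep random gene sets that classify every sample into its known group.

    pools maps a group name to the list of genes assigned to that group.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n_iter):
        candidate = {
            grp: rng.sample(pool, rng.randint(*size_range))
            for grp, pool in pools.items()
        }
        if all(classify(s, candidate) == lab for s, lab in zip(samples, labels)):
            kept.append(candidate)
    return kept
```

Each sample is a dictionary from gene to expression value, and each pool must contain at least as many genes as the upper bound of `size_range`. Ranking within a sample rather than comparing raw values keeps the rule usable across datasets measured on different scales.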
Subsequently, the 578 retained gene sets were further filtered using high-confidence transcriptome data downloaded from GEO datasets GSE34526, GSE102293 and GSE98595. Although the PCOS samples in these 3 datasets were not subtyped, their known PCOS and NM labels could still be used: a sample classified as NA PCOS or HA PCOS was compared against the PCOS label, and a sample classified as NM against the NM label. The gene set with the highest matching rate was selected as the classification markers (Fig. 4b). The matching rates of the classification markers on our microarray data and on the downloaded datasets are also shown (Fig. 4c). The classification markers were then applied to the downloaded datasets to classify their PCOS samples into HA and NA PCOS. These HA and NA PCOS samples were merged with our microarray data, and a heatmap of the combined datasets was generated (Fig S5a). The functional and classification markers for HA or NA PCOS were merged and regarded as the HA PCOS or NA PCOS markers.
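One way to read the matching rate on datasets that carry only PCOS and NM labels is sketched below. This is an assumption about the scoring, not a statement of the authors' exact formula; the function name `matching_rate` and the label strings are hypothetical.

```python
def matching_rate(predicted, known):
    """Fraction of samples whose predicted subtype agrees with the coarse label.

    A prediction of "HA" or "NA" counts as a match for a sample labelled
    "PCOS"; a prediction of "NM" counts as a match for a sample labelled "NM".
    """
    coarse = {"HA": "PCOS", "NA": "PCOS", "NM": "NM"}
    hits = sum(coarse[p] == k for p, k in zip(predicted, known))
    return hits / len(known)
```

Under this reading, a gene set is scored per dataset and the set with the highest rate across datasets is retained.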
Discovering the Characteristic Difference Between HA and NA PCOS
Co-occurrence analysis was performed between the HA PCOS and NA PCOS up-regulated genes and common metabolic terms seen in PCOS patients (Fig. 4d). HA PCOS was more strongly associated with obesity, vitamin D deficiency, hyperandrogenism, hirsutism, depression, insomnia and cardiac diseases, whereas NA PCOS was more strongly associated with inflammation, immunity and insulin resistance. We were also interested in the relationship of HA and NA PCOS with female cancers. Survival p values (Fig. 4e) and hazard ratios (Fig. 4f) were calculated for HA and NA PCOS in 4 types of female cancers using the GEPIA database [33]. No substantial difference in female cancer risk was found between HA and NA PCOS.
Causal Network Construction for HA/NA PCOS Markers and Female Infertility
The next question was how HA or NA PCOS markers lead to female infertility. To address it we used the Apriori Rules Algorithm [16,17], which was specifically designed to calculate causal relationships between terms (see Methods). A causal relationship may be promoting or inhibiting. The algorithm was applied to two forms of data: merged transcriptome data (Fig. 5a) and PubMed paper abstracts (Fig. 5b). For the transcriptome data, we used the merged dataset from the final step of classification marker identification and calculated the causal relationship between each pair of genes. For the abstract data, 9832 PubMed abstracts on PCOS from the last 12 years were downloaded. NA PCOS and HA PCOS up-regulated genes were used to classify the downloaded abstracts as NA PCOS papers or HA PCOS papers. NCBI MeSH words [43] with specific subheadings were used as the units for the subsequent Apriori Rules calculation. For each gene or MeSH term, the number of MeSH terms co-occurring with it in the same abstract was recorded, and causal relationships were then calculated from these co-occurrence counts. The Apriori Rules method was then applied to the calculated causal relationships to construct higher-order rules with 3 elements for NA PCOS and HA PCOS. The term “Female infertility” was set as the endpoint of the causal cascade.
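The rule-building step on abstract data can be sketched with standard association-rule quantities. This is a minimal sketch, assuming each abstract is reduced to a set of genes and MeSH terms and that rules are scored by support and confidence; the function names, thresholds and toy abstracts are hypothetical, and the assignment of promoting or inhibiting signs described in Methods is not reproduced.

```python
from itertools import permutations


def pair_rules(abstracts, min_support, min_confidence):
    """Return directed rules (a, b) -> confidence from term co-occurrence.

    support(a, b)    = fraction of abstracts containing both a and b
    confidence(a->b) = abstracts with both a and b / abstracts with a
    """
    n = len(abstracts)
    items = set().union(*abstracts)

    def count(*terms):
        return sum(1 for abstract in abstracts if set(terms) <= abstract)

    rules = {}
    for a, b in permutations(sorted(items), 2):
        both = count(a, b)
        if both / n >= min_support and both / count(a) >= min_confidence:
            rules[(a, b)] = both / count(a)
    return rules


def chains_to(rules, target):
    """Build 3-element rules a -> b -> target from the pairwise rules."""
    return sorted(
        (a, b, target)
        for (a, b) in rules
        if (b, target) in rules and a != target
    )


# Toy abstracts (hypothetical), each reduced to a set of terms.
abstracts = [
    {"IL6R", "Inflammation", "Female infertility"},
    {"IL6R", "Inflammation", "Female infertility"},
    {"IL6R", "Inflammation"},
    {"Inflammation", "Female infertility"},
    {"Obesity"},
]
rules = pair_rules(abstracts, min_support=0.3, min_confidence=0.7)
chains = chains_to(rules, "Female infertility")
```

Confidence is asymmetric, which is what gives each rule a direction; whether that direction reflects causation depends on the additional criteria given in Methods.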
The 3-element rules ending in “Female infertility” were combined and merged with the rules from the merged transcriptome data. Genes or terms with few connections were removed, and the causal knowledge graph linking HA PCOS (Fig. 5c) or NA PCOS (Fig. 5d) to female infertility was then constructed. Terms shared by HA and NA PCOS were removed from the knowledge graph (Fig S5b). From the knowledge graph, we could derive the causal routes by which HA or NA PCOS markers and important MeSH terms ultimately lead to female infertility. The arrows in the network point from cause to effect (promotion or suppression), or from sub-term to term. Most of the causal relationships in the network agreed with public knowledge and published papers. Unreported causal relationships might predict hidden mechanisms of HA and NA PCOS. The knowledge graph also highlighted the important role of our newly discovered HA and NA PCOS functional markers IL6R, CD274 and CASR in female infertility.
Drug Interaction Analysis Predicts Specific Drugs for HA and NA PCOS
The next clinical question was how drug treatment differs between HA and NA PCOS. We downloaded the reported drug-gene interactions from the GRNdb database [34] and used them to identify drugs that interact specifically with HA markers or NA markers. The workflow of the analysis is shown in Fig. 6a. We found 23 HA PCOS specific drugs and 6 NA PCOS specific drugs (Fig. 6b); their co-occurrence counts with PCOS are shown in Fig S5c,d. Further filtering removed drugs with few connections to marker genes. We then constructed drug-gene interaction networks between the selected drugs and the HA/NA PCOS markers for HA and NA PCOS (Fig. 6c,d). The selected drugs either promoted or restrained the markers they interacted with. In the HA PCOS network, androgens made up the largest part, which is consistent with the characteristics of HA PCOS and suggests that they probably promote it; we also identified flutamide and tamoxifen as specific drugs that may promote or inhibit HA PCOS. Of these, tamoxifen had the potential to inhibit HA PCOS and to promote ovulation and pregnancy [44]. For NA PCOS, we found that human albumin, heparin, insulin, adenosine, liothyronine sodium and antibiotics might promote or inhibit NA PCOS. According to their reported functions, liothyronine sodium might promote NA PCOS [45], whereas antibiotics might inhibit it [46]. Human albumin might also inhibit NA PCOS by regulating immunity. Further drug experiments are required to determine the effects of the other drugs and to identify suitable drugs for HA and NA PCOS.