Sequencing of cDNA libraries and transcriptome profiles of OGCs
A total of 999,067,624 raw reads were acquired using the NovaSeq™ 6000 platform (Illumina, San Diego, CA, USA). After excluding adapter contamination, undetermined bases, and low-quality bases, 902,773,866 clean reads were obtained, accounting for 135.42 GB. Although the GC content of the clean data was ~42%, the quality scores of clean reads were >99.9 and >98.83% for Q20 and Q30, respectively. Thus, the reliability and quality of the sequencing data were sufficient for further analyses (Table 3). Over 96% of clean reads were mapped to the reference human genome using HISAT (http://daehwankimlab.github.io/hisat2/), including uniquely mapped reads from 84.10% to 86.76% (Table S1).
Table 3 Statistical data of the reads for 12 cDNA libraries
Sample
|
Raw reads
|
Raw bases
|
Clean reads
|
Clean bases
|
Valid ratio (reads)
|
Q20%
|
Q30%
|
GC content (%)
|
DOR_1
|
95746384
|
14.36G
|
87242788
|
13.09G
|
91.12
|
99.99
|
98.96
|
42
|
DOR_2
|
90791244
|
13.62G
|
79121838
|
11.87G
|
87.15
|
99.99
|
98.88
|
43
|
DOR_3
|
72114510
|
10.82G
|
66912764
|
10.04G
|
92.79
|
99.99
|
98.96
|
41.50
|
DOR_4
|
70676974
|
10.60G
|
65727486
|
9.86G
|
93.00
|
99.99
|
98.97
|
41
|
DOR_5
|
91948332
|
13.79G
|
85217766
|
12.78G
|
92.68
|
99.99
|
98.93
|
42
|
DOR_6
|
81704846
|
12.26G
|
74061172
|
11.11G
|
90.64
|
99.99
|
98.83
|
44
|
NOR_1
|
88597962
|
13.29G
|
81949398
|
12.29G
|
92.50
|
99.99
|
98.91
|
42
|
NOR_2
|
67940108
|
10.19G
|
62960406
|
9.44G
|
92.67
|
99.99
|
98.93
|
41.50
|
NOR_3
|
66981682
|
10.05G
|
61871236
|
9.28G
|
92.37
|
99.99
|
98.98
|
42
|
NOR_4
|
88814944
|
13.32G
|
73435164
|
11.02G
|
82.68
|
99.99
|
98.91
|
43.50
|
NOR_5
|
90275194
|
13.54G
|
77710494
|
11.66G
|
86.08
|
99.99
|
98.88
|
43
|
NOR_6
|
93475444
|
14.02G
|
86563354
|
12.98G
|
92.61
|
99.99
|
98.97
|
42
|
Identification of lncRNAs and mRNAs
To further recognize the protein-coding ability of unknown transcripts, CPC and CNCI were designed to exclude those with coding potential. As a result, a total of 45,333 novel lncRNAs were detected from 12 cDNA libraries (Table S2). These lncRNAs were dispersed evenly across the 46 human chromosomes (Fig. 1A). With regard to the genomic location of the lncRNAs, the types of lncRNAs were class_code “i” (intraintronic transcript, 72.24%), class_code “u” (intergenic transcript, 18.37%), class_code “o” (sense transcript, 3.55%), class_code “j” (bidirectional transcript, 3.25%), and class_code “x” (antisense transcript, 2.59%). These lncRNAs showed no apparent bias for genomic location (Fig. S1 and Fig. 1B). Additionally, a total of 47,520 known lncRNAs were identified between the two groups.
We compared lncRNAs with mRNAs to illustrate the structural features of lncRNAs. The expression and number of lncRNAs were comparable to those of mRNAs (Fig. 1C). Furthermore, the median length of lncRNA transcripts was 1504 bp, which was shorter than the median length of 2317 bp for mRNA transcripts. These findings suggest that these lncRNAs are shorter than the mRNAs (Fig. 1D). There were fewer exons in lncRNAs (mean, 3) than in mRNAs (mean, 9). In total, 68.86% of mRNAs had ≥5 exons, whereas 77.02% of lncRNAs had ≤3 exons (Fig. 1E). Moreover, in OGCs, the predicted open reading frame (ORF) length of lncRNAs was shorter than that of mRNAs (Fig. 1F).
Expression profile of mRNAs
A total of 457 DE mRNA genes were found between the OGCs of the two groups. Of these genes, 169 genes were upregulated, and 288 genes were downregulated (Fig. 2A and B). To further explore the functions of these DE mRNA genes, analyses of functional enrichment were performed using the GO database. Among the 457 DE mRNA genes, 3404 GO terms with functional information were enriched (Fig. 2C and D). A total of 542 GO terms were enriched significantly (P < 0.05). For biological process (BP), 359 terms were found, including “cell adhesion” (GO:0007155), “positive regulation of apoptotic process” (GO:0043065) and “steroid biosynthetic process” (GO:0006694). For molecular function (MF), 137 terms were discovered, including “oxidoreductase activity” (GO:0016491). For “cellular component” (CC), 42 terms were found, including “endoplasmic reticulum lumen” (GO:0005788) (Table S3).
We used the KEGG database to identify which genes had significantly enriched signaling pathways. Nineteen significantly enriched signaling pathways related to the DE mRNA genes (P < 0.05) were found, including “steroid biosynthesis”, “focal adhesion”, “ECM–receptor interaction”, “PPAR signaling pathway”, and “PI3K-AKT signaling pathway” (Fig. 2E).
Expression profile of lncRNAs
We discovered that 466 lncRNAs had differential expression between the DOR group and NOR group. Of these, 244 were upregulated (53 known and 191 novel), and 222 were downregulated (36 known and 186 novel) (Table S4, Fig. 3A and B). To further probe the functions of these lncRNAs, we forecasted the cis-regulated target genes of the DE lncRNAs between the DOR group and NOR group. A total of 52 lncRNAs were found to have target genes if 100 kilobase pairs (kbp) was used as the cutoff. Some lncRNAs had 2–3 target genes, and 57 probable lncRNA target genes were identified (Table S5). Based on these cis-regulated target genes, we used the GO database to identify 76 important GO terms: 57 BP terms (e.g., “stem cell proliferation, GO:0072089), 6 CC terms (e.g., “proteasome activator complex”, GO:0008537), and 13 MF terms (e.g., “CXCR3 chemokine receptor binding”, GO:0048248) (Fig. 3C and D). Analyses of signaling pathway enrichment using the KEGG database revealed that these lncRNA target genes were primarily enriched in “ferroptosis”, “fatty acid biosynthesis”, and the “JAK-STAT signaling pathway” (Fig. 3E).
We discovered the predicted outcomes of the DE lncRNAs with cis-regulated genes. We listed the first-six and last-four lncRNA–gene pairs according to the Pearson correlation coefficient. The first-six lncRNA–gene pairs were regulated in the same direction, whereas the last-four lncRNA–gene pairs were in the opposite direction (Table 4).
Table 4 Differentially expressed lncRNA–gene pairs between the DOR group and NOR group
Gene
|
lncRNA transcript
|
Cis location (bp)
|
Pearson correlation coefficient
|
CXCL10
|
MSTRG.57192.1
|
1K
|
1.00
|
TNFSF15
|
MSTRG.77544.1
|
100K
|
1.00
|
PGBD5
|
MSTRG.9312.2
|
100K
|
1.00
|
CNTN3
|
MSTRG.52583.1
|
100K
|
0.99
|
SLC16A10
|
MSTRG.66173.1
|
10K
|
0.99
|
SLC16A10
|
MSTRG.66167.1
|
100K
|
0.99
|
BNIP2
|
MSTRG.27592.2
|
100K
|
-0.31
|
SYNCRIP
|
ENST00000656092
|
100K
|
-0.32
|
NPIPB15
|
MSTRG.31257.1
|
10K
|
-0.33
|
C2CD2
|
MSTRG.48668.1
|
100K
|
-0.35
|
Coenriched GO terms of the DE lncRNAs and mRNAs
We then aimed to identify the key pathways that regulate the ovarian reserve. Hence, we identified five significantly enriched GO terms in the enrichment of the DE mRNAs and the enrichment of the lncRNA target genes. The significantly enriched GO terms were “CXCR chemokine receptor binding”, “CXCR3 chemokine receptor binding”, “positive regulation of transforming growth factor-beta production”, “regulation of bile acid biosynthetic processes”, and “cellular response to organonitrogen compound”. Three pathways were involved in BP. and the other two pathways were involved in MF (Table 5).
Table 5 Coenriched GO terms of DE lncRNAs and mRNAs
GO Term
|
GO function
|
P
|
GO:0048248 (CXCR3 chemokine receptor binding)
|
molecular_function
|
0.002
|
GO:0071636 (positive regulation of transforming growth factor beta production)
|
biological_process
|
0.005
|
GO:0045236 (CXCR chemokine receptor binding)
|
molecular_function
|
0.010
|
GO:0070857 (regulation of bile acid biosynthetic process)
|
biological_process
|
0.027
|
GO:0071417 (cellular response to organonitrogen compound)
|
biological_process
|
0.034
|
Construction of a co-expression network of DE lncRNAs and mRNAs
We performed comprehensive analyses of the lncRNA–mRNA regulatory network to further explore the potential regulatory mechanism. Co-expression analyses between the DE mRNA genes and lncRNAs were performed first. Twenty-four positively correlated lncRNA–mRNA co-expression pairs containing 24 DE lncRNAs and 19 DE mRNAs were acquired at the verge of a Pearson correlation coefficient (r > 0.40) (Table S6). Then, a lncRNA–mRNA regulatory network comprising dysregulated lncRNAs and their cis-target mRNAs was visualized using Cytoscape v3.7.1 (https://cytoscape.org/). Solute carrier family 16 member 10 (SLC16A10) was included in this network and was regulated by five lncRNAs: MSTRG.66173, MSTRG.66167, MSTRG.66177, MSTRG.66174, and MSTRG.66170 (Fig. 4). Among them, SLC16A10 was enriched in several GO terms, including “protein binding” (GO:0005515), “thyroid hormone generation” (GO:0006590), and “cell junction” (GO:0030054). SLC16A10 was also enriched in the “thyroid hormone signaling pathway” and “protein digestion and absorption”.
Verification of the DE lncRNAs by RT–qPCR
To verify the RNA-seq results, RT–qPCR was performed to examine seven of the identified DE lncRNAs between the OGCs from the DOR group and NOR group. These seven lncRNAs include NEAT1, RAB5A, MEG3, GNG12, MST1L, GCN1, ZEB2-AS1. Of these, the expression of NEAT1, GNG12, and ZEB2-AS1 was consistent with the RNA-seq results. The expression of the remaining four lncRNAs was not significantly different but showed a consistent trend with the RNA-seq data (Fig. 5). These results indicate that the seq-RNA data were reliable.