Genome-Wide Identi cation and Functional Prediction of Circular RNAs in Response to Heat Stress in Chinese Holstein Cows

Abstract Heat stress (HS) leads to substantial economic loss of dairy industry each year. The negative effect of HS in dairy cows is becoming one of the more urgent issue due to accelerating side-effects of global warming. Various genes are involved in HS response but the information about the role of noncoding RNAs, especially circular RNAs (circRNAs) is largely unknown. In our study, we aimed to investigate the different expression profile of circRNAs between HS and Non-heat-stressed condition (NC) of Chinese Holstein cow’s mammary gland. CircRNAs were identified using RNA sequencing and bioinformatics analysis. In total, 37405 circRNAs were detected and 95 were differentially expressed (DE), including 15 downregulated and 80 upregulated circRNAs in HS group compared to NC. Eight circRNAs were randomly selected to verify the RNA sequencing result. Further, Sanger sequencing validated the backsplicing site of the eight circRNAs. Moreover, results obtained from the Quantitative real time PCR (qRT-PCR) showed consistent expression trend with that of RNA sequencing. GO annotation and KEGG analysis suggested that these DE circRNAs probably involved in the energy metabolic regulation. Furthermore, we constructed ceRNA network and the result indicated that these DE circRNAs could regulate lactation through IGF1 and PRL signaling pathway.


Background
Heat stress (HS) can lead to substantial economic loss for the dairy industry by negatively affecting welfare and productive performance of dairy animals [1][2][3][4]. In recent years, the impact of HS has gained attention due to the problem of global warming. HS in dairy cows can mainly provoked by elevation in the ambient temperature, and numerous other external factors including relative air humidity, sunlight radiation, air circulation and precipitations caused cumulative effect [5,6]. Generally, the temperature-heat index (THI) is widely used in assessment of thermal burdens in cattle, of which that the value exceeding 72 indicates suffering of cows from the HS [7,8]. Cows under the HS condition suffer from the increased body temperature that exceeds the physiological limits. This can ultimately lead to impaired welfare and productive performance such as various diseases, decreased reproductive performance, reduced milk yield and even the death [6,9,10]. Cows with high genetic merit are more susceptible to HS which results in a signi cant economic loss for the dairy industry [11,12]. To improve animal welfare and productivity, many feeding and environmental conditioning strategies are used, however such practices increase the cost dramatically [1,[13][14][15]. Increasing evidence indicates that genetic variation contributes to HS response of cows [16,17]. Therefore, identifying and selecting animals which are thermotolerant can be the best way to alleviate the effect of HS [16].
Generally, HS negatively affects milk production, milk composition and mammary gland pathogens [2,18,19]. These all are directly related to altered biological processes of mammary gland. Yet the mechanisms underlying effects of HS on the mammary gland function have not been comprehensively studied.
circRNAs are a class of noncoding RNAs with naturally formed covalently closed-loop structures, and considered as byproducts of aberrant splicing when discovered initially. During the last decades, circRNAs were widely identi ed in viruses, archaea, fungi, plants and metazoans [22]. In mammalian cells, circRNAs are stable and observed to be conserved across species and often present tissue-speci c and stage-speci c expression pattern [23]. Recent studies have proved that circRNAs act as the key regulators of genes by sponging miRNAs, RBPs and even translated into peptides [22,[24][25][26].
Increasing evidence indicates that circRNAs play important roles in physiological and pathological conditions in human and mouse but functions in cattle are largely unknown [22]. Given the essential roles of circRNA in gene regulation, we hypothesized that they are involved in the HS response. Therefore, the aim of our study was to investigate the different expression pro le of circRNAs between HS and normal condition (NC) of Chinese Holstein cow's mammary gland. The obtained results comprehensively elucidate the underlying mechanism of HS response in cattle and provide the new insights in breeding of thermotolerant dairy cows.

Overview of circRNAs in mammary glands of HS and NC cows
To understand the in uence of HS on circRNA expression pro les eight Chinese Holstein cows in total were selected at rst, but sample G was not quali ed when circRNA libraries were being constructed. Therefore, 7 libraries were constructed for mammary glands of NC (n=4) and HS (n=3) cows. After sequencing, we obtained 568210498 raw reads in total from seven circRNA libraries. By removing reads containing adapter, reads containing ployN and low-quality reads from raw data, we nally recovered 541652318 clean reads (more than 81 G dates) for circRNA identi cation and expression pro ling analysis. The clean data were analyzed using nd_circ and CIRI2 [27,28]. Finally, 37405 circRNAs were found. Considering all the circRNAs identi ed, their genomic loci were extensively distributed on all chromosomes. Chromosome 1 produce the most circRNAs while chromosome 27 the least (Fig. 1a). The spliced length of 94% circRNAs ranged between 100 bp to 1000 bp, while 80% ranged from 150 bp to 500 bp (Fig. 1b). The full length of these circRNAs which is considered as the genomic distance between splice-sites, was detected and showed enormous variation (Fig. 1c). Among them, 68% had a length not more than 20000 nt, of which 5% were less than 1000 nt, 21% had length between 1000 nt to 5000 nt and 19% recovered with the length of 5000 nt to 10000 nt. As predicted, most of the circRNAs derived from exonic regions, however some of them were also detected from intron or intergenic region (Fig. 1d). The detail information of all the circRNAs identi ed was shown in Additional le 1: Table S1.

Analysis of differentially expressed (DE) circRNAs
The expression abundance of circRNAs was measured based on TPM and the different expression le was further analyzed. As mentioned above, in total 37405 circRNAs were identi ed, and our analysis revealed that among them, 17147 circRNAs were expressed commonly in both NC and HS groups, whereas 4463 circRNAs expressed only in HS group (Fig. 2a). Among these, a total of 95 circRNAs were classi ed as DE circRNAs between two groups, of which 80 circRNAs were up-regulated and 15 circRNAs were down-regulated in HS group relative to that of NC (adjusted P-value <0.05) (Fig. 2b, Additional le 2: Table S2). Further, we conducted a hierarchical clustering analysis of all the DE circRNAs according to their expression pro les. Samples in the same group were clustered together and the expression patterns are shown in Fig. 2c.

Validation of DE circRNAs
In order to validate the DE circRNAs screened by RNA sequencing, we randomly selected eight circRNAs and their sequences and expression levels were con rmed by further experiments. To amplify the junction region of circRNAs, divergent primers were designed as shown in Fig. 3a. The backsplice junction site was validated by Sanger sequencing of the ampli ed PCR products (Fig. 3b). We used Quantitative real time PCR (qRT-PCR) to determine the expression level of the eight circRNAs. Then the qRT-PCR results were compared with that of RNA sequencing. The results obtained from qRT-PCR were highly consistent with that of RNA sequencing ( Fig. 3c-d), indicating that the RNA sequencing results were reliable.

Functional analysis of DE circRNAs
In order to explore the biological function of the DE circRNAs, we conducted GO and KEGG analysis for their host genes. GO analysis showed that in total there were 125 signi cantly enriched GO terms in biological process (BP), cellular component (CC) and molecular function (MF) (Additional le 3: Table  S3). The top 10 enriched GO terms in each category were shown in Fig. 4a. The most signi cantly enriched terms in the three categories were non-recombinational repair, plasma membrane and nucleotide binding. Further, KEGG analysis was carried out to explore the signaling pathway involved in these circRNA host genes. The result from KEGG pathway enrichment analysis was shown in Fig. 4b and Additional le 4: Table S4, which indicated that the host genes of DE circRNAs were mainly involved in AMPK signaling pathway, ABC transporters, Glycerolipid metabolism, Adipocytokine signaling pathway, Salivary secretion, Glycerophospholipid metabolism and Fatty acid biosynthesis.
CircRNAs play an indispensable role in regulating biological processes by acting as miRNA "sponges". To explore the potential function of the DE circRNAs, we constructed circRNA-miRNA-gene interaction network (Fig. 5), in which circRNA act as a member of competing endogenous RNA (ceRNA). miRNAs and genes in the network were identi ed by our previous study [20], which were associated with HS response.
We used all DE circRNAs (n=95), miRNAs (n=27) and two genes (IGF1 and PRLR) to construct the circRNA-miRNA-gene interaction network. The interaction between circRNAs and miRNAs was predicted by miRanda. The target of IGF1 and PRLR was predicted by targetscan v7.2. The network was constructed by Cytoscape v3.7.1. As shown in the gure, 39 circRNAs potentially regulated HS response by IGF1 or PRLR pathway through sequestering these 21 miRNAs.

Discussion
The cellular response to HS triggers complex signaling pathway changes involving coding and noncoding genes. In current study, we screened and determined the different expression pro les of the mammary gland of HS and NC, which revealed that circRNAs play important regulatory role in HS response of mammary gland. During our study, we recovered 37405 circRNAs, of which 95 circRNAs were DE. In one of the studies carried out by Wang et al., 38 DE circRNAs were identi ed out of 2950, however, circRNAs derived from intergenic regions were not analyzed [29].
Evidence from previous study proved that several circRNAs can be produced from one gene loci [30]. For instance, circRNA circ_0009375, circ_0009360, circ_0009373, circ_0009362 were derived from CSN1S2. In earlier studies, these circRNAs derived from casein genes have been reported to be the key circRNAs involved in lactation [30]. In our study, the log 2 (Fold change) expression level for circ_0009375, circ_0009360, circ_0009373, circ_0009362, circ_0009346 between HS and NC were 4.9, 7.7, 4.5, 6.3, 6.9, respectively. The apparent difference implies that these circRNAs play critical role in HS response of mammary gland and may implicate in lactation regulation. However, these circRNAs associated with casein genes showed inconsistent expression trend in HS condition. Wang et al. [29] reported that in cows circCSN1S2 was downregulated during HS condition. Whereas, Sun et al. [31] reported circCSN1S1_1 and circCSN1S1_2 which are two circRNAs processed from one host gene, and showed contrasting expression trend during HS condition in sows. This inconsistent expression trend may relate to the exact lactation stage of cows, which need to be considered in further study.
The potential biological functions of the DE circRNAs were deduced by the bioinformatics analysis of their host genes. The top GO items in MF were GTP binding, GTPase activity, purine ribonucleoside triphosphate binding, which closely relate to energy metabolic. Furthermore, KEGG analysis showed AMPK signaling pathway as the most dominant enrich pathway, and it is known as "fuel gauge" of the cellular energy status [32]. The AMPK signaling pathway was widely reported to be affected by HS, and AMPK can serve as a stress indicator in animals [33][34][35]. Altogether, these results signi cantly suggest that the function of the DE circRNAs may relate to energy metabolic regulation.
Accumulating evidence revealed that circRNAs can serve as miRNA sponge to impair the target gene inhibition [25,27]. We analyzed the DE circRNAs data together with that of miRNAs and genes we got previously. We selected two genes (PRLR, IGF1) out of 96 genes to construct circRNA-miRNA-gene interaction network. Because prolactin (PRL) and insulin-like growth factor (IGF) pathways are two essential lactation regulating signaling pathways that are well demonstrated before [36]. PRLR and IGF1 are the key members of the two pathways, separately. PRLR is the receptor of PRL and the PRL signal can transmit into cell through PRLR. Afterwards, STAT5 become active and subsequently genes associated with milk synthesis gets activated [37]. IGF1 was reported to regulate milk protein gene expression through JAK/STAT5 pathway [38]. In our previous study, we identi ed 27 miRNAs associated with HS and 22 of them were involved in our circRNA-miRNA-gene interaction network. Notably, 10 of them (miR-190a, miR-15a, miR-370, miR-21-5p, miR-885, miR-1388-5p, miR-135a, miR-2284j, miR-133a, miR-133b) potentially form ceRNA network with both PRLR and IGF1. Particularly, miR-145 is a noteworthy miRNA in the network. IGF1 and up to ten circRNAs, including circ_0012242, circ_0024212, circ_0016987, circ_0012240, circ_0018446, circ_0002652, circ_0004962, circ_0011213, circ_0018526 and circ_0016976, competitively bind to miR-145. Earlier studies carried out on goat indicated that miR-145 was associated with lipid metabolism during lactation [39]. IGF1 has been reported to be a direct target of miR-145 [40,41]. Moreover, miR-145 was demonstrated to target the receptor of IGF1 to regulate IGF1 signaling [42][43][44][45]. 10 circRNAs mentioned above probably involved in lactation regulation mediating by miR-145 in HS response. Collectively, these results indicate that the DE circRNAs may implicate in lactation regulation through PRL and IGF signaling pathways by acting as miRNA sponges.

Conclusion
During our study, we identi ed the circRNAs expression pro le based on genome-wide RNA sequencing in HS and NC of mammary gland of Chinese Holstein cows. In total, we found 95 DE circRNAs. GO annotation and KEGG analysis suggested that these DE circRNAs probably involved in energy metabolic regulation. Moreover, the ceRNA network analysis indicated that these differently expressed circRNAs could regulate lactation through IGF1 and PRL signaling pathway. In brief, we identi ed circRNAs in mammary gland of Chinese Holstein cows and explored their function in HS response. This study will help researchers to achieve a better understanding of mechanism involved in HS response and thus provide new insights in breeding of thermotolerant animals.

Animal and Sample collection
Animal care and use procedures were approved by the Institutional Animal Care and Use committee of the Langfang Normal University (Langfang, China). In total, eight Chinese Holstein cows which are all the fourth lactation cows were selected from Yucheng dairy farm, China and divided into two groups (4 cows in each group). All cows were raised under the same management conditions. Mammary gland samples were collected in the cool spring month of March (15-20 °C) in the NC group and the hot summer month of July (30-38 °C) in the HS group. The animals were selected according to the rules described previously and the samples were the same with those used in the study of Li et al. [20]. In brief, THI was used as the HS indicator. THI was calculated as THI=0.72 (Td+Tw) + 40.6 where Td indicates the dry-bulb temperature and Tw indicates the wet-bulb temperature. The THI mean values for the NC and HS group were 65.8 and 83.8, respectively. The stress response of the cows was characterized by recording physiological parameters including the rectal temperature and heat tolerance coe cient. The physiological parameters have been shown in the study of Li et al. Cows were euthanized by intravenous injection of pentobarbital sodium (150 mg/kg body weight). Then mammary gland samples were removed after excision of the intramammary lymph node. The obtained samples were stored in the liquid nitrogen until further analysis.

RNA preparation
Total RNA of each sample was isolated using TRIzol reagent (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions. The RNA degradation and contamination were monitored on 1% agarose gels. RNA purity was checked using NanoPhotometer spectrophotometer (IMPLEN, CA, USE). RNA integrity was assessed using the RNA Nano 6000 Assay kit of the Bionalyzer 2100 system (Agilent Technologies, CA, USA). RNA samples with Integrity Number (RIN) value >7.5 were used for library construction.

Library preparation and sequencing
A total amount of 5 µg RNA per sample was used as input material for the RNA sample preparations. rRNA was removed by Epicentre Ribozero rRNA Removal Kit (Epicentre, USA). The linear RNAs were digested with 3 U of RNase R (Epicentre, USA) per μg of RNA. Sequencing libraries were generated using NEBNext Ultra Directional RNA Library Prep Kit for Illumina (NEB, USA) following manufacturer's recommendations and index codes were added to attribute sequences to each sample. In brief, fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5×). Random hexamer primer and M-MuLv Reverse Transcriptase (RNase H-) were used for the synthesis of cDNA rst strand. Subsequently, second strand was synthesized using DNA Polymerase I and RNase H. The dNTPs with dTTP were replaced by dUTP in the reaction buffer.
Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. 3' ends of DNA fragments were modi ed with adenylation. Then NEBNext Adaptors with hairpin loop structure were ligated to prepare for hybridization. Library fragments were puri ed using AMPure XP system (Beckman Coulter, Beverly, USA) to select cDNA fragments of preferentially 150 to 200 bp in length. Then the sizeselected and adaptor-ligated cDNA was incubated with 3 µl USER Enzyme (NEB, USA) at 37°C for 15 min before PCR. Subsequently, PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers and Index (X) Primer. Further, the PCR products were puri ed (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system.
The clustering of the index-coded samples was performed on a cBot Cluster Generation System using TruSeq PE Cluster Kit v3-cBot-HS (Illumina) according to the manufacturer's instructions. After cluster generation, the libraries were sequenced on the Illumina platform and 150 bp paired end reads were generated.

Sequence mapping and circRNA identi cation
Reference genome and gene model annotation les were downloaded from genome website directly. Index of the reference genome was built using bowtie2 v2.2.8 and paired-end clean reads were aligned to the reference genome using Bowtie [46]. circRNAs were detected and identi ed using nd_circ [27] and CIRI2 [28]. Circos software was used to construct the circos gure.

Differential expression
The raw counts were rst normalized using TPM (Transcripts per million reads). Normalized expression level = (readCount*1,000,000) / libsize (libsize is the sum of circRNA readcount). Differential expression analysis of two groups was performed using the DESeq R package (1.10.1). DESeq provide statistical routines for determining differential expression in digital gene expression data using a model based on the negative binomial distribution [47]. The resulting P values were adjusted using the Benjamini and Hochberg's approach for controlling the false discovery rate. Genes with an adjusted P-value <0.05 found by DESeq were assigned as DE.

Validation of DE circRNAs
In order to verify the result of RNA sequencing, we randomly select eight DE circRNAs to be further analyzed. The backsplicing site was ampli ed using divergent primers (Additional le 5: Table S5). The ampli cation products were detected by agarose gel electrophoresis followed by Sanger sequencing. The expression levels of the selected eight circRNAs were con rmed by Quantitative real time PCR (qRT-PCR). Total RNA was reverse transcribed into cDNA using RevertAid First Strand cDNA Synthesis Kit (Thermo scienti c, USA) according to the manufacturer's protocol. Then the qRT-PCR was carried out with a total volume of 20 μl using GoTaq qPCR Master Mix (Promega, USA). Each assay was performed in triplicate using the following program: 94 °C for 5 min; 35 cycles of 95 °C for 30 s, 58 °C for 15 s, and 72 °C for 30 s. The relative expression level was calculated by the method of 2 -ΔΔCt and β-actin serves as the reference gene. The data for each DE circRNAs were analyzed between NC and HS group by Student's t-test using SPSS.

GO and KEGG enrichment analysis
Gene Ontology (GO) enrichment analysis for host genes of DE circRNAs were implemented by the GOseq R package, in which gene length bias was corrected [48]. GO terms with corrected P value less than 0.05 were considered signi cantly enriched by DE genes. KEGG is a database resource (http://www.genome.jp/kegg/) for understanding high-level functions and utilities of the biological system [49]. We used KOBAS software to test the statistical enrichment of differential expression genes or circRNA host genes in KEGG pathways [50].     CircRNAs-miRNAs-gene interaction network. The orange V type nodes indicate DE miRNAs in HS condition identi ed in our previous study. The blue circles indicate D circRNAs. The purple diamond represent IGF1 or PRLR gene as indicated.