Identification of long non-coding RNAs involved in fat 1 deposition in pigs using two high-throughput sequencing 2 methods 3

: 23 Background: Adipose is an important body tissue in pigs, and fatty traits are critical in pig 24 production. The function of long non-coding RNA (lncRNA) in fat deposition and metabolism has 25 been proven in previous studies. In this study, we focused on lncRNAs associated with fattening 26 traits in pigs. The adipose tissue of six Landrace pigs with either extremely-high or -low backfat 27 thickness ( n high = 3, n low = 3) was collected, after which we performed strand-specific RNA 28 sequencing using biological replicates and pooling methods. 29 Results: A total of 19,631 genes and 2,013 lncRNAs were identified using the coding potential 30 calculator, coding-non-coding index, and Pfam database, including 334 known transcripts and 31 1,679 novel transcripts. Using edgeR, we determined that 220 lncRNAs and 1,512 genes were 32 differentially expressed (|Fold Change| > 2 and false discovery rate < 0.05) between the two 33 groups in biological replicate RNA sequencing (RNA-seq), and 127 lncRNAs and 2,240 genes 34 were differently expressed in pooling RNA-seq. Further Kyoto Encyclopedia of Genes and 35 Genomes and Gene Ontology enrichment analysis of the differentially expressed genes found that 36 some of the genes were involved in several key pathways related to fat development. After 37 targeting gene prediction, we determined that some cis-target genes of the differentially expressed 38 lncRNAs play an important role in fat deposition. For example, ACSL3 is cis-targeted by lncRNA 39 TCONS-00052400, and it can activate the conversion of long-chain fatty acids. In addition, 40 lncRNA TCONS_00041740 was up-regulated in the high backfat thickness group, and its 41 cis-target gene ACACB was also up-regulated in this group. It has been reported that ACACB is the 42 rate-limiting enzyme in fatty acid oxidation. Conclusions: Since these genes have necessary functions in fat metabolism, the results imply that the lncRNAs detected in our study may affect fat deposition in pigs through regulation of their 45 target genes. In summary, our study explored the regulation of lncRNA and their target genes on 46 fat deposition in pigs and provided new insights for further investigation of the biological 47 functions of lncRNA. In addition, the gene PLA2G12B in the BH group. PLA2G12B A2 group is encoded by this gene and belongs to the phospholipase A2 ( PLA2 ) group of enzymes, which plays a role in lipid hydrolysis by releasing free fatty acids and lysophospholipids [52] . Studies have shown that a reduction of PLA2G12B decreases the amount of serum triglyceride (TG)-rich VLDL particles secreted by the liver, resulting in a reduction in TG content [53] . PLA2G12B can also participate in the pathogenesis of idiopathic membranous nephropathy (iMN) by regulating lipid metabolism [54] . In a previous study by Guan, PLA2G12B -null mice had obvious accumulation of large lipid droplets in the liver, displaying the fatty liver phenotype [55] . These results indicate that the genes exclusively expressed in the high or low backfat groups may also have a certain regulatory effect on lipid metabolism. to and TCONS-00052400 to ACSL3 These results can provide useful information for understanding the regulation of by lncRNA in pigs. However, further genetic experiments are still needed to validate the association of the lncRNA and mRNA functions presented in this study.


Introduction 51
Fat deposition is an important biological process in pig growth. Fat traits are closely related to 52 pork quality as well as the production efficiency and reproductive traits of pigs [1] . The fat content 53 of pigs also affects consumers' choice of pork [2] . Furthermore, pigs are an ideal animal model for 54 human disease research [3] on various diseases such as type 2 diabetes [4] , hypertension, and 55 coronary heart disease [5] . Determining the molecular markers that affect subcutaneous fat 56 deposition in pigs is an important way to effectively accelerate its genetic improvement. Therefore, 57 it is of great significance to explore the molecular regulation mechanisms of fat deposition in pigs. 58 Fat deposition is a complex process influenced by multiple genes and epigenetic regulators 59 including long non-coding RNA (lncRNA). LncRNA is a type of non-coding RNA with a 60 transcript length greater than 200 nt, and accounts for approximately 80% of mammalian genomic 61 transcription products [6] . LncRNA can regulate the expression of target genes through a variety of 62 methods and plays an important role in complex biological processes in the body. Many studies 63 have shown that lncRNA can activate gene transcription at the transcriptional level [7] , participate 64 in translation substitution at the post-transcriptional level [8] , and produce imprinted genes at the 65 epigenetic modification level [9] . 66

Target genes prediction and functional annotations 157
To investigate the function of the differentially expressed lncRNAs, we searched their nearby 158 genes and considered them as potential targets of lncRNAs. It has been reported that the main 159 function of lncRNA is to regulate protein-coding genes through cis-and trans-regulation [32,33] . 160 Bedtools (v2.25.0) [34] was used to search neighborhood genes around 100 kb upstream and 161 downstream of the DELs. In addition, we examined the Spearman's rank correlation between the 162 expression levels of the DELs and DEGs to predict their targeting relationship. 163

Statistical Analysis 164
All data were reported as mean ± standard error of mean. The functions glmFit() and glmLRT() of 165 the edgeR package were used to perform a negative binomial generalized log-linear model fitting 166 and significance test, respectively, and we used the FDR method to adjust the p values. An FDR < 167 0.05 was considered significant. 168 169

170
Overview of the lncRNA sequencing data 171 In the biological replicate RNA-seq, a total of 16.24, 16.50, 16.20, 16.32, 16.07, and 17.34 Gb of 172 clean data were generated in the six samples, respectively. Furthermore, the results of the pooling 173 RNA-seq were 16.45 Gb and 15.16 Gb in the extremely-high (HIGH) and extremely-low (LOW) 174 backfat thickness groups, respectively. The average of the guanine-cytosine (GC) content was 175 between 49.37% and 55.55%. The Q30 ranged from 92.53% to 94.27%. These results show that 9 the quality of our eight libraries was suitable for subsequent data analysis. 177 Next, the clean reads were aligned to the reference genome (Sus scrofa 11.1) using Tophat v2.1.1. 178 In each sample, more than 77.82% of the clean reads were uniq mapped. The summary of the 179 sequencing data is shown in Table 1. 180 Guanine-cytosine (GC) (%) is the percentage of G and C bases in the total nucleotides; Q30 (%) is the percentage of the bases' mass greater than or equal to Q30 in the clean data; Total Reads is the number of clean reads; Uniq Mapped Reads is the number and percentage of reads that mapped to a unique position in the reference genome in the clean reads; Multiple Mapped Reads is the number and percentage of reads that mapped to multiple positions in the reference genome in the clean reads.

Identification and feature analysis of putative lncRNAs in Landrace pig backfat 1
We followed stringent criteria and created a pipeline to identify lncRNAs, as shown in Figure 2A. transcripts with non-coding potential classcodes of 'j,' 'i,' 'o,' 'u,' 'x,' and '=' were reserved [35] , 7 after which we filtered them with the length and exon numbers. Next, we predicted the coding 8 potential by CNCI and CPC and compared the transcripts to the Pfam protein database. Finally, 9 we identified 2,013 lncRNAs ( Figure 2B). Among them, 334 were known transcripts and 1,679 10 were new transcripts, including 1,250 lincRNAs (62.10%), 506 anti-sense lncRNAs (25.14%), 154 11 sense lncRNAs (7.65%), and 103 intronic lncRNAs (5.12%) ( Figure 2C). These 2,013 lncRNAs 12 were distributed throughout all pig chromosomes, but chromosome 1 contained the most lncRNAs 13 ( Figure 2D). The majority of the lncRNA lengths were between 601 and 800 nt, and lncRNA with 14 two exons were the most common, as shown in Figures

extremely-high or -low backfat pigs 18
Studies have shown that the expression level of lncRNAs is usually lower than that of 19 protein-coding genes [36] . In the present study, we compared the expression of the putative 20 lncRNAs with the mRNAs. We found that the lncRNAs tended to be expressed at a lower level 21 than the protein-coding genes in both the biological replicate and pooling RNA-seq (Figures 3A  22 and 4A). In the biological replicate RNA-seq, 1,512 genes and 220 lncRNAs (Additional File 2) 23 were differentially expressed between the two groups, of which 820 genes were up-regulated in 24 the BH group compared to those in the BL group, and 692 were down-regulated ( Figures 3B and  25 D). The expression of 116 lncRNAs was up-regulated in the BH group, and 104 lncRNAs were 26 down-regulated ( Figure 3G). Next, we used the 1,512 and 220 differentially expressed mRNAs 27 and lncRNAs, respectively, to perform a clustering analysis. As shown in the heatmap, the 28 expression patterns of the samples in the BH group were distinguished from their expression 29 patterns in the BL group, both for the DEGs ( Figure 3E) and DELs ( Figure 3H). 30

31
In the pooling RNA-seq, we carried out the same analysis as that used for the biological replicate 32 RNA-seq. We found that 2,240 genes and 127 lncRNAs (Additional File 2) were differentially 33 expressed in the two groups, of which 1,206 genes were up-regulated in the HIGH group 34 compared to those in the LOW group, and 1,034 were down-regulated ( Figures 4B and E). There 35 were 83 lncRNAs with up-regulated expression levels in the HIGH group, and 44 lncRNAs were 36 down-regulated ( Figure 4F). 37 In addition, to further explore the difference in gene expression between the high and low backfat 38 thickness groups, we analyzed the lncRNAs and genes exclusively expressed in the two groups.

GO and KEGG functional enrichment analysis of DEGs 46
We used the differentially expressed genes detected in the two sequencing methods to perform a 47 functional enrichment analysis. Figures 5A and C show the ten most significant (FDR < 0.05) 48 terms in the Biological Process (BP) class and the five most significant terms in the Cellular 49 Component (CC) and Molecular Function (MF) classes. In the biological replicate RNA-seq, a 50 total of 977 DEGs were annotated in GO terms. We found many terms directly related to fat 51 development and metabolism, especially in the BP class, including 'tricarboxylic acid cycle,' 52 'triglyceride homeostasis,' and 'triglyceride catabolic process.' Furthermore, there were four 53 mRNAs enriched to 'positive regulation of triglyceride catabolic process'. This GO term was also 54 significant but not within the top ten; so, it is not displayed in Figure 5. In addition, DEGs were 55 also enriched in some glucose metabolism pathways such as 'gluconeogenesis' and 'glycolytic 56 process.' The results of the pooling RNA-seq are shown in Figure 5C. We focused on several 57 terms, including 'positive regulation of fatty acid biosynthetic process,' 'cholesterol biosynthetic 58 process,' 'lipoprotein metabolic process,' and 'carbohydrate metabolic process.' We also found 59 terms such as 'gluconeogenesis' and 'glycolytic process,' which were the same as those in the 60 biological replicate RNA-seq. All the terms that were considered important are highlighted by a 61 red frame in Figure 5.

Target gene prediction of differentially expressed lncRNAs 79
To further understand the potential function of lncRNA in fat deposition, we predicted the target 80 genes of the lncRNAs. The general method of predicting lncRNA target genes is to search 81 upstream or downstream to identify nearby protein-coding regions, termed as cis-regulating target 82 genes [37] . It has been reported that lncRNA can regulate coding genes around 10 to 500 kb up and 83 downstream [38] . We searched the nearby coding genes around 100 kb up and downstream from the 84 39 DELs, which were common in the two sequencing results. We found 76 pairs of cis-regulatory 85 relationships between 39 DELs and 67 genes (Additional File 3). 86 If the expression patterns of lncRNA and mRNA show a highly positive or negative correlation, 87 their functions may be highly correlated [39] . Therefore, we conducted a Spearman's rank 88 15 correlation analysis between the expression of the DELs and DEGs. The expression levels of 116 89 DEGs were significantly correlated with the 39 DELs (correlation coefficient > 0.934, P < 0.05). 90 They were also considered as potential target genes of these DELs. We used 39 DELs and their 91 target genes to draw a picture of the regulatory network to investigate the relationship between 92 them ( Figure 6). We highlighted the lncRNAs and their target genes for further analysis with 93 yellow nodes and red labels. In our study, six full-sib Landrace pigs with either extremely-high or -low backfat thickness were 108 taken as paired samples, and their back subcutaneous tissues were collected. We compared the 109 transcriptome data of the two groups using two methods of RNA-seq, identified the differentially 110 16 expressed lncRNAs and genes, then analyzed their function in fat development. The results of our 111 study provide a basis for the selection of pigs for breeding using lncRNAs and their target genes to 112 improve the fat deposition traits of pigs. 113 In this study, we identified a total of 19,631 genes and 2,013 lncRNAs. In accordance with the 114 results of a previous study [36] , we found that the expression of lncRNA was much lower than that 115 of mRNA. Our results are similar to those in several previously published articles on adipose 116 tissue transcriptome sequencing. For example, Miao identified 4,910 lncRNAs, 119 of which were 117 differentially expressed in the intramuscular fat tissue of Jinhua and Changbai pigs [40] . The results 118 for the functional enrichment of the DEGs showed that some genes were enriched to pathways 119 related to glucose and lipid metabolism. In addition, these pathways were confirmed to be related 120 to lipid metabolism in previous studies, including APOA1 [41] and STARD3 [42] . Chen identified 121 581 putative lincRNAs related to pig muscle growth and fat deposition, and their target genes 122 were involved in fat deposition-related processes such as the lipid metabolic process and fatty acid 123 degradation [43] . The KEGG results showed that the meaningful pathways were mostly 124 concentrated on glucose metabolism. Some of these pathways, such as 'Glycolysis / 125 Gluconeogenesis,' were also found in a previous study by our team [44] . Although they are not 126 directly related to lipid synthesis and metabolism, glucose and lipids can change into each other 127 and participate together in the tricarboxylic acid cycle. Therefore, the process of glucose 128 metabolism can have an indirect effect on lipid metabolism. 129 As a kind of non-coding RNA, the main role of lncRNA is to regulate their target genes: peroxisome proliferator-activated receptor-γ and sugar response element binding protein, and then 140 regulate the fat production process in the liver [46] . Furthermore, ACSL3 can also affect the 141 secretion of very low-density lipoproteins (VLDL) by promoting the synthesis of lecithin [47] . 142 ACACB (acetyl-CoA carboxylase beta) is a differentially expressed gene that is cis-regulated by 143 the lncRNA TCONS-00041740, which is located on the antisense strand of ACACB. Moreover, in 144 the KEGG enrichment analysis, ACACB was involved in the 'pyruvate metabolism' pathway. 145 Based on previous studies, we knew that ACACB is the rate-limiting enzyme in fatty acid 146 oxidation [48] . Moreover, in ACACB knockout mice, continuous fatty acid oxidation increases 147 insulin sensitivity, and feeding them a high fat/high carbohydrate diet is more likely to cause 148 obesity and diabetes [49] . In a previous study by Li, ACACB was shown to be a marker gene for 149 childhood obesity [50] . 150 Some genes were exclusively expressed in one of the two groups. For example, PLCB2 151 (phospholipase C beta 2) was exclusively expressed in the BL group. Phospholipase C is a class of 152 glycerol phospholipid hydrolases, hydrolyzing the glycerol phosphate C3 site [51] . The protein 153 encoded by PLCB2 is a phosphodiesterase that catalyzes the hydrolysis of phosphatidylinositol 4, 154 18 5-bisphosphate to the secondary messengers inositol 1,4,5-trisphosphate (IP3) and diacylglycerol. 155 In addition, the gene PLA2G12B was exclusively expressed in the BH group. PLA2G12B 156 (phospholipase A2 group XIIB) is encoded by this gene and belongs to the phospholipase A2 157 (PLA2) group of enzymes, which plays a role in lipid hydrolysis by releasing free fatty acids and 158 lysophospholipids [52] . Studies have shown that a reduction of PLA2G12B decreases the amount of 159 serum triglyceride (TG)-rich VLDL particles secreted by the liver, resulting in a reduction in TG 160 content [53] . PLA2G12B can also participate in the pathogenesis of idiopathic membranous 161 nephropathy (iMN) by regulating lipid metabolism [54] . In a previous study by Guan,162 PLA2G12B-null mice had obvious accumulation of large lipid droplets in the liver, displaying the 163 fatty liver phenotype [55] . These results indicate that the genes exclusively expressed in the high or 164 low backfat groups may also have a certain regulatory effect on lipid metabolism. 165 Biological replicate and pooling RNA-seq are two conventional methods used in RNA-seq 166 experiments. In this study, biological replicate RNA-seq detected more lncRNAs and genes than 167 did pooling RNA-seq. Few unique lncRNAs were identified in the latter. However, in the function 168 annotation, the unique genes of both sequencing methods did not enrich the GO terms and KEGG 169 pathways related to glucose and lipid metabolism. To a certain extent, this indicates that the 170 unique genes have no direct influence on fat deposition in Landrace pigs. These results illustrate 171 that when RNA-seq is used to excavate key genes, biological replicate RNA-seq is superior in 172 'quantity', but there is no obvious 'qualitative' difference between the two methods. We propose 173 that when the experimental conditions are constrained, such as if the cost of sequencing is too high 174 or the samples are difficult to obtain, replacing biological replicate RNA-seq with pooling 175 RNA-seq can also achieve important results. However, using RNA-seq to determine important 176