Datasets Used in This Study
In this study, we obtained previously published RNA-seq data representing six transcriptomes from the GEO database (ID:GSE99749)[12]. Two sows(Tibetan pigs (TP) & Yorkshire pigs (LW))used for RNA-seq were raised in the Tibet Agricultural and Animal Husbandry College Farm, under the same dietary and drinking water standards. After 60 days of fertilization, we collected the longissimus dorsi muscle tissue of the embryo. Divided into two groups (Tibetan pigs (TP) & Yorkshire pigs (LW)) according to breed. Each group randomly selected a pregnant sow and randomly took out nine embryo samples. The nine embryonic samples were randomly divided into three parts, each part as a biological replicate (each RNA library contains an equimolar ratio of RNA from the three samples.), and each group contains three biological replicates. The pig gene annotations were downloaded from ftp://ftp.ensembl.org/pub/release-91/gtf/sus_scrofa, and the non-redundant reference sequence database was downloaded from https://ftp.ncbi.nih. gov/blast/db/.
Animals and sample collection
The experimental Yorkshire pigs were provided by the National Livestock Engineering Research Center of Huazhong Agricultural University. All Yorkshire pigs were raised under the same temperature, humidity, ventilation conditions and feeding standards. After fasting for 12 hours, three sows 55 days of gestation were randomly selected and euthanized by electric shock and rapid bleeding. Then, we collected 3 embryos from each sow for a total of 9 embryos. Collecting the longissimus dorsi muscle of the embryo and stored them in liquid nitrogen for later use.
RNA-Seq Reads Mapping and Initial assembly
We use FastQc to evaluate the quality of sequencing reads in the data. Low quality reads were removed using Trimmomatic (version 0.3.2)[38] with the default parameters. The clean data obtained were aligned to the Sus scrofa genome (SusScrofa11.1) from University of California Santa Cruz (UCSC) using the HISAT2 (version 2.0.1) default parameters. Sorting mapped reads and remove duplicates via SAMtools(version 0.1.19)[39]. In addition, we assembled the read map using the default parameters of StringTie (version 1.2.2)[40]. At the same time, we set the StringTie's "-G" option for the novel transcript assembly. Finally, we used the merge function of StringTie software to combine the transcription files from six samples (GTF format) into a non-redundant transcriptome file.
Pipeline for lincRNA Identification
361 lincRNAs were screened based on established pipelines in our laboratory [41]. The complete pipeline follows the one shown in Figure 1A. Step 1, Transcripts representing intergenic transcripts classified as "U" were screened using the gffcompare program StringTie (version1.2.2). Step 2, based on the transcript characteristics of lincRNA, transcripts with transcript lengths greater than 200 bp and exon numbers greater than 2 were screened. Step 3, the coding potential of lincRNAs were verified using a coding potential calculator (CPC) tool[42], and the lincRNAs with CPC value < 0 was entered into the downstream screening program. Step 4, to improve the accuracy of lincRNA screening, we predicted the coding potential of transcripts from multiple perspectives. We translated the transcript into six possible protein sequences, which were then transcribed and compared to the Pfam database. Finally, no Pfam hit (E value <1e-5) transcripts are retained. Step 5, Transcripts were aligned with NCBI NR and UniRef90 databases using the BLASTX program, and transcripts of similar proteins in known proteins were filtered (E value <1e-5). Step 6, to minimize the chance of false positives, transcripts that were not expressed (FPKM values) in all samples were removed.
Comparison of characteristics between protein- coding gene and lincRNA
We selected 45788 "gene_biotype = protein_coding" transcripts from the pig's genome annotation file (SusScrofa 11.1) to define them as transcripts of protein coding genes. In addition, we use the "blastn" instruction to divide lincRNA into known lincRNA and novel lincRNA. We then identified and compared the transcript lengths, exon lengths, exon numbers, and FPKM averages for these three categories.
Differential expression analysis of lincRNA
We used the counting software HTSeq[43] to count the number of reads in the six samples, and then divided the six samples into two groups according to the variety, namely TP and LW, and compared them in R using the "Deseq2" package[44]. The gene of |log2FoldChange|>1, padj<0.05 is a differentially expressed gene. Then, it is calculated by taking the intersection of the potential lincRNA obtained from the pipeline to obtain the differentially expressed lincRNA(DELincRNA), at the same time, it is intersected with the protein coding gene expressed in the sample to obtain a differentially expressed protein encoding gene.
lincRNA target gene prediction
Consistent with previous research [41]. Because lincRNA can regulate its potential target genes in cis. Based on this, we used bedtools software (version 2.17.0) to search protein-coding gene in the 100 kb upstream and downstream of the lincRNA locus, and used R to calculate the Pearson correlation coefficient between DEL and protein-coding genes. Finally, protein-coding gene with a correlation coefficient greater than 0.9 were identified as potential target genes for DEL. At the same time, if the protein-coding gene is differentially expressed in the two groups, this protein-coding gene is a potential target gene for differential expression.
Function enrichment analysis
Due to the limitations of pig genome annotation, this study included background human orthologous genes[45]. After transforming the pig gene into a human gene in the Ensembl website, gene ontology (GO) and the Kyoto Gene and Genomic Encyclopedia (KEGG) pathway enrichment analysis were performed in Metascape[17]. Subsequent selection of p value less than 0.05 is a valid result.
Prediction of DELs function by QTL
In this study, the pig QTL annotation file was downloaded from the animal QTL database, and the location information of DEL was proposed in the non-redundant transcription file according to the ID of DEL. After that, we performed QTL mapping on DEL using the BEDTools (version 2.17.0).
Correlation verification between DEL and its PTG
We used the longissimus dorsi muscle from nine 55-day-old embryos of Yorkshire pigs and performed RT-qPCR to verify the expression correlation between DELs and PTGs. For quantitative verification, in the first step, total RNA was extracted using Trizol reagent (Invitrogen, Life Technologies, CA, USA) and performed according to the manufacturer's instructions. In order to prevent the degradation of RNA, before cDNA synthesis, we measured the purity and concentration of total RNA at 260 and 280 nm with a microphotometer (Thermo, NanoDrop 2000, United States). At the same time, we conduct a gel electrophoresis test to detect whether the RNA is degraded. There are usually three frequency bands, of which 28S and 18S are clear, and the brightness ratio is about 2:1, indicating that there is no degradation. Next, cDNA synthesis was performed using the RevertAid First Strand cDNA Synthesis Kit (Thermo, Wuhan, Cat#k1622).According to the manufacturer's instructions, qPCR for DELs and PTGs detection in Roche LightCyler 480 system (Roche, Mannheinm, Germany) was performed using SYBR Green (CWBIO, Beijing, China, CW0957). Ten pairs of RT-qPCR primers were designed using the Primer 5 program (Table.S5, Table.S6). 18S rRNA is used as an endogenous control gene. The RT-qPCR data were analyzed using the 2-△△CT method and R scripts were used to perform related linear regression analysis.