Nanopore Long-Read Transcriptomics Proling Reveals Gene Expression Signatures of Mouse Tumor Endothelial Cells 2H-11

Background:Tumor endothelial cells (TECs) play an indispensable role in tumor growth and metastasis. Compared with normal endothelial cells (NECs), TECs exhibit unique phenotypic and functional heterogeneity in terms of metabolism, genetics, and transcriptomics. It is not only the key to coordinate tumor angiogenesis, but also an important factor of immune regulation in the tumor microenvironment. In recent years, the role of TECs in tumor metabolism and invasion has been continuously reported. However, the research on the mechanism behind the complex functions of TECs is still at the basic stage. We use Oxford Nanopore Technology (ONT) three-generation full-length transcriptome sequencing to detect all genetic structural changes in the transcriptome of mouse TECs 2H-11 and mouse NECs SVEC4-10. Results: In Tumor endothelial cells 2H-11,1847genes are up-regulated and 1202 genes are down-regulated. According to the Gene ontology (GO) enrichment analysis of differentially expressed genes (DEGs), we found that different functional trends related to metabolic processes, developmental processes, localization, immune system processes, and locomotion are the main reasons for the differences. DEGs are mainly enriched in signal pathways related to cancer, immunity and metabolism, involving Pathways in cancer,Antigen processing and presentation , Proteoglycans in cancer, Focal adhesion, MAPK signaling pathway ,Protein digestion and absorption(cid:0)ECM-receptor interaction(cid:0)PI3K-Akt signaling pathway and Glutathione metabolism. We also obtained the structural variation of transcripts such as alternative splicing, gene fusion, and alternative polyadenylation and accurately quantied the expression of the transcript. Some of our results have been conrmed in other documents. But other data have not been reported yet, which is the focus of our future exploration. Conclusion: We try to use transcriptomics and bioinformatics methods to characterize tumor endothelial cell-related genes and signaling pathways.It could help better understand the molecular mechanisms of tumor endothelial cells involved in tumorigenesis and development. DEGs in key pathways may be potential diagnostic markers or therapeutic targets of TECs. Our data also provide Our results provide the to study tumor endothelial cells in cell metabolism, participation in angiogenesis and other mechanisms behind the occurrence Our to to use experiments and bioinformatics methods

related to tumor endothelial cells, and to better understand the molecular mechanisms of tumor endothelial cells involved in tumorigenesis and development.

Data description
The long-read transcriptome data presented here were from mouse normal endothelial cells SVEC4-10 and mouse tumor endothelial cells 2H-11 culture under laboratory condition,with three samples each. (Sample H1, H2, H3, from TECs 2H-11 and Sample S1, S2, S3 from NECs SVEC4-10). Totally,7,145,224,5,738,625 and 8,308,634 clean reads were generated from three samples of 2H-11, with an average read length of 1186 bp 1047bp and 916 bp and an N50 of 1560bp,1419 bp and 1219 bp. 5,487,487, 4,739,190 and 21,496,454 clean reads were generated from three samples of SVEC4-10, with an average read length of 1191 bp,1188bp and 1086 bp and an N50 of 1527bp,1500 bp and 1380 bp, respectively (Table. 1). For both 2H-11 and SVEC4-10 the length of clean reads was distributed among 1 kb~10+ kb, and the most abundant length was 1 kb (Fig. 1A and B ).In addition, the average quality (Q) scores of majority of clean reads of 2H-11 and SVEC4-10 were Q13 ( Fig. 2A and B). After discarding rRNA, 6,905,496 , 5,509,170 , 7,939,379 , 5,312,938 , 4,535,096 and 20,822,758 clean reads were gained, and among them 89.10%, 88.57%, 88.66%, 88.96%, 87.27% and 89.23% were identi ed as being full-length (Table. 2). The length of full-length clean reads from 2H-11 and SVEC4-10 was distributed among 1 kb~10+ kb; the most abundant lengths for both were 1 kb ( Fig. 3A and B).After removing redundant reads, the lengths of remaining full-length transcripts of both 2H-11 and SVEC4-10 were ranged from 1 kb to 9 kb, with the largest group of 1 kb ( Fig. 4A and B).

DEG identi cation and selection
Transcripts of differentially expressed genes (DEGs) were characterized using isoform sequencing (Iso-Seq) analyses of 2H-11 cells and SVEC4-10 cells. The results revealed obvious differences between 2H-11 cells groups and the SVEC4-10 groups. We discovered 1847 up-regulated and 1202 down-regulated DEGs in the 2H-11 cells. The hierarchical clustering heat map, MA plot, and volcano plots were generated to represent the up-and down-regulated genes (logFC ±2 and p<0.001 ). Fig. 5A represents the heatmap of up-and down-regulated genes in red and green, respectively. The volcano plot (Fig. 5B) and the MA plot ( Fig. 5C) visualize the differences between measurements taken in SVEC4-10 cells and 2H-11cells DEGs.

Gene annotation and functional classi cation of DEGs
All the DEGs were uploaded to the GO Enrichment Analysis tool and database for annotation. The biological processes (BPs), cellular components (CCs), molecular functions (MFs), and pathways were predicted in the signi cantly enriched GO terms of the differentially express genes (Fig. 6). The enrichment trend of some DEGs in these GO secondary functions is different from the enrichment trend of all genes. It is important to analyze whether these functions are related to the difference.
Alternative splicing pro les The main types of gene alternative splicing (AS) are as follows: (A) Exon skipping; (B) Alternative 3'splicing site; (C) Mutually exclusive exon; (D) Alternative 5'splice site; (E) Intron retention. From the analysis results of the Astalavista software, We collected statistics on the occurrence of the above 5 alternative splicing events in the transcripts of 6 samples, and obtained a statistical graph of the number of predicted alternative splicing events in each sample. (Fig. 9) After analysing the data of 2H-11 and SVEC4-10, we detected Exon skipping was the most frequent AS events and Mutually exclusive exon was the rarest AS events. The prevalence of different AS events was similar between 2H-11 and SVEC4-10, indicating relevant pathogenesis of these two types of cells. It is worthy that one gene might possess several alternative splicing patterns.For each sample and every possible splice event, percent-splice-in (PSI) value was calculated, which is the ratio of normalized read counts indicating inclusion of a transcript element over the total normalized reads for that event (both inclusion and exclusion reads).
Because the number of samples>=4, PSI-Sigma uses delta PSI>0.1 and Pvalue<0.01 to lter by default to get the ltered differential alternative splicing events. (Table. 12) Changes in average PSI values when comparing groups of samples mean a shift in splicing patterns between the groups or a splice event. Differentially spliced genes (DSGs) between 2H-11 and SVEC4-10 were analysed and the top 20 signi cantly altered AS events were summarized in Table. 12. A3SS(Alternative 3'splice site) and SES(Single exon skipping) were dominant AS types, and several genes (Gm20425, H2-K1, Zscan21, Npm1) demonstrated two AS events with opposite preference in 2H-11 and SVEC4-10. It is obvious that most AS patterns were more active in TECs than NECs. Moreover, the results demonstrated signi cant difference of AS patterns among different stages of 2H-11 and SVEC4-10 (all FDR<0.001), which indicate that certain alternative splicing patterns might show signi cant differences between TECs and NECs.
Alternative polyadenylation (APA) analysis Polyadenylation refers to the covalent linkage of polyadenylic acid to messenger RNA (mRNA) molecules.
In the process of protein biosynthesis, this is part of the way to produce mature mRNA ready for translation. In eukaryotes, polyadenylation is a mechanism that interrupts mRNA molecules at their 3'ends. The polyadenylic acid tail (or poly A tail) protects mRNA from exonuclease attack.And it is very important for the termination of transcription, export of mRNA from the nucleus and translation. The alternative polyadenylation (APA) of precursor mRNA may contribute to the diversity of the transcriptome and the coding ability of the genome and the regulatory mechanism of genes. We use TAPIS pipeline to identify APA. [27] The APA identi ed by each sample is shown in Fig. 10A and B.Moreover,we use DREME to analyze the sequence of 50bp upstream of the polyA site of all transcripts and the identi ed motif is shown in Fig. 10C and D.

Find fusion transcript
Use the consensus sequence before de-redundancy to screen each sample for fusion transcripts according to the following conditions:(1) must map to 2 or more loci; (2) minimum coverage for each loci is 5% and minimum coverage in bp is >= 1 bp; (3) The total length of all loci compared must account for more than 95% of the total length of the transcript; (4) The distance between the two loci must be more than 10kb.The fusion transcripts obtained from 6 samples ranged from 33 to 108.

Sequence prediction of coding region of new gene
The TransDecoder (v3.0.0) software is based on the length of the Open Reading Frame (ORF), the loglikelihood score, the comparison of the amino acid sequence and the protein domain sequence of the Pfam database, etc. It can identify reliable potential coding sequence (Coding Sequence, CDS) from the transcript sequence. Use TransDecoder software to predict the coding region sequence and its corresponding amino acid sequence of the new transcript obtained. A total of 19,021 orf were obtained this time, of which 9,954 were complete orf. The predicted sequence length distribution of the entire ORF region encoding protein is shown in the Fig. 11.

Transcription factor analysis
Transcription factors are proteins that can bind to a speci c nucleotide sequence upstream of a gene. These proteins can regulate the binding of RNA polymerase to the DNA template, thereby regulating gene transcription. Animal transcription factor identi cation uses the animal transcription factor database-animalTFDB 3.0 [12]. This project predicts a total of 2,047 transcription factors from the new transcripts long noncoding RNA (LncRNA) prediction Because lncRNA does not encode protein, it is possible to determine whether the transcript is an lncRNA by screening the transcript for coding potential. Four methods of CPC [13] analysis, CNCI [14] analysis, CPAT [15], pfam[16] protein domain analysis were used to predict the newly discovered transcripts for lncRNA. A total of 954 lncRNA transcripts were predicted by the four methods. In order to visually display the analysis results, the noncding transcripts identi ed by the above 4 analysis softwares are added to the 4 analysis results to take the intersection. Then according to the position of the lncRNA on the reference genome annotation information (gff), the lncRNA is classi ed as shown in Fig. 13.

Discussion
In recent years, a few studies have used popular high-throughput gene expression technologies, microarrays and serial analysis of gene expression (SAGE) [34][35][36][37][38][39] to have a more complete understanding of individual TECs functions. The detection method of analyzing the entire gene expression pro le or the global transcriptome at the single-cell level has gradually attracted attention [40][41][42][43]. In the eld of tumor angiogenesis, some single-cell RNA sequencing (scRNA-seq) studies have reported a descriptive list of previously known TECs phenotypes [44,45]. Transcription Omics studies revealed only a limited number of ECs (endothelial cells) phenotypes [46][47][48][49]. Powerful single-cell transcriptomics studies can now characterize EC phenotypes in more detail and can modify traditional views on tumor endothelium.We use the ONT three-generation full-length transcriptome to detect all gene structural changes in the transcriptome.Moreover, it accurately identify structural variations in transcripts such as alternative splicing, gene fusion, selective polyadenylation and other transcripts ( mRNA or polyA+ lncRNA) expression was accurately quanti ed. Differentially expressed gene sets were obtained and differential analysis was performed, including functional annotation and pathway analysis.
We provide comprehensive transcriptomic analysis of mouse tumor endothelial cell 2H-11 and mouse normal endothelial cell SVEC4-10 data sets. This method can provide gene expression pro les for endothelial cell phenotype changes caused by various factors in the tumor microenvironment. We screened a total of 3049 differentially expressed genes (DEGs) in TECs 2H-11 and NECs SVEC4-10. Cell metabolism, immune system processes, cell development and positioning, biological adhesion and other biological functions have different trends in DEGs and all expressed genes, suggesting that these functions may be the main reason for the differences. These functions are re ected in a wealth of signaling pathways, including Pathways in cancer, PI3K-Akt signaling pathway, Focal adhesion, MAPK signaling pathway, Antigen processing and presentation, ECM-receptor interaction and Glutathione metabolism. It is comprehensively found that TECs play a vital role in promoting tumor progression in various ways such as induceing tumor angiogenesis,changing cell metabolism and participating in immune processes.
The literature reported the characteristic gene expression pro le of breast cancer-derived ECs at the single-cell level, indicating that extracellular matrix (ECM) related genes play a key role in tumor-derived ECs. And it has been determined that some of these genes can be used as general TECs biomarkers. In addition, their DEGs are also enriched in extracellular matrix metabolism, vascular smooth muscle, drug metabolism, cancer pathways, and axon guidance related signaling pathways [50].
Our research also detected that DEGs are enriched in signal pathways such as ECM-receptor interaction, Glutathione metabolism, and Pathways in cancer. These results broaden the different roles of DEGs of TECs in the extracellular matrix. It also provides a new direction in cell metabolism. In cancer-related, we have obtained 88 differential genes. These genes can be screened again according to relevant experimental methods and veri ed through speci c experiments. Exploring the mechanism behind them is the content of our next research. Cantelmo et al. found that TECs have high glucose metabolism, which can shunt intermediate products to nucleotide synthesis. And they have identi ed a large number of metabolic genes expression pro les as metabolic targets in TECs [51]. Immature endothelial cells and apical cells have stronger glycolytic gene characteristics. It is speculated that this may be caused by a worse nutrient de ciency environment and more abundant metabolic change signals [52].In our results, endothelium. After restoring ETBR in vitro, the adhesion of T cells was restored and the ability of T cells to home to tumors was also signi cantly enhanced. Up-regulation of ETBR has been observed in many cancers, including breast and ovarian cancer, and up-regulation of ETBR indicates unfavorable results [53][54][55]. The speci c mechanism of tumor endothelial ETBR up-regulation is not yet clear, but it may be related to the process of neovascularization. In our results, a total of 26 DEGs are enriched in the signal pathway of antigen processing and transmission, which suggests that TECs can be used as an antigen presenting cell to participate in the initiation and activation of T cells. This function is likely to be involved in a certain process of up-regulation of ETBR in tumor endothelium. According to reports in the literature, BMP2 secreted by liver cancer cells promotes cell proliferation, migration, invasion and angiogenesis [56]. In our results, BMP2k, BMP1 and BMP4 were found to be signi cantly DEGs in 2H-11 and SVEC4-10.
These genes were enriched in TGF-beta and Transcriptional misregulatio signaling pathways respectively, and participated in tumor angiogenesis and other processes. In the literature studying whether the intratumoral microenvironment of head and neck squamous cell carcinoma (HNSC) is highly immunosuppressive, it was found that the abnormal expression of TECs in HNSC. And the secreted HSPA12B can be partially absorbed by macrophages through OLR1, leading to subsequently activate the PI3K/Akt/mTOR signaling pathway and increase the expression of M2 markers [57]. In our results, there are a total of 70 DEGs enriched in the PI3K/Akt signaling pathway. We can screen the genes related to HSPA12B by methods such as gene fusion or gene co-expression, and provide research for elucidating the crosstalk mechanism between tumor endothelial cells (TECs) and Tumour-associated macrophages (TAM). The data helps to better understand the formation of the immunosuppressive microenvironment within the tumor [57]. All of our results con rm some views in the current research eld from the direction of transcriptomics, but more importantly, we have discovered more related genes and used bioinformatics methods to predict their main functional roles and participation signal pathways.
Our results provide the possibility to study tumor endothelial cells in cell metabolism, participation in immune system processes, tumor angiogenesis and other mechanisms behind the occurrence and development of tumors. Our goal is to continue to use speci c experiments and bioinformatics methods to verify our results in the later stages.
More and more evidences show that abnormal alternative splicing (AS) is a common event in the occurrence and progression of cancer [58]. Abnormal pre-mRNA alternative splicing has been widely accepted as a new contributor to cancer development [59,60]. Although many cancer-speci c mRNA isotypes have been identi ed, our understanding of the spectrum of alternative splicing events and their functional pathways is far behind. With the rapid development of high-throughput sequencing and bioinformatics methods, it is possible to more comprehensively reveal AS in cancer. Different splicing patterns of a gene lead to multiple subtypes, which makes alternative splicing and its regulatory mechanism in cancer more complicated [61,62]. We obtained the alternative splicing types of 2H-11 and Alternative 5'splice site; (E) Intron retention (Fig. 9). After analysing the data of 2H-11 and SVEC4-10, we detected Exon skipping was the most frequent AS events and Mutually exclusive exon was the rarest AS events. The prevalence of different AS events was similar between TECs and NECs, indicating relevant pathogenesis of these two types of cells. It is worthy that one gene might possess several alternative splicing patterns. Differentially spliced genes (DSGs) between 2H-11 and SVEC4-10 were analysed and the top 20 signi cantly altered AS events were summarized in Table 12. A3SS(Alternative 3'splice site) and SES(Single exon skipping) were dominant AS types, and several genes (Gm20425, H2-K1, Zscan21, Npm1) demonstrated two AS events with opposite preference in TECs and NECs. According to reports, the ve most widely used spontaneously metastatic immune activity models (AT-84, SCC VII, MOC2, MOC2-10, and 4MOC2) were sequenced and a comprehensive genome analysis was performed to analyze the patient's tumors and utility evaluation. The results found that these ve models have genetic variations in 10 cancer markers and 14 signaling pathways and mechanisms (metabolism, epigenetics, and immune evasion), with similar degrees in patients. Immune de ciencies in HLA-A (H2-Q10, H2-Q4, H2-Q7, and H2-K1), Pdcd1, Tgfb1, Il2ra, Il12a, Cd40, and Tnfrsf14 were identi ed [63]. In our results, in addition to the same alternative splicing subtypes H2-Q4, H2-Q7, H2-k1, we also detected the following alternative splicing subtypes H2-D1, H2-T2, H2-T22, H2-T24, H2-DMa. It indicated that the subtypes obtained by alternative splicing of TECs may play an important role in the tumor progression of immunode ciency, and the newly detected subtypes are likely to play an importment role with the known subtypes or independently play a functional role that has not yet been studied.
It is obvious that most AS patterns were more active in TECs than NECs. Moreover, the results demonstrated signi cant difference of AS patterns among different stages of TECs and NECs (all FDR<0.001).It reveals that certain alternative splicing patterns might show signi cant differences between TECs and NECs.In cancer vessels, TECs express several atypical splicing isoforms not expressed (or expressed at low levels) in NECs, which could represent putative targets for anti-angiogenic therapy [64]. Many of our results have not been studied in cancer and their biological functions are unclear. Therefore, alternative promoters, alternative terminators, and misregulation of exon skipping of these genes are promising research directions for elucidating the complex mechanisms of tumorigenesis and development using TECs as an entry point in the future.

Conclusion
In this study, we have shown the feasibility of developing transcriptomics pro ling from viable single endothelial cells. Nanopore long-read transcriptome analysis shows the gene expression pro le between tumor endothelial cells (TECs) and normal endothelial cells (NECs). Our results also accurately identify the structural variations of the transcript, such as alternative splicing, gene fusion, alternative polyadenylation, etc. We also accurately quanti ed the expression of transcripts and made simple predictions for new genes, LncRNA and transcription factors. Our research helps to understand the potential genes and pathways that may target TECs in future treatment research and expand the understanding of TECs progression.In addition to of this, our current results have limitations. So our next plan is to perform functional veri cation and mechanism clari cation of some results based on the joint application of speci c experimental techniques and bioinformatics. Although the TECs we used were derived from mouse, they have high homology between mouse and humans. It is completely meaningful for us to use mouse for experimental design.Understanding cell heterogeneity and regulatory changes under disease conditions is the basis for successful medical treatment development. Our transcriptomics analysis provides a wealth of candidate molecules for the development of targeted methods to counter CO2. SVEC4-10 cells are grown in DMEM (PM150210) containing 10% FBS(164210-500)and 1% P/S(PB180120) at 37°C and 5% CO2.
RNA isolation, cDNA library construction and Nanopore sequencing Firstly, total RNA of 2H-11 three samples and SVEC4-10 three samples were respectively isolated using TRizol reagent (Thermo Fisher, Shanghai, China) on dry ice.1ug total RNA was prepared for cDNA libraries using cDNA-PCR Sequencing Kit (SQK-PCS109) protocol provided by Oxford Nanopore Technologies(ONT). Brie y ,the template switching activity of reverse transcriptases enrich for full-length cDNAs and add de ned PCR adapters directly to both ends of the rst-strand cDNA. And following cDNA PCR for 14 circles with LongAmp Tag (NEB). The PCR products were then subjected to ONT adaptor ligation using T4 DNA ligase (NEB). Agencourt XP beads was used for DNA puri cation according to ONT protocol. The nal cDNA libraries were added to FLO-MIN109 owcells and run on PromethION platform at Biomarker Technology Company (Beijing, China).

Oxford Nanopore Technologies Long Read Processing and remove redundant
The original data format of Nanopore sequencing off-machine data is the second-generation fast5 format that contains all the original sequencing signals. After base calling is performed through the Guppy software in the MinKNOW2.2 software package, fast5 format data will be converted to fastq format for subsequent quality control analysis.
(1) After the original fastq data is further ltered for short fragments and low-quality reads, the total Clean Data is obtained. Clean reads were ltered with minimum average read Q score=6 and minimum read length=500bp; ribosomal RNA were discarded after mapping to rRNA database.
(2) According to the principle of cDNA sequencing, Full-length non-chemiric (FLNC) transcripts were determined by searching for primer at both ends of reads.
(3) Clusters of FLNC transcripts were obtained after mapping to reference genome with mimimap2, and consensus isoforms were gained after polishing within each cluster by pin sh (https://github.com/nanoporetech/pin sh); consensus sequences were mapped to the reference genome (assembly AAP 1.0) using minimap2.
(4) Mapped reads were further collapsed by cDNA Cupcake package (https://github.com/Magdoll/cDNA_Cupcake) with min-coverage=85% and min-identity=90%; and alignments with differences only in the 5'exons are merged for alternative splicing analysis. Integrating the de-redundant results of each sample, we nally got 65,462 non-redundant transcript sequences.
Gene functional annotation and Quanti cation of gene expression levels and Differential expression analysis Gene function was annotated based on the following databases: NR (NCBI non-redundant protein sequences);Pfam (Protein family); KOG/COG/eggNOG (Clusters of Orthologous Groups of proteins); Swiss-Prot (A manually annotated and reviewed protein sequence database); KEGG (Kyoto Encyclopedia of Genes and Genomes) and GO (Gene Ontology).Full length reads were mapped to the reference transcriptome sequence. Reads with match quality above 5 were further used to quantify. Expression levels were estimated by reads per gene/transcript per 10,000 reads mapped.Use the DESeq2 R package (1.6.3) to analyze the differential expression of six groups of three samples each of 2H-11 and SVEC4-10.
DESeq2 provide statistical routines for determining differential expression in digital gene expression data using a model based on the negative binomial distribution. The resulting P values were adjusted using the Benjamini and Hochberg's approach for controlling the false discovery rate. Genes with a FDR < 0.01 and fold change ≥ 2 found by DESeq2 were assigned as differentially expressed.

Functional enrichment analysis
Gene Ontology (GO) enrichment analysis of the differentially expressed genes (DEGs) was implemented by the GOseq R packages based Wallenius non-central hyper-geometric distribution [21], which can adjust for gene length bias in DEGs.
Kyoto Encyclopedia of Genes and Genomes (KEGG) [22] is a database resource for understanding highlevel functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies (http://www.genome.jp/kegg/). We used KOBAS [23] software to test the statistical enrichment of differential expression genes in KEGG pathways.

Gene Set Enrichment Analysis
Gene Set Enrichment Analysis (GSEA)can perform enrichment analysis on all genes based on the expression of all genes without prior experience. General difference analysis usually only focuses on some signi cant up-regulated or down-regulated genes, and this will miss some genes that are not signi cantly differentially expressed but have important biological signi cance. The GSEA does not set a difference threshold and can detect weak but consistent trends. In this project, the GSEA-P software, MSigDB 1.0 [24]is used for GSEA analysis, the KEGG pathway and the gene set of the BP, CC, and MF branches of GO are used as the gene set of interest, and the log2FC of each difference group is used as the background gene set. Score to analyze the enrichment of the gene set of interest, and nally control pvalue or FDR as a signi cantly enriched gene set. Here, according to the analysis results of GSEA, we select the rst 5 GO nodes/pathway of the enrichment as the enrichment map.

Structure analysis and Transcription factors prediction
Transcripts were validated against known reference transcript annotations with gffcompare [25].
Alternative splicing events including Intron retention(IR), Exon skipping(ES), Alternate Donor site (AD), Alternate acceptor site (AA) and Mutually exclusive exon (MEE) were identi ed by the AStalavista tool [26]. Alternative polyadenylation analysis was conducted with TAPIS [27]. The reliable potential Coding Sequence (CDS) is obtained by Trans Decoder (v3.0.0) software[28] based on the length of Open Reading Frame (ORF), Log-likelihood Score, amino acid sequence and protein domain sequence of Pfam database. A total of 19,021 ORFs were obtained, including 9,954 complete ORFs.Animal transcription factors were identi ed from animal TFDB [29]. lncRNA analysis Four computational approaches include CPC [30] /CNCI [31] /CPAT [32] /Pfam [33]/ were combined to sort non-protein coding RNA candidates from putative protein-coding RNAs in the transcripts.Putative proteincoding RNAs were ltered out using a minimum length and exon number threshold. Transcripts with lengths more than 200 nt and have more than two exons were selected as lncRNA candidates and further screened using CPC/CNCI/CPAT/Pfam that have the power to distinguish the protein-coding genes from the non-coding genes.

Statistical analysis
Data are expressed as the mean ± standard deviation. Two-tailed Student's t-test was used to assess the difference between two groups.One way analysis of variance followed by Dunnett's multiple comparisons test was used to determine differences between groups. P < 0.05 was considered to indicate a statistically signi cant difference.Statistical analysis was performed using SPSS software version 13.0 (SPSS, Inc., Chicago, IL, USA).
The following tables appear at the end of the pathway analysis in the results.      Figure 1 Length distribution of clean reads generated from Nanopore sequencing. A Clean reads of 2H-11. B Clean reads of SVEC4-10.

Figure 2
Average quality distribution of clean reads derived from Nanopore sequencing. A Clean reads of 2H-11.B Clean reads of SVEC4-10.

Figure 3
Length distribution of full-length reads. A Full-length reads of 2H-11. B Full-length reads of SVEC4-10.  indicates higher expression, and green indicates lower expression. S1, S2 and S3 indicate three biological replications of SVEC4-10 cells. H1, H2 and H3 indicate three biological replications of 2H-11 cells. B The volcano plot was constructed by plotting the negative log of the log10 FDR (false discovery rate) ) value on the y-axis.This results in data points with low log10 FDRvalues (highly signi cant) appearing toward the top of the plot. The x-axis is the log2FC(fold change) between the two cells (SVEC4-10 and 2H-11). C MA plot visualizes the differences between measurements taken in SVEC4-10 cells and 2H-11cells DEGs, by transforming the data into M (log ratio), A (mean average) scales and log2FPKM (fragments per million), log2FC, then plotting these values. The green dots represent down-regulated DEGs, the red dots represent up-regulated DEGs, and the black dots represent non-differentially expressed genes.

Figure 6
Differentially expressed genes in GO annotation classi cation statistics map.The abscissa is the GO classi cation, the left side of the ordinate is the percentage of the number of transcripts, and the right side is the number of transcripts. This gure shows the transcript enrichment of each secondary function of GO in the background of differential expression transcripts and the background of all transcripts, re ecting the status of each secondary function in the two contexts, and the description of the secondary functions with obvious differences in proportions. The enrichment trend of differentially expressed transcripts is different from that of all transcripts, so we can focus on analyzing whether this function is related to differences.

Figure 7
Differentially expressed genes KEGG pathway enrichment statistics map. A The y-axis represents the enriched KEGG pathways and the horizontal axis represents the rich factor. The higher the rich factor, the more signi cant the enrichment. The diameter of the sphere represents the number of unigenes in each pathway. The colour indicated correct P value (q value) on multiple hypothesis testing. The lower the q value, the more reliable the testing.B The colors of the radial lines represent the types of different pathways (pink: Focal adhesion; yellow: Hepatitis C; green: NOD-like receptor signaling pathway; blue: Pancreatic secretion; purple: Renin secretion).The size of the center circle represents the number of genes enriched in the pathway.The small circles of different colors represent the fold change(FC) between 2H-11 and SVEC4-10.   Predicted CDS-encoded protein length distribution map.

Figure 12
Transcription factor type distribution. The abscissa is the classi cation of transcription factor families, and the ordinate is the number of transcription factors.