Fresh leaves, stems, and strobiles of one Juniperus squamata individual were collected from Kangding, Sichuan Province, China. For each tissue, short paired-end reads were sequenced on the Illumina platform. We also pooled the samples from all tissues and generated long reads on the PacBio Sequel platform. Total RNA was isolated using the Plant RNA Kit (Omega Bio-tek, USA) and then treated with RNase-free DNase I (NEB) to remove DNA. RNA degradation and contamination were monitored on 1% agarose gels, and RNA purity was checked using the NanoPhotometer® spectrophotometer (IMPLEN, CA, USA). RNA concentration was measured using the Qubit® RNA Assay Kit in a Qubit® 2.0 Fluorometer (Life Technologies, CA, USA). RNA integrity was assessed using the Bioanalyzer 2100 system (Agilent Technologies, CA, USA). The single-molecule real-time (SMRT) bell library was constructed with the Pacific Biosciences DNA Template Prep Kit 2.0, and SMRT sequencing was then performed on the Pacific Biosciences Sequel System. The samples used for Illumina sequencing were harvested using the same methods, and the library was constructed for the Illumina HiSeq X Ten platform. Adapter clipping and quality filtering of the raw Illumina reads were performed using Trimmomatic version 0.36 [10]. Based on the quality check, the last two base pairs of each read were removed to minimize the overall sequencing error.
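The final trimming step, removing the last two bases of every read, can be illustrated with a short Python sketch. The text does not state which tool performed this trimming, so the function names (`read_fastq`, `trim_tail`) and the in-memory FASTQ handling below are assumptions made purely for illustration, not the procedure actually used.

```python
from typing import Iterable, Iterator, Tuple

# A FASTQ record reduced to (header, sequence, quality string).
FastqRecord = Tuple[str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from well-formed four-line FASTQ text."""
    it = iter(lines)
    # zip over the same iterator four times groups lines into records.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def trim_tail(record: FastqRecord, n: int = 2) -> FastqRecord:
    """Remove the last n bases and their quality scores from a read."""
    header, seq, qual = record
    if n <= 0:
        return record
    return header, seq[:-n], qual[:-n]


if __name__ == "__main__":
    # Hypothetical toy read, not taken from the data set.
    demo = ["@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]
    for rec in read_fastq(demo):
        print(trim_tail(rec))
```

Sequence and quality strings are trimmed together so that each base keeps its own quality score.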
The raw full-length transcriptome sequencing data were processed using SMRT Link version 4.0 (https://www.pacb.com/support/softwaredownloads). Subread BAM files were generated from the raw reads (parameters: -minLength 200, -minReadScore 0.75). Circular consensus sequences (CCS) were generated from the subread BAM files (parameters: -min_length 50, -max_drop_fraction 0.8, -no_polish TRUE, -min_zscore -9999.0, -min_passes 2, -min_predicted_accuracy 0.8, -max_length 15000). The resulting CCS BAM files were then classified into full-length non-chimeric (FLNC) and non-full-length (NFL) fasta files by examining the 5’ and 3’ adapters and the poly(A) tail. The Iterative Clustering and Error Correction (ICE) algorithm was used to cluster the FLNC sequences into cluster consensus sequences. Quiver from SMRT Link (parameters: -hq_quiver_min_accuracy 0.99, -bin_by_primer false, -bin_size_kb 1, -qv_trim_5p 100, -qv_trim_3p 30) was then used to polish the cluster consensus sequences with the NFL fasta files, yielding polished consensus sequences.
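To make the CCS thresholds above concrete, the following Python sketch applies the length, pass-count, and predicted-accuracy cutoffs to a read summary. It is not SMRT Link's implementation: the `CcsRead` class, its field names, and the choice of inclusive boundaries are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CcsRead:
    """Hypothetical summary of one circular consensus read."""
    name: str
    length: int                 # consensus length in bases
    passes: int                 # number of passes around the insert
    predicted_accuracy: float   # predicted consensus accuracy, 0..1


def passes_ccs_filters(
    read: CcsRead,
    min_length: int = 50,
    max_length: int = 15000,
    min_passes: int = 2,
    min_predicted_accuracy: float = 0.8,
) -> bool:
    """Return True if the read meets all thresholds quoted in the text.

    Boundaries are treated as inclusive, which is an assumption.
    """
    return (
        min_length <= read.length <= max_length
        and read.passes >= min_passes
        and read.predicted_accuracy >= min_predicted_accuracy
    )


if __name__ == "__main__":
    # Toy reads, not taken from the data set.
    reads = [
        CcsRead("r1", 1800, 5, 0.97),
        CcsRead("r2", 40, 6, 0.99),      # too short
        CcsRead("r3", 2500, 1, 0.95),    # too few passes
        CcsRead("r4", 3000, 3, 0.75),    # accuracy below cutoff
    ]
    print([r.name for r in reads if passes_ccs_filters(r)])
```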
To obtain high-quality corrected consensus sequences, remaining nucleotide errors in the polished consensus sequences were corrected with LoRDEC version 0.7 [11] (parameters: -k 23 -s 3), using the Illumina RNA-seq data obtained from the same individual. Redundancy in the corrected consensus sequences was removed with CD-HIT version 4.6.1 [12] (parameters: -c 0.95 -T 6 -G 0 -aL 0.00 -aS 0.99 -AS 30) to obtain a final set of unique transcript isoforms. Benchmarking Universal Single-Copy Orthologs (BUSCO) version 3 was used to assess the quality of the final transcript isoforms [13]. The summary statistics and length distributions of the PacBio SMRT sequencing are shown in Data file 1 (Table S1 and Fig. S1). The BUSCO results are shown in Data file 1 (Table S2). All three data sets and their NCBI GenBank accession numbers are listed in Table 1.
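The correction and de-redundancy steps can be sketched as command lines assembled in Python. The parameter values are those quoted above; the file names are hypothetical, and the use of the `lordec-correct` and `cd-hit-est` executables is an assumption, since the text names only the packages and not the specific programs that were run.

```python
from typing import List


def lordec_command(long_reads: str, short_reads: str, output: str,
                   k: int = 23, s: int = 3) -> List[str]:
    """Build a LoRDEC correction command (-k k-mer size, -s solidity threshold)."""
    return [
        "lordec-correct",
        "-i", long_reads,    # long reads to correct
        "-2", short_reads,   # short reads used to build the de Bruijn graph
        "-k", str(k),
        "-s", str(s),
        "-o", output,
    ]


def cdhit_command(fasta_in: str, fasta_out: str,
                  program: str = "cd-hit-est") -> List[str]:
    """Build a CD-HIT clustering command with the parameters from the text.

    'cd-hit-est' (the nucleotide variant) is an assumed choice.
    """
    return [
        program,
        "-i", fasta_in,
        "-o", fasta_out,
        "-c", "0.95",    # sequence identity threshold
        "-T", "6",       # number of threads
        "-G", "0",       # local rather than global identity
        "-aL", "0.00",   # alignment coverage of the longer sequence
        "-aS", "0.99",   # alignment coverage of the shorter sequence
        "-AS", "30",     # max unaligned bases on the shorter sequence
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(lordec_command("polished.fasta", "illumina.fastq",
                                  "corrected.fasta")))
    print(" ".join(cdhit_command("corrected.fasta", "unique_isoforms.fasta")))
```

If both tools are installed, each list can be passed to `subprocess.run(cmd, check=True)`; building the commands as lists avoids shell quoting problems.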
MISA version 1.0 [14] was used to identify SSRs in the final unique transcript isoforms of Juniperus squamata (parameters: definition(unit_size, min_repeats): 1-10 2-6 3-5 4-5 5-5 6-5; interruptions(max_difference_between_2_SSRs): 100). In total, 57,393 SSRs were identified in 42,273 sequences. Details of these SSRs, including primer sequences, SSR type, annealing temperature, and product size, are provided in Data file 2.
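The unit-size and minimum-repeat definition above (mononucleotides repeated at least 10 times, dinucleotides at least 6 times, and tri- to hexanucleotides at least 5 times) can be illustrated with a minimal Python sketch. This is not MISA itself: it finds perfect SSRs only, does not merge neighbouring SSRs into compound SSRs (the 100 bp interruption setting), and the function names are assumptions.

```python
import re
from typing import List, Tuple

# unit size -> minimum number of repeats, as quoted in the text
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq: str) -> List[Tuple[int, int, str, int]]:
    """Return (start, end, motif, repeats) for perfect SSRs, 1-based inclusive."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One motif of `unit` bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            motif = m.group(1)
            if is_primitive(motif):
                repeats = (m.end() - m.start()) // unit
                hits.append((m.start() + 1, m.end(), motif, repeats))
                pos = m.end()
            else:
                # e.g. 'AA' repeats belong to the mononucleotide class;
                # step forward by one base and keep searching.
                pos = m.start() + 1
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence, not taken from the data set.
    toy = "GGC" + "A" * 12 + "CCT" + "AG" * 6 + "TTC" + "ACG" * 4
    for hit in find_ssrs(toy):
        print(hit)
```

In the toy sequence, the `ACG` run is not reported because it has only four repeats, one short of the trinucleotide minimum of five.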
Table 1: Overview of data files/sets.
| Label | Name of data file/data set | File types (file extension) | Data repository and identifier (DOI or accession number) |
|---|---|---|---|
| Data file 1 | Summary and assessment of the data set | MS Word file (.docx) | Figshare (10.6084/m9.figshare.14572125) |
| Data file 2 | SSRs of Juniperus squamata | MS Excel file (.csv) | Figshare (10.6084/m9.figshare.14572098) |
| Data set 1 | js.fastq.gz | fastq (.fastq.gz) | NCBI (SRR13966305) |
| Data set 2 | juniperus_squamata_final.fastq.gz | fastq (.fastq.gz) | NCBI (SRR13993906) |
| Data set 3 | Juniperus_squamata_final_unique_transcript_isoforms.fastq.gz | fastq (.fastq.gz) | NCBI (SRR14000623) |