Plant materials
The oil tea germplasm bank was established at the National Sanmenjiang Forestry Farm in Liuzhou, Guangxi, China, in the 1970s. The bank consisted of 300 individuals from 138 families, with 1-3 individuals per family. These families were derived from 27 provenances, as follows: seven provenances in the Liuzhou region, TM (TieMao), QT (QiTuan), MZ (MengZai), SBL (SanBoLing), SD (SanDao), SMJ (SanMenJiang) and BY+ (BinYang); six provenances in the Hezhou region, LN (LiNong), JT (JiangTang), FL (FuLin), HB (HuangBao), GP (GuPao) and FM (FengMu); three provenances in the Yulin region, LK (LiuKui), RY (RenYong) and BY (BanYue); three provenances in the Hechi region, BM (BaMa), DY (DouYang) and FS (FengShan); two provenances in the Guilin region, PL (PingLe) and GL (GuiLin); two provenances in the Baise region, BS (BaiSe) and TY (TianYang); LZ (LongZhou) in the Chongzuo region; WM (WuMing) in the Nanning region; and CX (CengXi) in the Wuzhou region. All samples selected across the region by the Guangxi Forestry Research Institute were placed into the QS population (QS being the pinyin initials of the Chinese abbreviation of the institute's name), and samples without recorded provenances were placed into the UN population. The LN provenance had the most families (31), while seven provenances, FM, FS, JT, LK, LZ, MZ and RY, had only one family each. The trees were planted in a randomized complete block design, with three replicates of single-tree plots at a spacing of 2.0 × 3.0 m.
DNA extraction, construction of genomic libraries and high-throughput sequencing
Fresh leaves of the 300 individual plants were collected, and DNA was extracted using a modified cetyltrimethylammonium bromide (CTAB) method. The concentration and quality of each DNA sample were assayed using 1.0 % agarose gel electrophoresis and a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA, USA). All DNA samples were diluted with TE buffer (Tris-HCl 10 mmol L⁻¹, EDTA 1 mmol L⁻¹, pH 8) to a working concentration of 25 ng µl⁻¹.
For construction of the genomic libraries, fresh leaves of C. oleifera var. Cenruan No. 3 were harvested for genomic DNA isolation. Two libraries with insert sizes of 350 bp and 500 bp were generated according to the manufacturer's instructions and sequenced on the Illumina HiSeq™ 4000 platform.
Assembly of genomic sequences and identification of genomic SSRs
After removal of adaptors and of low-quality reads (those containing more than 5 % uncertain nucleotides or more than 20 % low-quality bases, Q < 10) from the raw reads, all clean reads were assembled into unigenes using SOAPdenovo with a k-mer size of 81 and extended further with GapCloser software. The Perl script MISA was employed to search for SSR loci with a minimum of 10, 6, 5, 4 and 4 repeats for di-, tri-, tetra-, penta- and hexanucleotide repeat units, respectively. Corresponding primers were designed with Primer3.
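The repeat thresholds above can be illustrated with a minimal Python sketch. It is not the MISA script itself: it detects perfect repeats only, ignores compound SSRs, and reads the five thresholds as minimum repeat counts per motif length. The function name find_ssrs and the test sequence are hypothetical.

```python
import re

# Minimum number of repeats per motif length (2 to 6 nt), as given in the text.
MIN_REPEATS = {2: 10, 3: 6, 4: 5, 5: 4, 6: 4}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif of unit_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT" or "AAA").
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical sequence carrying one (AT)10 and one (AAG)6 repeat.
example = "GGC" + "AT" * 10 + "CCT" + "AAG" * 6 + "TT"
ssrs = find_ssrs(example)
```

Start positions are 0-based. A production search would also need to handle compound and interrupted repeats, which this sketch does not attempt.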
Polymerase chain reaction (PCR) amplification and SSR genotyping
Six DNA samples were randomly selected as templates to screen for polymorphic SSRs. Twenty-two polymorphic SSRs were then employed to genotype the 300 trees. These SSRs consisted of sixteen genomic SSRs, prefixed with cog (the initial letters of Camellia oleifera genome), and six EST SSRs, prefixed with coe. The cogSSR markers were developed in this study, while the coeSSR markers were previously reported (Wu et al. 2019).
PCR was carried out in a 10 µl volume containing 5 µl of 2× SanTaq PCR Mix (Sangon Biotech Co., Ltd., Shanghai, China), 1 µl of DNA at 25 ng µl⁻¹, 2 µl of sterilized ddH2O, and 1 µl each of the forward and reverse primers (10 µmol L⁻¹). The PCR program was as follows: 5 min at 95°C; 33 cycles of 45 s at 95°C, 45 s at 60°C and 60 s at 72°C; and a final extension of 5 min at 72°C. The PCR products were separated on 8 % denaturing polyacrylamide gels and visualized by silver staining. Clear amplification bands were recorded for subsequent analysis.
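As a quick check on the reaction set-up, the component volumes and final concentrations can be worked out from the stated stock concentrations. The short Python sketch below uses only the values given in the text; the variable and function names are illustrative.

```python
# Reaction components in µl, as listed in the text.
components_ul = {
    "2x PCR mix": 5.0,
    "template DNA": 1.0,
    "ddH2O": 2.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
}
total_ul = sum(components_ul.values())


def final_concentration(stock, volume_ul, total_volume_ul):
    """Dilution rule C1 * V1 = C2 * V2, solved for C2."""
    return stock * volume_ul / total_volume_ul


# Each primer: 10 µmol/L stock, 1 µl added to the reaction.
primer_final_umol_per_l = final_concentration(10.0, 1.0, total_ul)
# Template: 25 ng/µl stock, 1 µl added to the reaction.
dna_final_ng_per_ul = final_concentration(25.0, 1.0, total_ul)
```

The volumes sum to the stated 10 µl, giving 1 µmol L⁻¹ of each primer and 2.5 ng µl⁻¹ (25 ng in total) of template DNA per reaction.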
Genetic diversity and structure
PowerMarker V 3.25 (Liu and Muse 2005) and GenAlEx version 6.5 (Peakall and Smouse 2012) were used to calculate the parameters of genetic diversity, including the number of observed alleles (Na), effective number of alleles (Ne), observed heterozygosity (Ho), expected heterozygosity (He), Shannon’s index (I) and polymorphism information content (PIC), for every polymorphic SSR marker in the oil tea germplasm in this study. Shared-allele genetic distances between samples were calculated, and the samples were then clustered using the neighbor-joining method. The dendrogram was drawn with MEGA X (Molecular Evolutionary Genetics Analysis) (Kumar et al. 2018).
STRUCTURE 2.3.4 software was employed to analyze the genetic structure of the tested germplasm using an admixture model with a burn-in period of 10,000 iterations followed by 100,000 MCMC replications. The number of subpopulations (K) was varied from 2 to 10, with 10 independent runs for each K value. The most likely value of K was determined by the highest value of ΔK (Evanno et al. 2005) obtained with STRUCTURE HARVESTER v0.6.94 (Earl and vonHoldt 2012).
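For reference, the per-locus diversity parameters can be sketched from their usual definitions. The Python sketch below is not the PowerMarker or GenAlEx implementation. It assumes codominant, diploid-style genotype scoring (two alleles per individual), uses the uncorrected He = 1 − Σp², and takes natural logarithms for Shannon's index. The function name and the toy genotypes are hypothetical.

```python
import math
from collections import Counter


def locus_diversity(genotypes):
    """Diversity parameters for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    freqs = [count / len(alleles) for count in counts.values()]
    sum_sq = sum(p * p for p in freqs)

    na = len(freqs)                                   # observed alleles
    ne = 1.0 / sum_sq                                 # effective alleles
    ho = sum(a != b for a, b in genotypes) / len(genotypes)
    he = 1.0 - sum_sq                                 # expected heterozygosity
    shannon = -sum(p * math.log(p) for p in freqs)    # Shannon's index (I)
    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(na)
        for j in range(i + 1, na)
    )
    pic = 1.0 - sum_sq - cross
    return {"Na": na, "Ne": ne, "Ho": ho, "He": he, "I": shannon, "PIC": pic}


# Hypothetical genotypes at one locus with two equally frequent alleles.
toy = [("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")]
stats = locus_diversity(toy)
```

With two equally frequent alleles, this gives Ne = 2, He = 0.5, I = ln 2 and PIC = 0.375.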
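The ΔK criterion can be sketched as follows. This is an illustration of the calculation, not the STRUCTURE HARVESTER code: ΔK is taken as the absolute second-order difference of the mean ln P(D) across runs, divided by the standard deviation of ln P(D) at that K. The function name and the ln P(D) values are made up for illustration.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Map each interior K to delta-K.

    lnp_by_k: dict mapping K to a list of ln P(D) values, one per run.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    sds = {k: stdev(values) for k, values in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        # delta-K needs both neighbours, so the smallest and largest K are skipped.
        if k - 1 in means and k + 1 in means and sds[k] > 0:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta_k[k] = second_diff / sds[k]
    return delta_k


# Made-up ln P(D) values (two runs per K) purely for illustration.
toy_runs = {
    2: [-100.0, -102.0],
    3: [-80.0, -82.0],
    4: [-78.0, -80.0],
    5: [-77.0, -79.0],
}
delta_k = evanno_delta_k(toy_runs)
best_k = max(delta_k, key=delta_k.get)
```

Because the statistic requires K − 1 and K + 1, a tested range of K = 2 to 10 yields ΔK values only for K = 3 to 9 under this formulation.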
Development of a core collection
The R package Core Hunter version 3 was used to generate core collections with maximized genetic diversity and allelic richness (Beukelaer et al. 2018). The sampling fractions were set to 5 %, 10 %, 15 %, 20 %, 25 %, 30 %, 35 %, 40 %, 45 % and 50 % of the entire collection. The genetic diversity parameters, including Na, Ne, Ho, He, I and PIC, were estimated for each candidate core collection and compared with those of the entire collection using t-tests. The smallest candidate showing no significant differences from the entire collection (P > 0.05) was regarded as the core collection.
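The final selection step can be written as a simple decision rule. The Python sketch below assumes that the t-test P values have already been computed for each sampling fraction and each parameter, and reads "nonsignificant" as P above the 0.05 threshold. The function name and the P values are hypothetical and are not results from this study.

```python
def smallest_nonsignificant_fraction(p_values_by_fraction, alpha=0.05):
    """Return the smallest sampling fraction whose P values all exceed alpha.

    p_values_by_fraction: dict mapping a sampling fraction to a dict of
    {parameter name: t-test P value versus the entire collection}.
    Returns None if no fraction qualifies.
    """
    for fraction in sorted(p_values_by_fraction):
        p_values = p_values_by_fraction[fraction].values()
        if all(p > alpha for p in p_values):
            return fraction
    return None


# Hypothetical P values for three fractions and two parameters.
toy_p_values = {
    0.05: {"Na": 0.001, "He": 0.20},
    0.10: {"Na": 0.030, "He": 0.40},
    0.15: {"Na": 0.120, "He": 0.55},
}
chosen = smallest_nonsignificant_fraction(toy_p_values)
```

Here the 15 % fraction is the first at which no parameter differs significantly from the entire collection, so it would be retained under this rule.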