Although several types of molecular markers, such as RAPD, ISSR, SSR, and AFLP, have been used in molecular biology studies of cannabis, including genetic diversity analysis, sex identification, and QTL mapping [9-10, 12, 27], the number of available markers is still small compared with that for other crops. To date, the largest set of molecular markers developed for cannabis comprises 3442 markers [9], which is insufficient for genetic map construction and QTL mapping. In addition, a genome-wide survey of InDels has not yet been carried out for cannabis. In this study, 47558 InDels were identified in the cannabis genome, and the average density across the FN genome was 0.053 InDels/kb, which is much lower than that reported for other species such as human, rice, and oilseed rape [20, 30-31].
Molecular analyses such as map-based gene cloning, GWAS, and MAS rely on the availability of a large number of genetic markers with detailed information on their positions in the genome. PCR-based InDel markers are extensively applied during initial mapping for the identification of unknown genes in rice, maize, wheat, and other crops [22, 32-35]. However, none of the cannabis molecular markers reported in previous studies included information on their physical positions on the chromosomes [9-10, 13, 27], which has hindered comprehensive molecular analysis of cannabis. In this study, 40274 InDel markers were developed with a density of 47.1 InDels/Mb. Importantly, the detailed physical positions of all identified InDels on the cannabis genome were also determined, making it convenient to identify InDel markers in target genome regions, which, in turn, should help speed up map-based cloning and marker-assisted trait selection research in cannabis.
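As an illustration of why physical positions matter, the following minimal Python sketch shows how positioned markers could be queried for a target interval and how a local marker density could be computed. The marker names, chromosome label, and coordinates are hypothetical placeholders and are not taken from this study; the data layout is an assumption made only for the example.

```python
from typing import List, NamedTuple


class InDelMarker(NamedTuple):
    """One positioned marker; all field values used below are hypothetical."""
    name: str
    chrom: str
    pos: int  # physical position in bp


def markers_in_region(markers: List[InDelMarker], chrom: str,
                      start: int, end: int) -> List[InDelMarker]:
    """Return markers on `chrom` with start <= pos <= end, sorted by position."""
    hits = [m for m in markers if m.chrom == chrom and start <= m.pos <= end]
    return sorted(hits, key=lambda m: m.pos)


def density_per_mb(n_markers: int, span_bp: int) -> float:
    """Marker density expressed as markers per megabase."""
    return n_markers / (span_bp / 1_000_000)


# Hypothetical example data (not from the study).
example_markers = [
    InDelMarker("m_a", "chr1", 1_200_000),
    InDelMarker("m_b", "chr1", 2_900_000),
    InDelMarker("m_c", "chr1", 2_100_000),
    InDelMarker("m_d", "chr2", 1_500_000),
    InDelMarker("m_e", "chr1", 5_000_000),
]

# Query a 2 Mb target interval on the hypothetical chromosome "chr1".
hits = markers_in_region(example_markers, "chr1", 1_000_000, 3_000_000)
hit_names = [m.name for m in hits]
local_density = density_per_mb(len(hits), 3_000_000 - 1_000_000)
```

With positions available, narrowing a candidate interval during fine mapping reduces to a lookup of this kind rather than a new round of marker development.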
To analyse the population structure of the 115 cannabis germplasms from the varieties cultivated in China, 84 InDels distributed along the cannabis chromosomes at intervals of approximately 10 Mb were chosen for polymorphism analysis, and 38 InDels were found to be polymorphic among 3 accessions. The polymorphism rate was 45.2%, similar to that in chickpea (46.6%) [36], lower than that in jute (58%) [23], and higher than that in maize (18.68%) [22], which suggests that the polymorphism rate may depend on the plant species. Additionally, of the 38 polymorphic InDels, 14 InDels amplifying only two fragments were selected for genotyping the 115 accessions. The PIC values ranged from 0.1209 to 0.6351, with an average of 0.4109, indicating that most of the InDels have a moderate level of genetic diversity, which is lower than that of SSR markers in cannabis [10]. A possible reason is that most of the InDels used in this study are biallelic (Fig. S2), whereas SSRs are generally multiallelic.
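The arithmetic behind the polymorphism rate and the effect of allele number on PIC can be sketched as follows. The PIC function assumes the widely used Botstein et al. formula, PIC = 1 − Σp_i² − ΣΣ 2p_i²p_j²; the text does not state which formula or software was used here, so this is an illustrative assumption, and the allele frequencies are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def polymorphism_rate(n_polymorphic: int, n_tested: int) -> float:
    """Percentage of tested markers that are polymorphic."""
    return 100.0 * n_polymorphic / n_tested


def pic(allele_freqs: Sequence[float]) -> float:
    """Polymorphism information content for one locus.

    Assumes the Botstein et al. formula:
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    cross_term = sum(2 * (p * p) * (q * q)
                     for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - cross_term


# 38 polymorphic InDels out of 84 tested (values from the text).
rate = polymorphism_rate(38, 84)

# Hypothetical allele frequencies for illustration only.
pic_balanced = pic([0.5, 0.5])            # biallelic, equal frequencies
pic_skewed = pic([0.9, 0.1])              # biallelic, one rare allele
pic_multi = pic([0.25, 0.25, 0.25, 0.25]) # four equally frequent alleles
```

Under this formula, a locus with four equally frequent alleles scores higher than any biallelic locus, which is consistent with the explanation that multiallelic SSRs tend to show higher PIC values than mostly biallelic InDels.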
Information on the genetic structure of different genotypes can guide breeding programs for developing varieties with a broad genetic background. The genetic diversity of the cannabis germplasm has previously been analysed using two types of markers: SSR and ISSR [9, 27]. In the present study, 39 fragments were amplified using the 14 InDels, and Delta K reached its maximum at K = 2, so the 115 accessions were divided into two subgroups. Most of the cultivars from North China belonged to Group Ⅰ, while most cultivars from the south belonged to Group Ⅱ (Fig. 5). Consistent with the results of the population structure analysis, the 115 accessions were clearly clustered into two major groups using UPGMA clustering (Fig. 4). As cannabis is an annual, photoperiod-sensitive crop, and day length may determine the floral transition and flowering time, we suppose that the climate, influenced by latitude and day length, is an important factor affecting cannabis germplasm diversity. The accessions from North China and South China were classified into Groups Ⅰ and Ⅱ, respectively (Fig. 4 and Fig. 5), perhaps because of the higher latitude (i.e. longer day length) and low-latitude plateaus, respectively, which is in agreement with the analyses of Gao et al. (2014) and Zhang et al. (2018) [9, 37]. In addition, both Group Ⅰ and Group Ⅱ included cultivars from central China, such as HeNan province, implying that breeders in these areas frequently exchange cannabis germplasm resources with breeders from the northern or southern regions.
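The Delta K criterion mentioned above can be illustrated with a short Python sketch. It assumes one common formulation of the Evanno method, ΔK = |mean L(K+1) − 2·mean L(K) + mean L(K−1)| / sd(L(K)), computed over replicate runs at each K; the log-likelihood values below are hypothetical and are not the values obtained in this study.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp_by_k: Dict[int, List[float]]) -> Dict[int, float]:
    """Compute Delta K for each K that has both neighbours K-1 and K+1.

    `lnp_by_k` maps K to the log-likelihoods of its replicate runs.
    Assumed formulation: |mean L(K+1) - 2*mean L(K) + mean L(K-1)| / sd(L(K)).
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    sds = {k: stdev(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if (k - 1) in means and (k + 1) in means and sds[k] > 0:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / sds[k]
    return result


# Hypothetical replicate log-likelihoods for K = 1..4 (illustration only).
runs = {
    1: [-3000.0, -3001.0, -2999.0],
    2: [-2500.0, -2501.0, -2499.0],
    3: [-2450.0, -2460.0, -2440.0],
    4: [-2430.0, -2445.0, -2415.0],
}

dk = delta_k(runs)
best_k = max(dk, key=dk.get)  # K with the largest Delta K
```

Note that Delta K is undefined for the smallest and largest K tested because it depends on both neighbouring values, so K = 1 cannot be selected by this criterion.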
Cannabis is a short-day crop that is sensitive to photoperiod. Flowering time is an important agronomic trait that affects cannabidiol (CBD) content and fiber yield. In addition to the InDel marker analysis, the 115 cannabis genotypes were also clustered into two groups according to their flowering time: the cultivars of group 1 mainly originated from South China, and most of the varieties in group 2 were from North China (Fig. 6), consistent with the results of the population structure analysis and UPGMA clustering (Fig. 4 and Fig. 5). In general, when cannabis cultivars originating from the northern regions are introduced to the southern regions, the plants flower early. In this study, although the cultivars ‘22’ and ‘214’ originated from the northern regions of China (LiaoNing and HeiLongJiang provinces, respectively), the plants did not flower early when cultivated in the southern region of China (HuNan province), which indicates that these cultivars may be insensitive to photoperiod. Thus, these two cultivars would be ideal germplasm for developing cannabis varieties that are widely adaptable to different day lengths.
Because female and male plants differ in economic value, a suitable ratio of female to male individuals is important for enhancing economic efficiency. To overcome the difficulty of accurately identifying sex by morphological methods before flowering, three types of sex-linked molecular markers, namely SSR, AFLP, and RAPD, have been developed in previous studies [11-14]. In this study, sex-linked InDel markers were identified in cannabis for the first time. Interestingly, similar to the sex-linked SSR marker CS308 [14], fragments of the same size were present in both female and male plants using I1-10, indicating that these markers are not specific to the Y chromosome, which differs from the markers MADC1 to MADC3 on the Y chromosome [11-13].
Sex-linked markers can provide an entry point for identifying sex-linked sequences, which helps to find the genes involved in sex determination and differentiation [38]. Unfortunately, no hits were found for the I1-10 sex-linked markers, which were mapped onto the sex chromosome, in a BLAST search against the NCBI nucleotide database (Table S3). A possible reason is that the Y chromosome-specific fragment amplified using I1-10 was only 251 bp in length, which provides limited sequence information. Thus, genome walking will be used in further studies to obtain the unknown DNA regions flanking the I1-10 marker.