Morphological and physiological characterization
We collected 21 accessions from 14 provinces of China (Table 1, Fig. 3A). Based on experience and on phenotypic characteristics observed in their original growing areas, nine accessions were classified as wild cannabis and 12 as cultivated cannabis, the latter comprising 10 landraces and two breeding varieties (Fig. 1). To determine whether differences among the accessions were caused by environment or genetics, we planted wild and cultivated cannabis in the same environment (Kunming). Obvious differences persisted (Table S1 and Fig. 2), indicating that the phenotypic differences between wild and cultivated cannabis have a genetic basis. One important difference was seed size: wild cannabis yielded small seeds (3.15 to 9.80 g/1000 grains, mean 6.86 g/1000 grains), whereas cultivated cannabis yielded large seeds (17.40 to 63.13 g/1000 grains, mean 34.24 g/1000 grains). Mature seeds from wild plants fell off the pedicel easily, and most wild seeds had an obvious fleshy caruncle at the base (elongated attachment base). Germination tests showed that the natural germination rate of wild seeds was less than 2% at room temperature, and that cold (4 °C), wet stratification was necessary for their germination (Table S1).
Table 1 Sample information
| Accession name | Sample ID | Origin / location | Latitude (°N) | Type | Seed weight (g/1000 grains) | Seed coat (Y/N) |
|---|---|---|---|---|---|---|
| W163 | W1 | Yunnan, China | 27.70 | W | 5.63 | N |
| W270 | W2 | Tibet, China | 29.59 | W | 5.42 | N |
| W606 | W3 | Xinjiang, China | 43.92 | W | 3.15 | Y |
| W274-C | W4 | Xinjiang, China | 43.48 | W | 6.91 | Y |
| W254-B | W5 | Inner Mongolia, China | 41.50 | W | 9.80 | Y |
| W594 | W6 | Liaoning, China | 42.68 | W | 9.56 | Y |
| W596 | W7 | Jilin, China | 45.06 | W | 5.78 | Y |
| W50 | W8 | Shandong, China | 36.41 | W | 8.29 | Y |
| W645-A | W9 | Inner Mongolia, China | 50.16 | W | 7.19 | Y |
| C466 | C1 | Qinghai, China | 36.50 | L | 48.34 | N |
| C294 | C2 | Gansu, China | 39.42 | L | 33.61 | N |
| C263 | C3 | Inner Mongolia, China | 42.15 | L | 52.45 | N |
| C480 | C4 | Shaanxi, China | 38.28 | L | 38.32 | N |
| C623 | C5 | Shanxi, China | 37.87 | B | 29.22 | N |
| C602 | C6 | Shanxi, China | 37.43 | L | 40.96 | N |
| C597 | C7 | Jilin, China | 45.06 | L | 19.32 | Y |
| Lu'an HanMa | C8 | Anhui, China | 31.45 | L | 18.53 | Y |
| Bama HuoMa | C9 | Guangxi, China | 24.15 | L | 17.40 | N |
| C197 | C10 | Guizhou, China | 26.66 | L | 25.36 | N |
| C102-B | C11 | Yunnan, China | 24.24 | L | 63.13 | N |
| YunMa1 | C12 | Yunnan, China | 26.11 | B | 24.27 | N |
| Purple Kush | PK | USA (https://www.ncbi.nlm.nih.gov/sra/?term=SRP008673) | - | B | - | - |
| Chemdawg | CD | https://www.medicinalgenomics.com | - | B | - | - |
| Harlequin (14569) | HL | USA (https://www.ncbi.nlm.nih.gov/sra/?term=SRR4446095) | - | B | - | - |
| Finola | FN | Finland (https://www.ncbi.nlm.nih.gov/sra/?term=SRP008673) | 61.98 | B | - | - |
| USO-31 | US | Ukraine (https://www.ncbi.nlm.nih.gov/sra/?term=SRP008673) | 50.08 | B | - | - |
W, Wild (judged from experience); L, Landrace (domesticated, locally adapted, traditional variety); B, Breeding variety (cultivar selected by humans for desirable traits); "-" indicates missing data.
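As a quick arithmetic check, the seed-weight ranges and means quoted in the text can be recomputed from the Table 1 values. The sketch below is illustrative only; the variable and function names are our own, and the values are simply transcribed from the table.

```python
from statistics import mean

# Thousand-seed weights (g/1000 grains) transcribed from Table 1.
wild = {"W1": 5.63, "W2": 5.42, "W3": 3.15, "W4": 6.91, "W5": 9.80,
        "W6": 9.56, "W7": 5.78, "W8": 8.29, "W9": 7.19}
cultivated = {"C1": 48.34, "C2": 33.61, "C3": 52.45, "C4": 38.32,
              "C5": 29.22, "C6": 40.96, "C7": 19.32, "C8": 18.53,
              "C9": 17.40, "C10": 25.36, "C11": 63.13, "C12": 24.27}

def summarize(weights):
    """Return (minimum, maximum, mean) of a sample-to-weight mapping."""
    values = list(weights.values())
    return min(values), max(values), mean(values)

wild_summary = summarize(wild)
cultivated_summary = summarize(cultivated)

for label, (lo, hi, avg) in (("wild", wild_summary),
                             ("cultivated", cultivated_summary)):
    print(f"{label}: {lo:.2f} to {hi:.2f} g/1000 grains, mean {avg:.2f}")
```

The two ranges do not overlap: the heaviest wild seeds (9.80 g/1000 grains) are still lighter than the lightest cultivated seeds (17.40 g/1000 grains).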
Except for those from Yunnan (W1) and Tibet (W2), the seeds of the other seven wild accessions all had a seed coat (camouflage covering, a thin dark brown film attached to the surface of a seed), whereas among cultivated accessions only those from Jilin (C7) and Anhui (C8) had a small amount of seed coat (Fig. 2). Wild cannabis also bloomed earlier than domesticated cannabis: although W1 and W2 flowered after about 55 days, the other wild accessions flowered in less than 35 days (Table S1). In addition, the height of the first branch, petiole length, compound leaf width and leaflet width of wild cannabis were significantly lower than those of cultivated cannabis (Fig. S1). We also observed that landraces from high latitudes (C1-C4, C6, C7) showed early flowering, early maturity, dwarf stature and almost no branches when planted at low latitude (Kunming) (Fig. S1). In contrast, wild cannabis still produced a relatively large number of branches in Kunming.
Sequencing, variation and heterozygosity
To investigate the genetic basis of the differences between wild and cultivated cannabis, we performed whole-genome resequencing of the 21 Chinese accessions on the Illumina HiSeq 2000 platform. In addition, genome sequencing data for three marijuana and two European fiber cannabis accessions were obtained from public sources (Table 1). After genotyping and stringent quality filtering, the high-quality reads were mapped to the most contiguous and complete cannabis genome assembly (GCA_900626175.2), with mapping rates ranging from 89.03% to 98.57% and an average genome coverage depth of 10.83× (Table S2). Based on comparisons to the reference genome, we identified 18.07 million single-nucleotide polymorphisms (SNPs) located on the nine autosomes and the X chromosome for further analysis. Most SNPs (84.28%) were located in intergenic regions and only 5.09% were located in coding sequence regions (Table S3). For SNPs in coding regions, the ratio of nonsynonymous to synonymous substitutions was 0.746 in Chinese wild cannabis and 0.760 in Chinese cultivated cannabis (Table S3). These ratios were lower than those reported for self-pollinated crops: Arabidopsis (0.83) [29], rice (1.29) [30], soybean (1.61) [31] and tomato (1.45) [32].
The proportion of heterozygous SNPs among the total number of SNPs of 26 samples was also calculated (Table S4). Heterozygosity in cannabis from Northwest China (including wild cannabis in Xinjiang) and foreign marijuana was relatively high (HL had the highest heterozygosity of 0.705), and the heterozygosity was higher than that of cannabis from other regions of China and European hemps. It should be noted that the well-known “Bama HuoMa” (C9), which has been cultivated as a local variety for hundreds of years in Southwest China, displayed low SNP heterozygosity, with a heterozygosity of 0.541. US and FN, two fiber hemps from Europe, had the lowest heterozygosity of 0.492 and 0.530, respectively.
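The heterozygosity measure used above, the proportion of heterozygous SNPs among a sample's SNPs, can be illustrated with a minimal sketch. It assumes VCF-style diploid genotype strings and assumes that homozygous-reference and missing calls are excluded from the denominator; the exact counting rules behind Table S4 are not stated here, and the function name and toy genotypes are hypothetical.

```python
def het_proportion(genotypes):
    """Fraction of a sample's variant calls that are heterozygous.

    genotypes: iterable of VCF-style GT strings such as "0/1" or "1|1".
    Assumption: homozygous-reference ("0/0") and missing ("./.") calls
    are not counted as SNPs for this sample.
    """
    het = hom_alt = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # skip missing or non-diploid calls
        if alleles[0] != alleles[1]:
            het += 1
        elif alleles[0] != "0":
            hom_alt += 1
    total = het + hom_alt
    return het / total if total else float("nan")

# Toy genotypes for one sample (not real data).
toy_calls = ["0/1", "1/1", "0/0", "0|1", "./.", "1/1", "0/1"]
toy_het = het_proportion(toy_calls)
print(f"heterozygous proportion: {toy_het:.3f}")
```

In the toy sample, three of the five variant calls are heterozygous, so the proportion is 0.6.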
Population structure of wild and cultivated cannabis
To identify genetic population structure and relationships among the 26 samples at the genomic level, we conducted principal component analysis (PCA) [33] and reconstructed a neighbor-joining (NJ) tree using the 18.07 million high-quality SNPs. In the PCA, the first and second eigenvectors divided the samples into three groups: Chinese cannabis, marijuana and European fiber cannabis (Fig. 3B). This suggests that Chinese cannabis is a distinct population. Within Chinese cannabis, wild cannabis was separated from cultivated cannabis, and the two wild samples from Xinjiang were relatively independent of the others; the samples also showed geographic clustering. The NJ tree of Chinese cannabis agreed with the PCA results (Fig. 3C) [34]. Wild cannabis diverged from cultivated cannabis, and each group split into two branches. Wild cannabis samples from Southwest China (W1 and W2) clustered together and were relatively independent of the other wild accessions. Similarly, two landraces from Jilin (C7) and Anhui (C8) clustered into one branch that was relatively independent of the other cultivated samples. This indicates that these four accessions (W1, W2, C7 and C8) may be intermediate types between wild and cultivated cannabis. In the NJ tree, US and FN, two typical European fiber cannabis, were genetically closest to Chinese wild cannabis, especially cannabis from Xinjiang (W3 and W4).
To explore the genetic relationships among cannabis resources, we performed a structure analysis that clusters individuals into different numbers of ancestral populations (K) using a block relaxation algorithm (Fig. 3D) [35]. The results were consistent with the PCA and the phylogenetic tree. For K = 2, the five overseas cannabis samples and the two wild cannabis samples from Xinjiang (W3 and W4) formed a subgroup distinct from the other cannabis samples in China. For K = 3, Chinese cannabis could be divided into two subgroups: wild and cultivated. Two wild cannabis samples (W1 and W2) and one landrace (C8) showed hybrid lineages between wild and cultivated cannabis in China, while W3 and W4 showed an admixture between wild and foreign resources. For K = 4, the two fiber hemps (US and FN) separated from the three marijuana samples and formed a subgroup with W3 and W4. These results suggest that cannabis resources in Xinjiang may have played an important role in the domestication and spread of cannabis.
Genetic diversity and domestication genes
Genome-wide patterns of genetic diversity for all samples were estimated using the parameter θπ [36]. As shown in Fig. 3C, diversity was similar across the cannabis groups. The average diversity of Chinese wild and cultivated cannabis was 4.09×10⁻³ and 4.07×10⁻³, respectively, indicating that the diversity of the wild and cultivated resources was similar. These values were comparable to those of European fiber cannabis (3.59×10⁻³) and marijuana (4.14×10⁻³). The higher diversity of Chinese cannabis relative to European fiber cannabis may be due to less intensive breeding.
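For readers unfamiliar with θπ, it is the average number of nucleotide differences per site between two sequences drawn from the sample. The sketch below computes it for one window from alternate-allele counts at biallelic SNPs; it assumes no missing genotypes and is not the software used in this study. The function name and toy counts are hypothetical.

```python
def nucleotide_diversity(alt_counts, n_chrom, window_len):
    """Per-site theta-pi for one genomic window.

    alt_counts: alternate-allele count at each biallelic SNP in the window
    n_chrom:    number of sampled chromosomes (2 x diploid individuals)
    window_len: number of sites in the window (variant and invariant)
    Assumption: every chromosome is genotyped at every SNP.
    """
    n_pairs = n_chrom * (n_chrom - 1) / 2
    # A SNP with c alternate alleles differs in c * (n - c) of the pairs.
    n_diffs = sum(c * (n_chrom - c) for c in alt_counts)
    return n_diffs / n_pairs / window_len

# Toy window: 2 diploid individuals (4 chromosomes), 2 SNPs, 1000 sites.
toy_pi = nucleotide_diversity(alt_counts=[1, 2], n_chrom=4, window_len=1000)
print(f"theta-pi = {toy_pi:.6f}")
```

In the toy window, the two SNPs contribute 3 and 4 pairwise differences across 6 chromosome pairs, giving 7/6 differences per pair over 1000 sites.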
After excluding outliers and admixed individuals, we used the coefficient of nucleotide differentiation (FST) and the difference in nucleotide diversity across populations (Δπ) to scan the chromosomes for signals of positive selection. The X chromosome is more sensitive to domestication history and selective effects than autosomes [37, 38]. For each method, the top 0.5% of windows were therefore selected separately for the autosomes and the X chromosome and used for gene annotation. Overall, FST identified 458 positively selected genes (PSGs) and Δπ identified 203 PSGs, of which 75 were common to both methods (Table S5, Fig. 4).
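The outlier step described above (take the top 0.5% of windows for each statistic, then keep what both statistics share) can be sketched as follows. This is an illustration on synthetic scores, not the pipeline used here: the window identifiers and values are invented, larger scores are assumed to mean stronger signals, and in practice the step would be run separately for the autosomes and the X chromosome before mapping windows to genes.

```python
def top_windows(scores, fraction=0.005):
    """Return the window IDs whose scores fall in the top `fraction`.

    scores: dict mapping window ID -> statistic (e.g. FST or delta-pi).
    Assumption: larger values indicate stronger selection signals.
    """
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

# Synthetic scores for 1000 windows on one chromosome set (not real data).
fst = {w: w / 1000 for w in range(1000)}   # highest in windows 995-999
delta_pi = {w: 0.0 for w in range(1000)}
for window, value in [(997, 5.0), (998, 4.0), (999, 3.0), (10, 2.0), (20, 1.0)]:
    delta_pi[window] = value

fst_top = top_windows(fst)
delta_pi_top = top_windows(delta_pi)
shared = fst_top & delta_pi_top
print(sorted(shared))
```

Requiring support from both statistics is more conservative than either scan alone, which is consistent with the common set of 75 PSGs being much smaller than the 458 and 203 genes found by the individual methods.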
Among the 75 common PSGs, three were related to flowering. CENTRORADIALIS (CEN)-like protein 1 (encoded by CET1) is strongly expressed in developing inflorescences in Arabidopsis and Antirrhinum [39, 40]. Its overexpression delays flowering and alters flower architecture in Hevea brasiliensis [41]. Histone-lysine N-methyltransferase (SUVR5) mediates H3K9me2 deposition and affects flowering time by binding partner lysine-specific histone demethylase 1 homolog 1 (LDL1) [42]. Nuclear poly(A) polymerase 4 (PAPS4) creates the 3'-poly (A) tail during maturation of pre-mRNAs that affects mRNA stability [43]. Overexpression of PAPS4 results in earlier flowering and reduces FLOWERING LOCUS C (FLC) expression in Arabidopsis [44].
We also identified five PSGs related to the growth and development of plants. 3-Hydroxy-3-methylglutaryl-coenzyme A reductase 1 (HMGR1) is the key limiting enzyme in phytosterol biosynthesis in plants [45]. Overexpression of HMGR1 results in elevated sterols, early flowering, increased stem height, increased biomass and increased total tuber weight in Solanum tuberosum [46]. Tetratricopeptide repeat protein (PYG7) is localized to the stromal thylakoid and essential for photosystem I assembly in Arabidopsis [47]. The serine/threonine protein kinase Constitutive Triple Response 1 (CTR1) is a negative regulator of the ethylene response pathway in Arabidopsis [48]. Ethylene is important for plant growth, development and stress responses [49]. Zinc finger CCCH domain-containing protein 2 (TZF4), a transcriptional regulator, affects seed germination by controlling genes critical for ABA and GA responses in Arabidopsis [50]. AT-rich interactive domain-containing protein 5 (ARID5) is a subunit of a plant-specific imitation switch complex and regulates development and floral transition in Arabidopsis [51].
Furthermore, we identified four genes related to stress responses. Sensitive to proton rhizotoxicity 1 (STOP1), a zinc finger transcription factor, regulates various stress tolerances in Arabidopsis. For example, STOP1 is activated to rapidly inhibit root cell elongation under external phosphate limitation conditions [52]. It is also crucial for proton and aluminum tolerance in Arabidopsis [53]. Additionally, it reduces the expression of CBL-interacting protein kinase 23 (CIPK23) to regulate potassium (K+) homeostasis under salt and drought stress [54]. Homeobox-leucine zipper protein (HAT22, also named ABIG1), is up-regulated in response to drought and abscisic acid treatment in Arabidopsis [55]. Its overexpression reduces the chlorophyll content of seedlings and causes earlier onset of leaf senescence in Arabidopsis [56]. Microrchidia 2 (MORC2) contributes to resistance against disease and pathogen-associated molecular immunity triggered by R proteins [57, 58]. The K+ channel encoded by KAT3, also known as AtKC1, is a Shaker-like K+ channel that regulates the uptake and biomass allocation of K+ in Arabidopsis roots under low K+ stress [59].
Flowering time and flowering gene expression
Although we recorded the flowering time of cannabis from different latitudes of China under the natural short-day (SD) conditions of Kunming (Table S1), it was also necessary to study the flowering response of the different accessions under long-day (LD) conditions. All wild cannabis accessions produced flower buds within the first 50 days under LD conditions, and accessions from high latitudes budded slightly earlier than those from low latitudes. W9, from 50.16°N, produced flower buds only 31 days after planting. In contrast, cultivated cannabis from both the North and the South remained in vegetative growth without budding until 100 days after planting. Next, we designed a set of experiments to study flowering gene expression in the wild and cultivated subpopulations under LD and SD conditions. Two wild (W9 and W4) and two cultivated (C4 and C10) accessions from high and low latitudes were selected to examine expression differences of flowering integration factors (FT-like and SOC1) and flowering regulation factors (FLC-like and CET1).
Buds of W9 and W4 appeared at timepoints LD3 and LD4 under LD conditions, respectively, while buds on C4 and C10 did not appear until SD3 under SD conditions. Under LD conditions (LD2-LD4), expression levels of FT-like in W9 and W4 were significantly (p < 0.01) higher than those in C4 and C10 (Fig. 5). Notably, across the four LD timepoints, FT-like expression was positively correlated with the latitude of origin (Fig. 5): the higher the latitude of the accession, the greater the expression of FT-like. W9 had the highest FT-like expression at all four timepoints, reaching a maximum relative expression value of 63 at LD3 (Fig. 5). Under SD conditions, FT-like expression was rapidly induced to a high level in all four samples, with relative expression levels of 10339, 9228, 11627 and 4959 at SD3, respectively. Under LD conditions (LD1-LD3), the expression of SOC1 in W9 and W4 was also significantly higher than that in cultivated cannabis (Fig. 5). FLC-like and CET1, as negative regulators of flowering, showed little change in expression in the four samples across developmental stages, with maximum relative expression levels of 7 and 3, respectively, and no significant difference between wild and cultivated samples. These results suggest that wild cannabis can still flower even under extreme long days (18 h light), and that its flowering is promoted by photoperiod-induced FT-like expression.