Genetic Diversity and Population Structure of Indian Mustard (Brassica juncea) Based on Morphological and Molecular Markers


Rapeseed-mustard is one of the most important oilseed crops, providing a major source of edible oil worldwide in addition to its other economic uses as leafy vegetables, ornamentals, and hedge crops. However, the genetic diversity present in the Brassica gene pool has not been investigated in detail. To address this gap, a study was conducted on 76 genotypes of B. juncea including cultivars, exotic lines, registered genetic stocks, advanced breeding lines, and germplasm lines. Genetic diversity was analyzed using 50 polymorphic SSR and EST-SSR markers. Across these genotype-marker combinations, a total of 126 alleles were amplified. Using molecular and phenotypic data, the dendrogram was constructed using Jaccard's similarity coefficient, the Manhattan dissimilarity coefficient, and the UPGMA linkage algorithm. All the genotypes were grouped into 5 clusters based on their dissimilarity matrix. Population structure analysis grouped the genotypes into 8 clusters, and various degrees of admixture were also observed. The grouping of genotypes appears consistent with their pedigree. The marker data were found to be more accurate than the quantitative trait data for characterizing diversity and studying population structure. The results of the present investigation will provide useful information for identifying important alleles in future studies and pave the way to enhancing genetic gains in Indian mustard.


Introduction
Rapeseed-mustard crops are grown around the world across a spectrum of agro-climatic zones, under conditions ranging from rainfed to irrigated. In India, these crops are grown on normal to saline soils, from the northeastern to the western hills, and in different cropping systems. Among the various oilseed crops, Indian mustard (B. juncea L. Czern & Coss) is one of the most important edible oilseed crops, contributing 9.3 MT of production with a productivity of 1.5 t/ha during 2018-19 (Bodh et al. 2020). However, owing to a yield ceiling and losses to biotic and abiotic stresses, there is a large gap between the supply of and demand for edible oil in the country. Mustard germplasm is a rich source of valuable genes that plant breeders can use to develop improved varieties with resistance or tolerance to various biotic and abiotic stresses along with higher yield. The success of any breeding program depends on the genetic diversity in the germplasm and on its intensive exploitation and utilization in crop improvement (Singh et al.). Parents that differ genetically, mainly in the number of genes and/or in their allelic forms, give good heterosis. Therefore, assessing genetic diversity among genotypes is important because it has a significant impact on the success of the breeding program. Knowledge of the population structure, similarity and divergence of B. juncea genotypes will help breeders predict the best parents for Brassica improvement programs. Several methods are available to estimate the diversity in the germplasm, viz., evaluation of phenotypic variation, biochemical analysis, and DNA-based polymorphism analysis. However, phenotypic characterization is unreliable because it is affected by the environment, is labor-intensive, and is numerically and phenologically limited (Duminil and Di Michele 2009). In contrast, DNA-based molecular markers are reproducible, ubiquitous, stable, and reliable (Song et al. 1988; Snowdon and Friedt 2004).
Several classes of molecular markers are available. Among them, microsatellite markers (simple sequence repeats, or SSRs) are considered the most suitable owing to their codominant and multiallelic nature, higher reproducibility, abundance and genome-wide coverage (Kumar et al. 2015). A number of SSR and EST-SSR markers with varying degrees of polymorphism have been developed for Indian mustard (Shi et al. 2014; Singh et al. 2016a). Therefore, SSR markers covering the entire genome will support an unbiased study of the genetic diversity of Indian mustard, which in turn gives a molecular description of mustard cultivars.
The diversity of Indian mustard germplasm has been assessed earlier by several researchers (Bansal et al. 2009; Ghosh et al. 2009; Singh et al. 2013b). However, most of these studies were based on a small number of genotypes and used less effective RAPD markers and nondescript SSR markers. It has been hypothesized that the use of random markers for diversity analysis may not capture the functional variability present in the coding region of the genome (Zhang et al. 2010). Identification of variation occurring at a heterotic locus is crucial for plant breeding. Therefore, it is pertinent to study genetic diversity using random as well as EST-SSR markers, which would also demonstrate their suitability for assessing diversity at the genetic level.
In the present study, we selected 76 genotypes, including landraces, elite cultivars, mega-varieties, breeding lines and registered genetic stocks with known pedigree and variability, with the objective of identifying different heterotic groups. The knowledge gained will be useful for exploiting heterosis and recombination in the future when developing mapping populations, viz., recombinant inbred lines (RILs), near isogenic lines (NILs) and nested association mapping (NAM) populations, and in other Brassica crop improvement programs.

Plant material
The experiment was conducted at ICAR-Directorate of Rapeseed-Mustard Research, Bharatpur, Rajasthan, India, in a randomized block design with three replications, and phenotypic data were recorded over two years. The 76 mustard accessions used in this study were obtained from the germplasm section of ICAR-Directorate of Rapeseed-Mustard Research, Bharatpur, Rajasthan, and represent landraces, popular cultivars, breeding lines and registered genetic stocks from different mustard breeding centers across the country (Table 1). The genotypes were sown in two rows of five meters each. The experiment was carried out with standard agronomic practices.

Selection of SSR markers
A set of 120 SSRs along with 20 EST-SSR markers developed in our laboratory were selected for the analysis (Singh et al. 2016a). These markers could effectively illustrate the diversity among the selected germplasm (Table S1).

SSR assay
Total genomic DNA was isolated by a modified liquid N2-free method (Siddiqui et al. 2011) and assayed with a total of 140 SSR and EST-SSR markers. The PCR mixture contained 25-30 ng of template DNA, 5 pmol of each primer, 0.05 mol dNTPs, 10x PCR buffer, and 0.5 U Taq polymerase in a reaction volume of 10 µl. The PCR cycle consisted of an initial denaturation at 94 °C for 5 min, followed by 40 cycles (1 min denaturation at 94 °C, 1 min annealing at 55-60 °C and 1 min extension at 72 °C) and a final extension at 72 °C for 7 min. The PCR products were separated on a 3.5% agarose gel run for 3 h in 1x TAE buffer. PCR amplicons were stained with EtBr and visualized under a UV transilluminator.

Data analysis
All statistical analyses were conducted using R software (R Core Team 2021).

Principal component analysis (PCA)
Pearson's correlation coefficients between the phenotypic traits are presented in Fig. 1. Some characters, such as PH, PB, SMS, and TSW, were highly correlated, which may bias the diversity analysis through multicollinearity. Therefore, principal component analysis (PCA) was performed to estimate the variability independently. The important traits, along with their eigenvalues obtained from PCA of the correlation matrix, are given in Table 3, and the contributions of the different PCs are shown in Figure S1. A significant part of the total variation (69.5%) was accounted for by only four major PCs. PC1 and PC2 together explained 44.6% of the total variability observed. The variables contributing to the first two PCs are given in Fig. 2. Traits such as SD, PB, and TSW contributed significantly to PC1, while SL, MSL, SB, and SMS showed significant contributions to PC2.
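The two quantities used in this section, the pairwise Pearson correlation and the share of variance explained by each principal component, can be illustrated with a minimal sketch. The study itself used R; the Python below is only an illustration, and the trait values and eigenvalues are hypothetical, not data from this study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def explained_variance(eigenvalues):
    """Return (proportions, cumulative proportions) of variance per component."""
    total = sum(eigenvalues)
    props = [ev / total for ev in eigenvalues]
    cumulative, running = [], 0.0
    for p in props:
        running += p
        cumulative.append(running)
    return props, cumulative

# Hypothetical measurements of two traits on five genotypes.
plant_height = [180.0, 195.0, 170.0, 210.0, 200.0]
primary_branches = [5.0, 6.0, 4.0, 8.0, 7.0]
r = pearson(plant_height, primary_branches)

# Hypothetical eigenvalues of a 4-trait correlation matrix (they sum to 4).
props, cumulative = explained_variance([2.0, 1.0, 0.6, 0.4])
```

For a PCA of a correlation matrix, the eigenvalues sum to the number of traits, so each proportion is simply the eigenvalue divided by that number.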

Clustering of genotypes based on phenotype
Agglomerative (bottom-up) clustering of the 76 genotypes on the basis of phenotypic data was performed by calculating a Euclidean distance matrix as the dissimilarity measure and applying two linkage methods (complete and Ward). The Ward and complete linkage analyses each resulted in 5 groups, named A-E and 1-5, respectively (Fig. 3). In Ward's clustering, group A is the largest with 38 genotypes, followed by group B (13 genotypes), group C (12), and group E (7), whereas D is the smallest group with 6 genotypes. In complete linkage clustering, group 3 is the largest with 25 genotypes, followed by group 1 (20), group 4 (15), group 2 (9), and group 5 (7 genotypes). These major clusters corresponded to the clusters identified by the two-dimensional scaling of the first two PCs.
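As an illustration of the bottom-up procedure described above, the following is a minimal sketch of complete-linkage clustering on Euclidean distances. It is not the code used in the study (which was run in R), it does not implement Ward's method, and the trait vectors are hypothetical. In practice the traits would normally be standardized before distances are computed.

```python
from math import dist  # Euclidean distance, Python 3.8+

def complete_linkage(points, n_clusters):
    """Merge clusters bottom-up until n_clusters remain.

    The distance between two clusters is the largest Euclidean distance
    between any pair of their members (complete linkage).
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical two-trait profiles for five genotypes.
traits = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
groups = complete_linkage(traits, 3)
```

Each returned group is a list of row indices into `traits`; cutting the same merge sequence at a different `n_clusters` gives a coarser or finer grouping.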
The first group is the largest and includes many significant variants such as Pusa Bold, RH-749, and UP-170. The second group consisted of DRMRIJ-1206, BioQ-108, NE68, SN55, etc., while the third group contained B-15, HP-11, RH-114, and IC-520747. Further, cluster analysis of all four PCs based on unweighted pair group averaging of the Euclidean distances of the factors revealed three major clusters. These major clusters corresponded to the clusters identified by the two-dimensional scaling of the first two PCs. The contribution of genotypes to the major PCs is shown in Table S2. Genotypes Heera (6.7%), EC-557025 (6.7%), RH-114 (5.6%), and BT-15 (5.5%) contributed the most to the first component whereas RH-second component.
Most of the genotypes of group B of Ward linkage were grouped together into cluster 4 of complete linkage. Genotypes of group A (Ward linkage) were divided between groups 1 and 3 of the complete linkage clusters. Group E of the Ward clustering corresponded closely to group 5 of the complete linkage clustering, with 7 genotypes in common, viz., IC-520747, BT-15, JCR-914, Dwarf, HP-11, B-15, and RH-114. Groups B, C and D corresponded to groups 4, 1 and 2, respectively.
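The correspondence between the two clusterings can be summarized as a contingency table that counts how many genotypes share each pair of group labels. The sketch below uses hypothetical label vectors for six genotypes, not the assignments from this study.

```python
from collections import Counter

def cross_tabulate(labels_a, labels_b):
    """Count genotypes for every (label in clustering A, label in clustering B) pair."""
    return Counter(zip(labels_a, labels_b))

# Hypothetical group labels for the same six genotypes under two methods.
ward_groups = ["A", "A", "A", "B", "B", "E"]
complete_groups = [1, 3, 1, 4, 4, 5]
table = cross_tabulate(ward_groups, complete_groups)

for (ward_label, complete_label), count in sorted(table.items()):
    print(ward_label, complete_label, count)
```

A Ward group that maps almost entirely onto one complete-linkage group shows up as a single large count in its row, while a group that is split shows up as several smaller counts.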

K-means clustering of genotypes
The K-means clustering method is used to optimally group individuals into K groups. K-means clustering of the mustard genotypes based on phenotypic data at k = 2, 3, 4 and 8 is shown in Fig. 4. The optimal K value was estimated to be 4 (Figure S2). At k = 2, the genotypes were grouped into 2 distinct clusters, but at higher k values the clusters started to merge, similar to the pattern seen in the population structure analysis.
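A minimal sketch of K-means (Lloyd's algorithm) is given below to make the grouping step concrete. It is an illustration only: the initial centroids are simply the first k points (a simplifying assumption; practical analyses use random restarts), the data are hypothetical, and the source does not state which criterion was used to select the optimal K. The within-cluster sum of squares shown here is one common input to such criteria.

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_means(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

def within_cluster_ss(points, labels, centroids):
    """Total squared distance of points to their assigned centroid."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical two-trait data with two obvious groups.
data = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
labels2, centroids2 = k_means(data, 2)
labels1, centroids1 = k_means(data, 1)
wss2 = within_cluster_ss(data, labels2, centroids2)
wss1 = within_cluster_ss(data, labels1, centroids1)
```

Plotting the within-cluster sum of squares against k and looking for the point where further increases in k give little improvement is one way to choose K.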

Molecular Marker Characterization
The selected panel of SSR and EST-SSR markers showed allelic variation ranging from 1 to 5 alleles per marker. Of the 140 markers selected, 50 showed polymorphism. The PIC values of these markers ranged from 0.1 (cnum587aF) to 0.79 (gi258660353). The average PIC of the polymorphic markers was 0.47, and the maximum number of alleles was found in cnu_m583aF.
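The source does not state how PIC was computed. The sketch below assumes the widely used formula of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i at a marker; some studies instead report 1 - Σ p_i². The allele calls are hypothetical.

```python
from collections import Counter

def allele_frequencies(calls):
    """Frequencies of each allele from a list of allele calls at one marker."""
    counts = Counter(calls)
    total = sum(counts.values())
    return [n / total for n in counts.values()]

def pic(freqs):
    """Polymorphism information content (assumed Botstein et al. 1980 formula)."""
    sum_sq = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - sum_sq - cross

# Hypothetical allele calls for one marker scored on four genotypes.
calls = ["a", "a", "b", "b"]
marker_pic = pic(allele_frequencies(calls))
```

Under this formula a monomorphic marker has a PIC of 0, and a marker with two equally frequent alleles has a PIC of 0.375.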

Clustering of genotypes based on molecular markers
Clustering of the genotypes based on the 50 polymorphic markers generated two major groups and nine sub-groups (Fig. 5). Cluster A is the largest with 23 genotypes, followed by clusters I, B and D with 12, 11 and 10 genotypes, respectively. Cluster C and G have 10  Cluster H has two genotypes, Pusa Mahak and BPRQ-215, which share early flowering, but BPRQ-215 is a double zero quality line whereas Pusa Mahak has considerably higher erucic acid and glucosinolate contents.
Most of the quality lines, viz., PDZ-1, PDZ-2, PDZ-3, RLC-3, RLC-4, and Heera, were placed in cluster G, except Pusa Karishma, which grouped in cluster A; BPRQ-215, which grouped in cluster H; and PM-21 and PM-30, which grouped in cluster I. Cluster I consists primarily of genotypes with biotic stress tolerance, viz., Bio-YSR, DRMRIJ-1237 and DRMRIJ-1206, which are tolerant to white rust, and PAB-9511, PHR-2, RH-1230, and RH-1235, which have various degrees of tolerance to Alternaria blight. These markers revealed a considerably high level of genetic diversity among the mustard genotypes.
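The abstract states that the marker-based dendrogram used Jaccard's similarity coefficient. A minimal sketch of that coefficient on binary allele scores (1 = allele present, 0 = absent) is given below; the score vectors are hypothetical, and returning 0.0 when neither genotype carries any allele is an arbitrary convention chosen here.

```python
def jaccard_similarity(a, b):
    """Shared alleles divided by alleles present in at least one genotype."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def dissimilarity_matrix(profiles):
    """Pairwise 1 - Jaccard similarity for a list of binary allele profiles."""
    n = len(profiles)
    return [
        [1.0 - jaccard_similarity(profiles[i], profiles[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical presence/absence scores for four alleles in three genotypes.
profiles = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
]
matrix = dissimilarity_matrix(profiles)
```

A matrix of this kind is the usual input to an average-linkage (UPGMA) clustering routine, which produces the dendrogram.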

Population structure of genotypes
Population structure analysis of the 76 genotypes classified the population into two groups, which were subdivided into 8 subgroups (Fig. 6). Structure analysis was performed at K = 2 to K = 10. The population structure at K = 8 correlated with the genotypic dendrogram and was selected for further analysis. Genotypes with an individual ancestry proportion (q value) of less than 70% were considered admixed (Table S3).
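The admixture rule can be sketched as a simple threshold on the largest ancestry proportion of each genotype. The sketch assumes that the threshold of 70 in the text means q = 0.70; the q vectors are hypothetical, and the function name is introduced here only for illustration.

```python
def classify_admixture(q_values, threshold=0.70):
    """Return (subgroup index or None, is_admixed) for one genotype.

    q_values holds the ancestry proportions across the K subgroups.
    A genotype whose largest q value is below the threshold is treated
    as admixed and is not assigned to any subgroup.
    """
    top = max(q_values)
    if top < threshold:
        return None, True
    return q_values.index(top), False

# Hypothetical ancestry proportions for two genotypes at K = 3.
pure_like = classify_admixture([0.85, 0.10, 0.05])
admixed = classify_admixture([0.50, 0.30, 0.20])
```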

Discussion
The phenotypic and genetic diversity of 76 Indian mustard genotypes, including germplasm lines, varieties, exotic lines, genetic stocks, and landraces, was studied by analyzing 13 yield component traits and 50 polymorphic microsatellite markers. The genotypes of Indian mustard exhibited significant variation for 13 yield and yield component traits (Table 2) Indian mustard accessions. Clustering of the genotypes by the two methods grouped them in a broadly similar fashion, except for group A, which was mainly divided between groups 1 and 3.
The K-means clustering method is generally used to study population structure by clustering individuals into K groups. The population was divided into 2 large groups at K = 2, and further grouping showed admixture in the population as the clusters started to overlap. This pattern of K-means clustering corresponds to the grouping of the population by STRUCTURE analysis. However, Stift et al. (2019) found that STRUCTURE analysis is better suited than the K-means clustering method for studying population structure.
Previously, genetic diversity in Indian mustard at the molecular level has been carried out using isozyme markers (Kumar and Gupta 1985) and

Conclusions
The results of the present investigation showed that genotypes originating from a particular center clustered together, indicating a narrow genetic base in Indian mustard. In addition, the population structure of Indian mustard showed a high level of admixture in some genotypes, which reveals that the existing genetic lineages have mixed. Therefore, there is a need for broadening the genetic base and diversifying mustard

Dendrogram showing the genetic relationships among 76 Brassica genotypes based on genotypic data. The identified groups are labeled A, B, C, D, E, F, G, H and I.