Metabolic pathway responsive gene encoding enzyme anchored EST–SSR markers based genetic and population assessment among Capsicum accessions

EST–SSR markers derived from genes encoding enzymes are a functional marker system well suited to evaluating genetic and structural differentiation in plants. They are useful for describing divergence in metabolic fingerprinting, ecological interactions, conservation and adaptation among plants. In the present study, this marker system was used to evaluate genetic diversity and population structure among 48 Capsicum accessions. A total of 35 gene encoding enzyme based EST–SSR markers were used, generating 184 alleles at 35 loci with an average of 5.25 alleles per locus. The average values of polymorphic information content, marker index and discriminating power were 0.40, 0.232, and 0.216 respectively, which revealed a noteworthy degree of marker efficacy; marker competency was further supported by primer polymorphism (93.57%) and cross transferability (44.52%). Significant genetic variability (Na = 1.249, Ne = 1.269, I = 0.247, He = 0.163, and uHe = 0.183) was identified among the Capsicum accessions using the EST–SSR markers. The mean values of Nei's gene diversity, total species diversity (Ht), and diversity within population (Hs) were 0.277, 0.240 and 0.170 respectively. The coefficient of gene differentiation (Gst) was 0.296, indicating significant genetic differentiation within the population, and gene flow (Nm) was 1.189, which reflects constant gene flow among populations. AMOVA revealed more genetic differentiation within the population, a result also supported by principal coordinate analysis among the different Capsicum populations. Thus, gene encoding enzyme based EST–SSR markers represent a potent system for estimating genetic and structural relationships and are helpful for studies of relationships or variation in plants.


Introduction
Metabolism and the gene regulation underlying it are fundamental biological processes that are important to every living system. Cells and organisms require coordination between metabolic state and gene expression to maintain homeostasis, cell growth, and differentiation (Knaap and Verrijzer 2016). Cellular demands and conditions fluctuate across developmental stages, from cell to cell and between individuals; hence the amounts and functions of metabolites, and the energetic demands of the cell needed to maintain its metabolic state, differ accordingly. Plants are primary producers of various complex molecules such as metabolites, enzymes, natural products, and secondary metabolites. These molecules are often confined to particular cells, tissues, species and taxonomic lineages, while a wide diversity of them is produced only under specific conditions such as particular developmental stages, ecological situations and environmental conditions. Hence, these molecules are key players in shaping divergence in metabolic fingerprinting, ecological interactions and adaptation among plants. Among these molecules, enzymes work in an organized system known as a metabolic pathway, in which a coordinated series of chemical reactions and molecular transformations takes place within cells. The different biological processes that occur in an organism therefore require the activity of different enzymes, which promote chemical reactions in which the product of one reaction becomes the substrate for the next, creating a cascading effect. The presence of key enzymes is thus a hallmark of specialized metabolic pathways, and these enzymes generate metabolite scaffolds through various chemical transformations (Schläpfer et al. 2017). Enzymes are generally compartmentalized into different cells and organelles, with the metabolic pathways compartmentalized accordingly. This allows an advanced level of system regulation within the cell and maintains cellular homeostasis.
Based on metabolic gene assignment and annotation, we used 35 expressed sequence tag (EST) sequences from genes encoding enzymes that contain simple sequence repeats (SSRs) and belong to various metabolic pathways, including secondary metabolite synthesis pathways. These gene encoding enzyme based EST-SSR markers were validated on various Capsicum accessions for genetic assessment. Chilli (Capsicum spp., 2n = 2x = 24, genome size = 3.3 Gb) is an important fruit vegetable and spice that is widely cultivated around the globe and contains high levels of health promoting ingredients. The genus Capsicum exhibits broad structural and genetic variation in terms of fruit size, color and pungency, which reflects intra- and interspecific relationships (Gupta et al. 2019). Furthermore, ESTs are among the most important nucleotide resources in plant genomics, supporting gene discovery and expression analysis, gene annotation, molecular marker development, transcriptomic profiling, and proteomic exploration (Rudd 2003; Parkinson and Blaxter 2009; Haq et al. 2021). An EST is a short transcribed portion of the genome, ranging from 200 to 800 nucleotide bases in length, obtained as a randomly selected single-pass sequence read from a cDNA library (Nagaraj et al. 2007). With the development of user friendly bioinformatics and computational biology tools, characterization and annotation of EST sequences is now feasible, and ESTs with putative functions can be utilized in functional genomics research. Functional genomics attempts to describe the roles and interactions of genes and gene products through genome wide approaches and focuses on the dynamic aspects of gene transcription, translation and protein-protein interactions (Bunnik and Le Roch 2013).
ESTs have their own biological significance because they lie in expressed regions of the genome, and they are important for the development of molecular markers, especially simple sequence repeats (SSRs or microsatellites). EST-SSR markers per se are functional molecular markers for deciphering "putative functions or particular gene encoding enzymatic activity" (Haq et al. 2014; Singh et al. 2019), which makes them helpful in a variety of genomic applications in plants. SSRs located in expressed regions (ESTs) are more conserved and more transferable across taxonomic boundaries than anonymous SSRs (Pashley et al. 2006; Ellis and Burke 2007; Haq et al. 2014). SSRs or microsatellites are tandemly repeated DNA motifs of 1 to 6 nucleotides that are dispersed randomly and ubiquitously throughout the genome, in both coding and non-coding regions (Ellegren 2004; Haq et al. 2016, 2021). EST-SSRs are a preferred molecular marker in plant genomic investigations such as the evaluation of genetic polymorphism, genetic diversity, population genetics, biodiversity, high resolution genetic maps, gene mapping, QTL (quantitative trait locus) analysis, germplasm characterization, cultivar identification, paternity analyses, marker assisted breeding, and taxonomical and comparative genomic studies (Kantety et al. 2002; Eujayl et al. 2004; Varshney et al. 2007b; Ukoskit et al. 2018; Haq et al. 2021). These applications are possible because of their codominant inheritance, multi-allelic nature, high reproducibility, high polymorphism, high transferability, chromosome-specific location, extensive genome coverage, high informativeness and wide genomic distribution (Agarwal et al. 2008; Parida et al. 2010; Haq et al. 2014).
Considering the wide range of applications of EST-SSR markers, the present study was undertaken to genetically assess 48 Capsicum accessions using 35 EST-SSR markers based on genes encoding metabolic enzymes. This marker system offers a distinctive way to explore differentiation among plants and has not been reported in detail so far.

Plant materials and culturing
A total of forty-eight accessions were included in the present study for genetic assessment and population structure analysis in Capsicum. All the accessions were procured from distinct research centers around India. They comprise 43 varieties of C. annuum L., 3 varieties of C. baccatum L. and 2 varieties of C. frutescens L. (Table 1). Healthy seeds of all accessions were sown in seedling trays and kept in a plant growth chamber under a controlled environment of 26 ± 1 °C, a 16 h photoperiod and a photosynthetic photon flux of 300 μmol m−2 s−1 (Gupta et al. 2019). (Haq et al. 2014). All PCRs were carried out in a final volume of 10 μl using a thermal cycler (Bio-Rad, UK). Each reaction mixture contained 1.0 μl of DNA template (25 ng), 1.0 μl of Taq buffer (10X) with 2.5 mM MgCl2, 1.0 μl of primer (10 pmol/μl), 0.2 μl of dNTPs (10 mM) and 0.1 μl of Taq DNA polymerase (3 U). PCR amplification consisted of an initial denaturation at 94 °C for 3 min, followed by 35 cycles of denaturation at 94 °C for 1.0 min, annealing at 48-52 °C (depending on the primer) for 1.0 min and extension at 72 °C for 2 min, with a final extension at 72 °C for 7.0 min. All PCR products were resolved on a 1.2% agarose gel (Himedia) by electrophoresis in 0.5X TBE (Tris-Borate-EDTA) buffer for ∼1.5 h at 70 V. The gel was stained with ethidium bromide, DNA bands were visualized in a Bio-Rad gel documentation system, and the banding data were used for further analysis.

Efficiency of EST-SSR marker
The amplified bands from PCR amplification profiles were scored as binary matrix (1 for presence and 0 for absence) for each primer. The efficiency of EST-SSR markers was measured by polymorphism information content (PIC), marker index (MI) and discriminating power (DP) using iMEC platform (Amiryousefi et al. 2018). Marker effectiveness was further measured through relative primer polymorphisms and cross-transferability within and across different chilli accessions (Haq et al. 2016). The data of cross amplifications were employed to develop a gradient polar graph based on Euclidean distance methods using TBtools software (Chen et al. 2020).
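As an illustration of how a marker informativeness value can be derived from the binary matrix described above, the following minimal sketch uses one widely used formula for presence/absence data, PIC = 2f(1 − f), where f is the frequency of band presence. This formula is an assumption chosen for illustration; the iMEC platform may compute PIC, MI and DP differently, and the score matrix below is hypothetical rather than data from this study.

```python
from typing import List


def band_frequency(column: List[int]) -> float:
    """Fraction of accessions in which the band is present (scored 1)."""
    return sum(column) / len(column)


def binary_pic(column: List[int]) -> float:
    """PIC for one presence/absence band, assuming PIC = 2 * f * (1 - f)."""
    f = band_frequency(column)
    return 2.0 * f * (1.0 - f)


# Hypothetical binary matrix: rows are accessions, columns are bands.
scores = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
]

# Transpose so that each inner list holds the scores of one band.
bands = [[row[j] for row in scores] for j in range(len(scores[0]))]
pic_values = [binary_pic(band) for band in bands]
mean_pic = sum(pic_values) / len(pic_values)

print(pic_values, mean_pic)
```

Under this formula a band present in every accession (the last column) contributes a PIC of zero, and the maximum of 0.5 is reached when a band is present in exactly half of the accessions.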

Genetic and structural analysis in the population of Capsicum
Genetic structure was characterized by estimating the number of different alleles (Na), number of effective alleles (Ne), Shannon's information index (I), expected heterozygosity (He) and unbiased expected heterozygosity (uHe), and by analysis of molecular variance (AMOVA) and principal coordinate analysis (PCoA), using the GenAlEx 6.5 programme (Peakall and Smouse 2012). The diversity parameters Nei's gene diversity, total species diversity (Ht) and diversity within population (Hs), together with the coefficient of gene differentiation (Gst) and gene flow (Nm), were estimated using the POPGENE 1.32 software (Yeh et al. 1999). Furthermore, the makeup of the Capsicum population was explored with the STRUCTURE software (Pritchard et al. 2000; Agarwal et al. 2018), which uses a clustering algorithm to identify genetic clusters in the form of a K value (number of sub-populations). The analysis was performed in multiple runs for successive values of K from 1 to 10 (Singh et al. 2020). The optimum K value was determined by plotting the mean log posterior probability of the data [L(K)] against K (Pritchard et al. 2000). The output file was used for cluster visualization with STRUCTURE HARVESTER (Earl and VonHoldt 2012). In addition, the binary matrix was used to generate Jaccard's similarity coefficients using the FreeTree software (Pavlicek et al. 1999) and to construct a dendrogram based on the UPGMA (Unweighted Pair Group Method Using Arithmetic Averages) algorithm using the TreeView X software (Page 1996).
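The Jaccard similarity coefficient used for the clustering step can be sketched as follows. For two binary band profiles it is the number of bands present in both accessions divided by the number of bands present in at least one of them. This is a minimal illustration rather than the FreeTree implementation; the profiles are hypothetical, and returning 0.0 when neither accession shows any band is a convention assumed here.

```python
from typing import List


def jaccard_similarity(a: List[int], b: List[int]) -> float:
    """Shared bands divided by bands present in at least one profile."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    present = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumed convention: two empty profiles get similarity 0.0.
    return shared / present if present else 0.0


def similarity_matrix(profiles: List[List[int]]) -> List[List[float]]:
    """Pairwise Jaccard similarities between all accessions."""
    return [[jaccard_similarity(p, q) for q in profiles] for p in profiles]


# Hypothetical band profiles for three accessions.
profiles = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
]

matrix = similarity_matrix(profiles)
for row in matrix:
    print(row)
```

A matrix of this kind, converted to distances as 1 minus similarity, is the usual input for UPGMA clustering.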

EST-SSR amplifications
The potential of the 35 metabolic pathway responsive, gene encoding enzyme based EST-SSR markers was evaluated for genetic assessment within and across 48 Capsicum accessions, comprising 43 accessions of C. annuum, 3 of C. baccatum and 2 of C. frutescens. All markers used in the present study are associated with genes in various metabolic biosynthetic pathways, including secondary metabolic biosynthetic pathways. Of the 35 EST-SSR markers, 22 were trinucleotide repeat SSRs, 11 were tetranucleotide repeat SSRs and 2 were dinucleotide repeat SSRs. All 35 EST-SSR primers were retained because they amplified successfully among the 48 Capsicum accessions (Fig. 1). A total of 184 alleles were obtained at 35 loci, with an average of 5.25 alleles per locus. The number of DNA bands ranged from 1 to 10; markers such as EMS-21 and EMS-13 showed the largest number of bands, while EMS-28, 29 and 33 showed the fewest.

Efficiency of gene encoding enzyme based EST-SSR markers in Capsicum accessions
The efficiency of the gene encoding enzyme based EST-SSR markers among the Capsicum accessions was assessed by estimating polymorphic information content (PIC), marker index (MI) and discriminating power (DP). PIC ranged from 0.39 to 0.51 with an average of 0.40. The differential DNA banding pattern among chilli accessions was described by DP, which ranged from 0.081 to 0.467 with an average of 0.216. Marker index ranged from 0.08 to 0.520 with an average of 0.232 (Table 2). Thus, the EST-SSR markers effectively revealed genetic polymorphism among the chilli accessions taken for this study.

Markers polymorphism and cross-transferability in Capsicum accessions
Primer polymorphism among the 48 chilli accessions ranged from 47.92% to 100% with an average of 93.57% (Additional file 1). The lowest primer polymorphism was seen in EMS-32 (47.92%), while the maximum (100%) was observed in EMS-6, 12, 13, 20 and 25. Cross amplification, or cross transferability, among the 48 chilli accessions ranged from 6.0% to 76% with an average of 44.52% (Additional file 1). The cross amplification data were used to generate a polar gradient graph, which broadly categorized the Capsicum accessions into two major groups (Fig. 2).

Characterization of Population structure of Capsicum accessions
Significant genetic differentiation was observed within and across chilli accessions using AMOVA (P < 0.001), which partitions the overall variation. The result indicated that 99% of the total variance occurred within chillies and 1% among chillies (Fig. 3). Principal coordinate analysis (PCoA) showed more genetic variation within C. annuum (Fig. 4). The structural plasticity of the Capsicum population was further examined by Jaccard's similarity coefficient and UPGMA clustering among the Capsicum accessions. In the structure analysis, membership fractions of the 48 accessions were computed for K values from 2 to 10 (Fig. 5). The log likelihood determined by STRUCTURE indicated an optimal value of K = 2. Consistently, the ad hoc measure ΔK showed a sharp peak at K = 2, which demonstrated the presence of 2 subpopulations (SG1 and SG2) in the entire set of accessions (Fig. 5). Based on the corresponding membership fractions, genotypes with a probability ≥ 80% were assigned to the matching subgroup and the others were termed admixtures. SG1 comprised 32 genotypes, while SG2 included 5 accessions with > 80% probability. The remaining eleven accessions were admixtures (Fig. 6). Further, the genetic relatedness among Capsicum accessions was analyzed by UPGMA cluster analysis. Broadly, two major clusters (I and II) were characterized, with 27 and 12 chilli accessions respectively, and some loose associations among accessions were also observed (Fig. 7). The dendrogram grouped the chilli accessions into distinct branches and showed the closeness among accessions through the branching pattern. The major associations were found among C. annuum accessions, distributed over different branches, while the C. baccatum and C. frutescens accessions were associated with several C. annuum accessions.
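The assignment rule described above (membership ≥ 80% places a genotype in a subgroup, otherwise it is an admixture) can be sketched as a short function. The membership fractions below are hypothetical and the function name is illustrative; it does not read real STRUCTURE output.

```python
from typing import Dict, List


def assign_subgroup(memberships: List[float], threshold: float = 0.80) -> str:
    """Return 'SG<k>' if the largest membership fraction reaches the
    threshold, otherwise 'admixture'."""
    best = max(range(len(memberships)), key=lambda k: memberships[k])
    if memberships[best] >= threshold:
        return f"SG{best + 1}"
    return "admixture"


# Hypothetical membership fractions (Q values) at K = 2.
q_values: Dict[str, List[float]] = {
    "accession_A": [0.93, 0.07],
    "accession_B": [0.15, 0.85],
    "accession_C": [0.60, 0.40],
}

assignments = {name: assign_subgroup(q) for name, q in q_values.items()}
print(assignments)
```

Counting the labels returned for all accessions gives the sizes of the subgroups and of the admixed set.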

Discussion
The EST-SSR markers used in the present study participate functionally in various metabolic pathways, as they originate from genes encoding enzymes involved in the biosynthesis of primary and secondary metabolites. ESTs belong to the transcribed region of the genome and form an active link between genomics and molecular ecology (Bouck and Vision 2007). EST sequences contribute to gene discovery, gene annotation, marker development, transcriptomic profiling, and proteomic exploration (Haq et al. 2021). Notably, ESTs are among the most important genomic resources for SSR development; the resulting EST-SSR markers are useful in a variety of genetic applications. Among marker systems, EST-SSR markers are favored in plant genetics because of their multiallelic nature, high reproducibility, cross transferability, codominant inheritance, and extensive genome coverage (Haq et al. 2016). Hence, EST-SSR markers have been used for a variety of genetic applications such as germplasm characterization, cross transferability, cultivar identification, population genetics, gene mapping, QTL (quantitative trait locus) analysis and marker assisted breeding (Eujayl et al. 2004; Varshney et al. 2007a; Haq et al. 2014, 2021; Ukoskit et al. 2018; Singh et al. 2020).
All 35 gene encoding enzyme based EST-SSR markers amplified successfully, and a total of 184 alleles were obtained with an average of 5.25 alleles per locus. Our study is in agreement with earlier reports of EST-SSR based genetic polymorphism and other genetic analyses in different Capsicum species (Yi et al. 2006; Nagy et al. 2007; Ince et al. 2010; Huang et al. 2011; Kong et al. 2012; Shirasawa et al. 2013; Tsaballa et al. 2015). Marker efficiency was evaluated by polymorphic information content (PIC), marker index (MI) and discriminating power (DP). PIC ranged from 0.39 to 0.51 with an average of 0.40, which is lower than in an earlier report in chilli using EST-SSR markers, where PIC ranged from 0.24 to 0.86 with an average of 0.62 (Portis et al. 2007), and lower than the range of 0.50 to 0.91 with an average of 0.77 reported in different Capsicum species using SSR markers (Tsaballa et al. 2015). While using ISSR marker, PIC (Tsaballa et al. 2015). A considerable degree of polymorphism was observed among the Capsicum accessions using the EMS markers, which was further supported by the DP (0.216) and MI (0.232) values (Amiryousefi et al. 2018).
Marker efficacy was further estimated by primer polymorphism and cross transferability in the 48 Capsicum accessions. Primer polymorphism averaged 93.57% and ranged from 47.92% to 100%. In a similar study, primer polymorphism averaged 81.52% and ranged from 50% to 100% in Capsicum accessions using SCoT (Start Codon Targeted polymorphism) markers (Gupta et al. 2019). In the present study, cross transferability ranged from 6.0% to 76% with an average of 44.52% among the 48 chilli accessions, which is comparable with results obtained in chilli accessions using other marker systems (Portis et al. 2007; Rai et al. 2013; Gupta et al. 2019; Haq et al. 2022). Thus, the primer polymorphism and cross transferability results demonstrate the competency of the chosen primers and reveal inter-accession polymorphism and allelic conservation among chillies.
Using the EST-SSR markers, genetic variability was characterized through Na, Ne, I, He, and uHe, all of which play an important role in the genetic characterization of chilli populations. Across the Capsicum population, the mean values of Na, Ne, I, He, and uHe were 1.255, 1.282, 0.256, 0.170 and 0.191 respectively, and C. annuum showed higher genetic diversity than C. baccatum and C. frutescens (Table 4). The greater genetic variability in C. annuum might be influenced by mutation rate, reproduction pattern, genetic recombination, genetic drift, gene flow, and natural selection. Species with higher genetic variation tend to show widespread ecological distribution, robust environmental adaptation and survivability, with corresponding evolutionary consequences (Yan et al. 2019). The significant level of genetic variation observed among Capsicum accessions in the present study demonstrates the polymorphism detected by these gene encoding enzyme based EST-SSR markers. Moreover, genetic differentiation is also related to population size, which has an impact on germplasm characterization: the larger the population, the greater the chance of genetic differentiation and of displaying higher heterozygosity (Wang 2005; Aljumaili et al. 2018).
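To make the diversity parameters concrete, the sketch below computes Ne, I, He and uHe for a single locus under stated assumptions: the locus has two alleles with frequencies p and q = 1 − p, and the unbiased correction 2N/(2N − 1) assumes N diploid individuals. How p is estimated from band frequencies is software specific and is not reproduced here, so this is an illustration of the standard textbook definitions rather than the exact GenAlEx procedure; the input values are hypothetical.

```python
import math
from typing import Dict


def two_allele_diversity(p: float, n_samples: int) -> Dict[str, float]:
    """Per-locus diversity indices for allele frequencies p and 1 - p."""
    q = 1.0 - p
    sum_sq = p * p + q * q
    # Effective number of alleles: reciprocal of expected homozygosity.
    ne = 1.0 / sum_sq
    # Shannon's information index; zero frequencies contribute nothing.
    shannon = -sum(x * math.log(x) for x in (p, q) if x > 0.0)
    # Expected heterozygosity and its small-sample (unbiased) version.
    he = 1.0 - sum_sq
    uhe = (2.0 * n_samples / (2.0 * n_samples - 1.0)) * he
    return {"Ne": ne, "I": shannon, "He": he, "uHe": uhe}


# Hypothetical locus: p = 0.9 scored in 48 accessions.
example = two_allele_diversity(p=0.9, n_samples=48)
print(example)
```

Averaging such per-locus values over all loci gives population means of the kind reported in the text.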
Genetic differentiation (Gst) and gene flow (Nm) are important indicators of population genetic structure. According to Nei (1978), genetic differentiation of populations is classified as low (Gst < 0.05), medium (0.05 < Gst < 0.15) or high (Gst > 0.15). In the present study, the mean Gst value for the gene encoding enzyme based EST-SSR markers was 0.296, which falls in the high class while still indicating that the majority of variation (about 70%) is found within populations. The Nm value based on the mean Gst was 1.189, which indicates moderate gene flow among the Capsicum populations. Values of Nm < 1, Nm > 1 and Nm > 4 are commonly classified as low, moderate and extensive gene flow, respectively. It is generally accepted that an Nm value above 1 is enough to impede genetic drift and prevent genetic differentiation among populations, whereas an Nm below 1 is insufficient to counteract genetic drift, the dominant factor leading to genetic differentiation among populations (Slatkin 1985; Schnabel and Hamrick 1995). Our finding that most variation lies within populations is consistent with an earlier isoenzyme based characterization of wild and domesticated populations of Capsicum annuum, although the values differed, with Gst and Nm of 0.056 and 4.21 respectively (Hernández-Verdugo et al. 2001).
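The relationship between the reported Gst and Nm values can be checked with a short calculation. The sketch assumes the commonly used estimators Gst = (Ht − Hs)/Ht and Nm = 0.5(1 − Gst)/Gst; the second of these reproduces the reported Nm of 1.189 from Gst = 0.296, which suggests but does not confirm that this is the form applied by the software. The classification thresholds are those given in the text.

```python
def gst_from_diversity(ht: float, hs: float) -> float:
    """Coefficient of gene differentiation, assuming Gst = (Ht - Hs) / Ht."""
    return (ht - hs) / ht


def gene_flow(gst: float) -> float:
    """Gene flow estimate, assuming Nm = 0.5 * (1 - Gst) / Gst."""
    return 0.5 * (1.0 - gst) / gst


def classify_gene_flow(nm: float) -> str:
    """Classes from the text: low (<1), moderate (1 to 4), extensive (>4)."""
    if nm < 1.0:
        return "low"
    if nm > 4.0:
        return "extensive"
    return "moderate"


nm_reported = gene_flow(0.296)                   # Gst reported in the text
gst_from_means = gst_from_diversity(0.240, 0.170)  # rounded Ht and Hs

print(round(nm_reported, 3), classify_gene_flow(nm_reported))
print(round(gst_from_means, 3))
```

With the rounded means Ht = 0.240 and Hs = 0.170 the ratio gives about 0.292, close to the reported 0.296; the small gap is plausibly due to rounding of the published means.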
Chilli population structure was further characterized by AMOVA, PCoA, structure analysis, Jaccard's similarity coefficient and UPGMA analysis. AMOVA based on the gene encoding enzyme based EST-SSR markers showed that 99% of the genetic variation lay within the population and 1% among populations of chillies. A comparable picture of population structure was given by principal coordinate analysis (PCoA), which showed consistent results in the characterization of Capsicum accessions. The high variation within populations may be due to distinct ecological conditions, adaptations and variation in morphological characteristics in chillies. Polymorphism of different microsatellite repeats also offers great efficacy for identifying inter- and intraspecific genetic polymorphism (Husnudin et al. 2019). UPGMA clustering revealed two major groups, and a similar population structure was obtained through structure analysis, which placed the genotypes mainly into two subpopulations. The findings of the structure analysis were thus analogous to the UPGMA-based genetic diversity among Capsicum accessions. Earlier population structure analyses in Capsicum species, using closely related accessions from different origins, have reported two to ten subpopulations (Albrecht et al. 2012; Jaiswal et al. 2020). Therefore, the results of the present study indicate significant genetic relationships among the chilli accessions, attributable to varied genome size, morpho-physiological variation and distinct agro-ecological environments.
Important factors that explain the agreement and discordance among the chilli accessions include the nature of the marker system used, the level of polymorphism, the number of loci detected and the genome coverage of each marker (Souframanien and Gopalakrishna 2004), as well as the distribution of the accessions in local or geographically distinct spawning groups, natural selection, adaptation, survivability, and evolution in changing environments (Yang et al. 2014; Li et al. 2018).

Conclusion
In the present study, genetic assessment of Capsicum accessions was performed using gene encoding enzyme based EST-SSR markers. Genetic polymorphism, cross-transferability, and genetic and structural plasticity were identified among 48 accessions of chilli. Significant genetic polymorphism was identified through PIC, MI and DP, which reflected the efficiency of the markers in chilli accessions. The EST-SSR markers also displayed a considerable level of primer polymorphism and cross-transferability, which reflected genetic discrimination and conservation among Capsicum accessions. Moreover, the DNA amplification profiles reflected a significant level of genetic variability among Capsicum accessions, as shown by genetic parameters such as Na, Ne, I, He and uHe. A noteworthy degree of genetic diversity was identified through Nei's gene diversity, Hs, and Ht, which displayed molecular variability among chillies; the knowledge obtained could therefore be used in the management and characterization of chilli germplasm. A high level of Gst was observed together with a moderate level of gene flow among populations, which inhibits genetic drift and prevents genetic differentiation among populations. Furthermore, the population structure explored by AMOVA, PCoA and UPGMA reflected more genetic variability within the chilli population. Thus, the present study provides fundamental insight into genetic variability among Capsicum accessions, and the information generated could be used in the effective management and characterization of chilli accessions in crop improvement programmes.