Seed colour classes in beans
Based on visual assessment, the lines used in the present study were predominantly red, with diverse shades and brilliances. Out of 278 accessions, 128 were red, followed by yellow (36), chocolate (30), black and brown (29 each), white (19) and green (7) (Table 1, Figures 1 and 8). The set represents the variability of plain-seeded bean varieties from local landraces as well as accessions from the national gene bank (NBPGR) and various international gene banks. Seed colour is an important varietal attribute for adoption in the Western Himalayan region and determines the price a variety fetches in the local market. Local consumers in the Kashmir valley invariably prefer red beans, especially the small-seeded red types, followed by other classes (Sofi et al., 2022a). Seed colours such as chocolate, brown, white and greenish are usually preferred in the snap bean class, which is usually consumed as green and shelled beans. Black beans are less preferred in the Western Himalayan region. In addition to seed colour, seed brilliance is used as an important morphological parameter of consumer seed quality, as brilliant seeds are usually hard-to-cook owing to a thicker palisade layer (Konzen and Tsai, 2014; Sofi et al., 2022b).
Seed colour delineation using L*a*b* and δ E colour space
The descriptive statistics of L*, a*, b* and δ E are presented in Table 1. The mean value of L* was lowest for red (31.31) and highest for white (78.17), with a mean of 40.51 across all colours. The mean value of a* was lowest for white (-0.42) and highest for red (19.60), with a mean of 12.97 across all colours. Similarly, the mean value of b* was lowest for black (-0.29) and highest for yellow (40.86), with a mean of 15.05 across all colours. In terms of deviation from standard colours, as depicted by δ E, the highest mean value was observed for red (6012.16) and the lowest for green (18.36), with a mean of 2525.72 across all colours. Varga et al. (2019) also characterized seed coat colour in terms of CIE L*a*b* colour coordinates in 100 common bean accessions belonging to five mono-coloured landraces using a colourimeter and computer vision. The percentage difference between the two methods across all samples was 5.81% for L*, 23.32% for a* and 44.44% for b*. Using stepwise discriminant analysis, they found that the spectroscopic method correctly classified 97% of accessions into their respective landraces. The coefficient of variation (CV), a statistical measure of the dispersion of data points around the mean, was lowest for white and highest for chocolate in the case of L*. For a*, CV was highest for black and lowest for yellow, and for b*, CV was highest for black and lowest for white. Similarly, for δ E, CV was highest for green and lowest for red. Across all colours, CV was highest for b* and lowest for δ E. The descriptive statistics indicate large variation within individual colour classes, as depicted by the wide ranges and invariably high CV values. The abnormally high CV values for some colours are mainly due to smaller sample size.
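The class-wise CV described above can be illustrated with a minimal sketch. The readings below are hypothetical values chosen only for illustration, not data from Table 1, and the function name `coefficient_of_variation` is our own. The sketch assumes CV is computed as the sample standard deviation divided by the absolute mean, expressed as a percentage.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return CV (%) = 100 * sample standard deviation / |mean|."""
    if len(values) < 2:
        raise ValueError("at least two observations are needed")
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / abs(m)


# Hypothetical L* readings for two colour classes (illustration only).
l_star = {
    "white": [77.0, 78.0, 79.0],
    "chocolate": [20.0, 30.0, 40.0],
}

# One CV value per colour class.
cv_by_class = {colour: coefficient_of_variation(v) for colour, v in l_star.items()}

for colour, cv in cv_by_class.items():
    print(f"{colour}: CV = {cv:.2f}%")
```

Because CV divides by the mean, a class mean close to zero (as for a* in white seeds or b* in black seeds) can also inflate CV, independently of sample size.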
The variation in the tristimulus parameters L*, a*, b* and δ E is also depicted graphically in Figures 4 and 5, which show the variation patterns in these parameters for the seven colour classes as well as the variation recorded across all colours. The wide range and high CV values within a colour class reiterate the need for a quantitative assessment of seed coat colour in beans, to remove the bias caused by subjective visual scoring of seed colour classes and ambiguous grouping through frequency distribution. This is especially the case when large diversity panels are characterized for seed traits and get grouped into a few classes based on such subjective assessment. Buratto et al. (2021) proposed that quantitative colourimetry can be used to characterize the colour of seeds. They compared visual colour evaluation and quantitative colourimetry of triticale seeds using the CIELAB colour space. Seed colour was evaluated visually on a scale ranging from 1 (very weak or absent colour) to 9 (very dark colour), and colourimetric data were recorded for L* (lightness) and the chromaticity coordinates a* and b*. Seeds classified as score 1 showed mean values of 46.3, 6.6 and 16.4 for L*, a* and b*, respectively. On the other hand, seeds classified as dark or very dark showed values equal to or lower than 28.4 for L*, 5.6 for a* and 6.0 for b*. The use of colourimetric parameters showed practical applicability and low subjectivity in the classification of seed colour.
Konzen and Tsai (2014) used colourimeter data on tristimulus parameters to characterise seed colour in common bean and suggested that such analysis not only provided basic quantification of seed coat colour differences but also gave better insights into seed brilliance and cooking quality. They also standardised a protocol for differentiation of the Asp and J loci using data on seed colour and shininess obtained through the L*a*b* system. Recently, Halcro et al. (2020) used a Pheno-Seed platform with different cameras, assessed colour using RGB values as well as an RGB to L*a*b* transformation, and found high R² values for calibration samples. They proposed that increased precision and higher rates of data acquisition compared to traditional techniques will help to extract larger datasets and explore seed colour diversity in greater detail.
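For readers unfamiliar with the RGB to L*a*b* transformation mentioned above, the following is a minimal sketch of the standard sRGB to CIE L*a*b* conversion. It assumes 8-bit sRGB input, the D65 reference white and the 2° standard observer. It is not the calibration pipeline of any cited study, and camera images would normally need colour calibration before such a conversion is meaningful.

```python
def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB values to CIE L*a*b* (D65 white, 2 degree observer)."""

    def linearise(channel):
        # Undo the sRGB gamma encoding to obtain linear light in [0, 1].
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearise(r), linearise(g), linearise(b)

    # Linear sRGB to CIE XYZ (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # D65 reference white.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        # Cube root above the threshold, linear segment below it.
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


# Pure white, pure black and pure red as sanity checks.
white_lab = srgb_to_lab(255, 255, 255)
black_lab = srgb_to_lab(0, 0, 0)
red_lab = srgb_to_lab(255, 0, 0)
print(white_lab, black_lab, red_lab)
```

White maps to L* close to 100 with a* and b* close to 0, and black maps to L* of 0, which matches the definition of the lightness axis.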
Recently, Sadohara et al. (2022) extended the use of colourimetric tristimulus data by applying machine learning algorithms based on seed colour, hilum ring and corona to characterize a set of 295 yellow bean genotypes for seed appearance and postharvest darkening through L*a*b* colour values. A model that excludes the hilum ring and corona of the seeds, the black background and light reflection was developed using machine learning, allowing targeted and efficient extraction of L*a*b* values from the seed coat. The machine-learning-aided model used to extract colour values from the seed coat, together with the wide variability observed in seed morphology traits, was proposed as a resource for future breeding and research efforts to meet consumers’ expectations for bean seed appearance.
Principal component analysis
In the principal component analysis of four variables, viz., seed colour (quantified on a 1-7 scale) and the L*, a* and b* parameters, the variability was concentrated in the first two principal components. The criteria of cumulative variance (70-80%) and eigenvalue greater than unity were used (Kovacic, 1974). Latent roots (eigenvalues) ranged from 2.07 (PC1) to 0.19 (PC4). The first two PCs, which were used for constructing the PCA biplot (Figure 6), explained 86.74% of the total variation (51.69% by PC1 and 35.04% by PC2), contributed mainly by colour and L* in PC1 and by b*, L* and a* in PC2. The factor loadings of a* in PC1 and colour in PC2 were negative. The factor loadings (component loadings) in PCA are the correlation coefficients between the variables and the PCs. The genotype-trait biplot indicated various trait correlations based on the proximity and angle of two vectors (Yan and Rajcan, 2002). In a PCA biplot, close alignment of trait rays forming a small acute angle corresponds to a strong positive correlation. In the present study, based on the factor loading graph, colour is strongly correlated with L*, as is evident from its significant contribution in both PC1 and PC2. This is also substantiated by the overall graphical depiction of variation in Figure 3, which shows the largest variation in L* as compared to a* and b* across the entire panel of 278 genotypes. Seed colour has almost no correlation with b*, whereas the biplot shows a negative correlation of colour with a*. This relationship reflects the scale used in the present study: red was given the scale code of 1, so the genotypes with the highest a* values invariably have the lowest colour scores. However, this cannot be taken as a universal relationship, as different workers may use a different scale of colour designation.
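The eigenvalue and factor loading computations described above can be sketched as follows. This is a minimal illustration, assuming PCA on the correlation matrix of the four variables. The data are simulated and hypothetical, and the software and settings actually used in the study are not stated here.

```python
import numpy as np

# Hypothetical data: 40 genotypes x 4 variables (colour score, L*, a*, b*).
rng = np.random.default_rng(0)
score = rng.integers(1, 8, size=40).astype(float)   # colour coded 1 to 7
l_star = 30.0 + 6.0 * score + rng.normal(0.0, 4.0, 40)
a_star = 20.0 - 2.0 * score + rng.normal(0.0, 3.0, 40)
b_star = rng.normal(15.0, 8.0, 40)
data = np.column_stack([score, l_star, a_star, b_star])

# PCA on the correlation matrix, i.e. on standardised variables.
corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)             # returned in ascending order
order = np.argsort(eigvals)[::-1]                   # re-sort in descending order
eigvals = np.clip(eigvals[order], 0.0, None)        # guard against tiny negatives
eigvecs = eigvecs[:, order]

# Percentage of total variance explained by each PC.
explained = 100.0 * eigvals / eigvals.sum()

# Factor loadings: correlations between the variables (rows) and PCs (columns).
loadings = eigvecs * np.sqrt(eigvals)

# Eigenvalue-greater-than-unity criterion.
retained = eigvals > 1.0

print("eigenvalues:", np.round(eigvals, 2))
print("explained %:", np.round(explained, 2))
print("retained PCs:", int(retained.sum()))
```

With four standardised variables the eigenvalues sum to 4, so an eigenvalue of 2.07 corresponds to roughly 2.07/4, or about 52% of the total variance, in line with the 51.69% reported for PC1.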
The multivariate analysis clearly delineates the diversity panel of 278 genotypes into distinct colour groups (Figure 6), as shown by the concentration of genotypes of a similar colour class in specific regions of the four quadrants of the biplot, based on L*, a* and b* values and their observed relationship with colour scores. Sadohara et al. (2022) also used principal component analysis based on BLUP values and found that component 1 separated brown and Amarillo dark beans from the rest of the market classes, explaining >60% of the total variance. Principal component 2 separated genotypes based on b* values, explaining 33% of the total variance. Together, PC1 and PC2 explained 94% of the total variance, indicating that the colour values of the lines could be compressed to two dimensions instead of three without losing much information. This appeared to be due to the high negative correlation between L* and a* values, which arose mainly because the Amarillo dark genotypes had higher a* and lower L* values than the other market classes.
Cluster analysis
Cluster analysis based on Ward’s method (Euclidean distances) classified the 278 genotypes into seven clusters (Table 4, Figure 7), with the largest number of genotypes (121) in cluster-1, followed by 54 in cluster-5, 40 in cluster-4, 30 in cluster-2, 27 in cluster-7, 5 in cluster-6 and 1 in cluster-3. Cluster-1 contained mostly red genotypes with just two yellow genotypes, whereas most of the yellow genotypes were grouped in cluster-2. Most of the black and a few chocolate genotypes were grouped in cluster-4, whereas most of the brown and chocolate genotypes were grouped in cluster-5, with a few brown ones in cluster-6. Cluster-7 contained all the white and green seeded genotypes. Cluster-3 had just one red genotype, R50. No report is as yet available in common bean regarding diversity analysis based on L*a*b* values. However, the genetic diversity of 466 melon germplasm accessions has been analysed through quantitative assessment of fruit colour, with L*a*b* values of the peel, the outer flesh and the inner flesh determined using a precision colorimeter. The results revealed high genetic diversity in melon fruit colour measured digitally by L*a*b* values, substantially overcame the disadvantage of inaccurate colour description, and supported a quantitative analysis of colour that could lay the foundation for digital description of fruit colour in breeding.
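A grouping of this kind can be sketched with SciPy's hierarchical clustering routines. The example below is an illustration under stated assumptions: the L*a*b* values are simulated around three hypothetical class centres, and the tree is cut into three clusters, whereas the analysis above used seven clusters for the full panel. The software used in the study is not stated here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical L*, a*, b* centres for three colour classes (illustration only).
centres = np.array([
    [31.0, 20.0, 12.0],   # a red-like class
    [60.0, 8.0, 40.0],    # a yellow-like class
    [78.0, 0.0, 10.0],    # a white-like class
])

# Simulate 10 genotypes per class with small measurement noise.
rng = np.random.default_rng(1)
lab = np.vstack([c + rng.normal(scale=1.0, size=(10, 3)) for c in centres])

# Ward's minimum-variance linkage on Euclidean distances between genotypes.
tree = linkage(lab, method="ward")

# Cut the dendrogram into a fixed number of clusters.
n_clusters = 3
labels = fcluster(tree, t=n_clusters, criterion="maxclust")

# Number of genotypes per cluster (cluster labels start at 1).
sizes = np.bincount(labels)[1:]
print("cluster sizes:", sizes.tolist())
```

If the three variables differ strongly in range, standardising them before clustering changes how much each one contributes to the Euclidean distances, so that choice should be reported alongside the dendrogram.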