Correlations and Multivariate Analysis Across Non-Segregation and Segregation Generations in Two Cotton Crosses

Background: Cotton is one of the most important natural fiber crops for textile manufacture in the world. The present research uses Pearson's correlation coefficient and multivariate analysis to assess the interrelationships, similarities and dissimilarities among non-segregation (P1, P2 and F1) and segregation (F2, BC1 and BC2) generations for seed cotton yield and yield components in the two crosses Giza 92 x Pima S6 and Giza 93 x C.B. 58. Results: The analysis of variance exhibited highly significant genetic variability among the six generations for all studied traits in the two crosses. The F1 performance was higher than that of the other generations for all the studied traits in the two crosses. There were positive and highly significant correlations between seed cotton yield/plant, lint cotton yield/plant and No. of bolls/plant across all six generations in the two crosses. A number of positive correlations were observed among the six generations for all studied traits in the two crosses. The UPGMA hierarchical clustering showed high similarity coefficients among the six generations and among the studied traits, ranging from 0.96 to 0.99 and from 0.65 to 0.96, respectively. In the principal component analysis (PCA), only PCA1 had an eigenvalue > 1 across the six generations for all studied traits in the two crosses. The PCA displayed a total variation of 91.84% among the six generations, contributed by PCA1 (79.47%) and PCA2 (12.38%), and mainly distinguished the generations into different groups. PCA1 and PCA2 were dominated by the F1 and segregation generations in the two crosses, respectively, showing high correlations with the first two PCAs. All studied traits contributed positive and significant component loadings to PCA1, and boll weight and lint percentage did so to PCA2.
The biplot analysis of the relationship between the six generations revealed that the most appropriate generations for selecting yield traits were F1 in the two crosses and BC1 and BC2 in the cross Giza 93 x C.B. 58. Conclusion: From the obtained results, we recommend considering backcrossing for 2–5 cycles (BC2–BC5 generations).

The results of mean performances indicated the presence of genetic variability for these traits in the studied materials. The F1 generation showed superiority in the studied traits in the two crosses. These results indicated that the non-segregation and segregation generations behaved differently for the studied traits in the study materials. Thus, it is possible to benefit from selection in the segregation generations in future breeding programs for improving these traits in Egyptian cotton. El-Hashash (2017) reported that the mean values of non-segregation generations were better than those of segregation generations in some cases and the converse in others in two crosses of Egyptian cotton. Positive and significant correlation coefficients were observed among and within the six generations for studied traits in the two crosses. The correlation coefficients among different pairs of plant traits for the two crosses across generations indicated that cotton yield can be improved by increasing most yield component traits. These results suggest the possibility of obtaining plants with desirable cotton yield attributes in the next segregating generations. In this connection, Srour and El-Hashash (2012) reported similar results. Positive intergeneration correlations for studied traits in the two crosses indicated that selection for an increased value of one trait will result in an increase in the value of the others.
The findings of Barman and Borah (2012) in a mutant rice strain and Srour and El-Hashash (2012) in cotton revealed that the correlation coefficients among F2, F3 and F4 generations were significant or highly significant for studied traits. Significant correlations (p < 0.05 or 0.01) between cotton yield and yield component traits across single and double-cross hybrids were observed by El-Hashash (2013). The results showed a positive and significant correlation among all combinations of the six generations across studied traits. Generally, seed cotton yield/plant was positively and significantly correlated with all studied traits in the two crosses, except lint percentage in the cross Giza 93 x C.B. 58. This implies that cotton yield is effective for selection in a later generation and that the two crosses may be used for improving yield in Egyptian cotton. In line with the present findings, significant positive correlations for yield and its component traits were reported by Khokhar et al. (2017); Jarwar et al. (2019); Kumar (2020); Rehman et al. (2020) and Sarwar et al. (2021). Srour and El-Hashash (2012) and Khokhar et al. (2017) stated that more than one trait can be used as a selection criterion in the next segregating generations of cotton. The maximum similarity occurred between seed and lint cotton yield/plant as well as between the P1 and BC1 generations, while the minimum similarity occurred between seed index and lint percentage as well as between the F1 and F2 generations. Based on 100 cotton genotypes of Gossypium hirsutum L., the mean performances of yield and yield component traits were grouped into six different clusters (Akter et al. 2019). The tree diagram exhibited the highest correlation between the traits or generations inside each cluster, while the lowest correlation of traits or generations was found among the clusters. In amphidiploid cotton hybrid plants, Muminov et al. (2020) mentioned that cluster analysis grouped the parents and F1-F6 generations into four different clusters. According to the report of Abdel-Monaem et al. (2020), the parents and F1 generation plants [...] four and seven major clusters, and the clusters of the F1 generation were wide from [...] in the variance in the dataset. These results were consistent with those of Abdel-Monaem et al. (2020) among parental cotton genotypes and their F1 cross combinations in Egyptian cotton. Our results were in harmony with El-Hashash (2016), who reported that there is a break in the plot that separates the meaningful components from the trivial components. Most researchers would agree that the first two PCAs are probably meaningful. This finding is in agreement with those reported by Abasianyanga et al. (2017); Jarwar et al. (2019) and Sarwar et al. (2021).

group and show heterogeneity among groups. It is complementary to PCA (El-Hashash 2016). Many researchers have used PCA and cluster analysis to assess the relationships and diversity among cotton germplasm, as well as the relationships between seed cotton yield and its component traits (Shah et Sarwar et al. 2021). A scree plot is usually used for visual assessment of the factors that explain a high amount of the variation in the data (Jarwar et al. 2019); it also highlights the partitioning of the principal components (Abasianyanga et al. 2017).
Due to the increasing demand for Egyptian cotton, the present investigation was conducted to study the interrelationships, similarities and dissimilarities among non-segregation and segregation generations for seed cotton yield and yield attributes in the two crosses by adopting Pearson's correlation coefficient and multivariate analysis.

Methods
Genetic Material and Field Procedure: Four genotypes were used in this study, namely Giza 92 and Giza 93 (Egyptian varieties), Pima S6 (Egyptian American) and C.B. 58 (USA barbadense), all of which belong to the species Gossypium barbadense L. The experiments were carried out during three successive growing seasons from 2018 to 2020 at Sakha Agricultural Research Station, Kafr El-Shiekh Governorate, Egypt. In the 2018 season, the four parental varieties were crossed to produce F1 hybrid seeds for the two crosses Giza 92 x Pima S6 and Giza 93 x C.B. 58. In the 2019 season, each F1 was backcrossed to both parents to obtain BC1 and BC2; the parents were also crossed for more hybrid seeds, and the F1 plants were selfed to obtain F2 seeds. The six populations, i.e., P1, P2, F1, F2, BC1 and BC2, of the two crosses were evaluated separately in a randomized complete block design with three replications during the 2020 season. Each replicate consisted of 10 rows for F2, 5 rows for BC1 and BC2 (segregating generations), and 3 rows for each of the non-segregating generations P1, P2 and F1. Each row was 4 m long and 0.60 m wide and comprised 10 hills. Hills were spaced 40 cm apart and thinned to one plant per hill. All recommended agronomic practices for cotton were applied from sowing to harvesting to obtain a good and healthy plant population.
Traits measurement: The studied traits were recorded on ten individual guarded plants for the six non-segregation and segregation generations in the two crosses. Data were recorded for boll weight in grams (B.W., g), seed cotton yield/plant in grams (S.C.Y./P, g), lint cotton yield/plant in grams (L.C.Y./P, g), lint percentage (L.%), number of bolls/plant (No. of B./P) and seed index (S.I., g). The collected data were statistically analyzed.
Statistical Approaches: The data of studied traits were subjected to a one-way ANOVA following the method of Steel and Torrie (1997) to determine the significant differences and the coefficient of variation (CV%) among the six generations using XLSTAT software as described by Addinsoft (2021). According to Gomes (2009), the estimates of CV% were classified as very high (CV≥21.0%), high (15.0%≤CV≤21.0%), moderate (10%<CV≤20%) and low (CV<10%). The significance test was done with the least significant difference test (L.S.D) at the 0.05 and 0.01 levels of probability according to Steel and Torrie (1997). Computation and plotting of Pearson's correlation coefficient as well as multivariate analysis (principal component and cluster analysis) were performed for a better understanding of the relationships among studied traits across the six generations using the computer software program PAST version 4.03 (Hammer et al. 2020).
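The quantities named above (the generations/error mean-square ratio and CV%) can be illustrated with a minimal sketch. It assumes a plain one-way layout that ignores the block term of the field design, and the boll-weight values are hypothetical, not data from this study; the function name one_way_anova is likewise illustrative.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (ms_between, ms_error, f_ratio, cv_percent) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of generations
    n_total = len(all_values)  # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_error = ss_error / (n_total - k)
    f_ratio = ms_between / ms_error                  # the G/E ratio
    cv_percent = 100 * ms_error ** 0.5 / grand_mean  # CV% from the error mean square
    return ms_between, ms_error, f_ratio, cv_percent

# Hypothetical boll-weight values (g): six generations, three replications each
data = {
    "P1": [3.0, 3.1, 2.9],
    "P2": [2.6, 2.7, 2.5],
    "F1": [3.4, 3.5, 3.3],
    "F2": [2.9, 3.0, 2.8],
    "BC1": [3.1, 3.2, 3.0],
    "BC2": [2.8, 2.9, 2.7],
}
ms_g, ms_e, f_ratio, cv = one_way_anova(list(data.values()))
print(f"G/E ratio = {f_ratio:.1f}, CV% = {cv:.2f}")  # G/E ratio = 22.4, CV% = 3.37
```

For these invented numbers the G/E ratio exceeds unity and the CV% falls in the low class (CV < 10%).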

ANOVA and Mean Performances:
The results of one-way ANOVA showed statistically significant differences (P < 0.01) among non-segregation (P1, P2 and F1) and segregation (F2, BC1 and BC2) generations for the yield and yield component traits studied in the two crosses Giza 92 x Pima S6 and Giza 93 x C.B. 58 (Fig. 1). The values of the generations/error variance ratio were greater than unity (G/E ratio > 1) for all traits studied in the two crosses. According to the classification by Gomes (2009), the values of CV% recorded for studied traits across the six generations in the two crosses were low (CV < 10%). The lint percentage and seed index traits recorded the lowest values of CV% in the two crosses (Fig. 1).
Highly significant differences among the means of the two parents and their F1, F2 and backcross generations were observed for all studied traits in the two crosses (Fig. 1). The variety Giza 93 exhibited the best performance for all traits studied compared with the other three varieties, except for the seed index trait (variety C.B. 58). The F1 mean performance was higher than that of the respective parents and the three segregation generations for all the studied traits in the two crosses. Within the segregation generations, the BC1 generation showed superiority in most studied traits in the two crosses.

Correlation analysis:
Pearson's correlation coefficients of seed cotton yield and its components are presented separately for each generation in Table 1 for a clear understanding of the relationships between these traits. Out of 180 Pearson correlation coefficients between studied traits, 56 (ranging from 6 in P1 to 13 in BC1) and 57 (ranging from 7 in P1 to 13 in F2) positive correlation coefficients were seen within the six generations in the two crosses Giza 92 x Pima S6 and Giza 93 x C.B. 58, respectively. Seed cotton yield/plant, lint cotton yield/plant and No. of bolls/plant had strong positive and highly significant correlations (p < 0.01) across all six generations in the two crosses. Boll weight was significantly and positively correlated (p < 0.01) with seed and lint cotton yields/plant in the F2 generation of the two crosses, and with lint percentage in the BC1 generation of the cross Giza 93 x C.B. 58. Seed index showed a positive and significant correlation with seed cotton yield/plant, lint cotton yield/plant and lint percentage in the BC1 generation of the cross Giza 92 x Pima S6. Lint percentage displayed a positive association with seed index in the P2 and BC1 generations (p < 0.05), as well as with lint cotton yield/plant in the BC1 and BC2 (p < 0.05) and P2, F1 and F2 (p < 0.01) generations of the cross Giza 93 x C.B. 58. The results showed a positive and significant correlation among all combinations of the six generations across studied traits (Fig. 1B). Generally, seed cotton yield/plant was positively and significantly correlated with all studied traits in the two crosses, except lint percentage in the cross Giza 93 x C.B. 58.
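As a rough illustration of how the coefficient count and a single coefficient arise, the sketch below enumerates the trait pairs (six traits give 15 pairs; 15 pairs x 6 generations x 2 crosses = 180) and computes Pearson's r with its t statistic for one pair. The plant values are hypothetical, and the helper names pearson_r and t_statistic are illustrative only.

```python
from itertools import combinations
from math import sqrt

TRAITS = ["BW", "SCY", "LCY", "L%", "No.B", "SI"]
trait_pairs = list(combinations(TRAITS, 2))     # 15 unordered trait pairs
total_coefficients = len(trait_pairs) * 6 * 2   # pairs x generations x crosses

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))

# Hypothetical records for ten guarded plants of one generation
bolls = [18, 20, 22, 19, 25, 21, 23, 17, 24, 20]
yield_g = [54.5, 61.0, 66.2, 57.8, 75.9, 63.1, 70.0, 51.2, 72.4, 60.3]

r = pearson_r(bolls, yield_g)
t = t_statistic(r, len(bolls))
# Two-tailed critical t for 8 degrees of freedom at p = 0.01 is about 3.355
highly_significant = abs(t) > 3.355
print(total_coefficients, round(r, 3), highly_significant)
```

The sample size of ten plants mirrors the ten guarded plants per generation described in the Methods; whether the reported coefficients were computed on exactly that basis is an assumption.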
Cluster analysis: Based on the six generations data in the two crosses (Fig. 3A), the cluster analysis separated the studied traits into three main clusters at 80% similarity. The first cluster contained No. of bolls/plant, seed cotton yield/plant and lint cotton yield/plant with 96% similarity. The second (seed index) and third (lint percentage and boll weight) clusters comprised the remaining traits at 80% similarity. The six generations (Fig. 3B) could likewise be divided into three clusters based on the data of yield and yield component traits in the two crosses. The first cluster consisted of one generation (F1) at 96% similarity. The second cluster consisted of three generations (P1, BC1 and BC2) with 99% similarity, while the third cluster comprised two generations (P2 and F2) at above 99% similarity. The greatest similarity occurred between the P1 and BC1 generations, followed by that between the P2 and F2 generations, while the lowest similarity occurred between the F1 and F2 generations.
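The clustering step can be sketched as average-linkage (UPGMA) agglomeration on a Pearson correlation similarity matrix. This is a minimal pure-Python illustration, not the PAST implementation; the trait means across six generations are hypothetical, and the function names are illustrative.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def upgma(similarity, labels):
    """Merge the two clusters with the highest average pairwise similarity
    until one cluster remains. Returns (cluster_a, cluster_b, similarity) merges."""
    index = {lab: i for i, lab in enumerate(labels)}
    clusters = [(lab,) for lab in labels]
    merges = []

    def avg_sim(a, b):
        # UPGMA: unweighted mean over all member pairs of the two clusters
        return sum(similarity[index[p]][index[q]] for p in a for q in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: avg_sim(*pair))
        merges.append((a, b, avg_sim(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical trait means over P1, P2, F1, F2, BC1, BC2
profiles = {
    "SCY": [60.0, 50.0, 75.0, 55.0, 62.0, 54.0],
    "LCY": [22.0, 18.0, 28.0, 20.0, 23.0, 19.5],
    "No.B": [20.0, 18.0, 24.0, 19.0, 21.0, 18.5],
    "SI": [10.0, 10.4, 10.6, 10.1, 10.2, 10.5],
}
labels = list(profiles)
similarity = [[pearson_r(profiles[a], profiles[b]) for b in labels] for a in labels]
merges = upgma(similarity, labels)
for a, b, s in merges:
    print(a, "+", b, f"at similarity {s:.2f}")
```

Reading the merge list from top to bottom corresponds to reading a dendrogram from its tips toward its root, with each merge level giving the similarity at which two clusters join.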
Principal component analysis: Principal component analysis (PCA) was used to estimate the similarities and dissimilarities between the studied traits across the six generation variables in the two crosses of cotton, which are graphically displayed in a biplot of PCA1 and PCA2 (Fig. 4). Out of five PCAs, only PCA1 had an eigenvalue larger than one, with a value of 4.77 (Fig. 4), while the remaining four PCAs had eigenvalues less than one. The first two PCAs contributed 91.84% of the total variation existing among the six generations regarding studied traits in the two crosses. The contribution of PCA1 to the total variance was higher than that of the other components, with PCA1 alone describing about 79.47% of the total variability of the measured data.
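The link between an eigenvalue and its share of variance can be made concrete with a short sketch. It assumes a correlation-matrix PCA (an assumption about the software settings), in which the eigenvalues sum to the number of variables; an eigenvalue of 4.77 over six generation variables then corresponds to roughly 79.5% of the variance, in line with the share reported for PCA1. The data below are randomly generated placeholders, not the study's measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: 6 traits (rows) x 6 generation variables (columns)
base = rng.normal(size=(6, 1))
data = base + 0.3 * rng.normal(size=(6, 6))

# PCA on the correlation matrix of the columns
corr = np.corrcoef(data, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # sorted from largest to smallest
explained = 100 * eigenvalues / eigenvalues.sum()
retained = eigenvalues > 1                     # eigenvalue-greater-than-one rule

# With 6 rows, centering leaves at most 5 non-zero components
print(np.round(eigenvalues, 3))
print(np.round(explained, 2), retained)

# Share implied by an eigenvalue of 4.77 among 6 standardized variables
print(round(100 * 4.77 / 6, 1))  # 79.5
```

Because only six trait rows are available, the sixth eigenvalue is numerically zero, which matches the five components mentioned in the text.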
The five PCAs for the six generations based on the studied traits in the two crosses are shown in Table 3. PCA1 was dominated by the F1 generation in the two crosses and by the P1, BC1 and BC2 generations in the cross Giza 93 x C.B. 58. On the other hand, the segregation generations influenced PCA2 in the two crosses, which explained 12.38% of the total variability. PCA3 was controlled by the P2 and F1 generations across the two crosses as well as by the BC1 and BC2 generations in the cross Giza 92 x Pima S6. PCA4 was affected by the non-segregation generations in the cross Giza 92 x Pima S6 as well as by the P1 and F2 generations in the cross Giza 93 x C.B. 58. Based on the six generations variables in the two crosses (Table 4) [...]

The relationships among yield and its component traits across the six generations in the two crosses are graphically displayed by the biplot of the first two PCAs (Fig. 4). The performances of the two crosses over the six generations displayed a positive correlation among yield and its component variables, but they differed in degree and consistency. Positive and significant correlations (p < 0.05 or 0.01) between all possible pairs of investigated traits were found across the six generations in the two crosses, except between lint percentage and seed index, which had a positive but non-significant correlation.
When comparing the six generations, PCA1 and PCA2 showed that the yield and yield component variables were distributed in different regions and formed different groups (Fig. 4). The F1 generation in the two crosses produced the highest yield and yield components. The F1 generations in the two crosses as well as the P1, BC1 and BC2 generations in the cross Giza 93 x C.B. 58 occupied the first (I) and fourth (IV) quadrants of the diagram and were strongly correlated with PCA1. On the other hand, the remaining generations in the two crosses were positively associated with PCA2 and occurred in the second (II) and third (III) quadrants. Lint percentage and boll weight were located in quadrant I with P1, BC1 and BC2 in the cross Giza 93 x C.B. 58 and with the F1 generation in the cross Giza 92 x Pima S6. The other traits were located near the F1 generation in quadrant IV for the cross Giza 93 x C.B. 58. The F2 generation in the two crosses as well as the BC1 and BC2 generations in the cross Giza 92 x Pima S6 were located near most of the studied traits. The scree plot of the PCA for the six generations on yield and yield components in the two crosses showed that the first two eigenvalues account for most of the variance in the dataset (Fig. 5).

ANOVA and Mean Performances:
The mean squares due to the six generations were highly significant for the studied traits in the two crosses. The genetic variances were higher than the environmental variances among the six generations for those traits across the two crosses.

Cluster analysis: For comparison and determination of the similarities and differences between the traits and generations studied on the basis of the data in the two crosses, UPGMA hierarchical clustering with the correlation similarity index was used. By the cluster analysis, both the studied traits and the six generations were divided into three clusters. The maximum similarity occurred between seed and lint cotton yield/plant as well as between the P1 and BC1 generations, while the minimum similarity occurred between seed index and lint percentage as well as between the F1 and F2 generations.

PCA1 had an eigenvalue higher than one. Therefore, PCA1 was kept for the final analysis, since it explains more variance (79.47%) than an individual attribute (Sharma, 1996), expresses more variability and supports selecting traits with a positive loading factor. The results indicate that the first two PCAs may be used to summarize the original variables in any further analysis of the data, as well as to explain 91.84% of the total variation and the grouping of the PCAs. Based on all measured data in the two crosses, the first two PCAs mainly distinguished the generations into different groups. Therefore, the first two PCAs were employed to draw a biplot (Fig. 4). [...] indicates the significance of these variables. The two crosses across the six generations displayed a positive correlation among seed cotton yield and its component traits, but they differed in degree and consistency. These results indicate that selection based on these traits would result in increased cotton yield in both crosses. Hence, emphasis must be placed on these materials in a breeding program to improve Egyptian cotton.
The biplot showed the degree of correlation among cotton traits (Sarwar et al. 2021). When comparing the six generations, PCA1 and PCA2 showed that the yield and yield component variables were distributed in different regions and formed different groups; these results therefore indicate that there are differences between these variables across the six generations in the two crosses. The respective distances of the variables from the first two PCAs demonstrate the contribution of the different variables to the total variability (Sarwar et al. 2021).
The biplot diagram depicted the contribution of yield and its component traits to the variability of the six generations. The biplot analysis of the relationship between the six generations revealed that the most appropriate generations for selecting yield traits were F1 in the two crosses and BC1 and BC2 in the cross Giza 93 x C.B. 58. Meichinger (1987) stated that F2, BC1 and BC2 offer equal alternatives with respect to time, work, inbreeding level, and the amount of genetic variation released within lines in subsequent selfing generations if linkage and epistasis are of small importance. Therefore, the choice to separate a population can be based on the characteristics of the first segregating generations. Fig. 1 shows that there is divergence among the six populations; thus, this diversity can be used to improve yield and its components in cotton. During the biplot diagram of the first two PCAs, the plants close to the ideal type would be

Conclusions
Significant divergences among the six generations for all studied traits in the two crosses were observed by ANOVA. The F1 performance was higher than that of the other generations for all the studied traits in the two crosses. The results of Pearson's correlation coefficient and multivariate analysis from our study could be useful in breeding programs for cotton yield improvement. Therefore, we recommend considering backcrossing for 2-5 cycles (BC2-BC5) to the C.B. 58 parent for improving Egyptian cotton yield in the future.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and material
All data and materials are available.

Competing interests
No competing interests.

Funding
The study was funded by the two authors.

Authors' contributions
WY and EE suggested the research idea, designed the experiments, collected field data and contributed equally to interpreting results and to writing and revising the manuscript, and approved the final manuscript for publication.

Figure 1
The values of means and LSD for studied traits in the non-segregation and segregation generations of the two hybrids Giza 92 x Pima S6 (blue columns) and Giza 93 x C.B. 58 (green columns). P1 and P2: First and second parents; F1 and F2: First and second generations; BC1 and BC2: First and second backcrosses, respectively; G/E ratio: mean squares of generations/mean squares of error; C.V%: Coefficient of variation; LSD values denote highly significant differences between the six generations as analyzed by the ANOVA test.

Figure 2
Plot describing Pearson's correlation between yield and yield component traits (A) as well as between the six generations (B) in the two hybrids of cotton. BW: Boll weight; SCY: Seed cotton yield/plant; LCY: Lint cotton yield/plant; L%: Lint percentage; No.B: Number of bolls/plant; SI: Seed index; P1 and P2: First and second parents; F1 and F2: First and second generations; BC1 and BC2: First and second backcrosses, respectively. The large and medium blue circles indicate a positive and significant (* p < 0.05) or highly significant (** p < 0.01) correlation, while the small blue circles indicate a positive and non-significant correlation.

Figure 5
Scree plot of PCA between respective eigenvalues and component numbers.