Agronomic and molecular evaluation of rice lines in a breeding program

Obtaining new and improved varieties of rice requires long and complex plant breeding programs. The early detection of desirable characteristics is difficult, especially when seeking to improve yield, as the interaction between the environment and plants may considerably hinder selection in early generations. Techniques that facilitate the selection of plants with desirable characteristics in early generations are therefore highly valuable to plant breeders. This study examined an indirect method of selection for yield in early generations of rice, based on principal component analysis of performance measured in field tests with a honeycomb design. Double haploid lines of rice obtained by crossing two rice varieties, 'Benisants' and 'Gigante Vercelli', were used. This method was compared with indirect selection using genomic tools such as high-throughput molecular marker analysis. The main factors that can be used in indirect selection were identified by principal component analysis. The model resulting from the phenological evaluation and principal component analysis, with six selected variables, explained 98.73% of the total variability of yield. The variable that contributes most to the model is the Harvest Index. The best selected lines yielded 32% and 43% more than the parentals, and these results matched those from indirect selection with molecular markers.


Introduction
Rice (Oryza sativa L.) is one of the most important crops worldwide. There are thousands of varieties of rice that differ in their qualitative, quantitative, agronomic and environmental adaptation characteristics.
Plant breeders are continually seeking to improve some of these characteristics, which requires finding new sources of variation depending on the type of rice, the production ecosystem or the environment of the new variety, as the selection characters may vary with the environment or system (Jennings et al., 1981; Aguilar et al., 2005; Cheng et al., 2020).
New varieties should also be adapted to a wide range of environments. However, cultivars with wide adaptability or stability are difficult to identify when the phenotypic response to environmental changes varies between the plants studied. The interaction between the plant or genotype and the environment (GxA) can slow selection progress by making it difficult to identify potentially superior cultivars.
When selecting for desirable characteristics for rice improvement, new gains are usually obtained by crossing two varieties or pure lines of rice, which provides sources of variation. Targeted selection is an effective, yet delicate, procedure for driving genetic progress. Selection tends to focus on the early segregating generations derived from these crosses, where selection for yield is prioritised but competition between plants affects the expression of yield potential (Ntanos & Roupakias, 2001). However, rigorous pedigree selection, the limited number of plants that can be managed, and GxA interactions can lead to the loss in early generations of desirable genotypes that could otherwise have been used as homozygous lines in later generations (Nagai cited by Ntanos & Roupakias, 2001). In addition, competition between individuals in a segregating population can be intense and decrease the reliability of selection. Mass selection could therefore be more effective than pedigree selection. For these reasons, rice breeders use and adapt different selection methods in their varietal improvement programs to suit different situations.
In other cereals, including wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.), the selection of individual plants in early generations may be effective for qualitative traits but ineffective for quantitative traits such as grain yield (Nagai cited by Ntanos & Roupakias, 2001). However, some breeders suggest that selection for high-yielding genotypes should be performed in the F2 (second filial) generation and in subsequent segregating generations (Mckenzie & Lambert, 1961; Sneep, 1977), using suitable field designs to evaluate expression in as many genotypes as possible. For example, successful selection can be achieved when plants are adequately distributed in the field using a "honeycomb" design (Mitchell et al., 1982; Lungu et al., 1987; Fasoulas & Fasoulas, 1995; Roupakias et al., 1997). Selection in early generations is based on qualitative traits with high heritability that are easy to measure and evaluate. These can be morphological traits controlled by major genes, but also some polygenic traits, such as harvest index, days to flowering, stem length, and plant type and vigour, which can be evaluated readily and are highly heritable. In segregating populations derived from divergent parentals, visual selection may also be effective for yield components such as panicle type and size, number and size of grains per panicle, and grain fertility (Yonezawa, 1997). However, fixing these desirable characters in the selected plants requires evaluating a large number of characters over several generations. The selection process is therefore long and expensive.
Other selection methods have been developed to accelerate the breeding process, such as those based on double haploid line (DHL) mapping, which allow the genetic make-up of individual lines to be fixed. A fast and widely used technique for obtaining DHLs is anther culture. This technique makes it possible to produce homozygous lines from segregating populations by chromosome doubling of haploid pollen and plant regeneration in one in vitro culture cycle (Lentini et al., 1977). Double haploid (DH) plants obtained from anther cultures are homozygous and can therefore be used directly in yield trials. The stability of characters obtained via anther culture is similar to that obtained via conventional methods (Sugimoto & Takeoka, 1998). However, the resulting DH plants have not undergone any evaluation and selection for agronomic characteristics, biotic or abiotic stress, grain quality, or adaptation to a specific ecosystem, whereas lines developed by conventional methods have already been selected for a number of desirable agronomic and adaptive characteristics by the F4 or F5 generation. For this reason, DH plants are the starting point from which the breeder begins the evaluation and selection cycle. This usually starts in the R2 generation, after multiplying the small number of seeds obtained in the R1 generation. From R2 onwards, evaluation and selection can be performed as in conventional improvement methods.
Several qualitative and quantitative characters are evaluated during the improvement process. Direct selection in the first generations is difficult if yield performance is the main parameter considered; for this reason, methods that simplify the process are valuable. One indirect method is Principal Component Analysis (PCA), a variable reduction procedure used when data are collected on variables that may be correlated. PCA transforms these variables into artificial components (linear combinations of the original variables) that explain most of the variance of the measured variables, and it may help selection in the first generations.
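As a minimal sketch of the variable-reduction step described above (not the software used in this study), PCA on the correlation matrix can be written in Python with NumPy. The function name pca and the choice to standardise the traits first are assumptions; standardisation is the usual choice when traits such as NH (cm), GW (g) and HI (a ratio) are measured in different units.

```python
import numpy as np

def pca(X):
    """PCA on the correlation matrix of X (rows = lines, columns = traits).

    Returns (explained, weights, scores):
      explained: proportion of total variance explained by each component,
      weights:   eigenvectors; column c holds the variable weights of component c,
      scores:    coordinates of each line on the components.
    """
    X = np.asarray(X, dtype=float)
    # Standardise each trait so that its units (cm, g, ratios) do not dominate
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)          # trait correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    scores = Z @ eigvecs                      # projection of each line
    return explained, eigvecs, scores
```

Each row of X would be one line and each column one trait; the leading entries of explained correspond to the proportions of variance attributed to the first PCs, and the columns of weights give the weight of each variable on each component.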
In field tests, yield performance is subject to considerable experimental error, and yield components are often inversely correlated (Xing & Zhang, 2010). Molecular techniques are an especially useful alternative to field and greenhouse tests when working with quantitative traits of low heritability, which are controlled by many genes of small effect and may depend on the environment. One of these techniques consists of mapping and analysing chromosomal regions to identify genes with quantitative effects (quantitative trait loci, QTL), which can be used as markers to trace heritable characters across generations. QTL make it possible to relate the genotype of an organism to the expressed phenotype. Hundreds of QTL involved in the regulation of yield-related traits have been detected in rice, most of them linked to the grain-weight yield component. However, yield components that are more sensitive to environmental variation, such as the number of fertile stems, are seldom mapped. For QTL to be useful in an improvement process, they must be validated in different populations, because results vary between tests and between varieties.
For rice breeders, the development of selection methods for evaluating qualitative and quantitative characteristics of rice in early generations is a valuable resource, both for efficiency in obtaining new varieties and economically, by reducing the cost of field tests. The main objective of this study was to evaluate the suitability of PCA as an indirect method of selection for yield in early generations. A lattice square design was used in field tests, and the results were compared with those of another indirect selection method based on high-throughput analysis of molecular markers (microarrays). To achieve this objective, selection was performed in early generations of high-yield lines derived from a DHL population obtained by crossing two varieties of rice, 'Benisants' and 'Gigante Vercelli', by: a) characterising the DHLs molecularly by high-throughput molecular marker analysis (microarrays); b) determining, by PCA, the main yield-related variables or factors that can be used in an indirect selection process in early generations; and c) comparing two indirect selection techniques, PCA and massive molecular marker analysis (microarrays).
The use of indirect selection for yield by PCA of field-test data, based on the evaluation of individual plants, and its comparison with indirect selection using molecular markers is a novel approach. It is also less expensive than the methods commonly used by conventional seed improvement companies.

Materials and Methods
'Benisants' and 'Gigante Vercelli' (GV), two medium-cycle japonica-type rice varieties, and DH plants derived from a cross between them were cultivated to obtain their seeds, which were collected individually. The cross was made by Copsemar (Cooperativa de Productores de Arroz, Sueca, Spain), and the DH plants were obtained at the laboratory of the Aula Dei Experimental Station of the CSIC (Zaragoza, Spain). 'Benisants' is a medium-height Spanish variety (Cooperativa de Productores de Semillas de Arroz, S.C.L., Copsemar) with medium-sized pearl-like grain. GV is a tall Italian variety with a large pearl-like grain. Of all the DHLs obtained, only those that initially had a plant shape and cycle suited to eastern Spain, and that differed in some respect from the parentals, were selected. In addition, a small group of lines similar to GV was included because GV shows good resistance to Rice Blast (RB), caused by the fungus Magnaporthe grisea (T.T. Hebert) M.E. Barr. In total, 115 target lines were selected. The 'Gleva' variety (Semillas Certificadas Castells SL) was used as the control.
High-throughput marker analysis (microarrays) was used for the molecular characterisation and selection of the DHLs, using both the original DH population and the parentals. To this end, DNA was isolated from the leaves of the 115 DHLs and of the two parentals, 'Benisants' and GV, following the protocol of Diversity Arrays Technology (DArT, Yarralumla, Australia, http://www.diversityarrays.com), and then analysed on its SNP (Single Nucleotide Polymorphism) marker platform. The DNA samples of the test rice populations (parentals and descendants of the cross) were hybridised to a series of DNA microarrays, prepared by fixing on glass plates a collection of more than 6,000 randomly cloned sequences from more than 50 rice varieties. This genotyping system determines the fragments to which each test sample hybridises, and differences in hybridisation patterns reveal the genome regions that vary among the analysed samples. The technique is particularly useful for studying quantitative characters, such as germination vigour (Reinke, 2006), for which a reference panel designed by DArT is used. For the molecular characterisation, the genetic distance between each pair of individuals was calculated using the genetic distance coefficient of Nei et al. (1983) with the POPULATIONS software, v. 1.2.31, and a dendrogram was constructed by the Neighbour-Joining method (Saitou & Nei, 1987), implemented in the MEGA5 program (Tamura et al., 2011).
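The distance step can be illustrated with a short sketch. It assumes that the coefficient of Nei et al. (1983) is their D_A distance and that each DH line is fully homozygous, so every scored locus carries a single allele at frequency 1. Under that assumption D_A reduces to the proportion of scored loci at which two lines carry different alleles. The helper names and the 'A'/'B' allele coding are hypothetical, and this is not the POPULATIONS implementation.

```python
from math import sqrt

def nei_da(pop_x, pop_y):
    """D_A distance of Nei et al. (1983) between two units.

    pop_x, pop_y: one entry per locus, either a dict {allele: frequency}
    or None when the locus is missing. Loci missing in either unit are skipped.
    """
    shared, n_loci = 0.0, 0
    for fx, fy in zip(pop_x, pop_y):
        if fx is None or fy is None:
            continue
        n_loci += 1
        shared += sum(sqrt(fx[a] * fy.get(a, 0.0)) for a in fx)
    if n_loci == 0:
        raise ValueError("no loci scored in both units")
    return 1.0 - shared / n_loci

def homozygous(calls):
    """A doubled haploid line carries one allele per locus, at frequency 1."""
    return [None if c is None else {c: 1.0} for c in calls]

def distance_matrix(lines):
    """lines: dict name -> list of allele calls (None = missing)."""
    names = list(lines)
    profiles = {n: homozygous(lines[n]) for n in names}
    return names, [[nei_da(profiles[a], profiles[b]) for b in names] for a in names]
```

For example, a line scored ['A', 'A', 'B', None] is at distance 1/3 from a parental scored ['A', 'A', 'A', 'A'], because the fourth locus is missing and one of the three remaining loci differs.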
To evaluate yield-related characters phenotypically, field trials were run in 2009 and 2010 on a 1.47 ha plot in Sueca (Valencia, Spain). In each trial, the 115 DHLs, the two parental lines in duplicate and the control variety 'Gleva' were compared, for a total of 121 treatments. The trials followed a balanced lattice square design in which all 121 treatments were replicated. Each elementary plot measured 0.5 × 0.5 m and contained a single plant.
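The paper does not describe how the squares were constructed or randomised. As a sketch under that caveat, the standard construction of a balanced lattice square for k² treatments with k an odd prime (k = 11 gives 121 treatments) uses the k + 1 parallel classes of lines of the affine plane over the integers mod k, taken in pairs as the row and column groupings of (k + 1)/2 squares. Every pair of treatments then shares a row or a column exactly once. The function name and treatment numbering are hypothetical, and in a real trial rows, columns and treatment labels would also be randomised.

```python
def balanced_lattice_square(k):
    """Unrandomised balanced lattice square for k*k treatments, k an odd prime.

    Treatment t = k*a + b is the point (a, b) of the affine plane mod k.
    Returns (k + 1) // 2 replicates, each a k x k list of treatment numbers.
    """
    if k < 3 or k % 2 == 0 or any(k % d == 0 for d in range(2, int(k ** 0.5) + 1)):
        raise ValueError("this construction assumes an odd prime k")

    def line_index(cls, a, b):
        # Class k: vertical lines a = c; class m < k: lines b - m*a = c (mod k)
        return a if cls == k else (b - cls * a) % k

    squares = []
    for rep in range((k + 1) // 2):
        row_cls, col_cls = 2 * rep, 2 * rep + 1   # each class is used exactly once
        square = [[None] * k for _ in range(k)]
        for a in range(k):
            for b in range(k):
                # Two lines from different classes meet in exactly one point,
                # so every cell receives exactly one treatment
                square[line_index(row_cls, a, b)][line_index(col_cls, a, b)] = k * a + b
        squares.append(square)
    return squares
```

With k = 11 the construction yields six 11 × 11 squares.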
The area of each elementary plot was large enough to avoid competition between plants. It also allowed maximum expression of the morphological characters that emerge in an isolated setting, without restricting water and nutrient supply and with no infections (Donald & Hamblin, 1976; Kropff et al., 1994; Ntanos & Roupakias, 2001). To avoid edge effects, the whole trial area was surrounded by a row of parental 'Benisants' plants at the same planting spacing. The trial area measured 350 m² in total, and the 'Fonsa' rice variety was sown in the rest of the plot. For fertilisation, 20 kg of 27-13-10 compound fertiliser was incorporated during soil preparation, equivalent to 154 kg N ha⁻¹.
The parameters determined for all the plants were: height to panicle node (NH), measured from the plant base to the main panicle node; main panicle length (PL), obtained as the difference between total height and NH; number of panicles per plant (PN); grain weight per plant (GW); total plant weight (TW), calculated as the sum of GW and the straw weight of each plant; harvest index (HI), obtained as the quotient of GW and TW; and the number of days from sowing to 50% panicle emergence (DE), an indicator of the crop cycle of each variety.
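The derived traits defined above, and the fertiliser rate, reduce to simple arithmetic. The sketch below assumes that the 20 kg of 27-13-10 fertiliser (an N-P2O5-K2O grade) was spread over the 350 m² trial area. Under that assumption the nitrogen rate works out to 20 × 0.27 / 0.035 ≈ 154 kg ha⁻¹, matching the figure in the text. The function names are illustrative.

```python
def derived_traits(total_height_cm, nh_cm, gw_g, straw_g):
    """Derived plant traits as defined in the Methods.

    PL = total height - NH; TW = GW + straw weight; HI = GW / TW.
    """
    pl = total_height_cm - nh_cm
    tw = gw_g + straw_g
    return {"PL": pl, "TW": tw, "HI": gw_g / tw}

def nutrient_rate_kg_ha(fertiliser_kg, grade_pct, area_m2):
    """Nutrient applied per hectare, given the fertiliser grade in percent."""
    area_ha = area_m2 / 10_000
    return fertiliser_kg * grade_pct / 100 / area_ha

# Assumed: 20 kg of 27-13-10 applied over the 350 m2 trial area
print(round(nutrient_rate_kg_ha(20, 27, 350)))  # prints 154 (kg N per ha)
```

The same function gives about 74 kg ha⁻¹ for the 13% component and 57 kg ha⁻¹ for the 10% component under the same assumption.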
A subgroup of 55 DHLs representative of the diversity found was selected on the basis of the field tests and the molecular characterisation. The parameters related to the panicles were characterised in detail in these 55 DHLs, which were used in the analysis of the yield components. To this end, the grains were removed from two panicles of each plant, filled grains were separated from empty ones, and the following parameters were determined: number of grains per panicle (NGP); number of filled grains per panicle (NFGP); percentage of filled grains per panicle (%FILL); and the weight of 100 grains (W100). The Opto Rice High-Resolution Image-Processing Station was used to count and weigh grains and to measure grain length (L), width (W) and the L/W ratio.
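The panicle-level measurements can be combined per plant as in the sketch below. The text does not state these details explicitly, so it assumes that NGP and NFGP are averaged over the two sampled panicles and that W100 is computed from the filled grains; the example counts and weight are hypothetical.

```python
def panicle_components(filled_counts, empty_counts, filled_weight_g):
    """Yield components from the panicles sampled on one plant.

    filled_counts, empty_counts: filled and empty grains on each sampled panicle.
    filled_weight_g: total weight of the filled grains from those panicles.
    """
    n = len(filled_counts)
    ngp = (sum(filled_counts) + sum(empty_counts)) / n    # grains per panicle
    nfgp = sum(filled_counts) / n                         # filled grains per panicle
    fill_pct = 100 * nfgp / ngp                           # %FILL
    w100 = 100 * filled_weight_g / sum(filled_counts)     # weight of 100 grains (g)
    return {"NGP": ngp, "NFGP": nfgp, "%FILL": fill_pct, "W100": w100}

# Hypothetical plant: two panicles, 9.86 g of filled grain in total
print(panicle_components([150, 140], [15, 13], 9.86))
```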
RB susceptibility was evaluated during the 2010 field trial, as the environmental conditions that year favoured the development of the disease. In each plant, RB incidence (necrosis) on the panicle node was assessed, because damage at this point is the most critical in terms of yield. To this end, the number of stems with necrotic spots on the panicle node was counted, and the percentage of affected nodes was calculated relative to the total number of panicles.
The GENSTAT 12.1 software was used for statistical analysis. For each parameter evaluated, a linear mixed model was fitted by restricted maximum likelihood (REML). The model included fixed (LINE and YEAR) and random (REPETITION, ROW, COLUMN) terms, so as to account for all existing variability.

Results
DArT analysis yielded 465 SNPs polymorphic between the parentals 'Benisants' and GV, unequally distributed across the 12 chromosomes. Based on these markers, a Neighbour-Joining (NJ) dendrogram was constructed for the 115 DHLs and the parentals. The dendrogram showed that a large proportion of DHLs was genetically closer to 'Benisants', whereas few lines were more closely related to GV (Figure 1).
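The authors built the dendrogram with MEGA5. Purely to show what the Neighbour-Joining algorithm (Saitou & Nei, 1987) does with a distance matrix such as the one computed from the DArT markers, here is a small from-scratch sketch. It is not MEGA's implementation, and the function name and Newick output format are illustrative choices.

```python
def neighbour_joining(names, dist):
    """Neighbour-Joining tree from a symmetric distance matrix.

    names: list of n >= 3 labels; dist: n x n list of distances.
    Returns an unrooted tree as a Newick string with a basal trifurcation.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)                      # Newick text of each current subtree
    D = [[float(v) for v in row] for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in D]          # total distance from each node
        # Choose the pair (i, j) minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and to j
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from u to every remaining node
        du = [0.5 * (D[i][k] + D[j][k] - D[i][j]) for k in keep]
        D = [[D[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)] + [du + [0.0]]
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"]
    # Join the last three subtrees at a single central node
    a, b, c = nodes
    la = (D[0][1] + D[0][2] - D[1][2]) / 2
    lb = (D[0][1] + D[1][2] - D[0][2]) / 2
    lc = (D[0][2] + D[1][2] - D[0][1]) / 2
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"
```

On an additive four-taxon matrix the sketch recovers the generating tree: distances A-B 5, A-C 7, A-D 4, B-C 8, B-D 5 and C-D 5 give (C:4.0000,D:1.0000,(A:2.0000,B:3.0000):1.0000);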
In the field trials for phenotypic evaluation, rice plants grew normally in 2009 and 2010. For most DHLs, the values of the evaluated phenological parameters did not differ significantly from those of the parental 'Benisants'. Only a small group of lines had values consistently similar to those of the parental GV. In the DArT analysis, this group appeared on a different main arm of the dendrogram from the other lines (Figure 1).
The analysis of the complete population (121 genotypes) revealed significant Year and Line effects for all the studied variables (p-value < 0.01). The Year x Line interaction was also significant (p-value < 0.01) for all the variables except PL and DE (Table 1). Among the Line effects, NH showed the largest differences, consistent with this variable also showing the largest difference between the parentals. The Year effect can be explained by differences in climate between the two crop-growing years of the test.
The coefficient of variation (CV) is an effective indicator of experimental error in the trials. CV values remained low for all the variables, in line with the studies of Kyriakou & Fasoulas (1985) and Fasoulas (1988). The CVs of most of the studied variables were below or close to 10%, which confirms the effectiveness of the design and the reliability of the trial data.
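For a replicated trial, the CV is commonly computed as the square root of the residual (error) mean square divided by the grand mean. The paper does not give its exact formula, so the sketch below assumes this conventional definition; the numbers in the example are hypothetical.

```python
from math import sqrt

def trial_cv_percent(residual_mean_square, grand_mean):
    """Experimental coefficient of variation (%): 100 * sqrt(MSE) / grand mean."""
    if grand_mean == 0:
        raise ValueError("grand mean must be non-zero")
    return 100 * sqrt(residual_mean_square) / grand_mean

# Hypothetical example: residual mean square 100 g^2 for a trait averaging 125 g
cv = trial_cv_percent(100.0, 125.0)
print(f"CV = {cv:.1f}%")  # prints "CV = 8.0%", below the ~10% level cited above
```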
In the selected subpopulation of 55 genotypes, the Year effect was significant (p-value < 0.01) for all the yield component variables except NGP (Table 2). This indicates that NGP has a strong genetic component that was not affected by environmental conditions; the other parameters also showed a strong genetic component. The Line effect and the Year x Line interaction were significant (p-value < 0.01) for all the variables. The results were not always normally distributed for every variable in both years: asymmetries arose from the large difference between the parental GV and most lines.
Height to panicle node (NH). This characteristic presented very high heritability and a very low standard error of the difference. It was therefore considered very stable and not influenced by environmental effects. Three clearly different groups of lines were found within a range from 52.36 cm to 95.08 cm (50-60 cm, 65-70 cm and 80-95 cm). Most tested lines had low values, including the reference variety 'Gleva' and the parental 'Benisants'. The remaining lines had NH values significantly different from this large group and formed a group similar to the other parental, GV, and a small intermediate group (Figure 2).
Main panicle length (PL). The reference variety 'Gleva' had the lowest PL value in both trial years (13.32 cm), whereas the parental GV had the highest (20.43 cm). The other parental, 'Benisants', had an intermediate value between 'Gleva' and GV. The difference between the two extremes corresponds to a 53% increase in PL. The values suggest a positively skewed distribution: most tested lines had similar values, whereas the parental GV and some similar lines lay well above the rest (Figure 3).
Number of panicles per plant (PN). Significant differences were found in PN, with a difference of 33% between the extreme values. The standard error of the difference in means was very high and the observed heritability was the lowest of all the variables. These results imply that this variable is strongly affected by the environment, and no difference could be established between the two parentals (Figure 4).
Grain weight per plant (GW). This was the most important crop outcome variable. Significant differences were found between the two study years; the mean values obtained in 2010 were 20% lower than those obtained in 2009. In 2009, the maximum values were nearly double the minimum values, whereas in 2010 the difference was smaller, up to 60%. This could result from differences in transplanting dates and the more favourable climate conditions for crop growth in 2009. The standard error of the difference was also high, which indicates wide variability between the measured values and relatively low heritability. The values ranged between 90.01 g and 161.19 g, a difference of 79% (Figure 5).
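The percentage differences between extreme values quoted in these results appear to be expressed relative to the smaller value, i.e. 100 × (max − min) / min. Under that reading, the sketch below reproduces two of the figures in the text.

```python
def relative_difference_pct(minimum, maximum):
    """Percentage by which the maximum exceeds the minimum."""
    return 100 * (maximum - minimum) / minimum

# Extreme values reported in the text
examples = {
    "GW (g)": (90.01, 161.19),   # reported as a 79% difference
    "PL (cm)": (13.32, 20.43),   # reported as a 53% increase
}
for trait, (low, high) in examples.items():
    print(trait, round(relative_difference_pct(low, high)))
```

The same reading also matches the NFGP increases of 19.5% and 38% reported in the Discussion (153 and 177 versus 128).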
Total plant weight (TW), or total biomass. Some researchers have proposed this parameter as a criterion for selecting genotypes. The TW values followed those of GW, with a 78% difference between the maximum (268.64 g) and minimum (150.31 g) values. Year, Line and their interaction were significant (p-value < 0.01), as for GW, because both variables are related. The interaction indicates that the lines performed differently in terms of GW and TW depending on environmental conditions. In 2009, the maximum values were nearly double the minimum values, whereas the difference reached 70% in 2010. Most of the lines fell between the two parentals and were biased towards one of them, in this case 'Benisants' (Figure 6).
Harvest Index (HI). This parameter measures the efficiency of grain production relative to the total biomass produced. The standard error of the difference was low and heritability was high (88%), which indicates that HI could be a reliable and stable indirect parameter for evaluating the productive potential of an improvement line. The difference between the minimum and maximum values was 65%. The values showed a discontinuous progression: the parental GV lay at the very bottom of the graph, isolated from the other lines, and most of the remaining lines had values similar to those of the other parental, 'Benisants' (Figure 7).
Days until 50% panicle emergence (DE). This parameter was very stable in this population, with the lowest CV and standard error values and very high heritability. Together, these three indicators confirmed that DE is one of the most stable parameters under similar crop-growing conditions. Indeed, no significant interaction appeared for this parameter, meaning that the lines had a similar cycle despite the differing environmental conditions, although they did differ between the two years studied. Most lines had values close to, or lower than, that of the parental 'Benisants' (97.13 days), whereas GV had a higher value (109.17 days). A small group genetically similar to GV had values intermediate between both groups (Figure 8).
100-grain weight (W100). This parameter was stable. The effects of Year and Line, and their interaction, were statistically significant (p-value < 0.01). The values of the lines followed a normal distribution, except for three lines with abnormally low values that differed from those of the two parentals. There was a 26% difference between the lines with the minimum and maximum values. The two parentals did not differ (3.35 g for 'Benisants' and 3.47 g for GV) (Figure 9).
Number of grains per panicle (NGP) and number of filled grains per panicle (NFGP). These two parameters behaved similarly, as the corresponding histograms show (Figures 10 and 11). They are closely related and are considered two of the most important yield components. The values obtained were relatively stable over the two-year study period. Averaged over years, the distribution of these characteristics among the lines was right-skewed, because the group of lines genetically similar to the parental GV had the highest values and was separated from the other lines by a gap. The difference between the minimum and maximum values was notably high: 61% for NGP and 58% for NFGP.
Mean percentage of filled grains per panicle (%FILL). This parameter expresses the relationship between the two previous parameters, NFGP and NGP. The mean ratio of NFGP to NGP was 91%. The differences between Lines and between Years were statistically significant, as was the Year x Line interaction (p-value < 0.01). This characteristic presented a left-skewed distribution, caused by the lower percentages of the few lines most similar to the parental GV. Most lines were similar to the parental 'Benisants' (Figure 12).
Necrosis on the panicle node. The lowest values were obtained by the subgroup genetically most similar to the parental GV, and the highest values were observed in the reference variety 'Gleva', with a 10-fold higher value. The parental 'Benisants' had intermediate values. Although the results could only be evaluated in 2010, analysis of variance (ANOVA) revealed some significant differences among the lines (p-value < 0.01) (Table 4). This confirmed that the trial design was capable of discriminating the lines more sensitive to pyricularia, even though the planting spacing used did not simulate normal crop-growing conditions (Figure 13).

Discussion
In principle, a DHL population shows the expected genetic variability of one F2 generation (Lentini et al., 1997). The large proportion of DHLs closer to the 'Benisants' variety can be explained by the crossing of 'Benisants' x GV, which sought to incorporate certain characteristics of GV into the 'Benisants' variety, such as a larger grain, longer panicles, more grains per panicle, and better tolerance to pyricularia.
Rice plant growth depends to a great extent on environmental conditions, specifically temperature.
Climate conditions differed over the two-year period (2009-2010) in which the trials took place. The low minimum temperatures at the start of the 2010 field trial affected the growth of the lines, resulting in less vigorous plant development and tillering. As observed by other authors (Guimaraes, 2009), some parameters can be affected by environmental conditions, specifically temperature variations: when temperatures increased, plants tillered better. Temperature patterns also differed between the two years, with higher maximum temperatures in May and August in 2010 than in 2009, which could have influenced plant development. Total annual rainfall also differed: 798.5 mm in 2009 compared with 542.1 mm in 2010. Rainfall in September 2009 was particularly relevant, as several study plants were lost in this period. However, the more frequent rainy days at the end of August and in September 2010, which coincided with rice maturation, provided more favourable conditions for RB to appear.
The environment influences all heritable characters. Heritability is the proportion of phenotypic variance that is due to genotypic variance. The most stable variables, with the highest heritability, were NH and DE, whose values exceeded 90%, whereas PN and GW were affected mostly by environmental conditions and would therefore not be suitable for direct selection of genotypes.
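The paper does not state which heritability estimator was used. One common choice for lines tested over several years is broad-sense heritability on an entry-mean basis, H² = σ²G / (σ²G + σ²GY / y + σ²e / (r·y)), where σ²G, σ²GY and σ²e are the genotypic, genotype × year and residual variance components (for example from a REML fit in which LINE is treated as random), y is the number of years and r the number of replicates. A sketch with hypothetical variance components:

```python
def broad_sense_heritability(var_g, var_gy, var_e, n_years, n_reps):
    """Entry-mean broad-sense heritability (0-1) across years and replicates."""
    phenotypic = var_g + var_gy / n_years + var_e / (n_reps * n_years)
    return var_g / phenotypic

# Hypothetical variance components for one trait, two years, two replicates
h2 = broad_sense_heritability(var_g=9.0, var_gy=2.0, var_e=4.0, n_years=2, n_reps=2)
print(f"H2 = {h2:.2f}")  # prints "H2 = 0.82"
```

A trait such as NH or DE, whose genotypic variance dominates, gives values close to 1, whereas large σ²GY or σ²e, as suggested for PN and GW, pulls H² down.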
Once the heritability of the different traits and the correlations between them have been obtained, it becomes easier to increase total yield by indirect selection through the yield components (Diz & Schank, 1995; Rebetzke et al., 2002; Ukaoma et al., 2013). The analysis of the yield components using the mixed model, which combined the results of the two consecutive field trial years, provided mean values for the main characteristics of each evaluated line and revealed statistically significant differences between the selected DHLs. Yoshida (1983) reported the following heritability values: TN 54%; NGP 88%; %FILL 83%; W100 73%. These values indicate that the three main yield components (NGP, %FILL and W100) could be used in an improvement programme, since a high percentage of the character would be retained across generations. The two lines with the highest GW values, 96 and 99 (161.19 g and 160.10 g, respectively), did not differ significantly from one another (Table 4). Increased yield can be achieved through either increased biomass production (TW) or increased HI (Khush, 1995; Cheng et al., 2020); the latter was the case for the reference variety 'Gleva', in which the increase was achieved through a high grain-to-straw ratio. However, the two lines with the best grain yield (96 and 99) showed a major increase in TW (biomass) compared with the parentals 'Benisants' and GV (Table 4), in line with findings for hybrid rice (Song et al., 1990; Yamauchi, 1994; Peng et al., 2000).
The NH values of these two best-performing lines were similar to those of the parental GV. These values were high, which could make it difficult to use these lines as varieties under current crop-growing conditions, as they could be susceptible to early lodging. Lodging resistance is one of the main selection objectives in the improvement programmes run in Spain today. For NT, the selected lines had high and intermediate values between those of the parental 'Benisants' and the reference variety 'Gleva'. The TW values of the selected lines were 40% higher than those of the parentals and the reference variety 'Gleva', indicating a relationship between increased biomass and grain yield. For HI, only GV had a very low value: owing to its height, it had a high vegetative weight relative to its grain yield, giving the lowest HI of the series.
Major differences were found in the NFGP value between selected lines 96 and 99 (153 and 177 respectively) and the parental 'Benisants' and the reference variety 'Gleva' (128 in both, indicative of short panicles), which implied increases of 19.5% and 38%, respectively. The parental GV had an intermediate value (144).
A gradient was also observed in W100, with the 'Gleva' variety showing the lowest value and trial lines 96 and 99 showing higher values, close to that of the parental GV, with which they share genetic similarities. The difference between the mean of the two lines and that of the 'Gleva' variety was 11%.
For some characteristics, similarities were observed between the lines on the same dendrogram arm as the variety GV (Fig. 1). For two of the four evaluated characteristics, a clearly differentiated group formed whose values were shared by almost all the lines on that dendrogram arm in the DArT analysis (90, 92, 93, 96, 97, 98 and 99 in Fig. 1).
Studying the correlations between the evaluated variables proved useful for identifying the variables that best predicted grain yield for indirect selection purposes (Table 5). The correlation coefficient between each pair of variables, which ranges from −1 to +1, measures the strength of the linear relationship between them. Based on these correlations, GW increased in this population with TW and its related characters: higher NH, PN and NGP. This agrees with previous findings (Song et al., 1990; Yamauchi, 1994; Peng et al., 2000).
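A minimal sketch of how the pairwise Pearson correlations with GW could be computed and ranked by strength. The function names are illustrative, and the input would be a mapping from each variable name (NH, PN, NGP, TW and so on) to its list of line means.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rank_against(data, target="GW"):
    """Correlate every other variable with the target, strongest |r| first."""
    y = data[target]
    pairs = [(name, pearson_r(x, y)) for name, x in data.items() if name != target]
    # sorted() is stable, so ties in |r| keep their original order
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)
```

Ranking by absolute value keeps strong negative correlations, which can be as informative for indirect selection as positive ones.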
The PCA provided the weights of all the variables in the corresponding principal components (PCs). The first component explained almost 50% of the total variance. The variables positively correlated with this first component were NH, PN, NFGP and TW, and those inversely correlated were HI and %FILL, i.e. phenological variables and grain yield-related variables, respectively. This component is clearly related to plant size and vegetative development and to yield, which corroborates the relationship between plant size and grain yield noted above. Similar results were obtained by Kayode et al. (2008), Sanni et al. (2012) and Nachimuthu et al. (2014). The second component explained 21.45% of the variance and was related mainly to GW and to PN, which reflects the close relationship between these two characters, followed by HI. This component was related more to the grain yield variables and the plant's GW/TW ratio than to plant size, and it can facilitate the identification of lines with higher yield values. Together, the first two PCs account for almost three quarters of the total estimated variability. The third component explained 11.32% of the variance; it was closely related to W100, inversely correlated with PN, and is related to grain size.
Based on the results of the evaluation of morphological parameters and yield in the field and laboratory, 58 lines were classified using a dendrogram (Fig. 15) in which four groups were distinguished. The first group comprised only the parental GV, the second lines 90, 92 and 93, the third lines 96, 97, 98 and 99, and the fourth the remaining lines, together with the parental 'Benisants' and the reference variety 'Gleva'. These groups resulted from hierarchical cluster analysis and were similar to those obtained with the PCA (Fig. 14). This agreement reflects the relationship between both types of analysis, as indicated by other authors (Shaibu & Maji, 2012).
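Hierarchical cluster analysis merges the two closest groups of lines repeatedly until the desired number of groups remains. The sketch below assumes average linkage (UPGMA) on Euclidean distances between trait profiles; the section does not state the linkage method or distance used, so both are assumptions, and in practice the traits would be standardized first. The function names and the five profiles are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(points, k):
    """Agglomerative clustering with average linkage, cut at k groups.

    Returns a list of sets of row indices into `points`.
    """
    clusters = [{i} for i in range(len(points))]

    def linkage(c1, c2):
        # mean pairwise distance between members of the two clusters
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] |= clusters[b]  # merge the closest pair
        del clusters[b]             # b > a, so index a is unaffected
    return clusters

# Hypothetical two-trait profiles for five lines (illustration only)
profiles = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (50.0, 50.0)]
print(average_linkage(profiles, 3))
```

Cutting the same merge sequence at different values of k corresponds to drawing a horizontal line at different heights of the dendrogram.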
Multiple linear regression analysis with all the evaluated variables (NH, PN, NGP, NFGP, %FILL, TW, W100, HI) was used to identify the variables that best predict GW. Variables NH and PN were excluded from the model because they were not statistically significant. The adjusted model describing the relationship between GW and the six significant (p-value < 0.05) independent variables is:
GW = -14.9693 + 0.916703 * NFGP - 0.829354 * NGP - 1.41359 * %FILL + 0.611113 * TW + 4.00042 * W100 + 212.653 * HI
The model's R-squared was 98.73%, which indicates that the retained predictor variables explained 98.73% of the total variability of GW. In this model, the most relevant predictor variable was HI: the higher the HI, the heavier the GW. The other predictors with positive coefficients were NFGP, TW and W100. In contrast, NGP and %FILL had negative coefficients; i.e., when the other predictors are held constant, lines with a higher NGP and a higher %FILL can have a lower GW.
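Because the fitted equation is explicit, it can be evaluated directly. The sketch below hard-codes the reported coefficients and shows how a coefficient is read as a partial effect (the change in predicted GW for a one-unit change in one predictor, the others held fixed). It only reproduces predictions; it does not refit the model or repeat the significance tests that excluded NH and PN. The function names are hypothetical, and predictor units are those of the original measurements, which are not restated in this section.

```python
# Coefficients of the adjusted model as reported in the text
COEF = {
    "intercept": -14.9693,
    "NFGP": 0.916703,
    "NGP": -0.829354,
    "FILL_PCT": -1.41359,  # %FILL
    "TW": 0.611113,
    "W100": 4.00042,
    "HI": 212.653,
}

def predict_gw(nfgp, ngp, fill_pct, tw, w100, hi):
    """Predicted grain weight (GW) from the six retained predictors."""
    return (COEF["intercept"]
            + COEF["NFGP"] * nfgp
            + COEF["NGP"] * ngp
            + COEF["FILL_PCT"] * fill_pct
            + COEF["TW"] * tw
            + COEF["W100"] * w100
            + COEF["HI"] * hi)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Partial effect of one extra filled grain per panicle, other predictors fixed
delta = predict_gw(1, 0, 0, 0, 0, 0) - predict_gw(0, 0, 0, 0, 0, 0)
print(round(delta, 6))
```

The same partial-effect reading explains the negative NGP and %FILL coefficients: they describe the association after adjusting for the other predictors, not the simple pairwise correlation with GW.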

Conclusions
In conclusion, the lattice square experimental design, with plants spaced 0.5 x 0.5 m in an isolated setting, allowed the expression of morphological characters, as well as the phenological and productive characterisation of the family of DHLs. Our evaluation allowed us to understand performance in the field and to select the best genotypes. The DArT analysis enabled us to classify the lines genetically and confirmed the previous selection of lines that were phenotypically more similar to the parental 'Benisants', except for a few DHLs whose characters were closer to those of the parental GV.
The two best-performing DHLs in the population obtained 32% and 43% higher yield values than the parentals 'Benisants' and GV, respectively. This indicates transgressive inheritance over both parentals.
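The percentages compare a line's yield with a parental's yield, gain = (line - parent) / parent × 100, and a line is transgressive in the favourable direction when it exceeds the better of the two parentals. The sketch below illustrates this arithmetic with hypothetical yields in arbitrary units; they are not the values measured in this study, and the function names are illustrative.

```python
def percent_gain(line_yield, parent_yield):
    """Relative yield advantage of a line over a parent, in percent."""
    return (line_yield - parent_yield) / parent_yield * 100.0

def exceeds_both_parents(line_yield, parent_a, parent_b):
    """Positive transgression: the line yields more than the better parent."""
    return line_yield > max(parent_a, parent_b)

# Hypothetical yields in arbitrary units (not values from this study)
benisants, gv, best_line = 100.0, 90.0, 132.0
print(round(percent_gain(best_line, benisants), 1),
      round(percent_gain(best_line, gv), 1),
      exceeds_both_parents(best_line, benisants, gv))
```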
Grain yield showed the highest positive correlation with the plant's TW and more moderate positive correlations with PN, NFGP, NH and number of stems. Together with W100 and HI, these variables gave the best predictions, particularly of the mean GW per plant.
The heritability values we obtained were very high for some of the main characters, which supports indirect selection over direct selection for yield. The highest values were found for HI, DS, NGP and NFGP.
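Broad-sense heritability (H²) is the share of phenotypic variance attributable to genotypic differences, so a high value means that selecting on the observed character largely selects on genotype. This section does not state how heritability was estimated. The sketch below therefore assumes a one-way ANOVA of lines with r replicates, where the genotypic variance is (MS_G - MS_E) / r, and reports H² either on an entry-mean basis or on a single-plot basis. The function name, the flag and the mean squares are hypothetical.

```python
def broad_sense_heritability(ms_genotype, ms_error, replicates, entry_mean=True):
    """H^2 from one-way ANOVA mean squares (assumed estimator, see lead-in).

    sigma2_g = (MS_G - MS_E) / r, truncated at zero
    entry-mean basis:  H^2 = sigma2_g / (sigma2_g + MS_E / r)
    single-plot basis: H^2 = sigma2_g / (sigma2_g + MS_E)
    """
    sigma2_g = max((ms_genotype - ms_error) / replicates, 0.0)
    sigma2_e = ms_error / replicates if entry_mean else ms_error
    denominator = sigma2_g + sigma2_e
    return sigma2_g / denominator if denominator > 0 else 0.0

# Hypothetical mean squares for one character (not values from this study)
print(broad_sense_heritability(10.0, 2.0, 4),
      broad_sense_heritability(10.0, 2.0, 4, entry_mean=False))
```

The entry-mean value is always at least as high as the single-plot value, so the basis should be stated whenever heritabilities are compared between characters or studies.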
The cluster analysis performed with the data from the 2009 and 2010 field trials produced groupings similar to those of the dendrogram obtained from the genetic analysis of the population. This helps validate the lattice square design in an isolated setting, as it can reveal phenotypic differences among the genotypes tested.