Seven data files were simulated, each with p = 12 variables (X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11 and X12) and n = 6340 observations, using the mvrnorm function of the MASS package in R (R Core Team, 2021). The variables were simulated from a multivariate normal distribution with mean zero and standard deviation one, under the constraint that all 66 Pearson's linear correlation coefficients (r) between pairs of variables were equal within each data file. Thus, the correlation matrices of the 12 variables in files 1, 2, 3, 4, 5, 6 and 7 had all 66 r values equal to 0.35, 0.45, 0.55, 0.65, 0.75, 0.85 and 0.95, respectively.
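The simulation step can be illustrated with a minimal Python/NumPy sketch; it is not the R code used in the study. It builds the 12 × 12 target matrix with a common correlation r and draws n = 6340 observations for each of the seven correlation levels. The function name simulate_equicorrelated, the seeds, and the exact option (which rescales the draws so that the sample correlations equal r exactly, rather than only in expectation) are assumptions made for the example.

```python
import numpy as np


def simulate_equicorrelated(n, p, r, seed=0, exact=True):
    """Draw n observations of p standard normal variables sharing one correlation r."""
    rng = np.random.default_rng(seed)
    # Target correlation matrix: 1 on the diagonal, r in all p*(p-1)/2 pairs.
    target = np.full((p, p), float(r))
    np.fill_diagonal(target, 1.0)
    z = rng.standard_normal((n, p))
    if exact:
        # Assumption: whiten the draws so the sample mean is 0 and the sample
        # covariance is the identity; the colouring step below then reproduces
        # the target matrix exactly rather than only in expectation.
        z = z - z.mean(axis=0)
        chol_sample = np.linalg.cholesky(np.cov(z, rowvar=False))
        z = np.linalg.solve(chol_sample, z.T).T
    # Colour the draws with the Cholesky factor of the target matrix.
    return z @ np.linalg.cholesky(target).T


# One simulated file per correlation level described in the text.
levels = [0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
files = {r: simulate_equicorrelated(n=6340, p=12, r=r, seed=i)
         for i, r in enumerate(levels)}
```

With exact=False the same function returns ordinary random draws, whose sample correlations only approximate r.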
The eighth data file consisted of 12 variables measured in 6340 maize (Zea mays L.) plants. These real data were obtained from experiments conducted in the 2008/2009 (first experiment) and 2009/2010 (second experiment) agricultural seasons, in the experimental area of the Plant Science Department of the Federal University of Santa Maria, Santa Maria, State of Rio Grande do Sul, Brazil (29°42'S, 53°49'W, at 95 m altitude). In the first experiment, 361 plants of the single-cross hybrid P32R21, 373 plants of the three-way hybrid DKB566, and 416 plants of the double-cross hybrid DKB747 were evaluated. In the second experiment, 1777 plants of the single-cross hybrid 30F53, 1693 plants of the three-way hybrid DKB566, and 1720 plants of the double-cross hybrid DKB747 were evaluated.
In all 6340 plants, the following variables were measured: plant height at harvest (PH, in cm), ear insertion height (EIH, in cm), ear weight (EW, in g), number of grain rows per ear (NR), ear length (EL, in cm), ear diameter (ED, in mm), cob weight (CW, in g), cob diameter (CD, in mm), hundred-grain weight (HGW, in g), number of grains per ear (NGE), grain length (GL, in mm), calculated as half the difference between ear diameter and cob diameter, and grain yield (GY, in g per plant).
In each of the eight data files, consisting of 12 columns (variables) and 6340 rows (observations), principal component analysis (PCA) was performed on Pearson's linear correlation matrix between the variables. For the maize data file, the correlation matrix was chosen because the variables were measured on different scales.
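The role of the correlation matrix can be illustrated with a short Python/NumPy sketch, given here under stated assumptions and not as the code used in the study. The function name correlation_eigenvalues and the four-column example data are hypothetical. The sketch shows that the eigenvalues are unchanged when columns are rescaled, and that for an equicorrelated matrix with p variables the eigenvalues are 1 + (p - 1)r (once) and 1 - r (p - 1 times).

```python
import numpy as np


def correlation_eigenvalues(data):
    """Eigenvalues of the Pearson correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)  # columns are variables
    return np.linalg.eigvalsh(corr)[::-1]   # eigvalsh returns ascending order


# Illustrative data only: 4 correlated columns, then the same columns
# expressed on deliberately different measurement scales.
rng = np.random.default_rng(1)
base = rng.standard_normal((500, 4))
base[:, 1:] += base[:, [0]]                 # induce correlation with column 0
rescaled = base * np.array([1.0, 10.0, 100.0, 0.01])

eig_base = correlation_eigenvalues(base)
eig_rescaled = correlation_eigenvalues(rescaled)  # same values as eig_base

# Equicorrelated case matching the simulated files (here r = 0.75, p = 12).
p, r = 12, 0.75
target = np.full((p, p), r)
np.fill_diagonal(target, 1.0)
eig_target = np.linalg.eigvalsh(target)[::-1]     # 1 + 11 r, then 1 - r repeated
```

Because the eigenvalues of a correlation matrix sum to the number of variables, each eigenvalue divided by p gives the proportion of total variance attributed to that component.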
For each data file, 989 sample sizes (numbers of observations) were planned, starting from an initial sample size of 12 observations (taken in this study as the reference, i.e., the minimum size required for principal component analysis) and increasing by one observation at a time. Thus, the planned sample sizes were n = 12, 13, 14, ..., 1000 observations. For each planned sample size, in each data file, 3000 resamples with replacement were obtained. In each resample, the eigenvalues of the first two principal components (PC1 and PC2) were estimated. Thus, for each planned sample size, 3000 estimates of the PC1 and PC2 eigenvalues were obtained.
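The resampling procedure can be sketched as follows in Python/NumPy. This is an illustration under stated assumptions, not the code used in the study: the function name resample_eigenvalues, the seed, and the stand-in data file (12 equicorrelated normal variables with r = 0.75) are hypothetical, and the run is reduced to 200 resamples at three sample sizes to keep it fast.

```python
import numpy as np


def resample_eigenvalues(data, sample_size, n_resamples, rng):
    """Eigenvalues of PC1 and PC2 in resamples drawn with replacement."""
    n_rows = data.shape[0]
    estimates = np.empty((n_resamples, 2))
    for b in range(n_resamples):
        rows = rng.integers(0, n_rows, size=sample_size)  # with replacement
        corr = np.corrcoef(data[rows], rowvar=False)
        eigenvalues = np.linalg.eigvalsh(corr)            # ascending order
        estimates[b] = eigenvalues[-1], eigenvalues[-2]   # PC1, PC2
    return estimates


# Planned sample sizes: 12, 13, ..., 1000 (989 sizes in total).
planned_sizes = range(12, 1001)

# Stand-in data file (assumption): 12 equicorrelated normal variables, r = 0.75.
rng = np.random.default_rng(2021)
cov = np.full((12, 12), 0.75)
np.fill_diagonal(cov, 1.0)
data_file = rng.multivariate_normal(np.zeros(12), cov, size=6340)

# Reduced run for illustration: 200 resamples at three sizes instead of
# 3000 resamples at each of the 989 planned sizes.
results = {n: resample_eigenvalues(data_file, n, 200, rng) for n in (12, 100, 1000)}
```

Replacing the three sizes with planned_sizes and 200 with 3000 would reproduce the full design described above for one data file, at a correspondingly higher computational cost.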
Based on the 3000 eigenvalue estimates for each sample size and principal component (PC1 and PC2), the 97.5th percentile (P97.5%), the mean, the 2.5th percentile (P2.5%) and the coefficient of variation (CV, in %) were determined. These statistics were presented in a table for n = 12 and n = 1000 observations, and plotted in graphs for the other sample sizes for better visual representation. The statistical analyses were performed using Microsoft Office Excel and R (R Core Team, 2021).
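The four summary statistics can be computed as in the following Python/NumPy sketch, offered as an illustration and not as the procedure run in Excel or R. The function name summarize and the example input are hypothetical, and the use of the sample standard deviation (n - 1 denominator) in the CV is an assumption, since the source does not state which denominator was used.

```python
import numpy as np


def summarize(estimates):
    """P97.5%, mean, P2.5% and CV (%) of a vector of eigenvalue estimates."""
    estimates = np.asarray(estimates, dtype=float)
    # Percentiles use NumPy's default linear interpolation between order statistics.
    p2_5, p97_5 = np.percentile(estimates, [2.5, 97.5])
    mean = estimates.mean()
    # Assumption: CV uses the sample standard deviation (n - 1 denominator).
    cv = 100.0 * estimates.std(ddof=1) / mean
    return {"P97.5%": p97_5, "mean": mean, "P2.5%": p2_5, "CV%": cv}


# Hypothetical estimates (the values 1, 2, ..., 100), used only to exercise the function.
example = summarize(np.arange(1.0, 101.0))
```

The interval between P2.5% and P97.5% is a 95% percentile interval for the eigenvalue at a given sample size, so its width and the CV both describe how precision changes as n increases.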