Jatropha plants vary widely in fruit ripening time, and the same plant can bear fruits at different stages of ripeness. In this study, changes in pericarp color were used as indicators of ripening, and all fruits were collected at the ‘brown dry’ maturity stage. Fruits were obtained from two accessions grown in Canarana, Mato Grosso State, Brazil (13° 33′ 16′′ S 52° 16′ 20′′ W) and in Rio Largo, Alagoas State, Brazil (9° 29′ 41′′ S 35° 51′ 12′′ W), respectively.
Three seedlots (Lots 1, 2 and 3) were investigated. Lots 1 and 3 were obtained from the same mother plant grown in Canarana, both collected manually by the farmers: the fruits of Lot 1 were harvested in December 2018 and the fruits of Lot 3 were harvested in February 2019. In Canarana, the mean annual temperature is 24.8 ºC and the mean annual precipitation is 1541 mm, with a dry season from May to July, and a wet season from December to March. In the Köppen-Geiger system the climate of Canarana is classified as Aw. The fruits of Lot 2 were collected manually from field experiments carried out in Rio Largo. In Rio Largo, the mean annual temperature is 24.1 ºC and the mean annual precipitation is 1630 mm, with a dry season from October to February, and a wet season from March to September. In the Köppen-Geiger system the climate of Rio Largo is classified as Am.
After harvest, fruits were kept at room temperature for one week. Seeds were then extracted manually from the fruits, and each seedlot was homogenized and evaluated for moisture content (fresh weight basis), which ranged from 11.3 to 11.8%. All seedlots were packed in Kraft paper bags and stored at 20 ºC and 40% relative humidity (RH) during the experimental period. Under these conditions, seed water content decreased to between 6.5 and 6.6%. Traditional tests were performed to rank the lots based on germination and vigor.
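Moisture content on a fresh weight basis is the water lost on drying divided by the fresh mass. The sketch below illustrates that arithmetic only; the function name and the sample masses are hypothetical, and the drying procedure used to obtain the dry mass is not specified in this section.

```python
def moisture_content_fwb(fresh_mass_g: float, dry_mass_g: float) -> float:
    """Return moisture content (%) on a fresh weight basis.

    MC = (fresh mass - dry mass) / fresh mass * 100
    """
    if fresh_mass_g <= 0:
        raise ValueError("fresh mass must be positive")
    if not 0 <= dry_mass_g <= fresh_mass_g:
        raise ValueError("dry mass must lie between 0 and the fresh mass")
    return (fresh_mass_g - dry_mass_g) / fresh_mass_g * 100.0


# Hypothetical sample: 10.00 g of fresh seeds weighing 8.85 g after drying.
example_mc = moisture_content_fwb(10.0, 8.85)
print(f"Moisture content: {example_mc:.1f}%")
```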
Traditional tests to rank lots based on germination and vigor
Seeds were sown on paper towel and sand substrates and kept at 30 ºC with a 12-hour photoperiod: ten replications of 10 seeds per lot were distributed on paper towels moistened with distilled water (1:2.5, g:mL), and four replications of 25 seeds per lot were sown in sand (moistened to 60% of its water holding capacity) in plastic trays. The percentage of normal seedlings per lot was recorded at 5 and 10 days after sowing. To calculate the germination rate index (GRI), the number of emerged seedlings on the paper substrate was monitored daily for 10 days.
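The GRI formula is not given in this section. A minimal sketch, assuming the commonly used formulation in which the number of seedlings newly counted on each day is divided by the number of days after sowing and the quotients are summed; the daily counts below are hypothetical.

```python
def germination_rate_index(daily_new_counts):
    """Sum of (seedlings newly counted on day d) / d, for d = 1, 2, ...

    daily_new_counts[0] is the count for day 1 after sowing.
    This formulation is an assumption; the source does not state it.
    """
    return sum(n / day for day, n in enumerate(daily_new_counts, start=1))


def germination_percentage(daily_new_counts, seeds_sown):
    """Cumulative emerged seedlings as a percentage of seeds sown."""
    return sum(daily_new_counts) / seeds_sown * 100.0


# Hypothetical replication of 10 seeds monitored daily for 10 days.
counts = [0, 0, 0, 2, 4, 2, 1, 0, 0, 0]
gri = germination_rate_index(counts)
pct = germination_percentage(counts, seeds_sown=10)
print(f"GRI = {gri:.3f}, emergence = {pct:.0f}%")
```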
Four replications of 15 seeds per lot were weighed and kept for 6 hours in containers with 75 mL of distilled water at 25 ºC. The electrical conductivity (μS cm⁻¹ g⁻¹) was measured using a DIGIMED DM-32 conductivity meter.
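The unit μS cm⁻¹ g⁻¹ implies that the meter reading is divided by the mass of the seed sample. The sketch below shows that normalization; the optional blank correction (the reading of the water alone) is an assumption not stated in the text, and the values are hypothetical.

```python
def conductivity_per_gram(reading_us_cm, seed_mass_g, blank_us_cm=0.0):
    """Normalize a conductivity reading (uS/cm) by seed mass (g).

    blank_us_cm is an optional reading of the soaking water alone;
    subtracting it is an assumption, not a step described in the source.
    """
    if seed_mass_g <= 0:
        raise ValueError("seed mass must be positive")
    return (reading_us_cm - blank_us_cm) / seed_mass_g


# Hypothetical replication: 15 seeds weighing 9.0 g, meter reading 450 uS/cm.
ec = conductivity_per_gram(450.0, 9.0)
print(f"EC = {ec:.1f} uS/cm/g")
```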
Four subsamples of 25 seeds per lot were sown in plastic trays containing sand moistened to 60% of its water holding capacity. Boxes were maintained at room temperature. The percentage of emerged seedlings was determined at 10 days after sowing.
Data from the germination tests, electrical conductivity and seedling emergence were analyzed separately by analysis of variance in a completely randomized design, and the means were compared by Tukey’s test (P < 0.05).
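For a completely randomized design with seedlot as the only factor, the analysis of variance reduces to a one-way F test. The sketch below computes the F statistic and its degrees of freedom from hypothetical germination percentages. It does not compute the P value or Tukey's test, which need the F and studentized range distributions from a statistics package.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups is a list of lists, one list of observations per treatment.
    """
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    # Variation of treatment means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own treatment mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical germination percentages, four replications per seedlot.
lot_1 = [88, 92, 84, 88]
lot_2 = [76, 80, 72, 76]
lot_3 = [60, 64, 56, 60]
f_stat, df_b, df_w = one_way_anova([lot_1, lot_2, lot_3])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```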
Fat and protein content
Proximate chemical composition analysis of the seeds was performed according to the methods of the Association of Official Analytical Chemists for crude fat (AOAC No. 4.5.01) and crude protein content (AOAC No. 4.2.11). Percentage data for crude fat and crude protein content were fitted separately to a linear model with normally distributed errors, with seedlot as the fixed effect in the linear predictor. Post-hoc contrasts between seedlots were then determined by Tukey’s test (P < 0.05).
Multispectral images were obtained using a VideometerLab4 (Videometer A/S, Herlev, Denmark) and its software VideometerLab version 3.14.9. This instrument is integrated with a sphere providing homogeneous and diffuse illumination using strobe light-emitting diode (LED) technology. Reflectance images were captured at 19 different wavelengths (365, 405, 430, 450, 470, 490, 515, 540, 570, 590, 630, 645, 660, 690, 780, 850, 880, 940 and 970 nm) and combined into high-resolution multispectral images (40 μm/pixel). Every pixel in the image contains reflectance data, which vary depending on the color, texture and chemical composition of the sample.
Ten replications of 10 seeds per lot were placed in 9-cm Petri dishes. Before image acquisition, the light intensity in each wavelength band was adjusted individually and automatically to optimize the illumination for the specific type of sample. This improved the signal-to-noise ratio and made the multispectral images captured from different seed classes directly comparable. The light setup was adjusted using a representative sample area, and the strobe time of each illumination type was then optimized with respect to this area. The auto light setting ensures that the optical dynamic range of each band is used without saturation within the auto light region of interest (ROI). Subsequently, the instrument was calibrated using three calibration targets: (i) a uniform bright disc, (ii) a uniform dark disc, and (iii) a geometric disc, which is black with dots in a rectangular grid.
Multispectral images were captured from both the ventral and dorsal seed surfaces of 10 samples with 10 seeds per lot. An overview of the ventral and dorsal surfaces of the three seedlots is shown in Fig. 8. With successive lighting by the 19 LEDs (sequential strobes), the multispectral images of a sample (plate with 10 seeds) were captured in a few seconds and required no sample preparation. The ROI of each seed was extracted using the Binary Large Object (BLOB) toolbox, a built-in function of the VideometerLab software; each BLOB represented one seed. Mean spectra were plotted to show the differences among the three seedlots based on their multispectral signatures. An nCDA algorithm was used as a supervised model based on multispectral image transformation, which minimizes the distance between observations within a seedlot and maximizes the distance between observations from different seedlots.
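The mean spectrum of a seedlot can be sketched as the band-wise average of the per-seed reflectance values extracted from each BLOB. The example below assumes each seed is already summarized as one reflectance value per wavelength; the function name and the reflectance values are hypothetical, and the VideometerLab software's own export format is not reproduced here.

```python
# The 19 wavelengths (nm) listed in the text.
WAVELENGTHS_NM = [365, 405, 430, 450, 470, 490, 515, 540, 570, 590,
                  630, 645, 660, 690, 780, 850, 880, 940, 970]


def mean_spectrum(seed_spectra):
    """Average per-seed spectra band by band.

    seed_spectra is a list of equal-length lists, one reflectance
    value per wavelength for each seed.
    """
    if not seed_spectra:
        raise ValueError("at least one seed spectrum is required")
    n_bands = len(seed_spectra[0])
    if any(len(s) != n_bands for s in seed_spectra):
        raise ValueError("all spectra must have the same number of bands")
    # zip(*...) regroups the values so that each tuple holds one band.
    return [sum(band) / len(seed_spectra) for band in zip(*seed_spectra)]


# Hypothetical flat spectra for two seeds of one lot.
seed_a = [10.0] * len(WAVELENGTHS_NM)
seed_b = [20.0] * len(WAVELENGTHS_NM)
lot_mean = mean_spectrum([seed_a, seed_b])
for wavelength, reflectance in zip(WAVELENGTHS_NM, lot_mean):
    print(f"{wavelength} nm: {reflectance:.1f}")
```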
We applied PCA to the multispectral data using the “FactoMineR” package. A biplot of the first two principal components (PC1 and PC2) was built to select the most meaningful wavelengths for discriminating the seedlots, according to Pearson’s correlation test (P < 0.05). Only the multispectral data for the meaningful wavelengths identified by PCA were used in a CDA model implemented with the “candisc” package. We tested the effect of low, medium and high vigor (i.e., three classes of seed physiological potential) on the multispectral data using a multivariate analysis of variance (MANOVA). The statistical analyses were performed using the VideometerLab software and R, the “free software environment for statistical computing and graphics”.
In total, 100 seeds per lot were radiographed. Seeds were numbered and fixed on an adhesive paper in groups of 10 seeds. Radiographic images were generated using a MultiFocus digital radiography system (Faxitron Bioptics LLC, USA). This system is equipped with a complementary metal-oxide-semiconductor (CMOS) X-ray sensor coupled with an 11 μm focal spot tube and up to 8X geometric magnification and provides as high as 6 μm resolution for seed imaging with a choice of a 48 μm or 24 μm. The built-in advanced Automatic Exposure Control selects the appropriate exposure time and kV settings for each sample.
After X-ray imaging, four replications of 25 seeds were sown in sand (moistened to 60% of its water holding capacity) in plastic boxes (32.0 x 28.0 x 10.0 cm) and kept at 30 ºC with a 12-hour photoperiod. At 10 days after sowing, the individual seeds were evaluated for different quality traits: normal seedlings, abnormal seedlings and dead seeds. Next, they were separated into three classes based on seed performance in the germination test and tissue integrity in the radiographic images. A CDA was implemented with the “candisc” package in R to provide the best discrimination among seedlot categories, using a dataset derived from the X-ray classes, reflectance data at 940 nm and quality traits (normal seedling, abnormal seedling and dead seed).
Three models were developed using the LDA algorithm. The first model was created using multispectral data at 940 nm. Data obtained from the X-ray classes were used to develop the second model. Finally, multispectral and X-ray data were combined to create the third classification model. In total, 300 seeds were used to develop the models. Training was run with 210 seeds (70%), and the remaining 90 seeds (30%) were used as an independent validation set. Additionally, 5-fold cross-validation was performed on the training data. Accuracy, Cohen's Kappa coefficient, sensitivity and specificity were calculated from a confusion matrix to evaluate the models. Data analysis was performed in R using the “caret” package.
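The data partition and the evaluation metrics described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' R workflow: it does not fit an LDA model, the random split is unstratified, and the confusion matrix is hypothetical rather than a reported result. Sensitivity and specificity are computed per class in a one-versus-rest manner.

```python
import random


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def k_folds(indices, k=5):
    """Partition indices into k folds of near-equal size."""
    return [indices[i::k] for i in range(k)]


def evaluate(matrix):
    """Metrics from a square confusion matrix (rows: true, columns: predicted)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    accuracy = sum(matrix[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    sensitivity, specificity = [], []
    for i in range(k):
        tp = matrix[i][i]
        fn = row_totals[i] - tp
        fp = col_totals[i] - tp
        tn = n - tp - fn - fp
        sensitivity.append(tp / (tp + fn))
        specificity.append(tn / (tn + fp))
    return {"accuracy": accuracy, "kappa": kappa,
            "sensitivity": sensitivity, "specificity": specificity}


# 300 seeds: 210 for training (with 5 folds) and 90 for validation.
train_idx, valid_idx = split_indices(300)
folds = k_folds(train_idx)

# Hypothetical validation confusion matrix for three seed classes.
hypothetical_matrix = [[28, 2, 0],
                       [3, 25, 2],
                       [0, 1, 29]]
metrics = evaluate(hypothetical_matrix)
print(metrics)
```

In each cross-validation round, one fold serves as the held-out set and the remaining four are used for fitting. The 90 validation seeds are left untouched until the final evaluation.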