Hardware description
The FruitPhenoBox (Figure 5) consists of an external frame of 850 x 850 x 660 mm built of aluminum profiles (30 x 30 mm) mounted on a wooden base plate. Five RGB cameras, one on each of the four sides and one on top (DFK 33UX273, 1296 x 1080 pixel + TCL 0814, www.theimagingsource.com), are mounted on the aluminum profiles and oriented toward the center, where a weight scale with serial connection port (Kern PCB 1000-1, www.kern-sohn.com) and a pedestal are placed. Cameras and scale are connected to a common personal computer. Two LED rings (YONGNUO YN308, www.yongnuo.eu) provide homogeneous lighting from the bottom and from the top. A barcode scanner (QuickScan Lite QW2400, www.datalogic.com) is used for registering the unique ID/cultivar name/accession number/QR code, and a USB pedal (6210 - 0084 USB Footswitch, www.herga.com) is used as trigger. White PVC panels (3 mm) with camera holes (39 mm diameter, fitting the camera lens) shield the FruitPhenoBox from external light.
Image acquisition and analysis
Image acquisition and analysis are controlled by two separate Matlab [12] scripts that were compiled to run in the Matlab Compiler Runtime 2018b environment. The first script controls the hardware as follows. Before images are taken, the camera parameters are defined: white balance parameters are fixed at the first run and reloaded afterwards. Prior to the measurement process, a unique barcode is assigned to each cultivar or single tree name (ID) and scanned with a 2D scanner. A fruit is placed on a pedestal on the scale in the center of the FruitPhenoBox, and the foot pedal triggers a snapshot from all five cameras as well as the recording of the fruit weight, which is stored in the image metadata. Image names are generated by concatenating the ID, the camera number, an incremental number starting from one, and the shooting date and time. Images are stored in 48-bit TIF format in a folder "images", in subfolders according to the barcode and the camera used for shooting (e.g. images/ID/cam1). Each snapshot generates five losslessly compressed (PackBits) TIF images (1296 x 1080 pixels, 48-bit, approximately 8 Mbytes each).
Once all fruits had been imaged, each image was segmented by a second Matlab script to remove the image background, locate the fruit and outline its contour. Segmentation was done in the HSV color space in several steps. The ratio of saturation to value was thresholded at 0.2 and intersected with all pixels with a saturation above 0.19, resulting in the first mask m_1. In addition, for each camera an image of the empty FruitPhenoBox (taken without an apple) was subtracted from the actual image and the difference was segmented with Otsu's method [13], resulting in a second mask m_2. The union of both masks was taken as the apple area in each image. The distribution of pixel colors within the fruit area was analyzed and used to generate a histogram.
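The two-mask segmentation described above can be illustrated with a minimal Python sketch. It is not the authors' Matlab code. Several details are assumptions made for illustration: pixels are kept when the saturation/value ratio lies above 0.2, the difference image is the mean absolute channel difference, and all function names are hypothetical. Images are nested lists of (r, g, b) tuples scaled to [0, 1].

```python
import colorsys

def hsv_mask(rgb_image, ratio_thresh=0.2, sat_thresh=0.19):
    """Mask m_1: S/V above ratio_thresh AND S above sat_thresh (direction assumed)."""
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            _, s, v = colorsys.rgb_to_hsv(r, g, b)
            ratio = s / v if v > 0 else 0.0  # avoid division by zero on black pixels
            mask_row.append(ratio > ratio_thresh and s > sat_thresh)
        mask.append(mask_row)
    return mask

def otsu_threshold(values, bins=256):
    """Otsu's method on values in [0, 1]: pick the cut maximizing between-class variance."""
    hist = [0] * bins
    for x in values:
        hist[min(int(x * bins), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_bin, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(bins):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, t
    return (best_bin + 1) / bins  # upper edge of the last background bin

def difference_mask(rgb_image, empty_image):
    """Mask m_2: Otsu threshold on the difference to the empty-box image."""
    diff = [[sum(abs(a - b) for a, b in zip(p, q)) / 3 for p, q in zip(row, erow)]
            for row, erow in zip(rgb_image, empty_image)]
    t = otsu_threshold([d for row in diff for d in row])
    return [[d >= t for d in row] for row in diff]

def union_mask(m1, m2):
    """Apple area: set union of both masks."""
    return [[a or b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]
```

Combining the masks by union means that a pale, weakly saturated fruit region missed by the HSV rule can still be recovered from the background difference, and vice versa.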
Width and height were calculated for each image using the real-world pixel size derived from calibration images.
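As a rough illustration of this conversion, the following Python sketch derives a pixel size from a calibration object of known size and applies it to the bounding box of a fruit mask. The use of the bounding box, the calibration procedure and the function names are assumptions, not details taken from the authors' script.

```python
def pixel_size_cm(known_size_cm, measured_size_px):
    """Real-world size of one pixel, from a calibration object of known size."""
    return known_size_cm / measured_size_px

def mask_extent_cm(mask, cm_per_pixel):
    """Width and height of the mask's bounding box, converted to centimeters."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for row in mask for j, v in enumerate(row) if v]
    if not rows:
        return 0.0, 0.0  # empty mask: no fruit found
    width_px = max(cols) - min(cols) + 1
    height_px = max(rows) - min(rows) + 1
    return width_px * cm_per_pixel, height_px * cm_per_pixel
```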
Fruit contours were converted from Cartesian to polar coordinates (around their centroid) and interpolated to integer degree values to simplify further analysis. For side views, the contour was rotated around its centroid to maximize symmetry about the y-axis, and the fruit mask was subsequently rotated by the determined angle. As an indicator of fruit symmetry, the fruit masks of side views were mirrored at the axis parallel to the y-axis through their centroid. The area of the difference between the original mask and the mirrored mask was divided by twice the area of the original mask, yielding 0 for perfectly symmetric shapes and a maximum of 1 for highly asymmetric fruit shapes. The radius of the contours was compared to the shapes provided in pomological description manuals [7], and the fruit shape was assigned to one of the 13 available categories based on the highest correlation. Top-view contour coordinates were transformed into polar coordinates with the origin at the center of mass of the apple area. These coordinates were then used to determine the circumcircle, mean radius and incircle as well as the deviations of the radius from the mean circle. The standard deviation of these deviations was calculated and normalized to radius 1. Finally, the single-image analyses were grouped per fruit, and a text file containing the variables listed in Table 1 for each fruit of each cultivar was generated.
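The shape descriptors in this paragraph can be sketched in Python as follows. This is only an illustration under stated assumptions: the contour is star-shaped around its centroid (one radius per angle), linear interpolation is used between contour points, the mirror axis is rounded to the pixel grid, the population standard deviation is used, and "normalized to radius 1" is read as division by the mean radius. All function names are hypothetical.

```python
import math
import statistics

def contour_to_polar(points, centroid):
    """Radius at each integer degree 0..359, linearly interpolated along the contour."""
    cx, cy = centroid
    polar = sorted((math.degrees(math.atan2(y - cy, x - cx)) % 360.0,
                    math.hypot(x - cx, y - cy)) for x, y in points)
    # Pad with wrapped copies so interpolation also works across 0/360 degrees.
    padded = ([(polar[-1][0] - 360.0, polar[-1][1])] + polar
              + [(polar[0][0] + 360.0, polar[0][1])])
    radii, k = [], 0
    for deg in range(360):
        while padded[k + 1][0] < deg:
            k += 1
        (a0, r0), (a1, r1) = padded[k], padded[k + 1]
        w = 0.0 if a1 == a0 else (deg - a0) / (a1 - a0)
        radii.append(r0 + w * (r1 - r0))
    return radii

def radius_stats(radii):
    """Circumcircle, incircle, mean radius and normalized SD of radius deviations."""
    mean_r = statistics.fmean(radii)
    deviations = [r - mean_r for r in radii]
    return {
        "circumcircle": max(radii),
        "incircle": min(radii),
        "mean_radius": mean_r,
        "sd_normalized": statistics.pstdev(deviations) / mean_r,
    }

def symmetry_indicator(mask):
    """Difference area between mask and its mirror image, divided by twice the mask area."""
    pixels = {(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v}
    area = len(pixels)
    if area == 0:
        raise ValueError("empty mask")
    cx = sum(j for _, j in pixels) / area  # column of the centroid
    # Mirror each column index at the vertical axis through the centroid.
    mirrored = {(i, math.floor(2 * cx - j + 0.5)) for i, j in pixels}
    return len(pixels ^ mirrored) / (2 * area)
```

In the procedure described above, the symmetry indicator is evaluated after the mask has been rotated to its most symmetric orientation; the rotation search itself is not shown here.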
Table 1: Apple fruit output variables of the second Matlab script assessed with the FruitPhenoBox.
Variable name | Description
ShapeClassCorr_side | normalized cross-correlation coefficient of side shape with reference shapes
ShapeClass_top | id of best-fitting reference shape, top view
weight_g | mass of apple [g]
diameter_deviation_cm | maximal deviation of diameter from mean diameter [cm]
diameter_cm | mean diameter [cm]
diameter_min_cm | minimum diameter [cm]
diameter_max_cm | maximum diameter [cm]
width_cm | apple width of all side views [cm]
width_mean_cm | mean apple width [cm]
height_cm | apple height of all side views [cm]
height_mean_cm | mean apple height [cm]
symmetry | symmetry value: difference area with mirrored mask / 2 / mask area
symmetry_angle_deg | rotation angle of mask to achieve the minimal symmetry value
colhistheader | column names of color histograms
colhist | color histogram of hue and saturation of all views
R-Script for results visualization
The output files of the second Matlab script are then used as input for an R script [14] that summarizes and visualizes the data. The script creates summary factsheets (Supplementary data) per tree or per cultivar, depending on user input. For this purpose, a text file with the field plan of the experimental site, including tree name/ID, cultivar name, row and tree position, is required as an additional input file. The resulting factsheet contains information about fruit weight, width and height. Each top shape, an average side shape per fruit and an average side shape per single tree or cultivar (in case several trees of the same cultivar are investigated) are displayed. The average side shapes are determined from the polar coordinates by calculating the average radius for each degree. Furthermore, the factsheet shows five photos, one from each side, of the first fruit per tree or cultivar, a color histogram of the mean values and additional information about the shape and data collection. Overall, the factsheets give an overview of the data and a first impression of the cultivar-specific properties. Additionally, the R script exports polar coordinates, Cartesian coordinates, raw color histogram data, correlations to predefined shapes, and fruit weight and caliber to separate text files. Thus, all collected data are available in a compact, machine-readable format for further analysis. The R script requires the packages dplyr [15] and tidyr [16] for data wrangling. The packages ggplot2 [17], gridExtra [18], gtable [19], qrcode [20] and magick [21] are required for image processing, data plotting and creating the factsheet.
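The averaging of side shapes in polar coordinates can be illustrated with a short sketch. It is written in Python rather than R and is not the authors' script. It assumes each shape is a list of 360 radii indexed by integer degree, and the function names are hypothetical.

```python
import math

def average_shape(shapes):
    """Mean radius per degree over several contours given as lists of radii."""
    n_deg = len(shapes[0])
    if any(len(s) != n_deg for s in shapes):
        raise ValueError("all shapes must have the same number of angles")
    return [sum(s[d] for s in shapes) / len(shapes) for d in range(n_deg)]

def polar_to_cartesian(radii):
    """Convert radii indexed by integer degree back to (x, y) points for plotting."""
    return [(r * math.cos(math.radians(d)), r * math.sin(math.radians(d)))
            for d, r in enumerate(radii)]
```

Because every contour is sampled at the same integer degrees, the average reduces to an element-wise mean, and no point matching between contours is needed.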
Performance assessment
To assess the performance of the FruitPhenoBox, approximately 100 fruits from each of 15 apple cultivars ('Ariane', 'Bonita', 'Braeburn', 'Milwa' (Diwa®), 'Gala Galaxy', 'Gala SchniCo Schniga', 'CH 101' (Galiwa®), 'Golden Reinders', 'Ladina', 'Mariella', 'SQ159' (Natyra®), 'Cripps Pink' (Pink Lady®), 'PremA96' (Rockit®), 'Rustica' and 'Topaz') were photographed by a single user. Time stamps were used to estimate the average time elapsed between the snapshots of two consecutive fruits, including the time required for replacing the fruit in the chamber. Average shapes per cultivar were generated, together with boxplots of the distribution of weight, width, height and shape index (height/width). One-way ANOVAs were performed in R [14] to analyze differences between cultivars for each of the four parameters. For each parameter, letters indicating significance groups (p < 0.05) were assigned according to a post hoc Tukey test.
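For readers unfamiliar with the test statistic, the following Python sketch computes the F value of a one-way ANOVA for one parameter across cultivars. It is a textbook illustration, not the R code used by the authors. The p-value and the post hoc Tukey grouping are deliberately left to a statistics package, and the function name is hypothetical.

```python
import statistics

def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    groups: one list of measurements per cultivar (e.g. fruit weights).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.fmean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within
```

A large F value indicates that the differences between cultivar means are large relative to the fruit-to-fruit variation within cultivars.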
Optimal sample size determination
To determine the optimal sample size required to obtain reliable results, we tested sample sizes (n) between five and 50 apples per cultivar. For each sample size, a subset of n apples per cultivar was randomly selected from the approximately 100 apples, and the means of their weight, width and height were calculated; this was repeated 200 times for each sample size. From the resulting sample means, the 2.5% and 97.5% quantiles and the mean were calculated for each trait and cultivar. The relative 95% confidence interval is the range between the lower and upper quantiles divided by the mean value per cultivar, and half of it corresponds to the relative margin of error.
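The resampling procedure can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: subsets are drawn without replacement, quantiles use linear interpolation between order statistics, and the divisor is the mean of the resampled means. The function names and the seed parameter are hypothetical.

```python
import random
import statistics

def quantile(sorted_values, p):
    """Quantile by linear interpolation between neighboring order statistics."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def relative_ci(values, n, repeats=200, seed=None):
    """Relative 95% interval of the mean of n fruits and the relative margin of error."""
    rng = random.Random(seed)
    # Draw `repeats` random subsets of size n and keep the mean of each subset.
    means = sorted(statistics.fmean(rng.sample(values, n)) for _ in range(repeats))
    q_low = quantile(means, 0.025)
    q_high = quantile(means, 0.975)
    rel_ci = (q_high - q_low) / statistics.fmean(means)
    return rel_ci, rel_ci / 2
```

Evaluating `relative_ci` for each n between five and 50 and each trait gives a curve of the relative margin of error against sample size, from which a sample size meeting a chosen precision can be read off.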