Materials
The study was conducted in Guanshang Town, Zhangshu City, Jiangxi Province, China. Fruits were collected from trees more than seven years old belonging to twelve varieties, including CL3, CL4, CL18, CL23, CL27, CL40, CL53, and CL116, among others. A total of 108 grafting combinations were obtained from the scions and half-sibling seed rootstocks of the 12 varieties (Table S1 and Table S2), and the fruits of these grafted trees were collected for trait determination. For each combination, five trees of similar age, good growth, and free of pests and diseases were selected, and 30 fruits were randomly sampled and measured per combination. After peeling and drying the seeds, the subsequent measurements were performed. Each test consisted of three biological replicates. The collection of all samples complies with institutional, national, and international guidelines and legislation, and was authorized by the local forestry management department.
Determination of fruit characteristics
Fruit characteristics, including the weight (g), height (mm), and diameter (mm) of fresh fruits and the weight of dried seeds (g), dried kernels (g), and kernel oil (g), were measured with a vernier caliper (sensitivity 0.01 mm) and an electronic balance (sensitivity 0.01 g). From these measurements, the fruit shape index, kernel ratio of dried seeds, oil content of dried kernels, and dry seed oil content were calculated as follows: fruit shape index = fresh fruit height / fresh fruit diameter × 100%; kernel ratio of dried seeds = weight of dried kernels / weight of dried seeds × 100%; oil content of dried kernels = weight of kernel oil / weight of dried kernels × 100%; dry seed oil content = weight of kernel oil / weight of dried seeds × 100%, which is equivalent to the oil content of dried kernels multiplied by the kernel ratio of dried seeds.
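For reference, these derived traits can be computed as in the minimal Python sketch below; the function and variable names are hypothetical and only the formulas stated above are assumed.

```python
def derived_fruit_traits(fruit_height_mm, fruit_diameter_mm,
                         dried_seed_g, dried_kernel_g, kernel_oil_g):
    """Compute the derived fruit traits, each expressed as a percentage."""
    fruit_shape_index = fruit_height_mm / fruit_diameter_mm * 100     # height / diameter
    kernel_ratio = dried_kernel_g / dried_seed_g * 100                # dried kernel / dried seed
    kernel_oil_content = kernel_oil_g / dried_kernel_g * 100          # kernel oil / dried kernel
    dry_seed_oil_content = kernel_oil_g / dried_seed_g * 100          # kernel oil / dried seed
    return fruit_shape_index, kernel_ratio, kernel_oil_content, dry_seed_oil_content
```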
Oil extraction from seeds by Soxhlet extraction (SE)
All C. oleifera seed samples were powdered with a laboratory plant grinder. Approximately 10 g of ground sample was weighed and recorded as w0 (g), transferred to a Soxhlet extractor filled with 180 mL of petroleum ether (boiling range 60–90 ℃), and extracted at 88 ℃ for 6 h. The solvent was then evaporated under vacuum, and the residue was dried at 60 ℃ under vacuum to a constant weight, recorded as w1 (g). The oil content was calculated as w = w1/w0 × 100%. Experiments were conducted in triplicate.
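A one-line sketch of this oil-content calculation, assuming only the formula above (the function name is hypothetical):

```python
def soxhlet_oil_content(w0_g, w1_g):
    """Oil content w (%) = dried extract weight w1 / sample weight w0 × 100."""
    return w1_g / w0_g * 100
```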
Fatty acid analysis of the extracted C. oleifera oil by GC
Because the fatty acids (FAs) of C. oleifera oil are generally present as triglycerides, they must first be converted to fatty acid methyl esters by means of sodium hydroxide. A 0.2 mL aliquot of the extracted C. oleifera oil was placed in a 10 mL tube, 2 mL of 0.5 mol/L sodium hydroxide–methanol was added, and the mixture was shaken and kept in a 60 ℃ water bath for 30 min, after which 5 mL of n-hexane was added. The supernatant was injected into a gas chromatograph (HP6890 series, Agilent Technologies Inc.) equipped with an HP-5 capillary column (30 m × 0.25 mm × 0.25 μm). The injector and detector temperatures were set at 280 ℃. The oven temperature was programmed from 100 ℃ to 270 ℃ at 5 ℃/min with a final hold of 5 min. The detector signals were integrated as normalised percentages from the calibration curve by the HP software, and the four main individual fatty acids (oleic, linoleic, palmitic, and stearic acid) were expressed as percentages of the total fatty acids. The unsaturated fatty acids were taken as the sum of oleic acid and linoleic acid.
Deep Neural Network (DNN)
This study used the combination of rootstock and scion varieties as the model input and the measured C. oleifera traits as the outputs, including palmitoleic acid C16:1 (y1), cis-11-eicosenoic acid C20:1 (y2), unsaturated fatty acid (y3), oleic acid C18:1 (y4), linoleic acid C18:2 (y5), linolenic acid C18:3 (y6), oil rate (y7), fruit height (y8), fruit diameter (y9), and peel thickness (y10). Because fruit phenotypic characteristics varied significantly among varieties, five parameters, namely fruit height, fruit diameter, fresh fruit weight, fresh seed weight, and number of fresh seeds, were used to represent each variety instead of the variety number.
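A minimal sketch of how the inputs and outputs might be organised as arrays, assuming a tabular dataset in which each of the rootstock and scion varieties is described by the five phenotypic parameters; the file and column names here are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column names: five phenotypic parameters for each of rootstock and scion.
traits = ["fruit_height", "fruit_diameter", "fresh_fruit_weight",
          "fresh_seed_weight", "fresh_seed_number"]
feature_cols = [f"{part}_{t}" for part in ("rootstock", "scion") for t in traits]
target_cols = [f"y{i}" for i in range(1, 11)]     # y1-y10 as listed above

df = pd.read_csv("coleifera_combinations.csv")    # hypothetical file
X = df[feature_cols].to_numpy(dtype=np.float32)
y = df[target_cols].to_numpy(dtype=np.float32)
```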
The hidden layers between the input and output layers can consist of one or more layers, and the number of layers and neurons depends on the number of samples and the complexity of the task. In general, a deeper model can improve accuracy by providing stronger nonlinear expression ability, enabling it to learn complex transformations and fit more complex feature inputs. However, more network parameters also require more time and samples for training.
Out of 2108 samples, 1300 were used for training and 400 for validation, with the rest used for testing. The ReLU activation function was used in the hidden layers so that the network can learn nonlinear functions. The output layer uses a linear transfer function directly, and each hidden layer is followed by a dropout layer (dropout rate 0.1) that temporarily discards part of the network information to reduce overfitting. The DNN was trained with the Stochastic Gradient Descent (SGD) optimizer for 200 epochs at a learning rate of 0.01, using the Mean Squared Error (MSE) as the loss function and the Mean Absolute Percentage Error (MAPE) as the evaluation index. Candidate configurations with one to five fully connected hidden layers and 2, 4, 8, 16, 32, 64, 128, or 256 neurons per layer were compared, and the best-performing model was selected according to the MAPE on the validation set.
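A minimal TensorFlow/Keras sketch of one such network, assuming the X and y arrays from the sketch above; the hyperparameters (ReLU, dropout 0.1, linear output, SGD with learning rate 0.01, MSE loss, MAPE metric, 200 epochs, 1300/400 train/validation split) follow the text, while the layer and neuron counts shown are only illustrative defaults.

```python
import tensorflow as tf

def build_dnn(n_inputs, n_outputs, n_hidden_layers=3, n_neurons=32):
    """Build a DNN with ReLU hidden layers, dropout 0.1, and a linear output layer."""
    layers = [tf.keras.layers.InputLayer(input_shape=(n_inputs,))]
    for _ in range(n_hidden_layers):
        layers.append(tf.keras.layers.Dense(n_neurons, activation="relu"))
        layers.append(tf.keras.layers.Dropout(0.1))       # reduce overfitting
    layers.append(tf.keras.layers.Dense(n_outputs, activation="linear"))
    model = tf.keras.Sequential(layers)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                  loss="mse", metrics=["mape"])
    return model

# Split sizes from the text: 1300 training, 400 validation, remainder for testing.
model = build_dnn(X.shape[1], y.shape[1])
history = model.fit(X[:1300], y[:1300],
                    validation_data=(X[1300:1700], y[1300:1700]),
                    epochs=200, verbose=0)
```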
A double loop over the number of hidden layers and the number of neurons per layer was constructed to build the candidate network models. For each configuration, the training and validation sets were fed in, the minimum validation MAPE across epochs was recorded, and the corresponding model information was stored.
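A sketch of this double-loop search, assuming the build_dnn helper and data split above; the bookkeeping details are assumptions, and only the search grid (1 to 5 layers, 2 to 256 neurons) and the validation-MAPE criterion follow the text.

```python
best = {"val_mape": float("inf"), "layers": None, "neurons": None}
for n_layers in range(1, 6):                        # 1-5 hidden layers
    for n_neurons in (2, 4, 8, 16, 32, 64, 128, 256):
        model = build_dnn(X.shape[1], y.shape[1], n_layers, n_neurons)
        history = model.fit(X[:1300], y[:1300],
                            validation_data=(X[1300:1700], y[1300:1700]),
                            epochs=200, verbose=0)
        val_mape = min(history.history["val_mape"])  # best epoch for this configuration
        if val_mape < best["val_mape"]:
            # Store the model information; in practice the best-epoch weights could
            # also be kept with a tf.keras.callbacks.ModelCheckpoint callback.
            best = {"val_mape": val_mape, "layers": n_layers, "neurons": n_neurons}
print(best)
```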
Data analysis
The experimental data were organized and analyzed using PyCharm 2020, Anaconda 3, and TensorFlow 2.1.