The heritability estimated for body weight in broiler chickens from a pure line population was of moderate magnitude, accounting for 23% of the phenotypic variance. This result indicates that selection should produce an effective response in the short to medium term. The ratio of maternal permanent environmental variance to phenotypic variance was low, accounting for 5% of the body weight variation. Although this fraction was relatively low, including the maternal permanent environmental effect in the model is essential to avoid inflating the additive genetic variance. The estimates of heritability and of the maternal permanent environmental variance fraction for body weight were consistent with other studies of the same trait in broilers from single pure lines [23, 24].
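As a reading aid, the two variance ratios reported above can be written out explicitly. The decomposition of the phenotypic variance into additive genetic, maternal permanent environmental, and residual components is an assumption made here for illustration, and the symbols are not taken from the original text. The residual share follows only if those three components are the full decomposition.

```latex
% Assumed variance decomposition (symbols are illustrative):
%   \sigma^2_a   additive genetic variance
%   \sigma^2_mpe maternal permanent environmental variance
%   \sigma^2_e   residual variance
\sigma^2_p = \sigma^2_a + \sigma^2_{mpe} + \sigma^2_e, \qquad
h^2 = \frac{\sigma^2_a}{\sigma^2_p} \approx 0.23, \qquad
c^2 = \frac{\sigma^2_{mpe}}{\sigma^2_p} \approx 0.05, \qquad
\frac{\sigma^2_e}{\sigma^2_p} \approx 1 - 0.23 - 0.05 = 0.72
```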
For the DNN implementation, a random search was used for hyperparameter optimization, leading to the selection of different models for each subset of the training set. This result indicates that the choice of the best DNN architecture was strongly affected by the amount of data available during training. Therefore, the random search did not provide a DNN structure that was robust across the training set partitions. Recently, simulated annealing and genetic algorithms have been considered for hyperparameter optimization in machine learning applications [25, 26]. Such approaches may provide a more robust DNN architecture and, as a consequence, may show better predictive performance than random search. However, Bellot et al. (2018) [22] evaluated the performance of DNN on the genome-enabled prediction of complex traits in humans using a genetic algorithm for hyperparameter optimization, and also reported that DNN had results similar to those of Bayesian regression models.
Hyperparameter optimization is a very difficult task, which involves the exploration of various DNN architectures to find an optimal parameter set within a specific search space. This component of the learning process is crucial for the success of DNN and is very demanding in computational resources and time. Parallel computing, as employed in our study, can be used to alleviate time constraints, with each DNN architecture trained and evaluated independently on a different computer. However, parallel computing requires expensive computational resources, which are often unavailable to many researchers. Despite such challenges, hyperparameter optimization is critical to obtain DNN architectures that can deliver greater predictive performance. For instance, in our study, the difference in predictive performance between the best and worst DNN in each sub-sampling of the training set was considerable. Therefore, implementing DNN with no hyperparameter optimization may inadvertently define a DNN architecture that delivers poor predictive performance. Moreover, the cost of hyperparameter optimization is relatively minor compared to the cost of collecting, storing, and analyzing genomic data. Therefore, hyperparameter optimization should be considered for genome-enabled prediction applications in animal and plant breeding programs.
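The random search described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the study. The search space, the hyperparameter names, and the `toy_tuning_msep` function are all hypothetical; the latter is a stand-in for training a DNN and computing its MSEP on the tuning set.

```python
import random

# Hypothetical search space; names and ranges are illustrative assumptions,
# not the values explored in the study.
SEARCH_SPACE = {
    "n_layers": [1, 2, 3, 4],
    "n_units": [32, 64, 128, 256, 512],
    "keep_prob": [0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    "l2": [0.0, 1e-4, 1e-3, 1e-2, 1e-1],
}


def sample_config(rng):
    """Draw one architecture by sampling each hyperparameter independently."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def random_search(evaluate, n_trials, seed=0):
    """Evaluate n_trials random configurations and keep the lowest score.

    `evaluate` maps a configuration to its tuning-set MSEP. Each call is
    independent of the others, which is why trials can run in parallel.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((evaluate(config), config))
    best_msep, best_config = min(trials, key=lambda trial: trial[0])
    return best_config, best_msep, trials


def toy_tuning_msep(config):
    """Placeholder objective standing in for 'train DNN, score on tuning set'."""
    return ((config["n_layers"] - 2) ** 2
            + abs(config["n_units"] - 128) / 128
            + abs(config["keep_prob"] - 0.8)
            + abs(config["l2"] - 1e-3))
```

Calling `random_search(toy_tuning_msep, n_trials=20)` returns the selected configuration, its score, and the full list of trials. The spread between the best and worst scores in that list corresponds to the gap between the best and worst DNN discussed above.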
The best models selected for each partition of the training set had some type of regularization (i.e. L2 > 0 and dropout rate < 1) to improve model generalization. The large number of inputs typically observed in genome-enabled prediction and the high correlation between markers due to linkage disequilibrium may negatively affect the performance of DNN. Regularization approaches such as dropout can prevent complex co-adaptations between units [27], reducing the observed association among inputs from adjacent layers. Therefore, this result suggests that DNN with regularization techniques are recommended to improve predictive performance on new observations in genome-enabled prediction. A similar result was reported by McDowell (2016) [19], who found better predictive performance for DNN with some kind of regularization than for DNN without regularization in genome-enabled prediction of complex traits in different plant species.
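The two regularizers mentioned above can be illustrated with a short sketch. It assumes that the "dropout rate" in the text is the probability of keeping a unit, so that a value of 1 means no dropout; this reading is an assumption. The function names are hypothetical, and the code shows the general technique (an L2 weight penalty and inverted dropout) rather than the study's implementation.

```python
import random


def l2_penalized_mse(y_true, y_pred, weights, l2):
    """Mean squared error plus an L2 penalty on the network weights.

    With l2 = 0 the penalty vanishes and the loss is the plain MSE.
    """
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    penalty = l2 * sum(w * w for w in weights)
    return mse + penalty


def dropout(activations, keep_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability 1 - keep_prob.

    Surviving units are rescaled by 1 / keep_prob so the expected activation
    is unchanged. At prediction time, or with keep_prob = 1, the activations
    pass through untouched.
    """
    if not training or keep_prob >= 1.0:
        return list(activations)
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]
```

Because a different random subset of units is silenced at each training step, no unit can rely on the presence of a specific partner, which is the co-adaptation argument made above.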
The selection of DNN hyperparameters based on the predictive performance in a tuning set may not yield the best predictive performance in the testing set. For instance, for each sub-sampling of the training set, at least one DNN with a different architecture had greater predictive performance in the testing set than the DNN selected based on the lowest MSEP observed in the tuning set. Therefore, selecting a DNN architecture by measuring the predictive performance in a tuning set may not deliver optimal predictive performance on new records. Nevertheless, DNN optimization based on the predictive performance in a testing set provides results that are optimistically biased, since some information from the testing set is used a priori. Thus, in our study the correct strategy was to select the DNN architecture based on the predictive performance in the tuning set.
Deep neural networks are gaining prominence in genome-enabled prediction because of several advantages, including flexibility to accommodate complex relationships between output variables and predictors, high predictive performance, and no parametric assumptions regarding variable distributions [28]. Although DNN has emerged with enormous potential to transform genome-enabled prediction, recent studies showed no evident superiority of DNN relative to traditional genome-enabled prediction models. For instance, Rachmatia et al. (2017) [20] used deep belief networks to predict complex traits in maize and found that DNN outperformed linear regression models in only 2 out of 8 traits. McDowell (2016) [19] compared DNN with 5 linear regression methods (i.e. ordinary least squares, lasso, ridge regression, elastic net, and Bayesian ridge regression) on 6 traits from 3 different species (i.e. Arabidopsis, maize, and wheat). In that study, DNN outperformed traditional regression methods about 50% of the time. In another study, Montesinos-Lopez et al. (2018) [21] compared a multi-task DNN with a Bayesian multi-trait and multi-environment model using complex traits in maize and wheat under different environments. The authors reported greater predictive performance of DNN when genotype × environment interactions were not included in the analysis and lower performance when such terms were considered. According to these studies, the performance of DNN is strongly affected by many factors, including the genetic architecture of the trait, the presence of non-additive effects, hyperparameter optimization, and the DNN architecture considered for genome-enabled prediction (e.g. multilayer perceptron or convolutional neural networks). These findings are consistent with our study, in which Bayesian regression models showed similar or greater prediction correlation than DNN, but worse MSEP.
The lower MSEP of DNN reflects the predictive bias estimated in each sub-sampling of the training set. Deep neural networks showed greater inflation in the prediction of body weights than all Bayesian models when using up to 20% of the data, and less biased estimates thereafter, indicating an advantage for DNN over Bayesian models. The Spearman’s correlation and the agreement on the top 10-ranked broilers suggested a re-ranking of animals depending on the model used. This difference in ranking was more pronounced between the Bayesian regression models fitted with the tuning set and the other genome-enabled prediction approaches, whereas DNN presented slightly lower re-ranking of broilers relative to BRR and Bayes Cπ.
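The comparison criteria discussed above can be made concrete with a small sketch. The function names are hypothetical. Measuring predictive bias as the slope of the regression of observed on predicted values is an assumption, since the text does not state the definition used; under that common convention a slope below 1 indicates inflated predictions.

```python
def mean(values):
    return sum(values) / len(values)


def msep(observed, predicted):
    """Mean squared error of prediction."""
    return mean([(o - p) ** 2 for o, p in zip(observed, predicted)])


def pearson(x, y):
    """Pearson correlation, used here as the prediction correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bias_slope(observed, predicted):
    """Slope of observed regressed on predicted (assumed bias measure)."""
    mo, mp = mean(observed), mean(predicted)
    sxy = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    return sxy / sum((p - mp) ** 2 for p in predicted)


def ranks(values):
    """Ranks starting at 1, with ties assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for position in range(start, end + 1):
            result[order[position]] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def top_k_agreement(pred_a, pred_b, k):
    """Fraction of the top-k animals shared by two sets of predictions."""
    def top(pred):
        return set(sorted(range(len(pred)), key=lambda i: pred[i], reverse=True)[:k])
    return len(top(pred_a) & top(pred_b)) / k
```

Two models can have similar Pearson correlations with the observed phenotypes yet differ in MSEP when one of them is biased, and they can still rank candidates differently, which is what `spearman` and `top_k_agreement` between their predictions are meant to capture.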
Interestingly, the predictive performance of DNN was better than that of BRR and Bayes Cπ when considering small sample sizes. This result is most likely due to the benefit of using, in the training process, a tuning set exclusive to DNN. However, after re-fitting the Bayesian regression models with the tuning set data included, this advantage was accounted for and the superiority of DNN vanished. Strategies such as k-fold cross-validation within the training set could be considered to select DNN architectures. However, in our study, implementing such an approach was extremely difficult due to the computational cost of performing k-fold cross-validation on such a large dataset together with the sub-sampling of the training set for each genome-enabled prediction approach.
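A minimal sketch of the k-fold alternative mentioned above, together with a count of the model fits it would require, is given below. The function names and the example numbers are hypothetical and are not taken from the study.

```python
import random


def k_fold_splits(n_records, k, seed=0):
    """Yield (training, tuning) index lists for k-fold cross-validation.

    Records are shuffled once; each fold serves as the tuning set exactly
    once while the remaining k - 1 folds are used for training.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        tuning = folds[held_out]
        training = [idx for f, fold in enumerate(folds) if f != held_out
                    for idx in fold]
        yield training, tuning


def n_model_fits(n_architectures, k, n_subsamples):
    """Number of DNN fits needed when every candidate architecture is
    cross-validated within every sub-sample of the training set."""
    return n_architectures * k * n_subsamples
```

For example, `n_model_fits(100, 5, 10)` gives 5000 fits, five times the number needed with a single tuning set, which illustrates why the approach becomes costly when combined with sub-sampling.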
Although DNN often show greater predictive performance when trained with large sample sizes, for genome-enabled prediction it seems that adding more data per se does not guarantee that DNN will outperform benchmark models. The relatively simple nature of the marker inputs (i.e. three genotypes coded as 0, 1 or 2) and the complex nature of quantitative traits may pose a challenge for DNN applied to genome-enabled prediction compared to other successful applications, such as computer vision [22]. As pointed out by Bellot et al. [22], inputs used in computer vision are more complex and less structured than those available for genome-enabled prediction. Furthermore, the attribute (expected value of a trait or genetic risk) used in genome-enabled prediction is often not directly observed; rather, it is a function of genetic and environmental factors [22]. Therefore, the characteristics of the response variable and inputs may explain in part the similar predictive performance of DNN and Bayesian methods when using large amounts of data. Furthermore, body weight inheritance appears to be mainly accounted for by additive genetic effects, with a smaller contribution of non-additive genetic effects. Abdollahi-Arpanahi et al. (2016) [29] concluded that dominance effects made a minor contribution to the phenotypic variation of body weight relative to additive effects. Additive inheritance is often well fitted by traditional linear models used for genome-enabled prediction. On the other hand, ANN is better suited to capture nonlinear relationships by using multiple layers and nonlinear activation functions. For instance, Dórea et al. (2018) [30] reported greater predictive performance of ANN compared to Partial Least Squares in the prediction of dry matter intake in lactating dairy cows, concluding that such superiority is possibly explained by the ability of ANN to accommodate nonlinear relationships. Therefore, the additive genetic nature of body weight may be another potential explanation for the similar predictive performance of DNN and Bayesian models.
It is important to point out some disadvantages of DNN when applied to genome-enabled prediction compared to traditional linear regression models. The first drawback has been previously discussed, and reflects the importance of hyperparameter optimization for DNN performance. The second disadvantage is the lack of biological interpretability of the results obtained with DNN. For instance, extracting information from multiple hidden layers is very difficult, turning the algorithm into a “black box” regarding biological interpretation. A practical example of this lack of interpretability is that the effect of each marker cannot be estimated separately, while SNP effects are easily obtained in traditional linear models used for genome-enabled prediction. Another issue is that DNN are more susceptible to overfitting than linear models. In our study, we used early stopping, dropout, and an L2 norm to tackle overfitting, and the results indeed suggested that such approaches helped to improve generalization. Despite all of these limitations, DNN had better performance in terms of MSEP but worse prediction correlation compared to the Bayesian regression models. Therefore, DNN should be explored further in genome-enabled prediction to find scenarios in which DNN is clearly superior. Common DNN strategies used in the field of computer science, including multi-task DNN (i.e. similar to multi-trait analysis), novel algorithms for parameter optimization, and different types of network structures (e.g. convolutional and multi-input networks), can be easily adapted and implemented for further analysis in genome-enabled prediction.