## DMD accurately forecasts time-resolved phenotypes

DMD is a data-driven diagnostic and time-series prediction method that has found diverse applications from engineering to finance14 and has only recently been applied in the biological sciences16. To illustrate the effectiveness of DMD, we applied it to predict the dynamics of growth-related traits derived from multi-modal HTP imaging measured at 25 time points, five days per week for five weeks, beginning at day 15 after sowing, in a maize multi-parent advanced generation inter-cross (MAGIC) population of 347 RILs of which genotyping data were available for 330 (see Methods). A one-day shift of the second of three HTP experiments resulted in a two-day gap and thus five consecutive measurements per week, as these were adjusted between the three experiments based on the days after sowing (DAS). By using network clustering, we selected 50 traits as representatives for the compendium of 498 top, side, and combined measurements of leaf-colour and -geometric traits (see Methods, Tables 1 and S1).

Table 1

| Q & R |

Minimum | *plant average hue value in side images* |

Q1 | *Top view RGB blue pixels* |

Mean | *Side view RGB convex hull* |

Q3 | *Top view RGB average saturation* |

Maximum | *Side view fluor standard deviation of lightness* |

For a single maize line, the time-resolved phenotype was arranged into a \(\:p\times\:T\) matrix \(\:\mathbf{X}\), where \(\:p\) is the number of traits and \(\:T\) is the number of time points (Fig. 1a). Two submatrices, \(\:{\mathbf{X}}_{1}\) and \(\:{\mathbf{X}}_{2}\), which are offset by a single time point are derived from \(\:\mathbf{X}\) (Fig. 1a), and then used to calculate a best fit linear \(\:p\times\:p\) operator (matrix) \(\:\mathbf{A}\), linking the phenotype at one time point with the phenotype in the following time point (Fig. 1a). We note that the usage of linear operator does not assume that the traits change linearly with time. The operator \(\:\mathbf{A}\) is usually non-symmetric and can be computed directly using the classical DMD approach (see Algorithm 1, Methods).

We first tested how accurately the operator \(\:\mathbf{A}\), calculated using the data from a specific genotype, was able to recreate the underlying time-resolved phenotypes. These tests were performed in two scenarios, iterative and recursive. In the iterative scenario, the traits at each time point were predicted based on the measured data on the traits from the preceding time point. In the recursive scenario, the traits at each time point were predicted based on the traits predicted from the preceding time point, with measurements available only for the initial time point (see Methods).

We found that the operator \(\:\mathbf{A}\) derived from the classical DMD approach was able to near perfectly recreate the training data when used iteratively, with a mean prediction accuracy of ~ 1 across all traits over the time points within a five-day block of daily measurements, and a mean prediction accuracy of 0.70 (± 0.20) for the time points immediately following the two-day gap, for which no data were available (Fig. S1). In the recursive scenario, the classical DMD approach resulted in a perfect recreation of the training data for the first five-day block; however, following the first two-day gap, the prediction accuracy decreased rapidly as a result of error propagation (Fig. S1). These results indicated that the classical DMD approach yielded models which overfitted to the training data, and were not robust to slight deviations in the input data.

We additionally tested another DMD algorithm, named Schur-based DMD, which has improved numerical stability17. This algorithm uses the singular vectors associated with the \(\:r\) largest singular values of \(\:{\mathbf{X}}_{1}\) alongside the Schur decomposition to reconstruct another operator \(\:{\mathbf{A}}_{\mathbf{r}}\) (Algorithm 2, see Methods), which is used in place of the operator \(\:\mathbf{A}\) in the prediction procedure outlined above. We note that like the operator \(\:\mathbf{A}\), \(\:{\mathbf{A}}_{\mathbf{r}}\) is not necessarily symmetric. In comparison to \(\:\mathbf{A}\) from the classical DMD, we found that use of \(\:{\mathbf{A}}_{\mathbf{r}}\) in the iterative scenario resulted in decreased mean prediction accuracy across all traits and all time points of 0.84 (± 0.18). However, in the recursive scenario, which is particularly relevant for practical applications, we observed robust performance, supported by the mean prediction accuracy of 0.78 (± 0.16) and 0.79 (± 0.13) across all traits and all time points and for the last time point, respectively (Fig. S2). These findings demonstrate that the Schur-based DMD exhibited good predictive performance, and prompted our investigation of the extent to which the elements of \(\:{\mathbf{A}}_{\mathbf{r}}\) and corresponding matrices used in its derivation can be predicted from genetic markers.

## Building blocks of dynamicGP are heritable and can be predicted from genetic markers

For each genotype, the operators \(\:\mathbf{A}\) and \(\:{\mathbf{A}}_{\mathbf{r}}\) as well as the intermediate component matrices in the Schur-based DMD (Algorithm 2, see Methods) can be determined from phenotypic data. Here, we treat each individual entry of these matrices as a trait to be predicted by GP models for the operators \(\:\mathbf{A}\) or \(\:{\mathbf{A}}_{\mathbf{r}}\) (Fig. 1b). Our approach, dynamicGP, uses genetic markers to predict matrix entries, which can then be combined with selected phenotypic measurements to make longitudinal predictions of plant development for unseen genotypes.

Here, the use of the Schur-based DMD has the advantage that its intermediate components contain fewer entries than either \(\:\mathbf{A}\) or \(\:{\mathbf{A}}_{\mathbf{r}}\). This renders the components of Schur-based DMD a potential entry point for the computationally efficient prediction of \(\:{\mathbf{A}}_{\mathbf{r}}\) from genetic markers. We examined the marker-based heritability of the entries of all the intermediate matrices of the Schur-based DMD using 70,846 single nucleotide polymorphisms (SNPs) from the MAGIC maize population (see Methods). These matrices include those of the singular value decomposition of \(\:{\mathbf{X}}_{1}=\mathbf{U}\varvec{\Sigma\:}{\mathbf{V}}^{\mathbf{T}}\), the rank reduced representation of the operator \(\:\mathbf{A}\), denoted by \(\:\stackrel{\sim}{\mathbf{A}}\), the matrices \(\:\mathbf{Q}\) and \(\:\mathbf{R}\) resulting from its Schur decomposition as well as the projected DMD modes, \(\:\varvec{\Phi\:}\) (Algorithm 2, Methods).

We first observed that non-zero heritability measures for the entries of these matrices were obtained by using the first two singular vectors (\(\:r=2\), see Methods). Specifically, we observed that the mean heritability of \(\:\stackrel{\sim}{\mathbf{A}}\) entries was 0.28, ranging from 0.13 to 0.43. Similarly, the entries of matrix \(\:{\mathbf{U}}_{\mathbf{r}}\) exhibited mean heritability of 0.35, ranging from ~ 0 to 0.75. Moreover, the entries of \(\:{\varvec{\Sigma\:}}_{\mathbf{r}}\) displayed mean heritability of 0.47, while those of \(\:{\mathbf{V}}_{\mathbf{r}}\) showed mean heritability of 0.20, ranging from 0.03 to 0.39 (Fig. 2a). Regarding the Schur decomposition of \(\:\stackrel{\sim}{\mathbf{A}}\), the mean heritability of \(\:\mathbf{R}\) entries was 0.30, ranging from 0.20 to 0.39. In addition, the \(\:\mathbf{Q}\) entries demonstrated a mean heritability of 0.04, ranging from ~ 0 to 0.09. Notably, the mean heritability of \(\:\varvec{\Phi\:}\) entries was 0.10, ranging from ~ 0 to 0.42 (Fig. 2a). These findings indicated that substantial parts of the building blocks of dynamicGP exhibit moderate to high heritability, suggesting that they may be predictable from genetic markers, and we can therefore use them to predict operator \(\:{\mathbf{A}}_{\mathbf{r}}\).

To test this hypothesis, we used 20 iterations of 5-fold cross validation to examine the genomic predictability of the components of the singular value decomposition of \(\:{\mathbf{X}}_{1}\) and the entries of \(\:\stackrel{\sim}{\mathbf{A}}\) using SNPs. The entries of \(\:\stackrel{\sim}{\mathbf{A}}\) exhibited a mean prediction accuracy of 0.24 (± 0.15), while the entries of matrix \(\:{\mathbf{U}}_{\mathbf{r}}\) demonstrated a slightly higher mean accuracy of 0.31 (± 0.16). Moreover, the entries of \(\:{\varvec{\Sigma\:}}_{\mathbf{r}}\) showed a mean prediction accuracy of 0.39 (± 0.12), whereas those of \(\:{\mathbf{V}}_{\mathbf{r}}\) displayed a slightly lower mean accuracy of 0.19 (± 0.14). Regarding the Schur decomposition of matrix \(\:\stackrel{\sim}{\mathbf{A}}\), the entries of \(\:\mathbf{R}\) exhibited a mean accuracy of 0.27 (± 0.13), while those of \(\:\mathbf{Q}\) showed a lower mean accuracy of 0.08 (± 0.12); in addition, the mean accuracy of the entries of \(\:\varvec{\Phi\:}\) was 0.15 (± 0.14) (Fig. 2b). These results suggested that the building blocks of dynamicGP can indeed be predicted from genomic data using standard genomic prediction models.

## Differences in performance between the two versions of dynamicGP

We next used the predicted values for each entry in matrices \(\:\varvec{\Phi\:}\) and \(\:\mathbf{R}\) to recreate the predicted operator \(\:{\mathbf{A}}_{\mathbf{r}}\) for each line (Fig. 1c). These predicted operators were used to make longitudinal predictions of all 50 traits over the entire time domain using the two versions of dynamicGP, iterative and recursive, defined similarly to the usage of DMD (see Methods).

We found that the iterative approach yielded more consistent accuracies across the investigated time points compared to the recursive approach (Fig. 3a). Across all traits and time points, the iterative approach resulted in a mean prediction accuracy of 0.34 (± 0.30) in 20 iterations of 5-fold cross validation. The *average blue value of plant pixel colors*, which quantifies the blue-yellow color space of visible light (see Table S1 for trait descriptions), emerged as the best-performing trait, with a mean prediction accuracy of 0.79 (± 0.07). In contrast, the *plant average hue value in side images*, quantified in the hue, saturation and value color space, showed the lowest prediction performance of -0.38 (± 0.12). Conversely, the recursive approach had a mean prediction accuracy of 0.17 (± 0.19) across all traits and time points. Here, the standard deviation of the *y values in the fluorescent xyz color space from top images* could be predicted with mean prediction accuracy of 0.52 (± 0.11), while the *plant average hue value in side images*, as quantified in the hue, saturation and value color space, again displayed the lowest mean prediction accuracy of -0.30 (± 0.13). The recursive predictions, as expected, tended towards zero at later time points as errors compounded.

We observed that traits exhibiting consistent heritability over time, indicated by smaller values for the coefficient of variation, tended to demonstrate higher prediction accuracies across all time points. This relationship was supported by a moderate negative Pearson correlation coefficient of -0.46 between the mean prediction accuracy for the traits from iterative dynamicGP across all time points and the coefficient of variation of the heritability estimates across the entire time points (Fig. 3d); the corresponding correlation coefficient for recursive dynamicGP was − 0.37 (Fig. S10). Therefore, in line with expectations, traits whose heritability does not vary during development were found to be better predicted by dynamicGP approach.

## Performance of dynamicGP compared to baseline models

To assess the predictive performance of dynamicGP, we compared it with a GP baseline approach using RR-BLUP models, which is the standard in breeding programs. The RR-BLUP models were trained with data on each trait from the first time point in a training set and used to predict the trait in all subsequent time points for the testing set. Inspecting the difference of performance between dynamicGP and the baseline for traits with different predictability, we found that the iterative version of dynamicGP outperformed the baseline RR-BLUP models consistently across all investigated time points, while the recursive version performed competitively, outperforming the RR-BLUP models for all time points between day 16 and day 33 (Fig. S8). Specifically, the mean prediction accuracy across all traits for the first time point, on which the models were trained, was 0.24 (± 0.15). While this accuracy was maintained when the baseline RR-BLUP models were applied to predict the traits over the first four subsequent time points, already at the second time point, both versions of dynamicGP outperformed the baseline (Fig. S8). The recursive version of dynamicGP was superior to the baseline for 12 time points, after which the accuracies of the two models were closely aligned (Fig. S8). Moreover, the iterative version of dynamicGP outperformed both the baseline and the recursive version of dynamicGP for every time point.

We examined the differences in mean prediction accuracy across all time points between dynamicGP and the baseline models for each trait. The best performing trait using the recursive method, the standard deviation of the *y values in the fluorescent xyz color space from top-down images*, showed mean prediction accuracies above 0.5 of all time points, which was 60% higher than the best performing trait in the baseline models (Fig. 4). In the iterative method the best performing traits, namely the average *blue value of pixels* as well as the average *brightness measurements of green pixels* and *pixels with 27% saturation* (Table S1), showed mean accuracies above 0.7 of all time points, which was ~ 2.4-fold higher than the best trait from the baseline model predictions (t-test, p-values < 0.001 for all time points, Fig. 4). Furthermore, both versions of dynamicGP yielded mean squared errors much lower than the RR-BLUP baselines (Fig. S9).