## Modeling dynamical systems with recurrent neural networks

Few machine learning models are suitable for timeseries data31. A popular category of ML models able to learn dynamical systems are known as recurrent neural networks (RNNs). Recurrent neural networks have a property such that the output of the model is fed back into the model as an input. In this way the current state of the system is dependent on its own history, much like the structure of ordinary differential equations (ODEs). We note that the general form of an ODE is given by \(\dot{x}=f\left(x\right)\). It is necessary to apply a discretization in time in order to simulate a solution to the ODE. The solution is approximated by an iterative scheme \({x}_{i+1}=g({x}_{i},{x}_{i-1},\dots ,{x}_{n})\), where the function \(g\) and the history dependence \(n\) depends on the approximation method used. If a dynamical system can be represented by an ODE, then a RNN can approximate the governing equations. The next question is whether galvanotaxis can be accurately modeled by an ODE given the stochastic characteristic of cell migration. RNNs are deterministic models but they can approximate stochastic behavior by converging to a chaotic model32. In this work, we show that such a chaotic model provides a good approximation to variability seen in cell migration.

We use a long short-term memory (LSTM) recurrent neural network (see Supplementary Fig. 1 for more details) to predict the direction of cell migration based on previous measured angles of migration and the current strength of the electric field (see Fig. 2). This is also referred to as a one-step ahead prediction. It has been shown that cell movement can be completely described mathematically using the speed and the angle of migration25. Furthermore, the speed is independent of the EF. Thus, our model only needs to consider the angle of migration. Here, the angle of migration is referred to as directedness and defined as the cosine of the angle between the electric field and the straight line which connects the centroid of the cell from its initial to current location. We note that simulations of cell trajectories can be reconstructed using directedness values. Figure 2 (c) shows a reconstruction of single cell trajectories from computed directedness assuming constant speed. LSTM models have feedback connections and are designed to explicitly avoid the vanishing gradient problem, meaning that they can process entire sequences of timeseries data33. LSTM networks are advantageous over other recurrent networks since they are relatively insensitive to the duration of time delays34. These advantages make LSTM models desirable for understanding complex systems, and LSTM models have had success capturing the behavior of noisy dynamical systems35,36.

LSTM networks, like all neural networks, are trained by iteratively updating the internal weights of a network, which are usually randomly initialized, to minimize the loss function on the training set. For multilayer neural networks, including RNNs, this loss function is nonconvex in general. Thus, the weights found in the training stage are not guaranteed to represent a global minimum of the loss function, and the exact local minimum found is dependent on the initial weights. To ensure that results are not dependent on any one set of initial weights, we train 50 randomly initialized models with identical architecture on the same training set. We then evaluate the performance of all 50 models so that the results reflect the overall performance of this modeling approach. All of our presented results show predictions of all models on every cell to demonstrate that these results are not dependent on any one random weight initialization.

## Recurrent NNs can predict the directedness of EF-induced cell migration at the single cell level

We first demonstrate the ability of the LSTM model to capture cell migration patterns under an electric field by predicting cell directedness one step ahead, given measured cell directedness at previous time points. We first train and test the model on a collection of timeseries Cranial Neural Crest Cell (CNCC) data1 capturing single cell migration under a set of EFs: 0mV/mm, 15mV/mm, 30mV/mm, 50mV/mm, 75mV/mm, 100mV/mm, and 200mV/mm (see methods). To evaluate the model’s accuracy, we consider the distribution of root mean squared error (RMSE) values for single cell trajectories over a population of cells when comparing predicted directedness at each time step to the measured ground truth.

Figure 3 shows the results of predicting single-cell behavior for all EFs. In particular, the median values and their distribution across the 50 models are plotted against time. See Table S1 for the distributions of RMSE values when predicting on the training, validation, and test sets. The center and spread of the RMSE distributions for the training and test sets are comparable, implying that the model is not overfit. This is further supported by model simulations in a later section.

Additionally, to demonstrate that the predictions are indeed informed we compare the results to those of two naïve predictors (see Fig. 3). The first of these predictors, which we will call the “constant directedness” model for this discussion, makes a naïve assumption that directedness will remain constant from one timestep to the next. So, the directedness prediction made by this naïve model is just the previous directedness value. The second naïve predictor, which we refer to as the “linear” predictor, makes a linear extrapolation using the previous two directedness values. That is, the rate of change of directedness between the previous two timesteps is assumed to remain constant between the previous timestep and the next timestep. Figure 3 shows the error distributions of the naïve predictors alongside that of our base model, the LSTM. See Table S2 for the median RMSE values and corresponding IQR values. It’s interesting to note that simply assuming no change in directedness leads to a better approximation than a linear extrapolation. Thus, prediction of directedness is non-trivial even under a constant EF and it is clear that cell migration is driven by underlying dynamics.

## Recurrent NNs can predict the directedness of EF-induced cell migration at unseen EFs

To understand the generalizability of the model with respect to the EF strength, we use the same modelling framework to both interpolate and extrapolate to EF strengths that were not seen in the training set. For interpolation, we remove all instances of cells in an intermediate EF, 30mV/mm, from the training set and train a new model with identical architecture as before. For extrapolation, we follow a similar approach except we remove all instances of cells in an extreme EF, 200mV/mm from the training set. The model is then tested on the complete data including unseen EFs. We also highlight the performance exclusively on cell trajectories under EFs omitted during training. First, we evaluate the ability of the model to interpolate to unseen EF strengths by considering the performance of the model trained without 30mV/mm instances on all cells in the test set (see Fig. 4), as well as exclusively on the 30mV/mm test instances. On both the full test set and the 30mV/mm test set instances, the median RMSE of the interpolation model is only moderately higher than the base model (~ 5%). Additionally, the performance of the interpolation model on the 30mV/mm instances alone is comparable to the performance on the full training set (see Table S3), meaning that the model interpolates well to unseen EF strengths.

For extrapolation, we compare the base model to a model trained without any 200mV/mm instances (see Fig. 4). The median RMSE of this extrapolation model, when evaluated on the full test set is ~ 6.5% higher than the base model trained on the full training set. When evaluating this model specifically on the 200mV/mm, the median RMSE is ~ 17.4% higher than that of the base model. See Table S3 for median RMSE and corresponding IQR distribution values. We note that the base model predicts directedness exceptionally well at 200mV/mm when compared to its performance on the full test set. This implies cell migration is more predictive at this higher EF and removing this data when training the extrapolation model results in a noticeably increase in error across the full test set.

For both the interpolation and extrapolation tests, the model trained on the limited training set performs worse than our original model overall. This is to be expected given an overall smaller dataset. However, the acceptable performance exclusively on omitted EFs demonstrates the ability of our model to interpolate and extrapolate with respect to EF values, with relatively better performance on the interpolation task than on the extrapolation task.

## Transfer learning allows for high prediction accuracy when minimal data is available

Transfer learning is the method of using a model’s knowledge about one learning problem (called the source domain) to improve the performance on a second, related learning problem (called the target domain)37–41. While the traditional approach to machine learning requires learning a separate randomly initialized model from each domain’s training set to converge on a model specific to the task it was trained on, transfer learning involves learning a model for the source domain and then using that trained model as the starting point for learning the model for the target domain (see Fig. 5a). Transfer learning allows for target domain instances to be in a different feature space and have a different distribution than the instances in the source domain, which allows for relatively high performance when target domain data is too limited to allow for such performance were the model to be trained from a random initialization40,41. Because galvanotaxis experiments and manual cell tracking can be both expensive and time-consuming, galvanotaxis tracking datasets for some cell types may be limited in both the number of cells tracked and the variety of EF conditions in which experiments are conducted. Thus, transfer learning may be a pivotal tool in developing accurate models for cell types and experimental conditions for which data is limited. Here, we evaluate the effects of transfer learning on extending our constant EF CNCC model to different cell types and to a time-varying EF.

First, we consider transfer learning methods for making predictions about cells in time-varying EFs using the model which we trained on constant EFs. We evaluate the ability of the model to capture CNCC galvanotaxis dynamics in an experiment in which the polarity of a 200mV/mm EF is reversed halfway through the experiment (see Fig. 5). We compare the performance of a “reversal model”, trained only on the polarity reversal data, and a “transfer learning model”, which retrains the base model on the polarity reversal data. Once again, we use the base model predictions on the constant-EF test set as a performance benchmark.

The median RMSE of the reversal model is ~ 57.2% higher than the benchmark performance of the base model on the original test set. The transfer learning model provides an improvement of ~ 18.1% over the reversal model. The transfer learning model’s median test set RMSE is ~ 28.8% higher than the benchmark model’s median RMSE, which can likely be attributed to both the limited polarity reversal training data, as well as the increased complexity of dynamics in the time-varying EF setting. See Table S5 for median RMSE values. Despite the inability of the model to reach benchmark performance on this task (note that only one dataset with dynamic EF was available), we have demonstrated that transfer learning methods are effective at improving model performance for cells in time-varying EFs over models trained only in those settings.

Next, we evaluate the effectiveness of transfer learning methods for extending our method to cell types with limited galvanotaxis tracking data. We consider the application of our CNCC model to both fish keratocytes and human keratinocytes. Both of these cell types have been shown to migrate towards the cathode of an electric field2,3, while CNCC migrate towards the anode1. Thus, the model must learn to predict galvanotactic behavior which differs significantly from the behavior of CNCC. Using limited training sets for both target cell types, we compare the performance of a model that uses the same architecture as our original model but has been trained only on the target data with our CNCC model, which we have retrained using the same target data using transfer learning methods. Again, we use the performance of the CNCC model as a benchmark, as our goal is that these models, once transfer learning methods have been applied, can have target data test performance similar to the benchmark test performance on the CNCC data.

We have one keratocyte dataset and two keratinocyte datasets. The keratocyte data contains tracking timeseries for 0mV/mm, 50mV/mm, 100mV/mm, 200mV/mm, and 400mV/mm electric fields. All keratinocyte cells are tracked in 100mV/mm EFs. The keratocyte training set contains tracking data for two cells from each available EF strength and the two keratinocyte training sets each contain tracking data for two cells total. For keratocytes, images are taken, and cell positions are recorded every 30 seconds. The first keratinocyte dataset records positions at one-minute intervals, while the second keratinocyte dataset has positions recorded at ten-minute intervals. Thus, this task evaluates not only the ability of the model to transfer knowledge to other cell types, but also the ability of the model to adjust to different timescales.

Our keratocyte model, trained only on the keratocyte data, has a median RMSE of 0.0554 on the test set, which is ~ 189.7% higher than the median RMSE of the benchmark model performance on the CNCC test set. For transfer learning, we take the CNCC model and retrain the weights on the keratocyte training set, resulting in a median RMSE of 0.0260 on the keratocyte test set, which is ~ 53.1% lower than the keratocyte model which did not use transfer learning and ~ 11% lower than the benchmark performance on the CNCC dataset. So, the model trained only on our limited keratocyte data has much higher median RMSE than the benchmark, while the transfer learning model achieves lower median error than the benchmark (see Fig. 6).

The first keratinocyte model, the model trained only on the one-minute interval keratinocyte dataset, has a median RMSE of 0.1006, ~ 244.5% higher than the benchmark median RMSE. The transfer learning model, in which the base model was retrained using the same keratinocyte training set, has a test set median RMSE of 0.0362, which is ~ 64% lower than the median error of the model only trained on the keratinocytes, and is just ~ 24% higher than the median RMSE of the benchmark CNCC model. The spread of the error distribution was also much lower for the transfer learning model than for the keratinocyte-only model, with a RMSE IQR of 0.0274 for the transfer learning model and 0.2502 for the keratinocyte-only model (see Fig. 6).

The median RMSE of the model trained only on the ten-minute interval keratinocyte dataset when predicting on the test set is 0.2176, which is ~ 645.2% higher than the benchmark median RMSE. After retraining the CNCC model on the ten-minute interval keratinocyte training set, the resulting model has a median RMSE of 0.1134, which is ~ 288.4% higher than the benchmark model, but ~ 47.9% lower than the keratinocyte model that did not use transfer learning. Once again, the spread is significantly lower in the model that used transfer learning, with a RMSE IQR of 0.1059 in the transfer learning model and 0.2383 in the keratinocyte-only model (see Fig. 6). The unusually large increase in error overall may be due to the significantly increased sampling time from 5 minutes to 10 minutes.

While transfer learning in these cases did not always lead to performance comparable to the benchmark, both the median and IQR of RMSE distributions were much lower for the transfer learning model than for the model trained only on the target cell type in all cases. We have shown that transfer learning can be an effective approach for developing predictive models about cell types for which available data is limited, even when the source cell data differs from target cell data in significant ways, such as the time interval between observations and anodal- versus cathodal-directed migration.

NN-based models can be used for in silico studies

In recent years, the massive increase in the quantity of available data has led to much attention being paid to *in silico* biological studies, which are studies performed on computers using mathematical modeling and simulations42–45. The advantages of *in silico* studies include estimating hidden system parameters that are experimentally inaccessible46, optimizing the timeline of experimental procedures and product development47,48, reducing the need for animal and human trials48, and lowering experimental costs47–49. In this section, we demonstrate that the recurrent neural network-based model that we have developed can be used for *in silico* galvanotaxis assays with arbitrary and time-varying EFs.

We simulate cell migration experiments by designing an EF timeseries and using some initial ground truth data to begin making predictions. In this way, we can generate timeseries of synthetic galvanotaxis tracking data using arbitrary EF values, which may vary in time. We compare the distributions of synthetic directedness values with those from the ground truth data to evaluate the ability of the model to capture the long-term effects of EFs on CNCC.

The specific comparison we consider is between the distributions of the directedness values at the end of the experiments. Our simulations use 20 timesteps of initial ground truth data to begin making predictions and each CNCC is tracked in a constant EF for 37 timesteps after the initial image. Thus, we are comparing how the ground truth directedness distribution evolves in the final 17 timesteps with the evolution predicted by the simulation in the same time period. The ground truth data we consider are the 350 cells in the test set. These cells are used for the initial lookback to begin the simulations and for the comparison with ground truth final directedness values.

To determine the ability of our model to replicate the effects of an EF on cell motility *in silico*, we compare the distribution of final directedness values of the *in silico* synthetic data against the ground truth data across all the EF values in the CNCC dataset (see Fig. 7). In Fig. 8, we show the circular distribution of the directional data for a subset of the EFs. We compare the directedness values by EF to evaluate whether the model has learned the effects of various EF values on the cells. If the distributions of EF-level predicted directedness values are similar to those of the EF-level ground truth directedness values, we can conclude that the *in silico* studies capture the general migration behaviors of the CNCCs.

The means and medians of final directedness values computed by the simulations are closely correlated with the ground truth. The correlation coefficient between the means is R = 0.9906 and between the medians is R = 0.9721. In general, the distributions of simulated and ground truth final directedness values get closer as EF strength increases and cell behavior becomes more predictable.

Specifically, there is a significant drop in the differences between both means and medians at 30mV/mm and higher, compared to 0mV/mm and 15mV/mm simulations. The threshold of response of CNCC to electric fields has been identified as being in the range between 15mV/mm and 30mV/mm1, so we expect that the simulations will more closely reflect the ground truth in EF strengths above that threshold due to the largely stochastic behavior of cells below the threshold.

To get a better sense of the distribution of directionality across cells we also present cell motility in polar coordinates, where the angle represents the direction of motion (with 30 degree bin widths) and the radius represents the proportion of cells moving in a given direction. To create these rose plots using our synthetic data, we must map the directedness values generated by the model to (x,y) positions. These positions cannot be recovered exactly from the directedness values alone, so we approximate them using previous positions, as assumption of cell speed, and the previous heading of the cell (see Methods for more details). This approximation was shown to provide fairly accurate results in Fig. 2. We note that the LSTM model is ultimately a deterministic model and so directedness can converge to deterministic equilibrium point over long simulations. Figure 8 shows comparison of rose plots for experimental data and data generated by the LSTM model *in silico*.