Adaptive deep learning for time-varying systems with hidden parameters: Predicting changing input beam distributions of compact particle accelerators

Machine learning (ML) tools are able to learn relationships between the inputs and outputs of large complex systems directly from data. However for time-varying systems, the predictive capabilities of ML tools degrade if the systems are no longer accurately represented by the data with which the ML models were trained. For complex systems, re-training is only possible if the changes are slow relative to the rate at which large numbers of new input-output training data can be non-invasively recorded. In this work, we present an approach to deep learning for time-varying systems which does not require re-training. Our approach is to include adaptive feedback in the architecture of deep generative convolutional neural networks (CNN). The feedback is based only on available system output measurements and is applied in the encoded low-dimensional dense layers of the encoder-decoder CNNs. Our approach is inspired by biological systems in which separate groups of neurons interact and are controlled and synchronized by external feedbacks. We demonstrate this approach by developing an inverse model of a complex charged particle accelerator system, mapping output beam measurements to input beam distributions, while both the accelerator components and the unknown input beam distribution vary rapidly with time. We demonstrate our methods on experimental measurements of the input and output beam distributions of the HiRES ultra-fast electron diffraction (UED) beam line at Lawrence Berkeley National Laboratory. Our method can be successfully used to aid both physics and ML-based surrogate online models to provide non-invasive beam diagnostics. We also demonstrate our method for automatically tracking the time varying quantum efﬁciency map of a particle accelerator’s photocathode.


Introduction
Machine learning (ML) methods, such as deep neural networks, can learn relationships between the components of and predict the outputs of complex physical systems such as phases of matter 1 and extreme events in complex systems 2 . Consider an n-dimensional complex physical system whose evolution is described by a system of dynamic equations ∂ x(p,t) ∂t = F(x(p,t), p), where x = (x 1 , . . . , x n ) is a vector of physical quantities within the system such as atomic positions or energies, p = (p 1 , . . . , p m ) is a vector of controlled system parameters such as voltages or chemical concentrations, and F represents the nonlinear dynamics governing the physical system which may include partial derivatives with respect to the components of both x and p. Associated with a system such as (1) there is usually some physical-meaningful measurement, M (x(p,t), p), which depends on the parameter settings p and on the entire resulting time-history of x(p,t). For example, such a measurement may be material properties such as strength of a complex alloy or the results of a high energy physics experiment. There are cases where the system (1) is so large and complex that an accurate analytical representation of F is unknown. There are also cases in which an analytic physics-based model of F exists, but is so computationally expensive that it cannot be numerically simulated over large length and time scales and requires lengthy computations with high performance computing resources. In either of those two cases, ML methods can be useful for learning a representation of the system that quickly maps parameter settings to measurements. Tool such as recurrent neural networks can handle entire time-series trajectories and reinforcement learning approaches can model analytically unknown reward functions and their corresponding optimal controllers for analytically unknown systems. Such ML-based approaches can learn these representations directly from raw data, whether the data comes from measurements or simulations, by using large collections of input-output pairs, Figure 1. A high level overview of the adaptive ML approach for time-varying systems with flexibility for adaptation added by including an input vector of adaptively tuned parameter that is concatenated together with a vector output of a dense layer within a convolutional neural network after the encoder and before the generative section.
to learn parameter to measurement maps where w, b are the weights and biases of the ML model whose values are found by minimizing the error between measurements and predictions based on data set D as in (2). Once such maps are learned they can be used for applications such as fast searches over large parameter spaces, to extract correlations and physics from experimental data, and to guide the design of new systems.
To give a few concrete examples, various ML methods have now been demonstrated for a wide range of systems such as molecular and materials science studies 3 , for use in optical communications and photonics 4 , to accurately predict battery life 5 , to accelerate lattice Monte Carlo simulations using neural networks 6 , for studying complex networks 7 , for characterizing surface microstructure of complex materials 8 , for chemical discovery 9 , for active matter analysis by using deep neural networks to track objects 10 , for particle physics 11 , for antimicrobial studies 12 , for pattern recognition for optical microscopy images of metallurgical microstructures 13 , for learning Perovskit bandgaps 14 , for real-time mapping of electron backscatter diffraction (EBSD) patterns to crystal orientations 15 , for speeding up simulation-based accelerator optimization studies 16 , for Bayesian optimization of free electron lasers (FEL) 17 , for temporal power reconstruction of FELs 18 , for various applications at the Large Hadron Collider (LHC) at CERN including optics corrections and detecting faulty beam position monitors [19][20][21] , for reconstruction of a storage ring's linear optics based on Bayesian inference 22 , to analyze beam position monitor placement in accelerators to find arrangements with the lowest probable predictive errors based on Bayesian Gaussian regression 23 , for temporal shaping of electron bunches in particle accelerators 24 , for stabilization of source properties in synchrotron light sources 25 , and to represent many-body interactions with restricted-Boltzmann-machine neural networks 26 .
One challenge faced in applying ML methods to complex physical systems is if the systems have hidden time-varying characteristics which describe the parameter-to-physical system relationships. A simple example of a time-varying system iṡ where p h (t) is an unknown time-varying parameter and p is an input parameter. By hidden parameters we mean parameters whose settings are not directly observable. Note that in this simple example (4) if there is no feedback (p ≡ 0) the system is unstable and exponentially diverges. If the unknown time-varying function p h (t) repeatedly changes sign, such as where the frequency of oscillation is itself also an uncertain time-varying system, it is a major challenge for data-based methods because the input p to output x(t) relationship changes repeatedly in an unpredictable way, resulting in the smallest error for a large data set being an average approximation of p h (t) ≡ 0. Such systems are also challenging for standard feedback controls such as proportional integral derivative (PID) feedback which must assume a known sign of p h (t) and fails spectacularly, destabilizing the system even more whenever p h (t) changes sign. Although many model-independent adaptive feedback control methods exist 27 , the problem of designing a stabilizing feedback controller for unknown time-varying systems with unknown control directions such as (4) with analytical proof of convergence was not accomplished until 2013 by utilizing nonlinear time-varying adaptive feedback and Lyapunov stability analysis methods 28 . Another form of hidden time-varying characteristics is when the initial conditions of a system are themselves time-varying, not easily measurable, and influence the dynamics of the system in a non-trivial way.
In both of the above cases, if either the dynamics-to-parameters relationships or the initial conditions vary with time, then the accuracy of data-based predictions will start to degrade as the system changes to such an extent that it is no longer accurately Overview of an adaptive ML approach for ML-based time-varying inverse models. In our application a convolutional neural network is trained to represent an inverse physics model M −1 which maps an output distribution I o (t) to its associated input distribution I i (t) for a complex system (A,B). The ML-based estimate of the input I i is then fed into a physics or ML-based surrogate model (C) and the model's output is compared to measurements to quantify an error which is used to guide iterative feedback (D). In this adaptive approach the dense layers of the network include inputs from an adaptive feedback loop which are perturbed based on model or measurement-based errors (E).
represented by the data that was originally used to train the ML model. Although powerful ML methods such as domain transfer and re-training are available, for large quickly changing systems with many parameters they may become infeasible. Furthermore, for many systems even if it was possible to quickly collect new data to re-train an ML model for new conditions, sometimes such measurements are highly invasive and cannot be accomplished in real time without interrupting regular operations. Including time as an extra input to the ML tool and attempting to learn the time dependence of the system from data is only possible for simple cases in which the changes are predictable, such as monotonic or periodic changes. Although ML methods for time-series data exist, such as LSTM-based neural networks, those also need to be trained with consistent data and if the dynamics that govern that data change then the predicted time-series will be wrong. Model-independent adaptive feedback methods which respond in real time to un-modeled disturbances and changes face their own major limitations. Although adaptive methods are able to handle time-varying systems they are usually local in nature and can require lengthy tuning or get stuck in local minima especially for large complex systems with many parameters.
In this paper, we present an approach to deep learning for time-varying systems which combines the ability of data-based ML methods to learn global complex relationships directly from data with the robustness of model-independent adaptive feedback to handle unknown time-varying conditions. In particular we focus on encoder-decoder type convolutional neural networks (CNN) which encode high dimensional inputs to a low dimensional latent space of dense layers. Our approach adds flexibility into the generative half of the CNN by adding adaptively-guided feedbacks into the low-dimensional dense layers of the network. A high level overview of this approach is shown in Fig. 1. Our adaptive ML method is designed for time-varying systems of the forṁ where as before x = (x 1 , . . . , x n ) are physical quantities whose measurements are of interest, p = (p 1 , . . . , p m ) are adjustable parameters, and the new term p h (t) = p h,1 (t), . . . , p h,m h (t) are hidden time-varying parameters that influence the time-varying system dynamics F, but are not directly observable. Our goal is to develop an adaptive ML-based modelM for predicting a time-varying measurement M of states of the system (6). Our model is of the form where as before the w, b are network weights and biases that are trained from data sets, p are known set parameters, and thê p h (t) are time-varying tunable parameters that are adjusted dynamically according to an adaptive algorithm whose dynamics are represented as f h . Although we writep h (t) to represent the fact that these adjustable parameters are supposed to track the unknown hidden parameters p h (t), they can also include more abstract quantities such as additional neural network inputs, weights, or biases. The adjustable parameters are adaptively tuned based on feedback that depends on some function of the prediction, C(t) = µ M (t) , where we are only showing the time arguments of the functions to emphasize that they may be drifting and changing. For example, for the application described below, our ML model's goal is to predict the time-varying input beam distribution I i (t) of an accelerator based on the measurement of an output beam distribution I o (t) at another location further down the beam line. This is done because the input beam distribution is not easily directly measurable and therefore our  Figure 3. Overview of the HiRES UED adapted from 36 .
adaptive tuning adjusts both the neural network's estimate of I i (t) as well as a hidden parameter which represents a time-varying magnet in the accelerator lattice. The adaptive feedback is performed based on a comparison between the measured accelerator output beam I o (t) and that of an online model whose input is the ML-based approximated distribution and whose time-varying magnet setting is also adaptively adjusted. This allows us to make adjustments in real time purely based on available diagnostics without having to re-train the ML model. Our approach of combining external feedbacks with neural networks is analogous to a natural phenomenon in biological systems in which networks of neurons interact with each other and are controlled by external feedback loops and other networks 29 . For example recent studies have shown that the challenging task of synchronization between neuron networks can be achieved by feedback controls in the presence of signal delay, noise, and external disturbances 30,31 , and that resonance can be excited and controlled in complex neuron systems by using external feedback signals including chaotic resonances 32,33 . Some initial simplified adaptive ML studies have also begun on coupling the outputs of CNNs to adaptive feedback for real-time accelerator phase space control 34 and for predicting 3D electron density distributions for 3D coherent diffraction imaging 35 .
In the remainder of this paper we focus on ML-based inverse physics models for time-varying systems, an overview of the approach is shown in Fig. 2, which is a specialized form of the general setup in Fig. 1. This approach is motivated by the fact that very sophisticated and accurate physics models as well as ML-based surrogate models have been developed for many complex systems, which could benefit from accurate estimates of the time-varying initial distributions which are used as their inputs. In particular we will demonstrate our method for particle accelerators and their charged particle beams for which our inverse modeling approach is motivated by sophisticated beam dynamics models which have the potential to serve as non-invasive beam diagnostics if their parameters and input beam distributions could be matched to actual accelerators.
Particle accelerators are powerful tools for scientific research such as imaging of nuclear motion by ultrafast electron diffraction (UED) 37, 38 reaching femtosecond temporal resolution 39 allowing for the visualization of lattice dynamics 40 and monitoring the spatiotemporal dynamics of excited atoms or molecules 41 , imaging dynamic phenomenon at free electron lasers (FEL) 42 , and generating GV/m accelerating fields in plasma wakefield accelerators (PWFA) 43 . Major challenges faced by accelerators are the ability to quickly and automatically control the phase space distributions of accelerated beams such as custom current profiles, and switching between different experiments with order of magnitude changes in beam properties, which can require hours of hands on tuning due to limited non-invasive diagnostics, time variation of accelerator parameters, and complex time varying beams.
Intense bunches of 10 5 − 10 10 particles are complex objects whose 6D phase space (x, y, z, p x , p y , p z ) dynamics are coupled through collective effects such as space charge forces and coherent synchrotron radiation 44 . Accelerators generate extremely short bunches: 30 fs bunches at the SwissFEL X-ray FEL 45 , sub 100 fs bunches for UEDs 46 , and picosecond bunch trains for UEDs and multicolor XFELs 47 . The High Repetition-rate Electron Scattering apparatus (HiRES, shown in Fig. 3) at Lawrence Berkeley National Laboratory (LBNL), accelerates pC-class, sub-picosecond long electron bunches up to one million times a second (MHz), providing some of the most dense 6D phase space among accelerators at unique repetition rates, making it an ideal test bed for advanced algorithm development 36,48 .
Advanced diagnostics are important for particle accelerator applications because the dynamics of intense charged particle beams are dominated by complex nonlinear collective effects such as space charge forces and coherent synchrotron radiation in a 6 dimensional (6D) phase space (x, y, z, p x , p y , p z ). Non-invasive real-time phase space diagnostics would be of great benefit to all particle accelerators as they would provide information which could guide adaptive feedback mechanisms. Automatic fast feedback could then be used for real-time beam tuning, such as quickly changing between various experiments at free electron lasers by modifying the energy vs time phase space of the beam, for creating custom tailored current profiles of intense bunches in plasma wakefield acceleration experiments, and for controlling the transverse bunch shapes for ultra fast electron diffraction (UED) microscopy beam lines.
Although physics and machine learning-based surrogate models can potentially serve as non-invasive beam diagnostics, n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10   n=11  n=12  n=13  n=14  n=15  n=16  n=17  n=18  n=19  n=20   n=21  n=22  n=23  n=24  n=25  n=26  n=27  n=28 n=29 n=30 the main challenges they face are uncertain and drifting accelerator parameters and uncertain initial beam distributions at the entrance of an accelerator to be used for initial conditions. Such distributions drift with time requiring lengthy measurements that interrupt accelerator operations. For example, it has been demonstrated that an online model can be adaptively tuned in order to provide a virtual diagnostic for the time-varying longitudinal phase space (LPS) of an electron beam at the Facility for Advanced Accelerator Experimental Tests (FACET) at SLAC National Accelerator Laboratory by comparing the model's predictions to a non-invasive energy spread spectrum measurement 49 . Although the adaptive model tuning approach at FACET was able to track the beam's LPS in real time its accuracy was limited by the fact that there was no access to a good estimate of the time varying input beam distribution. Another set of virtual LPS diagnostics at FACET and at the LCLS free electron laser (FEL) were then developed which utilized neural networks to instantly map accelerator parameters to LPS measurements 50 . A third approach to virtual LPS diagnostics was recently developed at the LCLS which used neural networks with spectral inputs as well as parameter settings resulting in higher accuracy predictions 51 . Simulation-based surrogate modeling studies have also been performed for mapping LPS distributions between various locations in accelerators 52 . Such ML methods could also greatly benefit from adding an estimate of the time-varying initial beam distribution as an input via the approach proposed in this paper or from adding adaptive tuning to some of their dense layers to make them more robust to time-varying initial beam distributions.

Methods
In the present work we utilized an adaptive CNN-based inverse model for mapping output beam measurements measurements directly to the learned principal components that represent beam inputs at the High Repetition-rate Electron Scattering apparatus (HiRES, shown in Fig. 3) beamline at Lawrence Berkeley National Laboratory (LBNL), which accelerates pC-class, subpicosecond long electron bunches up to one million times a second (MHz), providing some of the most dense 6D phase space among accelerators at unique repetition rates, making it an ideal test bed for advanced algorithm development 36,48 . The first step was to collect data to learn a basis with which to represent a distribution of interest. We were interested in representing the time-varying input beam distribution I i (t)(x, y) at the photocathode of the electron gun of HiRES. We started by collecting laser spot images on the HiRES photocathode over several days. The transverse electron beam density distribution ρ(x, y) created by a laser pulse of intensity I(x, y) is given by ρ(x, y) = I(x, y) × QE(x, y). For our experiments we used a laser spot of small transverse size of roughly 200×200 µm 2 , an area small enough so that no significant quantum efficiency variations were expected and we could relate laser intensity directly to an initial electron bunch charge distribution.
In order to increase the generality of our approach the laser images were then each rotated through 360 deg at steps of 1 deg generating a total of 1800 52×52 pixel images. These images were then stretched out as 1×52 2 dimensional vectors and stacked in a 2704×1800 data matrix D. Considering this as a collection of 1800 vectors from a 2705 dimensional space, we then carried out a principal component analysis (PCA) 53 , a method closely related to singular value decomposition for finding lower dimensional representations of higher dimensional spaces, such as the method for finding the most important components of a large complex beam line 54 . PCA was performed by subtracting the mean of each column of D and then factorizing the data matrix D via singular value decomposition to calculate the score matrix T according to whose components can be used in linear combinations as a basis with which to represent the original images.
We then generated input beam distributions I i as linear combinations of N pca PCA components as Although hundreds of PCA components were required to represent > 95% of the variance in our data, we wanted to significantly reduce the dimensionality of our problem. The first 30 principal components (PC 1 , . . . , PC 30 ) are shown in Fig. 4 and qualitatively the most dominant modes are the first 15. To quantify the accuracy of reconstruction as a function of the number of PCA components used, we defined the error We then represented the 5 measured input beam distributions according to (9) and found that the error quickly dropped and leveled off at ∼8 % for N pca = 15 after which it decayed very slowly so we used the first 15 images shown in Fig. 4 as the basis vectors for representing input beam distributions for HiRES. This limits the lowest error we can achieve for real distributions at 8 %, but greatly speeds up real time adaptive tuning. Next we calibrated a General Particle Tracer (GPT) model using a generative neural network surrogate model (SM) to quickly map parameter settings to output beam distributions 55, 56 by a high dimensional parameter search (including average beam energy (x p , y p , z, E)) to match predictions to measured beam data. The calibrated GPT model was then used to generate 51 thousand pairs (X i , y i ), with each y i = (α i,1 , . . . , α i,15 ) representing 15 PCA coefficients sampled from a normal distribution to generate a random input beam distributionÎ i as in (9), and X i being the 51×51 image of the (x, y) output beam distribution at the first view screen of HiRES, generated by the GPT model usingÎ i as input.
We then trained a CNN to map output beam distributions to the accelerator's input beam distribution as shown in Fig. 5 (A). For the generative half of the CNN, instead of allowing the CNN to build in an arbitrary latent space by a standard approach such as transposed convolution layers, our approach guided the CNN to produce physically significant quantities which were the PCA components representing the basis out of which to build our input beam distributions.
The CNN was tested on 1000 unseen input-output pairs and had an average error as defined in (10) of 1.88 % with a standard deviation of 0.56 %. Next, we used an experimentally measured output beam distribution as the input to the CNN and were able to predict the corresponding measured input beam distribution with an accuracy of 14.05 %. Finally, the CNN's prediction was fine tuned using an iterative adaptive feedback approach with each new input distribution fed into GPT giving a new output distribution, as shown in Fig. 5 (C,D,E,B), resulting in prediction accuracy of 9.69 %. A detailed view of adaptive CNN predictions is shown in Figure 6. The input beam distributions generated by the CNN were then fed into the GPT model to produce output beam estimates, which were compared to measurements. The error between predicted and measured beam outputs was used to drive an adaptive feedback loop which was connected back into the dense layer of the CNN to adaptively tune the CNN's generative layer's predictions. The adaptive method used was extremum seeking (ES) for high dimensional dynamic systems of the forṁ x = f(x(p,t), p,t),ŷ = y(x(p,t), p,t) + n(t), where f is an analytically unknown time-varying nonlinear function, x are measurable system values, p are adjustable parameters, andŷ is a noisy measurement of an analytically unknown function y that can be maximized by adjusting the parameters p according to where ω j = ω i for i = j. The term α > 0 is the dithering amplitude and can be increased to escape local minima and k > 0 is feedback gain. For large ω i , the dynamics of (12) are, on average dp/dt = −kα∇ p y, a gradient descent (for k > 0) or ascent (for k < 0), with respect to p, of the actual, analytically unknown function y although feedback is based only on the noisy measurementsŷ(t), for details and proofs see 28,57,58 .

Demonstration at HiRES UED
In our approach the dynamic system of interest was the HiRES beamline and its GPT-based simulation. Our inverse model was designed to map HiRES output beam distributions to their associated input distributions and the CNN parameters being tuned, p, were dense layers which created the PCA coefficients defining the input beam distribution, as well as a hidden parameter within the HiRES simulation represented by the solenoid magnet setting. Our adaptive feedback was guided by maximizing the structural similarity index (SSIM) between a measured output beam distribution I o (t) = ρ m (x, y,t), and the GPT simulation-based output beam distributionÎ 0 (t) = ρ s (x, y,t), both of which were 7/12 10 where µ m and µ s are mean values of the measured and simulated distributions, σ 2 m and σ 2 s are their variance, σ ms is the covariance, and c 1 = c 2 = 10 −459 . The SSIM value lies within the range [−1.0, 1.0] with a value of 1.0 for images that are exactly the same.
To demonstrate the robust adaptive capability of our approach to simultaneously track both changing accelerator parameters and time-varying initial beam distributions, we performed a study running three sets of GPT simulations. In the first simulation, representing HiRES, an input beam was defined and its PCA components as well as the current of solenoid S1 were changed over time, resulting in a large variation of both the input and output beam. Based on that output beam of the first simulation, the CNN alone was used to predict the PCA components which generated an input to a second simulation to predict the output beam. This second simulation quickly diverged from the first because the system changed from the one that had been used to train the CNN. For a third simulation, the adaptive feedback algorithm (12) was also applied with y as in (13), which adaptively tuned both the PCA components and the current of solenoid S1 in order to track the time-varying output distribution of the first simulation, as shown in Fig. 5 (B-F). Fig. 7 (A) shows the error of the PCA component predictions and demonstrates that the CNN + ES setup was able to simultaneously track both the time-varying solenoid current and the PCA components. Fig. 7 (B) compares the SSIM of the output beams with and without ES for tracking, (C) shows the percent error (10), and Fig. 8 compares the target output beam and input beam with and without using ES. Using the adaptively tracked input distribution and S1 setting we simulated a 1.6 pC bunch with 3D space charge and compared the 6D phase space to that which would have been generated by the exact correct values as well as to the CNN alone in Fig. 9, showing an almost exact match of the 6D phase space.
Note that we chose solenoid S1 as the hidden time-varying parameter to be tracked only for the purpose of demonstrating this adaptive technique. In general many such parameters can be adaptively tuned simultaneously such as multiple quadrupole magnets which suffer from hysteresis.
Finally, to demonstrate the ability of the approach in Fig. 7 to track time-varying beamline paramters, we generated an artificial quantum efficiency map QE(x, y), and used the measured laser intensity I(x, y) to construct a beam distribution ρ = I × QE which was used as input to the GPT model. The output was fed into the CNN whose initial guess of ρ was then adaptively fine tuned. The QE of the cathode was then reconstructed as QE(x, y) = ρ(x, y)/I(x, y), as shown in Fig. 10.

Discussion
We have presented an approach for coupling adaptive feedback to encoder-decoder type CNNs in which an adaptive controller tunes inputs to the low dimensional dense layers preceding the decoder. This method has the potential to make such ML tools more robust for time-varying systems by making adjustments and changing inputs in real-time based only on outputs and online non-invasive system measurements. This adaptive ML approach can be useful for a wide range of complex time-varying systems.
We have demonstrated our method for adaptively tracking the time varying input beam distribution of the HiRES UED particle accelerator and the associated quantum efficiency (QE) when a measurement of the laser transverse at the cathode is available, for example through implementation of virtual cathode diagnostic. Our general approach can be applied at various     Figure 10. A: Schematic of laser image, quantum efficiency, and the resulting beam density. B, C: Target QE and beam density compared to adaptive reconstructions. particle accelerators to enable physics models to accurately map distributions between various locations to enable adaptive tuning. For example transverse deflecting radio frequency resonant cavity (TCAV)-based LPS measurements can be used to reconstruct the initial (z, E) LPS. Such model-based non-invasive diagnostics are possible because the rich physics in models are strong constraints on the dynamic maps between accelerator distributions, and therefore given sufficiently rich measurements it is very unlikely to achieve a close match at one section of an accelerator without uniquely reproducing the correct distribution of another.
In future work we plan to expand on our results by using several diagnostics simultaneously, including TCAV based LPS measurements and magnet tuning-based transverse phase space measurements. Our method can also be extended to 3D 9/12 distributions ρ(r) using 3D PCA components, spherical harmonics defining star-convex distribution boundaries ∂ ρ(θ , ϕ) as ∂ ρ(θ , ϕ) = ∑ l,m α m l Y m l (θ , ϕ), where a convolutional neural network can be used to generate the spherical harmonics coefficients α m l which are then adaptively tuned. A more general approach is to generate distributions from combinations of radial basis functions (RBF) defined as ρ(r) = ∑ n α n e −(r−r c,n) where the neural network predicts the RBF amplitudes α n , centers r c,n = (x c,n , y c,n , z c,n ), and decay rates σ n , all of which are adaptively tuned.