Mastering the Cahn–Hilliard equation and Camassa–Holm equation with cell-average-based neural network method

In this paper, we develop the cell-average-based neural network (CANN) method to approximate solutions of the nonlinear Cahn–Hilliard equation and Camassa–Holm equation. The CANN method is motivated by the finite volume scheme and evolves from the integral or weak formulation of partial differential equations. The major idea of the cell-average-based neural network method is to explore a neural network that approximates the solution average difference, or evolution, between two neighboring time steps. Unlike traditional numerical methods, the CANN method is not limited by the CFL restriction and can adopt large time steps for solution evolution, a significant advantage that classical numerical methods do not have. Once well trained, the method can be implemented as a fixed explicit finite volume scheme and applied to certain groups of initial value problems for the Cahn–Hilliard equation and Camassa–Holm equation without retraining the neural network. Furthermore, the CANN method also performs very well in handling corrupted or low-quality data generated by Gaussian white noise. Numerical examples are presented to demonstrate the effectiveness, accuracy, and capability of the proposed method.


Introduction
In this paper, we develop the cell-average-based neural network (CANN) method [1] for two types of nonlinear partial differential equations, the Cahn–Hilliard equation and the Camassa–Holm equation. First, the Cahn–Hilliard equation was introduced by Cahn and Hilliard in [2] many decades ago as a model for phase separation in binary alloys. Recently, it has attracted scientists' attention through its applications to moving interface problems in material science and fluid dynamics. Its most popular expression, given in [3], is
$$u_t = \Delta\bigl(-\varepsilon^2 \Delta u + f(u)\bigr). \qquad (1.1)$$
Here, $\varepsilon$ is a relatively small positive constant. In this equation the solution can be driven to the two pure states $u = \pm 1$ by the reaction term $f(u) = F'(u)$ with $F(u) = \frac{1}{4}(u^2 - 1)^2$. Alternatively, the Cahn–Hilliard equation can also be expressed in the scalar form
$$u_t = \nabla \cdot \bigl(b(u)\, \nabla(-\gamma \Delta u + \Psi'(u))\bigr), \qquad (1.2)$$
where $\gamma$ is a positive constant, $b(u)$ is the non-negative diffusion mobility, and $\Psi(u)$ is the homogeneous free energy density. Second, the Camassa–Holm equation, introduced by Camassa and Holm in [4,5] as a model for shallow water motion on a flat surface, is one of the basic models of peaked solitons. In this paper, we consider the Camassa–Holm equation in [6],
$$u_t + 2k u_x - u_{xxt} + 3u u_x = 2u_x u_{xx} + u u_{xxx}, \qquad (1.3)$$
where $k$ is a constant. As the typical shallow water wave equation, the Camassa–Holm equation has attracted significant attention in the last two decades because of its interesting properties, including complete integrability, the presence of breaking waves, and algebro-geometric formulations [7,8]. An interesting fact about this nonlinearly dispersive equation is that it supports peakon solutions. When the parameter $k = 0$, Eq. (1.3) admits the solution $u(x, t) = c e^{-|x - ct|}$, which is called a peakon. Camassa et al. [4,5] analyzed the behavior of the solutions of Eq. (1.3) and showed that certain initial conditions develop a vertical slope in finite time. At the same time, they showed that there exist stable multi-soliton solutions and derived the phase shift that occurs when two of these solitons collide. However, the lack of smoothness at the peak of the peakon brings high-frequency dispersive errors into the simulations.
Over the last three decades, machine learning has attracted scientists' attention. Neural networks, as a new class of functions, have recently been used for solving partial differential equations (PDEs) due to their expressive power. A distinctive feature of neural networks is the powerful learning mechanism generated by neurons that can automatically adapt to the solution of the PDEs. This has led to a number of recent studies on applications of neural networks to PDEs. The first approach parameterizes the solution as a neural network or a deep CNN between finite Euclidean spaces. The most prominent advantages of this approach are automatic differentiation and being mesh-free, and it builds on the approximation ability of neural networks. In this direction there are early works [37,38], the works of [39,40], and the popular PINN methods of [41-47], etc. While PINN methods are successful for elliptic-type PDEs, they are inefficient for time evolution problems because the approximation is limited to a fixed time window. We further mention the works in [48,49], in which methods are explored with a Fourier basis and residual networks are applied to evolve the dynamical system.
Another popular direction combines neural networks with classical numerical methods to improve their performance, for example as the troubled-cell indicator in [50], the weight estimation that improves the performance of WENO schemes with a deep reinforcement network in [51], WENO schemes augmented with convolution networks for shock detection in [52], and DG methods with convolution networks for strong shock detection in [53]. We further have the work of [54], in which a network is applied to estimate total variation bounded constants for DG methods. Considering the vigorous development of neural network methods for PDEs, the Cahn–Hilliard equation and Camassa–Holm equation require significant ongoing efforts to develop and analyze stable, accurate, and efficient numerical neural network methods.
The purpose of this paper is to develop the cell-average-based neural network (CANN) method of [1] to simulate the Cahn–Hilliard equation and the Camassa–Holm equation. Motivated by the finite volume scheme, the CANN method follows the solution properties and characteristics of the PDEs to build up neural network solvers. One interesting aspect of this paper is that, unlike PINN or related methods in which a global solution represented by a neural network is sought, our CANN method is a local and mesh-dependent solver that can be applied to solve the equation as a regular explicit method. Notice that we do not discretize or approximate differential terms involving the spatial variable. Instead, the major idea of our method is to have the neural network handle all spatial differentiation and integration approximation.
The starting point of the CANN method is the integral formulation of the time-dependent PDE, obtained by integrating the equation over the box formed by a spatial cell and a temporal interval. With the notation of cell averages, the rewritten integral form of the equation can be regarded as the cell average difference between two consecutive time levels, for example between $t^n$ and $t^{n+1}$. The CANN method applies a neural network to approximate the cell average difference between these two time levels. One important feature is the introduction of the network input vector, which can be interpreted as the stencil of the scheme at time level $t^n$. The major contribution of the CANN method is to explore a network structure that exactly matches an explicit one-step finite volume scheme. Thus the network parameter set, after offline supervised learning from a given data set, behaves as the coefficients of the scheme.
Essentially the CANN method can be considered as a time discretization scheme, for which it is critical to maintain stability and control the accumulated error in time. Preliminary numerical tests show that, once well trained, the cell-average neural network method is relieved from the explicit-scheme CFL restriction and can adopt a large time step size (e.g., $\Delta t = 8\Delta x$) for solution evolution, even when implemented as an explicit method. Encouraged by this significant advantage that classical numerical methods do not have, the method can serve as an extremely fast and efficient solver for long time simulation. The CANN method also exhibits good generalization ability for a variety of initial conditions. In the training process, cell averages from a single solution trajectory are collected as the training data set. It turns out that the CANN method is able to learn the mechanism of nonlinear wave propagation with such a small training data set. Once well trained, the CANN solver can be applied to solve a group of different initial value problems with insignificant generalization error and without retraining the neural network. Based on this solid foundation, we further propose applying the CANN method to corrupted or low-quality data generated by white noise, with real-world applications in mind. The CANN method is fairly robust against data noise and captures the main structure of the wave propagation well in the approximation without retraining the neural network.
The paper is organized as follows. In Sect. 2, we present the motivation of the CANN method and formulate it for the Cahn–Hilliard equation and the Camassa–Holm equation. We then emphasize the learning data setup and the training process in Sect. 3. In Sect. 4 we provide numerical experiments to validate the proposed method and illustrate its features and capability. In Sect. 5, we draw conclusions.

Cell-average-based neural network method (CANN)
In this section, we first introduce the motivation and mechanism of the CANN method. We then discuss the training process and implementation of the CANN, whose features and capabilities are illustrated in Sect. 3.

Problem setup, motivation, and the cell-average-based neural network method
We consider solving high-order partial differential equations (PDEs) of the form
$$u_t = \mathcal{L}(u), \quad x \in [a, b],\ t > 0, \qquad (2.1)$$
where $t$ and $x$ denote the time and spatial variables and $[a, b]$ is the spatial domain. The differentiation operator $\mathcal{L}$ represents a generic high-order differentiation operator, such as $\mathcal{L}(u) = u_{xxx}$ for the dispersive equation and $\mathcal{L}(u) = -u_{xxxx}$ for the fourth-order biharmonic equation. In particular, both the Cahn–Hilliard equation (1.2) and the Camassa–Holm equation (1.3) can be cast in this generic form.
The CANN method is motivated by the finite volume scheme. Our principle is to follow the solution properties and characteristics of the PDEs to build up neural network solvers; traditional numerical methods serve as the guideline to construct the neural network structure. Once well trained, the CANN method can be applied to solve the PDE (2.1) as a regular finite volume scheme. Setting a uniform partition of $[a, b]$ into $J$ cells, we have $\Delta x = \frac{b-a}{J}$ as the cell size and $[a, b] = \bigcup_{j=1}^{J} I_j$ with $I_j = [x_{j-1/2}, x_{j+1/2}]$ as one computational cell, where $x_{1/2} = a$ and $x_{J+1/2} = b$. Furthermore, we partition in time, adopt $\Delta t$ as the time step size, and set $t^n = n \Delta t$ with $t^0 = 0$. Integrating the partial differential equation (2.1) over the computational cell $I_j$ and the time interval $[t^n, t^{n+1}]$, we have
$$\int_{I_j} u(x, t^{n+1})\,dx - \int_{I_j} u(x, t^n)\,dx = \int_{t^n}^{t^{n+1}} \int_{I_j} \mathcal{L}(u)\,dx\,dt. \qquad (2.4)$$
With the definition of the cell average $\bar{u}_j(t) = \frac{1}{\Delta x}\int_{I_j} u(x, t)\,dx$, Eq. (2.4) can be written as
$$\bar{u}_j(t^{n+1}) - \bar{u}_j(t^n) = \frac{1}{\Delta x}\int_{t^n}^{t^{n+1}} \int_{I_j} \mathcal{L}(u)\,dx\,dt. \qquad (2.5)$$
As the starting point of the CANN method, we design our neural network based on the integral format (2.5). The major idea of the CANN method is to explore a fully connected network $N(\cdot\,;\Theta)$ to approximate the solution average difference $\bar{u}_j(t^{n+1}) - \bar{u}_j(t^n)$, or evolution, between two neighboring time steps. So (2.5) becomes
$$\bar{u}_j(t^{n+1}) - \bar{u}_j(t^n) \approx N\bigl(\vec{V}^{in}_j; \Theta\bigr), \qquad (2.6)$$
where $\Theta$ denotes the network parameter set of all weight matrices and biases. Then, from (2.6), we have
$$\bar{v}^{out}_j = \bar{v}^{in}_j + N\bigl(\vec{V}^{in}_j; \Theta\bigr). \qquad (2.7)$$
With $\bar{v}^{out}_j = \bar{u}^{n+1}_j$ and $\bar{v}^{in}_j = \bar{u}^n_j$ in (2.7), we obtain the format of the CANN method for PDEs: given the solution averages $\bar{u}^n_j$ at time level $t^n$, we apply the following neural network to approximate the solution average $\bar{u}^{n+1}_j$ at the next time level,
$$\bar{u}^{n+1}_j = \bar{u}^n_j + N\bigl(\vec{V}^{in}_j; \Theta\bigr). \qquad (2.8)$$
The network input vector $\vec{V}^{in}_j$ is an important component that should be carefully chosen. Its general format is
$$\vec{V}^{in}_j = \bigl(\bar{u}^n_{j-p}, \cdots, \bar{u}^n_{j-1}, \bar{u}^n_j, \bar{u}^n_{j+1}, \cdots, \bar{u}^n_{j+q}\bigr)^T, \qquad (2.9)$$
where we include $p$ cell averages to the left of $\bar{u}^n_j$ and $q$ cell averages to the right of $\bar{u}^n_j$ in the definition of the input vector. For example, for some linear PDEs we can use the simple architecture with $p = 1$ and $q = 0$; see the left figure of Fig. 1. However, for nonlinear PDEs we always need to choose a more complicated architecture with $p \geq 1$ and $q \geq 1$. Notice that for $j = 1$ or $j = J$, or for cells close to the boundary, the stencil or network input vector $\vec{V}^{in}_j$ of (2.9) requires average values of $p$ cells to the left and $q$ cells to the right of the current cell. In this paper we apply the exact solution averages on ghost cells outside the domain to implement boundary conditions. The suitable stencil, or the $p$ and $q$ values in (2.9), determines the effectiveness of the neural network method in approximating the solution average $\bar{u}^{n+1}_j$ at the next time level. In this paper, we consider a standard fully connected neural network with $M$ ($M \geq 3$) layers. The input and output vectors of the network are the first and last layers. Among the total $M$ layers, the interior $(M-2)$ are the hidden layers; thus the minimum structure of the neural network involves one hidden layer with $M = 3$. We let $n_i$ ($i = 1, \ldots, M$) denote the number of neurons in each layer. The first layer is the input vector with $n_1 = p + q + 1$ as its dimension, and the last layer is the output vector with $n_M = 1$ as its dimension. This neural network defines a scalar-valued function
$$N(\cdot\,;\Theta): \mathbb{R}^{p+q+1} \to \mathbb{R}, \qquad (2.10)$$
such that $N(\cdot\,;\Theta)$ accurately approximates $\frac{1}{\Delta x}\int_{t^n}^{t^{n+1}}\int_{I_j} \mathcal{L}(u)\,dx\,dt$, the right-hand side of (2.5). The optimal parameter set of the network $N(\cdot\,;\Theta)$ is obtained by training the network intensively over the given data set.
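To make the stencil notation concrete, the following minimal Python/NumPy sketch assembles the input vectors (2.9) for all cells at one time level. The function name and the interface of passing ghost-cell averages explicitly are our own illustrative choices; the paper itself only prescribes the stencil content.

```python
import numpy as np

def input_vectors(ubar, p, q, ghost_left, ghost_right):
    """Assemble V^in_j = (ubar_{j-p}, ..., ubar_j, ..., ubar_{j+q})^T for all j.

    ubar        : length-J array of cell averages at time level t^n
    ghost_left  : p known averages on ghost cells left of x_{1/2}
    ghost_right : q known averages on ghost cells right of x_{J+1/2}
    Returns an array of shape (J, p + q + 1), one stencil per cell.
    """
    padded = np.concatenate([ghost_left, ubar, ghost_right])
    J = ubar.shape[0]
    return np.stack([padded[j:j + p + q + 1] for j in range(J)])
```

For the periodic problems tested later, the ghost values are simply wrapped copies of the interior averages; in the other tests the paper fills them with exact solution averages.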
A neural network function $N(z)$ is typically represented as a composition of many layers of functions,
$$N(z) = \bigl(N_{M-1} \circ N_{M-2} \circ \cdots \circ N_1\bigr)(z),$$
where the weights $W$ and biases $b$ below connect the neurons from the $i$th layer to the $(i+1)$th layer, the symbol $\circ$ denotes the composition of functions, and $M$ is the depth of the network. For $l = 1, \ldots, M-2$, $N_l$ is called the $l$th hidden layer of the network, defined as
$$z^l = N_l(z^{l-1}) = \sigma_l\bigl(W^l z^{l-1} + b^l\bigr),$$
where $z^0 = z$ can be substituted by $\vec{V}^{in}_j$. We have $\sigma_l : \mathbb{R} \to \mathbb{R}$ as the activation function, which is applied to each neuron of the $l$th layer in a component-wise fashion. In this paper $\tanh x$ is the activation function, and its application to a vector is defined component-wise. Specifically, $\sigma_l = \tanh$ is applied between all layers except the output layer, for which we have $\sigma_M(x) = x$. In summary, the fully connected neural network $N(\cdot\,;\Theta)$ consists of $M$ layers, and each pair of consecutive layers is connected by an affine linear transformation followed by a point-wise nonlinear activation function.
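As an illustration of this composition, a network of the stated form takes only a few lines to define. The sketch below uses PyTorch; the framework and the default layer width are our assumptions (the experiments in Sect. 4 mostly use one hidden layer of 8 neurons), while the tanh hidden activations and linear output match the text.

```python
import torch
import torch.nn as nn

class CANN(nn.Module):
    """Fully connected network N(.; Theta): R^{p+q+1} -> R.

    M layers in total: the input layer, (M - 2) tanh hidden layers, and a
    scalar output layer with identity activation, matching sigma_M(x) = x.
    """
    def __init__(self, p, q, hidden_sizes=(8,)):
        super().__init__()
        layers, width = [], p + q + 1
        for n in hidden_sizes:
            layers += [nn.Linear(width, n), nn.Tanh()]   # affine map + tanh
            width = n
        layers.append(nn.Linear(width, 1))               # linear output layer
        self.net = nn.Sequential(*layers)

    def forward(self, v_in):               # v_in: (..., p + q + 1)
        return self.net(v_in).squeeze(-1)  # one scalar per input stencil
```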

Training process
In this section, we discuss how to train the network to obtain the optimal parameter set $\Theta^*$ such that the neural network (2.8) can accurately approximate the solution average evolution $\bar{u}^n_j \to \bar{u}^{n+1}_j$. Learning data sets, which include training data and target data, play a crucial role in the whole training process. Training data sets are usually generated from initial conditions, while the target data set usually consists of time-level solution averages corresponding to $t \geq t^1$. Even though we select one trajectory as the target data, the well-trained CANN can still be applied to other trajectories corresponding to different initial conditions.

Learning data collection
Learning data are collected in the form of pairs, each pair referring to the solution averages at two neighboring time levels $(t^n, t^{n+1})$. The learning data set is denoted as
$$S = \bigl\{\bigl(\bar{u}^n_j,\ \bar{u}^{n+1}_j\bigr) : j = 1, \ldots, J;\ n = 0, 1, \ldots, m\bigr\}, \qquad (3.1)$$
where the training data $\bar{u}^n_j$ are generated from solution averages of given data (or of the initial conditions when $n = 0$). The target data $\bar{u}^{n+1}_j$ are solution averages obtained from observed data of real application problems or from other numerical methods for the PDEs. We highlight that the learning data pairs are solution averages collected over the spatial domain and from time levels $t^0$ to $t^{n+1}$ ($n = 0, 1, 2, \ldots, m$).
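For concreteness, here is one way the pairs of $S$ could be assembled from a stored trajectory of cell averages; the array layout and the restriction to interior cells (near-boundary stencils would draw on ghost data instead) are assumptions of this sketch.

```python
import numpy as np

def learning_pairs(trajectory, p, q):
    """Build the pairs (V^in_j, ubar^{n+1}_j) of S from one solution trajectory.

    trajectory : array of shape (m + 2, J); row n holds the J cell averages
                 at t^n, so consecutive rows give the pairs (t^n, t^{n+1}).
    """
    inputs, targets = [], []
    for n in range(trajectory.shape[0] - 1):
        for j in range(p, trajectory.shape[1] - q):        # interior cells only
            inputs.append(trajectory[n, j - p:j + q + 1])  # stencil at t^n
            targets.append(trajectory[n + 1, j])           # cell average at t^{n+1}
    return np.asarray(inputs), np.asarray(targets)
```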
To simplify the discussion of the method, in this paper we focus on the well-posed problem of (2.1) associated with different initial values $u^i(x, 0) = u^i_0(x)$ (the index $i$ labels the different initial and boundary conditions). Learning data pairs of (3.1) corresponding to the given initial and boundary conditions are then collected to train the neural network. After it is well trained, the CANN is implemented as an explicit one-step finite volume scheme and applied to solve different initial value problems without retraining the network. It is necessary to point out that $i$ indicates different initial conditions of a similar type. For example, if the trajectory data from $u^1(x, 0) = u^1_0(x)$ are collected to train the network, the well-trained CANN can still be applied to solve different initial value problems with different trajectories $u^i(x, 0) = u^i_0(x)$ ($i = 2, 3, \ldots$). Additionally, we also test the capability of the CANN method on corrupted or low-quality data used as the learning data to train the neural network. The noisy learning data set is denoted as
$$S_N = \bigl\{\bigl(\bar{u}^n_j(1 + \omega_j),\ \bar{u}^{n+1}_j(1 + \xi_j)\bigr) : j = 1, \ldots, J;\ n = 0, 1, \ldots, m\bigr\}, \qquad (3.2)$$
where $\omega_j$ and $\xi_j$ are Gaussian white noise drawn from a scaled standard normal distribution. We conclude the section with a comment on the minimum size of the training data set $S$ or $S_N$; these are purely laboratory results observed from numerical tests.
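A sketch of the corruption step follows, assuming the multiplicative perturbation has the form $\bar{u}_j(1 + \omega_j)$ with $\omega_j = \eta \times N(0,1)$, which is consistent with the noise levels quoted as relative percentages in Sect. 4.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed for reproducibility only

def corrupt(ubar, eta):
    """Perturb cell averages by a multiplicative Gaussian factor:
    ubar_j -> ubar_j * (1 + eta * N(0, 1)), one independent draw per entry."""
    return ubar * (1.0 + eta * rng.standard_normal(ubar.shape))
```

Applying `corrupt` once to the training averages (the draws $\omega_j$) and once to the targets (the draws $\xi_j$) produces the pairs of $S_N$.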
Remark 1 For linear partial differential equations, one time level of solution averages in the training set $S$ or $S_N$, corresponding to $(t^0, t^1)$ or $m = 0$ in (3.1), is sufficient for obtaining an effective neural network; see the left figure in Fig. 1. For nonlinear partial differential equations, it is necessary to include multiple time levels ($m > 0$) of solution averages in the training set $S$ or $S_N$ for the neural network to learn the evolution mechanism successfully; see the right figure in Fig. 1.

Training process
We now describe the training loop. Applying $\vec{V}^{in}_j$ in (2.8) yields the network output $\bar{v}^{out}_j$, which is compared with the target $\bar{u}^{n+1}_j$; we then loop through the data set $S$ of (3.1) (or $S_N$ of (3.2)) to minimize the squared loss function
$$L^n_j(\Theta) = \bigl(\bar{v}^{out}_j - \bar{u}^{n+1}_j\bigr)^2 \qquad (3.3)$$
for all $j = 1, \ldots, J$ and all $n = 0, \ldots, m$. This choice of loss function, defined over one single data pair, is referred to as the stochastic or approximate gradient descent method. Specifically, for the last time-level pair $(t^m, t^{m+1})$, we record the squared $L^2$ error
$$\sum_{j=1}^{J} \bigl(\bar{v}^{out}_j - \bar{u}^{m+1}_j\bigr)^2 \qquad (3.4)$$
at each iteration, and we output this error for the iteration indices $i = 1, \cdots, K$ to demonstrate the effectiveness of the CANN method.
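Continuing the PyTorch sketch above, the per-pair loop could look as follows; the learning rate and the exact stopping logic are illustrative assumptions, with $\epsilon$ and the iteration cap $K$ as in the text.

```python
import torch

def train(model, inputs, targets, p, lr=1e-2, eps=1e-6, K=10_000):
    """Stochastic gradient descent over single data pairs.

    inputs  : tensor of shape (N, p + q + 1), the stencils V^in_j
    targets : tensor of shape (N,), the target averages ubar^{n+1}_j
    p       : index of the center entry, so v_in[p] holds ubar^n_j
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for it in range(K):
        sq_err = 0.0
        for v_in, target in zip(inputs, targets):
            opt.zero_grad()
            v_out = v_in[p] + model(v_in)   # v^out_j = ubar^n_j + N(V^in_j; Theta)
            loss = (v_out - target) ** 2    # squared loss on one pair, as in (3.3)
            loss.backward()
            opt.step()
            sq_err += loss.item()           # accumulated squared error of the sweep
        if sq_err < eps:                    # the paper records (3.4) on the last
            break                           # time-level pair; here we sum all pairs
    return model
```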
Remark 2 Once a CANN solver is well trained and available, it can be applied to solve the same Cahn–Hilliard equation or Camassa–Holm equation associated with different initial conditions and over different domains.

Implementation and summary of the CANN method
In this section, we present the details of our CANN learning algorithm. With the optimal weights and biases $\Theta^*$ obtained and the neural network $N(\vec{V}^{in}_j; \Theta^*)$ well defined and available, the CANN method can be implemented as a regular explicit finite volume scheme. Again, with the spatial and time step sizes $\Delta x$ and $\Delta t$ and the previously chosen network input vector $\vec{V}^{in}_j$, together with the network optimal parameter set $\Theta^*$, we have a complete definition of a neural network method.
Definition 1 A CANN method is uniquely determined by the following four components: (1) the choice of spatial mesh size $\Delta x$; (2) the choice of time step size $\Delta t$; (3) the choice of network input vector $\vec{V}^{in}_j$ of (2.9); and (4) the number of hidden layers and neurons per layer, i.e., the corresponding structure of the neural network. With the optimal weights and biases $\Theta^*$, the CANN method can be implemented as a regular explicit finite volume scheme,
$$\bar{v}^{n+1}_j = \bar{v}^n_j + N\bigl(\vec{V}^{in}_j; \Theta^*\bigr). \qquad (3.5)$$
In spite of the great successes of neural networks in many practical applications, it is widely accepted that the approximation properties of neural networks are not yet well understood, and that understanding why and how they work could lead to significant improvements in many machine learning applications. Some empirical observations suggest that deep networks can approximate many functions more accurately than shallow networks, but rigorous study of the theoretical advantage of deep networks is scarce. Therefore, how to properly select the number of hidden layers and the number of neurons in each layer remains open. However, techniques like the emerging meta-learning of [55] may be combined with the CANN method to decide on effective neural network structures. After extensive practice, we find that a small number of hidden layers (fewer than 5) is conducive to implementation and gives good results; thus our CANN can also be regarded as a shallow neural network. For some complicated cases, however, we find that deeper networks have better and more stable fitting effects.
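Once $\Theta^*$ is fixed, scheme (3.5) is a plain explicit update. A sketch of the time-marching loop follows, with a user-supplied `ghost` callback for the boundary data (an interface choice of ours; the paper uses exact solution averages on ghost cells).

```python
import torch

@torch.no_grad()
def evolve(model, ubar0, n_steps, p, q, ghost):
    """March the trained CANN forward: v^{n+1}_j = v^n_j + N(V^in_j; Theta*).

    ghost(v, n) must return (left, right): two 1-D tensors holding the p and q
    ghost-cell averages at time level n.
    """
    v = torch.as_tensor(ubar0, dtype=torch.float32)
    for n in range(n_steps):
        left, right = ghost(v, n)
        padded = torch.cat([left, v, right])
        stencils = padded.unfold(0, p + q + 1, 1)  # shape (J, p + q + 1)
        v = v + model(stencils)                    # one explicit time step
    return v
```

Note that $\Delta t$ never appears in this loop: it is baked into $\Theta^*$ during training, which is exactly why Definition 1 lists the time step size as part of the method.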
The general procedure of the CANN method, including the collection of noisy learning data, is summarized in Algorithm 1.

Algorithm 1 Training process of CANN
Require: Collect data set $S$ ($S_N$ can be collected similarly.)
1: training data set: $\bar{u}^n_j$ from given data at $t^n$ or initial values $u^i(x, 0)$, $i = 1, 2, 3, \ldots$
2: target data set: $\bar{u}^{n+1}_j$ from given data or exact values at $t^{n+1}$.
3: initialize the network parameter set $\Theta$
4: while the squared $L^2$ error (3.4) exceeds $\epsilon$ and the iteration count does not exceed $K$ do
5: for each pair $(\bar{u}^n_j, \bar{u}^{n+1}_j)$ in $S$ do
6: form the input vector $\vec{V}^{in}_j$ of (2.9)
7: compute $\bar{v}^{out}_j = \bar{u}^n_j + N(\vec{V}^{in}_j; \Theta)$
8: evaluate the squared loss $(\bar{v}^{out}_j - \bar{u}^{n+1}_j)^2$ of (3.3)
9: update $\Theta^* \leftarrow$ Stochastic Gradient Descent
10: end while
11: $\bar{v}^{n+1}_j = \bar{v}^n_j + N(\vec{V}^n_j; \Theta^*)$. Apply this to solve different initial value problems with data collected from $u^s(x, 0)$, $s \neq i$, $j = 1, 2, 3, \ldots$
12: $\bar{u}^{n+1}_j \leftarrow \bar{v}^{out}_j$

Preliminary numerical tests show some advantages of our CANN method over traditional numerical methods and other neural network methods. First, one remarkable result is that the CANN method is relieved from the CFL restriction on the time step size and can adopt a large time step size (such as $\Delta t = 4\Delta x$). The capability of allowing a large time step size for solution average evolution makes the method extremely fast and efficient, especially for higher-order PDEs and multidimensional problems. For decades, considerable effort and resources have gone into the investigation of semi-implicit and implicit methods, which are usually expensive, involve complicated algorithms, and are hard to implement, making it difficult to transfer the technique and knowledge to industry. We should take advantage of this powerful new tool of machine learning and incorporate its features into traditional numerical methods to obtain new fast solvers like the CANN method. It turns out the CANN method can adopt a time step size $\Delta t$ chosen independently of the spatial size $\Delta x$ and still remain stable. Once well trained, the CANN method can efficiently and accurately evolve the solution forward in time as an explicit method.
Second, once a CANN solver is well trained and available, it can be applied to solve the same Cahn–Hilliard equation or Camassa–Holm equation associated with different initial conditions and over different domains. For the CANN method, we assign exact solution values or known data to the ghost cells based on the boundary conditions. Using the training procedure explained earlier, we calculate the average values over the cells whenever they are involved. A roughly universal implementation template allows the evolution from the current cells to the next time level, with the previously known cells used to solve for the next unknown cells.
Finally, in order to evaluate the capability and robustness of the CANN method, we apply it to learning data with corruption or of low quality, coming from relatively realistic application settings or generated as noisy learning data with Gaussian noise. Numerical tests show that even for relatively high-level noisy data following a normal distribution up to $0.08 \times N(0, 1)$, the CANN method exhibits the accurate dynamics of the learning system and can effectively handle the corruption errors in the learning data to obtain good trajectory evolution.

Numerical examples
In this section, we carry out a series of numerical tests to examine the accuracy and capability of the CANN method for solving the Cahn–Hilliard equation (1.2) and the Camassa–Holm equation (1.3), including the accuracy of the approximation, the generalization ability and flexibility with respect to different initial conditions, and the capability of handling corrupted data. It is worth emphasizing that, once properly trained, there is no need to retrain the neural network to handle different initial conditions or low-quality data.
Throughout this section we adopt $T$ as a generic final time at which we compute the errors and orders. As mentioned previously in Definition 1, the time step size $\Delta t$ is chosen in advance and is part of the definition of the neural network method (3.5). Thus we always have $T$ as a multiple of $\Delta t$, with $T = N_t \Delta t$ where $N_t$ is the total number of time steps. We highlight that for all the Cahn–Hilliard and Camassa–Holm tests, solution average data pairs over four time levels up to $t^4$, associated with $m = 4$ in (3.1), are applied for training the neural network solver; see Remark 1. The network is trained by minimizing the loss (3.3) via the SGD method until the squared $L^2$ error of (3.4) is smaller than the threshold $\epsilon = 10^{-6}$ or the number of epochs exceeds $K$. Below we list the $L^2$ and $L^\infty$ error formulas, the same as those for the finite volume method:
$$\|e\|_{L^2} = \Bigl(\sum_{j=1}^{J} \Delta x\,\bigl(\bar{v}_j(T) - \bar{u}_j(T)\bigr)^2\Bigr)^{1/2}, \qquad \|e\|_{L^\infty} = \max_{1 \leq j \leq J}\bigl|\bar{v}_j(T) - \bar{u}_j(T)\bigr|.$$
Here $\bar{v}_j(T)$ denotes the solution evolved by the CANN method (3.5) on cell $j$ at final time $T$, and $\bar{u}_j(T)$ denotes either the exact solution average or a reference solution average on cell $j$ at time $T$ obtained from a highly accurate numerical method.
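These norms translate directly into code; a short sketch, with names of our own choosing:

```python
import numpy as np

def errors(v_bar, u_bar, dx):
    """Discrete L2 and Linf errors between the CANN evolution v_bar(T)
    and the reference cell averages u_bar(T)."""
    diff = v_bar - u_bar
    l2 = np.sqrt(np.sum(dx * diff ** 2))   # ( sum_j dx * diff_j^2 )^(1/2)
    linf = np.max(np.abs(diff))            # max_j |diff_j|
    return l2, linf
```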
Our primary focus is to demonstrate the accuracy and capability of the CANN method for accurate solution approximation. Second, we carry out a series of numerical tests to illustrate the stability and generalization ability by applying the optimal CANN architecture to different initial conditions directly, without retraining the neural network. Finally, to show the ability to deal with corrupted data, we also apply the CANN method to the noisy data denoted by $S_N$ in (3.2), with $\omega_j = \eta \times N(0, 1)$ and $\xi_j = \eta \times N(0, 1)$.

Numerical results for the Cahn-Hilliard equation
In this section, we present numerical results for the Cahn–Hilliard equation obtained using the CANN method, including the accuracy, stability, generalization ability, and capability. In Sect. 4.1.1, we consider the Cahn–Hilliard equation in one space dimension, and we test the two-dimensional case in Sect. 4.1.2.

One-dimensional case
In this section, we give numerical test results for the one-dimensional Cahn–Hilliard equation, including accuracy, capability, and robustness.
Notice that the CANN method allows for an easy implementation of the boundary conditions and initial conditions of this problem. We test the CANN method by taking the exact solution (4.5) with a source term $f_c$, where $f_c$ is a given function chosen so that (4.5) is the exact solution; a different $c$ represents a different source term $f_c$ and leads to a different PDE.

Accuracy tests: $c = 1$. Considering the space limitation, we only show the case $c = 1$. We choose the sizes $\Delta x = \frac{2\pi}{60}$ and $\Delta t = \frac{1}{4}\Delta x$. Training data sets are generated from the solution trajectory of (4.5) associated with $c = 1$. The structure of the neural network consists of 1 hidden layer of 8 neurons. The neural network solver applies the input vector
$$\vec{V}^{in}_j = \bigl(\bar{u}^n_{j-3}, \bar{u}^n_{j-2}, \bar{u}^n_{j-1}, \bar{u}^n_j, \bar{u}^n_{j+1}, \bar{u}^n_{j+2}, \bar{u}^n_{j+3}\bigr)^T. \qquad (4.6)$$
Network training is conducted for up to $K = 10^4$ iterations. After it is well trained, the neural network is applied to solve the Cahn–Hilliard equation up to $T = 1$. Snapshots of our CANN method simulation are presented in Fig. 2. We can see clearly that the moving profile is resolved very well: the CANN method accurately and sharply captures the evolution.
Furthermore, in order to test the accuracy associated with large time step sizes, we keep the spatial mesh size $\Delta x = \frac{2\pi}{60}$ fixed and consider four different time step sizes $\Delta t = \Delta x$, $\frac{1}{2}\Delta x$, $\frac{1}{3}\Delta x$, and $\frac{1}{4}\Delta x$. We list the $L^2$ and $L^\infty$ errors computed at final time $T = 1$ in Table 1; all simulations with the different $\Delta t$ are found stable. Comparing the results in Table 1, we further highlight that our CANN method can explicitly solve the Cahn–Hilliard equation with $\Delta t = \Delta x$, which is bigger than the time step size $\Delta t \leq \Delta x^2$ dictated by the CFL restriction of conventional explicit methods.

Different c and different initial conditions:
In this part, we apply the well-trained neural network, obtained from the learning data with $c = 1$, to solve the Cahn–Hilliard equation associated with $c = 0.5$ and $c = 2$ without retraining. $c = 0.5$: in this case, we apply the optimal neural network to solve the equation with $c = 0.5$ directly. As a fixed numerical scheme, the well-trained CANN method only needs to change the time step size to $\Delta t = \Delta x$ to match the parameter $c = 0.5$ in the process. From Fig. 3, we can see clearly that the moving profile is resolved very well for $c = 0.5$. $c = 2$: similarly, we apply the optimal neural network to solve the case $c = 2$, changing the time step size to $\Delta t = \frac{1}{4}\Delta x$.
From Fig. 4, we can see clearly that the CANN method obtains a very good numerical approximation for $c = 2$ without retraining the neural network. Furthermore, based on the optimal CANN obtained in the accuracy tests with $c = 1$, we apply it directly, as a fixed numerical scheme, to solve the Cahn–Hilliard equation with the different initial condition $u(x, 0) = \cos x$. In Fig. 5, we find that our CANN method has a high degree of generalization ability: it obtains a good numerical approximation for the different initial condition of the Cahn–Hilliard equation without retraining the network.
Low-quality data: finally, we use corrupted data to validate the robustness of the CANN method. All learning data with corruption in (3.2) are perturbed by the multiplicative factors $\omega_j$ and $\xi_j$. Taking $\eta = 0.05$, which corresponds to $\pm 5\%$ relative noise in all data, we apply the same CANN obtained from the $c = 1$ accuracy tests to the corrupted learning data. From Fig. 6, we observe that the predictions of the CANN model are fairly robust against data noise; the CANN method works well with corrupted data and captures the main structure of the exact solution in the approximation.

Two-dimensional case
In this section, we present numerical simulation results for the two-dimensional Cahn–Hilliard equation. In the following figures, the left of each sub-figure is the CANN approximation and the right is the exact solution. We solve Eq. (1.2) with a source term $f_c$, where $f_c$ is a given function such that (4.9) is an exact solution. Accuracy tests: $c = 2$. Considering the space limitation, we only show the case $c = 2$; other cases appear as different parameters in the following tests. In this case, we choose the uniform step sizes $\Delta x = \Delta y = \frac{2\pi}{64}$ and $\Delta t = \frac{1}{2}\Delta x$ for training the neural network. The structure of the neural network consists of 1 hidden layer of 8 neurons. The input vector associated with two space dimensions for the CANN takes the form given in (4.10). After it is well trained, the neural network is applied to solve the two-dimensional Cahn–Hilliard equation up to $T = 1$. Snapshots of our CANN method simulation are presented in Fig. 7. We can see clearly that for the two-dimensional simulation the moving profile is still resolved very well by the CANN method. Furthermore, we also test the accuracy with different time step sizes by keeping the spatial mesh sizes $\Delta x = \Delta y = \frac{2\pi}{64}$ fixed and using four different time step sizes $\Delta t = \frac{1}{4}\Delta x$, $\frac{2}{5}\Delta x$, $\frac{1}{2}\Delta x$, and $\Delta x$ for the CANN simulation. An interesting aspect of the CANN is that we keep the same architecture of the neural network and just change the time step size for each test. From the $L^2$ and $L^\infty$ errors at final time $T = 1$ in Table 2, we see that the CANN method approximates the exact solution well for the two-dimensional Cahn–Hilliard equation, even for the larger time step size $\Delta t = \Delta x$. However, comparing the $L^2$ and $L^\infty$ results, we find that the time step size $\Delta t = \frac{1}{4}\Delta x$ is the best choice for the simulation.

Different c and different initial conditions:
In order to test the generalization ability of our CANN method, we apply it to the same group of two-dimensional Cahn–Hilliard equations with different source term functions $f_c$. Considering the space limitation, we choose the two cases $c = 1$ and $c = 3$ in the following tests. We apply the well-trained neural network, obtained from the accuracy tests above with $c = 2$, to solve the Cahn–Hilliard equation with $c = 1$ and $c = 3$ without retraining.
We use the optimal neural network to solve the equation with these different parameters directly. Like a fixed numerical scheme, the CANN method only needs to change the time step size to $\Delta t = \Delta x$ to match the parameter $c = 1$ in the process; similarly, we change the time step size to $\Delta t = \frac{1}{2}\Delta x$ in order to match the parameter $c = 3$. From Fig. 8, we can see clearly that the CANN resolves the moving profile very well without retraining.
Additionally, we apply the well-trained CANN to solve the Cahn–Hilliard equation with the different initial condition $u(x, y, 0) = \cos x \cos y$. In Fig. 9, we can see that our CANN method has a high degree of generalization ability, allowing it to obtain a good numerical approximation for the different initial condition of the Cahn–Hilliard equation without having to retrain the network.
Low-quality data: finally, in order to validate the robustness of our CANN method, we also apply it to corrupted data, namely the noisy learning data in (3.2) perturbed by a multiplicative factor. Without changing the architecture of the neural network associated with $c = 2$ in the accuracy tests, we apply it to the corruption level $\eta = 0.05$, which corresponds to $\pm 5\%$ relative noise in all data. In Fig. 10, it is no surprise that the predictions of the CANN model are fairly resistant to data noise and capture the main structure well in the approximation without retraining the neural network.
We next consider the Cahn–Hilliard equation in the form (4.11), where $\varepsilon$ represents (the effect of) the interfacial energy in a phase separation phenomenon. The computational domain is $\Omega = [0, 1] \times [0, 1]$, and periodic boundary conditions are applied. The initial condition is
$$u(x, y, 0) = 0.05 \sin(2\pi x)\sin(2\pi y). \qquad (4.12)$$
Notice that the CANN method allows for an easy implementation of such boundary and initial conditions. A suitable source term $f_c$ is chosen such that
$$u(x, y, t) = 0.05\, e^{ct} \sin(2\pi x)\sin(2\pi y) \qquad (4.13)$$
is the exact solution. Different $c$ leads to different source term functions, i.e., different PDEs.
Accuracy tests: $c = 1$. We only show $c = 1$ in this case. We choose $\Delta x = \Delta y = 0.01$ and $\Delta t = 10\Delta x$ for our CANN method. Learning data sets are generated from the solution trajectory associated with $c = 1$. We choose the same neural network structure and input vector as in Example 4.1.2 for this test. After it is well trained, the neural network is applied to solve the Cahn–Hilliard equation up to $T = 1$. Snapshots of our CANN method simulation are presented in Fig. 11. We can see clearly that the CANN method accurately and sharply captures the profile evolution. We also test the accuracy for different time step sizes to validate the efficiency of the CANN method. Using the fixed spatial mesh sizes $\Delta x = \Delta y = 0.01$, we consider four different time step sizes $\Delta t = 2\Delta x$, $\frac{5}{2}\Delta x$, $5\Delta x$, and $10\Delta x$. From the $L^2$ and $L^\infty$ errors computed at the final time $T = 1$ in Table 3, we see that the exact solution is approximated very well for all of these time step sizes by the CANN method. Comparing the $L^2$ and $L^\infty$ results, we highlight that our CANN method can adopt a large time step size (even $\Delta t = 10\Delta x$) for solution evolution while being implemented as an explicit method. We note that this conflicts with the principle of conventional explicit time discretization; nevertheless, the CANN method is still found stable and accurate.

Different c and different initial conditions:
After training, we apply the well-trained neural network, obtained from the learning data associated with $c = 1$, to solve the Cahn–Hilliard equation (4.11) with the different parameters $c = 0.5$ and $c = 2$ without retraining. We highlight that the optimal neural network can be applied directly to the equation with different initial conditions. Here, we take the different initial conditions associated with $c = 0.5$ and $c = 2$ from (4.13) as examples to test the generalization ability of our CANN solver. Like a fixed numerical scheme, the CANN method only needs to change the time step size to $\Delta t = 10\Delta x$ to match the parameter $c = 0.5$, and to $\Delta t = 5\Delta x$ for $c = 2$. From Fig. 12, we can see clearly that the moving profiles are resolved very well for the smaller $c = 0.5$ and the larger $c = 2$ without retraining the neural network. In a word, the CANN method generalizes well to a large family of wave solutions of the form (4.13).
We further consider a different initial condition, given in (4.14). Using the well-trained neural network obtained from the learning data with $c = 2$, we apply it to solve Eq. (4.11) with the different initial condition (4.14) directly. From Fig. 13, we can see that, without retraining the network, our CANN method approximates the exact solution very well and exhibits strong generalization ability.
Low-quality data: we also test the robustness of the CANN method using corrupted data. All learning data sets in (3.2) are perturbed by the multiplicative factors $\omega_j$ and $\xi_j$ with $\eta = 0.01$, which corresponds to $\pm 1\%$ relative noise in all data. Using the well-trained CANN associated with $c = 1$ in the accuracy tests and the initial condition (4.12), the corrupted learning data sets are handled well in the simulation; all other settings of the CANN are the same as before. From Fig. 14, we observe that the CANN method is stable and works well with the corrupted data. The predictions of the CANN capture the main structure of the propagation well in the approximation without retraining the neural network.

Numerical results for the Camassa-Holm equation
In this section, we present numerical results for the Camassa–Holm equation obtained from the CANN method, including the accuracy, stability, and capability.

Example 4.2.1. Shock wave profile of Camassa-Holm equation
Next, we consider the Camassa–Holm equation, stated as (4.15), on the computational domain $\Omega = [-25, 25]$. The peakon solutions of the Camassa–Holm equation (4.15) are well known and are the only travelling waves for which there is a simple explicit formula. The peaked traveling wave solution is
$$u(x, t) = c\, e^{-|x - ct|}, \qquad (4.16)$$
where $c$ is the wave speed. The initial condition and boundary conditions correspond to the data from this analytical solution.
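Since the learning targets and ghost values are exact solution averages, it is convenient that the peakon (4.16) admits closed-form cell averages. The helper below is our own sketch, using the antiderivative $G(y) = \operatorname{sign}(y)\,(1 - e^{-|y|})$ of $e^{-|y|}$:

```python
import numpy as np

def peakon_averages(c, t, edges):
    """Exact cell averages of the peakon u(x, t) = c * exp(-|x - c t|).

    edges : array of J + 1 cell interfaces x_{j+1/2}; returns the J averages
            (1 / dx_j) * integral of u over each cell, with no quadrature error.
    """
    y = edges - c * t                              # shift into the wave frame
    G = np.sign(y) * (1.0 - np.exp(-np.abs(y)))    # antiderivative of exp(-|y|)
    return c * np.diff(G) / np.diff(edges)
```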
Accuracy tests: wave propagation with $c = 0.25$. In this part, we only show the case $c = 0.25$ for the accuracy tests; other cases appear as different solution parameters in the generalization ability tests below. The sizes $\Delta x = 0.1$ and $\Delta t = \frac{1}{4}\Delta x$ are applied in the training process for the neural network. Learning data sets are generated from the solution trajectory of (4.16) associated with $c = 0.25$. The structure of the neural network consists of 1 hidden layer of 8 neurons. The input vector of the CANN method has the following form:
$$\vec{V}^{in}_j = \bigl(\bar{u}^n_{j-2}, \bar{u}^n_{j-1}, \bar{u}^n_j, \bar{u}^n_{j+1}, \bar{u}^n_{j+2}, \bar{u}^n_{j+3}, \bar{u}^n_{j+4}\bigr)^T. \qquad (4.17)$$
Snapshots of our CANN method simulation at time $T = 1$ on $[-25, 25]$ are presented in Fig. 15. We can see clearly that, after being well trained, the CANN method can precisely and sharply capture the advancement of the wave profile. Furthermore, in order to test the accuracy of the CANN method for different time step sizes, we keep the spatial mesh size $\Delta x = 0.1$ fixed and consider four different time step sizes $\Delta t = \frac{1}{4}\Delta x$, $\frac{1}{3}\Delta x$, $\frac{1}{2}\Delta x$, and $\Delta x$; the corresponding $L^2$ and $L^\infty$ errors are reported in Table 4.

Different initial conditions: we apply the well-trained neural network to solve the different initial conditions with $c = 0.1$ and $c = 0.4$ directly. As a fixed numerical scheme, the CANN method only needs to change the time step size to $\Delta t = \Delta x$ to match the parameter $c = 0.1$ in the process; similarly, we apply $\Delta t = \frac{1}{4}\Delta x$ for the parameter $c = 0.4$. From Fig. 16, we can see clearly that, without retraining the neural network, the CANN method accurately and sharply captures the wave evolution, comparable to the exact solution.
In a word, the well-trained CANN method generalizes well to a large family of different initial conditions generated from wave solutions of the form (4.16), without retraining. Furthermore, comparing the results of Figs. 15 and 16, we observe that the general trend of the wave does not change, but the amplitude of the wave does: the smaller $c$ is, the smaller the amplitude of the peak.
Low-quality data: additionally, in order to validate the robustness and capability, we apply the CANN method to simulate with corrupted data. All learning data sets (3.2) are perturbed by the multiplicative factors $\omega_j$ and $\xi_j$; here we consider $\eta = 0.01$, which corresponds to $\pm 1\%$ relative noise in all data. We then apply the same neural network obtained from the $c = 0.25$ accuracy tests to the corrupted learning data; all other settings of the CANN are the same as before. In Fig. 17, we observe that the predictions of the CANN method are fairly robust against data noise. This shows that the CANN method works well with corrupted data and captures the main structure of the wave propagation well in the approximation without retraining the neural network.

Example 4.2.2. Peakon solution of Camassa-Holm equation
In this example, we still focus on the Camassa–Holm equation (4.15), but we present the wave propagation of the periodized version of solution (4.16). In the single-peak case, the exact solution is given by (4.18), and the corresponding initial condition is given by (4.19), where $x_0$ is the position of the trough and $a$ is the period. We present the wave propagation for the single-peak Camassa–Holm equation with parameters $c = 1$, $a = 30$, and $x_0 = -5$; here $c$ represents the wave speed, and different speeds lead to different initial conditions and different simulations. The computational domain is $[0, a]$.
Accuracy tests: wave propagation with $c = 1$. In this case, we take $\Delta x = 0.1$ and $\Delta t = \Delta x$ for the neural network training. Learning data sets are generated from the solution trajectory of (4.18) with $c = 1$. The structure of the neural network consists of 1 hidden layer of 8 neurons. Considering the peakon solution, the CANN solver applies a wider input vector,
$$\vec{V}^{in}_j = \bigl(\bar{u}^n_{j-8}, \cdots, \bar{u}^n_{j-1}, \bar{u}^n_j, \bar{u}^n_{j+1}, \cdots, \bar{u}^n_{j+8}\bigr)^T. \qquad (4.20)$$
In Fig. 18, the peak profile at $t = 0, 5, 10$ and the space-time graph of the solutions up to $t = 10$ are shown using the well-trained CANN. The lack of smoothness at the peak of the peakon introduces high-frequency dispersive errors into the calculation and causes numerical oscillation near the peak. Nevertheless, we can see clearly that the moving peak profile is resolved very well in our computation with the CANN method.
Furthermore, we also test the accuracy of our CANN method, since it can adopt a large time step size even as an explicit scheme. Here, we keep the spatial mesh size $\Delta x = 0.1$ fixed and consider four different time step sizes $\Delta t = \Delta x$, $2\Delta x$, $\frac{5}{2}\Delta x$, and $5\Delta x$. We list the $L^2$ and $L^\infty$ errors computed at final time $t = 10$ in Table 5.
We see that the CANN method gives similar errors for large time step sizes, even for $\Delta t = 5\Delta x$, so it is stable and accurate as an efficient solver. Different initial conditions: in the exact solution (4.18), different $c$ leads to different initial conditions. Considering the space limitation, we choose the two cases $c = 0.5$ and $c = 1.5$ as different initial conditions to test the generalization ability of the CANN method. We apply the well-trained neural network, obtained from the learning data associated with $c = 1$, to solve the equation with the different initial conditions $c = 0.5$ and $c = 1.5$ without retraining. $c = 0.5$: in this case, we apply the well-trained CANN directly to simulate the peakon trajectory with $c = 0.5$; like a fixed numerical scheme, the CANN method only has to change the time step size to $\Delta t = 5\Delta x$ to match the parameter $c = 0.5$ in the process. $c = 1.5$: similarly, we apply the optimal neural network to solve the case $c = 1.5$; in order to match the parameter $c = 1.5$ in the wave propagation, we change the time step size to $\Delta t = \frac{10}{3}\Delta x$. From Fig. 19, we can see clearly that the moving peak profile is resolved very well for the wave propagation with $c = 0.5$ and $c = 1.5$ without retraining the neural network. In a word, the well-trained CANN method generalizes well to a large family of different initial conditions generated from the wave solutions (4.18) without retraining.
Low-quality data: finally, we use corrupted data to test the capability and robustness of the CANN method. All learning data (3.2) are perturbed by the multiplicative factors $\omega_j$ and $\xi_j$ with $\eta = 0.05$, which corresponds to $\pm 5\%$ relative noise in all data. We then apply the same neural network associated with the $c = 1$ initial condition in the accuracy tests to the corrupted learning data; all other settings of the CANN are the same as before. In Fig. 20, we note that the predictions of the CANN model are fairly robust against the data noise and predict the main structure of the wave propagation well without retraining the neural network.

Example 4.2.3. Two-peakon solution of Camassa-Holm equation
In this example, we consider the two-peakon interaction of the Camassa–Holm equation (4.15), with the exact solution given by (4.21). After being well trained, the neural network is applied to solve the two-peakon interaction of the Camassa–Holm equation up to $T = 3$. Snapshots of our CANN method simulation are presented in Fig. 21. We can see clearly that the moving profile of the two-peakon interaction is resolved by the CANN method very well; we highlight that the two peaks move to the right at different speeds. Furthermore, in order to test the accuracy with different time step sizes, we keep the spatial mesh size $\Delta x = 0.1$ fixed and consider four different time step sizes $\Delta t = \Delta x$, $2\Delta x$, $\frac{5}{2}\Delta x$, and $5\Delta x$. We use the same CANN solver; the main difference is the time step size. We list the $L^2$ and $L^\infty$ errors computed at final time $T = 5$ in Table 6. We see that even for the larger time step sizes the CANN method approximates the exact solution well. However, comparing the $L^2$ and $L^\infty$ results, we choose $\Delta t = \frac{5}{2}\Delta x$ as the optimal choice for the CANN solver.

Different initial conditions:
Now we apply the well-trained neural network, obtained from the learning data with wave speeds $c_1 = 2$ and $c_2 = 1.9$, to solve different initial conditions with $c_1 = 1.5$, $c_2 = 1.4$ and with $c_1 = 2.1$, $c_2 = 2$ without retraining. $c_1 = 1.5$ and $c_2 = 1.4$: in this case, we apply the optimal neural network directly, like a fixed numerical scheme; the CANN method only needs to change the time step size to $\Delta t = \frac{10}{3}\Delta x$ to match the parameters $c_1 = 1.5$ and $c_2 = 1.4$ in the process. $c_1 = 2.1$ and $c_2 = 2$: similarly, in order to match the parameters $c_1 = 2.1$ and $c_2 = 2$ in the wave propagation, we change the time step size to $\Delta t = \frac{25}{11}\Delta x$. From Fig. 22, we can see clearly that the moving profiles are resolved very well for the different wave speed parameters. In a word, the CANN method generalizes well to different initial conditions associated with a large family of wave solutions of the form (4.21) without retraining the neural network.
Low-quality data: finally, we apply the CANN method to a simplified simulation with corrupted data to validate its robustness and capability. All learning data in (3.2) are perturbed by the multiplicative factors $\omega_j$ and $\xi_j$ generated with $\eta = 0.05$. Here, we use the CANN solver associated with the learning data of $c_1 = 2$, $c_2 = 1.9$ in the accuracy tests to handle the corrupted learning data; all other settings of the CANN are the same as before. Traces of the exact solution and the CANN approximation are depicted in Fig. 23. We observe that the predictions of the CANN model are fairly robust against data noise. Although the CANN model shows good approximation power for the corrupted data, the attainable approximation may not be as accurate as the exact solution, due to the deliberately small parameter setting (number of hidden layers and neurons) and the difficulty of nonconvex optimization. Hence, for the more complicated two-peakon trajectories of the Camassa–Holm equation, the current CANN setting may not be sufficient, especially for handling dynamic data with corruption.

Conclusion
A cell-average-based neural network method has been developed for the Cahn–Hilliard equation and the Camassa–Holm equation. Classical numerical methods motivate and guide the design of this machine learning neural network method. It is found that the cell-average-based neural network method is relieved from the explicit-scheme CFL restriction on time step size and is able to explicitly evolve the solution forward in time with a large time step size. The method accurately captures the solution profiles, especially for different initial conditions of the same group and for corrupted data, without retraining the neural network. The new neural network solver is built upon conventional numerical methods for partial differential equations, which gives it more room for the design of successful new solvers.