Fractional physics-informed neural networks for time-fractional phase field models

In this paper, new fractional physics-informed neural networks (fPINNs) are proposed, combining fPINNs with a spectral collocation method to solve time-fractional phase field models. Compared with the original fPINNs, the proposed method has a larger representation capacity owing to the properties of the spectral collocation method, which reduces the number of approximation points of the discrete fractional operators, improves the training efficiency, and yields higher error accuracy. Unlike traditional numerical methods, it directly optimizes the spectral collocation coefficients, avoids costly matrix computations, extends easily to high-dimensional models, and also achieves higher error accuracy. First, fPINNs based on a spectral collocation method are used to obtain numerical solutions of the models under consideration. The spectral collocation method is used to discretize the space direction, and the fractional backward difference formula is used to approximate the time-fractional derivative. The error accuracy in different cases is discussed, and the observed point-wise error ranges from 10^{-5} to 10^{-7}. Next, fPINNs are employed to solve several inverse problems of time-fractional phase field models, identifying the order of the fractional derivative, the mobility constant, and other coefficients. Numerical experiments are presented to demonstrate the effectiveness of fPINNs in solving time-fractional phase field models and their inverse problems.


Introduction
With the recent explosion in the amount of available data and computing resources, advances in neural networks (NNs) and data analysis have produced transformative results in various scientific fields. Studies on applications of NNs can be divided into two main categories. The first category involves data-driven NNs using only information from the observed data. For example, a number of researchers have discussed the applications of deep learning in fluid [1,2] and thermal [3] simulations, material property prediction [4], system monitoring [5], and structure analysis [6,7]. These studies use a data-driven approach because they either lack physics-based models or rely on computationally expensive models. However, such an approach is obviously inappropriate for mathematical models with explicit expressions. Thus, a different type of NNs has been developed, explicitly utilizing information from equations by involving the equations directly in the loss function to be optimized. For example, Karpatne et al. [8] conceptualized a theory-driven data science framework and proposed several approaches to integrate domain knowledge into loss functions. Raissi et al. [9] proposed physics-informed neural networks (PINNs) to solve partial differential equations (PDEs), employing standard feedforward NNs and explicitly encoding the PDEs into the loss function via automatic differentiation. They proved the robustness of PINNs through extensive numerical examples and discussed both the forward and the inverse problems of PDEs. For other examples, see references [10][11][12][13].
The fractional derivative, an effective and powerful tool for describing physical systems with long-term memory, has become popular in mathematics and physics [14,15]. In particular, it is widely used in modeling systems with memory effects, spatial nonlocality, or power-law characteristics. These include solute transport in porous media [16,17], the propagation of dissipative sound waves [18,19], turbulent flows [20], and so on. Finding stable and accurate methods of approximating fractional PDEs (FPDEs) has been the goal of many researchers. In particular, due to the integral definition of the fractional derivative and the complex expressions of fractional models in practical problems, traditional numerical methods still cannot achieve good convergence accuracy and computational efficiency in some problems; on some more complex models, the convergence accuracy can be even lower. Therefore, it is urgent to develop new numerical methods for the calculation of fractional partial differential equations. As neural networks and deep learning approaches show their power in solving differential equations of all kinds with arbitrary dimensions and geometric complexity [21][22][23], the use of NNs to solve FPDEs has become a research hotspot. However, since the chain rule of integer-order calculus is invalid in fractional calculus, automatic differentiation does not apply to fractional operators, so fractional derivatives cannot be computed directly within the network. Nevertheless, with further research, several papers have appeared that open up new ground for the numerical computation of fractional partial differential equations. Next, we mainly introduce two different methods of dealing with fractional operators in NNs: one discretizes the fractional operators; the other approximates the numerical solution by an expansion in special functions whose fractional derivatives can be written explicitly, such as x^β (β ∈ ℝ), exp(x), and so on.
For example, on the basis of PINN [9], Pang et al. proposed fractional PINNs (fPINNs) [24] to solve the space-time fractional advection-diffusion equations (ADEs). By discretizing the fractional operators, they bypassed the above difficulty. They considered multidimensional fractional ADEs and obtained a convergence accuracy of 10^{-3} to 10^{-4}. Subsequently, Rostami [25] addressed the difficulty that automatic differentiation cannot be applied to spatial fractional operators by replacing the solution with a series expansion whose terms have explicit fractional differential expressions. This enabled them to solve high-order linear FPDEs with greater accuracy than that achieved with traditional techniques, opening a new avenue for dealing with fractional operators in NNs. Moreover, many researchers have tried to combine traditional numerical methods with neural networks to solve fractional partial differential equations. Zeinab [26] proposed a fractional Chebyshev deep neural network (FCDNN) to solve fractional differential models. They used Chebyshev orthogonal polynomials as activation functions in FCDNN; finite difference and marching-in-time methods were used for discretization. It is pointed out that combining with Chebyshev polynomials can reduce the computational cost and improve the capability of the NNs.
Phase field models have been widely used to study material [29], physical [30], and biological systems [31,32]. In particular, phase field models have made great contributions to the study of coarsening dynamics [33,34], which is widely observed in material systems. Notably, the phase field model is a dissipative system driven by the dissipation of free energy, so we need a mathematical way to describe the energy dissipation in the partial differential equation. The gradient flow achieves this; it includes the selection of a state space, a driving function, and a dissipation mechanism. Specifically, the classical gradient flows have the following form:

∂u/∂t = −grad_H E(u),    (1)

where u is the state function (phase function), E(u) is the free energy, and grad_H E(u) is the functional derivative of E in the Sobolev space H. Multiplying both sides by δE/δu and integrating, we obtain the energy dissipation law:

dE(u)/dt = −‖grad_H E(u)‖₀² ≤ 0,    (2)

where ‖·‖₀ is the associated norm. This means that the energy E decreases with time under the evolution of the state function u. Phase field models based on the above energy variational framework have received extensive attention and have long been used in materials science and fluid dynamics; see, e.g., [35][36][37][38][39] and the references therein. However, there is an increasing need to study more complex mixtures of materials, or materials with elastic memory, which results in significant changes in the scaling dynamics. Classical phase field models are no longer suitable for studying such anomalous coarsening dynamics. In particular, the original phase field models are nonlocal models, and the classical models were derived only under the simplified assumption that the temporal effects are local. Because the fractional operator has memory and time dependence, it can aggregate all previous effects to represent temporal nonlocality.
Thus, more and more researchers have begun to focus on time-fractional phase field models. In this paper, we consider the models deriving from gradient flows with a modified dissipation mechanism. Specifically, compared with the gradient flow form of the classical phase field models, we consider gradient flows of the form [40][41][42][43][44][45]

D_t^α u = −grad_H E(u),  0 < α ≤ 1,    (3)

where D_t^α is the Caputo fractional derivative, defined as follows [45]:

D_t^α u(t) = (1/Γ(1 − α)) ∫₀^t u′(s)(t − s)^{−α} ds.    (4)

By definition (4), the fractional derivative is a weighted mean of the historical derivatives, which can be used to represent the effect of historical rates of change at the current time. Therefore, the gradient flow model (3) is useful for describing the memory effect of the related free energy dissipation in some cases. Such fractional-type phase field models have attracted much research attention; many studies have focused on the numerical analysis of time-fractional phase field models [40][41][42][43][44][45][46] and have obtained better results in simulating anomalously diffusive effects. Based on spectral methods, Antil et al. [47] analyzed the features and numerical solutions of fractional phase field models. An efficient finite difference scheme and a Fourier spectral scheme were developed to capture the nonlocal behavior of time-fractional phase field models in [48]. In [49], the authors employed a Petrov-Galerkin spectral method to discretize the spatial derivative and a stabilized ADI scheme to discretize the time-fractional derivative; they proved that the method had spectral accuracy and could deal with both erroneous boundary effects and sharpening of the interface at no extra resolution.
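As a numerical illustration of definition (4) (not part of the original derivation), the Caputo derivative can be approximated on a uniform grid. The sketch below uses the simple first-order L1 scheme, rather than the FBDF scheme adopted later in the paper, applied to f(t) = t², whose Caputo derivative is known in closed form:

```python
import math

def caputo_l1(f, alpha, t, n_steps=2000):
    """Approximate the Caputo derivative D_t^alpha f(t), 0 < alpha < 1,
    with the classical L1 scheme on a uniform grid (illustrative sketch)."""
    tau = t / n_steps
    total = 0.0
    for j in range(n_steps):
        # L1 weights b_j = (j+1)^(1-alpha) - j^(1-alpha), newest interval first
        b = (j + 1) ** (1 - alpha) - j ** (1 - alpha)
        total += b * (f(t - j * tau) - f(t - (j + 1) * tau))
    return total / (math.gamma(2 - alpha) * tau ** alpha)

# For f(t) = t^2, the exact Caputo derivative is 2 t^(2 - alpha) / Gamma(3 - alpha)
alpha, t = 0.5, 1.0
exact = 2 * t ** (2 - alpha) / math.gamma(3 - alpha)
approx = caputo_l1(lambda s: s ** 2, alpha, t)
```

With 2000 steps the L1 error, of order τ^{2−α}, is far below the tolerance used here.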
An important problem related to time-fractional phase field models is their inverse problem. The core of the inverse problem is the estimation of the unknown parameters in the model from the known model structure and some observed data. Wu et al. [50] proved the Lipschitz stability and uniqueness of the inverse problem for phase field models with an inertial term. Stability results concerning the inverse problem of determining two time-independent coefficients from observation data were presented in [51]. For other examples, please see references [52,53]. Many works have focused on the inverse problem of classical phase field models, but few have considered time-fractional phase field models. NN approaches have been applied to several models: Bandai & Ghezzehei [54] employed PINNs to solve the inverse problem of the Richardson-Richards equation without initial and boundary conditions, and in [24] fPINNs were used to obtain the order of the fractional derivative and the diffusion coefficient of the space-time fractional advection-diffusion equation from observations.
In this work, we focus on the applications of fPINNs to solving forward and inverse problems of time-fractional phase field models. The advantage of the fPINNs algorithm is that it resolves the problem that the chain rule does not apply to fractional calculus: the fractional derivatives are approximated numerically by standard discretization methods, so that automatic differentiation can be invoked to solve fractional partial differential equations, as in PINNs. Building on this advantage, we further set up a new NN structure and discretization approach and obtain relatively better results. Specifically, a systematic study is performed by using a spectral collocation method to discretize the space direction, in order to accelerate training and enhance accuracy. The influence of the structure of the spectral collocation method on the numerical results is discussed systematically. Numerical results based on different numbers of spectral collocation points, neurons, and approximate points are also discussed. For the inverse problems, we solve one-dimensional (1D) and 2D time-fractional phase field models with fPINNs, present the results, and analyze the causes of the error. The highlights of our paper are as follows.
-By using a shallow-layer NNs, the loss function optimization process can be expressed analytically, which means that simple numerical analyses of these formulas can be performed.
-With the special structure of the loss function, we can make full use of the data and model information to achieve the expected results.
-Owing to the combination of fPINNs with a spectral collocation method, the computational efficiency and the convergence accuracy are improved.
The rest of the paper is organized as follows. Section 2 presents the time-fractional phase field models. In Sect. 3, we develop fPINNs to solve the forward and inverse problems of the time-fractional phase field models. Several examples are presented in Sect. 4 to verify the effectiveness of fPINNs for both forward and inverse problems. Finally, some conclusions are discussed in Sect. 5.

Mathematical model
The general form of the time-fractional phase field model can be written as [45]:

D_t^α u(x, t) = γ g μ + h(x, t),  μ = −ε²Δu + f(u),    (5)

and the initial boundary conditions are given as

u(x, 0) = g(x),  x ∈ Ω,  together with periodic boundary conditions,    (6)

where γ is a mobility constant. Equation (5) denotes the time-fractional Allen-Cahn (AC) equation when g = −1. When g = Δ, Eq. (5) represents the time-fractional Cahn-Hilliard (CH) equation. Here, μ is the chemical potential, and f(·) is a function satisfying f(u) = F′(u), where F(u) is a double-well potential that usually takes the form F(u) = ¼(1 − u²)². The left-hand side of the first equation is the Caputo fractional derivative of order α, defined as follows [45]:

D_t^α u(x, t) = (1/Γ(1 − α)) ∫₀^t (∂u(x, s)/∂s)(t − s)^{−α} ds,    (7)

where Γ(·) is the gamma function. As α → 1, it reduces to the first derivative. g(x) and h(x, t) are (sufficiently smooth) known functions, which we set to be white-box (WB) functions here. It is worth noting that although periodic boundary conditions are used as an example in this paper, the proposed method can also be applied to Dirichlet and Neumann boundary conditions. In addition, in order to satisfy the discrete method [45], we limit the order α to (0, 1).
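As a small concrete check (illustrative only, using the standard double-well potential stated above), the nonlinear term f(u) = F′(u) = u³ − u can be verified against a finite-difference derivative of F:

```python
def F(u):
    # double-well potential F(u) = 1/4 (1 - u^2)^2
    return 0.25 * (1 - u * u) ** 2

def f(u):
    # f(u) = F'(u) = u^3 - u, the nonlinear term in the chemical potential
    return u ** 3 - u

# central finite-difference check that f is indeed the derivative of F
u, h = 0.3, 1e-6
fd = (F(u + h) - F(u - h)) / (2 * h)
```

Note that f vanishes at the two wells u = ±1, consistent with the double-well picture.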

Forward problem with fPINNs based on spectral collocation method
In this subsection, new fPINNs based on a spectral collocation method are developed to solve the forward problems of time-fractional phase field models. We consider the forward problem of the form (with the corresponding NNs shown in Fig. 1):

D_t^α u(x, t) = N[u](x, t),    (8)

where N is a nonlinear operator. Let û be the approximated solution of Eq. (8), and let u_NN be the output of the NNs. In this paper, the second-order fractional backward difference formula (FBDF) [45] is used to discretize the time-fractional derivative, and the spectral collocation method is used to deal with the space direction. We first divide the interval [0, T] into K equal subintervals with a time step of size τ = T/K. Let t_n = nτ and u^n = u(x, t_n), 0 ≤ n ≤ K. Using the FBDF method, we can discretize the fractional derivative D_t^α u as

D_t^α u(x, t_n) ≈ τ^{−α} Σ_{i=0}^{n} w_i^{(α)} u^{n−i}(x),

in which {w_i^{(α)}, i = 0, ..., n} are coefficients that can be generated using the generating function

W(z) = (3/2 − 2z + z²/2)^α.

Then, the expansion of u(x, t) by the Lagrange interpolation polynomials {φ_k(x)}_{k=0}^{N} at the Legendre-Gauss-Lobatto points {x_j} (j = 0, ..., N) within the given Ω can be written as

u(x, t) ≈ Σ_{k=0}^{N} φ_k(x) c_k(t),

where N is the number of spectral collocation points, and we use the notation u(x_m, t_n) = Σ_{k=0}^{N} φ_k(x_m) c_k^n. Based on the spectral collocation method and the FBDF method, we obtain the following fully discrete scheme for Eq. (8):

τ^{−α} Σ_{i=0}^{n} w_i^{(α)} u^{n−i}(x_m) = N[u^n](x_m),  m = 0, ..., N.

By using the output of the NNs, u_NN(x_k, t), to approximate the coefficients {c(x_k, t), k = 0, ..., N}, the explicit expression of the approximated solution û(x, t) is obtained. Based on the interpolation (cardinality) property of the Lagrange polynomials, φ_k(x_i) = δ_{ki}, we can also obtain c_i^n = u(x_i, t_n), i.e., c_i^n is the value of the solution u at the spectral collocation point (x_i, t_n). Thus, we can use a single output u_NN to represent both the values of the coefficients c_k and the approximated solution û. In addition, compared with the original fPINNs, we have made some changes to the input structure of the approximated solution and the loss function.
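The FBDF coefficients can be generated numerically. The sketch below assumes the standard second-order FBDF generating function W(z) = (3/2 − 2z + z²/2)^α and expands it with the J.C.P. Miller recurrence for powers of a power series; for α = 1 it must reproduce the classical BDF2 stencil 3/2, −2, 1/2:

```python
def fbdf2_weights(alpha, n):
    """Coefficients w_i, i = 0..n, of W(z) = (3/2 - 2z + z^2/2)**alpha,
    computed with the J.C.P. Miller recurrence for powers of a series."""
    p = [1.5, -2.0, 0.5]              # coefficients of 3/2 - 2z + z^2/2
    w = [p[0] ** alpha]
    for m in range(1, n + 1):
        s = 0.0
        for k in range(1, min(m, 2) + 1):
            s += (k * (alpha + 1) - m) * p[k] * w[m - k]
        w.append(s / (m * p[0]))
    return w

# alpha = 1 recovers the integer-order BDF2 stencil: 3/2, -2, 1/2, 0, 0, ...
w_bdf2 = fbdf2_weights(1.0, 4)
w_frac = fbdf2_weights(0.5, 4)       # genuinely fractional weights
```

The leading weight is always (3/2)^α, which gives a quick sanity check for fractional orders.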
Instead of integrating all the initial boundary conditions into the approximated solution, only the boundary condition is built into the approximated solution, and the initial condition is added to the loss function as a penalty term. This construction is intended not only to reduce the complexity of the loss function but also to steer the optimization in a corrective direction. It helps the network approximate the distribution of the true solution u(x, t) and improves the generalization accuracy.
With the approximated solution, the loss function can be defined as the mean-squared error (MSE) of the equation residual and the initial condition:

L(μ) = MSE_equ(μ) + W_I · MSE_IVC(μ),

with MSE_equ(μ) the mean-squared residual of the fully discrete scheme at the training points and MSE_IVC(μ) the mean-squared mismatch of the initial condition. Here, W_I is the initial weight; μ are the hyperparameters of the NNs; and N_equ and N_IVC denote the numbers of training points in the spatiotemporal domain and on the initial condition, respectively. Then, the weights and the biases are updated by back-propagation. Using the gradient descent method, we can update them as follows:

w_i ← w_i − η ∂L/∂w_i,  b_i ← b_i − η ∂L/∂b_i,

where w_i and b_i are the weights and biases, respectively, and η is the learning rate. By iterative training, we obtain the optimal network parameters and output the required numerical solution. The initialization of the weights is among the most important problems in network design. There are many ways to determine the initial values of the weights, including Gaussian initialization, uniform initialization, and Xavier initialization. In this work, we use Xavier initialization, which not only ensures that the information flow of each layer satisfies the same variance but also keeps the activation value and gradient variance of each layer consistent, thereby enabling better transmission of information through the network.
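The composite loss and the Xavier initialization described above can be sketched as follows (the residual arrays and layer sizes are hypothetical placeholders, not values from the paper):

```python
import numpy as np

def composite_loss(equ_residuals, init_pred, init_true, w_init=1.0):
    """L = MSE_equ + W_I * MSE_IVC: equation residual plus a weighted
    initial condition penalty (w_init plays the role of W_I)."""
    mse_equ = np.mean(equ_residuals ** 2)
    mse_ivc = np.mean((init_pred - init_true) ** 2)
    return mse_equ + w_init * mse_ivc

def xavier_init(fan_in, fan_out, rng):
    # Xavier/Glorot initialization: zero mean, variance 2 / (fan_in + fan_out)
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

residuals = np.array([0.1, -0.1])      # hypothetical equation residuals
u0_pred = np.array([0.0, 1.0])         # hypothetical NN output at t = 0
u0_true = np.array([0.0, 0.8])
loss_value = composite_loss(residuals, u0_pred, u0_true, w_init=1.0)
weights = xavier_init(6, 6, np.random.default_rng(0))
```

Increasing w_init tilts the optimization toward fitting the initial condition, mirroring the role of W_I discussed later.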
Finally, the forward problem can be summarized as follows. First, the coefficients c_k and the approximated solution û are expressed through the output u_NN at the corresponding points. Then, the loss function is constructed by merging them into the mathematical model, in which MSE_equ is constructed from û in the form of a spectral collocation expression and MSE_IVC is constructed from the output u_NN. Finally, the loss function of the network is optimized to obtain the optimal network parameters and output û(x_test, t_test). In this setting, the problem of solving Eq. (8) can be described as follows: find the network parameters μ such that L(μ) < ς, where ς is the desired error accuracy.

Inverse problem by fPINNs
In this subsection, we regard the inverse problem as an optimization problem and use fPINNs to estimate the order of the fractional derivative α, the mobility constant γ, and the coefficient ε.
Consider the inverse problem of the form (5), where the fPDE parameters α, γ, ε and the solution u(·) are to be recovered from the initial boundary conditions and observation data. As the inverse problem complicates the loss function by increasing the dimension of the parameter space, we use the same treatment as in [45] to reduce the complexity. Specifically, we transform {α, γ, ε} into an unconstrained parameter space, where {α₀, γ₀, ε₀} denote the parameters in the transformed space. Instead of discretizing the time-fractional derivative and the space direction at the same time, as in the forward problem, we only discretize the time-fractional derivative by the FBDF. This is because, in the inverse problem, the parameters {α₀, γ₀, ε₀} of the model are jointly optimized with the parameters μ of the network, and we do not set a penalty term on the parameters directly in the loss function. The NNs can optimize the loss function to its minimum only when the relationships between the model parameters, the model equation, and the labeled dataset are established. If the space direction were also discretized, as in the forward problem, this would complicate the relationship and might introduce erroneous network information; this in turn could lead to the model parameters being optimized in the wrong direction, meaning we would be unable to obtain the optimal result. The approximated solution is pre-selected to automatically satisfy the boundary condition. Moreover, the loss function is expressed in two parts; the loss function L_INV for the inverse problem can be written as

L_INV(μ, α₀, γ₀, ε₀) = MSE_equ + W_I · MSE_data,

with MSE_equ the mean-squared equation residual and MSE_data the mean-squared error on the labeled dataset. Under this construction of the loss function, we can control the weight W_I to adjust the influence of the model and the labeled dataset on the parameter estimation and obtain a better result. There are several ways to set the value of the weight W_I; these can generally be categorized as adaptive and artificial methods. Here, we use an artificial method.
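One common way to realize such a parameter transformation (an assumption for illustration; the paper follows [45] and does not necessarily use these exact maps) is to keep α in (0, 1) with a sigmoid and to keep γ and ε positive with an exponential:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def to_model_params(alpha0, gamma0, eps0):
    """Map unconstrained variables {alpha0, gamma0, eps0} to admissible
    model parameters (hypothetical transform): alpha in (0, 1),
    gamma > 0, eps > 0."""
    return sigmoid(alpha0), math.exp(gamma0), math.exp(eps0)

alpha, gamma, eps = to_model_params(0.0, 0.0, -1.0)
```

The optimizer then works on the unconstrained variables, and the admissibility constraints are satisfied automatically.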
This is because, although an adaptive method could select an appropriate weight value based on the physical information of the NNs, it would increase the complexity of the network and affect the optimization of the parameters.
A difference between the inverse problem and the forward problem is the optimization object of the NNs: the parameters in the model are optimized together with the hyperparameters μ of the network. After constructing the loss function, we can minimize it using an appropriate optimization algorithm, such as the gradient descent method or the stochastic gradient descent (SGD) method. The SGD method is very popular in the field of machine learning (see [55] for more details). In this work, we use a version of the SGD method as our optimization algorithm, namely the Adam optimizer. By optimizing the loss function, the approximated values of the fractional order α, the mobility constant γ, the coefficient ε, the hyperparameters μ, and the approximated solution of u(x, t) are obtained. However, as we do not have any real model parameters by which to judge the network's trained parameters, we need to evaluate them indirectly. We compare the difference between the approximated solution û and the exact solution u at test points (x_test, t_test) to confirm the accuracy of the estimated parameters:

‖û(x_test, t_test) − u(x_test, t_test)‖ < ς,    (30)

where ς is our required error accuracy. When Eq. (30) is satisfied, we assume that the network has obtained the desired model parameters, and we stop training. The process of the inverse problem is summarized in Algorithm 1.
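The Adam update used here can be written in a few lines. The sketch below applies it to a toy scalar problem (minimizing (x − 3)², not a phase field loss) purely to illustrate the update rule:

```python
import numpy as np

def adam_minimize(grad, x0, lr=0.05, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=2000):
    """Minimal Adam optimizer sketch: first/second moment estimates
    with bias correction, applied to a given gradient function."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# toy problem: minimize (x - 3)^2, whose gradient is 2 (x - 3)
x_opt = adam_minimize(lambda x: 2.0 * (x - 3.0), np.array([0.0]))
```

In the inverse problem, the same update is applied jointly to the network parameters μ and the transformed model parameters {α₀, γ₀, ε₀}.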
Algorithm 1 fPINN algorithm for the inverse problem
1: Build a training set in the inner space and on the initial condition (training data);
2: Set the desired error accuracy ς and the maximum number of iterations N, and transform the parameters;
3: Construct the NN with initialized network parameters μ and model parameters α₀, γ₀, ε₀;
4: Specify the loss function L_INV;
5: while the criterion (30) is not met and the iteration limit is not reached do
6: Find the best parameters minimizing the loss function using the Adam optimizer;
7: end while
8: Give the approximated solution û at the test points (x_test^i, t_test^i).

Results and discussion
In this section, several numerical examples are considered to verify the effectiveness of fPINNs for solving the forward and inverse problems of time-fractional phase field models.
To evaluate the performance of our method, we consider the point-wise error and the L² relative error at each time step, defined as

point-wise error: |û(x_i, t_i) − u(x_i, t_i)|,  L² relative error: ‖û − u‖₂ / ‖u‖₂,

where û denotes the approximated solution, u is the exact solution, and (x_i, t_i) denotes the i-th test point. We select 1000 space points from the Sobol sequence for each example and divide them into training points and test points according to a ratio of 7:3. We employ TensorFlow for programming, to take advantage of its automatic differentiation capability, and use the Adam and L-BFGS algorithms to optimize the loss function. We also use Xavier initialization and choose the Sobol sequence as the dataset. For the forward problem, the learning rate, the number of hidden layers, the number of neurons, the number of iterations, and the activation function are fixed to 1 × 10^{-3}, 1, 6, 10^4, and tanh(x), respectively. For the inverse problem, we use the same code as for the forward problem but with four hidden layers of 20 neurons each. In addition, due to the lack of a theoretical proof, each numerical result shown in this section is an average over 10 training runs, so as to indirectly ensure the stability of our algorithm; we also show the convergence accuracy of our method for different equation parameters to further support the numerical stability of the method. The approximation error is discussed by observing the convergence accuracy under different parameters of fPINNs, and the robustness of the algorithm is discussed by setting different fractional orders and forcing terms in the model.
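The error metric and the 7:3 split can be sketched as follows (np.linspace stands in here for the Sobol sequence used in the paper):

```python
import numpy as np

def l2_relative_error(u_pred, u_exact):
    # || u_pred - u_exact ||_2 / || u_exact ||_2 over the test points
    return np.linalg.norm(u_pred - u_exact) / np.linalg.norm(u_exact)

def train_test_split(points, ratio=0.7, seed=0):
    """Randomly split sample points into training and test sets (7:3)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(points))
    n_train = int(ratio * len(points))
    return points[idx[:n_train]], points[idx[n_train:]]

points = np.linspace(0.0, 1.0, 1000)   # stand-in for 1000 Sobol points
train_pts, test_pts = train_test_split(points)
err = l2_relative_error(np.array([1.1, 2.0]), np.array([1.0, 2.0]))
```

A true Sobol sequence could be substituted via scipy.stats.qmc.Sobol without changing the rest of the sketch.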
Case 1: Consider the smooth fabricated solution u(x, t) = (t² + 1) sin(2πx) and the corresponding forcing term h(x, t). The approximated solution û and the loss function are constructed as described in Sect. 3, with the equation residual term MSE_equ and the initial condition term MSE_IVC. Then, we can find an approximated solution by optimizing the loss function such that û(x, t) ≈ u(x, t).
This construction offers the possibility of numerical analysis. In addition, because different NN structures give different results for different models, we study the effect of the NN approximation error by altering the depth and width of the NN to optimize the error accuracy. The results are shown in Figs. 3, 4 and 5. For each experiment, the code runs 10 times. Figure 3 shows the point-wise error with different learning rates and different numbers of approximate points and basis functions. As shown in Fig. 3(a), for a fixed number of approximate points and basis functions, similar convergence accuracy is achieved with different learning rates. This is because the NN is simple in structure and thus requires few optimization parameters; therefore, it does not need a large number of iterations to achieve the optimal results. When the network optimization reaches its optimal value, further iterations will only cause it to fluctuate around the optimal result and may even lead to overfitting; they will not produce real change. That is, our NN does not require a lower learning rate or a larger number of iterations. Next, we change the number of approximate points and basis functions and observe the resulting changes in the point-wise error. As shown in Fig. 3(b), we first consider the case where the numbers of basis functions and approximate points are the same. When the number of approximate points and basis functions is changed from 6 to 8, the point-wise error changes from 10^{-4} to 10^{-5}; these results indicate that using fewer basis functions and approximate points would be detrimental to the network's prediction ability, whereas increasing their number can improve it. However, once the numbers of basis functions and approximate points reach certain values, further increases do not result in higher accuracy. This limitation is inherent to the NN approach taken. As shown in Fig. 3(c), the best point-wise error is 10^{-6}, obtained by choosing 10 approximate points and 12 basis functions. The variation in the loss function is also shown, in Fig. 3(d) and (e).
In the loss function, the penalty weight W_I is added before the initial condition penalty term, while the coefficient before the equation information penalty term is 1. In this construction, when W_I is greater than 1, the network is more inclined to fit the initial boundary conditions; in this case, the equation term mainly serves to correct the fit at the initial condition rather than at other time points. Therefore, finding the optimal matching ratio by adjusting the penalty weight W_I in the loss function is another aspect to consider. We present the results of our investigation in Fig. 4. As shown in the figure, W_I acts as a correction; as W_I increases, the error curve becomes wider, indicating that the global error is decreasing and the network optimization tends to fit the initial condition. Although the minimum point-wise error becomes larger, the global error decreases. Moreover, as W_I increases, the error first increases and then decreases. This means that there is an optimal value of W_I under this condition. Figure 5 shows the influence of different methods of selecting collocation points on the point-wise error, that is, whether uniform or random points are taken. Taking points uniformly means dividing the interval equally according to the number of points needed; taking points at random means selecting random values in the interval. We find no great difference in the convergence accuracy between the two cases. This may be because, in a relatively small area, random and uniform points cover the whole area almost equally well, resulting in the same convergence accuracy. However, in a larger space, taking random points may be advantageous. Moreover, the selected collocation points must cover the whole space; otherwise, the convergence accuracy will be greatly reduced.
Figure 6 shows the relationship of the number of neurons and the number of approximate points with the point-wise error. The relationship of the number of basis functions and the number of approximate points with the point-wise error is presented in Fig. 7. We also consider the effect of the structure of the network by changing the depth and width of the NN; Fig. 8 displays the L 2 relative error corresponding to different depths and widths. In each case, the code runs 10 times.
As shown in Fig. 6(a), given a fixed number of approximate points and a varying number of neurons, when the number of neurons is small, the point-wise error is relatively large. The reason is that a small number of neurons means that the generalization ability of the network is weak; thus, we cannot achieve high convergence accuracy. As the number of neurons increases, the generalization ability of the network is enhanced; thus, the convergence accuracy improves and optimization is achieved gradually. After the network has been optimized, we continue to increase the number of neurons. Although the generalization ability of the network is still increasing, the number of parameters that need to be optimized grows rapidly, which makes it difficult for the network to reach its global optimum and easy for it to fall into local minima. Therefore, increasing the number of neurons does not always improve the generalization ability of a network; there are different optimal numbers of neurons in different situations. As shown in Fig. 6(c), the number of approximate points has a similar effect on the point-wise error. As the number increases from 5 to 40, the error changes from 10^{-4} to 10^{-5}. This indicates that there is an optimal number of approximate points under this condition. Figure 7(a) and (b) shows the results for different numbers of basis functions, from 4 to 20, with fixed numbers of approximate points and neurons. When the number of basis functions is greater than 10, the point-wise error begins to increase, from 10^{-5} to 10^{-2}. This result shows that the number of basis functions has a great influence on the point-wise error. When more basis functions are selected, more parameters need to be optimized, which means the optimization can easily fall into local minima and we cannot obtain good generalization. Moreover, when the number of basis functions is less than 10, the point-wise error is relatively stable at a value of 10^{-5}.
Further research is needed to determine the optimal number of basis functions. Figure 7(c) describes the effect of the number of approximate points on the point-wise error for a fixed number of basis functions. As the number of approximate points increases, the point-wise error tends to decrease at first and then increase. However, the change in the point-wise error is far smaller than that caused by the increase of the number of basis functions.
From Fig. 8(a), we find that the depth of the NN has a clear effect on the convergence accuracy when the width is 6. Moreover, the depth has a greater impact on the L^2 relative error than the width. The differences are not large, but a larger depth leads to slightly worse accuracy. In this example, the L^2 relative error is relatively stable, with only a few oscillations. This indicates that there is an optimal pairing between depth and width, which is also closely related to other settings, including the learning rate, W_I, and the numbers of training points, approximate points, basis functions, and iterations.
Finally, the effects of different fractional orders on the 1D time-fractional AC equation are shown in Fig. 9. To make the network applicable to different fractional orders of the same problem, we set the network width to 6 and the depth to 8. Because we are uncertain about the complexity of unknown problems, we naturally want to maximize the fitting ability of the network, and a deeper network generally has stronger fitting ability. With the same network settings, the numerical solution achieves a good convergence accuracy of 10^{-4} for all tested α.
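The width-6, depth-8 configuration used above can be sketched as a plain fully connected tanh network. This is a minimal numpy illustration of the architecture choice, not the authors' code; the helper names `init_mlp` and `mlp_forward` are hypothetical.

```python
import numpy as np

def init_mlp(layers, seed=0):
    """Xavier-initialized weights for a fully connected tanh network.
    `layers` lists the width of each layer, e.g. [1, 6, ..., 6, 1]."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(layers[:-1], layers[1:]):
        bound = np.sqrt(6.0 / (fan_in + fan_out))
        W = rng.uniform(-bound, bound, size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((W, b))
    return params

def mlp_forward(params, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    a = x
    for W, b in params[:-1]:
        a = np.tanh(a @ W + b)
    W, b = params[-1]
    return a @ W + b

# Width 6, depth 8 (eight hidden layers), as in the fractional-order study.
net = init_mlp([1] + [6] * 8 + [1])
y = mlp_forward(net, np.linspace(0.0, 1.0, 11).reshape(-1, 1))
```

The depth/width trade-off discussed above then amounts to varying the `[6] * 8` part of the layer list.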
Case 2: We consider the non-smooth fabricated solution u(x, t) = (1 + t^2) sin(2π x^{1.2}) and the corresponding force term. Unlike case 1, we consider another form of the hypothetical solution; this form can also be used in the previous example. When we do not know whether the exact solution is smooth, we need to add unknown parameters to our hypothetical solution, i.e., u(x, t) = (1 + t^2) sin(2π c x^β). The smoothness of the solution is determined by the expression of the space vector x in the solution. The expression of x inside the network is determined by the network parameters μ, so no special treatment is required there. The space vector x outside the network, however, should be expressed in a general form that covers all possible solutions; thus, we add a coefficient c and a power β to x. This makes such problems harder to solve by traditional methods, but they are easily handled by NNs. Figure 10 gives the results of solving the case 1 and case 2 problems with the above structure. The results in Fig. 10 indicate that hypothetical solutions constructed in this new way yield high convergence accuracy in the case of non-smooth solutions. They can also handle the case where the fabricated solution is smooth; the corresponding results are presented in Fig. 10(c) and (d). However, it is necessary to find the optimal values of certain parameters, including the number of neurons, the number of basis functions, and the number of approximate points. Here, we take the values of these parameters to be 15, 40, and 12, respectively, and obtain a point-wise error of 10^{-4}.
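The generalized fabricated-solution family can be written down directly. The sketch below is our own illustration (the function name `fabricated_u` is hypothetical): β = 1 recovers the smooth case-1 solution, while β = 1.2 gives the non-smooth case-2 solution, whose second spatial derivative is unbounded at x = 0.

```python
import numpy as np

def fabricated_u(x, t, c=1.0, beta=1.2):
    """Fabricated-solution family u(x, t) = (1 + t^2) sin(2*pi*c*x^beta).
    beta = 1 gives the smooth solution; beta = 1.2 the non-smooth one."""
    return (1.0 + t**2) * np.sin(2.0 * np.pi * c * x**beta)

x = np.linspace(0.0, 1.0, 101)
u_smooth = fabricated_u(x, 0.5, c=1.0, beta=1.0)   # case 1
u_rough = fabricated_u(x, 0.5, c=1.0, beta=1.2)    # case 2
```

In the inverse-style setting described above, c and β would be treated as trainable parameters alongside the network weights.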
For consistency, we use a periodic boundary condition and fix α = 0.8 and ε = 1, with initial values α_0 = 0.7 and ε_0 = 0. With the fabricated solution (1 + t^2) sin(2π x) sin(2π y), the corresponding force term s(x, y, t) is obtained by substituting this solution into the equation. Similar to the 1D case, we construct a NN and define the loss function as L(μ) = MSE_equ + W_I × MSE_IVC. To define the residual of the network in the same way, we consider 2D Lagrange interpolation polynomials, which have the same properties as their 1D counterparts.
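The tensor-product structure of the 2D Lagrange interpolation can be sketched in a few lines of numpy. This is an illustrative sketch under our own assumptions (Chebyshev-Gauss-Lobatto nodes, six per direction as in the 2D experiment; the helper names are hypothetical); by construction, the interpolant reproduces any polynomial of degree at most N-1 per direction exactly.

```python
import numpy as np

def lagrange_basis(nodes, x):
    """Evaluate all 1D Lagrange cardinal polynomials l_j(x) at a point x."""
    n = len(nodes)
    L = np.ones(n)
    for j in range(n):
        for k in range(n):
            if k != j:
                L[j] *= (x - nodes[k]) / (nodes[j] - nodes[k])
    return L

def interp2d(nodes, values, x, y):
    """Tensor-product 2D Lagrange interpolation:
    u(x, y) ~= sum_{j,k} values[j, k] * l_j(x) * l_k(y)."""
    Lx = lagrange_basis(nodes, x)
    Ly = lagrange_basis(nodes, y)
    return Lx @ values @ Ly

# Chebyshev-Gauss-Lobatto collocation points on [-1, 1], six per direction.
N = 6
nodes = np.cos(np.pi * np.arange(N) / (N - 1))
f = lambda x, y: x**2 * y + y**3            # polynomial test function
vals = np.array([[f(xi, yj) for yj in nodes] for xi in nodes])
approx = interp2d(nodes, vals, 0.3, -0.4)   # off-grid evaluation
```

In the method itself, the `vals` array would hold the trainable spectral collocation coefficients rather than samples of a known function.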
With the approximated solution, the loss function can be rewritten accordingly, and we can extrapolate in this way to solve 3D problems. Table 1 shows the structure and the corresponding parameters of our NN. The exact solution, prediction, point-wise error, and value of the loss function are shown in Fig. 11. As shown in Fig. 11, the 2D time-fractional AC equation still attains a good convergence accuracy, indicating that our method can effectively deal with multidimensional time-fractional phase field models. In Fig. 12, we also consider the effects of different fractional orders on the 2D time-fractional AC equation. Because we kept the network parameters general, we did not obtain a very high convergence accuracy; even so, we obtain an error accuracy of 10^{-3} to 10^{-4}, which shows that our method still has good generalization and stability. Later, in Table 7, we show the optimal error accuracy for different fractional orders at specific network parameters. We construct the NN by selecting six collocation points in the x and y directions, respectively. Our method accurately captures the exact solution, demonstrating its effectiveness for solving the forward problem of 2D time-fractional AC equations. To solve higher-dimensional problems, the same code can be used, changing only the form of the basis functions. In addition, we can analyze our error sources from the following perspectives: the sample selection error, the basis function selection error, the collocation point selection error, the discrete approximation error, the NN approximation error, and the optimization error. Table 2 compares the performance of fPINNs based on the spectral collocation method with a high-efficiency second-order numerical method [45] and an accurate and efficient algorithm [46] for different 1/τ. In [45], the FBDF is used to approximate the time-fractional derivative, and the extended scalar auxiliary variable method is used to deal with the nonlinear terms.
The scalar auxiliary variable method is likewise used to deal with the nonlinear terms in [46]. As shown in Table 2, our method achieves higher error accuracy with a coarser time division.
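The FBDF time discretization referred to here can be illustrated concretely. The sketch below (our own illustration, with hypothetical helper names) builds the first-order Grunwald-Letnikov/FBDF weights by the standard recurrence and checks them against the exact Caputo derivative of t^2, namely D^α t^2 = 2 t^{2-α} / Γ(3-α), which coincides with the Riemann-Liouville form here because f(0) = f'(0) = 0.

```python
import numpy as np
from math import gamma

def gl_weights(alpha, n):
    """Grunwald-Letnikov weights w_k = (-1)^k * binom(alpha, k), via the
    standard recurrence; these are the FBDF coefficients."""
    w = np.empty(n + 1)
    w[0] = 1.0
    for k in range(1, n + 1):
        w[k] = w[k - 1] * (1.0 - (alpha + 1.0) / k)
    return w

def fbdf_caputo(f_vals, alpha, tau):
    """First-order FBDF approximation of the Caputo derivative at the final
    time node, valid here since f(0) = f'(0) = 0 for f(t) = t^2."""
    n = len(f_vals) - 1
    w = gl_weights(alpha, n)
    return tau**(-alpha) * np.dot(w, f_vals[::-1])

alpha, tau = 0.8, 1.0 / 2000
t = np.arange(0.0, 1.0 + tau / 2, tau)
approx = fbdf_caputo(t**2, alpha, tau)
exact = 2.0 / gamma(3.0 - alpha)   # Caputo D^alpha t^2 evaluated at t = 1
```

The first-order accuracy of this formula is one reason the comparison in Table 2 focuses on the error achieved per time division.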
Finally, we show the results of inverse problems for the 1D and 2D time-fractional AC equations. Better error accuracy is obtained by setting an appropriate NN structure. Table 3 gives the estimated results for the two parameters in the 1D and 2D time-fractional AC equations for different numbers of iterations. As the number of iterations increases, the estimates become more accurate. The initial values of the parameters are α_0 = 0.6 and ε_0 = 0.1. The iteration process for the two parameters generated by the fPINNs is given in Fig. 13. Across the whole iteration process, both parameters converge well; their estimated values are α = 0.73208 and ε = 0.00058. These results imply that fPINNs are effective for solving the inverse problem of time-fractional AC equations.
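The identification step can be shown in miniature. The toy sketch below is our own stand-in, not the PDE-constrained optimization of the paper: it recovers a fractional order from observations of a simple α-dependent model y(t) = t^α by minimizing a least-squares misfit, the same role the residual loss plays in the full inverse problem; the target value 0.73208 is the estimate reported above.

```python
import numpy as np

true_alpha = 0.73208                 # value reported for the 1D AC inverse problem
t = np.linspace(0.1, 1.0, 50)
y_obs = t**true_alpha                # noiseless synthetic observations

def misfit(alpha):
    """Least-squares misfit between the alpha-parameterized model and data."""
    return np.sum((t**alpha - y_obs) ** 2)

# Coarse grid search followed by a golden-section refinement.
grid = np.linspace(0.1, 1.5, 281)
a = grid[np.argmin([misfit(g) for g in grid])]
lo, hi = a - 0.01, a + 0.01
phi = (np.sqrt(5.0) - 1.0) / 2.0
for _ in range(60):
    m1, m2 = hi - phi * (hi - lo), lo + phi * (hi - lo)
    if misfit(m1) < misfit(m2):
        hi = m2
    else:
        lo = m1
alpha_hat = 0.5 * (lo + hi)
```

In fPINNs the same minimization is carried out by gradient descent, with α (and the other coefficients) added to the set of trainable variables.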

Time-fractional CH equation
Example 3 This example considers the ability of our approach to handle the time-fractional CH equation, which contains higher-order terms. We start with the 1D time-fractional CH equation with periodic boundary conditions, fixing α = 0.8, γ = 1, and ε = 1, with initial values α_0 = 0.7, γ_0 = 0, and ε_0 = 0. This equation has one more higher-order differential term than the time-fractional AC equation.
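With periodic boundary conditions, the extra fourth-order spatial term can be evaluated spectrally: differentiation becomes multiplication by (ik)^4 = k^4 in Fourier space. The sketch below is our own illustration of this standard device (not the authors' code), checked on u = sin(2πx), whose exact fourth derivative is (2π)^4 sin(2πx).

```python
import numpy as np

# Uniform periodic grid on [0, 1) and a test profile.
N = 64
x = np.arange(N) / N
u = np.sin(2.0 * np.pi * x)

# Wavenumbers matching numpy's FFT ordering.
k = 2.0 * np.pi * np.fft.fftfreq(N, d=1.0 / N)

# Fourth derivative via the Fourier symbol (i*k)^4 = k^4.
u_xxxx = np.fft.ifft(k**4 * np.fft.fft(u)).real

err = np.max(np.abs(u_xxxx - (2.0 * np.pi)**4 * u))
```

The error is at round-off level, which is the spectral accuracy that makes high-order CH terms tractable with few collocation points.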
A smooth fabricated solution is considered, which has a fourth-order derivative in the space direction.
The corresponding forcing term is obtained by substitution. Then, with the approximated solution as in Eq. (34), the loss function can be written accordingly. We use Xavier initialization, the Adam algorithm, and the L-BFGS method for optimization; the results are shown in Figs. 14, 15, 16, 17 and 18.
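The first stage of this two-stage optimization is plain Adam. The sketch below writes out the standard Adam update rule (from the literature, not the authors' implementation) and runs it on a scalar quadratic as a stand-in for the PINN loss; L-BFGS would then refine the result.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias-corrected moment estimates."""
    m = b1 * m + (1.0 - b1) * grad
    v = b2 * v + (1.0 - b2) * grad**2
    m_hat = m / (1.0 - b1**t)
    v_hat = v / (1.0 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = (theta - 3)^2 as a stand-in for the loss function.
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
```

The learning-rate sensitivity discussed next corresponds to the `lr` argument here: for the stiffer CH loss landscape, a smaller `lr` avoids overshooting the optimum.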
In Fig. 14(a), the point-wise error is clearly better when the learning rate is low than when it is high. This is due to the high-order differential term in the time-fractional CH model, which makes the structure relatively complex and can easily prevent the network from reaching an optimal value. Thus, we need to adjust the learning rate and the number of iterations to reach the optimum. For the time-fractional CH model, the convergence accuracy reaches 10^{-6}. Figure 14(c) shows the effect of simultaneously changing the numbers of approximate points and basis functions on the point-wise error; these results indicate that both numbers also have an important effect on the point-wise error in the time-fractional CH model. Next, the effects of the number of neurons and the number of approximate points on the point-wise error are considered. As shown in Fig. 15(a), as the number of neurons increases from 2 to 6, the point-wise error decreases from 10^{-4} to 10^{-6}; the convergence accuracy improves slightly with more neurons. When we fix the number of neurons and change the number of approximate points, we find that too many approximate points reduce the convergence accuracy. However, regardless of this number, the convergence accuracy remains around 10^{-6}. An increasing number of approximate points may introduce more parameters to optimize, whereas the number of neurons does not increase enough to substantially change the convergence accuracy. These results indicate that there is an optimal match between the number of neurons and the number of approximate points, and that the convergence accuracy does not depend only on these two factors.
Similar to the time-fractional AC model, the number of basis functions is an important factor affecting the convergence accuracy of the time-fractional CH model. As shown in Fig. 16, when the number of basis functions increases from 4 to 20, the convergence accuracy exhibits a clear crest-like variation, indicating that having too few or, especially, too many basis functions is detrimental to the convergence accuracy. Moreover, too many basis functions make the model more complex and increase the number of factors that need to be considered; the network then cannot accurately describe a solution that satisfies all these factors, which makes generalization less precise. In other words, we need to consider not only whether the basis functions can represent the solution space but also whether the network has the required fitting ability. Changing the number of approximate points also changes the convergence accuracy; however, this effect is relatively weak compared with that of the number of basis functions. Thus, when the NN structure is fixed, the influence of the number of basis functions is greater than that of the number of approximate points. We also consider the influence of the NN structure on the L^2 relative error in Fig. 17. The convergence accuracy is between 10^{-4} and 10^{-6}, i.e., relatively stable. Note that the greater the width, the greater the uncertainty; over 10 training runs, a minimum of 10^{-7} is reached. As shown in Fig. 18(a), as W_I increases, the error curve flattens out. This is due to the preference of the optimization for the initial conditions. These results also show that W_I affects the L^2 relative error but is not the main influencing factor.
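The role of W_I above comes directly from the composite loss L(μ) = MSE_equ + W_I × MSE_IVC. A minimal sketch, with hypothetical residual values standing in for the PDE and initial-condition residuals:

```python
import numpy as np

def pinn_loss(res_equ, res_ivc, w_i=1.0):
    """Composite fPINN loss L = MSE_equ + W_I * MSE_IVC: mean-squared PDE
    residual plus a weighted mean-squared initial-condition residual."""
    mse_equ = np.mean(res_equ**2)
    mse_ivc = np.mean(res_ivc**2)
    return mse_equ + w_i * mse_ivc

# Hypothetical residual samples at collocation / initial points.
res_equ = np.array([0.1, -0.2, 0.05])
res_ivc = np.array([0.01, -0.03])
loss_small_w = pinn_loss(res_equ, res_ivc, w_i=1.0)
loss_large_w = pinn_loss(res_equ, res_ivc, w_i=100.0)
```

Increasing `w_i` shifts the optimizer's attention toward satisfying the initial condition, which is why the error curve flattens for large W_I.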
Finally, as with the 1D time-fractional AC equation, we consider the effects of different fractional orders on the 1D time-fractional CH equation, as shown in Fig. 19. We use the same NN parameter settings, changing only the fractional order. As can be seen from the figure, we still obtain good convergence accuracy, which again demonstrates the effectiveness and numerical stability of our method.

Example 4 Finally, the 2D time-fractional CH equation is considered,
with periodic boundary conditions, fixing α = 0.8, γ = 1, and ε = 1, with initial values α_0 = 0.7, γ_0 = 0, and ε_0 = 0. We consider the smooth fabricated solution (1 + t^2) sin(2π x) sin(2π y), with the corresponding force term obtained by substitution. With the same numerical solution as in Eq. (42), the loss function can be written as in Eq. (53). Figure 20 shows the exact solutions, predictions, and point-wise error. In Fig. 20(c), fPINNs based on the spectral collocation method achieve a good convergence accuracy of 10^{-5}, indicating that our method is effective for 2D time-fractional CH equations. We also show the effects of different fractional orders on the 2D time-fractional CH equation in Fig. 21; our method achieves good convergence accuracy for all tested α.
The results of solving the inverse problems of the 1D and 2D time-fractional CH equations are given in Fig. 22. Better results are obtained for the inverse problem of the 1D time-fractional CH equation. Starting from the initial values α_0 = 0.7, γ_0 = 0, and ε_0 = 0, we obtain the following estimates: α = 0.70919, γ = 0.03055, and ε = −0.020910. A similar result is achieved for the 2D time-fractional CH equation, with an error accuracy of 0.0939; see Table 4. These results demonstrate that fPINNs can deal with the inverse problem for time-fractional phase field models and that they handle the time-fractional AC equations better. The poorer results for the inverse problem of the time-fractional CH equations may be due to the existence of higher-order derivatives, which prevents the loss function from being optimized to the global minimum. In future work, we will further explore this inverse problem and seek a way to deal with high-order derivatives.
In this paper, we manually tune the parameters of fPINNs based on experience, by continually testing different parameter combinations; this leaves considerable room for improvement. In future work, we will use meta-learning [56] to automatically tune the hyperparameters of the NN architecture and combine the weight relationship with the error accuracy to obtain a better pairing between the parameters of fPINNs.
Next, the computation times of our method for different problems are given in Table 5. Our method achieves good numerical accuracy in a relatively short time. The long runtime for the 2D CH model is due to the higher number of iterations required for complex models that include a high-order derivative; we used 100,000 iterations, yet the runtime is still lower than that of the traditional numerical method. All computations were performed on a DESKTOP-HHM6M5A machine with an Intel(R) Core(TM) i5-9500 CPU.
Finally, to demonstrate the generalizability of this method, we consider two different fabricated solutions and different fractional orders. The results are presented in Tables 6 and 7, respectively. Although we need to adjust the number of network layers and neurons to make the loss function reach the global minimum, and to increase the numbers of basis functions and approximate points to improve the convergence accuracy for lower α, the results demonstrate that our method achieves good convergence accuracy, with an L^2 relative error of 10^{-4} to 10^{-5}.

Conclusion
In this work, we employ fPINNs based on a spectral collocation method to solve the forward and inverse problems of the time-fractional phase field models. For the forward problems, we use the FBDF to approximate the time-fractional derivative and a spectral collocation method to discretize the space direction. Combined with the fPINNs, the numerical solution of the model is obtained. The convergence accuracy of this method is discussed in different cases; we show that the point-wise error accuracy is 10^{-5} to 10^{-7}. For the inverse problems, fPINNs are shown to estimate the order of the fractional derivative α, the mobility constant γ, and the coefficient ε. Several numerical examples are presented to demonstrate the effectiveness of our method. In future studies, the given methods can also be applied to other time-fractional equations in biology and physics.