Deep learning enables accurate sound redistribution via nonlocal metasurfaces

Conventional acoustic metasurfaces are constructed with gradiently ``local'' phase shift profiles provided by subunits. The local strategy implies the ignorance of the mutual coupling between subunits, which limits the efficiency of targeted sound manipulation, especially in complex environments. By taking into account the ``nonlocal'' interaction among subunits, nonlocal metasurface offers an opportunity for accurate control of sound propagation, but the requirement of the consideration of gathering coupling among all subunits, not just the nearest-neighbor coupling, greatly increases the complexity of the system and therefore hinders the explorations of functionalities of nonlocal metasurfaces. In this work, empowered by deep learning algorithms, the complex gathering coupling can be learned efficiently from the preset dataset so that the functionalities of nonlocal metasurfaces can be significantly uncovered. As an example, we demonstrate that nonlocal metasurfaces, which can redirect an incident wave into multi-channel reflections with arbitrary energy ratios, can be accurately predicted by deep learning algorithms. Compared to the theory, the relative error of the energy ratios is less than 1\%. Furthermore, experiments witness three-channel reflection with three types of energy ratios of (1, 0, 0), (1/2, 0, 1/2), and (1/3, 1/3, 1/3), proving the validity of the deep learning enabled nonlocal metasurfaces. Our work might blaze a new trail in the design of acoustic functional devices, especially for the cases containing complex wave-matter interactions.

The emergence of acoustic metamaterials, which are artificial materials with targeted functionalities, have significantly broaden the fields of physical and material science 1-6 . Novel acoustic manipulations based on metamaterials have been realized, such as acoustic focusing 7-9 , acoustic cloaking [10][11][12] , acoustic holography [13][14][15] , and asymmetric acoustic transmission [16][17][18][19] . In the last decade, acoustic metasurfaces, as 2D acoustic metamaterials, stand out as distinct choices to manipulate sound for their vanishing thickness [20][21][22][23][24][25][26][27][28][29][30][31][32] . By employing "locally" designed phase shift provided by subunits of metasurfaces, fascinating acoustic manipulations are extensively explored recently, including acoustic cloaking 33 , acoustic holography 15 , and compact sound absorbers 26,[34][35][36] . However, in the most of previous designs, the subunits are designed individually according to the discretized phase shift profile. In this way, the mutual coupling among the subunits which also contributes the wave-field forming is neglected. It therefore turns out that the designed metasurfaces suffer from the lower efficiency, especially in the cases where incident/reflected/transmitted angles are relatively large. Furthermore, to reshape wave field with fine resolution, a large number of subunits are necessary which inevitably introduce undesired loss due to the viscou-thermal effect induced by small apertures in small subunits. Recently, the concept of "nonlocal" metasurface is introduced to overcome these limitations [37][38][39][40] . In these works, the nonlocal coupling effects among all subunits are taken into accounted to collect the contribution of nonlocality to the wave-field forming and therefore give rise to more accurate wave field manipulations. However, it also makes the nonlocal system quite complex and the relationship between desired wave field and the geometrical parameters unclear. In this situation, optimization methods are generally employed to inversely design the functionality of nonlocal metasurface. However, for different wave manipulation aims, the objective function needs to be revised correspondingly to search for the best value in the entire parameter space, resulting in the drawbacks of time-wasting and computational resource-consuming.
The development of artificial intelligence algorithms provides new insights into the inverse design of nonlocal metasurfaces. In view of its extreme success in domains correlated to the detection of diseases 41,42 , speech recognition 43   the width and the depth of the lth groove, and dx l refers to the distance between adjacent grooves (l = 1, 2, 3). The incident wave along the −y direction could be reflected to three allowed diffracted beams.
ing acoustic fields accurately by nonlocal metasurfaces. The complex nonlocal coupling can be learned by deep learning efficiently from the preset datasets. Then for a target wave field, deep learning can predict the structural parameters of a nonlocal metasurface in a few seconds. In the following, we will demonstrate that nonlocal metasurfaces predicted by deep learning can redirect an incident wave into multi-channel reflections with arbitrary energy ratios. Compared to the direct calculations, the relative error of the energy ratios is less than 1%. In addition, we also perform experiments to witness the ability of the deep learning enabled nonlocal metasurfaces.

Results
Theory of nonlocal metasurface. Our proposed nonlocal metasurface is schematically shown in Fig. 1, which consists of a suitable engineered arrangement of grooves. In this work, we set 3 rectangular-shaped grooves in each period to demonstrate the ability of energy redistribution among three allowed diffraction beams. We will show that empowered by machine learning algorithms, the energy proportion of the three diffracted beams can be redistributed arbitrarily by tuning the depth (h l ) and the width (t l ) of each groove and the distance between two adjacent grooves (dx l ).
When an incident acoustic plane wave along with −y direction impinges on the metasurface (i.e., incident angle θ i = 0 • ), the periodicity of the metasurface will give rise to a set of different diffraction modes. According to the coupled mode theory, the pressure field p(x, y) and normal velocity field v(x, y) in the region above the structure (Region I) can be expressed by 40 p I (x, y) = A + 0 e jk 0 y + ∑ n A − n e − jk x,n ·x e − jk y,n ·y , where A + 0 , A − n are the amplitude of the incident wave and the nth-order of the reflected diffraction beams. k x,n = k 0 sin θ i + G n , and k y,n = k 0 2 − k 2 x,n , are the x and y component of the nth-order diffraction wave vector with k 0 being the wave number of incident waves and G n = n2π/a representing the reciprocal lattice vector. ρ and ω refer to the mass density of the medium and the angular frequency. From the above formulas, the period a of the metasurface related to k x,n decides the propagation characteristics of different diffraction modes. It means that by selecting a suitable a, the numbers of the allowed propagating diffraction orders can be tuned.
To illustrate the nonlocal coupling effect among etched grooves, we need first get the pressure and velocity field inside each grooves. Considering the fact that the bottom of the grooves are acoustically hard where the normal velocity should be zero, the p(x, y) and v(x, y) in grooves (Region II) could be represented as where a kl represents the amplitude of the kth-order of the waveguide mode, x l is the beginning x coordination of the lth groove, α kl = kπ/t l and β kl = k 0 2 − α kl 2 are the x and y components of the wave vector for the kth-order mode in the groove, respectively. Conforming to the conditions at the air-metasurface interface, sound pressure and volume velocity should be continuous at y = 0.
By plugging Eqs.
(1)-(4) into continuum conditions and integrating the equations with orthogonal relationship, we can get the following equations: and where and with m, n = 0, ±1, · · · , ±(N − 1)/2 and k, k 1 , k 2 = 1, · · · , K. P l 1 , P l 2 , P l 3 are respectively related to the the sound pressure of incident wave, reflected wave in Region I and the sound wave in Region II. And their dimensions are known as LK×1, LK×N and LK×LK. L is the total number of grooves in a period, K is the total number of modes inside the metasurface, and N is the total number of diffraction modes. In fact, the superscript and subscript of V have the similar meaning as those of P, but V is related to the volume velocity of sound waves. The dimensions of V 1 , V 2 , V l 3 are N×1, N×N, N×LK respectively. We simplify the above equations with matrixes and rewrite them as: From Eq. (13), the reflected energy of each diffraction order could be calculated. In order to show energy distribution intuitively, we utilize I = A − n A + 0 2 cos θ r to denote the power flow ratio of each mode. Here, we choose -1st, 0th and +1st order diffractive components as reflected channels, and the corresponding diffraction angles for them are θ r = −60 • , 0 • , 60 • . The working frequency is set to f 0 = 8000Hz. In real situations, the harmonic modes of the metasurface and its internal modes are very complicated, so we choose K=20, N=51, L=3 to be as close to the actual situations as possible. Obviously, the consideration of non-locality makes the dimension of the matrixes large which means that the computational complexity is higher. This brings more troubles to inverse design.
Deep learning model. The special operating mechanism of deep learning will help to explore the relationship between metasurface and its response to acoustic waves. Here, we establish a tool that can achieve rapid forward and inverse design with the support of deep learning. This part of content mainly explains the process of building the transient reactor through deep learning algorithms (the specific training mechanism can be found in the Method).
To realize the intelligent design of metasurfaces, we firstly should establish a data set. We introduce a vector S = (h 1 , h 2 , h 3 ,t 1 ,t 2 ,t 3 , dx 1 , dx 2 ) T which is composed of the structure parameters of metasurfaces. Meanwhile, reflected power flow ratio in three channels is expressed as a state vector I = (I 1 , I 2 , I 3 ) T , where I 1 , I 2 , I 3 respectively correspond to power flow of −1st, 0th and +1st order diffractive components.The dataset based on diffraction theory for metasurfaces contains many arrays {S, I}, including 270,000 training samples and 30000 test samples. Specifically, as shown in Fig. 2, we randomly generate many different sets of S and calculate their corresponding I by MATLAB. Here, we construct two network models (A net and B net ), one of which takes S as input data and I as label data while the other is the opposite. Obviously, there is a correspondence between these two vectors, I = A net (S), S = B net (I), where A net is the forward deep learning network and B net is the inverse neural network. It should be noted that B net can hardly converge when it is trained in a similar way as A net since the same power-flow distribution could be accounted for different arrangement of meta-units.
To tackle this multi-value problem, we exploit tandem neural network introduced in Ref. 47 . As In a way, the "decoder" converts the multi-value problem into a single-value problem without complicated data filtering. As the corresponding relation between the design S and the response I changes, the training of connected network converges easily (nearly after 1500 epochs).
With the entire model trained successfully, the transient reactor for three-channel metasurfaces is completely constructed. When we take the middle layer as input and the last layer as the output, it could realize the forward data prediction. While we use the first layer as the input and the middle layer as the output, the desired structure parameters would be predicted by the middle layer, i.e., the output layer of inverse network, as long as we input the target power flow distribution. Actually, what we are concerned more about is the function implemented by the latter, because nonlocal metasurfaces do not have a clear inverse theoretical relationship between structures and their responses, but the realization of the model allows this theoretical relationship to be simulated.
Moreover, it avoids the troubles of conventional optimization methods that require to iterate many times to search for the best value and easily fall into local optimality. For further verification, we choose three sets of predicted data for simulation and experiment.  All the scattering fields in experiment have been normalized by the incident pressure. In addition, we select a line in each measurement area and plot the data extracted from the line in lower panels, defining R(θ ) = Re(p θ p max ) as the abscissa. Experimental results agree well with the simulation, which demonstrates the fantastic functionality of the metasurfaces designed by deep learning algorithm. In the last experimental field, the pressure field of the middle channel is weaker than the two others, which may be caused by the multiple reflection between loudspeaker array and the metasurface.

Discussion
In this work, we investigate the beam splitting function of the grooved metasurfaces based on nonlocal coupling mechanism and realize the arbitrary regulation of the energy ratio for the multichannel metasurfaces. We successfully conducted model training on S and I taking into account the nonlocality among subunits, so that finally the transient reactor can predict the target structures.

Method
Neural network training mechanism. Given that one network model has D layers, then the first layer gets the input data X 0 and the layer D outputs the data X D which has been activated many times by nonlinear activation functions. f represents the activation function.The complete neural network training process is mainly composed of two parts. In the former process, neural network parameters w and b are used to calculate the output values in conjunction with the activation function. Forward equations are as follows: where z called weighted input will play an important role in the back propagation. Cost function(C) evaluates the error between predicted values and expected values. Actually, the goal of training given data is to seek suitable model parameters for which cost function return as small value as possible. Whereas this purpose depends on the later process to complete. In the back propagation, parameters w and b could be updated by gradient calculation and the global optimizer to improve the predictive performance of models. Referring to the back propagation algorithm proposed in Ref. 58 , error gradient of each layer can be expressed as: Then the partial differential of the cost function to the internal parameters of the network can be written as: Note that every neuron will go through these calculation cycles. Considering that the calculation relationship between two neurons located in adjacent layers is similar, for the sake of simplicity, we just use the superscript representing the number of layer for these parameters in above equations instead of marking the specific position of each neuron in a certain layer.
Network model structure and hyperparameter settings. In order to balance the training speed and prediction accuracy of the network model, we adopt the batch training method, that is, part of the data is sent for training each time. Batch size is the number of samples sent for each training.
In all our training process, batch size is set to 256. We use the Leaky Relu as the activation function PXIe-1071). The microphone as a probe scans fields in a step of 5mm. In the automatic scanning platform, we disposed 7 loudspeakers with a total width of 245 mm as the source. Samples are put in an organic glass plate waveguide about 2 cm high for experimental measurements. Foams are distributed at the edge of the measurement field to absorb external sound waves and reduce the influence of internal multiple reflection waves. We first measured the empty field with no sample placed, and then measured the total field with the placed sample. In this way, the difference between the two measurements is the result of the reflected field.

Data availability
The data that support the plots within this paper and other findings of this study are available from the corresponding author upon reasonable request. * q.cheng@tongji.edu.cn † yongli@tongji.edu.cn