Programmable artificial intelligence machine for wave sensing and communications

To establish a re-trainable wave-based D²NN, a weight-reprogrammable node is necessary to make the neurons alive. This can be achieved using the programmable metasurface [19] and the information metasurface [20]. In the past decade, metamaterials and metasurfaces have been well developed for manipulating light and electromagnetic (EM) waves [21-24]. The digital-coding representation of meta-atoms makes the information metasurface reprogrammable and sets up a direct connection between the physical world and the digital world [19,20,25,26]. With controllable active components, the programmable metasurface can manipulate the reflected or transmitted EM waves in real time under digital instructions from a field-programmable gate array (FPGA). Various functions and applications have been achieved using information metasurfaces, including new-architecture wireless communications [27-29], imaging [30], space-time modulation [31], and smart devices [32].
Here, we propose a programmable and on-site trainable artificial intelligence machine (PAIM) using an array of information metasurfaces for wave sensing and communications, in which the multi-layer metasurfaces act as the programmable physical layers of D²NN. We design our PAIM to be a real-time re-trainable system, whose parameters can be set digitally to realize alive artificial neurons. In the physical layer, PAIM could hierarchically manipulate the energy distribution of transmitted EM waves by a five-layer information metasurface array [33,34], in which the amplitude of the wave transmitted through each meta-atom could be enhanced or attenuated by controlling the value of digital parameters (Fig. 1a). The phase change of the transmitted wave is determined by the phase part of the complex-valued gain factor of the programmable meta-atom, which is also modulated by the customized FPGA circuit (Fig. 3b, Supplementary Fig. S1, Supplementary Materials Note 1 'FPGA Modulation and Feedback Channels of Receivers'). When the incident beam goes through a programmable meta-atom in the first-layer metasurface, the amplitude and phase of the transmitted wave are determined by the product of the incident electric field and the complex-valued transmission coefficient of the meta-atom, and the transmitted wave will act as a secondary source and illuminate all programmable meta-atoms in the second-layer metasurface (Fig. 1b, c), based on the Huygens-Fresnel principle [16]. The waves arriving from all directions at a meta-atom in the second layer are then added up, and the sum acts as the incident wave on that meta-atom (Fig. 1b). This process continues until the final layer. Here, the reprogrammable interconnection architecture of PAIM is the fundamental and essential factor to simulate the alive artificial neurons.
According to the radiation pattern of the meta-atom (Fig. 1d), the power transmitted by a meta-atom has a certain weight distribution on the next-layer metasurface (Fig. 1c). Therefore, the forward propagation model of PAIM (see Supplementary Materials Note 3 'Forward Propagation Model') can be regarded as a fully connected network (Supplementary Fig. S2). However, compared with the traditional fully connected network constructed by real numbers [1], the PAIM parameters have complex values and the trainable parts are the complex-valued transmission coefficients of the meta-atoms. Hence, we have fewer trainable parameters (Supplementary Fig. S2). The traditional error back-propagation method could be used to train the PAIM parameters. Meanwhile, owing to the fast parameter-switching ability and direct feedback from receivers (Supplementary Fig. S1), our PAIM enables self-learning by using the data gained from direct interaction with the environment, and does not need any prior knowledge. Thus, PAIM possesses the reinforcement learning capacity [35].
When processing given data, we use the first-layer programmable metasurface as a digital-to-analog converter to modulate the given data into the amplitude distribution of the EM wave when illuminated by plane waves (Fig. 1a, b). Then, the transmitted EM waves carry the information of the given data and will be processed by the remaining metasurface layers. Therefore, without using independent and complicated input modules, PAIM is more flexible and compact than the optical D²NN platforms [16,36,37]. In fact, PAIM can directly receive and process the EM waves radiated by radars, communication base stations, and wireless routers, making it more compatible with the environment.
To verify the capabilities of PAIM, we first use it to deal with two image classification tasks: oil painting style (Fig. 1e, g, Supplementary Fig. S4) and handwritten digit (Supplementary Fig. S3) classifications. In the first classification task with two kinds of oil paintings (portraiture and landscape painting), we simulate a PAIM with 6-layer metasurfaces, each of which consists of 25×25 programmable meta-atoms. The input image (Fig. 1e) is converted to grayscale and reshaped to 25×25 pixels (corresponding to the size of the metasurface) (Fig. 1f), and then inputted to PAIM by configuring the first-layer metasurface, in which the transmission coefficient of each meta-atom is set as the corresponding pixel value of the image. Thus, the EM wave would carry the information of the input image when going through the first layer. The remaining five layers constitute the recognition network. At the end of PAIM, we assign two receivers, one for each kind of oil painting. The energy received at each receiver represents the likelihood that the input image belongs to the corresponding class. The receiver with the maximum received energy gives the classification result (Fig. 1g). After training with 500 handwritten digit images and testing with 100 images, the mean accuracy in recognizing the two oil painting styles is 97%. We remark that the number of trainable parameters is only 5×25×25, which is much fewer than that in traditional fully connected ANNs. A similar architecture is used in handwritten digit classification with a layer size of 40×40, in which the handwritten digits are classified into ten different kinds, reaching a 90.76% classification accuracy after training (Supplementary Fig. S3). More details on the network design and recognition results are provided in Supplementary Materials.
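The readout rule and the parameter count described above can be illustrated with a short sketch. It is not the authors' code: the function name `classify`, the example energies, and the comparison network (a real-valued dense network with five 625×625 weight matrices) are assumptions chosen only to make the arithmetic concrete.

```python
import numpy as np

def classify(receiver_energies):
    """Pick the receiver that collects the most energy.

    Returns the winning index and the energies normalised to sum to one,
    which can be read as a likelihood-like score per class.
    """
    e = np.asarray(receiver_energies, dtype=float)
    scores = e / e.sum()
    return int(np.argmax(scores)), scores

# Two receivers, one per painting style (hypothetical energies and class order).
label, scores = classify([0.2, 1.8])

# Trainable parameters: one complex transmission coefficient per meta-atom
# in each of the five recognition layers.
paim_params = 5 * 25 * 25
# Illustrative comparison only: five dense 625x625 weight matrices.
dense_params = 5 * (25 * 25) ** 2
```

Under these assumptions the diffractive layers carry 3,125 trainable coefficients, against 1,953,125 weights for the dense comparison network.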
To demonstrate the versatility of our PAIM in the real world, we design and fabricate a PAIM sample with five-layer programmable metasurfaces controlled by five FPGA modules, in which each layer consists of 8×8 meta-atoms (Figs. 1a, 3b). Each meta-atom is integrated with two amplifiers (Supplementary Fig. S5) and can individually provide more than 500 different grades of transmission gain of the EM wave (Supplementary Fig. S6) under the control of the bias voltage set by FPGA, acting as an alive neuron in the fully connected network. The support structure of the PAIM sample is presented in Supplementary Fig. S7, in which the first layer (i.e., the input layer) is illuminated by microwaves at 5.4 GHz radiated by a horn antenna. The measurement is performed in a standard microwave chamber (Supplementary Fig. S8c, d).
To test the experimental performance of PAIM in image classification, we design two illustrative cases. One case is to classify simple patterns, and the other is to recognize four kinds of game props in the popular sandbox game 'Minecraft'. As mentioned above, the first-layer metasurface of PAIM acts as a digital-to-analog converter to convert the input image into the corresponding spatial distribution of EM waves. More specifically, different pixel values in the input image correspond to different transmission coefficients of the meta-atoms in the first layer, radiating EM waves with different spatial distributions onto the second layer. The remaining four layers act as a recognizer, and several receiving antennas are put at the end of PAIM. The training process is run on a computer to obtain appropriate transmission coefficients of the meta-atoms in the recognizer. A gradient back-propagation algorithm could be used to train PAIM. However, considering that the adjustable parameters of digital coding metasurfaces are discrete, we specially design a discrete optimization algorithm to make the training results more practical. Meanwhile, we calibrate the wave propagation coefficients between two adjacent metasurface layers using a gradient descent method to make our training results more precise. More details on the discrete optimization algorithm and calibration approach are provided in the Supplementary Materials.
In our first experimental image classification, we design two simple patterns, letter 'I' and bracket '[]'. The positions of such patterns could be different to make the input images more varied. The pixels belonging to the pattern parts and the background are assigned different bias voltages of meta-atoms in the first PAIM layer. Experimental results show that our PAIM could classify the two patterns with an accuracy of 100% (Fig. 2a-d, Supplementary Fig. S9). In the second case of game prop recognition, we choose four kinds of classical props: pickaxe, handgun, sword, and axe. We down-sample the original prop image into an 8×8 pixel matrix and use different bias voltages of meta-atoms in the first PAIM layer to represent different pixel values. The bias-voltage configurations of layers 2-5 (corresponding to the recognition part of PAIM) for this case are shown in Fig. 2i. A recognition accuracy of 100% is also achieved in the experimental tests (Fig. 2e-h, Supplementary Fig. S10).
Besides image classification, we further use PAIM as a mobile communication codec, which can perform coding and decoding tasks in the Code Division Multiple Access (CDMA) scheme, and transmit four kinds of orthogonal user codes simultaneously or separately in one channel. Here, each user code is a string of binary numbers with a length of 64. As shown in Fig. 3a, the first-layer PAIM metasurface is set as an encoder, on which each meta-atom sequentially corresponds to one bit in the binary number string. When a high or low bias voltage is set to the meta-atom, it corresponds to a '1' or '0' bit, respectively. We put four receiving antennas at the end of PAIM, and each antenna represents a user code. When one of the antennas receives high energy of EM waves, it means that the user code related to this antenna is transmitted (Fig. 3a).
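The encoding step can be sketched as follows. The actual user codes and voltage levels are not given in the text, so everything concrete here is an assumption: the codes are taken from a Sylvester-type Walsh-Hadamard matrix (one common way to obtain orthogonal sequences), the bits are laid out on the 8×8 layer in row-major order, and `v_high`/`v_low` are placeholder values.

```python
import numpy as np

def walsh_codes(length=64):
    """Sylvester construction of a +/-1 Hadamard matrix (length must be a power of 2)."""
    H = np.array([[1]])
    while H.shape[0] < length:
        H = np.block([[H, H], [H, -H]])
    return H

def encode(bits, v_high=1.0, v_low=0.0):
    """Map a 64-bit user code onto an 8x8 bias-voltage pattern (row-major, assumed)."""
    bits = np.asarray(bits)
    if bits.shape != (64,):
        raise ValueError("a user code must contain exactly 64 bits")
    return np.where(bits.reshape(8, 8) == 1, v_high, v_low)

H = walsh_codes()
user_bits = (H[1:5] + 1) // 2        # four example codes as 0/1 strings
voltages = encode(user_bits[0])      # bias pattern for the first example code
```

The rows `H[1:5]` are mutually orthogonal in their ±1 form, which is the property a CDMA scheme relies on; any other orthogonal family would serve equally well in this sketch.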
We use $U_1$, $U_2$, $U_3$, $U_4$ to represent the four user codes, and $O_1$, $O_2$, $O_3$, $O_4$ to represent the receiving energies of the corresponding antennas. The remaining four-layer PAIM metasurfaces are trained as a decoder. When $U_1$ is transmitted by the first layer, the values of $(O_1, O_2, O_3, O_4)$ would be $f(U_1) = (\text{high}, \text{low}, \text{low}, \text{low})$, in which the function $f$ represents the linear forward propagation function of PAIM, and the term low indicates that the receiving energy is much less than that of high.
Similarly, when $U_2$ is transmitted, the receiving energy values would be $f(U_2) = (\text{low}, \text{high}, \text{low}, \text{low})$. For a more complicated situation, when $U_1$ and $U_2$ are transmitted simultaneously, the receiving energy values would be $f(U_1 + U_2) = f(U_1) + f(U_2) = (\text{high}, \text{high}, \text{low}, \text{low})$. The same is true when three or four user codes are transmitted simultaneously. Owing to the independence of wave propagation, the transmission of one user code has little influence on the transmission of others, and hence each user code could be transmitted nearly independently in one channel.
This property simplifies the training process: we do not need to train all combinations of the four user codes. Instead, when the output EM wave distribution of each user code conforms to the designed distribution, the combinations would satisfy the expectation automatically. The total loss function for training is the sum of mean square errors (MSEs) between the designed output energy distribution and the one generated by inputting each of the four user codes. Random Gaussian noise is added to the training input to make our system more robust. The experimental results verify the feasibility of PAIM in wireless communications (Fig. 3c-f, Supplementary Fig. S11). Compared with the traditional CDMA scheme, our PAIM performs coding and decoding using the space dimension instead of the time dimension, and realizes this function in the physical layer instead of the link layer of the Open System Interconnection (OSI) reference model (Supplementary Fig. S12). Therefore, PAIM has the advantage of reducing the time delay in wireless communications. On the other hand, the strong capability of processing spatially distributed EM waves makes PAIM a good candidate to realize space division multiplexing and thus increase the channel capacity. The decoding function of PAIM is operated as an independent system and is able to deal with signals from distributed communication base stations (Supplementary Fig. S13). In fact, with the exhaustion of spectrum resources, the space division multiplexing technique has attracted increasing attention and has become a key technology in the fifth and sixth generations of wireless communications [38].
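A minimal numerical sketch of the two ideas in this paragraph follows: superposition under a linear propagation model, and a total loss formed as a sum of per-code MSEs with Gaussian noise on the input. The matrix `F` is a random stand-in for a decoder (not a trained one), and `noise_std`, the random codes and the one-hot targets are assumptions made for illustration. Note that linearity holds exactly for the complex fields; the received energies, being squared magnitudes, add only approximately because of cross terms.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 64, 4                                   # 64 input meta-atoms, 4 receiving antennas
# Random stand-in for the overall linear map from input layer to receivers.
F = (rng.normal(size=(K, N)) + 1j * rng.normal(size=(K, N))) / np.sqrt(N)

def f(u):
    """Linear forward propagation of an input pattern u to the receiver fields."""
    return F @ u

def total_loss(codes, targets, noise_std=0.05):
    """Sum over user codes of the MSE between target and generated energies."""
    loss = 0.0
    for u, t in zip(codes, targets):
        noisy = u + rng.normal(scale=noise_std, size=u.shape)  # robustness noise
        energy = np.abs(f(noisy)) ** 2
        loss += np.mean((energy - t) ** 2)
    return loss

codes = rng.integers(0, 2, size=(4, N)).astype(float)   # placeholder user codes
targets = np.eye(4)                                      # 'high' on own antenna, 'low' elsewhere
loss = total_loss(codes, targets)

# Field-level superposition: two codes sent together equal the sum of each alone.
superposition_ok = np.allclose(f(codes[0] + codes[1]), f(codes[0]) + f(codes[1]))
```

Because only the four single-code cases enter the loss, the number of training targets stays at four rather than growing with the number of code combinations.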
Finally, we turn our PAIM into a dynamic multi-beam focusing lens, which could focus the EM energy on multiple points at arbitrary positions. Different from the aforementioned cases, in which the training process is executed on a computer in advance, here we train PAIM directly on site and in real time using the reinforce-learning method, which completely removes the requirement for prior knowledge in the previous optical D²NN platforms. Benefiting from the real-time programmability of PAIM, we can train the parameters by continuously interacting with an unknown and complicated EM environment. Figure 4a illustrates the schematic diagram of the reinforce-learning process, in which the bias voltages of meta-atoms are randomly changed and controlled by FPGA. The same FPGA also receives the feedback signals from receivers and calculates the trend of the error function to determine whether the change of bias voltages is retained or discarded. In this case, MSE is again used as the error function, and an extra term is added to restrain the redundant EM waves (see Supplementary Materials Note 2 'Reinforce-Learning Process'). It could be regarded as a real-time optimization procedure, whose objective is to minimize the distance between the desired pattern and the EM wave distribution generated by PAIM. The output EM wave distributions along with the number of parameter updates are presented in Fig. 4b, c.
In conclusion, the proposed PAIM is an on-site programmable D²NN platform that runs by real-time digital control of EM waves, and can perform computations based on the parallelism of EM wave propagation at the speed of light. It is a universal wave-based intelligence machine, which can not only deal with traditional deep learning tasks such as image recognition and feature detection, but also provide an on-site and user-friendly way to manipulate spatial EM waves, such as multi-channel coding and decoding in the CDMA scheme and dynamic multi-beam focusing, and hence may find new applications in wireless communications, signal enhancement, medical imaging, remote control, internet of things (IoT), and other intelligent applications.

adaptive to the actual application scenarios.
For the multi-beam focusing task, the error is calculated by Eq. (1), where $k$ indexes the points on which we want to concentrate the EM energy, $s_k$ and $\theta_k$ indicate the measured energy and desired energy at the $k$-th point, respectively, $S$ represents the total leaking energy, which is the sum of energies radiated outside the target points, and $\gamma_k$ and $\alpha$ are constant scale factors.
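The formula itself is not legible in this copy. One form that is consistent with the symbol definitions and with the main-text description (an MSE term plus an extra term restraining redundant energy) is a weighted squared error plus a leakage penalty. The sketch below assumes exactly that form; the function name, the argument names and the specific combination are assumptions, not the authors' Eq. (1).

```python
import numpy as np

def focusing_error(measured, desired, leak, gamma, alpha):
    """Assumed form: sum_k gamma_k * (s_k - theta_k)^2 + alpha * S.

    measured : energies s_k at the target points
    desired  : target energies theta_k at the same points
    leak     : total energy S radiated outside the target points
    gamma    : per-point scale factors
    alpha    : scale factor of the leakage penalty
    """
    measured = np.asarray(measured, dtype=float)
    desired = np.asarray(desired, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    return float(np.sum(gamma * (measured - desired) ** 2) + alpha * leak)

# Perfect focusing on two points but with some leakage: only the penalty remains.
example = focusing_error([1.0, 2.0], [1.0, 2.0], leak=0.5, gamma=[1.0, 1.0], alpha=2.0)
```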
In one iteration of the training process, we randomly choose 20% of the trainable meta-atoms, and change their bias voltages by small random values one by one. Immediately after the bias voltage of a meta-atom is changed, the change of the output energy distribution is measured by the receiving antennas, and at the same time the change of error is calculated by Eq. (1), as shown in Fig. S1. If the error decreases, which means that the bias voltage change of the meta-atom makes the current output EM distribution closer to the desired one, the new bias voltage is kept; otherwise the bias voltage is restored to its value before the change. In our multi-beam focusing task, PAIM could focus the electromagnetic energy on the desired positions after about 500 iterations.
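The accept-or-revert loop described here can be sketched as below. The hardware measurement is replaced by a callable `measure_error`, and the step size, voltage range and toy error function are assumptions for illustration; only the 20% selection and the keep-if-better rule come from the text.

```python
import numpy as np

def train_on_site(measure_error, voltages, n_iters, rng,
                  fraction=0.2, step=0.05, v_min=0.0, v_max=1.0):
    """Random-perturbation training: keep a voltage change only if the error drops."""
    voltages = np.array(voltages, dtype=float)      # work on a private copy
    err = measure_error(voltages)
    n_pick = max(1, int(round(fraction * voltages.size)))
    for _ in range(n_iters):
        # Choose 20% of the meta-atoms and perturb them one by one.
        for idx in rng.choice(voltages.size, size=n_pick, replace=False):
            old = voltages.flat[idx]
            voltages.flat[idx] = np.clip(old + rng.uniform(-step, step), v_min, v_max)
            new_err = measure_error(voltages)       # stands in for the receiver feedback
            if new_err < err:
                err = new_err                       # improvement: keep the new voltage
            else:
                voltages.flat[idx] = old            # no improvement: restore old voltage
    return voltages, err

# Toy stand-in for the measured error: distance to a hidden target configuration.
rng = np.random.default_rng(0)
target = rng.uniform(0.0, 1.0, 16)
err_fn = lambda v: float(np.sum((v - target) ** 2))
v0 = np.full(16, 0.5)
v_trained, final_err = train_on_site(err_fn, v0, n_iters=50, rng=rng)
```

Since a change is kept only when the measured error decreases, the error is non-increasing by construction, which is what makes the scheme usable without a model of the environment.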
One advantage of the reinforce learning is the result-oriented strategy, in which we do not need to worry about the accuracy of simulations or other factors that could make the designed parameters deviate from the measurement results.Hence, it enables PAIM to deal with very complicated and unknown environments, broadening the application range of PAIM.

Supplementary Note 3. Forward Propagation Model.
Figure 1b demonstrates the 2D structure of the PAIM model, and a more detailed version is illustrated in Fig. S2. We use $\mathbf{E}_i$, $i = 0, 1, 2, \ldots, M$, to represent the complex electric field illuminating on the $i$-th PAIM layer, which is an $N$-dimensional vector ($N$ is the number of meta-atoms in a layer), and each element indicates the field received by the corresponding meta-atom. $\mathbf{E}_{M+1}$ is the complex field on the output plane (output layer). Thus, the length of $\mathbf{E}_{M+1}$ depends on the number of receiving antennas or the sampling number of the moving probe.
$\mathbf{T}_i$, $i = 0, 1, 2, \ldots, M$, represents the complex transmission coefficients of the $i$-th layer, which is also an $N$-dimensional vector, and each element in this vector corresponds to the complex transmission coefficient of each meta-atom in the $i$-th layer. Then the forward propagation formula can be written as:
$$\mathbf{E}_{i+1} = \mathbf{W}_i (\mathbf{T}_i \circ \mathbf{E}_i), \quad i = 0, 1, 2, \ldots, M, \tag{2}$$
in which $\circ$ is the Hadamard product and $\mathbf{W}_i$ represents the space attenuation coefficients from the $i$-th layer to the $(i+1)$-th layer. In fact, $\mathbf{W}_i$, $i = 0, 1, 2, \ldots, M-1$, is an $N \times N$ dimensional matrix, and its element in the $p$-th row and $q$-th column represents the space attenuation coefficient from the $q$-th meta-atom in the $i$-th layer to the $p$-th meta-atom in the $(i+1)$-th layer.
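The recursion of Eq. (2) translates directly into a few lines of NumPy. The sketch below uses random complex matrices as placeholders for the propagation coefficients and transmission coefficients; the sizes (64 meta-atoms per layer, five layers, four receivers) mirror the fabricated sample but are otherwise illustrative, and the last matrix is taken as $K \times N$ to map onto the receivers.

```python
import numpy as np

def forward(E0, T, W):
    """Apply E_{i+1} = W_i (T_i o E_i) for i = 0..M and return E_{M+1}."""
    E = np.asarray(E0, dtype=complex)
    for T_i, W_i in zip(T, W):
        E = W_i @ (T_i * E)      # Hadamard product with T_i, then propagation by W_i
    return E

rng = np.random.default_rng(0)
N, K, M = 64, 4, 4               # meta-atoms per layer, receivers, layers indexed 0..M

def random_complex(*shape):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

T = [random_complex(N) for _ in range(M + 1)]                   # placeholder coefficients
W = [random_complex(N, N) for _ in range(M)] + [random_complex(K, N)]
E0 = np.ones(N, dtype=complex)   # uniform plane-wave illumination (assumed)

E_out = forward(E0, T, W)
energy = np.abs(E_out) ** 2      # what the receivers would measure
```

The model is linear in the input field, so scaling `E0` scales `E_out` by the same factor; nonlinearity enters only through the energy readout.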
The matrix partial derivatives in Eq. (3) use the numerator layout, and diag means matrix diagonalization. For different tasks, we define different error functions, symbolized by $\mathrm{Error}(\mathbf{E}_{M+1})$.

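For the classification tasks, we use the cross-entropy error function, and for other tasks, we use the mean square error function. Applying the chain rule, the final gradient formula can be written as:

$$\frac{\partial\,\mathrm{Error}}{\partial \mathbf{a}_{M-L}} = \frac{\partial\,\mathrm{Error}}{\partial \mathbf{E}_{M+1}} \frac{\partial \mathbf{E}_{M+1}}{\partial \mathbf{a}_{M-L}}, \qquad \frac{\partial\,\mathrm{Error}}{\partial \boldsymbol{\Phi}_{M-L}} = \frac{\partial\,\mathrm{Error}}{\partial \mathbf{E}_{M+1}} \frac{\partial \mathbf{E}_{M+1}}{\partial \boldsymbol{\Phi}_{M-L}}.$$

When all gradients are calculated, the method of gradient descent is used to optimize $\mathbf{a}_i$ or $\boldsymbol{\Phi}_i$ and minimize the error. Several methods could be used to accelerate the optimization process, such as organizing the computational process in matrix form or using parallel algorithms.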
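As a sanity check on the chain rule, the sketch below computes the gradient of a mean-square-type error with respect to the amplitudes of a single trainable layer and compares it with central finite differences. It is a simplified illustration rather than the authors' implementation: one trainable layer followed by a fixed matrix `P`, a sum-of-squares error on the output energies, and random placeholder data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 8, 3
P = rng.normal(size=(K, N)) + 1j * rng.normal(size=(K, N))   # fixed downstream propagation
E_in = rng.normal(size=N) + 1j * rng.normal(size=N)          # field arriving at the layer
phi = rng.uniform(0, 2 * np.pi, N)                           # phase part (held fixed here)
a = rng.uniform(0.5, 1.5, N)                                 # amplitude part (trainable)
target = rng.uniform(0, 1, K)                                # desired output energies

def error(a):
    E_out = P @ (a * np.exp(1j * phi) * E_in)
    return np.sum((np.abs(E_out) ** 2 - target) ** 2)

def grad_a(a):
    # dE_out/da = P diag(E_in o exp(j*phi)); column scaling implements the diag product.
    J = P * (np.exp(1j * phi) * E_in)
    E_out = J @ a
    residual = np.abs(E_out) ** 2 - target
    # Chain rule through |E|^2 for a real-valued parameter vector.
    return 4 * np.real(J.conj().T @ (residual * E_out))

eps = 1e-6
numeric = np.array([(error(a + eps * e) - error(a - eps * e)) / (2 * eps)
                    for e in np.eye(N)])
analytic = grad_a(a)
```

The same pattern extends to deeper layers by folding all downstream factors into `P`, which is where organizing the computation in matrix form pays off.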
$\mathbf{W}_i$ will deviate from the initial one, which would bring difficulties to parameter design in the training process. Thus, we need to calibrate the value of $\mathbf{W}_i$ before starting the training process.
Firstly, we acquire 5000 experimental samples. For each sample, we randomly set the bias voltages on all meta-atoms in PAIM and measure the output field distribution using a specially designed 8×8 receiving antenna array. We remark that the number of experimental samples should not be small; otherwise the parameters will overfit the data set, which reduces the accuracy of simulations. Secondly, we use the gradient descent method to correct the value of $\mathbf{W}_i$. Before introducing the detailed approach, we need to define some symbols. The definitions of $\mathbf{E}_i$, $\mathbf{T}_i$ and $\mathbf{W}_i$, $i = 0, 1, 2, \ldots, M$, are the same as those in Eq. (2), and $\mathbf{E}_{M+1}$ represents the calculated complex field value on the output plane when $\mathbf{E}_0$ and $\mathbf{T}_i$, $i = 0, 1, 2, \ldots, M$, are given. Here, $\mathbf{E}_0$ is obtained by measuring the EM fields from the transmitting antenna, and $\mathbf{T}_i$, $i = 0, 1, 2, \ldots, M$, is set according to the bias voltage of each meta-atom in the $i$-th layer. In the update formulas, $*$ means the complex conjugate transpose, $\mathbf{A}_{M-L}$ and $\mathbf{B}_{M-L}$ are auxiliary quantities, the case $L = 0$ gives the update for $\boldsymbol{\rho}_M$ and $\boldsymbol{\varphi}_M$, and $l$ indicates the learning rate for updating, whose initial value could be set to $3 \times 10^{-4}$. The learning rate needs to be gradually decreased during the calibration process.
Step 5: Repeat Step 2 to Step 4 until the current error is less than the pre-set value or the maximum number of iterations is reached, then output $\mathbf{W}_i$, $i = 0, 1, 2, \ldots, M$, as the calibration results.
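The intermediate steps of the calibration are not recoverable from this copy, so the following is only a simplified stand-in for the idea of correcting a propagation matrix by gradient descent with a decaying learning rate. It assumes a single propagation matrix, noise-free complex field measurements on both sides of it, and a least-squares error; the paper's procedure works on the whole layer stack and its own parameterization, which this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(3)
N, S = 8, 200                        # meta-atoms per layer, number of measured samples

def crandn(*shape):
    """Complex Gaussian entries with unit variance."""
    return (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / np.sqrt(2)

W_true = crandn(N, N)                # propagation matrix of the hardware (unknown in practice)
X = crandn(N, S)                     # fields leaving the layer for S random bias settings
Y = W_true @ X                       # corresponding measured fields on the next plane

W = W_true + 0.3 * crandn(N, N)      # initial model deviates from the hardware
lr = 0.5                             # illustrative value for this toy problem
for _ in range(500):
    residual = W @ X - Y
    W -= lr * residual @ X.conj().T / S   # gradient of the mean squared residual
    lr *= 0.999                           # gradually decrease the learning rate

calibration_error = np.linalg.norm(W - W_true)
```

Using many more samples than unknowns per row (200 versus 8 here) keeps the fit well conditioned, which echoes the remark that the sample count should not be small.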
The comparison between the measured and simulated energy distributions after the calibration is shown in Fig. S14. We observe that the high-energy regions in the measured and simulated field distributions agree very well with each other.

Fig. 1 (continued) | b, A meta-atom in the learning layer will receive the waves radiated from all meta-atoms in the former layer, making the PAIM structure a fully connected network. The transmission coefficient of each meta-atom can be trained by using supervised/unsupervised learning or even reinforcement learning methods to achieve various functions. The first layer acts as the input layer by using pre-set transmission coefficients to encode the input information into the spatial distribution of EM energy. c, The transmitted wave of a meta-atom, multiplied by propagation factors, illuminates all meta-atoms in the next layer. Then the EM wave is multiplied by the complex-valued transmission coefficient to act as the secondary source of wave. d, The radiation pattern of a meta-atom. e-g, Two testing examples for oil painting classification, in which (e) and (f) are the original pictures and their corresponding normalized visualization images of amplitude transmission coefficients in the input layer, and (g) illustrates the energy distributions in the output plane, demonstrating that the two pictures are successfully classified.

Fig. 4 (continued) | Here, the distributions of input fields in (b) and (c) are randomly generated but remain unchanged during the reinforce-learning process. We observe that the energies of the output fields are gradually focused on the target points as the updating procedure goes on.

Fig. 1 | PAIM: a reprogrammable D²NN platform. a, An array of programmable metasurfaces is used to construct PAIM, in which several FPGAs are installed to control the gain factor of each meta-atom, making PAIM a real-time and re-trainable intelligent machine. b, The schematic diagram of PAIM. A meta-atom in the learning layer will receive the waves radiated from all meta-atoms in

Fig. 3 | Experimental results of encoder and decoder in the CDMA task using PAIM. a, The energy distributions of all layers when radiating the coding information of the fourth user. The input layer acts as an encoder to transform the user code to the energy distribution in the free space. The yellow and black dots in the input layer represent the binary digits '1' and '0', respectively. PAIM receives the spatial energy distribution and decodes it by four metasurface layers to judge which user code has been transmitted. The input patterns of the four user codes are presented in Supplementary Fig. S11. b, Photograph of one of the fabricated metasurface layers, which is controlled by altogether

Fig. 4 | Experimental results of dynamic multi-beam focusing by the on-site reinforce-learning process using PAIM.a, The on-site reinforce-learning process of PAIM, in which the transmission coefficients of each PAIM layer are continually controlled by FPGA according to the real-time

$\mathbf{W}_M$ connects the last PAIM layer and the output layer, and hence it is a $K \times N$ matrix, where $K$ is the number of receiving antennas or the sampling number of the moving probe. The schematic diagram of EM-wave propagation between adjacent layers and the radiation pattern of one meta-atom are demonstrated in Fig. 1c, d. The symbols $W_n$, $n = 1, 2, 3, 4, \ldots$, in Fig. 1d are different from the matrix $\mathbf{W}_i$ in Eq. (2). In fact, $W_n$, $n = 1, 2, 3, 4, \ldots$, constitute one of the columns in $\mathbf{W}_i$. Using the hierarchical formula (2), we get the final output field once the input field ($\mathbf{E}_0$) is given.

Supplementary Note 4. Error Backpropagation.
Eq. (2) presents the forward propagation model of PAIM, which is a linear model. We use $\mathbf{a}_i$ and $\boldsymbol{\Phi}_i$ to represent the amplitude and phase parts of $\mathbf{T}_i$, $i = 0, 1, 2, \ldots, M$, respectively. The gradients of $\mathbf{E}_{M+1}$ with respect to $\mathbf{a}_{M-L}$ and $\boldsymbol{\Phi}_{M-L}$, $L = 0, 1, 2, \ldots, M$, can be easily obtained by matrix multiplication using the chain rule of derivatives (Eq. (3)).
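Eq. (3) is not legible in this copy. The derivation below is a sketch obtained directly from Eq. (2), assuming the amplitude-phase form $\mathbf{T}_i = \mathbf{a}_i \circ e^{j\boldsymbol{\Phi}_i}$ and the numerator layout; the auxiliary matrix $\mathbf{P}_L$ is introduced here for brevity and is not the authors' notation.

```latex
% Unrolling Eq. (2) from layer M-L up to the output plane:
\mathbf{E}_{M+1} = \mathbf{P}_L \,(\mathbf{T}_{M-L} \circ \mathbf{E}_{M-L}),
\qquad
\mathbf{P}_L = \mathbf{W}_M\,\mathrm{diag}(\mathbf{T}_M)\,\mathbf{W}_{M-1}\,\mathrm{diag}(\mathbf{T}_{M-1})
\cdots \mathbf{W}_{M-L+1}\,\mathrm{diag}(\mathbf{T}_{M-L+1})\,\mathbf{W}_{M-L},
\qquad \mathbf{P}_0 = \mathbf{W}_M .

% With T_{M-L} = a_{M-L} \circ e^{j\Phi_{M-L}}, and E_{M-L} independent of layer M-L:
\frac{\partial \mathbf{E}_{M+1}}{\partial \mathbf{a}_{M-L}}
  = \mathbf{P}_L \,\mathrm{diag}\!\left(\mathbf{E}_{M-L} \circ e^{j\boldsymbol{\Phi}_{M-L}}\right),
\qquad
\frac{\partial \mathbf{E}_{M+1}}{\partial \boldsymbol{\Phi}_{M-L}}
  = \mathbf{P}_L \,\mathrm{diag}\!\left(j\,\mathbf{T}_{M-L} \circ \mathbf{E}_{M-L}\right).
```

Both Jacobians are products of the same downstream factors, so they can be accumulated layer by layer while moving backwards from the output plane.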

Various kinds of deep learning optimizers can be used, such as Adam [39].

the $p$-th meta-atom in the $i$-th layer, whose complex transmission coefficients are represented as $\mathbf{T}_i$. The procedure of the optimization algorithm is given as follows (optimizing $a_i^p$ as an example).
Step 1: Initialize all parameters of $\mathbf{T}_i$, $i = 0, 1, 2, \ldots, M$, by discretizing the uniform random distribution.
Step 2: Calculate the output electric fields
Step 3: Initialize the update step value of $a_i^p$ randomly, which is expressed by the difference of adjacent discrete values. Because of the linearity of our network, if the value of

After the automatic training, PAIM could focus the output EM waves onto the target point(s) with more than 90% of the energy concentrated, and the focusing point(s) could shift at a frequency of 100 Hz to realize multi-beam scanning. In this application, benefiting from the real-time updates of parameters in PAIM, no extra training dataset is needed. The experimental results indicate that PAIM could adapt to different kinds of EM environments.