Architecture of the perception enhancement system
Here, we propose a smart metasurface system architecture that combines multiple techniques, as outlined in Fig. 1. Eye tracking glasses, a wearable solution for capturing objective measures of cognitive workload and viewing behavior, are well suited to providing reliable oculomotor data at the required spatial and temporal resolutions. When the user wears the eye tracker and runs the program, the relevant eye movement data, including gaze points, saccades, and blinks, can be extracted. A data processing program then processes the raw eye movement data. Through analysis of the area of interest (AOI) and the state of the subject (e.g., attention), areas that need to be tracked or perceptually enhanced are identified, laying the foundation for control of the programmable metasurface. It is worth mentioning that in some demos a wearable display is attached to the eye tracker, so that the observer can see the processed microwave-band information while still looking at the target in front. Unlike the traditional means of displaying microwave information on a PC monitor, the extra screen on the eye tracker presents microwave information without requiring the observer to take their eyes off the surrounding world.
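To illustrate how raw gaze samples might be turned into fixation events for AOI analysis, the following minimal Python sketch implements a dispersion-threshold (I-DT) fixation detector. The thresholds and data layout are assumptions for illustration only, not the parameters of our processing program.

```python
import numpy as np

def detect_fixations(gaze_xy, t, max_dispersion=0.02, min_duration=0.1):
    """Dispersion-threshold (I-DT) fixation detection sketch.

    gaze_xy: (N, 2) gaze points in normalized screen coordinates;
    t: (N,) timestamps in seconds. Threshold values are illustrative.
    Returns a list of (start_time, end_time, centroid) tuples.
    """
    fixations, start = [], 0
    for end in range(1, len(t)):
        window = gaze_xy[start:end + 1]
        # dispersion = (max_x - min_x) + (max_y - min_y) over the window
        dispersion = (window.max(0) - window.min(0)).sum()
        if dispersion > max_dispersion:
            if t[end - 1] - t[start] >= min_duration:
                fixations.append((t[start], t[end - 1],
                                  gaze_xy[start:end].mean(0)))
            start = end
    # emit the trailing fixation, if any
    if t[-1] - t[start] >= min_duration:
        fixations.append((t[start], t[-1], gaze_xy[start:].mean(0)))
    return fixations
```

Each detected fixation centroid can then be tested against the AOI boundaries to decide which region the subject is attending to.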
Programmable metasurfaces are characterized by digital coding, in which the EM responses are manipulated by digital-coding sequences. Varactors are integrated into the metasurface unit cells; as the voltage applied to each diode changes, the digital-coding sequences change accordingly. With the aid of a field programmable gate array (FPGA) and a digital-to-analog conversion (DAC) module, a large range of control voltages for the selected varactors can be generated. Therefore, in accordance with the AOI and the subject's state, the algorithm autonomously determines the reaction of the metasurface and instructs the FPGA to change the metasurface configuration, thereby setting the digital-coding sequences. In this way, significantly distinct functions of the programmable metasurface can be invoked to facilitate beam splitting, scanning, scattering and other smart features.
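A minimal sketch of this control chain, from a column coding sequence to the FPGA/DAC board, is given below. The state-to-voltage table uses the four bias voltages adopted later in the paper, but the serial protocol, frame format and port name are hypothetical placeholders; the actual FPGA interface is hardware-specific.

```python
# Minimal sketch of the gaze-state-to-metasurface control chain.
# The serial link and frame format are hypothetical placeholders.
import serial  # pyserial

# Assumed mapping from 2-bit digital states to varactor bias voltages (V)
STATE_TO_VOLTAGE = {"00": 2.0, "01": 6.0, "10": 9.0, "11": 23.0}

def send_coding_sequence(states, port="/dev/ttyUSB0", baud=115200):
    """Translate a column coding sequence into bias voltages and push
    them to the FPGA/DAC board over a (hypothetical) serial link."""
    voltages = [STATE_TO_VOLTAGE[s] for s in states]
    frame = ",".join(f"{v:.1f}" for v in voltages).encode() + b"\n"
    with serial.Serial(port, baud, timeout=1) as link:
        link.write(frame)

# Example: a period-4 gradient sequence across 36 columns
send_coding_sequence(["00", "01", "10", "11"] * 9)
```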
On this basis, visual information in the optical frequency band is organically combined with information in the microwave frequency band. Observers make basic decisions with the help of the observed optical information, and further mine more useful information in the microwave frequency band. This makes humans more perceptive through the receiving and processing of echoes, and a rich set of functions has been developed. In this part, physiological state detection of multiple targets, including respiration and heartbeat, is carried out, followed by human location and motion detection under obstruction, target tracking based on visual information, and a healthcare system for deaf people.
Autonomous beam control following eye movements
In this scenario, we propose an eye-movement-based reflection beam steering scheme in which the direction of the irradiated EM waves changes with the eye movement captured by the eye tracker. In this method, autonomous and flexible beam control is realized, which lays the foundation for microwave detection.
Here, we designed a reflective 2-bit digital programmable coding metasurface. Two varactors are integrated into each unit cell, the structure of which is illustrated in Supplementary Note S1. By tuning the biasing voltage of the varactors, the phase response of the unit cell can be accurately tailored within a wide phase range. To obtain a complete picture of the unit cell reflection spectra, the reflection magnitude and phase of the unit cell under different bias voltages are simulated using CST Microwave Studio, as shown in Fig. 2(a) and (b). The equivalent RLC circuit of the varactors used in the simulation is illustrated in Supplementary Note S2. The reflection minima blueshift as the bias voltage changes from 2 V to 23 V, and high reflection efficiency is observed in all states across the operating band. Furthermore, a large phase range covering almost −180° to 180° with good linearity is observed around 4 GHz. This enables beam steering of the coding metasurface based on the generalized Snell's law and phased-array antenna theory (Supplementary Note S3). Four bias voltages (2 V, 6 V, 9 V, and 23 V) are adopted to meet the 90° phase-shift requirement, making the unit cells generate the four digital states "00", "01", "10", and "11" and thus constituting a 2-bit digital metasurface. The pattern and angle of reflection and scattering can be tuned by changing the coding sequence, coding period and relative phase shift of adjacent elements. The designed metasurface consists of 36×12 unit cells with a total area of 550×400 mm². Schemes with different coding sequences, coding periods and relative phase shifts are illustrated in Fig. 2(c)-(h). It can be observed that the proposed 2-bit digital metasurface achieves beam reflection, deflection, splitting and diffusion by adjusting the coding sequence. In our demo, the scattering patterns and beam angles are designed to irradiate the AOIs.
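As a worked example of the steering relation above, the sketch below computes the anomalous reflection angle from the generalized Snell's law under normal incidence and generates the corresponding 2-bit gradient coding sequence. The 15 mm element pitch is an assumption inferred from the stated aperture size, not a value given in the paper.

```python
import numpy as np

def deflection_angle(wavelength, element_pitch, period_cells):
    """Anomalous reflection angle from the generalized Snell's law for a
    gradient coding sequence under normal incidence:
        sin(theta) = wavelength / (period_cells * element_pitch)
    """
    s = wavelength / (period_cells * element_pitch)
    if abs(s) > 1:
        raise ValueError("evanescent: no propagating deflected beam")
    return np.degrees(np.arcsin(s))

def gradient_coding(num_cells, period_cells, bits=2):
    """2-bit gradient sequence ('00'..'11') covering 0..2*pi per period."""
    levels = 2 ** bits
    idx = (np.arange(num_cells) * levels // period_cells) % levels
    return [format(i, f"0{bits}b") for i in idx]

# Assumed numbers for illustration: 4 GHz carrier, ~15 mm element pitch
wl = 3e8 / 4e9                          # wavelength ~75 mm
print(deflection_angle(wl, 15e-3, 8))   # 8-cell period -> ~38.7 deg
print(gradient_coding(12, 8))
```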
Next, we combine the eye movement process with the metasurface to realize autonomous EM beam control. In our first experiment, we investigated scenarios in which the eyes look at targets in different positions, or at multiple symmetrically placed targets within a short period of time. Eye movement controls the steering of the EM beam reflected by the metasurface with the help of the eye movement data processing program, the FPGA and the DAC module. The principle and method of eye tracking are illustrated in Supplementary Note S4. When the gaze dwells on an AOI beyond the time threshold, the reflected beam turns to the angle corresponding to the line of sight, where the reflected angle ranges from −36° to 36°, corresponding to the maximum viewing angle of the eye tracker. The threshold requires subjects to fixate the AOI continuously for 1 second. Since the phase range of the unit cell covers almost −180° to 180°, any angle in one-dimensional space can theoretically be realized. In multi-object situations, when subjects focus on two symmetrical targets within a period, that is, when they fixate both symmetrical AOIs for more than 1 second each within 2.5 seconds, the reflected wave is split into two symmetrically oriented beams. In both scenarios, the EM beam is irradiated in the direction of the subject's interest. Otherwise, if there is no clear gaze point over a period, that is, the observer finds no clear target or rests with eyes closed, the metasurface is programmed to generate diffuse scattering; the reflected beam is diffused so as to obtain microwave-band information over a wider range of directions.
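A minimal sketch of this gaze-driven decision logic is given below, assuming the 1 s dwell and 2.5 s pairing thresholds stated above. The angle tolerance for judging symmetry and the command names are illustrative choices, not our exact implementation.

```python
import time

DWELL_S, PAIR_WINDOW_S = 1.0, 2.5   # thresholds from the experiment

class GazeBeamController:
    """Sketch of the gaze-to-beam decision logic; the returned commands
    (steer/split/diffuse) stand in for the actual FPGA instructions."""
    def __init__(self):
        self.recent_locks = []  # (timestamp, angle) of completed dwells

    def on_dwell_complete(self, angle):
        """Called when an AOI has been fixated for longer than DWELL_S."""
        now = time.time()
        self.recent_locks = [(t, a) for t, a in self.recent_locks
                             if now - t <= PAIR_WINDOW_S]
        self.recent_locks.append((now, angle))
        # two near-symmetric AOIs fixated within the window -> split beam
        for t, a in self.recent_locks[:-1]:
            if abs(a + angle) < 2.0:   # assumed tolerance about broadside
                return ("split", abs(angle))
        return ("steer", angle)

    def on_no_gaze(self):
        """No clear gaze point (or eyes closed) -> diffuse scattering."""
        return ("diffuse", None)
```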
The experiment environment and system are illustrated in Supplementary Note S5. The experimental results are depicted in Fig. 3, which shows the gaze points and eye images (upper half) together with the measured far-field scattering patterns (lower half). Using this control method, we successfully combine the field of view with the microwave radiation of the metasurface, compensating for the limitations of visual perception and laying the foundation for microwave detection.
"X-ray Eyes"
We illustrated the autonomous beam control method following eye movements in the previous section. On this basis, a rich set of functions is developed in this section. To improve humans' perceptual ability, an efficient and smart perception enhancement route named "X-ray Eyes" is proposed, associating visual information with human physiological information through the metasurface.
In this part, two application schemes are carried out to verify the effectiveness of the design. The first experiment is the detection of the respiration and heartbeat signals of multiple human targets in free space. The observer perceives not only visual images of the targets, but also their invisible physiological signals such as respiration and heartbeat. The second experiment is the location and motion detection of human subjects behind obstacles. The proposed method detects not only the exact azimuth of hidden personnel but also their movement patterns. Physiological signal detection and hidden location and motion detection provide a new complement to the perception of personnel beyond visual information. The method offers an efficient approach to stronger multi-type information acquisition in a variety of application environments.
(a) Respiration and heartbeat detection for multiple visible targets
Herein, two volunteers were recruited for the experiment, with the separation angle in the azimuth direction set to ±36°. Each of them wore a piezoelectric respiratory belt and disposable ECG electrodes connected to a portable electrocardiograph to record the reference respiration and ECG signals. On the basis of autonomous beam control following eye movements, the observer sequentially gazed at each subject beyond the time threshold in order to make the EM waves irradiate that subject. The detailed experimental configuration is illustrated in Supplementary Note S8(a).
When the person to be detected breathes, parts of the body, mainly the chest wall, move periodically according to the breathing pattern39–41. Moreover, the mechanical deformation caused by the beating heart is also transmitted to the chest wall, superimposed on the micro-movements caused by respiration. EM waves radiated onto the surface of the human body sense this faint micro-motion of the chest wall, which is recorded by the receiving antenna. The superposed micro-motion signal is then demodulated from the phase of the echoes using a phase-unwrapping signal processing method. After that, by applying finite impulse response (FIR) bandpass filtering, the respiration and heartbeat signals of the subject can be obtained. The theory of respiration and heartbeat detection is illustrated in Supplementary Note S6. Since the main lobe of the EM wave reflected by the metasurface faces only one subject at a time, the respiration and heartbeat signals of multiple targets are obtained in turn.
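The demodulation chain described above can be sketched as follows, assuming complex baseband echo samples as input. The passbands (~0.1-0.5 Hz for respiration, ~0.8-2.0 Hz for heartbeat) and the filter length are typical values from the vital-sign radar literature, not necessarily the exact parameters used here.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

def vital_signs(iq, fs):
    """Separate respiration and heartbeat from the unwrapped echo phase.

    iq: complex baseband echo samples; fs: slow-time sampling rate (Hz).
    Band edges and filter order are assumed typical values.
    """
    phase = np.unwrap(np.angle(iq))        # chest-wall displacement proxy
    resp_fir = firwin(129, [0.1, 0.5], pass_zero=False, fs=fs)
    heart_fir = firwin(129, [0.8, 2.0], pass_zero=False, fs=fs)
    respiration = filtfilt(resp_fir, 1.0, phase)   # zero-phase filtering
    heartbeat = filtfilt(heart_fir, 1.0, phase)
    return respiration, heartbeat
```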
The experimental results of the two subjects' respiration and heartbeat detection are shown in Fig. 4. The red lines are the time-domain waveforms of the subjects' respiration and heartbeat signals collected using the vision-based metasurface platform, while the black lines are the reference signals detected by the respiration belt and the portable electrocardiograph. The results show that the respiration and heartbeat cycles match well with the reference signals for both subjects. Moreover, the respiration and heartbeat signals of subjects at different azimuth angles can be well separated at the physical level by changing the direction of the EM waves through eye movement.
Therefore, it can be concluded that the system effectively detects the time-varying respiration and heartbeat signals of multiple human subjects. Compared with traditional methods, only a single pair of transmitting and receiving antennas is used, which greatly reduces system complexity. Moreover, the system shows a strong ability to suppress environmental noise, clutter and multipath effects by directing the beam only toward the target of interest. Most importantly, through beam regulation the proposed method addresses the traditional challenge of accurate respiration and heartbeat detection when multiple targets coexist in the environment.
(b) Human location and motion detection behind plank obstacles
In this section, plank obstacles were placed between the observer and the subjects to block the observer's view. One or three recruited volunteers were asked to stand at one of three azimuths (−36°, 0°, 36°) behind the plank, and the observer fixated on each azimuth angle beyond the time threshold in sequence to determine whether a human subject was present at the specified location. The subject then performed one of the following motions (jumping, squatting, falling, or walking) at the specified azimuth. The detailed experimental configuration is illustrated in Supplementary Note S8(b).
The localization of human subjects is achieved by detecting the respiration signal. When the EM beam controlled by the eye tracker is directed toward a specific azimuth, a hidden human subject is present if a respiration signal can be detected; otherwise, no human subject is present at that azimuth. The theory and algorithms of human location and motion detection behind plank obstacles are illustrated in Supplementary Note S7.
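A minimal sketch of this presence test follows, assuming the unwrapped echo phase as input. The 6 dB margin over the out-of-band spectral floor is an assumed detection threshold, not the criterion specified in Supplementary Note S7.

```python
import numpy as np

def human_present(phase, fs, band=(0.1, 0.5), snr_db=6.0):
    """Declare a hidden subject present if the respiration band contains
    a spectral peak sufficiently above the out-of-band noise floor."""
    spec = np.abs(np.fft.rfft(phase - phase.mean())) ** 2
    freqs = np.fft.rfftfreq(len(phase), 1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    floor = np.median(spec[~in_band][1:])    # skip the DC bin
    return 10 * np.log10(spec[in_band].max() / floor) > snr_db
```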
The principle of motion detection can be described as follows: when the human subject performs different types of motions, the body parts involved produce characteristic micro-Doppler modulations on the incident EM waves. By applying short-time Fourier transform based time-frequency analysis to the collected scattered echoes, the time-varying instantaneous Doppler frequencies of different body parts can be obtained. Since different motions modulate the incident EM waves differently, the obtained micro-Doppler signatures show distinct distribution patterns in the spectrograms, which can thus be used to distinguish different motions. After the spectrogram is obtained, principal component analysis (PCA) is employed to extract the characteristic motion features, and a support vector machine (SVM) is used to classify the different motions. The establishment of the corresponding motion database is illustrated in Supplementary Note S8(c). After the construction of the motion database, the aforementioned motion signature representation, feature extraction and motion classification are implemented.
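The full pipeline, spectrogram representation followed by PCA feature extraction and SVM classification, can be sketched with standard scientific Python tools. The spectrogram parameters and the number of principal components are assumptions for illustration.

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def spectrogram_feature(iq, fs, nperseg=256):
    """Micro-Doppler spectrogram of complex echoes, flattened to a
    log-magnitude feature vector (two-sided for complex input)."""
    _, _, Z = stft(iq, fs, nperseg=nperseg, return_onesided=False)
    return np.log10(np.abs(Z) + 1e-12).ravel()

# X: stacked spectrogram features, y: motion labels from the database
# (jumping / squatting / falling / walking); both assumed available.
clf = make_pipeline(StandardScaler(), PCA(n_components=30),
                    SVC(kernel="rbf"))
# clf.fit(X_train, y_train); accuracy = clf.score(X_test, y_test)
```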
As depicted in Fig. 5(a)-(c), an apparent respiration signal is detected at the azimuth of 36°, indicating that a human subject exists in that direction, and the metasurface-measured respiration signal is in close accordance with that obtained using the contact respiration belt. No breathing is detected at the other two azimuths, in line with the actual experimental setup. Figure 5(d)-(f) show that respiration signals are detected in all three directions, meaning that human subjects are located in all three directions. The above results validate the human location detection method.
The human target at an azimuth angle of 36° is then chosen as an example to verify the motion detection performance. Figure 5(g)-(j) show the obtained micro-Doppler signatures of the four behind-the-plank human motions. For the in-place jumping motion, symmetrical micro-Doppler signatures appear on both the positive and negative hemi-axes, caused by the knees bending forward and the hands waving backward as the body jumps. The positive and negative Doppler frequency components of squatting are caused by the forward-and-up motion of one knee and the backward-and-down motion of the other knee, respectively; their amplitude is much smaller than that of the jumping motion, but their repetition frequency is higher. For the falling motion, a characteristic volcano-shaped positive Doppler envelope is observed in the spectrogram, caused by the large-scale forward fall of the torso; meanwhile, a small negative envelope appears, induced by the backward kick of the foot after the fall. The walking motion presents obvious alternating positive and negative Doppler frequencies, introduced by the EM wave modulation as the subject walks toward and away from the metasurface. Based on these feature representations, an SVM classifier is trained to classify and recognize the different behind-the-plank motions. Through training, the recognition accuracy of the model reached 93% for the four behind-the-plank human motions.
As with breath detection, when detecting human motion at a specific azimuth, the narrow beam emitted by the metasurface excludes interference from other positions in the detection space, making it capable of robust operation in real detection scenarios where multiple moving human targets coexist. By this means, the observer masters not only the optical information of the subject, but also information in other frequency bands. Human location and motion can be detected without the subjects being visually sighted by the observer; that is, the observer has a pair of "X-ray eyes".
"Glimpse-and-forget" metasurface smart target tracking system
When maintaining high-intensity work in complex scenes or performing multiple tasks, the visual system can selectively handle only a small part of the visual information from a large number of candidates. This restricts its capability to attend to and process the large amount of information simultaneously present within a visual scene. In this situation, the visual system may focus on one goal while ignoring others, leading to missed useful information. To address this limitation of the human visual system, a vision-based beam tracking method is proposed in this scenario to improve the efficiency of information acquisition and processing. Once a target is selected by the human visual system, the EM beam automatically tracks the target without further attention, while the visual system is freed to process other information instead of continuing to track the selected target. By this means, visual perception is enhanced with the assistance of the eye tracker, and the burden on the visual system is reduced, allowing more tasks to be performed at the same time.
To track the object over time, we follow the scheme of the Kanade-Lucas-Tomasi (KLT) algorithm to create a tracker that automatically follows a single object. The KLT tracker is commonly used for tracking the feature points of a target owing to its excellent processing speed and high accuracy; it is a classical algorithm widely applied in video stabilization and image mosaicing. By invoking the eye tracker's camera, the KLT algorithm tracks a set of feature points across consecutive video frames. The tracking procedure is as follows. First, the observer designates the object to be tracked by blinking quickly, and the eye movement data processing program places a bounding box around the object. Next, feature points are reliably extracted for the target object within the box, and the KLT tracker determines the motion of the target object by tracking these feature points across frames. Finally, the eye-tracker-to-metasurface interface program autonomously determines the reaction of the metasurface; specifically, the digital-coding sequences of the programmable metasurface are changed to keep the reflected beam turned to the angle corresponding to the tracked object.
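A minimal sketch of such a tracker using OpenCV's Shi-Tomasi corner detector and pyramidal Lucas-Kanade optical flow (the usual KLT building blocks) is shown below. The bounding-box interface and detector parameters are illustrative, not our exact implementation.

```python
import cv2
import numpy as np

def klt_track(video, bbox):
    """Track features inside a gaze-selected bounding box across frames;
    yields the feature centroid per frame (to be mapped to a beam angle)."""
    cap = cv2.VideoCapture(video)
    ok, frame = cap.read()
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    x, y, w, h = bbox
    mask = np.zeros_like(prev)
    mask[y:y + h, x:x + w] = 255      # restrict detection to the box
    pts = cv2.goodFeaturesToTrack(prev, maxCorners=50, qualityLevel=0.01,
                                  minDistance=5, mask=mask)
    while True:
        ok, frame = cap.read()
        if not ok or pts is None or len(pts) == 0:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev, gray, pts, None)
        pts = nxt[status.ravel() == 1].reshape(-1, 1, 2)  # keep good points
        prev = gray
        if len(pts):
            yield pts.reshape(-1, 2).mean(axis=0)         # target centroid
    cap.release()
```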
We consider a series of application scenarios in multi-target or complex environments, e.g., unmanned aerial vehicles (UAVs) and a departure hall. In the departure hall, crowds of people walk with luggage at railway stations or airports, which makes tracking difficult. Using the smart metasurface platform, observers can lock onto targets in large crowds without affecting the search for the next target, and microwave beams can then track them based on the information provided by the KLT tracker, recording the targets' locations and assisting in detecting their states. As another example, we may need to track slow-flying UAVs and jam them with microwave weapons in an outdoor environment. By blinking quickly, the programs can select, lock onto and track the gaze area where the UAV appears; the microwave beam is then steered to irradiate and track the UAV. Two intentional vulnerabilities of UAVs, jamming and GPS spoofing, can be exploited in this way: EM waves incident on UAVs can carry high-power noise or counterfeit GPS signals to hinder the navigation service or produce fake GPS positions. Therefore, it is possible to spoof or jam a civilian UAV and divert its flight trajectory from the predefined one without the user's notice.
Herein, we take the UAV scene as an example. The experiment scenario is illustrated in Supplementary Note S9. The experimental results are depicted in Fig. 6, which illustrates the relation between the locations of the UAV and the measured far-field scattering patterns. The EM beam automatically tracks and illuminates the UAV as it moves. Since the phase range of the unit cell covers almost −180° to 180°, any angle in one-dimensional space can theoretically be produced. At the reflection angle of 21.5°, the reflected beam is split into two beams to demonstrate the quick switching capability of the metasurface's operating modes. It can be concluded that the proposed "glimpse-and-forget" metasurface smart system can track a moving target, which verifies the effectiveness of the design.
Barrier-free speech acquisition and enhancement system for deaf people
Speech is an essential tool for information sharing between people. Deaf people face serious problems in accessing information due to their inherent difficulties with spoken language, which contributes to reduced social connectivity and is associated with increased morbidity and mortality. Therefore, improving speech perception for deaf people is of great significance.
In this scenario, a barrier-free speech acquisition and enhancement system is proposed to provide healthcare access and overcome communication barriers for deaf people based on the vision-driven metasurface platform, as shown in Fig. 7(a). A deaf person observes a target of interest that vibrates under a driving voice signal (a loudspeaker in this case). The reflected signal from the target is measured and analyzed, yielding the voice signal and the corresponding speech recognition results. With the help of the wearable display attached to the eye tracker, deaf people can see text messages corresponding to the voice signals projected before their eyes. In this way, sound wave information is converted into microwave information, and then into a visual signal directly visible to the human eye. The system breaks through the barrier between visual and auditory information, realizing the "visibility" of speech signals. It helps deaf people acquire speech information without barriers, so that they can communicate and watch dramas and movies, especially in noisy environments.
Traditional acoustic microphones and recognition algorithms are easily affected by environmental noise and multiple sound sources. The proposed system, however, is capable of excluding babble noise from other positions and empowers deaf people with the "cocktail party effect", that is, the ability of human hearing to extract a specific target sound source from a mixture of multiple sound sources and background noise in complex acoustic scenarios. Moreover, unlike other reported metasurface-based speech acquisition work, the system obtains an understandable, clear speech signal by analyzing the reflected EM waves, rather than a series of simple instructions.
Experiments were conducted to verify the effectiveness of our speech acquisition and enhancement system. The detailed experimental configuration is illustrated in Supplementary Note S10. The metasurface platform first directs the EM beam, guided by the eye tracker, toward the loudspeaker to detect its surface vibrations driven by the voice signal. The backscattered EM signals are collected using a receiving antenna connected to a vector network analyzer (VNA). The micro-displacement information of the audio signal is carried in the phase of the backscattered EM signal.
To recover the authentic audio signal, the minimum mean-square error (MMSE) short-time spectral amplitude (STSA) speech enhancement algorithm is applied to suppress noise and enhance the speech information. The theory and algorithm of speech acquisition and enhancement are illustrated in Supplementary Note S11. Figure 7(b) shows the waveform of a recorded audio clip played by the loudspeaker, an adult male saying "Good Morning. How are you doing" in a natural way. The raw microwave speech echo signals, disturbed by ambient noise, are recorded by the VNA, as shown in Fig. 7(c). On this basis, the MMSE-STSA algorithm is applied to obtain clean voice information, and the enhanced audio signal is further processed using a Python-based speech recognition module developed by Microsoft Corporation. The speech enhancement and recognition results are illustrated in Fig. 7(d) and (e). The recognition correctly reconstructs the text of the target audio source, which proves the effectiveness of our proposed metasurface-based speech acquisition and enhancement system.
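For reference, a compact sketch of the MMSE-STSA (Ephraim-Malah) gain computation is given below. It estimates the noise spectrum from the first few frames (assumed speech-free) and uses a maximum-likelihood a priori SNR in place of the usual decision-directed smoothing; both are simplifying assumptions relative to the full algorithm in Supplementary Note S11.

```python
import numpy as np
from scipy.signal import stft, istft
from scipy.special import i0e, i1e

def mmse_stsa(x, fs, noise_frames=10, nperseg=512):
    """Simplified MMSE-STSA enhancement of a noisy 1-D signal x."""
    f, t, X = stft(x, fs, nperseg=nperseg)
    mag, phase = np.abs(X), np.angle(X)
    # noise PSD from the leading frames, assumed to contain no speech
    noise_psd = np.mean(mag[:, :noise_frames] ** 2, axis=1, keepdims=True)
    gamma = np.maximum(mag ** 2 / (noise_psd + 1e-12), 1e-6)  # a posteriori SNR
    xi = np.maximum(gamma - 1.0, 1e-6)                        # a priori SNR (ML)
    v = xi / (1.0 + xi) * gamma
    # Ephraim-Malah gain; exponentially scaled Bessel functions (i0e, i1e)
    # absorb the exp(-v/2) factor and stay numerically stable for large v
    gain = (np.sqrt(np.pi * v) / (2.0 * gamma)) * (
        (1.0 + v) * i0e(v / 2.0) + v * i1e(v / 2.0))
    _, xhat = istft(gain * mag * np.exp(1j * phase), fs, nperseg=nperseg)
    return xhat
```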