A memristor-based analogue reservoir computing system for real-time and power-efficient signal processing

Reservoir computing offers a powerful neuromorphic computing architecture for spatiotemporal signal processing. To boost the power efficiency of the hardware implementations of reservoir computing systems, analogue devices and components—including spintronic oscillators, photonic modules, nanowire networks and memristors—have been used to partially replace the elements of fully digital systems. However, the development of fully analogue reservoir computing systems remains limited. Here we report a fully analogue reservoir computing system that uses dynamic memristors for the reservoir layer and non-volatile memristors for the readout layer. The system can efficiently process spatiotemporal signals in real time with three orders of magnitude lower power consumption than digital hardware. We illustrate the capabilities of the system using temporal arrhythmia detection and spatiotemporal dynamic gesture recognition tasks, achieving accuracies of 96.6% and 97.9%, respectively. Our memristor-based fully analogue reservoir computing system could be of use in edge computing applications that require extremely low power and hardware cost. Dynamic and non-volatile memristors can be used to create hardware-based reservoir and readout layers in artificial neural networks, providing a fully analogue signal processing chain for efficient data classification.

Reservoir computing offers a powerful neuromorphic computing architecture for spatiotemporal signal processing. To boost the power efficiency of the hardware implementations of reservoir computing systems, analogue devices and components-including spintronic oscillators, photonic modules, nanowire networks and memristors-have been used to partially replace the elements of fully digital systems. However, the development of fully analogue reservoir computing systems remains limited. Here we report a fully analogue reservoir computing system that uses dynamic memristors for the reservoir layer and non-volatile memristors for the readout layer. The system can efficiently process spatiotemporal signals in real time with three orders of magnitude lower power consumption than digital hardware. We illustrate the capabilities of the system using temporal arrhythmia detection and spatiotemporal dynamic gesture recognition tasks, achieving accuracies of 96.6% and 97.9%, respectively. Our memristor-based fully analogue reservoir computing system could be of use in edge computing applications that require extremely low power and hardware cost.
With the rapid development of artificial intelligence, artificial neural networks (ANNs) are becoming increasingly larger and deeper in scale, placing considerable strain on hardware based on the von Neumann architecture on which they are run 1,2 . To break the von Neumann bottleneck-the performance and energy limits imposed by having to transfer data between memory and processing units-brain-inspired neuromorphic computing based on devices such as memristors 3,4 could be used [5][6][7] . For example, computing-in-memory architecture with emerging resistive switching devices (including non-volatile memristors (NVMs)) has shown orders of magnitudes higher energy efficiency than traditional computing platforms, such as central processing units and graphics processing units 2 . This is primarily because parallel multiply-accumulate computations can be implemented in a purely analogue fashion via fundamental physics laws on a memristor array by making use of its analogue resistive switching characteristics.
Most of NVM-based ANN demonstrations have been feedforward networks, such as multilayer perceptron and convolutional neural network [8][9][10][11] , which have difficulty handling temporal tasks. ANNs with recursive connections-typically called recurrent neural networks (RNNs)-are designed to deal with temporal inputs [12][13][14] . However, Article https://doi.org/10.1038/s41928-022-00838-3 devices can be considered as special physical systems or nodes 36 that function as a reservoir and can further reduce both energy consumption and hardware area. However, these hybrid RC systems still require ADCs to convert analogue reservoir states into digital signals for the digital readout layer. Recently, NVM arrays have been used to create the analogue readout layer 32,35 . However, the systems reported so far still need digital units, such as ADCs and registers, between the reservoir module and readout module for data conversion and buffering. In fully analogue RC systems ( Fig. 1c) with analogue reservoir and readout layers, signals can be directly transmitted and processed throughout the system without any conversion and buffering. Such systems could perform real-time spatiotemporal signal processing with lower power consumption and hardware cost compared with digital or hybrid digital-analogue systems, but they have not yet been demonstrated.
In this Article, we report a memristor-based fully analogue RC system. The approach is based on two distinct types of memristor: dynamic memristors (DMs) that are used as physical nodes to construct parallel reservoirs, and NVM arrays that are used as the readout layer. We examine the relationship between the node features and system performance by carefully adjusting the hyperparameters. Two key node features-threshold and window-are found to have a large impact on system performance. The threshold represents the smallest input signal that is required to turn on the physical node, and the window characterizes the size of the hysteresis window that a physical node exhibits under the dual sweep of the input signal. By engineering DM-based reservoirs, our RC system can be used for real-time spatiotemporal signal processing with more than three orders of magnitude lower power consumption than a digital RC system equivalent. We evaluate the performance of the approach via arrhythmia detection and dynamic gesture recognition tasks, achieving accuracies of 96.6% and 97.9%, respectively.
training RNNs is challenging as the process suffers from exploding or vanishing gradient-related problems. They are also difficult to implement with NVM arrays in a fully analogue fashion because variations in NVM conductance and noise in analogue circuits can cause error accumulation during recursive computations, leading to large output errors.
An alternative approach is to replace the recursive network with a dynamic physical system [15][16][17] . This approach can circumvent the error accumulation problem in the recursive network as well as reduce hardware cost. A representative example of this idea is reservoir computing (RC) [18][19][20] , which was initially considered as a specialized variant of an RNN. Unlike a traditional RNN, the recursive network in the reservoir is fixed and its weights do not need to be changed during the training process, which enables the reservoir layer to be created with a specific physical system. In addition, the training process of RC only involves the weights connecting to the final readout layer, where low-cost training algorithms, such as linear regression, can be used.
Hardware implementations of RC systems can be roughly divided into three types: fully digital systems, hybrid systems and fully analogue systems. In fully digital RC systems (Fig. 1a), the reservoir layer and readout layer use digital components, such as field-programmable gate arrays or digital signal processors [21][22][23] . Although these systems can minimize noise, additional analogue-to-digital converters (ADCs) are usually needed to convert the analogue signals into digital inputs, which require power and introduce latency. These inefficiencies are worsened when floating-point-based approaches are used. To address these problems, hybrid RC systems, which have an analogue reservoir layer and a digital readout layer, can be used (Fig. 1b). These systems can directly receive analogue signals and process them in the reservoir, and they have been demonstrated using spintronic oscillators 24-26 , photonic modules [27][28][29] , nanowire networks 30-32 and memristors [33][34][35] . These

System architecture and device characteristics
The hardware implementation of a fully integrated dynamicmemristor-based reservoir computing (DM-RC) system is shown in Fig.  2a, which consists of a power module, microcontroller unit (MCU), reservoir module and readout module. The power module provides stable positive and negative voltage sources, and the MCU generates control signals and handles the data transfer. The reservoir module maps the features of low-dimensional temporal input signals to high-dimensional space. Such high-dimensional features (that are reservoir states) are usually linearly separable; therefore, they can be classified using a simple fully connected readout layer. The core of the reservoir module is a set of DMs that are connected to the printed circuit board (PCB) through a probe card. The DM used in this work has a cross-point structure with a material stack of Ti/TiO x /Pd (110/80/50 nm), and the cross-sectional transmission electron microscopy (TEM) image of the device is shown in Fig. 2a, left inset. Here 24 DMs are used in our system to form parallel reservoirs, and the memristive current-voltage (I-V) curves along with the optical image of these devices are shown in Fig. 2a, left. All the 24 DMs show similar I-V curves with a strong rectification feature, which offers the desired nonlinearity for designing the reservoir. More electrical characterization results of the DM are shown in Supplementary Fig. 1a,b. To build a complete reservoir module, peripheral circuits, such as amplifiers and multiplexers (MUXs), are employed to realize the mask process 33 (Methods) and signal transmission. One DM, along with its peripheral circuits, is treated as one nonlinear physical node, which we call a DM node; therefore, the entire reservoir module is composed of 24 DM nodes. The details of signal processing in a DM node are shown in Fig. 2b. First, the input signal is amplified by a factor A i through an input amplifier. Then, the amplified signal and its inverted counterpart are connected to a MUX, which is controlled by a special mask sequence with pulse width δ. The masked signal is added by an input voltage bias B i and then applied to the DM. After that, the output current signal of the DM is first converted to a voltage signal through a transimpedance amplifier (TIA) and then amplified by a factor A o through an output amplifier. The final output signal and its inverted counterpart are fed to the readout module. The function of the readout module is to perform a weighted summation of all the reservoir states to obtain the final system output. First, the input signal is amplified by a factor A i through the input amplifier. Then, the amplified signal and its inverted signal are connected to a MUX, which is controlled by a special mask sequence with pulse width δ. The masked signal is added by an input voltage bias B i and then applied to the DM. The output current of the DM is first converted to a voltage signal by a TIA and then amplified by a factor A o through the output amplifier. The final output signal and its inverted signal are fed to the readout module. c, Circuit diagram of the hardware RC system. The analogue input signals are directly fed into multiple parallel DM nodes with different masks. The output signals of the reservoir module are applied to the BLs of the NVM array in the form of differential pairs, to realize both positive and negative weights. The current-signal outputs from the SLs are collected by the integrators and the final output signals are sampled from the output of these integrators.
Article https://doi.org/10.1038/s41928-022-00838-3 The readout module includes four NVM chips, each one consisting of an array of 2,048 one-transistor-one-resistor (1T1R) cells. The NVM used in this work has a memristor material stack of TiN/TaO x /HfAlO y / TiN, where HfAlO y acts as the resistive switching layer and the TaO x layer serves as the thermal enhanced layer to improve the analogue switching characteristics 8,37 . The cross-sectional TEM image of the NVM device is shown in Fig. 2a, right inset. The I-V curves of NVM at different resistance states are shown in Fig. 2a, right, which exhibits good linearity. Such linear I-V characteristics ensure that the device has the same resistance under different input voltage levels, which allows us to directly use analogue voltages as inputs. More electrical characterization results of the NVM are shown in Supplementary Fig.  1c,d. The circuit diagram of the complete DM-RC system is shown in Fig. 2c. The analogue input signals are directly fed into the reservoir module that has multiple parallel DM nodes with different masks. The output signals of the reservoir module are applied to the bit lines (BLs) of the NVM array in the form of differential pairs to realize both positive and negative weights 38,39 . The current-signal outputs from the source lines (SLs) are collected by the integrators, whose output signals are sampled to yield the final result ( Supplementary Fig. 2). In this way, the DM-RC system completes the signal processing in a fully analogue fashion (Fig. 1c).

Hyperparameter analysis
To optimize the performance of the DM-RC system, four hyperparameters, namely, A i , B i , A o and δ, in the reservoir module are set to be externally adjustable. Figure 3a illustrates the experimental approach to explore the relationship between the DM node behaviour and system performance. The DM node behaviour is characterized by the inputoutput (I-O) curve measured from the voltage sweep, and the DM-RC system performance is measured by the normalized root mean square error (NRMSE) on a waveform classification task ( Supplementary Fig. 3). Different sets of hyperparameters are used when optimizing both DM node behaviours and system performance. Under each set of hyperparameters, the waveform classification results of the DM-RC system and the corresponding I-O curves of the DM nodes are recorded for subsequent analysis. Figure  To facilitate the analysis of the impact of the DM node's behaviour on the RC system performance, three key features, namely, threshold (T), slope (S) and window (W), are extracted from the normalized I-O curve.
A simplified DM node model is established using these three features, and the details of this model are described in Methods. Figure 3c shows the simulated I-O curves of the established DM node model under a set of simulation parameters, which are consistent with the experimental results. This result confirms that the established model can retain the key characteristics of the DM node. In the following, we then use it to further build the reservoir model and explore the relationship between the three node features and system performance. Through the analysis of both experimental and simulation results, we can then clarify the relationship between the DM node features and DM-RC system performance. Figure 3d,e shows the experimental and simulation results of system performance as a function of threshold and slope, respectively, where the value of window remains constant (W = 0.15). The results shown in Fig. 3d,e reveal the effect of node nonlinearity on the performance of the reservoir. From both experimental and simulation results, we can find that the system always achieves the optimal performance under certain threshold values. The experimental results (Fig. 3d) show that the system performs the best when the threshold is around 0.1. The simulation results (Fig. 3e) show a similar pattern, where the optimal performance region is around the threshold of 0.25. It is, hence, noted that there is a slight deviation between the optimal threshold values in the experimental and simulation results, which could be mainly attributed to the simplified DM node model that has only three parameters. For example, the output curve near the threshold changes gradually and also the I-O curve beyond the threshold is not completely linear, which are not captured in the simplified DM node model. A detailed analysis on the dependence of system performance on the node features T and S is presented in Supplementary Figs. 4 and 5. In general, the nonlinearity of the DM node (that is a hard sigmoid-like function) moves the feature points outside a high-dimensional cube to its edges or vertices, whereas the other points inside the cube remain stationary. To make them linearly separable, points in different classes would need to be moved to different vertices of the cube. By adjusting the node features T and S, the feature points before nonlinear transformation can be panned and zoomed in the high-dimensional space. Compared with S, the value of T directly controls the position of the feature points before nonlinear transformation and determines whether they can be linearly separated after nonlinear transformation. From the above analysis, we can see that threshold T is a key feature of the DM node that directly determines the DM-RC system performance.
Furthermore, the experimental and simulation results of system performance as a function of window size are shown in Fig. 3f,g, respectively, where the values of T and S remain constant. The results reflect the impact of the dynamic characteristics of DM node on the reservoir performance. Both results show that the system could achieve the optimal performance when the window size W, which-to some extent-represents the node memory capacity for the input signal, is neither too large nor too small. This can be qualitatively explained as follows: too weak node memory makes the reservoir unable to retain the temporal characteristics of the input signal, resulting in poor system performance; however, too strong memory could easily saturate the node state and make the reservoir lose the ability to process subsequent signals, which can also lead to poor system performance 16,33 . From a more intuitive point of view, a proper node memory would increase the distance between the points from different classes in the high-dimensional feature space, thereby improving the classification performance of the reservoir. A more detailed analysis of the dependence of system performance on node feature W is presented in Supplementary Fig. 6. All these results suggest that the DM-RC system performance can be optimized by carefully tuning the hyperparameters and node features.

Arrhythmia detection
To evaluate the performance of the DM-RC system on analogue signal processing, a typical benchmark temporal task of arrhythmic heartbeat detection is carried out using the MIT-BIH heart arrhythmia database 40 , which contains 30 min electrocardiogram recordings from 48 subjects. The training and testing processes for the DM-RC system are shown in Fig. 4a. First, the original electrocardiogram waveform is re-sampled at a frequency of 72 Hz and split into single heartbeats of 700 ms (that is, 50 time steps). Each heartbeat, labelled as healthy or arrhythmic according to the health status, is normalized so that its amplitude is in the interval [−1, 1] (Fig. 4b). In total, 10,000 different heartbeats are used as the dataset for this task, out of which 5,000 heartbeats are healthy and the remaining 5,000 are arrhythmic. Then, those samples in the dataset are divided into two groups: 1,000 randomly selected samples for training and the remaining 9,000 samples for testing. The input signal is generated by a random combination of single heartbeats in the dataset, and the target signal Y target is generated according to the corresponding labels. Subsequently, the input signal is fed into the DM reservoir module, where a one-dimensional temporal signal is applied to 24 DM nodes in parallel after passing through different masks with a sequence length of 5 (Fig. 4c).
For the training process, the outputs of 24 DM nodes are sampled as the reservoir states (Fig. 4d). Since the mask sequence length is 5, each DM node can produce five reservoir states in one time step; therefore, the number of final reservoir states reaches 120 per time step.
Article https://doi.org/10.1038/s41928-022-00838-3 To improve the system robustness to noise (which is mainly derived from the stochastic nature of NVM devices), here a noise-aware linear regression method is used to train the output weight W out . We first combine the reservoir states at all the time steps to generate the state matrix X. Then, a random matrix N tr with uniformly distributed values between -σ and σ is added to X, where σ is defined as the noise level for training. Subsequently, the weight matrix W out can be calculated by W out = Y target (X + N tr ) T ((X + N tr )(X + N tr ) T ) † , where symbol ' †' represents the Moore-Penrose pseudo-inverse 33,41 . To verify the robustness of W out , a simulated test process is carried out, where another random matrix N te with uniformly distributed values between −0.04 and 0.04 is used to simulate the device noise. The noise level of 0.04 is estimated from the experimental data of NVM devices (Methods). In such a simulated test process, the system output Y can be calculated by Y = (W out + max(|W out |)N te )X, where the term max(|W out |)N te is used to quantify the noise of W out . A more detailed description of the noise-aware training method can be found in Supplementary Fig. 8. The dependence of training error and simulated test error on different σ values during the training process is shown in Fig. 4e. Evidently, as the noise level σ increases, the training error increases slowly whereas the simulated test error (both mean and variance) decreases rapidly. The result shows that the robustness of W out to noise can be improved by adding a certain noise perturbation during the training process. It is worth noting that as σ further increases, the training error continues to increase, which would eventually lead to an increase in the test error as well. Thus, in our experiment, the optimal σ value is set to be 0.06 to get the ideal W out value. The last step of the training process is to map the calculated W out to the device conductance of the NVM array. Here we use the differential conductance of two memristors to represent one element in W out , where the conductance of each device is programmed between 0 and 33 μS. A closed-loop programming scheme is applied here to write the device conductance to the target value (Methods).   The result of weight mapping is shown in Fig. 4f, which displays the distributions (120 × 2) of target conductance weights, mapped conductance weights and weight-mapping errors compared with the target values. Evidently, the trained weights are well mapped with relatively small errors owing to the excellent analogue switching characteristics of our memristor devices 42 .
For the testing process, the outputs of the 24 DM nodes are adjusted to a voltage range of 0 to 0.2 V through amplifiers and the corresponding negative voltage signals (~0 to −0.2 V) are then generated by the analogue inverters. The differential voltage signals are directly fed into the NVM array, on which the devices have been programmed to the appropriate conductance. Subsequently, the output currents of each SL in the NVM array are collected by an integrator. The integrator is reset at the beginning of each time step, and its output voltage is then sampled once at the end of each time step. The experimental results of the testing process are shown in Fig. 4g, where the output of the integrator is normalized to [0, 1]. Evidently, the output signal can effectively fit the target signal and generates a peak when an arrhythmic heartbeat occurs. Finally, a reference value K ref (Fig. 4g, dashed line) is used to obtain the overall classification accuracy of the DM-RC system. The optimal system performance is achieved when the hyperparameters are set to be A i = 1.0, B i = 2.4 V, A o = 2.2 and δ = 0.2 ms. The average classification accuracy obtained in our experiments is 96.6%, which is close to the software baseline of 98.2% achieved by a fully digital RC system (Fig. 4h). This result demonstrates the feasibility of using the DM-RC system to effectively process temporal signals with high accuracy.

Dynamic gesture recognition
Beyond arrhythmia detection, a more complex spatiotemporal task was implemented to verify the analogue computing ability of the DM-RC system. Here we demonstrate the real-time processing of spatiotemporal signals from a three-axis acceleration sensor to classify different hand gestures. The schematic of fully analogue computing with a DM-RC system for dynamic gesture recognition is shown in Fig. 5a. When the hand performs a certain gesture, the three-dimensional analogue signals from the acceleration sensor are directly fed into the DM-based reservoir module, out of which the generated reservoir states are calculated in the NVM array-based readout module in a fully analogue fashion to yield the final recognition results. In our experiment, the sensor signals of four classes of dynamic gestures (G1, equilateral triangle; G2, capital letter N; G3, inverted triangle; G4, capital letter Z) (Fig. 5a) are recorded at a sample frequency of 10 Hz. Typical signals with a normalized amplitude (−1 to 1) are displayed in Fig. 5b. Similar to the previous arrhythmia detection task, the recorded sensor signals are split into single gestures of 3 s (that is, 30 time steps) to generate the dataset. The gestures in the dataset are then divided into two groups: 600 randomly selected samples for training and the remaining 300 samples for testing. Figure 5c shows the implemented network structure by reconfiguring   the DM-RC system, where 24 DM nodes are equally divided into three groups to process the input signals from three channels and four NVM arrays are used to classify four different types of gesture. The training and testing processes are also similar to those in the previous task. The outputs of the 24 DM nodes are sampled as the reservoir states (Fig. 5d).
Here the mask sequence length is set to be 8; therefore, each DM node can produce eight reservoir states in one time step, and the final number of reservoir states reaches 192 per time step. The result of weight  To evaluate the advantages of our system in terms of energy efficiency, we compare both classification accuracy and energy consumption of our DM-RC system with a fully digital RC system (Fig. 5g). Evidently, compared with the digital RC system, our DM-RC system has an average accuracy loss of only 1.1%, but saves more than 99.9% of power consumption. The details of power estimation are described in Supplementary Table 1. Such low power consumption can be attributed to two key factors in our system: one is the fully analogue signal transmission and processing realized in our system such that the a large power consumption overhead caused by ADCs in conventional RC systems can be eliminated; the other one is that the large recurrent neural network in traditional reservoirs is replaced by a small number of parallel DMs, which largely reduces not only the hardware cost but also the power consumption.

Conclusions
We have reported a fully analogue RC system that is fabricated using a set of parallel DMs for reservoir and NVM arrays for readout. The relationship between the electrical characteristics of DM nodes and DM-RC system performance has been studied by analysing the I-O curves and system performance under different hyperparameters. We find two key features-threshold and window-that have a notable impact on the quality of DM-based reservoirs, and the optimal system performance occurs at a specific threshold when the window is within a suitable range. By adjusting the hyperparameters, the DM-RC system can perform different spatiotemporal signal processing tasks in real time with low power consumption.
Compared with previous works 21,32,34,35,43,44 , the power consumption of our DM-RC system is lower (Supplementary Table 2 provides a detailed comparison) due to the fully analogue signal transmission chain and the use of parallel DMs. To further reduce the power consumption and computing latency in the system, the entire DM-RC system could be miniaturized and monolithically integrated on chip, as the technologies used in our work are complementary metal-oxide-semiconductor compatible and scalable. A more sophisticated RC system could also be constructed using our DM-RC architecture as a basic unit-this would further enhance the system performance due to richer reservoir states and better memory capacity. Our work offers a promising platform for edge computing with low power consumption, and could be of use in applications such as smart wearable devices and microrobots.

Device fabrication
The DM devices were fabricated with a Pd/TiO x /Ti crossbar structure on a Si substrate with 200 nm thermally grown SiO 2 on it, and the device size was 5 μm × 5 μm. First, 50-nm-thick Pd was electron-beam evaporated and patterned as the bottom electrode. Then, the functional 80-nm-thick TiO x was deposited by reactive sputtering in the mixed atmosphere of Ar and O 2 . The sputtering power was 400 W, the process pressure was 10 mtorr and the O 2 partial pressure was 17%. Finally, about 110-nm-thick Ti was electron-beam evaporated and patterned as the top electrode. The NVM array was fabricated using a standard 0.13 μm complementary metal-oxide-semiconductor process. The NVM array had a 1T1R cell structure in which the NVM material stack was TiN/TaO x /HfAlO y /TiN. The device size was 0.5 μm × 0.5 μm. The 8-nm-thick HfAlO y resistive switching layer was deposited by atomic layer deposition. The 45-nm-thick TaO x thermal enhanced layer was deposited by sputtering. The top and bottom electrodes were both 30-nm-thick TiN deposited by sputtering.

Mask process
The mask process in our system is a time-multiplexing method that can increase the number of reservoir states by dividing the original time step into N small steps. Usually, the mask is a binary sequence of length N composed of −1 and 1, which is repeatedly multiplied to the input signal at each time step. The masked input signal is then applied to the DM node to generate the reservoir states. To obtain different reservoir states from each DM node, the mask sequences used here must be different from each other. Since the mask sequences are binary, the number of different mask sequences can be calculated as 2 N . In this work, we randomly select 24 of these mask sequences for the DM-RC system. In this way, the mask process using different mask sequences can enhance the richness of the reservoir state and system performance.

Measurement setup
The entire test system mainly consists of three parts, namely, a host personal computer (PC), an STM32 (MCU) development board and a user-customized PCB as the test board. The host PC operates as the upper computer for data loading, command sending and user interface coded in Python. The STM32 operates as the lower computer to communicate with the upper computer and generate the specific control signals that are transmitted to the test board through the general-purpose input/output ports. In addition, the STM32 board is also responsible for generating and reading voltage signals through the integrated 12-bit digital-to-analogue converters and ADCs, respectively. It needs to be pointed out here that in addition to collecting the output of the integrators, the ADCs are only used in the steps of sampling states and mapping weights during the training process. The test board integrates four NVM chips, 24 DMs (connected by a probe card) and several commercial chips such as low-dropout regulators, MUXs, amplifiers and digital potentiometers. Low-dropout regulators are used to provide stable positive and negative d.c. voltage sources for other chips in the test board. The MUXs are used to control the signal flow in the system and realize the mask process. All the MUXs used in our system are analogue MUXs with a digital control port. More specifically, we used CD4053 chips (Texas Instruments) as the single-pole-double-throw analogue switches to realize the mask process, where the mask signal is directly fed into the digital pin of the CD4053 MUX. Amplifiers are used to adjust the voltage signals and realize TIAs and integrators. Digital potentiometers are used to adjust the gain of the amplifiers and adjust the bias voltage (B i ).

Simplified DM node model
The simplified DM node model is defined as where V i (k), V o (k) and V t (k) are the input, output and threshold voltages at the kth time step, respectively, whereas S and f are the slope of the I-O curve and nonlinear function, respectively. Here f is defined as Article https://doi.org/10.1038/s41928-022-00838-3 In equation (1), V t changes dynamically with input signal V i . We consider the simplest case where the dynamic process is linear and time invariant. In other words, V t is the result of a linear filtering on V i . The differential function of such a linear filtering process is defined as where α is the filter weight that determines the timescale of the system memory and T is the initial value of V t that also corresponds to the threshold extracted from the I-O curve. The other feature extracted from the I-O curve is the window (W), which can be calculated as follows: W = max(V t ) − min(V t ), where the sequence of V t is obtained using equation (3) with a dual-sweep sequence of V i . The simulation of such a simplified DM node model is performed in Python3.8 with the Numpy package by solving equations (1)-(3). The simulation results suggest that other devices can also be used to implement the DM node for RC as long as the behaviour of such a DM node can be modelled by the above three parameters, which may need to be optimized for specific devices based on their device characteristics.

Weight-mapping process
We use a closed-loop programming scheme to map the target weight (W target ) onto the device conductance of the NVM array 9 . The basic loop of such a programming scheme is run in the PC and coded in Python.
In the PC program, there are four core functions: f read , f set , f reset and f comp .
First, the function f read is executed and a command is sent from the PC to the MCU to read the conductance state (W real ) of the NVM, where the read voltage is set to be 0.2 V. Then, the function f comp is executed, where the difference between W real and W target is calculated as ΔW = W real − W target and the maximum error weight ΔW max is used to compare with ΔW to determine the next step of programming. (1) If |ΔW| is smaller than ΔW max , then W target is considered to be successfully mapped and the programming step is finished. (2) If ΔW is smaller than -ΔW max , the function f set is executed and a SET operation is performed, where two voltage pulses with amplitudes of 5 V and V wl (varying from 0.5 to 2.8 V) are applied to the BL and word line (WL) of the 1T1R cell, respectively.
(3) If ΔW is larger than ΔW max , the function f reset is executed and a RESET operation is performed, where two voltage pulses with amplitudes of V sl (varying from 1.5 to 3.0 V) and 5.0 V are applied to the SL and WL of the 1T1R cell, respectively. Steps (2) and (3) above are repeated until the target weight W target is successfully mapped. As evident from the above weight-mapping process, the smaller the ΔW max value is, the closer the value of W real is to W target . In practice, the minimum value of ΔW max is limited by the intrinsic noise of the NVM devices, below which the mapping procedure would be difficult to converge. In our experiment, the minimum value of ΔW max is found to be 0.04 × max(W target ). Therefore, we use 0.04 as the noise level to take into consideration both mapping deviation and noise of NVM devices.