A Spiking Visual Neuron for Depth Perceptual Systems



Introduction
Whereas information in a conventional sensing system is generally processed sequentially and encoded by amplitude, biological systems operate in parallel and use the "spike" as their language [1]. As a result, biological systems show unparalleled energy efficiency compared to their electronic counterparts. Hence, there is a trend toward developing systems based on biologically plausible or bioinspired methodologies, which has boosted many fields including neuromorphic engineering [2,3], neurorobotics [4], and bioinspired sensing [5-7]. Spiking neural networks (SNNs), with their higher biological fidelity, are thought to be a promising competitor to classical artificial neural networks (ANNs) for further improving computing efficiency [8,9]. Therefore, many efforts have been made recently to build SNN-based artificial visual systems [10]. More importantly, visual information constitutes the largest portion of what we perceive in our interaction with the environment. About 1 gigabit of visual information is transmitted through the retina per second, and the visual system can organize, identify, and interpret this volume of information within an energy budget of a few watts [11]. Specifically, biological visual systems rely on spiking bipolar cells: light from the object enters the eye through the cornea and reaches the retina, where the photoreceptors respond to light and influence the membrane potential of the bipolar cells. Potential spikes are then generated, conveying the visual information through the retinofugal pathway to the central nervous system, a process that is well described by the so-called leaky integrate-and-fire (LIF) model [10,12].
Hence, a spiking visual neuron (sVN) with high biological plausibility is crucial for building an SNN-based artificial visual system. The spiking component can be realized by classical oscillator circuits such as ring oscillators with multiple transistors. For example, a carbon nanotube (CNT)-based three-stage ring oscillator has been used in an artificial tactile neuron with an active/inactive power consumption of several microwatts [13,14].
Recent work on threshold switching (TS) memory devices has demonstrated good scalability and thus high integration density. In addition, these devices can mimic the oscillation behaviors of neurons through their intrinsic ionic dynamics. For example, a hafnium-based TS device that takes advantage of Ag filament formation and rupture achieved self-oscillation with a low operating voltage below 0.6 V and a power consumption of ~1.8 μW/spike [15]. Although a wide spectrum of materials has been explored, including transition metal oxides (e.g., HfO2 and TaOX) [16-20], phase change materials (e.g., Ge2Sb2Te5) [21,22], and insulator-metal-transition materials (e.g., NbO2 and VO2) [23-25], the oscillation parameters of current spiking encoder devices, such as frequency range and power consumption, are still far from those of their biological counterparts. Furthermore, how to mimic rich neural behaviors such as depth perception, and how to leverage such behaviors to benefit neuromorphic visual computing, remain open questions.

Results
In biological visual systems, photoreceptor cells convert external optical stimuli into spiking potentials which are eventually interpreted and processed by the visual cortex.
A biological neuron receives input spikes from other neurons through connected synapses and then integrates all the input spikes. A firing spike of membrane potential is generated when the integrated potential collected from the dendrites exceeds the threshold [26-28]. The leaky integrate-and-fire (LIF) neuron model describes this spike generation process well. In parallel to the biological visual system, a spiking visual neural network (sVNN) is proposed, as shown in the bottom panel of Fig. 1a. The sVNN consists of sensory transduction of external stimuli by artificial receptors, subsequent LIF neurons functioning as spiking encoders, and neuromorphic computing for deep processing.
First, a spiking encoder is built based on an Ag/TaOX/ITO memristor deposited by magnetron sputtering, as shown in Fig. 1b. The top and bottom electrodes of the memristor implement the functions of the dendrites and axons of a neuron. The active layer acts as the soma of the neuron, integrating signals from the dendrites with the assistance of a parallel capacitor that represents the plasma membrane. The cross-sectional SEM image of the Ag/TaOX/ITO memristor confirms a vertically layered structure, shown in Fig. 1b along with a structural schematic in the inset. The presence of Ta and O in the 100 nm-thick oxide layer is verified by XPS in the supplementary materials.
This diffusive memristor exhibits electrochemical metallization (EM) based switching [29,30]. Under a bias voltage, Ag ions move from the top Ag electrode toward the bottom electrode, driven by electro-chemical-thermal forces. At a high electric field, the displacement and drift of Ag atoms form localized conductive filaments. When the voltage reaches the threshold voltage (VTH) of 0.2-0.4 V and a conductive filament forms, the memristor turns on and switches sharply from a high resistance state (HRS) to a low resistance state (LRS), as shown in Fig. 1c. The voltage required for filament formation increases as the oxide layer is thickened.
When the voltage drops below the hold voltage (VHold) of 0.1 V, the memristor returns to the OFF state (HRS). A compliance current (ICC) of 1 μA is necessary during the electroforming process to activate the volatile threshold switching behavior. The threshold switching (TS) characteristics exhibit no obvious degradation over 1,000 consecutive cycles, indicating that the memristor is reliable and durable.
The current abruptly jumps from 2×10⁻¹⁰ A to 1×10⁻⁶ A within 5 mV during the DC voltage sweep, indicating a steep switching slope of 1.37 mV/dec. The ON/OFF switching speed of the memristor is measured using pulse measurements. The memristive switching can be triggered by a voltage pulse with an amplitude of 1 V and a width of 2 μs, and the turn-on delay time is only 10 ns, as shown in Fig. 1d. The memristor then quickly returns to the OFF state with a 40 ns delay under a read voltage of 0.1 V.
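The slope figure can be sanity-checked from the quoted numbers. The short sketch below (the function name is illustrative, not from the paper) divides the voltage window by the number of current decades. With the rounded values quoted in the text it gives roughly 1.35 mV/dec, close to the reported 1.37 mV/dec, which presumably comes from the exact measured data.

```python
import math

def switching_slope_mv_per_dec(i_off, i_on, delta_v_mv):
    """Voltage window (mV) divided by the number of current decades spanned."""
    decades = math.log10(i_on / i_off)
    return delta_v_mv / decades

# Rounded values quoted in the text: 2e-10 A -> 1e-6 A within 5 mV.
slope = switching_slope_mv_per_dec(2e-10, 1e-6, 5.0)
print(f"{slope:.2f} mV/dec")
```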
A biological neuron generally fires at a rate of around 1-200 spikes/s with a power consumption of ~10⁻¹⁰ W/spike [31-33]. In contrast, electronic spiking encoders have been developed to fire at much higher rates (e.g., up to 250 MHz [34]) to achieve fast encoding and low energy consumption per spike (the narrower the pulse, the lower the energy per spike), as required for hardware acceleration of neuromorphic computing. However, this may not be energy efficient compared to the biological counterpart, because of the much higher power consumption and firing rate. Furthermore, the frequency and power consumption of these devices are very different from those of a biological neuron, making them difficult to deploy in bio-plausible systems and challenging to interface with biological neural networks. Fig. 1e compares benchmarks of several kinds of LIF spiking encoders and biological neurons [15,19,20,24,25,35-38].
Our work features both biological plausibility, with a spiking rate of 1-200 spikes/s, and a low power consumption of 0.25-0.5 μW/spike.
The circuit for the LIF spiking encoder is shown in Fig. S2. The EM threshold switching memristor (TSM) is connected in series with a resistor (R0 = 200 kΩ), representing an artificial dendrite that integrates input spikes from the presynaptic neurons, and in parallel with a capacitor (C = 10 nF) that works as an artificial axon for spike generation. The input spikes are applied to this component through a resistor (RS = 10 MΩ), and the output spikes are measured on R0. The operational mechanism is divided into an integration process and a firing process, the same as in a biological neuron. During the integration process, input pulses are applied to the neural circuit to charge the capacitor until the TSM voltage reaches VTH, and the memristor switches from the HRS to the LRS with the formation of Ag filaments. During neuron firing, the charge accumulated on the capacitor is released through the memristor, and a spike voltage is detected on R0.
When the TSM voltage decreases below VHold, the device returns to the HRS due to the rupture of the Ag filament. The EM threshold switch thus operates through the formation and dissolution of a metallic filament in the interlayer. Fig. 2a illustrates the leaky integrate-and-fire behavior of the TSM neuron under a train of voltage spikes. When the series of voltage spikes (1 kHz, 500 μs, 5 V) is fed to the node "IN", the capacitor accumulates charge and its voltage rises during the integration process. The TSM remains in the HRS (RH = 1 GΩ) during charging, implying a negligible voltage on R0. When the voltage across the memristor reaches VTH, the memristor switches from the HRS to the LRS. The RC time constant of the charging loop (TC = RS × C) is much longer than that of the discharging loop (TD = (RL + R0) × C) when the device is in the LRS (RL ~ 3 kΩ). In this case, the capacitor discharges and the neuron fires an output voltage spike. During discharging, the voltages on the memristor and R0 gradually decrease. When the voltage on the memristor falls below VHold, the TSM spontaneously relaxes back to the HRS and the neural circuit starts the next integration cycle. Under fixed voltage pulses, the neuron spikes at a constant frequency. Larger voltage pulses reduce the charging time needed to reach the threshold potential, thereby increasing the firing frequency. In Fig. 2b, consecutive input pulses with amplitudes ranging from 2 V to 5 V are applied to the TSM neuron. The output spiking frequency increases with the input amplitude, as shown in Fig. 2c. For example, the firing rate is as low as 1 spike/s with 2.5 V input pulses, and reaches its highest value of 10 spikes/s when the input amplitude is 5 V. This is mainly due to the faster charging of the capacitor under a higher input voltage amplitude. Fig. 2c shows the statistical relation between the input voltage and the spike frequency of the neuron.
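The charge-and-fire cycle described above can be illustrated with a minimal behavioral simulation. This is a sketch under stated assumptions, not the authors' model: the memristor is treated as an ideal two-state resistor, VTH is taken as 0.3 V (the middle of the reported 0.2-0.4 V range), and the hold criterion is applied to the capacitor voltage, a simplification that avoids instantaneous chatter in this idealized circuit. With the stated component values, TC = 10 MΩ × 10 nF = 0.1 s and TD = 203 kΩ × 10 nF ≈ 2 ms. The absolute firing rates from such an idealized model are not expected to match the measurements; only the trend (higher amplitude, higher rate) is meaningful.

```python
def firing_rate(amplitude, duration=0.5, dt=1e-5):
    """Spikes per second of an idealized threshold-switch LIF circuit (forward Euler)."""
    R_S, R_0, C = 10e6, 200e3, 10e-9   # values stated in the text
    R_H, R_L = 1e9, 3e3                # HRS / LRS resistances from the text
    V_TH, V_HOLD = 0.3, 0.1            # V_TH = 0.3 V is an assumed mid-range value
    period, width = 1e-3, 500e-6       # 1 kHz pulse train, 500 us pulse width

    v_c, is_on, spikes = 0.0, False, 0
    for step in range(int(duration / dt)):
        t = step * dt
        v_in = amplitude if (t % period) < width else 0.0
        r_m = R_L if is_on else R_H
        # Capacitor charges through R_S and leaks through the memristor + R_0 branch.
        dv = (v_in - v_c) / (R_S * C) - v_c / ((r_m + R_0) * C)
        v_c += dv * dt
        if not is_on and v_c * R_H / (R_H + R_0) >= V_TH:
            is_on = True               # filament forms: the neuron fires
            spikes += 1
        elif is_on and v_c <= V_HOLD:  # assumed hold criterion on the node voltage
            is_on = False              # filament ruptures: back to integration
    return spikes / duration

rate_low = firing_rate(2.5)
rate_high = firing_rate(5.0)
print(rate_low, rate_high)
```

The two-state resistor is the simplest choice that still shows integration, firing, and reset; a physical device model would add filament growth and relaxation dynamics.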
While the capacitor charges toward the threshold voltage, a smaller input voltage requires more charging cycles and leads to a lower firing rate. The strength modulation of spiking frequency in biological neurons is thus realized by modulating the amplitude of the input voltage spikes [31]. In addition, a larger RC time constant means a longer integration time. Fig. 2d demonstrates that a larger RC constant decreases the output spiking frequency: the firing rate decreases from 200 spikes/s to 1 spike/s as the RC constant rises from 10 MΩ·nF to 1000 MΩ·nF. By engineering the circuit parameters, our system can reduce energy consumption and improve efficiency.
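The inverse dependence of firing rate on the RC constant follows from the charging law of an ideal RC stage. Note that 1 MΩ·nF equals 1 ms, so the quoted range corresponds to 10 ms to 1 s. The sketch below is a first-order estimate that ignores leakage and discharge time, and the constant 5 V drive and 0.3 V threshold are assumed values; the absolute rates are therefore not expected to match Fig. 2d, only the scaling.

```python
import math

def ideal_rate(rc_seconds, v_in=5.0, v_th=0.3):
    """Firing rate if each cycle is one ideal RC charge from 0 V to v_th."""
    t_charge = -rc_seconds * math.log(1.0 - v_th / v_in)
    return 1.0 / t_charge

rc_small = 10e6 * 1e-9     # 10 MOhm*nF = 10 ms
rc_large = 1000e6 * 1e-9   # 1000 MOhm*nF = 1 s
ratio = ideal_rate(rc_small) / ideal_rate(rc_large)
print(ratio)
```

In this idealization a 100-fold increase in RC gives a 100-fold drop in rate, which is the same order as the reported change from 200 spikes/s to 1 spike/s.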
The human brain receives perceptual information through sensory organs for cognitive processing [39]. Cone cells in human eyes detect light signals and then integrate the signals before propagating them to the visual cortex of the brain. The ciliary muscle of the eye is then adjusted to adapt to either near vision or distance vision, as shown in Fig. 3a. For near vision, the ciliary muscle shrinks outward and the crystalline lens becomes flatter. For distance vision, the crystalline lens is squeezed by the muscle and becomes progressively rounder. As a result, both near and distant objects can be focused clearly on the retinal plane. To simulate the function of the human visual system, a spiking visual neuron (sVN) was built based on our LIF spiking encoder, as shown in Fig. 3b. A photoresistor (light: RP = 10 kΩ, dark: RP = 100 MΩ) and a resistor (R1 = 1 MΩ) are connected in series, and the voltage on the resistor is used as the input of the electronic circuit. As the light intensity increases from 0 to 7.2 mW/cm², RP changes from 100 MΩ to 10 kΩ and the voltage on R1 rises from 1% to 99% of VDD.
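The 1% and 99% figures follow directly from the voltage divider formed by the photoresistor and R1. A quick check using the stated resistances (the helper name is illustrative):

```python
def r1_fraction(r_p, r_1=1e6):
    """Fraction of VDD that appears across R1 in the series divider RP + R1."""
    return r_1 / (r_1 + r_p)

dark = r1_fraction(100e6)   # RP = 100 MOhm in the dark
light = r1_fraction(10e3)   # RP = 10 kOhm under full illumination
print(f"dark: {dark:.1%}, light: {light:.1%}")
```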
The pulse voltage on R1 gradually increases and the capacitor starts charging when the photoresistor is illuminated by light pulses. This charging cycle is the same as that in the electrical neuron circuit. The voltage on the capacitor rises during the integration process until the memristor voltage reaches VTH and the device switches from the HRS to the LRS. As a result, the capacitor discharges and an output voltage spike is produced in the firing process, as shown in Fig. 3c. After the discharging process, another charging cycle starts. The light intensity is inversely proportional to the square of the distance between the light source and the photoresistor, as well as to the resistance of the photoresistor.
As the distance increases, the light intensity on the photoresistor decreases and RP increases in turn. Consequently, the voltage on R1 decreases, and the firing rate in Fig. 3d decreases as well, following a sigmoid shape. When biological neurons are overused for a long time, fatigue occurs, degrading performance and recognition accuracy. This is because the neurotransmitter that triggers the membrane potential of the cone cell is not released until a segment of light pulses has been continuously superimposed [40,41]. Our sVN can mimic such behavior under continuous illumination. After 10⁵ cycles of light pulses (intensity = 6 mW/cm², duration = 5 ms, frequency = 100 Hz), fatigue attenuation occurs. The detailed simulation process is illustrated in Fig. S3. Although the distance versus firing rate curve shows a similar sigmoid trend, it becomes flatter than in the initial state, indicating that the sensitivity decreases in the fatigue state. For example, the firing rate in response to a target at a distance of 5 cm decreases from 120 Hz in the initial state to 90 Hz in the fatigue state.
Our visual system can focus on distant objects through the adjustment of the ciliary muscle, which enables us to acquire depth perception for precise recognition and learning. Such neuromuscular adjustment is processed and computed based on the slight difference in spatial information (i.e., parallax) perceived by each eye. Figure 4a shows this process. The firing rates of the triggered spikes (f1, f2) from the two optical neuronal pathways depend on the distances (L1, L2) from an object to the left and right eyes, respectively. The interpupillary distance and the distance between the eye plane and the object plane are d and z, respectively. The visual angle θ is correlated with the relative orientation of the target. Without depth information, the visual system cannot form a clear image of the object, leading to poor recognition.
By comparing the firing rates between the two eyes, the ciliary muscle can be commanded to produce clear vision. Two sVNs were used to obtain the firing rate differences for depth perception. The values of d and z are set to 20 mm and 100 mm (z = 5d), respectively. The frequencies of the two sVNs are 110 spikes/s and 40 spikes/s when L1 and L2 are 70 mm and 150 mm, respectively. Furthermore, the polar plot in Fig. 4c shows the relationship between the angle θ and the firing rate difference between the left and right eyes. When the angle θ is arctan(2z/d) = 84.3°, the firing rate difference is zero. The red and blue lines show the initial state and the fatigue state, respectively. The difference curve is axially symmetric with respect to the angle of 84.3° in polar coordinates, which corresponds to the symmetrical equivalence between the left and right eyes. When θ gradually increases from 40° to 140° (or from 220° to 320°), the firing rate difference decreases to its minimum at 84.3° and then gradually increases.
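The zero-difference angle can be checked with a small geometric sketch. The coordinate layout here is an assumption made for illustration (it is not spelled out in the text): the eyes sit at (-d/2, 0) and (+d/2, 0), the target lies on the object plane y = z, and θ is measured at the left eye from the baseline joining the two eyes. Under that reading, the target is equidistant from both eyes exactly when tan θ = 2z/d.

```python
import math

def eye_distances(theta_deg, d=20.0, z=100.0):
    """Distances (mm) from the left and right eye to a target on the object plane.

    Assumed geometry: eyes at (-d/2, 0) and (+d/2, 0), target on the line y = z,
    theta measured at the left eye from the inter-eye baseline.
    """
    theta = math.radians(theta_deg)
    x = -d / 2 + z / math.tan(theta)   # horizontal position of the target
    l1 = math.hypot(x + d / 2, z)      # left-eye distance
    l2 = math.hypot(x - d / 2, z)      # right-eye distance
    return l1, l2

theta_0 = math.degrees(math.atan(2 * 100.0 / 20.0))
l1, l2 = eye_distances(theta_0)
print(round(theta_0, 1), round(l1 - l2, 6))
```

Because the firing rate is a monotonic function of distance, equal distances imply equal firing rates, which is why the rate difference vanishes at this angle.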
Because the biological visual system can take advantage of the firing rate difference from binocular vision, this difference can be exploited to estimate the distance of the object and focus on it to produce a sharp image (Fig. S4). Without this process, blurred images are formed in the eyes. This is because the object appears as the clearest focused image only when the image refracted by the crystalline lens falls exactly on the plane of the retina. When the lens becomes a little thicker, the image projected on the retina is slightly out of focus, making it less clear after refraction. If the lens continues to thicken, severe defocusing occurs, and the image on the retina becomes harder to recognize. Nine groups of face images are chosen from the Yale Face Database and downsized to 28×28 (784 pixels), with each group comprising 585 images. Fig. 4d shows examples of labels 1-3 with different clarities. The images are in grey scale, where each pixel value ranges from 0 to 255 and a larger value corresponds to a darker pixel. At the first stage, the visual system receives the distant scene without proper focusing, and the face images formed on the retina are blurry, as shown in the first column. At the second stage, the visual system adjusts the crystalline lens to obtain clearer vision, as shown in the second column. Finally, the eye focuses on the object by comparing the firing rate differences, as shown in the third column. Image datasets comprising the Yale faces at three clarities (r = 5, 1, and 0, respectively) were built for training the network.
The value of r represents the ratio between the actual focal plane of the lens and the retinal plane, and is used to simulate the defocused images. A three-layer perceptron with 784, 28, and 9 nodes in the respective layers is adopted for the recognition task and trained with the back-propagation algorithm. Each synaptic connection in the network is simulated using parameters extracted from IGZO-based floating-gate synaptic transistors with 5-bit precision (Fig. S7) [42].

Discussion
In conclusion, a memristor-based spiking visual neuron was built with bio-realistic properties, capable of encoding light signals into spike trains. The firing rate of such a spiking visual neuron is distance-dependent and shifts with the stimulation time, analogous to depth perception and eye fatigue in biological visual systems. An artificial binocular visual system with two sVNs can generate a firing rate difference arising from the difference in distance from the target point to the two sVNs, which can be used to infer the depth of the scene. Such a binocular visual system can take advantage of depth perception to achieve high-accuracy pattern recognition. Face images with three defocusing levels, corresponding to different depth perception stages, were simulated. Assisted by such an artificial binocular visual system, the image recognition accuracy of a hardware neural network increases from 68% to 90% after 20 training epochs. Such a spiking visual neuron is a promising candidate for hardware visual systems with high biological plausibility, energy efficiency, and perceptual ability, which might open up a new frontier for neuromorphic engineering and bionic robotics.

Materials and Methods
The threshold switching memristor was fabricated on a silicon substrate. First, an ITO (indium tin oxide) bottom electrode with a thickness of 300 nm was deposited on silicon by radio-frequency magnetron sputtering using an ITO target (90 wt% In2O3 and 10 wt% SnO2) at room temperature. Second, the TaOX buffer layer was deposited by magnetron sputtering of a Ta2O5 target (100 wt% Ta2O5) with an Ar:O2 ratio of 30:2 at room temperature and a radio-frequency power of 100 W.
Finally, Ag top electrodes (80 μm × 80 μm) were deposited on the TaOX buffer layer by thermal evaporation through a nickel shadow mask. In the electrical neuron function measurements, a Tektronix AFG3102 pulse generator acted as the input source. In the optical measurements, the laser wavelength was 532 nm. A Tektronix DPO3032 oscilloscope was used to measure the input/output voltages. The electrical characteristics of the TSM device were tested with an Fs-Pro PX500. The stoichiometry of the TaOX buffer layer was confirmed by X-ray photoelectron spectroscopy (PHI5000 Versa Probe).
Ag/TaOX/ITO memristor-based spiking visual neuron with high biological plausibility. Such a memristor shows excellent TS characteristics, including a steep slope (1.37 mV/dec) and fast turn-on/turn-off delay times (10/40 ns). The TS-based spiking encoder exhibits four critical features of action-potential-based computing: all-or-nothing spiking of an action potential, threshold-driven spiking, a refractory period, and a strength-modulated frequency response. The TS-based visual neuron contains three components: the artificial photoreceptor, the TS-based spiking encoder, and the neural network. It mimics biological behaviors such as a sigmoid response to increasing light intensity and eye fatigue. It shows a frequency range of 1-200 Hz and sub-microwatt power consumption, representing a step toward biologically comparable performance. A binocular visual system built from such artificial visual neurons has been verified by simulation, showing improved recognition through refocusing on targets at different distances. Our design presents a fundamental building block for energy-efficient and biologically plausible artificial visual systems.
The neural network is trained to distinguish one person's face from others. The recognition consists of two phases: training and testing. The learning rate is 5% in the training procedure. The training phase includes two sub-procedures: the forward pass and the weight update. During training, the weights are optimized by minimizing the difference between the predicted classification and the ground truth. During testing, the network is fed images of different clarity to identify the category of the corresponding faces, and the recognition accuracy is thus obtained. Fig. 4e shows the relationship between the recognition accuracy and the training epoch for the three focusing states. As the number of training epochs increases, the recognition accuracy improves. For clearly focused images, the recognition accuracy reaches 90% after 20 training epochs. In contrast, for the severely defocused control group, the recognition accuracy reaches only 68% under the same conditions. This simulation demonstrates that depth perception has a strong positive impact on the recognition accuracy of an artificial visual neural network. The binocular positioning function based on the spiking visual neuron proposed in this work has great potential for future neuromorphic computing, image acquisition, and recognition.
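As an illustration of the network shape and the 5-bit weight constraint, the following sketch runs one forward pass through a 784-28-9 perceptron whose weights are restricted to 32 levels. Everything beyond the layer sizes and the 5-bit precision is an assumption made for this example: the uniform quantization scheme, the weight range W_MAX, the sigmoid and softmax activations, and the random weights and random input that stand in for trained synaptic values and a Yale face image.

```python
import math
import random

LEVELS = 2 ** 5   # 5-bit precision gives 32 weight levels
W_MAX = 1.0       # assumed symmetric weight range [-W_MAX, W_MAX]

def quantize(w):
    """Snap a weight to the nearest of LEVELS uniformly spaced values (assumed scheme)."""
    w = max(-W_MAX, min(W_MAX, w))
    step = 2 * W_MAX / (LEVELS - 1)
    return round((w + W_MAX) / step) * step - W_MAX

def init_layer(n_in, n_out, rng):
    return [[quantize(rng.uniform(-W_MAX, W_MAX)) for _ in range(n_in)]
            for _ in range(n_out)]

def sigmoid(a):
    # Numerically stable form for large negative inputs.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def forward(x, w1, w2):
    """Forward pass: 784 inputs -> 28 sigmoid units -> 9 softmax outputs."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
w1 = init_layer(784, 28, rng)
w2 = init_layer(28, 9, rng)
image = [rng.randint(0, 255) / 255 for _ in range(784)]  # random stand-in for a 28x28 face
probs = forward(image, w1, w2)
print(probs.index(max(probs)))
```

One practical consequence of coarse weights is that a small gradient step can be rounded away, so hardware-aware training schemes usually treat the update rule with care; how the authors handle this is not described in the text above.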

Fig.1 A spiking visual neural network based on Ag/TaOX/ITO memristor in comparison with a

Fig.2 The integrate and fire dynamics of the TS memristor-based spiking encoder.(a) The voltage

Fig.3 A spiking visual neuron based on the TS memristor.(A) Schematic illustration of the eye

Fig.4 Emulation of binocular vision for high accuracy recognition based on the TS memristor-