Figure 1a shows a schematic illustration of the image encoding and classification processes in the mammalian retinal and brain system. The outputs of the rod and cone photoreceptors are decomposed into approximately 12 parallel information streams, which are then connected to the retinal ganglion cells. Bipolar and amacrine cell activities are combined in a ganglion cell to create diverse encodings of features extracted from the visual world, such as edges, direction, and color; the retina then transmits these pre-processed data to the brain11–13. By reducing redundant information, the retina can effectively convey image data to the central processor with a minimal transport delay. In the visual cortex, higher-level visual cognitive processes are conducted using encoded images from the retina11–13.
In this study, we designed and demonstrated an in-sensor neuromorphic machine vision system with functionalities of image memorization and processing, by mimicking the above-mentioned neural circuit and visual classification system in the human eye, as shown in Fig. 1b. The image sensor consists of a crossbar array of photodetectors and resistive memory cells, which correspond to photoreceptors and ganglion cells in the retina, respectively. In the retina, the ganglion cells operate as a pre-computing processor unit, whereas the ReRAM in our system serves both as a memory and computation unit, depending on the polarity and magnitude of the applied bias to each pixel. When a reverse bias with respect to the photodiode is applied to the 1P-1R pixels, the sensor operates in a memorization mode, in which incident light stimuli are converted to electrical signals in the photodetectors, and the photocurrents are subsequently stored in the memory cells by changing the conductance of the memory. Under a forward-biased voltage with respect to the photodiode (lower than the threshold voltage for the erase operation), the sensor operates in the computing mode to process the stored image at the pixels via analog in-memory computing for vector-matrix multiplication. Because vector-matrix multiplication is a key operation in the ANN algorithm, we utilized the 1P-1R crossbar array to execute in-sensor image encoding, which extracts critical features from the original image to alleviate the data transfer burden at the sensor and processor interface, paralleling biological processes in the human retina. Finally, image classification was conducted in the post-processing unit with the encoded images delivered through an ANN. While the encoded images possess compressed information compared to the original images, the ANN successfully classifies the object with less computational load.
Figure 1c shows block diagrams of the image-processing sequence in conventional in-sensor processing systems and our neuromorphic in-pixel computing system.6,21,23,24 Most previously reported conventional in-sensor processing systems only perform image memorization and pre-processing (low-level processing) within the sensors, such as image contrast enhancement and noise reduction. The size of the pre-processed images from the sensors (N\(\times\)N) was still the same as the size of the original images (N\(\times\)N), and high-level image processing took place in the post-processor. Conventional systems therefore reduce neither the data traffic load at the sensor/processor interface nor the computational burden in the post-processor to any significant degree. In contrast, the fabricated neuromorphic in-pixel computing system in this work memorizes images in each pixel and subsequently conducts image encoding by an analog in-memory multiply–accumulate operation (1st high-level processing), combining the sensing and computing functions. As a result, the in-sensor image encoding process reduces the output data from the sensor to N values, the square root of the N\(\times\)N pixel count of the original image, thereby minimizing the transportation of redundant data and reducing the computation load in the post-processor. This advanced in-sensor computing architecture can thus significantly reduce the inefficient energy usage and high data latency of smart imaging systems.
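The data-reduction argument above can be checked with simple arithmetic. The following is a minimal sketch, assuming one value per pixel for a conventional sensor and one accumulated value per output line for the in-pixel scheme; the function name `interface_payload` is hypothetical and not part of the reported system.

```python
def interface_payload(n: int) -> dict:
    """Count the values sent from sensor to post-processor for an n x n image."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    conventional = n * n  # pre-processed image keeps the original N x N size
    in_pixel = n          # encoded vector: one accumulated value per output line
    return {
        "conventional": conventional,
        "in_pixel": in_pixel,
        "reduction_factor": conventional // in_pixel,
    }
```

For the 16\(\times\)16 array described in this work, `interface_payload(16)` gives 256 values for a conventional sensor against 16 for the encoded output, that is, a 16-fold reduction at the interface.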
First, a 16\(\times\)16 1P-1R in-pixel computing chip was fabricated using p-i-n InGaAs photodiodes and HfO2-based ReRAM (see Methods). The optical images of the fabricated chip and optical microscope images of the 1P-1R crossbar array are shown in Fig. 1d and Figs. 1e,f, respectively. Each pixel consisted of an InGaAs photodiode and a ReRAM. The row lines share the top electrodes of the ReRAMs (Ta/Pt electrodes) and the column lines share the top electrodes of the InGaAs photodiodes (p+ electrodes). Prior to operating the 1P-1R array, we studied the optical and electrical characteristics of a single 1P-1R pixel. Figure 2a shows a schematic illustration of a single 1P-1R pixel, and its equivalent circuit diagram is shown in Fig. 2b, where VP and VR are the applied voltages of the photodiode and ReRAM, respectively, and VTotal = VP + VR. The 1P-1R pixel is composed of an InGaAs p-i-n photodiode and an HfO2-based ReRAM; the photodiode converts incoming optical signals to electrical signals, and the ReRAM memorizes the optical information as its resistance. Additionally, the ReRAM is utilized as an in-memory computation unit when it operates in the computation mode. The three primary operations of the 1P-1R device, depending on the applied bias voltage (VTotal), are depicted in Fig. 2b: i) memorization, ii) computation, and iii) erasing operations. When VTotal > 2.5 V under light illumination, the photodiode is reverse-biased such that an incident optical signal generates a photocurrent to modulate the resistance of the ReRAM by forming a conductive filament (memorization operation). Thus, the optical signal can be stored in the form of resistance in a synaptic device. The stored image data in the ReRAM can then be directly used for high-level in-sensor processing (computation operation).
When −1.3 V < VTotal < −0.5 V, the photodiode operates in the ohmic regime with relatively low resistance (< 50 Ω) compared to the resistance range of the ReRAM (> 1\(\times\)10^3 Ω); in this way, the 1P-1R circuit can be approximated as a single ReRAM circuit. Thus, the 1P-1R crossbar array can be used for synaptic in-memory computing based on Ohm’s and Kirchhoff’s laws.6,10,26 Therefore, the ReRAM serves as a cross-functional device, acting as both the memory unit and the processing unit for high-level in-sensor image processing. When a high negative bias voltage beyond the RESET threshold voltage (VTotal < −1.3 V) is applied across the 1P-1R device, the memorized data in the ReRAM are erased (erase operation). These three operations are key functions for realizing neuromorphic in-pixel image processing with a 1P-1R crossbar array.
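The three bias windows described above can be summarized as a small decision function. This is only an illustrative sketch: the thresholds are the values quoted in the text, while the function name, the labels returned, and the treatment of bias values outside the stated windows (reported here as "unspecified") are assumptions.

```python
SET_THRESHOLD_V = 2.5            # memorization when VTotal exceeds this under light
COMPUTE_WINDOW_V = (-1.3, -0.5)  # forward-biased photodiode, below the RESET threshold
RESET_THRESHOLD_V = -1.3         # erase when VTotal is more negative than this


def operating_mode(v_total: float, illuminated: bool = True) -> str:
    """Return the 1P-1R operating mode implied by the total applied bias."""
    if v_total > SET_THRESHOLD_V:
        # The SET current is supplied by the photocurrent, so light is required.
        return "memorization" if illuminated else "no_write"
    if COMPUTE_WINDOW_V[0] < v_total < COMPUTE_WINDOW_V[1]:
        return "computation"
    if v_total < RESET_THRESHOLD_V:
        return "erase"
    # The text does not assign an operation to the remaining bias values.
    return "unspecified"
```

For example, `operating_mode(3.0)` returns `"memorization"`, `operating_mode(-1.0)` returns `"computation"`, and `operating_mode(-2.0)` returns `"erase"`.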
Figure 2c shows the current-voltage (I-V) characteristics of the fabricated InGaAs photodiode in the single 1P-1R device under light illumination (λ = 532 nm) with various light intensities, where V = VP, and the bottom electrode on the n-InP layer is grounded. The photocurrents generated from the photodiode under reverse bias modulated the resistance states of the connected ReRAM depending on the incident light intensity. The fabricated HfO2-based ReRAM in the single 1P-1R device was also characterized by applying repetitive positive and negative voltage sweeps for the SET and RESET processes, respectively, whereas the bottom electrode (Ti/Pt) of the ReRAM was grounded (Fig. 2d). Initially, to form a conductive filament, a positive d.c. voltage sweep was employed by increasing the voltage from 0 V to 5 V and the voltage was decreased from 5 V to 0 V with 2 µA of compliance current. After the initial channel forming process, repeated SET/RESET operations are performed by applying the positive and negative sweeps of 3 V (SET) and − 2.5 V (RESET), respectively, along the sweep paths indicated in the graph to switch the state of the ReRAM between the high-resistance state (HRS) and low-resistance state (LRS). The I-V curves show stable bipolar switching behavior with abrupt SET and gradual RESET, which is ideal for binary data storage.
The optoelectronic switching behavior of the integrated 1P-1R pixel was then characterized by an I-V measurement under light illumination (λ = 532 nm and P = 67 W/cm2), as shown in Fig. 2e. Voltage is applied to the top electrode of the ReRAM (Ta/Pt contact) while the top electrode of the photodiode (p-InGaAs contact) is grounded, as shown in the equivalent circuit diagram in Fig. 2b. The ranges of the three operations explained in Fig. 2b, which depend on the applied bias voltage, are indicated by colored areas in the graph. Unlike the electric-field-driven switching process of a single ReRAM device, the conductive filament channel of the ReRAM in the 1P-1R pixel is grown by the photogenerated current from the connected photodiode (memorization operation), whereas RESET switching is still performed by the application of an electric field. The conductive filament was formed by the first positive voltage sweep under light illumination, corresponding to the forming loop shown in Fig. 2e. After the forming process, the ReRAM is switched to the OFF state by applying a negative voltage sweep, where the light illumination has no effect on the erase operation (Fig. S1). Subsequently, a positive voltage sweep was conducted on the 1P-1R device under light illumination, switching the ReRAM to the ON state and memorizing the light information (see memorization loop in Fig. 2e). Under dark conditions, the ReRAM in the 1P-1R device cannot be switched ON via a positive voltage sweep, as shown in Fig. S2, because the current flow is limited by the reverse-biased dark current of the photodiode and is therefore insufficient to build the conductive filament. This result clearly demonstrates the capability of the 1P-1R device as a binary optoelectronic memory.
Moreover, we demonstrate the functionality of the 1P-1R cell as a multistate optoelectronic memory. The memory effect of the 1P-1R unit was predominantly determined by the characteristics of the ReRAM in the pixel. Hence, the multistate capability of the ReRAM enables the 1P-1R pixel to function as a multistate optoelectronic memory device for in-sensor computing applications. Multiple resistance states in the HfO2-based ReRAM are usually achieved by the application of various compliance currents during SET operation.27,28 Figure S3 presents the I-V curves of the multiple resistance states in the ReRAM by controlling the compliance currents during the SET operations. When a higher compliance current is applied to the ReRAM during the SET operation, a lower resistance state is formed owing to the continuous growth of the oxygen vacancy (Vo)-based filament with a higher charge injection into the oxide layer.27,28 The multiple conductance states achieved in Fig. S3 are plotted in Fig. S4, depending on the applied compliance. This behavior is consistent with the characterization results from previously reported oxide-based ReRAMs.27–29 Inspired by the above-described resistance modulation method in HfO2-based ReRAMs, the photogenerated currents from the photodiode in the 1P-1R system were used as the driving currents to perform a SET operation on the ReRAM during the memorization process. Therefore, multiple resistance states in the 1P-1R system can be enabled by light illumination with different intensities on the photodiode, generating diverse magnitudes of the driving photocurrent during the memorization process. Figure 3a shows multiple memorization and erase processes to demonstrate the analog resistance states in the 1P-1R device by employing double-voltage sweep I-V measurements under illumination with 532 nm wavelength light. 
Positive voltage sweeps were applied across the photodiode-memristor (VRP) to perform a light-driven SET operation, whereas negative voltage sweeps were applied only to the memristor (VR) for a RESET operation. The I-V curves in Fig. 3a clearly display the multiple resistance behaviors depending on the incident light intensity, where higher intensity light results in the formation of a filament channel with higher conductance. Figure 3b shows the conductance of multiple states depending on the incident light power density in the memorization process, which is nearly identical to the multistate characteristic of a single ReRAM.
The endurance characteristics of the 1P-1R optoelectronic memory were measured after the resistance of the ReRAM was set by illumination at seven different light intensities (Fig. S5). The measurement results confirm that the 1P-1R device has a stable and reliable endurance property over 10^4 s for memory functionality. To validate the continuous image detection and memorization capability of the 1P-1R system, sequential memorization/erase operations are shown in Fig. 3c. Voltage pulses (VRP) for memorization (2 V), erase (-3 V), and read (-0.6 V) operations were applied to the 1P-1R device under dark or light conditions, with a light power density of 50 mW/cm2. Without light illumination, no data were stored in the memory even with SET pulses, and only leakage currents were observed. However, the imaging data were stored in the memory once light illuminated the device during the application of a memorization voltage pulse, and the memory was maintained in the LRS until the application of an erasing voltage pulse. When the erasing voltage pulse was applied to the ReRAM, the device was switched back to the HRS. Repeated memorization/erase processes were successfully completed using the 1P-1R system. This transient switching characteristic can be utilized to perform continuous image memorization/erase processes by applying a voltage pulse train to the 1P-1R focal plane array without data transfer to an external memory.
Using the fabricated 1P-1R array, we demonstrate in-sensor image storage, encoding, and classification. Prior to image encoding and classification, an image-storage operation was demonstrated. Figures 3d-g show the schematic illustrations of the image memorization, read, and erasing processes, respectively, with the 16\(\times\)16 1P-1R focal plane array. A circuit diagram of the pixels is depicted in Fig. 3d, where a ReRAM in a yellow (dark blue)-colored pixel is in the LRS (HRS) state. To control the 1P-1R array, memorization, read, and erase operations (Fig. 2b) were utilized. For the image memorization process, voltage pulses of + 5 V were applied across the individual 1P-1R pixels, where the photodiodes were reverse-biased, to store incident image information in the ReRAMs, as shown in Fig. 3e (see Methods for more experimental details). The stored image is then read by applying voltage pulses of -1 V to each 1P-1R pixel, where the photodiode is forward-biased, to read the resistance states of the ReRAMs (Fig. 3f). To erase the saved image in the sensor, voltage pulses of -5 V were applied to each 1P-1R pixel to switch all pixels to the HRS state, enabling the sensor to be ready to capture the next images (Fig. 3g).
In this study, we characterized and operated a fabricated 16\(\times\)16 1P-1R crossbar array for image memorization under light illumination with a wavelength of 532 nm. First, the I-V characteristics of an individual pixel in the array were measured under 532 nm light illumination (see Fig. S6). Forming voltage pulses of +6 V were applied to all 256 pixels under global light illumination (P = 70 mJ/cm2) to form the conductive filament channels in the active medium of the ReRAMs. Subsequently, READ voltage pulses of -1 V were applied to each pixel to read the conductance state of each ReRAM, followed by RESET voltage pulses of -5 V to switch all pixels to the HRS for the next image memorization process. Figures 3h and i show the 12\(\times\)12 conductance maps of the InGaAs 1P-1R array before and after the forming process, respectively, on a logarithmic scale (see Figs. S7a and b for details). After the forming process, all pixels were effectively switched from the initial HRS (Fig. 3h) to the LRS (Fig. 3i). With the operation-ready 1P-1R array, the memorization function was demonstrated by imaging the handwritten digit images of ‘4’ and ‘8’ from the MNIST dataset.31 First, the ‘4’ handwritten digit image was projected onto the 1P-1R array, and +5 V (SET) voltage pulses were applied to each pixel to memorize the exposed image in the sensor, as shown in Fig. 3e (see Methods for detailed experimental methods). After the image memorization process, -1 V (READ) voltage pulses were applied to all pixels to read the saved image from the sensor. Figure 3j shows the corresponding current map of the 1P-1R array after the memorization process with the digit ‘4’ image, indicating that the captured image is successfully memorized in the sensor.
The saved image in the sensor array is then erased by applying voltage pulses of -5 V (RESET) to all pixels, and second image memorization and readout processes were performed under the image exposure of the MNIST handwritten digit of ‘8’ using the identical procedure described above (Fig. 3k).
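The memorize, read, and erase sequence described above can be mimicked with a simple behavioral model. The sketch below is not the authors' measurement code: the conductance values `G_LRS` and `G_HRS` are illustrative assumptions, pixels are treated as binary (lit or dark), and only the -1 V read amplitude is taken from the text.

```python
G_LRS = 1e-4   # assumed low-resistance-state conductance (S), illustrative only
G_HRS = 1e-7   # assumed high-resistance-state conductance (S), illustrative only
V_READ = -1.0  # READ pulse amplitude quoted in the text (V)


def erase(rows: int, cols: int) -> list:
    """RESET every pixel to the HRS so the array is ready for a new image."""
    return [[G_HRS] * cols for _ in range(rows)]


def memorize(conductance: list, image: list) -> list:
    """SET pulse under exposure: lit pixels (1) switch to the LRS, dark pixels keep their state."""
    return [
        [G_LRS if lit else g for g, lit in zip(g_row, img_row)]
        for g_row, img_row in zip(conductance, image)
    ]


def read(conductance: list, v_read: float = V_READ) -> list:
    """READ pulse: the current magnitude of each pixel follows Ohm's law, |I| = |V| * G."""
    return [[abs(v_read) * g for g in row] for row in conductance]


# A 3 x 3 toy pattern stands in for a projected digit image.
pattern = [[0, 1, 0],
           [1, 1, 1],
           [0, 1, 0]]
array = memorize(erase(3, 3), pattern)
current_map = read(array)   # lit pixels carry the LRS current, dark pixels only leakage
array = erase(3, 3)         # ready for the next exposure
```

In this model the current map reproduces the projected pattern, and a second image can be stored only after the erase step, mirroring the ‘4’ then ‘8’ sequence in the experiment.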
The 1P-1R crossbar array can be approximated as a 1R crossbar array under a forward bias condition for the photodiodes because the resistance of the forward-biased p-i-n InGaAs photodiode is relatively low compared to that of the ReRAM. Thus, analog neuromorphic computing can be directly performed in the 1P-1R crossbar array using the stored image data in the ReRAMs in the same way as the 1R-based crossbar arrays.10,16,26,30 Therefore, image processing and encoding based on ANNs can be conducted within the sensor by directly implementing vector-matrix multiplication. This in-sensor vector-matrix multiplication enables an efficient higher-level computation without data transport between the sensor, memory, and processor, reducing significant amounts of energy consumption and processing time.6,10
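The vector-matrix multiplication mentioned above follows directly from Ohm's law (each cell contributes a current V × G) and Kirchhoff's current law (currents on a shared line add). A minimal sketch is given below, assuming the input voltages are applied to the rows and the summed currents are collected on the columns; the function name `crossbar_vmm` is hypothetical, and the photodiode series resistance is neglected as in the 1R approximation.

```python
def crossbar_vmm(voltages: list, conductance: list) -> list:
    """Column currents of an ideal 1R crossbar: I_j = sum_i V_i * G_ij."""
    n_rows = len(conductance)
    n_cols = len(conductance[0])
    if len(voltages) != n_rows:
        raise ValueError("one input voltage is required per crossbar row")
    return [
        sum(voltages[i] * conductance[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]
```

With `voltages = [1, 2]` and `conductance = [[1, 2], [3, 4]]` (arbitrary units), the function returns `[7, 10]`, so an N\(\times\)N conductance map is reduced to N output currents in a single analog step.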
The fabricated 1P-1R crossbar focal plane array fuses sensing, learning, and computing capabilities similar to those of biological retinas. To realize a neuromorphic vision system, we stored the vision information in each 1P-1R cell as a matrix geometry and simultaneously harnessed the data using emulated vision encoding. In previously demonstrated conventional crossbar geometries of neuromorphic in-memory computing systems for image processing, the crossbar holds the pre-trained weight values of the ANN matrices, and the input image data are applied to the crossbar columns as a vectorized electrical signal (Fig. 4a)6,10,14. Because image data are usually formatted as a 2-dimensional (2D) N\(\times\)N array, a 2D-to-1D conversion (vectorization) is required to produce the N^2 \(\times\) 1 input vector that is applied to the columns of the ReRAM crossbar array. In this case, extra complex circuit components (e.g., ADCs, DACs, and multiplexers) must be added to the peripheral circuit to control the large number of input signals, increasing energy consumption and operational complexity.1,6,10 In contrast, our in-pixel image processing system transposes the roles of image data and ANN weights: the input image is applied to and stored in the crossbar array in the form of a weight matrix, as shown in Fig. 4b. Therefore, the 2D-to-1D conversion of the image data is no longer necessary in this configuration, significantly reducing the circuit complexity and improving the operational efficiency. Moreover, data transportation between the image memorization and image encoding processes is significantly diminished because the image information is processed directly in the pixels without any data transfer.
Figure 4c shows the in-pixel computing process using the fabricated 1P-1R array. The 12\(\times\)12 image of ‘8’ is optically mapped onto the 1P-1R array (sensing) and preserved as the conductance of the ReRAMs (learning). Meanwhile, the front-ANN and post-ANN are pre-trained with 10,000 datasets of the MNIST handwritten numbers in the post-processor to extract the optimum weight vector.31 The pre-trained 1D weight vector is then converted to electrical signals and applied to the 1P-1R array, enabling the physical matrix multiplication for the in-pixel ANN computation via Ohm’s and Kirchhoff’s laws (computing). The output current signals from the voltage-conductance multiplication thus represent the encoded vector of the image ‘8,’ achieved without data transportation by emulating the biological encoding capability. Figure 4d shows the 10 encoded vectors (vector size: 1 \(\times\) 12) for the input digits from ‘0’ to ‘9,’ each digit exhibiting a distinguishable encoded vector. The encoded vector is then fed to the next hidden layers to classify the image in the post-ANN (for more details, see Methods). Figures 4e and f show the classification results from the measured and memorized ‘4’ and ‘8’ digit images. Before training, the activation level of each digit is randomly distributed. After training, however, the activation level of the ANN output neurons is concentrated on a single digit. The digit with the highest activation level was adopted as the classified ‘answer’. Figures 4g and h show the results of the image classification before and after 100 training epochs for the full precision of 10,000 test digit images, indicating that the classification performance of the ANN was significantly improved by training. Although the proposed device has an N times smaller number of weight values compared to conventional in-memory computing methods (Fig. 4a), the classification accuracy is up to 82% with 100 training epochs (Fig. 4i).
Further accuracy improvement could be achieved by employing a dual encoding neural layer in the ANN, conserving both row- and column-wise features of images, which can be realized by employing bi-directional peripheral circuitry.32
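The read-out rule described above, in which the encoded vector is passed to the post-ANN and the digit with the highest activation is taken as the answer, can be sketched as follows. This is a simplified illustration with a single linear output layer; the layer structure, weights, and biases are hypothetical placeholders rather than the trained network reported here.

```python
def dense(vector: list, weights: list, biases: list) -> list:
    """One fully connected layer: activation_k = sum_j weights[k][j] * vector[j] + biases[k]."""
    if len(weights) != len(biases):
        raise ValueError("one bias is required per output neuron")
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, biases)
    ]


def classify(activations: list) -> int:
    """Return the index of the output neuron with the highest activation level."""
    return max(range(len(activations)), key=lambda k: activations[k])


# Toy example: a 2-element encoded vector and 3 output classes (placeholder values).
encoded = [2, 3]
toy_weights = [[1, 0], [0, 1], [1, 1]]
toy_biases = [0, 0, 0]
answer = classify(dense(encoded, toy_weights, toy_biases))  # activations are [2, 3, 5]
```

In the reported system the encoded vector has 12 elements and the output layer has 10 neurons, one per digit; the toy sizes above are chosen only to keep the example readable.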
We demonstrated a neuromorphic machine vision system with an in-sensor encoding process inspired by mammalian vision. The focal plane array is based on an InGaAs photodiode directly integrated with an HfO2 ReRAM, constructing the 1P-1R optoelectronic memory and computing pixels. The fabricated 1P-1R pixel showed reliable digital and multibit optoelectronic memory operation and endurance performance under light illumination. Furthermore, a 16\(\times\)16 1P-1R crossbar array with InGaAs photodiodes and HfO2-based ReRAMs was used to perform edge computing of the handwritten numbers. Finally, we demonstrated biological image encoding with the developed 1P-1R crossbar array, utilizing direct image memorization and in-memory vector-matrix multiplication. The encoded images were conveyed to the ANN for image classification, which achieved an accuracy of 82% with 100 training epochs. This relatively low classification accuracy is attributed to the structure of the encoding neural network, which consists of twelve 12\(\times\)1 fully connected layers. The architecture of the neural network is inevitably determined by the hardware circuit structure of the 1P-1R crossbar array. The classification accuracy of our sensor system can be further improved using a dual-encoding neural layer in the ANN. The in-sensor computing concept introduced in this study is a novel method for storing and processing image information directly within pixels, and is seamlessly scalable with conventional semiconductor fabrication technology.