Figure 1(c) represents the ANN for the in-sensor vision system using the MOSTs. The MOSTs are located at the forefront of the ANN, where they detect the light intensity and transmit the optical signals, weighted by the memorized photoresponsivity, to the next layer. The photocurrent (Iphoto) summed at each neuron of the next layer is produced by the multiplication of the memorized photoresponsivity matrix and the light intensity of each pixel. When the vision system has N pixels and M neurons at the next layer, the current summed in the mth neuron of the next layer (Im) can be represented by the following equation: $I_m = \sum_{n=1}^{N} R_{mn} P_n$, where n = 1, 2, …, N and m = 1, 2, …, M denote the indices of the pixel and the neuron at the next layer, respectively. Rmn represents the memorized photoresponsivity matrix and Pn represents the light intensity of each pixel. In this way, in-sensor processing, which combines image sensing and signal processing, allows real-time multiplication of the image with the memorized photoresponsivity matrix [13].
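The summation described above is a matrix-vector product between the memorized photoresponsivity matrix and the pixel intensities. A minimal sketch is given below; the function name `neuron_currents` and all numerical values are illustrative assumptions and not measured data.

```python
# Sketch of I_m = sum over n of R_mn * P_n for an in-sensor layer.
# The matrix R and vector P below are hypothetical values in arbitrary units.

def neuron_currents(R, P):
    """Return [I_1, ..., I_M] for an M x N responsivity matrix R and N pixel intensities P."""
    if any(len(row) != len(P) for row in R):
        raise ValueError("each row of R must have one entry per pixel")
    # Each neuron sums the photocurrents of the N devices connected to it.
    return [sum(r_mn * p_n for r_mn, p_n in zip(row, P)) for row in R]


# Hypothetical example with N = 3 pixels and M = 2 neurons.
R = [
    [2.0, 0.0, 1.0],  # responsivities feeding neuron m = 1
    [0.0, 3.0, 1.0],  # responsivities feeding neuron m = 2
]
P = [1.0, 2.0, 4.0]   # light intensity at each pixel

print(neuron_currents(R, P))  # [6.0, 10.0]
```

In hardware, this summation would take place on the shared output line of each neuron rather than in software, so the sketch only mirrors the arithmetic.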
Figure 1(d) shows a schematic of an n-channel MOST with a vertical pillar structure. Figure 1(e) shows the array of MOSTs, in which each pillar protrudes from a bulk-silicon wafer. The heavily doped n+ source (S) and drain (D) are located at the top and the bottom of each pillar, respectively, and a p-type channel lies between them. As gate dielectrics, quintuple layers (OⅠ/NⅠ/OⅡ/NⅡ/OⅢ) composed of triple-layered tunneling dielectrics (OⅠ/NⅠ/OⅡ), the aforementioned CTL nitride (NⅡ), and a blocking oxide (OⅢ) wrap around the sidewall of the pillared channel, as shown in Figure 1(f). The triple layers of OⅠ/NⅠ/OⅡ were adopted to reduce the operating voltage by barrier engineering (BE) of the tunneling dielectrics [17,18]. The thicknesses of the gate dielectrics are 1.3 nm/1.3 nm/1.6 nm/5.6 nm/6.3 nm in the order of OⅠ/NⅠ/OⅡ/NⅡ/OⅢ. A triple-layered metal gate composed of titanium, titanium nitride, and tungsten (Ti/TiN/W) surrounds the exterior sidewall of the gate dielectrics and the pillar. Under light illumination, carriers are generated and flow through the channel as Iphoto, which drives the photodetector. Iphoto is actually the drain current (ID) flowing between the source and the drain, which is controlled by the gate voltage (VG) and the drain voltage (VD). The gate electrode makes the photoresponsivity tunable by charging and discharging the CTL of NⅡ (hereafter simply abbreviated as ‘CTL’) and thereby controls the memory function. Note that NⅠ in the tunneling dielectrics cannot serve as a CTL because OⅠ is too thin to block tunneling of the trapped charges. Fabrication details of the MOST are described in Figure S1.
In the MOST, the threshold voltage (VT) can be adjusted by two factors: photo-carriers controlled by light illumination and trapped electrons in the CTL modulated by VG. Figure 2 shows the transfer characteristics of ID versus VG (ID-VG) according to the light intensity (P) and the number of gate pulses (Npulse). Npulse determines the level of ID at each state in the synaptic operation, and its range sets the number of states. For example, with 32 states in the depression, an Npulse of 0 is the initial state with the highest ID due to the lowest VT, and an Npulse of 31 corresponds to 31 applied gate pulses, which produce the lowest ID due to the highest VT. In this work, a variable pulse number with an identical pulse amplitude and width is used for the potentiation-depression (P-D) operation. An LED (SOL 3.0, Fiber Optic Korea Co., Ltd.) was used as a white light source. The P indicated in Figure 2 is the value measured in the blue region at a wavelength of 405 nm. It was quantified by a power meter with a detection spot area of 0.785 cm². Figure 2(a) shows a leftward VT shift, which is caused by photo-carrier generation under light illumination [19]. In contrast, Figure 2(b) exhibits a rightward VT shift. It is attributed to electron trapping in the CTL by the applied positive depression gate voltage (VG,dep); i.e., the trapped electrons suppress inversion at the channel surface. This is analogous to the depression operation that reduces the synaptic weight in an artificial synapse [20-22]. The magnitude of VG,dep is 9 V and its pulse width is 10 μs. It should be noted that the rightward VT shift by electron trapping is semi-permanent, whereas the leftward VT shift by light illumination is temporary. In other words, the light-induced VT shift disappears when the light illumination is removed. Figure 2(c) superimposes the ID-VG curves with the photo-carrier generation by incident light and the electron trapping by the applied VG,dep in one graph.
The ratio (η) of photoresponsivity without charge trapping to that with charge trapping by VG is approximately 800 at a VG,read of 0 V. In this way, photoresponsivity can be modulated effectively by controlling the trapped electrons in the CTL. Therefore, the MOST acts as a photodetector by sensing Iphoto with light, a synapse by updating a weight with VG, and a non-volatile memory by holding a weighted state with trapped charges for the in-sensor vision system. This tunable photoresponsivity is utilized as a controllable synaptic weight in the ANN. Unlike the previously reported photodetecting devices, extra memory is no longer needed because the MOST itself harnesses an inherent non-volatile memory function [12,13].
Figure 3(a) shows the depression, in which ID decreased with increasing Npulse for various P. Here, Npulse was varied from 0 to 31; i.e., there are 32 states. The magnitude of VG,dep is 9 V and its pulse width is 1 μs. This result shows that the photoresponsivity was finely tunable with multi-states. For a typical synaptic operation, the potentiation that increases the synaptic weight should be available, as well as the depression that decreases the synaptic weight. Figure S2(a) represents the P-D characteristics for various P, i.e., with light illumination. The conductance (G) is defined as ID/VD, which is numerically equal to ID because the applied VD was 1 V. The photoresponsivity was finely tunable during the potentiation as well as the depression. The magnitude of the potentiation gate voltage (VG,pot) is -10 V and its pulse width is 200 μs. Figure S2(b) shows another P-D characteristic in a dark environment, i.e., without light illumination. From Figure S2(b), the nonlinearity parameters (α) were extracted using the following equation:
$$G = \begin{cases} \left[ \left( G_{\max}^{\alpha} - G_{\min}^{\alpha} \right) w + G_{\min}^{\alpha} \right]^{1/\alpha}, & \alpha \neq 0 \\ G_{\min} \left( G_{\max} / G_{\min} \right)^{w}, & \alpha = 0 \end{cases}$$
where Gmax is the maximum conductance, Gmin is the minimum conductance, α is the nonlinearity parameter, and w is an internal variable that ranges from 0 to 1 [23]. The extracted αpot and αdep were -0.02 and -0.58, respectively. These parameters are used for the subsequent software simulations. It is well known that a large number of states is preferred to enhance the performance of pattern recognition in a synaptic device [20-22]. In this context, it was also confirmed that P-D characteristics with 64 and 128 states were achievable by delicately tuning the gate pulse, as shown in Figure S3.
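The role of α can be illustrated with a short numerical sketch. It assumes a widely used form of the nonlinearity model, G = [(Gmax^α - Gmin^α)·w + Gmin^α]^(1/α) for α ≠ 0 and G = Gmin·(Gmax/Gmin)^w for α = 0, which is consistent with the definitions above and with α = 1 describing a perfectly linear update; whether this is exactly the expression of [23] is an assumption. The values of Gmin and Gmax are hypothetical, and only the α values are taken from the text.

```python
# Assumed nonlinearity model (not confirmed to be the exact form used in [23]).

def conductance(w, g_min, g_max, alpha):
    """Conductance at internal state w in [0, 1] for nonlinearity parameter alpha."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if abs(alpha) < 1e-12:
        # Limit alpha -> 0: exponential interpolation between g_min and g_max.
        return g_min * (g_max / g_min) ** w
    return ((g_max ** alpha - g_min ** alpha) * w + g_min ** alpha) ** (1.0 / alpha)


G_MIN, G_MAX = 1e-9, 1e-6  # hypothetical conductance window in siemens

# Compare the ideal linear case (alpha = 1) with the extracted parameters.
for label, alpha in (("ideal", 1.0), ("potentiation", -0.02), ("depression", -0.58)):
    mid = conductance(0.5, G_MIN, G_MAX, alpha)
    print(f"{label:>12}: alpha = {alpha:+.2f}, G(w = 0.5) = {mid:.3e} S")
```

For α = 1 the midpoint lies halfway between Gmin and Gmax, whereas for the negative extracted values it lies much closer to Gmin on a linear scale, which is the kind of deviation from linearity that the parameter quantifies.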
Figures 3(b) and (c) show the real-time ID for various P and Npulse, respectively, when the light is turned on and off. At a fixed Npulse, ID increased as P increased. At a fixed P, ID decreased as Npulse increased. It is worth noting that ID returned to the initial state when the light was off. This feature ensures that the synaptic weight is not changed during optical sensing and that repetitive reset operations are not needed. As shown in Figure 3(d), ID was sustained even after 40,000 sec owing to the superior retention characteristics of the CTL-based memory. Such retention is well established in commercial flash memory that adopts a CTL. It should be recalled that good retention characteristics of a synaptic device are crucial for reliable operation over time [22].
Figure S4 shows the P-D characteristics of the MOST for various wavelengths (λ). Measurements were performed by using blue (B), red (R), and infrared (IR) light sources. The λ of the B, R, and IR light is 405 nm, 638 nm, and 1550 nm, respectively. As shown in Figure S4, tunable photoresponsivity was observed for the visible B and R light, whereas it was not observed for the IR light. This is because the B and R light can generate photo-carriers to increase Iphoto, whereas the IR light cannot create them owing to its small photon energy of 0.80 eV compared to the silicon energy bandgap of 1.12 eV [24,25]. It should also be noted that the photoresponsivity for the B light was smaller than that for the R light because the penetration depth decreases with shorter λ [26]. The demonstrated wavelength dependency as well as the intensity dependency of the tunable photoresponsivity can help in recognizing a color-mixed pattern [27,28].
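The photon energies behind this argument follow from E = hc/λ. The short check below uses hc ≈ 1239.84 eV·nm together with the wavelengths and the silicon bandgap quoted in the text; the function and constant names are illustrative.

```python
# Photon energy E [eV] = h*c / wavelength, with h*c expressed in eV*nm.
HC_EV_NM = 1239.84193   # product of Planck constant and speed of light
SI_BANDGAP_EV = 1.12    # silicon energy bandgap quoted in the text

def photon_energy_ev(wavelength_nm):
    """Return the photon energy in eV for a wavelength given in nm."""
    return HC_EV_NM / wavelength_nm

for name, wavelength in (("B", 405.0), ("R", 638.0), ("IR", 1550.0)):
    energy = photon_energy_ev(wavelength)
    print(f"{name}: {wavelength:.0f} nm -> {energy:.2f} eV, "
          f"exceeds bandgap: {energy > SI_BANDGAP_EV}")
# B: 405 nm -> 3.06 eV, exceeds bandgap: True
# R: 638 nm -> 1.94 eV, exceeds bandgap: True
# IR: 1550 nm -> 0.80 eV, exceeds bandgap: False
```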
As mentioned above, BE tunneling dielectrics composed of the triple layers (hereafter ‘BE layers’) were adopted to reduce the operating voltage. To confirm this effect, simplified MOSTs were fabricated as a control group, in which the BE layers of OⅠ/NⅠ/OⅡ were replaced by a single layer of thermal oxide (Osingle). The other structures were kept the same. As plotted in Figure S5(a), the measured transfer characteristics of the fabricated MOST with Osingle/NⅡ/OⅢ showed photoresponsivity similar to that of the MOST with OⅠ/NⅠ/OⅡ/NⅡ/OⅢ. This is because the gate dielectric has no effect on the photo-carrier generation by light. Whereas VT was shifted rightward by a VG,dep of 9 V in the case of OⅠ/NⅠ/OⅡ/NⅡ/OⅢ (Figure 2), it was not changed by the same VG,dep in the case of Osingle/NⅡ/OⅢ, as shown in Figure S5(b). In that case, a VG,dep larger than 11 V must be applied to change the VT and update the synaptic weight, as also shown in Figure S5(b). As a consequence, the P-D characteristics in Figure S5(c) show that the synaptic weight update is impossible with the same VG,dep in the case of Osingle/NⅡ/OⅢ. Therefore, it is confirmed that the gate dielectric structure of OⅠ/NⅠ/OⅡ/NⅡ/OⅢ is more attractive than that of Osingle/NⅡ/OⅢ for low-power neuromorphic hardware.
Using a full set of the fabricated MOSTs, simple pattern recognition was performed with a single-layer perceptron (SLP). As illustrated in Figure 4(a), two images, ‘A’ of an off-diagonal pattern and ‘B’ of a diagonal pattern, were prepared. Each pattern comprises 2×2 black-and-white pixels, and classification of the two patterns was attempted. The neural network was composed of four input pixels labeled P1, P2, P3, and P4 and two nodes in the output layer labeled OA and OB, as depicted in Figure 4(b). Each pattern was recognized by detecting the output current of the MOSTs connected to each output node. The photoresponsivity that corresponds to the synaptic weight was preset to one of two values, the maximum photoresponsivity or the minimum photoresponsivity, taken from the data of Figure 3(a). The solid lines and the dashed lines in Figure 4(b) represent the devices with the maximum photoresponsivity and the minimum photoresponsivity, respectively. Each photoresponsivity is represented as ‘R’ in the neural network configuration. This in-sensor processing, which combines image sensing and signal processing, performs real-time multiplication of the image with a memorized photoresponsivity matrix [13]. Figure 4(c) shows the circuit diagram used to construct the neural network of Figure 4(b). VG and VD were set to 0 V and 1 V, respectively. Each output was measured in the form of an output current, Iout,A or Iout,B; i.e., Iout,A was measured at the output node OA for the input image of ‘A’ and Iout,B was measured at the output node OB for the input image of ‘B’, as shown in Figure 4(d). As a result, inference for the simple patterns was experimentally verified. It is worth comparing the components required to distinguish the abovementioned two simple patterns. The present in-sensor vision system demands only eight MOSTs, without extra photodetectors, ADCs, or synaptic devices.
In contrast, a conventional vision system may need four photodetectors, an ADC, and eight synaptic devices. Thanks to the in-sensor vision system, rapid classification within 1 msec was achieved with a low power consumption under 150 nW. This is very small compared to the power consumption of an ADC used for a conventional vision system, which ranges from a few tens of μW to a few mW [29,30].
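The 2×2 classification can be mimicked with a few lines of code. The sketch below rests on several assumptions that the text does not state: pixels P1 to P4 are ordered row by row, a white pixel is illuminated (value 1) and a black pixel is dark (value 0), and the two preset photoresponsivities are expressed in relative units with a ratio of 800, borrowed from the ratio reported earlier at a VG,read of 0 V. The names `WEIGHTS`, `PATTERNS`, `output_currents`, and `classify` are illustrative.

```python
# Relative photoresponsivities (hypothetical units; ratio of about 800 assumed).
R_MAX = 800.0
R_MIN = 1.0

# Four MOSTs per output node, eight in total; pixel order P1..P4 is assumed row-major.
WEIGHTS = {
    "A": [R_MIN, R_MAX, R_MAX, R_MIN],  # node OA favors the off-diagonal pixels
    "B": [R_MAX, R_MIN, R_MIN, R_MAX],  # node OB favors the diagonal pixels
}

# Input images: 1 = illuminated (white) pixel, 0 = dark (black) pixel (assumption).
PATTERNS = {
    "A": [0, 1, 1, 0],  # off-diagonal pattern
    "B": [1, 0, 0, 1],  # diagonal pattern
}

def output_currents(pixels):
    """Sum responsivity * intensity over the four pixels for each output node."""
    return {node: sum(r * p for r, p in zip(weights, pixels))
            for node, weights in WEIGHTS.items()}

def classify(pixels):
    """Return the label of the output node with the largest summed current."""
    currents = output_currents(pixels)
    return max(currents, key=currents.get)

for label, pixels in PATTERNS.items():
    print(label, output_currents(pixels), "->", classify(pixels))
```

With the reported figures of under 150 nW and within 1 msec, the energy per classification is bounded by 150 nW × 1 msec = 150 pJ.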
To demonstrate recognition of more complex patterns such as hand-written digits in the MNIST dataset, a multi-layer perceptron (MLP) network composed of two hidden layers was constructed, as illustrated in Figure 5(a). The input layer corresponds to 528 input pixels, which were cropped from the 28×28 pixels, and the output layer corresponds to the 10 digits from 0 to 9. Each hidden layer is composed of 250 neurons. The MOSTs were located at the forefront of the network, where they detect the light intensity and transmit the optical signals, weighted by the memorized photoresponsivity, to the first hidden layer. Each device has its own photoresponsivity corresponding to the synaptic weight, which is represented as ‘R’ in the neural network configuration. This simultaneous image sensing and signal processing allows real-time multiplication of the image with a memorized photoresponsivity matrix [13]. The photoresponsive characteristics and the P-D characteristics in a dark environment, both measured from the fabricated MOSTs, were reflected in the software simulations. The detailed procedure to reflect the measured characteristics is summarized in Figure S6(a). Before the simulation, extra data of Iphoto/Idark according to the light intensity were created by linear interpolation, as shown in Figure S6(b). Iphoto is the drain current with light illumination (ID,light) and Idark is the reference drain current without light illumination (ID,dark). This process was repeated for every synaptic state for a precise simulation. As the first step of the simulation, Iphoto/Idark of each pixel was determined by substituting the MNIST data into the interpolated curve, because the MNIST dataset represents the pixel intensity. Next, Iphoto/Idark was multiplied by the conductance of each MOST in the dark environment (Gdark), which was obtained from the P-D characteristic of Figure S2(b). Because the applied VD of the MOST is 1 V, Gdark, defined as Idark/VD, is numerically equal to Idark.
The multiplication thus results in Iphoto. Finally, Iphoto, which encompasses information on both the pixel intensity and the photoresponsivity, is transmitted to the first hidden layer for summation at each neuron. For a normal synapse between the first hidden layer and the second hidden layer or between the second hidden layer and the output layer, only the electrical characteristics (e.g., the P-D characteristics in a dark environment) were reflected because such synapses do not respond to light. The sigmoid activation function was adopted, and supervised learning with backpropagation was employed to update the synaptic weights of the MOSTs and the normal synapses. Figure 5(b) shows the simulated recognition accuracy according to the number of training epochs; the saturated recognition rate was 85.7 %. This recognition rate is comparable to an upper limit of 88.3 %, which is achievable by software-based pattern recognition simulations that directly multiply the MNIST dataset by the conductance of each synapse with ideal P-D characteristics of perfect linearity and symmetry; i.e., αpot=1 and αdep=1.
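The front-end steps of the simulation (interpolating Iphoto/Idark, multiplying by Gdark, and summing at each neuron of the first hidden layer) can be sketched as follows. The calibration points, the normalization of the 8-bit MNIST pixel values to a 0-1 intensity scale, and all function names are assumptions made for illustration. For brevity, a single interpolated curve is used here, whereas the procedure described above repeats the interpolation for every synaptic state.

```python
from bisect import bisect_right

# Hypothetical calibration points: (normalized light intensity, Iphoto/Idark).
CALIBRATION = [(0.0, 1.0), (0.5, 3.0), (1.0, 7.0)]

def photo_ratio(intensity, table=CALIBRATION):
    """Linearly interpolate Iphoto/Idark at a normalized intensity."""
    xs = [x for x, _ in table]
    if not xs[0] <= intensity <= xs[-1]:
        raise ValueError("intensity lies outside the calibrated range")
    # Index of the upper neighbor, clamped so the last point uses the last segment.
    i = min(bisect_right(xs, intensity), len(xs) - 1)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (intensity - x0) / (x1 - x0)

def first_layer_currents(pixels, g_dark, v_d=1.0):
    """Return the summed Iphoto at each first-hidden-layer neuron.

    pixels: N pixel values from 0 to 255 (assumed 8-bit MNIST format).
    g_dark: M x N dark conductances, one row per neuron.
    """
    ratios = [photo_ratio(p / 255.0) for p in pixels]
    # Iphoto of each device = (Iphoto/Idark) * Gdark * VD, then summed per neuron.
    return [sum(g * v_d * r for g, r in zip(row, ratios)) for row in g_dark]

# Hypothetical example: two pixels feeding one neuron.
print(first_layer_currents([0, 255], [[1e-9, 2e-9]]))
```

The summed currents would then pass through the sigmoid activation and the remaining layers, which are not included in this sketch.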