Device Fabrication and Characterization
Various material combinations have been proposed for the metal-insulator-semiconductor (MIS) stack of ReRAM devices. In this work, a Ni/GeOx/p+-Si MIS stack was fabricated. This combination was chosen for two reasons: first, the materials are compatible with the conventional Si processing used in most modern VLSI electronics, which makes fabrication practical; second, a non-metallic switching material yields a more tightly concentrated distribution of operation voltages. Figure 1(a) shows a cross-sectional image of a fabricated ReRAM device obtained by high-resolution transmission electron microscopy (HR-TEM), which confirms a GeOx switching layer with a thickness of 3 nm. The schematic of the fabricated ReRAM cells is shown in Fig. 1(b). Ni and p+ Si serve as the top electrode (TE) and bottom electrode (BE), respectively.
Figure 2(a) shows the I-V curves measured from a fabricated ReRAM device with a diameter of 100 µm after 1, 5, 10, and 20 direct-current (DC) sweeps using a Keithley 4200A with a compliance current of 0.1 mA. The distributions of the set and reset voltages are confirmed to be narrow owing to the non-metallic switching dielectric formed by an MTO and finalized by post-deposition annealing (PDA). To elucidate the conduction mechanism in the Ni/GeOx/p+-Si ReRAM cell, the I-V curves in the positive-voltage high-resistance state (HRS) region and in the negative-voltage low-resistance state (LRS) region are plotted in Figs. 2(b) and 2(c), respectively. As shown in Fig. 2(b) for the HRS, there are two linear regions with different slopes: region I for TE voltages below 0.7 V, with a slope of 1.83, and region III for TE voltages above 2.5 V, with a slope of 4.78. Between them lies a nonlinear region, region II (0.7 V ≤ V < 2.5 V), whose average slope is approximately 2.49. Judging from the slopes in Fig. 2(b), Child’s square law (I ∝ V²) dominates in the low and intermediate TE voltage regions [21–22]. The large slope in the high TE voltage region is attributed to the drastic increase in current caused by trap-controlled space-charge-limited current (SCLC) [23]. For the LRS in the negative TE voltage region, the extracted slope is 1.06, as shown in Fig. 2(c), so the current can be explained mainly by ohmic conduction.
Figures 3(a) through 3(d) illustrate the construction and destruction of the conducting bridge in the Ni/GeOx/p+-Si ReRAM device. There is no conducting filament in the pristine state (Fig. 3(a)), but Ni²⁺ ions begin to penetrate into the GeOx switching layer as the TE voltage increases. These Ni²⁺ ions are reduced at the BE, resulting in the gradual growth of a conductive filament of Ni atoms toward the TE (Fig. 3(b)). As the TE voltage increases further, the filament reaches the TE and the cell switches to the LRS (Fig. 3(c)). As the TE voltage is reduced and enters the negative region, the conductive filament undergoes electrochemical dissolution and ruptures, returning the cell to the HRS (Fig. 3(d)). This formation and rupture of the conducting filament, or conducting bridge, is carried out by metallic species, a mechanism that is more likely to be observed in ReRAM cells employing Ni as the TE material [24, 25].
The conducting filaments, which repeatedly form and rupture depending on the voltage as shown in Figs. 3(a) through 3(d), are randomly distributed over the ReRAM cell, as illustrated in Fig. 4(a). Each filament can be described as a parallel combination of a voltage-dependent resistance and a capacitance, as shown in Fig. 4(b). The series resistance (Rs) at the top of the block comes from the series combination of the TE, BE, and contact resistances.
Since all the filaments are connected in parallel between the TE and BE, their resistances can be lumped into an equivalent cell resistance (Rc), and likewise the parallel capacitances sum to an equivalent cell capacitance (C), as shown in Fig. 4(c). Although the construction and destruction of the conducting bridge are explained here by the movement of metallic atoms, and the bridging mechanism can vary with the material combination of the cell stack, an individual cell can still be described by a variable resistor and a capacitor. The equivalent circuit model in Figs. 4(b) and 4(c) is therefore broadly applicable to ReRAM devices.
To extract the passive elements of the ReRAM cell, the fabricated devices were characterized with an impedance analyzer and the results were fitted to the equivalent circuit model in Fig. 4(c). The Cole-Cole plots of the fabricated ReRAM device in the HRS and LRS are shown in Figs. 5(a) and 5(b), respectively. The measurement frequency was varied from 1 Hz to 200 kHz, and the x (Z′) and y (Z″) axes indicate the real and imaginary parts of the impedance. As the frequency increases, the trajectory is traced in the counterclockwise direction. The appearance of a single semicircle in the Cole-Cole plot confirms that charge transport in the device can be described by a parallel RC circuit, as in Fig. 4(c). The square symbols in Figs. 5(a) and 5(b) show the measurement results, whereas the continuous lines denote the fitted data. Table 1 lists the parameters extracted from the impedance analysis of the GeOx ReRAM device. The capacitance in the LRS is smaller than that in the HRS (116 pF versus 163 pF), which is attributed to the reduction in the effective area contributing to the device capacitance as the conductive filament grows.
The high accuracy of the impedance fitting shows that the physical simplification of a realistic ReRAM cell in Fig. 4(a) and the equivalent circuits in Figs. 4(b) and 4(c) derived from it are highly consistent.
To further explain the charge transport within the device, the energy-band diagram across the Ni/GeOx/p+ Si stack in the HRS is shown schematically in Fig. 6(a), drawn with the known material parameters. Since the work functions of the Ni TE and p+ Si are not significantly different, the diagram is drawn under a nearly flat-band condition. In the HRS, there is no conducting filament bridging the TE and BE inside the switching layer. As the TE voltage increases, the energy bands in the GeOx layer and p+ Si bend upward, as shown in Fig. 6(b), and a further increase in the TE voltage leads to field-enhanced migration of Ni cations from the TE into the switching layer, as shown schematically in Fig. 6(c). The Ni²⁺ ions are reduced to metallic Ni atoms at the BE, forming a conductive filament of Ni atoms that grows toward the TE. When the TE and BE are bridged by the Ni conducting filament, the memory state finally switches from the HRS to the LRS. Figure 6(d) presents the energy-band diagram of the Ni/GeOx/p+ Si ReRAM cell with the Ni conducting bridge in the LRS, in the steady state with no applied TE voltage.
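The slope-based assignment of conduction mechanisms used for Figs. 2(b) and 2(c) can be illustrated with a short sketch. It fits the slope of log I against log V by least squares and maps the slope to a regime. The function names, the tolerance `tol`, and the synthetic I-V points are assumptions made only for illustration; the reported slopes (1.83, 2.49, 4.78, and 1.06) are taken from the text, and this is not the fitting procedure used by the authors.

```python
import math

def loglog_slope(voltages, currents):
    """Least-squares slope of log10(I) against log10(V)."""
    xs = [math.log10(v) for v in voltages]
    ys = [math.log10(i) for i in currents]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def classify(slope, tol=0.5):
    """Map a log-log slope to a conduction regime.

    tol is a hypothetical cutoff chosen for this sketch, not a value
    from the paper.
    """
    if abs(slope - 1.0) <= tol:
        return "ohmic"
    if abs(slope - 2.0) <= tol:
        return "Child's square law"
    if slope > 2.0 + tol:
        return "trap-controlled SCLC"
    return "unclassified"

# Synthetic data only (not measured): an ideal square-law branch I = k * V^2.
volts = [0.1, 0.2, 0.3, 0.5, 0.7]
amps = [1e-9 * v ** 2 for v in volts]
fitted = loglog_slope(volts, amps)

# Slopes reported in the text for Figs. 2(b) and 2(c).
reported = {
    "HRS region I": 1.83,
    "HRS region II": 2.49,
    "HRS region III": 4.78,
    "LRS": 1.06,
}
labels = {name: classify(m) for name, m in reported.items()}

print(f"fitted slope of synthetic branch: {fitted:.2f}")
for name, label in labels.items():
    print(f"{name}: slope {reported[name]} -> {label}")
```

With the tolerance chosen here, regions I and II fall under Child's square law, region III under trap-controlled SCLC, and the LRS under ohmic conduction, which matches the assignment in the text. A different tolerance would move the boundary for region II.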
Table 1
Values of passive elements extracted from the impedance analyses.
Resistance states | Rs [Ω] | Rc [MΩ] | C [pF] | Extraction voltage [V]
HRS | 197 | 12.1 | 163 | 2.3
LRS | 216 | 29.2 | 116 | 0.7
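As an illustration of the equivalent circuit in Fig. 4(c), the sketch below evaluates the impedance of a series resistance Rs followed by Rc in parallel with C, using the HRS values from Table 1 over the 1 Hz to 200 kHz sweep. The function names and the number of sweep points are assumptions, and plotting −Z″ against Z′ is the usual Cole-Cole convention rather than something stated in the text.

```python
import math

def cell_impedance(freq_hz, rs, rc, c):
    """Impedance of Rs in series with (Rc parallel to C) at one frequency."""
    omega = 2 * math.pi * freq_hz
    return rs + rc / (1 + 1j * omega * rc * c)

def log_sweep(f_start, f_stop, points):
    """Logarithmically spaced frequencies from f_start to f_stop, inclusive."""
    ratio = (f_stop / f_start) ** (1 / (points - 1))
    return [f_start * ratio ** k for k in range(points)]

# HRS parameters from Table 1, converted to SI units.
RS = 197.0        # ohm
RC = 12.1e6       # ohm
C = 163e-12       # farad

freqs = log_sweep(1.0, 200e3, 50)   # 50 points is an arbitrary choice
cole_cole = []
for f in freqs:
    z = cell_impedance(f, RS, RC, C)
    cole_cole.append((z.real, -z.imag))   # (Z', -Z'') pairs

# The apex of the semicircle sits where omega * Rc * C = 1.
f_peak = 1 / (2 * math.pi * RC * C)
z_peak = cell_impedance(f_peak, RS, RC, C)

print(f"apex frequency: {f_peak:.1f} Hz")
print(f"Z' at apex: {z_peak.real:.3e} ohm, -Z'' at apex: {-z_peak.imag:.3e} ohm")
```

In this model the semicircle has a diameter of Rc and is offset from the origin by Rs, so the low-frequency intercept approaches Rs + Rc and the high-frequency intercept approaches Rs. With the HRS values the apex falls near 81 Hz, inside the measured frequency window.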
Training Approach and Hardware Architecture of the PIM with GeOx ReRAM
The off-chip training capability of the GeOx ReRAM for image recognition was evaluated using the Canadian Institute for Advanced Research (CIFAR)-10 dataset and the Visual Geometry Group (VGG)-8 neural network architecture. The VGG-8 network comprises a total of 8 layers: 6 convolutional layers and 2 fully connected layers. The detailed schematic of the VGG-8 network architecture is shown in Fig. 7(a). The CIFAR-10 input dataset is a collection of 60,000 color (red-green-blue) images with a resolution of 32 × 32 pixels, classified into 10 output classes. For training, the data are split into 50,000 training and 10,000 test images, with a batch size of 200. The VGG-8 network was trained using a stochastic gradient descent (SGD) algorithm and the rectified linear unit (ReLU) activation function.
The hardware realization of the neural network in a PIM architecture is illustrated in Fig. 7(b) [20]. The hardware design makes it possible to evaluate the performance of the VGG-8 network built with GeOx ReRAM synaptic devices. The system takes into account various hardware constraints, including the technology node, the analog-to-digital converter (ADC) precision, and the nonideal changes in the synaptic weights during training. The design is organized hierarchically into chip-level, processing-element-level, and synaptic-array-level elements. For full single-chip integration, peripheral circuits including ADCs, buffers, multiplexers (MUXs), and interconnects were modeled with 32-nm predictive technology SPICE model parameters, and other relevant circuitry such as digital adders and shift registers was also considered. The accumulation circuits include the chip-level units, processing-element-level adders, tile-level adders, and shift adders at the edges of the ReRAM synapse array.
The system-level performance was evaluated using an analog parallel read-out scheme with a 64 × 64 synaptic array and 5-bit ADC precision. The input data flow into the wordline (WL) switch matrix, and the multiply-accumulate (MAC) operations in the crossbar array generate partial sums, which are accumulated along the columns by the read-out circuits (flash ADCs). The ADCs are much larger in area than the synaptic array column pitch allows, so each ADC is shared among several columns through a column MUX. The adders and shift registers shift and accumulate the partial sums produced by the MAC operations over repeated data cycles, which arise from batch-wise data processing. A major concern with batch-wise data processing is the large amount of intermediate data generated when the activations are computed during the feed-forward pass. To minimize the on-chip memory requirement, the PIM architecture can be designed to send the intermediate data to off-chip DRAM; this is optional, depending on the neural network size and the chip area of the GeOx ReRAM PIM architecture.
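The read-out flow described above (column-wise partial sums, ADC quantization, then shift-and-add accumulation) can be sketched in a few lines. The 64 × 64 array size, the 5-bit ADC, and the binary weight states come from the text; the bit-serial 4-bit inputs, the saturating ADC model, and all function names are assumptions for illustration, and column MUX sharing is not modeled.

```python
ROWS, COLS = 64, 64   # synaptic array size from the text
ADC_BITS = 5          # ADC precision from the text

def adc(column_sum, bits=ADC_BITS):
    """Saturating ADC model (an assumption): clip to the largest output code."""
    return min(column_sum, (1 << bits) - 1)

def crossbar_mac(inputs, weights, input_bits=4):
    """Bit-serial MAC on a binary crossbar with shift-and-add accumulation.

    inputs  : list of non-negative integers, one per wordline
    weights : ROWS x COLS matrix of 0/1 cell states
    """
    acc = [0] * len(weights[0])
    for b in range(input_bits):
        # One data cycle: drive each wordline with bit b of its input.
        bit_plane = [(x >> b) & 1 for x in inputs]
        for j in range(len(acc)):
            # Partial sum along column j (summed cell currents in hardware).
            col = sum(bit_plane[i] * weights[i][j] for i in range(len(inputs)))
            # Quantize, then shift by the bit significance and accumulate.
            acc[j] += adc(col) << b
    return acc

# Hypothetical weight pattern: 16 conducting cells per column.
weights = [[1 if (i + j) % 4 == 0 else 0 for j in range(COLS)] for i in range(ROWS)]
inputs = [i % 16 for i in range(ROWS)]

result = crossbar_mac(inputs, weights)
exact = [sum(inputs[i] * weights[i][j] for i in range(ROWS)) for j in range(COLS)]
print("matches exact dot product:", result == exact)

# A dense column overflows the 5-bit range and is clipped by the ADC model.
dense = [[1] * COLS for _ in range(ROWS)]
clipped = crossbar_mac([1] * ROWS, dense)
print("dense column output:", clipped[0], "exact value:", ROWS)
```

In this sketch the result is exact whenever no column partial sum exceeds the ADC range, and it saturates otherwise, which is one way to see why ADC precision appears among the hardware constraints considered in the evaluation.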
System-Level Performance Evaluation
To evaluate the system-level performance of the PIM architecture based on GeOx ReRAM, binary-state switching in the synaptic array was assumed, with a potentiation (write) voltage of 3 V at a pulse width of 100 µs and an inference voltage of 0.7 V at a pulse width of 100 µs. Figure 8 shows the CIFAR-10 image recognition accuracy as a function of the number of epochs for the software and hardware neural networks. The PIM system with the hardware neural network of GeOx ReRAM synapses achieves a test accuracy of 91.27%, comparable to the 92.31% obtained by the software neural network. A sharp jump is observed in both the software- and hardware-based training at 200 epochs, owing to the learning-rate decay applied after 200 epochs to optimize the training of the network. The inset in Fig. 7(c) shows a subset of the CIFAR-10 dataset. The system-level parameters from the simulation of the designed PIM architecture are summarized in Table 2.
Figures 9(a) through 9(c) show pie charts of the shares of energy, latency, and area taken by the different hardware components in the PIM architecture. The energy distribution in Fig. 9(a) reveals that the ADCs (multi-level current sense amplifiers) and interconnects consume the most energy, followed by the accumulation circuits. The synaptic array consumes very little energy compared with the other components. The latency distribution in Fig. 9(b) shows that the logic and buffer circuits, together with the interconnects, dominate the overall system latency. The large size of the VGG-8 network results in a considerable amount of on-chip data transfer from the buffer memory, and the large number of synapses in the array increases the complexity of the interconnects within the PIM chip. This is a crucial factor limiting the overall chip latency.
Finally, Fig. 9(c) shows that the total chip area is largely occupied by the ADCs. Thus, the ADCs need to be intensively optimized with regard to both energy and area efficiency. The minimal energy, latency, and area consumed by the GeOx ReRAM synapse array indicate its high applicability to hardware PIM architectures.
Further, the computational demands of the VGG-8 network on the hardware PIM design were evaluated to identify future directions for optimizing the hardware neural network architecture. Figures 10(a) and 10(b) show the layer-wise energy consumption and latency of the VGG-8 network across the main hardware components. The convolutional layers with additional pooling layers (shown in Fig. 7(a)), i.e., layers 2 and 4, demand the most energy and time for the in-memory computations. Both histograms show that the inference energy and latency of the synaptic array remain small across all layers of VGG-8, by virtue of the fabricated GeOx ReRAM.
However, device-level variation must be considered when evaluating system-level performance. Recently, there have been several studies on the effects of nonideal variations in synaptic devices on PIM system performance [26, 27]. As one of the most decisive nonidealities, the cycle-to-cycle variation in switching can be quantified as a standard deviation and treated as an independent variable in determining the system accuracy. Figure 10(c) shows the maximum test accuracy for CIFAR-10 image recognition as a function of the cycle-to-cycle variation. There is little drop in accuracy up to a standard deviation of 0.02, which confirms the robustness of the GeOx ReRAM synaptic devices in the PIM architecture. Figure 10(d) shows that the inference energy increases monotonically with the standard deviation, but the system remains robust against device-level variation up to a standard deviation of 0.02.
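One simple way to picture cycle-to-cycle variation as an independent variable is to add zero-mean Gaussian noise with a given standard deviation to each stored weight on every read cycle. The sketch below does this for a single 64-input column. The additive Gaussian model, the normalization of weights to the range 0 to 1, the function names, and the trial count are all assumptions; the text does not specify how the variation is injected in the simulation.

```python
import random
import statistics

def apply_cycle_variation(weights, sigma, rng):
    """Return weights with fresh zero-mean Gaussian noise (std = sigma) added."""
    return [w + rng.gauss(0.0, sigma) for w in weights]

def noisy_mac(inputs, weights, sigma, rng):
    """Dot product computed with one noisy realization of the weights."""
    noisy = apply_cycle_variation(weights, sigma, rng)
    return sum(x * w for x, w in zip(inputs, noisy))

rng = random.Random(0)                        # fixed seed for repeatability
weights = [float(i % 2) for i in range(64)]   # hypothetical binary column
inputs = [1.0] * 64
ideal = sum(x * w for x, w in zip(inputs, weights))

# Sweep the standard deviation, including the 0.02 level discussed in the text.
for sigma in (0.0, 0.01, 0.02, 0.05):
    errors = [abs(noisy_mac(inputs, weights, sigma, rng) - ideal) for _ in range(200)]
    print(f"sigma = {sigma}: mean absolute MAC error = {statistics.mean(errors):.4f}")
```

Because independent errors partly cancel when summed along a column, the error of the accumulated result grows more slowly than the number of cells, which is consistent with a network tolerating small standard deviations before accuracy degrades.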
Table 2
Chip-level parameters and performances computed per epoch for the GeOx ReRAM synapse array-based PIM architecture.
PIM chip parameters | Values
Chip area | 62.5 mm²
Total energy on chip | 3.35 × 10⁻⁵ J
Latency | 1.33 ms
Peak energy efficiency | 58.92 TOPS/W
Mean energy efficiency | 36.42 TOPS/W
Inference energy in the synapse array | 1.64 × 10⁻⁶ J
Other logic energy | 3.55 × 10⁻⁷ J
ADC energy | 1.37 × 10⁻⁵ J
Interconnect energy | 1.20 × 10⁻⁵ J
Inference latency in the synapse array | 2.20 × 10⁻⁵ s
Other logic latency | 5.58 × 10⁻⁴ s
ADC latency | 5.86 × 10⁻⁵ s
Interconnect latency | 5.55 × 10⁻⁴ s
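The component shares implied by Table 2 can be checked with simple arithmetic. The sketch below divides each itemized energy and latency entry by the corresponding chip total. All numbers are copied from Table 2; the variable names are illustrative, and the remainder is reported only as "not itemized", since the table does not say which components it covers or whether the latency components add serially.

```python
TOTAL_ENERGY_J = 3.35e-5
TOTAL_LATENCY_S = 1.33e-3

energy_j = {
    "synapse array": 1.64e-6,
    "other logic": 3.55e-7,
    "ADC": 1.37e-5,
    "interconnect": 1.20e-5,
}
latency_s = {
    "synapse array": 2.20e-5,
    "other logic": 5.58e-4,
    "ADC": 5.86e-5,
    "interconnect": 5.55e-4,
}

def shares(parts, total):
    """Fraction of the total taken by each itemized component."""
    return {name: value / total for name, value in parts.items()}

energy_share = shares(energy_j, TOTAL_ENERGY_J)
latency_share = shares(latency_s, TOTAL_LATENCY_S)

# Portion of each total not covered by the four itemized rows.
energy_rest = 1.0 - sum(energy_share.values())
latency_rest = 1.0 - sum(latency_share.values())

for label, table, rest in (("energy", energy_share, energy_rest),
                           ("latency", latency_share, latency_rest)):
    for name, frac in table.items():
        print(f"{label:8s} {name:14s} {100 * frac:5.1f} %")
    print(f"{label:8s} {'not itemized':14s} {100 * rest:5.1f} %")
```

Under this reading, the ADCs and interconnects account for roughly 41% and 36% of the on-chip energy, while the synapse array accounts for about 5% of the energy and under 2% of the latency, in line with the qualitative trends described for Figs. 9(a) and 9(b).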