Crossbar-Compatible Stateful Logic Using Phase Change Memory


 Stateful logic is a digital processing-in-memory technique that could address von Neumann memory bottleneck challenges while maintaining backward compatibility with von Neumann architectures. 
In stateful logic, memory cells are used to perform the logic operations without reading or moving any data outside the memory array.
This has been previously demonstrated using several resistive memory types, but not with commercially available phase-change memory (PCM). Here we present the first implementation of stateful logic using PCM. We experimentally demonstrate four logic gate types (NOR, IMPLY, OR, NIMP) using commonly used PCM materials and crossbar-compatible structures. Our stateful logic gates form a functionally complete set, which enables sequential execution of any logic function within the memory and paves the way to PCM-based digital processing-in-memory systems.


Introduction
For the last 75 years, computers have typically been designed around the von Neumann architecture, which separates the memory from the processing units (Fig. 1a). While its programming model is simple, incessant data movement limits system performance because memory access time is often substantially longer than the computing time. This bottleneck has worsened over the years since CPU speed has improved more than memory speed and bandwidth (the so-called 'memory wall') 1 . One attractive approach to this problem is processing-in-memory (PIM), which adds computation capabilities to the memory. PIM reduces the need for chip-to-chip transfers, which are costly in terms of processing speed, bandwidth, and energy, thus yielding higher performance and energy efficiency 2 .
Stateful logic 3,4 is a processing-in-memory technique based on memristive memory technologies (Fig. 1b), e.g., resistive random-access memory (RRAM) or conductive-bridge RAM (CBRAM). In stateful logic, the stored resistive data is used as input and the result is written to an output memory cell without reading the input cells beforehand or moving any data outside the memory array. Stateful logic enables PIM architectures such as the memristive memory processing unit (mMPU, Fig. 1b) 5 that offer massive intrinsic parallelism and high-performance, energy-efficient processing, while maintaining backward compatibility with von Neumann architectures.
In this article, we present and experimentally demonstrate a novel method to perform stateful logic operations using PCM. We demonstrate four different logic gates (NOR, IMPLY, OR, and NIMP) with robust and repeatable results. Our logical set is functionally complete, enabling sequential execution of any logic function in-memory. The gates are compatible with the memory crossbar structure and can be applied in parallel on multiple rows. Additionally, previous applications suggested for stateful logic using RRAM are applicable to our PCM-based method (Supplementary Text 1).
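As an illustration of what functional completeness buys, the following minimal sketch builds other Boolean functions, up to a one-bit full adder, from sequential NOR steps alone. It works purely at the logical level (1 = true, 0 = false); the helper names and the particular decomposition are illustrative assumptions, not the authors' mapping of functions onto the memory array.

```python
from itertools import product

def nor(a: int, b: int) -> int:
    """Logical NOR on single bits (1 = true, 0 = false)."""
    return int(not (a or b))

# Every function below is a fixed sequence of NOR steps.
def not_(a: int) -> int:
    return nor(a, a)

def or_(a: int, b: int) -> int:
    return not_(nor(a, b))

def and_(a: int, b: int) -> int:
    return nor(not_(a), not_(b))

def xor(a: int, b: int) -> int:
    return and_(or_(a, b), not_(and_(a, b)))

def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry) of three input bits using only NOR-derived gates."""
    partial = xor(a, b)
    total = xor(partial, cin)
    carry = or_(and_(a, b), and_(partial, cin))
    return total, carry

# Exhaustive check against ordinary integer addition.
for a, b, cin in product((0, 1), repeat=3):
    total, carry = full_adder(a, b, cin)
    assert 2 * carry + total == a + b + cin
```

Each helper call corresponds to one or more gate evaluations executed one after another, which is the sense in which a functionally complete gate set allows any logic function to be computed sequentially.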

Phase-change memory devices.
Phase-change memory exploits the behavior of certain chalcogenide materials that can be switched rapidly and repeatedly between amorphous and crystalline phases. These materials are typically compounds of germanium, antimony, and tellurium (Ge_xSb_yTe_z, GST). The amorphous phase has high electrical resistivity, while the crystalline phase has low resistivity. A PCM device consists of a volume of this phase-change material sandwiched between two electrodes (Fig. 2). Applying pulses to a PCM device results in Joule heating, which alters the phase (state) of the material. A reset pulse melts a significant portion of the phase-change material; when the pulse is stopped abruptly, the molten material quenches into the amorphous phase, leaving the device in a high-resistance state (HRS). When a slower set pulse, with an amplitude above a threshold voltage (V_th) 33 , is applied to a device in the HRS, part of the amorphous region crystallizes, leaving the device in a low-resistance state (LRS). The resistance state reached after a reset or set pulse can be read by biasing the device with a small read voltage that does not change the phase configuration.
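The set/reset/read behavior described above can be summarized as a two-state model. The sketch below is a deliberately simplified abstraction: the class name, the default threshold of 1.0 V, and the read voltage check are hypothetical, and pulse width, partial crystallization, and resistance drift are ignored.

```python
from dataclasses import dataclass

HRS = "HRS"  # amorphous, high-resistance state
LRS = "LRS"  # crystalline, low-resistance state

@dataclass
class PCMCell:
    """Two-state abstraction of a PCM device (illustrative only)."""
    state: str = HRS
    v_th: float = 1.0  # hypothetical threshold voltage in volts

    def reset(self) -> None:
        """Melt-quench: a short, abruptly ended pulse leaves the cell amorphous."""
        self.state = HRS

    def set_pulse(self, amplitude: float) -> bool:
        """Slower pulse: crystallizes an HRS cell only if the amplitude reaches v_th.

        Returns True if the cell switched from HRS to LRS.
        """
        if self.state == HRS and amplitude >= self.v_th:
            self.state = LRS
            return True
        return False

    def read(self, v_read: float = 0.2) -> str:
        """A small read voltage reports the state without changing it."""
        if v_read >= self.v_th:
            raise ValueError("read voltage must stay below the threshold")
        return self.state

cell = PCMCell()
cell.set_pulse(0.5)   # below threshold: stays in HRS
cell.set_pulse(1.0)   # reaches threshold: switches to LRS
```

The conditional nature of `set_pulse` is the property that the logic gates later rely on: whether the output cell switches depends on the voltage it actually sees.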
Our setup (Fig. 3) includes three PCM cells with a shared bottom electrode, and enables programming and reading each cell as well as performing the logic operations. A write-verify scheme is used to probe the maximum cycle count of a single device and characterize its switching behavior. The current, voltage, and power required to set and reset the devices are depicted in Fig. 4a-c. Our endurance test shows that a device can maintain a 10× resistance window for almost 10^4 cycles, with some degradation in the resistance distribution after several hundred cycles (Fig. 4d). The current and voltage waveforms across the PCM cell during a typical set operation are shown in Fig. 4e. Finally, the current-voltage (I-V) transition of the device from the amorphous state to the crystalline state is shown in Fig. 4f.
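A write-verify scheme, in general terms, repeats a programming pulse and a read until the read confirms the target state. The sketch below shows that loop against a toy device; `ToyCell`, its resistance values, the 50 kΩ verification bound, and the attempt limit are all hypothetical stand-ins rather than the authors' instrument routine.

```python
class ToyCell:
    """Hypothetical device: each pulse moves it to the next listed resistance."""

    def __init__(self, resistances_after_pulse):
        self._pending = list(resistances_after_pulse)
        self.resistance = 1e6  # start in an assumed HRS (ohms)

    def pulse(self) -> None:
        if self._pending:
            self.resistance = self._pending.pop(0)

    def read(self) -> float:
        return self.resistance

def write_verify(cell, is_verified, max_attempts: int = 10) -> int:
    """Pulse and read until is_verified(resistance) holds; return pulses used."""
    for attempt in range(1, max_attempts + 1):
        cell.pulse()
        if is_verified(cell.read()):
            return attempt
    raise RuntimeError(f"state not verified after {max_attempts} pulses")

# A cell that only reaches the assumed LRS window on its third pulse.
cell = ToyCell([8e5, 2e5, 2e4])
pulses_used = write_verify(cell, lambda r: r < 5e4)
```

Counting the pulses needed per write, as the return value does here, is one simple way such a loop can also be used to track how switching behavior changes over a device's lifetime.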

Phase-change memory stateful logic.
The proposed PCM-based stateful logic gate consists of three PCM cells, where two cells serve as inputs and the third as the output (Fig. 3). The output cell may also serve as an additional input at the cost of losing its stored data, i.e., a destructive operation. A grounded fixed resistor (10 kΩ in our configuration) can be connected to the shared node as well (similar to material implication in RRAM 4 ). A logic operation is achieved by applying voltage pulses to the top electrode (TE) of the input cells, causing conditional switching of the output that depends on the resistive states of the inputs. The switching mechanism is based on the Ovonic threshold switching phenomenon 33 , which occurs if the voltage across the output cell is above its threshold voltage, V_th, and is followed by the crystallization of the output cell. Furthermore, this design principle is compatible with the crossbar memory structure commonly used for PCM 27,34 . Here, we propose four different logic functions (i.e., NOR, IMPLY, OR, NIMP) based on this principle. Table 1 summarizes the voltages and configurations applied at each node to realize the gates 13 . See Supplementary Fig. 1 for more information.
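At the logical level, each gate can be viewed as a conditional set of the output cell: the output either stays as it was or switches to LRS, depending on the input states. The sketch below encodes that view under two assumptions that this section does not spell out: LRS is taken as logic 1 and HRS as logic 0, and IMPLY is modeled in the destructive style of material implication, with the output cell doubling as the second input. The function names are illustrative.

```python
from itertools import product

HRS, LRS = 0, 1  # assumed encoding: HRS = logic 0, LRS = logic 1

def conditional_set(out: int, switch: bool) -> int:
    """The output can only move from HRS to LRS, and only when 'switch' holds."""
    return LRS if switch else out

def nor_gate(in1: int, in2: int, out: int = HRS) -> int:
    """Output (initialized to HRS) is set only when both inputs are in HRS."""
    return conditional_set(out, in1 == HRS and in2 == HRS)

def or_gate(in1: int, in2: int, out: int = HRS) -> int:
    """Output (initialized to HRS) is set when at least one input is in LRS."""
    return conditional_set(out, in1 == LRS or in2 == LRS)

def nimp_gate(in1: int, in2: int, out: int = HRS) -> int:
    """in1 AND NOT in2: output is set only for in1 = LRS, in2 = HRS."""
    return conditional_set(out, in1 == LRS and in2 == HRS)

def imply_gate(p: int, q: int) -> int:
    """p IMPLY q, destructive: q's cell is the output and is set when p is in HRS."""
    return conditional_set(q, p == HRS)

for name, gate in [("NOR", nor_gate), ("IMPLY", imply_gate),
                   ("OR", or_gate), ("NIMP", nimp_gate)]:
    column = [gate(a, b) for a, b in product((HRS, LRS), repeat=2)]
    print(f"{name:5s} inputs 00,01,10,11 -> {column}")
```

Because `conditional_set` never moves a cell from LRS back to HRS, the non-destructive gates depend on the output being initialized to HRS before each evaluation.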

Experimental results of the proposed logic gates.
We measured the functionality and robustness of the proposed gates on the fabricated devices. In each test cycle, we examine the four input combinations for the tested gate. Each experiment includes: a) a write-verify procedure to initialize the inputs and output to the desired states, b) applying the voltage pulses required to evaluate the logic function, and c) reading cycles to examine the output result and to verify the stability of the inputs (see Supplementary Fig. 3). A stateful operation is evaluated not only by the correct logical result, but also the stability of the inputs. This is not always trivial, as reported by previous RRAM works 12,13,15,35 . Results of 50 cycles of: (a) NOR, (b) IMPLY, (c) OR, and (d) NIMP logic gates are shown in Fig. 5. The results show successful logic operation for all cycles on all gates. Additionally, the inputs remain stable, without any meaningful change in their resistance.
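The three-step test cycle (initialize, apply pulses, read back) can be sketched as a small harness that checks both the logical result and the stability of the inputs. Everything numeric here is a hypothetical placeholder: the LRS/HRS resistances, the 100 kΩ decision boundary, and the 2× drift tolerance. `simulated_nor` merely stands in for the hardware step and is not a device model.

```python
from itertools import product

R_LRS, R_HRS = 1e4, 1e6  # hypothetical programmed resistances (ohms)
R_BOUNDARY = 1e5         # hypothetical LRS/HRS decision boundary (ohms)

def to_bit(resistance: float) -> int:
    """Assumed encoding: LRS reads as 1, HRS reads as 0."""
    return 1 if resistance < R_BOUNDARY else 0

def simulated_nor(r_in1, r_in2, r_out):
    """Stand-in for step (b): sets the output only if both inputs are HRS."""
    switch = to_bit(r_in1) == 0 and to_bit(r_in2) == 0
    return r_in1, r_in2, (R_LRS if switch else r_out)

def run_test(gate, expected, cycles: int = 50, max_drift: float = 2.0):
    """Return the list of (cycle, a, b) cases that failed."""
    failures = []
    for cycle in range(cycles):
        for a, b in product((0, 1), repeat=2):
            # (a) initialize inputs to the desired states, output to HRS
            r1 = R_LRS if a else R_HRS
            r2 = R_LRS if b else R_HRS
            # (b) evaluate the gate
            n1, n2, n_out = gate(r1, r2, R_HRS)
            # (c) read back: correct output and undisturbed inputs
            output_ok = to_bit(n_out) == expected(a, b)
            inputs_stable = all(1 / max_drift <= after / before <= max_drift
                                for after, before in ((n1, r1), (n2, r2)))
            if not (output_ok and inputs_stable):
                failures.append((cycle, a, b))
    return failures

nor_failures = run_test(simulated_nor, lambda a, b: int(not (a or b)))
```

Treating an input disturbance as a failure even when the output is correct mirrors the criterion stated above: a stateful operation must leave its inputs intact.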
The voltage pulses applied to implement the gates and the voltage measured at the shared BE, marked as BE_ALL, are depicted in Fig. 6 for each input case. In the NOR and IMPLY tests, if one of the inputs is in LRS, the voltage at BE_ALL follows the constant voltage dictated by the inputs, thus keeping the voltage across the output below the set threshold. Otherwise, the voltage at BE_ALL remains at 0 V, and the output is switched once the voltage on its TE is above V_th. In the OR and NIMP tests, the voltage at BE_ALL follows a constant trend for all non-switching cases, keeping the voltage across the output cell below V_th. In the cases where the output is switched, a meaningful change in the voltage at BE_ALL is noticeable, caused by the resistance change of the output.
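The NOR behavior of the shared node can be reproduced with a simple resistive-divider estimate. The sketch below treats every cell as a linear resistor and assumes the shared node is tied to the grounded 10 kΩ resistor. Only the 10 kΩ resistor and the 0.6 V input level come from the text; the cell resistances, the output TE voltage of 1.2 V, and the 1.0 V threshold are hypothetical values chosen for illustration, not measured ones.

```python
R_GROUND = 10e3            # grounded fixed resistor (ohms), as in the text
R_LRS, R_HRS = 10e3, 1e6   # hypothetical cell resistances (ohms)
V_IN = 0.6                 # volts on the input TEs, as in the NOR test
V_OUT = 1.2                # hypothetical voltage on the output TE (volts)
V_TH = 1.0                 # hypothetical threshold voltage (volts)

def shared_node_voltage(sources, r_ground: float = R_GROUND) -> float:
    """Node voltage from Kirchhoff's current law.

    sources: list of (top-electrode voltage, cell resistance) pairs
    feeding the shared bottom electrode.
    """
    current_in = sum(v / r for v, r in sources)
    conductance = sum(1.0 / r for _, r in sources) + 1.0 / r_ground
    return current_in / conductance

def output_switches(r_in1: float, r_in2: float, r_out: float = R_HRS) -> bool:
    """True if the voltage across the (HRS) output cell reaches V_TH."""
    v_be = shared_node_voltage([(V_IN, r_in1), (V_IN, r_in2), (V_OUT, r_out)])
    return (V_OUT - v_be) >= V_TH

for r1, r2 in [(R_HRS, R_HRS), (R_HRS, R_LRS), (R_LRS, R_HRS), (R_LRS, R_LRS)]:
    v_be = shared_node_voltage([(V_IN, r1), (V_IN, r2), (V_OUT, R_HRS)])
    print(f"R1={r1:.0e} R2={r2:.0e}  V_BE={v_be:.3f} V  switch={output_switches(r1, r2)}")
```

With these assumed values, an LRS input pulls the shared node up toward the input voltage and leaves less than the threshold across the output, whereas two HRS inputs leave the node near 0 V so that nearly the full output voltage appears across the output cell. A linear model cannot capture threshold switching itself, so it only indicates which input cases reach the threshold.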

Discussion
An increasing number of applications, from high-performance computing (HPC) to databases, data analytics, and deep neural networks, require higher memory capacity to meet the needs of workloads with large data sets. DRAM scaling has slowed in recent years, and improving its capabilities further has become a Herculean task 36,37 . Thus, new technologies, such as RRAM, CBRAM, and PCM, are being explored 38 . These technologies offer increased memory capacity and add non-volatility, enabling persistent memory. These types of persistent memories are usually referred to as Storage Class Memory (SCM) 14,17 , as they combine both storage and memory characteristics. Although RRAM is considered an attractive memory technology for SCM, to date it is mostly used for small embedded nonvolatile memories and has not yet been commercialized at large scale (e.g., at gigabyte scale). In contrast, the PCM-based Intel Optane Persistent Memory is already commercially available in dual in-line memory module (DIMM) form with up to 512 GB capacity, larger than DRAM DIMMs, which typically range from 4 GB to 128 GB 39 .
With the PCM-based Intel Optane, applications stand to benefit from the availability of large-capacity memory, but the performance will still be limited by the incessant data movement between the CPU and memory. To tackle this issue, we propose adding computation capabilities to PCM technology, inspired by previously proposed stateful logic operations for RRAM. We experimentally demonstrate a novel method to perform four stateful logic gates using PCM (NOR, IMPLY, OR, and NIMP). The measured results show correct and robust logic operation. The proposed gates are crossbar compatible, functionally complete and can be executed simultaneously on multiple rows, paving the path towards PCM-based digital processing-in-memory architectures.

Device fabrication.
PCM devices are fabricated 40 , starting with the evaporation and etching of tungsten (W) to form the bottom electrodes (BEs). Next, we deposit SiO_x with plasma-enhanced chemical vapor deposition (PECVD) and pattern the confined vias using e-beam lithography. In situ Ar sputtering is used to ensure that the top of the BE is not oxidized. Then, sputtering and liftoff are used to pattern the Ge2Sb2Te5 (GST) layer with in situ TiN capping, followed by the final TiN/Pt top electrodes (TEs) and contact pads.

Electrical measurements.
The measurement setup is shown in Fig. 3; it includes three PCM cells and enables programming and reading each cell, as well as performing the logic operations. Electrical measurements are performed on-wafer using a Keysight B1500A with four B1530 WGFMU channels and a Keysight MSOX3104T oscilloscope. For set and reset, we use 30/500/500 ns and 30/50/30 ns rise/width/fall pulses, respectively. The resistance is measured with a 0.2 V read pulse.
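The rise/width/fall timings above define trapezoidal pulses. The sketch below builds such pulses as piecewise-linear breakpoints and interpolates between them. It assumes that "width" means the duration of the flat top, and it uses unit amplitudes because the set and reset amplitudes are not given in this section; the function names are illustrative and unrelated to any instrument API.

```python
def trapezoid(amplitude: float, rise_ns: float, width_ns: float,
              fall_ns: float, start_ns: float = 0.0):
    """Return [(time_ns, volts), ...] breakpoints of a trapezoidal pulse."""
    t_top = start_ns + rise_ns
    t_fall = t_top + width_ns   # assumption: width is the flat-top duration
    t_end = t_fall + fall_ns
    return [(start_ns, 0.0), (t_top, amplitude), (t_fall, amplitude), (t_end, 0.0)]

def voltage_at(points, t_ns: float) -> float:
    """Linearly interpolate the waveform; 0 V outside the pulse."""
    if t_ns <= points[0][0] or t_ns >= points[-1][0]:
        return 0.0
    for (ta, va), (tb, vb) in zip(points, points[1:]):
        if ta <= t_ns <= tb:
            if tb == ta:
                return vb
            return va + (vb - va) * (t_ns - ta) / (tb - ta)
    return 0.0

# Timings from the text; amplitudes are placeholders (1.0 = full scale).
set_pulse = trapezoid(1.0, rise_ns=30, width_ns=500, fall_ns=500)
reset_pulse = trapezoid(1.0, rise_ns=30, width_ns=50, fall_ns=30)

set_duration_ns = set_pulse[-1][0] - set_pulse[0][0]
reset_duration_ns = reset_pulse[-1][0] - reset_pulse[0][0]
```

The contrast between the two shapes reflects the device physics described earlier: the reset pulse ends abruptly to quench the melt, while the set pulse has a long trailing edge that gives the material time to crystallize.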

Experimental demonstration.
We apply the voltage pulses in two stages, or set a relatively long rise time, to accommodate the long RC delay at the shared node, which is caused by the large parasitic capacitance of the pads and probes. This is required because otherwise the RC delay causes an unwanted voltage state at the shared node, which might cause unwanted switching. In an integrated setup, the RC delay is considerably smaller, which will eliminate the need for the long rise time. Additionally, to reduce unintended switching of the inputs and/or output, we chose to apply exactly V_th with a longer pulse rather than a higher voltage. See Supplementary Text 2 for more details. In the NOR gate test, we apply 0.6 V for 3 µs on all TEs, then we increase the voltage on TE_OUT to 1.

Table 1. Summary of the applied voltages and configurations at each node (TE_IN1, TE_IN2, TE_OUT, BE) to realize the logic gates. 'R' marks connection to the grounded fixed resistor, 'F' marks a node left floating.