In PCMs, after programming, the active layer undergoes structural relaxation overtime. It leads to resistance drift, overlap, and misreading whenever one considers a PCM multilevel cell. In order to investigate and capture this phenomenon, a drift test at 300 K, 77 K, and 12 K was conducted. The same programming scheme depicted in Fig. 2(a) is used to first program the ePCM. It is subsequently read at 0.1 V, starting at 0.9 ms after programming, over 30 seconds, with a reading interval of 500 ms (see Fig. 4(a–c)). The resistance evolution of five states from LRS to HRS are tracked. The drift phenomenon is then modelled and extracted according to the power law51,52:

$${\text{R}}_{\text{t}}= {\text{R}}_{\text{t}0}{\left(\frac{\text{t}}{{\text{t}}_{0}}\right)}^{\text{v}}\left(2\right),$$

Where R0 stands for initial resistance at time t0, and Rt instant resistance at time t. Finally, v is the drift coefficient. As the most prominent drift behaviour is effectively captured within a confined initial time range 35,38,53, the fitting window considered in the analysis is the first minute after amorphization with a view to accurately capturing the distinct shift in resistance within the sub-second range immediately after programming. The drift coefficient goes from 0.03 to 0.09 for LRS and HRS at 300 K and from 0.0005 to 0.045 at 12 K. A slight change from 0.044 to 0.041 in the maximum drift coefficient is observed at 12 K compared to 77 K (see Fig. 4(d)). It is noteworthy that the drift coefficients are in complete agreement with the literature34,35,37,38,53–55. To date, there have been limited studies at cryogenic temperatures, and most are GST line cell devices. Studies at 80 K and 125 K reported an average drift coefficient of 0.01 and 0.07 at HRS, respectively, and a zero-drift coefficient at approximately 60 K for a line cell based on extrapolations39,40. According to Khan40, the reduction in the drift coefficient at cryogenic temperature is caused by a combination of structure relaxation and diminished kinetic energy and rate of charge emission from electron traps in the amorphous phase. The results shown in Fig. 4 confirm the drift temperature dependence and report a near-zero drift behaviour at 12 K for the first time.

## ePCM state distribution evolution over time

Considering the distribution and variability of states, it is essential to model the correlation between the drift coefficient and resistance to forecast the evolution overtime of any resistance state achieved during programming. Since drift is resistance dependent, there is a generalized way of predicting drift using the following Eq. 56:

$${\text{v}}_{\text{R}2}-{\text{v}}_{\text{R}1}= \text{k}.\text{log}\left(\frac{{\text{R}}_{1}}{{\text{R}}_{2}}\right) \left(3\right),$$

Where 𝜈R2 and 𝜈R1 refer to the drift coefficient of a corresponding pair of initial resistances R1 and R2, respectively measured. k denotes how the drift coefficient depends on resistance. As shown in Fig. 4 (d), the k coefficient is 0.022, 0.0109, and 0.0098 at 300 K, 77 K, and 12 K, respectively. The k coefficient is also temperature dependent and, as observed, it drops with the temperature.

Once it has been obtained, it becomes feasible to predict the drift coefficient for any resistance state by combining Equations 1 and 2, as follows:

$${\text{R}}_{\text{t}}= {\text{R}}_{\text{t}0}{\left(\frac{\text{t}}{{\text{t}}_{0}}\right)}^{\text{k}.\text{log}\left(\frac{{\text{R}}_{\text{t}0}}{{\text{R}}_{\text{R}1}}\right)+{\text{v}}_{\text{R}1} }\left(4\right)$$

Consequently, when applied to the list of states depicted in Fig. 3, it becomes feasible to predict and project the resistance distribution evolution at any given moment (see Fig. 5). The study comprises a huge exploration range from seconds to approximately two years.

## The effect of ePCM drift on SNN performance down to 12 K

Once the projection of states’ distributions over time had been obtained for each temperature, a system-level simulation was carried out on a fully connected SNN to evaluate and compare the effect of state overlap and misreading caused by drift on ePCMs. Both lists of states (pre-selected and selected) are used and considered separately to assess whether a reduced number of states and programming instructions is enough to achieve reasonable accuracy. In system-level simulations, the weight matrix, or synaptic strengths in SNNs, are counted as conductance states. Weight normalization is a common practice used to improve stability, convergence, and performance in simulations. Therefore, the list of states pre-selected and selected at 300 K, 77 K, and 12 K is thus converted to conductance C and to a normalized weight state ranging from 0 to 1,

$${\text{C}}_{\text{N}\text{o}\text{r}\text{m}\text{a}\text{l}\text{i}\text{z}\text{e}\text{d}}=\frac{{\text{C}-\text{C}}_{\text{m}\text{i}\text{n}}}{{{\text{C}}_{\text{m}\text{a}\text{x}}-\text{C}}_{\text{m}\text{i}\text{n}}} \left(5\right),$$

where Cmin is the minimum conductance, Cmax is the maximum conductance, and C is the conductance of the device. It covers both the experimentally extracted resistance state distributions and those predicted using Eq. (4). The normalized conductance states are used to simulate an ePCM-based SNN for the MNIST classification task. A feedforward 784x50 spiking neural network is considered as depicted in Fig. 7(a). The training method is based on the unsupervised learning rule called “voltage dependent synaptic plasticity” (VDSP)57,58. The SNN is trained, and the corresponding software-based computed weights are compared to the experimentally extracted lists of pre-selected and selected ePCM states to find the nearest corresponding value. The trained weights are updated, or replaced by the nearest corresponding normalized states, and sorted by their standard deviation to define the final weights used to compute inference on a test set before drift. On the other hand, to evaluate the effect of drift on classification accuracy at different times, the final weights used to compute inference before drift are replaced by their projected counterpart obtained through Eq. (4), as depicted in Fig. 5. The calculation begins at 6 s and progresses logarithmically up to approximately two years, reaching 6x108 s. Henceforth, the inference is computed without further training steps, ensuring that the trend in the evolution over time of the classification accuracy remains consistent regardless of the training method considered in the analysis, enabling the generalization of the study. The baseline accuracy before updating the weights by the list of states is 73.3%. After weight update with the ePCM pre-selected and selected list of states, the latter achieves an average classification precision of 71.3%, 72.1%, and 71.2% while the former reaches 71.8%, 71.5%, and 71.1% at 300 K, 77 K, and 12 K, respectively (see Fig. 6(b–c)). Considering the standard deviation, the performance is barely distinguishable between temperatures and encoding schemes (selected and pre-selected list) prior to the occurrence of the drift phenomenon – Possibly because both lists span the entire resistance range regardless of the number of states and present a similar average coefficient of variability (CV) per state in the order of 15%, initially.

Now, considering the predicted version of the selected and pre-selected list of states starting at 6 s going towards 6x108 s, the pre-selected list of states achieves after two years a classification performance of 64.8%, 70.6%, and 70.9% while the selected list of states reaches 63.5%, 71.0%, and 70.8% at 300 K, 77 K, and 12 K, respectively (see Fig. 6(b–c)). The impact of the more pronounced drift at room temperature compared to cryogenic temperatures is evident: the latter undergoes a performance drop of 7–8% while the former presents a sustained precision along the complete time window irrespective of the list of states under consideration (see the receptive fields in Fig. 6(d)). The accuracy consistently declined over time at room temperature in both scenarios, starting to drop one week and two hours after programming for the list of pre-selected (120) and selected (10) states, respectively. This difference in time possibly arises due to a larger number of states being available on the pre-selected list that are more prone to being closer to the original trained weights from simulation. The likelihood of the states being closer to the original distribution simply means that they are more likely to be closer to the centre and distant from the edges of the distribution of the original trained states, and consequently farther from the threshold for state overlap and misreading. This suggests that it would require more time to drift and overcome the threshold barrier for misreading compared to the list of selected states.

As both encoding schemes present a similar accuracy at 77 K and 12 K the entire time, this means that the less costly approach comprising the list of 10 defined states and programming instructions is preferable compared to the list of 120 pre-selected states. However, at 300 K the latter must be considered to sustain the accuracy over one week instead of two hours.

It is essential to emphasize that baseline accuracy is proportional to the number of output neurons57,58. Thus, a state-of-the-art accuracy of 90% could be achieved, for instance, in a VDSP-based SNN comprising 500 output neurons57,58. However, 50 output neurons serves the purpose of depicting the effect of drift on classification accuracy evolution overtime adequately, while simultaneously minimizing computational expenses.

In sum, it should be noted that ePCM performance at cryogenic temperatures in SNN applications is enhanced due to this inherent drift mitigation aspects. The accuracy is sustained over two years and a less costly encoding approach is favoured at cryogenic temperatures. It thus enables and attests to its efficient application in spiking neural networks at cryogenic temperatures down to 12 K due to this near-zero drift condition without requiring any additional software or hardware solution to address drift mitigation.