An Annealing Accelerator for Ising Spin Systems Based on In‐Memory Complementary 2D FETs

Metaheuristic algorithms such as simulated annealing (SA) are often implemented for optimization in combinatorial problems, especially for discreet problems. SA employs a stochastic search, where high‐energy transitions (“hill‐climbing”) are allowed with a temperature‐dependent probability to escape local optima. Ising spin glass systems have properties such as spin disorder and “frustration” and provide a discreet combinatorial problem with a high number of metastable states and ground‐state degeneracy. In this work, subthreshold Boltzmann transport is exploited in complementary 2D field‐effect transistors (p‐type WSe2 and n‐type MoS2) integrated with an analog, nonvolatile, and programmable floating‐gate memory stack to develop in‐memory computing primitives necessary for energy‐ and area‐efficient hardware acceleration of SA for Ising spin systems. Search acceleration of >800× is demonstrated for 4 × 4 ferromagnetic, antiferromagnetic, and spin glass systems using SA compared to an exhaustive search using a brute force trial at miniscule total energy expenditure of ≈120 nJ. The hardware‐realistic numerical simulations further highlight the astounding benefits of SA in accelerating the search for larger spin lattices.

The ORCID identification number(s) for the author(s) of this article can be found under https://doi.org/10.1002/adma.202107076.

DOI: 10.1002/adma.202107076
intelligence, applied mathematics, and theoretical computer science. The travelling salesman problem (TSP) is a representative optimization problem where the shortest route connecting all of a given number of cities must be found. [1] In TSP, the time complexity is on the order of n!, where n is the number of cities. The optimal solution can be found for a small n by an exhaustive search using the brute force trial (BFT) method, wherein every combination is evaluated. However, BFT becomes impractical for large n. For example, 40 cities would require 10 49 trials! While BFT fails, various metaheuristic algorithms such as simulated annealing (SA), [2] genetic algorithm, [3] and ant colony optimization [4] have been utilized to obtain an approximate solution which is very close to the actual solution for TSP and other optimization problems. Among these, SA is an excellent optimization technique in discrete problems where multiple local minima exist. SA provides a simple framework which can be implemented on systems with arbitrary energy landscapes, and it statistically guarantees an optimal solution. SA has hence been employed to solve optimization problems in a wide variety of domains such as circuit design, [5] data analysis, [6] imaging, [7] neural networks, [8] geophysics, [9] finance, [10] and the Ising model of magnetism. [11] SA draws inspiration from physical annealing, in which a material is heated above its recrystallization temperature to allow atoms to rearrange and is then slowly cooled down to improve its crystallinity and reach a low energy state. SA is an optimization algorithm where the free energy (H) of a system is minimized by employing a stochastic search. It is similar to other optimization methods, such as gradient descent, [12] where transitions lowering H are accepted. However, unlike the gradient descent method, it also allows transitions increasing H ("hill-climbing"), determined by the annealing temperature. [13][14][15][16] This "hill-climbing" feature of SA makes it highly attractive for systems with multiple local minima in their energy landscape (Figure 1a). The cost associated with a state transition leading to ΔH change in the free energy of the system is evaluated using the Boltzmann factor and accepted if it falls below a predefined threshold, P, following Equation (1)

Introduction
Combinatorial optimization problems represent a set of problems where finding the best solution using an exhaustive search is often unfeasible. Interestingly, such problems appear in applications ranging from supply-chain management, airline scheduling, and industrial resource allocation to artificial www.advmat.de www.advancedsciencenews.com Here, k B is the Boltzmann constant and T is the temperature. This particular SA approach is also referred to as the threshold accepting method and is widely used for its simple structure. [17] The basic principle of SA is illustrated in Figure 1a. To minimize the free energy function H(x) corresponding to the argument set x = {x 1 ,x 2 ,x 3 ,…x N }, where N is a large number, a random x i is initialized. At each iteration, a new random point (x k ) is chosen and the ΔH associated with the transition is evaluated. If ΔH < 0, the transition is automatically accepted, whereas if ΔH > 0, the transition is accepted conditionally, i.e., "hill-climbing" is allowed following Equation (1). The SA algorithm starts at a high T, where significant "hill-climbing" is allowed, and T is progressively reduced following an annealing schedule. At a sufficient lower T the system finds the global minima or arrives close to the same.
Most of the work on optimization algorithms such as SA is done in software, [11,[13][14][15][16] with the few hardware demonstrations [18,19] based on von-Neumann architecture rendering Adv. Mater. 2022, 34, 2107076 Figure 1. Simulated annealing (SA) and Ising spin glass system. a) Illustration of the basic principle of SA used for finding the ground state(s) or lowest energy state(s) of a system in a large search space with multiple local minima. Unlike many other optimization methods, SA accepts transitions increasing H ("hill-climbing") based on the annealing temperature (T). b) A 4 × 4 Ising spin glass system with 16 randomly oriented spins in up (white) or down (black) direction. The neighboring spin interactions can be either ferromagnetic (green) or antiferromagnetic (purple). c) The corresponding coupling strength matrix [CS]. d) Free energy (H) corresponding to the 2 16 possible states for the spin glass system in (b). The energy landscape shows multiple local minima with ground state degeneracy of 4. The number of brute force trials increases exponentially as the size of the spin lattice increases, qualifying the spin glass system as a challenging combinatorial optimization problem where SA can accelerate the search. e) Flowchart of the SA algorithm for a spin glass system. A predetermined cooling schedule is used for T. At each T, the algorithm runs for a predetermined number of iterations (I max ). During each iteration, a random spin is flipped and the ΔH value associated with the spin flip is evaluated. For ΔH ≤ 0, the spin flip is always accepted. For ΔH > 0, the spin flip is accepted if the cost is less than a predetermined value, P. f) For a sufficiently large number of iterations, the system converges to one of the 4 ground states for the spin glass systems in (b).
www.advmat.de www.advancedsciencenews.com them area and energy-inefficient owing to the physical separation of memory and compute. Non-von-Neumann hardware accelerators such as graphics processing units (GPUs) [20,21] and field-programmable gate arrays (FPGAs) [22] have become increasingly popular in recent years, offering speed-up, higher energy efficiency, and a smaller physical footprint. [23] Quantum annealing, a variant of SA where the "hill-climbing" is achieved by quantum mechanical tunneling, has been implemented in a D-wave processor, a quantum computer. [24] More recently, memristive crossbar arrays have been used to implement standalone SA [25] and SA in Hopfield neural networks for various optimization problems. [26][27][28] While memristors offer in-memory computing, they have limitations owing to sneak paths and narrow dynamic ranges, which limit the number of achievable conductance states and cause high power consumption and metal migration in the ON state due to high currents. To improve their performance, memristors are often integrated with an access transistor in a 1T1R structure and also require peripheral circuits based on silicon complementary-metal-oxide semiconductor (CMOS) technology.
In this work, we explore SA using in-memory complementary field-effect transistors (FETs) based on ultrathin-body 2D semiconductors, i.e., p-type WSe 2 and n-type MoS 2 FETs. First, unlike quantum computers operating at cryogenic temperatures, our demonstration is based on room temperature, and second, we exploit analog subthreshold conduction and analog programmability to design unique computing primitives and annealing schedule which achieve better energy and area efficiency compared to large memristive cross-bar arrays. Note that earlier works have already demonstrated the scalability and technological viability of 2D materials. [29][30][31] Additionally, in-memory and near-memory technologies based on 2D materials have been used recently for various computing tasks overcoming the von-Neumann bottleneck. [32][33][34] This work further advances the development of 2D FET-based in-memory computing platforms. To the best of our knowledge this is the first demonstration of hardware acceleration of SA for the Ising spin glass systems using emerging materials and devices.

Spin Glass System
Since the early years of SA, the Ising spin glass problem has been extensively studied since it offers a combinatorially huge number of outcomes with multiple statistically equivalent ground states. Similar to the disordered nature of atomic positions in glass, in a spin glass system, the magnetic spins are disordered. [35,36] The spatial disorder in a glass is set by quenching, where its atoms are frozen in a disordered state. Similarly, spin glass is a system of quenched magnets with disorder stemming from frozen spin states and interactions. Spin glasses demonstrate interesting properties such as frustration. At low temperatures, a spin glass system has a roughly equal number of ferromagnetic and antiferromagnetic interactions, or bonds, which are frozen. However, spins can be flipped to obtain a low energy state which satisfies the frozen spin interactions. This leads to "frustration" in spins trying to settle between competing ferromagnetic and antiferromagnetic interactions. [37] A large number of metastable states with high degeneracy corresponding to different spin orientations are observed in spin glass systems. This leads to a non-monotonic energy landscape with multiple local minima. The transition from a metastable state to the lowest energy state requires specific spin flips and represents an optimization problem where high energy transitions should be allowed to escape from local minima; hence, spin glass is an ideal system to implement SA.
A K × K square spin lattice with K = 4 is shown in Figure 1b. The natures of the various spin interactions, i.e., ferromagnetic (green) versus antiferromagnetic (purple), are also shown. The corresponding spin vector [σ] consisting of K 2 = 16 spins is given by [ ] [ , , ] In the Ising spin model, +1 represents "up" spin and −1 represents "down" spin, respectively. The Hamiltonian representing the free energy of the spin glass system is given by the zero-field Edwards-Anderson (EA) model described in Equation (2) [38] 1 2 , , Here, σ i and σ j are the ith and jth elements of σ. J ij denotes the nature of the interaction between σ i and σ j , i.e., J ij = +1 for ferromagnetic interactions and J ij = −1 for antiferromagnetic interactions. The operator <> denotes that only the nearest neighbor interactions are taken into account, i.e., N ij = +1 if σ i and σ j are immediate neighbors and N ij = 0 otherwise. [25] Note that both the [J] and [N] matrices are sparse matrices of size K 2 × K 2 . These two matrices are combined into a single K 2 × K 2 matrix, referred to as the coupling strength matrix, [CS] (Figure 1c). The green squares represent +1, the purple squares represent −1, and the gray squares represent 0. Note that the K × K spin glass system with K 2 spins, each with two possible orientations ("up"/"down"), has 2 2 K possible states. Also note that, by changing [N], long-range spin interactions can be incorporated into the model. The free energy (H) values corresponding to each of these 2 2 K states are shown in Figure 1d for the spin glass system shown in Figure 1b. Clearly, the energy landscape is non-monotonic with multiple local minima and degenerate global minima, or ground states. Furthermore, as the size of the spin lattice (K) increases, the search for spin configurations that satisfy all interactions becomes unfeasible using BFTs, thus qualifying the spin glass system as a challenging combinatorial optimization problem where SA can speed-up the search process.

Simulated Annealing for Spin Glass System
The SA algorithm to find the ground state of a spin glass system is shown in Figure 1e. To find the orientations of K 2 = 16 spins that result in minimum free energy, a random spin in the K × K spin lattice is flipped and the corresponding ΔH value is evaluated using Equation (2). If ΔH < 0, the free energy of the system is lowered and hence the spin flip is accepted. However, if ΔH > 0, the spin flip increases the free energy of the system. In this case, the spin flip can still be accepted ("hill-climbing") if the associated cost obtained from Equation (1) is lower than www.advmat.de www.advancedsciencenews.com the predetermined value, P. If both conditions are not satisfied, then the spin flip is rejected. At a given T, a predetermined number of iterations (I max ) are performed to traverse the energy landscape of the spin system. This process is then repeated by progressively cooling the system, i.e., by reducing T following a predefined cooling schedule. It is found that within a reasonable number of iterations, and at a sufficiently lower temperature, the system converges to its ground state (Figure 1f). Note that the spin glass systems typically exhibit ground-state degeneracy. See Figure S1a-d in the Supporting Information for the ground states and the free energy landscapes of 4 × 4 ferromagnetic and antiferromagnetic spin systems, respectively. In a ferromagnetic system, the ground state degeneracy is 2 since all interactions are satisfied if all spins are pointing "up" or "down" (Figure S1c, Supporting Information). An antiferromagnetic system also possesses ground state degeneracy of 2 since all interactions are satisfied if adjacent spins are oriented in opposite directions ( Figure S1d, Supporting Information). However, in spin glass systems, the ground-state degeneracy is increased as the ground state is unable to satisfy all interactions due to the phenomenon of "frustration". For example, "up" and "down" configurations are equally valid for σ 16 for the spin glass system to be in its ground state (Figure 1f). Video S1 in the Supporting Information shows SA for 4 × 4 ferromagnetic, antiferromagnetic, and "frustrated" spin glass systems.

Hardware Realization of Simulated Annealing
Hardware implementation of SA requires: 1) a random number generator for random spin flip, 2) a computational unit to calculate the change in free energy (ΔH) of the spin system associated with the random spin flip following Equations (2 and 3), 3) a computational unit to determine the cost of "hill-climbing" if ΔH > 0 to accept or reject the spin flip following Equation (1), and, finally, 4) a hardware mechanism equivalent to the annealing/cooling schedule in metallurgy. While we use software code to generate the random numbers, all other computational units including the mechanism for annealing are realized in hardware. For example, a multiplier module is designed using complementary 2D FETs along with a resistor and a capacitor module to evaluate ΔH. Similarly, co-location of analog memory and analog computing enabled by nonvolatile and programmable MoS 2 FETs is used to determine the cost of "hill-climbing" and achieve an annealing schedule equivalent to changing the T.
The schematic and optical image of a back-gated p-type WSe 2 FET with 50 nm Al 2 O 3 as the gate dielectric are shown in Figure 2a,b, respectively. Undoped WSe 2 demonstrates ambipolar transport with both electron (n-type) and hole (p-type) conduction owing to the pinning of the metal Fermi-level near the middle of the WSe 2 bandgap. [39][40][41] However, for the design of the multiplier module, unipolar p-type WSe 2 is preferred. Hence, WSe 2 is doped p-type using surface charge transfer doping with sub-stochiometric WO 3−x . [40] Doping is achieved on a multilayered WSe 2 flake by converting its top layers to WO 3−x through mild O 2 plasma exposure, as discussed in our earlier reports. [40,42] As shown in Figure 2a, before O 2 plasma exposure the flakes are patterned to obtain a p-i-p structure, i.e., p-doped source and drain extension regions with the middle region left intrinsic [40] (see the Experimental Section for details on p-type WSe 2 FET fabrication). The length of the intrinsic region is designed to be 500 nm. The corresponding transfer characteristics, i.e., drain current (I DS ) vs back-gate voltage (V BG ) for different drain-to-source voltage (V DS ), and output characteristics, i.e., I DS vs V DS for different V BG , of the WSe 2 FET are shown in in Figure 2c,d, respectively. Clearly, the WSe 2 FET demonstrates unipolar p-type conduction with current on-off ratio of ≈10 8 , subthreshold slope, SS = 200 mV dec −1 , hole mobility, μ P = 13 cm 2 V −1 s −1 , and on-current, I ON = 11 μA μm −1 for an inversion carrier density of 4.4 × 10 12 cm −2 .
The schematic and optical image of a back-gated MoS 2 FET with the same 50 nm Al 2 O 3 gate dielectric and with a channel length of 500 nm are shown in Figure 2e,f, respectively. We have used monolayer MoS 2 grown using the same metalorganic chemical vapor deposition (MOCVD) technique as described in our earlier reports [32,43,44] (see the Experimental Section for details on n-type MoS 2 FET fabrication). For MoS 2 , the metal Fermi-level pins closer to the conduction band at the source/drain contact interfaces, resulting in unipolar n-type conduction. [41,45] The corresponding transfer characteristics and output characteristics are shown in Figure 2g,h, respectively. For monolayer MoS 2 FET, we extracted current on-off ratio of ≈10 7 , subthreshold slope, SS = 310 mV dec −1 , electron mobility, μ N = 15 cm 2 V −1 s −1 , and I ON = 6.7 μA μm −1 for an inversion carrier density of 3.7 × 10 12 cm −2 .
The unique stack of p ++ Si/TiN/Pt/Al 2 O 3 offers analog and nonvolatile memory, i.e., the threshold voltage of the FETs can be adjusted by applying a programming voltage (V P ) to the back-gate electrode for a programming pulse time (t P ). The programmability is shown in Figure 2i using the transfer characteristics of a MoS 2 FET at V DS = 1 V after the application of negative programming voltages of different amplitudes for t P = 100 ms. Figure 2j shows the retention characteristics, i.e., post-programmed I DS vs time, measured at V BG = 2 V, confirming nonvolatile and analog programmability of the MoS 2 FET. Similarly, by applying positive erase voltages (V E ) for an erase pulse time, t E = 100 ms, the programmed states can be erased and the transfer characteristics can be shifted in the opposite direction as shown in Figure 2k. The corresponding nonvolatile retention characteristics are shown in Figure 2l. Note that programming (erase) operations can also be achieved by varying t P (t E ) for a fixed V P (V E ), as shown in Figure 2a,b in the Supporting Information. The working principle of the analog and nonvolatile back-gate memory stack has been described in detail in our earlier report. [32] The above demonstration of p-type WSe 2 FETs, n-type MoS 2 FETs, and the analog compute and analog nonvolatile storage capability allow us to design the computational primitives necessary for the hardware realization of SA. Note that ΔH due to random spin flip events can be computed from the difference in the free energy of the spin glass system before (H i ) and after (H f ) the spin flip. According to Equation (2), evaluation of H and H′ requires multiplication of the spin vector [σ] of size 1 × K 2 with the [CS] matrix of size K 2 × K 2 followed by multiplication with the transpose of the spin vector, i.e., [σ] T of size K 2 × 1. However, there are several challenges: first, while vector matrix multiplication can be realized using cross-bar architectures, the [CS] matrix contains negative elements which www.advmat.de www.advancedsciencenews.com are difficult to realize using conductance states and require additional circuitry (note that earlier demonstration of SA [25] using resistive random access memory only implemented ferroelectric interactions); and second, the size of the [CS] matrix can become substantial even for relatively low values of K, imposing heavy area and energy overhead for the computation. Instead, the computational load can be significantly reduced by acknowledging the fact that only one spin is allowed to randomly flip during each iteration of SA, simplifying the computation of ΔH following Equation (3) Here, we assume that the jth spin is flipped. Note that the first term inside the square bracket, i.e.,    connected in series with a common gate and a common source terminal. It multiplies the sign of two input voltages, V in−1 and V in−2 . V in−1 is applied to the common-gate terminal, V in−2 is applied to the drain terminal of T1, and −V in−2 is applied to the drain terminal of T2. b) Transfer characteristics of T1 and T2. c) Transfer characteristics of M1, i.e., output voltage, V out , vs V in−1 , for V in−2 = ±0.1 V. Using M1, the product between the ith elements of [σ] and [CS] j is obtained at the ith time step (iτ p ) as V out (iτ p ) by applying V in−1 (iτ p ) = V 1 σ i and V in−2 (iτ p ) = V 2 CS ij . Note that i = 1:1:K 2 and K = 4. We have used τ p = 60 ms, V 1 = 1 V, and V 2 = 0.1 V, resulting in V out (iτ p ) = 0.1 × CS ij σ i . d) The voltage-to-current converter module (M2) transforms V out from M1 into current, I out , following I out = GV out with G ≈ 0.5 μS. e) The integrator module (M3), a capacitor (C I = 20 nF), sums I out from M2 over K 2 time steps into voltage, V ΔH . f) V in−1 , V in−2 , −V in−2 , V out , I out , and the output from M3, i.e., V I , for representative ferromagnetic, antiferromagnetic, and spin glass systems during a given iteration of SA. V ΔH and σ j are multiplied to obtain ΔH. g) Schematic and h) transfer characteristics of a programmable MoS 2 FET used for evaluating the cost associated with the state transition as well as for realizing cooling schedule in hardware. The subthreshold conduction governed by Boltzmann statistics is exploited to evaluate the cost of "hill-climbing" by applying V BG = ΔH and the spin flip is accepted if I ΔH < I cost = 100 pA. The cooling schedule is implemented by shifting the threshold voltage of the FET through back-gate programming.
www.advmat.de www.advancedsciencenews.com region, as shown in Figure 3b, and not in the OFF region. Having the cross-point in the OFF-state can lead to erroneous V out values. Additionally, it is used to ensure that at V in−1 of ±1, V out reaches ±V in−2 (Figure 3c). This necessitates at least 3-4 orders of magnitude difference in resistance between T1 and T2 at V in−1 = ±1, as the multiplier module is basically a voltage divider circuit and, thus, having comparable resistances for T1 and T2 can lead to the V out being lower than ±V in−2 . Using M1, the product between the ith elements of [σ] and [CS] j is obtained at the ith time step (iτ p ) as V out (iτ p ) by applying V in−1 (iτ p ) = V 1 σ i and V in−2 (iτ p ) = V 2 CS ij . We have used τ p = 60 ms, V 1 = 1 V, and V 2 = 0.1 V, resulting in V out (iτ p ) = 0.1 × CS ij σ i . Next, V out is converted to I out using a voltage-to-current converter module, M2 (Figure 3d), composed of a resistor, following I out = GV out with G ≈ 0.5 μS, where G is the conductance. Finally, the contribution due to all spins are summed over K 2 time steps using an integrator module, M3 (Figure 3e), composed of a capacitor (C I ≈ 20 nF), resulting in V ΔH given by Equation (4) p I 1 (4) Figure 3f shows V in−1 , V in−2 , −V in−2 , V out , I out , and the output of the integrator, V I , for representative ferromagnetic, antiferromagnetic, and spin glass systems during a given iteration. Finally, V ΔH and σ j are multiplied in MATLAB to obtain ΔH. Following the evaluation of ΔH, it must be determined if the spin flip is accepted. Traditionally, this is done using two separate steps: one to determine the sign of ΔH and another one to determine the cost of "hill-climbing" if ΔH is positive. In hardware, both steps are combined in M4 (Figure 3g) by exploiting the subthreshold conduction in a FET where the carrier injection from the source contact into the semiconducting channel is given by Boltzmann statistics. The cost of "hillclimbing" is, therefore, obtained by applying V BG = ΔH and the spin flip is accepted if I ΔH < I cost = 100 pA (Figure 3h) following Here, q is the electronic charge, I 0 is the static leakage current, and m is the body factor, which determines the SS following Equation Here, C S is the semiconductor capacitance, C IT is the interface trap capacitance, and C OX is the oxide capacitance. For fully depleted ultrathin-body FETs, C S = 0. We extracted a SS of 430 mV dec −1 and hence m of 7.1. Note that in Figure 3h the transfer characteristics are represented as a plot of I ΔH versus ΔH. Also note that all negative ΔH (low-energy transition) naturally leads to I ΔH < I cost , whereas positive ΔH up to ΔH max satisfies the "hill-climbing" criterion of I ΔH < I cost .
Next, we implement cooling schedule in hardware. While temperature can be used to the change the current I ΔH for the same ΔH following Equation (5), physically changing the temperature of a system requires excessive energy. Alternatively, ΔH max can be modulated by programming the MoS 2 FET in different states as shown in Figure 3h. As the annealing temperature is lowered, the transfer characteristics are shifted towards the left. This ensures acceptance of more positive ΔH at higher temperatures and no "hill climbing" at T = 0 K, i.e., ΔH max = 0 V. As shown in Figure 3h, our annealing schedule involved five different ΔH max ,i.e., 0.6, 0.45, 0.3, 0.15, and 0 V, analogous to temperature in metallurgical annealing with maximum number of iterations, I max = 5, 15, 15, 30, and 30, at the respective "temperatures".
The flow chart for the experimental demonstration of SA using our hardware-software approach is summarized in Figure S3 in the Supporting Information. The operations performed in hardware are highlighted in red. First, the spin system, i.e., [σ] and [CS], and parameters of the simulated annealing algorithm such as I max and ΔH max are initiated in MATLAB. Using the random number generator in MATLAB, a random spin is flipped. ΔH due to this random spin flip is computed in hardware, using M1, M2 and M3. By applying ΔH to M4, I ΔH is evaluated to check if the spin flip is accepted. After I max iterations, ΔH max is reduced according to the cooling schedule using M4 and the process is repeated for ΔH max up to 0. Figure 4a-c, respectively, show the representative experimental demonstration of SA leading to the convergence of randomly initiated ferromagnetic, antiferromagnetic, and a spin glass system to their respective ground states. See Figures S4-S6 and Videos S2-S4 in the Supporting Information for SA experiments performed on ferromagnetic, antiferromagnetic and a spin glass system initiated with 20 randomly oriented spin configurations. 11 ferromagnetic (55%), 9 antiferromagnetic (45%) and 8 spin glass systems (40%) converged and reached their respective ground states at the end of the total 95 iterations. Additionally, multiple systems are either 1 or 2 spin flips away from their ground state, and hence they are expected to converge for an increased number of iterations. We also observed "frustration" in the spin glass system. Remarkably, compared to an exhaustive search using BFT that requires a maximum of 2 2 K ( = 65 536 for K = 4) spin flips, SA accelerates the search requiring orders of magnitude lower spin flip events. We define acceleration as the ratio of maximum number of spin flips using exhaustive search to the maximum number of SA spin flips to reach the ground state. The highest acceleration for ferromagnetic, antiferromagnetic and the spin glass system were found to be ≈1365×, ≈1260×, and ≈1310×, respectively, whereas, the average acceleration for the systems that converged to their ground states were ≈850×, ≈900×, and ≈810×, respectively. Evolution of H for representative ferromagnetic, antiferromagnetic, and spin glass systems are shown in Figure 4d-f, respectively (see Figure S7 in the Supporting Information for the evolution H for all 60 spin systems used in our experiments). The signature of SA can be seen in the energy landscapes, i.e., significant "hill-climbing" occurs during the initial iterations when the system is at higher "temperature", whereas at the lowest "temperature", the free energy decreases monotonically.
To obtain further insight, hardware-realistic simulation of SA is performed using the virtual source (VS) model developed in our earlier work to capture 2D FET characteristics. [46,47] Figure 5a-c, respectively, show the convergence accuracy of www.advmat.de www.advancedsciencenews.com 1000 randomly initiated 4 × 4 ferromagnetic, antiferromagnetic, and spin glass systems subjected to SA for different total numbers of iterations (I). All ferromagnetic and antiferromagnetic systems converged after ≈600 and ≈750 iterations, resulting in average accelerations of ≈110× and ≈90×, respectively, compared to exhaustive BFTs. Note that these average acceleration  . d-f) Evolution of H with I for ferromagnetic (d), antiferromagnetic (e), and spin glass (f) systems for different ground states. The signature of SA can be seen in the energy landscapes. At higher "temperature", more "hill-climbing" is allowed, whereas at the lowest "temperature", the free energy decreases without any "hill-climbing". www.advmat.de www.advancedsciencenews.com Adv. Mater. 2022, 34,2107076 numbers are lower than the experimental findings since more iterations are performed (200 at each temperature). The spin glass system, however, shows ≈80% convergence accuracy after ≈700 iterations. This is owing to the fact that any spin glass system is more prone to getting stuck in a metastable state. Figure 5d shows the convergence accuracy for 100 randomly oriented K × K ferromagnetic spin systems as a function of total number of spins (K 2 ) and total number of SA iterations. Even for K = 10, SA requires only ≈3500 iterations, or maximum number of spin flips, in comparison to ≈10 30 maximum number of spin flips required for an exhaustive search, demonstrating the tremendous improvement in acceleration (≈10 26 ×). Figure 5e shows the acceleration for ≈100% convergence accuracy as a function of K 2 . Clearly, as K increases the benefits of SA become even more astounding. Additionally, note that hardware modules don't need to scale with K. The same modules can be implemented for larger spin systems. The only difference is that, in the case of a K × K system, the length of the vectors such as V in−1 , V in−2 , −V in−2 , V out , and I out that are passing through various modules would be 1 × K 2 .
Finally, we evaluate the energy expenditure (E SA ) by our annealing accelerator during each iteration following Equation (7 Here, E M1 , E M2 , E M3 , and E M4 are the energy consumption by M1, M2, M3, and M4, respectively, and I T1 and I T2 are the current in T1 and T2, respectively. Note that this calculation does not take parasitic capacitances in our FETs into account. Figure S8a in the Supporting Information shows E M1 , E M2 , E M3 , E M4 , and E SA averaged over all 60 spin systems as a function of I for a 4 × 4 spin lattice. The average energy expenditure for the hardware module was found to be a miniscule ≈1.3 nJ per iteration, which corresponds to a maximum total energy expenditure of ≈120 nJ for finding the ground state of any 4 × 4 spin system. Note that due to the limitations imposed by our measurement instruments, we have used τ p = 60 ms. However, it is possible to scale τ p and thereby reduce the energy expenditure even further. Also note that our energy calculations exclude the software operations performed using MATLAB such as the generation of random numbers. We also attempt to estimate the energy consumption for larger spin systems. Since only the neighbor interactions are considered, out of these K 2 timesteps for a K × K system, ideally, there would be a maximum of 4 spikes and the rest would be zero. However, for the hardware implementation a nonideality is seen. When V in−2 = 0, V out must be zero for the multiplier module. However, in practice, when V in−2 = 0, V out ≈ 10 −4 . This non ideality can carry into the other modules and lead to higher power consumption for larger systems. From Figure S8a in the Supporting Information, M2 and M3, i.e., the VTC module and the integrator module, dominate energy consumption. Hence, we estimate the worst-case energy consumption for spin systems with K ranging from 1 to 1000, for both modules M3 and M4, as shown in Figure S8b in the Supporting Information. Figure S8b in the Supporting Information shows that the increase in energy consumption due to this non ideality is minimal even for very large spin systems, ensuring low energy consumption for larger spin systems. Figure S8b in the Supporting Information represents the energy consumption per iteration. Hence, note that there would be an obvious increase in energy associated with the larger number of iterations needed for larger spin systems. , and spin glass (c) systems subjected to SA for different total I) d) Convergence accuracy for 100 randomly oriented K × K ferromagnetic spin systems as a function of total number of spins (K 2 ) and total number of SA iterations. e) Acceleration factor, i.e., the ratio of maximum number of spin flips using BFT to the maximum number of SA spin flips for 100% convergence accuracy, as a function of K 2 .

Conclusion
This work successfully demonstrates hardware acceleration of SA for the Ising spin system by exploiting subthreshold conduction and analog programmability of complementary 2D FETs integrated with nonvolatile floating-gate memory stack. By designing in-memory computing primitives and annealing schedule equivalent of cooling, we were able to achieve >800× acceleration for 4 × 4 ferromagnetic, antiferromagnetic, and spin glass systems experimentally and at a frugal average energy expenditure of ≈120 nJ. Our numerical simulations show more striking benefits of SA for search acceleration of larger spin lattices.

Experimental Section
Device Fabrication: Back-gated MoS 2 and WSe 2 FETs were fabricated using e-beam lithography. MOCVD grown MoS 2 was transferred onto 50 nm Al 2 O 3 substrate using a PMMA (poly(methyl methacrylate)) assisted wet-transfer process. The substrate was spin-coated with PMMA and baked at 180 °C for 90 s to define the channel region. The PMMA photoresist was then exposed to an e-beam and developed using a 1:1 4-methyl-2-pentanone (MIBK) and 2-propanol (IPA) mixture. Using sulfur hexafluoride (SF 6 ) at 5 °C for 30 s, the monolayer MoS 2 film was subsequently etched. Next, the sample was rinsed in acetone and IPA to remove the photoresist. In order to fabricate the source/drain contacts, the substrate was spin-coated with MMA and PMMA and again patterned using e-beam lithography and developed using MIBK and IPA. E-beam evaporation was then used to deposit a 40 nm Ni/30 nm Au stack to serve as the contacts. Finally, the photoresist was rinsed away in a lift-off process using acetone and IPA.
For WSe 2 , micromechanical exfoliation was performed to obtain optimally thin WSe 2 flakes on the 50 nm Al 2 O 3 substrate. The source/ drain contacts (10 nm Pt/30 nm Au) were defined using e-beam lithography, as discussed above, with a channel length of 1 μm. Following that, in order to fabricate the p-i-p structure, the channel was spin-coated with PMMA and a subsequent e-beam exposure was used to expose 250 nm of the channel near the source and drain contact, leaving the middle 500 nm covered with PMMA. The WSe 2 FET was further doped with O 2 plasma using a Tepla M4L plasma etch tool at a power of 100 W for 300 s. O 2 and He gas flow rates of 150 sccm and 50 sccm with a chamber pressure of 500 mT were used for the O 2 plasma doping. Finally, the photoresist was rinsed away in a lift-off process using acetone and IPA.
Electrical Characterization: A Lake Shore CRX-VF probe station and a Keysight B1500A parameter analyzer were used to perform the electrical characterization at room temperature in high vacuum (≈10 −6 Torr). The measurements with the resistor and capacitor modules were performed outside the probe station on a bread board.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.