An All-in-One Bioinspired Neural Network.

In spite of recent advancements in artificial neural networks (ANNs), the energy efficiency, multifunctionality, adaptability, and integrated nature of biological neural networks remain largely unimitated by hardware neuromorphic computing systems. Here, we exploit optoelectronic, computing, and programmable memory devices based on emerging two-dimensional (2D) layered materials such as MoS2 to demonstrate a monolithically integrated, multipixel, and "all-in-one" bioinspired neural network (BNN) capable of sensing, encoding, learning, forgetting, and inferring at minuscule energy expenditure. We also demonstrate learning adaptability and simulate learning challenges under specific synaptic conditions to mimic biological learning. Our findings highlight the potential of in-memory computing and sensing based on emerging 2D materials, devices, and integrated circuits to not only overcome the bottleneck of von Neumann computing in conventional CMOS designs but also to aid in eliminating the peripheral components necessary for competing technologies such as memristors.

Biological neural networks comprising billions of neurons connected via trillions of synapses are incredibly diverse, integrated, and energy efficient in processing information that involves sensing, encoding, storage, and computation. For example, sensory neurons receive external/internal stimuli from various sensory organs and convert the information into spike trains following various encoding algorithms; these spike trains are then communicated via interneurons to the central nervous system (CNS), where spike-based computation leads to memory formation (learning) and/or decision making (inference). Spikes are electrical impulses, or digital point events in time, that enable ultra-low-power neural computation as well as long-distance neural communication. Spiking activity between the pre-synaptic and post-synaptic neurons determines the potentiation or depression of their connection strengths, or synaptic weights, which is ultimately responsible for learning and forgetting. Another key feature of biological neural networks is neuroplasticity, which allows learning and decision making to adapt to changing environmental conditions. For example, the eyes can identify patterns under both low-light (scotopic vision) and bright-light (photopic vision) conditions. Finally, the balance between the relative strengths of potentiation and depression of synaptic connections is critical, and any deviation can lead to neurological disorders, including learning disabilities. Therefore, designing low-power neuromorphic hardware systems that resemble the functionality, organization, and plasticity of biological neural networks can not only accelerate the development of hardware artificial intelligence (AI) and benefit edge computing but also offer a platform to model learning disorders of the CNS related to synaptic plasticity.
Artificial neural networks (ANNs) are highly simplified, but the most prevalent, abstractions of biological neural networks and have already enabled breakthroughs in many applications, including image classification, speech recognition, and game playing [1,2]. However, hardware realizations of ANNs using traditional complementary metal-oxide-semiconductor (CMOS) technology appear to consume orders of magnitude more power than the brain requires for similar tasks. One of the key differences lies in the computing architecture: whereas CMOS-based computation embraces the von Neumann architecture, which physically separates compute (logic) and storage (memory), biological neural networks dissolve this gap by placing neurons, the computational primitives, and synapses, the storage units, right next to each other.
Acknowledging this energy gap, field programmable gate arrays (FPGAs) [3] and crossbar architectures utilizing memristors [4,5], resistive random-access memory (RRAM) [6], phase change memory (PCM) [7][8][9], etc., with tunable conductance states are accelerating the development of energy-efficient, non-von Neumann computing architectures. However, unlike biological neural networks, where specialized afferent neurons transduce sensed information into electrical signals, i.e., spike trains, these non-von Neumann platforms still require CMOS-based peripheral transducers to convert external stimuli into electrical impulses. Such preprocessing can ultimately limit the energy efficiency and scalability of emerging non-von Neumann architectures [13,14]. Finally, neuroplasticity of learning in changing environments and the modeling of learning disabilities, even at a high level of abstraction, are yet to be demonstrated.
Here, we mitigate the aforementioned challenges by introducing a monolithically integrated, multipixel, and "all-in-one" bio-inspired neural network (BNN), which is capable of sensing, encoding, learning, forgetting, and inferring using monolayer MoS2-based multifunctional field effect transistors (FETs). First, we use gate-tunable persistent photoconductivity in a monolayer MoS2 FET to convert optical information into graded potentials using a neuromorphic sensing module (SM). Next, we demonstrate a MoS2-based neuromorphic encoding module (EM) comprising two MoS2 FETs that transforms the graded potentials into spike-count and spike-duration based programming voltages. Finally, we exploit the electrical programmability of MoS2 FET-based non-volatile synapses to realize a neuromorphic learning module (LM) for spike-based learning, forgetting, and inference. Furthermore, we demonstrate low-power operation and adaptability of our BNN to learning under different ambient conditions, mimicking the neuroplasticity of biological neural networks. Our BNN hardware also offers a platform to model learning disabilities and disorders at a high level of abstraction. To the best of our knowledge, this is the first experimental demonstration of an integrated BNN exploiting in-memory computing and sensing based on emerging two-dimensional (2D) layered materials, devices, and circuits, which can accelerate the development of energy-efficient neuromorphic systems.
The motivation for using 2D layered MoS2 as a hardware platform for neuromorphic computing is manifold. First, there are several demonstrations of photodetectors [15], chemical sensors [16], biological sensors [16], touch sensors [17], and radiation sensors [18] based on MoS2 devices, which can naturally serve as artificial sensory afferent neurons, eliminating the need for peripheral sensors in MoS2-based intelligent systems. Next, since MoS2 is a semiconductor, almost all peripheral analog or digital signal processing units can be built using MoS2 FETs [19], eliminating the need for hybrid designs involving CMOS circuitry. Additionally, the atomically thin body of MoS2 allows aggressive channel length scaling without loss of its superior gate electrostatics, benefiting integration density. In fact, recent studies show high-performance monolayer MoS2 FETs with channel and contact lengths scaled to 29 nm and 13 nm, respectively [20]. Moreover, some of the early criticisms of 2D FETs have been successfully addressed in recent years through the realization of low contact resistance [21], high ON current [22], integration of ultra-thin, high-k gate dielectrics [23], and wafer-scale growth using chemical vapor deposition (CVD) and metal-organic CVD (MOCVD) [24,25]. Similarly, MoS2-based microprocessors [26], analog operational amplifiers [27], RF electronic components [28], and neuromorphic and biomimetic hardware platforms [29][30][31] have been reported. Finally, MoS2 can enable flexible [32] and printable [39] optoelectronics, adding value to MoS2-based hardware platforms, similar to ultra-thin silicon-on-insulator [33,34].

Monolithic, multi-pixel, all-in-one, BNN platform:
Fig. 1a shows the neurobiological architecture for processing visual information, and Fig. 1b-d, respectively, show optical images of our multi-pixel (7 × 7) BNN hardware platform, of each pixel comprising four monolayer MoS2 FETs (4T cell) that monolithically integrate the sensing module (SM), encoding module (EM), and learning module (LM), and of an individual MoS2 FET, which is locally back-gated using a stack of 50 nm Al2O3 grown by atomic layer deposition (ALD) on sputter-deposited 40/30 nm Pt/TiN. All back-gate islands were placed on a commercially purchased SiO2/p++-Si substrate (see Supplementary Information 1-3 for enlarged optical images of the entire chip, the 7 × 7 pixels, and each pixel). Within each pixel, the SM consists of one MoS2 FET (T_SM), the EM consists of two MoS2 FETs (T_EM1 and T_EM2), and the LM consists of one MoS2 FET (T_LM), connected as shown in the circuit diagram in Fig. 1e. The MoS2 FETs used for the SMs and LMs have footprints (W × L) of 5 µm × 1 µm, and those used for the EMs have a footprint of 5 µm × 3 µm, excluding the contact pads; each pixel has a footprint of 400 µm × 600 µm. The scalability of our devices was limited by our measurement setup, which requires large contact pads for probing the devices for demonstration purposes. Nevertheless, the circuit schematic in Fig. 1e shows that each pixel is designed to achieve functional and organizational resemblance with the different neuronal cells found in the visual pathways of primates, as depicted schematically in Fig. 1a. For example, the monolayer MoS2 FET-based SM is functionally equivalent to the photoreceptor cells (rods and cones) in the human eye that convert external optical stimuli into corresponding graded potentials. Rods primarily enable low-light or scotopic vision, whereas cones are responsible for bright-light or photopic vision; both can be achieved using our adaptive SM. Similarly, the MoS2 FET-based EM mimics the retinal ganglion cells that encode the graded potentials into spike trains and transmit them to the visual cortex or midbrain for higher-order processing and computation. Finally, the MoS2 FET-based LM imitates the visual cortex, where learning, forgetting, and inference take place. As we will elucidate later, the Al2O3/Pt/TiN gate stack shown in Fig. 1f allows non-volatile programming of our MoS2 FETs owing to the trapping/detrapping of charge carriers at and near the MoS2/Al2O3 interface when subjected to large positive and negative gate biases. This, in turn, empowers our BNN architecture to overcome the von Neumann bottleneck and enables in-memory sensing and computing capabilities that are presently lacking in conventional silicon technology. The programming capability is also central to the realization of a reconfigurable BNN platform that allows adaptation to different learning conditions (e.g., scotopic conditions), similar to biological neuroplasticity, and offers a unique platform to model and study the origin of various learning disabilities found in humans (e.g., autism disorder).
The MoS2 used in this study was obtained from the 2D Crystal Consortium (2DCC) and was grown epitaxially on a sapphire substrate by MOCVD at 1000 °C [24,35]. As we will elucidate, high-temperature growth ensures high film quality and low device-to-device variability, which are critical for the successful demonstration of our BNN platform. The monolayer MoS2 film was transferred from the growth substrate to the target substrate, i.e., the SiO2/p++-Si substrate with predefined islands of Al2O3/Pt/TiN, for subsequent FET fabrication and monolithic integration of the SM, EM, and LM. Details on the fabrication of the back-gate stack, monolayer MoS2 synthesis, film transfer, fabrication of the MoS2 FETs, and monolithic integration can be found in the Methods section.

Figure 1. Monolithically integrated, multi-pixel, all-in-one biomimetic spiking neural network (BNN). a) Biological neural network for processing visual information. Optical images of b) the 7 × 7-pixel BNN architecture, c) each pixel comprising four monolayer MoS2 FETs (4T cell) that monolithically integrate the sensing module (SM), encoding module (EM), and learning module (LM), and d) an individual monolayer MoS2 field effect transistor (FET), which is locally back-gated using a stack of 50 nm Al2O3 grown by atomic layer deposition (ALD) on sputter-deposited 40/30 nm Pt/TiN. All back-gate islands were placed on a commercially purchased SiO2/p++-Si substrate. e) Circuit schematic for each pixel showing the connection between the SM, EM, and LM, consisting of 1 (T_SM), 2 (T_EM1 and T_EM2), and 1 (T_LM) MoS2 FETs, respectively. MoS2 FETs used for the SMs and LMs have footprints (W × L) of 5 µm × 1 µm, and MoS2 FETs used for the EMs have a footprint of 5 µm × 3 µm, excluding the contact pads; each pixel has a footprint of 400 µm × 600 µm. Each pixel is designed to achieve functional and organizational resemblance with different neuronal cells found in the visual BNN. For example, the SM is functionally equivalent to the photoreceptor cells (rods and cones) in the human eye that convert external optical stimuli into corresponding graded potentials (V_N3) at node N3. Rods primarily enable low-light or scotopic vision, whereas cones are responsible for bright-light or photopic vision; both can be achieved using T_SM by exploiting gate-tunable persistent photoconductivity. Similarly, the EM mimics the functionality of retinal ganglion cells that encode the graded potentials into spike trains and transmit them to the visual cortex for further processing. Finally, the LM imitates the visual cortex, where learning, forgetting, and inference take place. f) 3D schematic of each MoS2 FET with non-volatile programming capability, which empowers our BNN architecture to overcome the von Neumann bottleneck and enable in-memory sensing and computing.

Characterization and device-to-device variation of MoS2 FETs:
Before diving deeper into each functional unit of our hardware BNN platform, i.e., the SM, EM, and LM, it is important to thoroughly characterize the basic building blocks, i.e., the MoS2 FETs. Fig. 2a-b, respectively, show the Raman and photoluminescence (PL) spectra of a representative MoS2 channel region. The Raman peak separation of 17 cm⁻¹ between the characteristic A1g and E¹2g modes and the PL peak location at 1.82 eV are consistent with monolayer MoS2 [36][37][38]. Fig. 2c-d, respectively, show the Raman and PL maps of 10 µm × 10 µm MoS2 regions. The Raman peak separation and PL peak position vary by less than 4% over the entire map, confirming the high quality and uniformity of the monolayer film. A similar assessment of film uniformity across the entire chip can be made from Fig. 2e-f, which show the Raman peak separation and PL peak position across 49 MoS2 channels corresponding to each of the 7 × 7 pixels (see Supplementary Information 4 & 5 for the Raman and PL scans of each of these 49 MoS2 channels, respectively). The mean and standard deviation values were extracted to be 18 cm⁻¹ and 0.8 cm⁻¹, respectively, for the Raman peak separation and 1.82 eV and 0.01 eV, respectively, for the PL peak location. Fig. 2g shows the transfer characteristics, i.e., the source-to-drain current (I_DS) as a function of the local back-gate voltage (V_BG) at different drain biases (V_DS), for a representative MoS2 FET with L = 1 µm, and Fig. 2h shows the device-to-device variation in the transfer characteristics across 49 MoS2 FETs corresponding to each of the 7 × 7 pixels (see Supplementary Information 6 for the transfer characteristics of each of these 49 MoS2 FETs). As expected, the MoS2 FETs show unipolar, n-type characteristics owing to the pinning of the metal Fermi level close to the conduction band, which allows only electron transport through the channel. Fig. 2i shows the map of electron field-effect mobility values (μ_FE) extracted from the peak transconductance for these 49 MoS2 FETs, with a mean of ~21 cm² V⁻¹ s⁻¹ and a standard deviation of 5.5 cm² V⁻¹ s⁻¹. Fig. 2j-l, respectively, show similar colormaps of the device-to-device variation in the current on/off ratio (I_ON/I_OFF), the subthreshold slope (SS) over 3 orders of magnitude change in I_DS, and the threshold voltage (V_TH) extracted at an iso-current of 100 nA/µm, with mean values of 2.6×10⁷, 275 mV/decade, and 0.9 V, respectively, and standard deviation values of 0.8×10⁷, 59 mV/decade, and 0.2 V, respectively. Our μ_FE, I_ON/I_OFF, SS, and V_TH values and the corresponding device-to-device variations are on par with the state-of-the-art literature on large-area grown MoS2, which can be attributed to high-quality growth, relatively damage-free transfer, and clean device fabrication. However, we also believe that it is possible to further reduce the device-to-device variation by improving the growth and transfer processes. Supplementary Information 7 shows the output characteristics, i.e., I_DS versus V_DS for different V_BG, for a representative MoS2 FET with L = 1 µm. While we mostly exploit the off-state and subthreshold regimes of FET operation in our SM, EM, and LM, an on-state current as high as ~100 μA/μm at V_DS = 5 V for an inversion charge carrier density of ~1.5×10¹³ cm⁻² is further evidence of high film quality.
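The peak-transconductance mobility extraction mentioned above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes the conventional linear-regime expression μ_FE = (L/W)·g_m,max/(C_OX·V_DS), uses the back-gate capacitance C_OX ≈ 2×10⁻³ F m⁻² quoted in the sensing-module discussion, and runs on a synthetic, hypothetical transfer curve. The function name `peak_gm_mobility` is illustrative.

```python
def peak_gm_mobility(v_bg, i_ds, v_ds, width, length, c_ox):
    """Field-effect mobility (m^2 V^-1 s^-1) from the peak transconductance.

    Assumes mu_FE = (L / W) * gm_max / (C_ox * V_DS) (linear-regime formula).
    v_bg and i_ds are equal-length lists (V, A); width/length in m; c_ox in F/m^2.
    """
    # Forward-difference transconductance gm = dI_DS / dV_BG between adjacent points.
    gm = [
        (i_ds[k + 1] - i_ds[k]) / (v_bg[k + 1] - v_bg[k])
        for k in range(len(v_bg) - 1)
    ]
    return length * max(gm) / (width * c_ox * v_ds)


# Hypothetical synthetic transfer curve for a W x L = 5 um x 1 um device at V_DS = 1 V.
W, L, C_OX, V_DS, V_TH = 5e-6, 1e-6, 2e-3, 1.0, 0.9
MU_TRUE = 21e-4  # 21 cm^2 V^-1 s^-1 expressed in m^2 V^-1 s^-1
v_bg = [-2.0 + 0.05 * k for k in range(101)]  # sweep from -2 V to 3 V
i_ds = [MU_TRUE * C_OX * (W / L) * max(0.0, v - V_TH) * V_DS for v in v_bg]

mu_cm2 = peak_gm_mobility(v_bg, i_ds, V_DS, W, L, C_OX) * 1e4  # convert to cm^2/Vs
print(f"mu_FE = {mu_cm2:.1f} cm^2/Vs")
```

Real transfer curves are noisy, so in practice one would typically smooth I_DS or fit a local slope before taking the maximum; the synthetic curve here is noise-free so the mobility is recovered exactly.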

MoS2 FET based neuromorphic sensing module (SM):
Monolayer MoS2-based phototransistors have been studied extensively in recent years, including in our own work [15][39][40][41][42][43]. Phototransduction in MoS2 FETs is typically attributed to two mechanisms: photocarrier generation in the MoS2 channel and the photogating effect arising from charge trapping/detrapping at the MoS2/gate-dielectric interface. Fig. 3a shows the transfer characteristics at V_DS = 1 V for a representative monolayer MoS2 FET before and after illumination from a blue light-emitting diode (LED) with input currents ranging from I_LED = 0.5 mA (low brightness) to I_LED = 20 mA (high brightness) at different V_BG = V_write for t_write = 100 ms. The corresponding incident optical power density is in the range of 0.1-10 W m⁻², obtained by calibration using a commercially purchased silicon PIN photodiode as described in Supplementary Information 8. Given that the channel area of each MoS2 FET used in the SM is 5 µm × 1 µm, the estimated incident power on each pixel is 0.5-50 pW. See Supplementary Information 9 for optical images showing the corresponding LED brightness levels. Note that, instead of the laser illumination conventionally used to study photoresponse in monolayer MoS2 [43], we used an LED to provide optical stimuli, since it represents the more realistic lighting ambience in which most neuromorphic sensors will be deployed.
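The per-pixel power estimate above is simply the calibrated irradiance multiplied by the channel area. A minimal check, assuming uniform illumination over the channel and neglecting reflection and absorption losses (the function name is illustrative):

```python
def incident_power(irradiance, width, length):
    """Optical power (W) on a width x length (m) channel at a given irradiance (W/m^2)."""
    return irradiance * width * length


# Calibrated irradiance range (W/m^2) and the 5 um x 1 um SM channel from the text.
p_low = incident_power(0.1, 5e-6, 1e-6)
p_high = incident_power(10.0, 5e-6, 1e-6)
print(f"{p_low * 1e12:.1f} pW to {p_high * 1e12:.1f} pW")  # 0.5 pW to 50.0 pW
```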
Two distinct types of photoresponse are observed in Fig. 3a. For V_write > 0 V, i.e., illumination in the on-state (V_write = 2.0 V) and in the subthreshold regime (V_write = 0.5 V) of the MoS2 FET, there is no visible shift in the device characteristics post-illumination, irrespective of the LED brightness level (I_LED). This can be ascribed to photocarrier generation in the MoS2 channel: the photogenerated carriers are swept out by the applied V_DS, and hence there is no persistent photocurrent beyond the optical exposure. However, for V_write < 0 V, i.e., illumination in the off-state (V_write = -1.5 V and V_write = -2.5 V) of the MoS2 FET, there are significant shifts in the device characteristics post-illumination. This is a signature of the photogating effect, in which photocarrier trapping at the MoS2/dielectric interface shifts the device threshold voltage (V_TH).
The detrapping process can be rather slow, taking hours to several days, which is why the V_TH shift persists post-illumination. Higher I_LED, more negative V_write, and longer t_write naturally result in more photocarrier trapping (Q_trap) and hence larger V_TH shifts (ΔV_TH). Supplementary Information 10 shows ΔV_TH and the corresponding trapped charge density, Q_trap = C_OX ΔV_TH, as a function of V_write and t_write for I_LED = 20 mA, where C_OX ≈ 2×10⁻³ F m⁻² is the back-gate oxide capacitance per unit area.
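The relation between threshold shift and trapped charge given above (trapped charge per unit area = C_OX·ΔV_TH) can be turned into a number density for intuition. This is a sketch under the assumption that each trap holds a single elementary charge; the function name and the 1 V example shift are hypothetical.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def trapped_charge(delta_vth, c_ox=2e-3):
    """Return (trapped charge density in C/m^2, equivalent carrier density in cm^-2).

    Uses C_ox * |dV_TH| with the back-gate capacitance per unit area C_ox ~ 2e-3 F/m^2;
    converting to a number density assumes singly charged traps (an assumption).
    """
    q_trap = c_ox * abs(delta_vth)
    n_trap_cm2 = q_trap / ELEMENTARY_CHARGE * 1e-4  # m^-2 -> cm^-2
    return q_trap, n_trap_cm2


q, n = trapped_charge(-1.0)  # hypothetical 1 V negative threshold shift
print(f"Q_trap = {q:.1e} C/m^2, N_trap = {n:.2e} cm^-2")
```

With these numbers, a 1 V shift corresponds to roughly 1.25×10¹² trapped charges per cm², a useful scale when comparing the shifts in Supplementary Information 10.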
We exploit the gate-tunable photogating effect in the MoS2 FET (T_SM) to convert an analog optical stimulus into a graded potential, V_N3, at node N3 using the circuit layout shown in Fig. 1e. A constant voltage, V_N1 = 5 V, is applied to node N1, which is the drain terminal of T_SM, and a clocking signal toggling between V_read and V_write with t_CLK = 100 ms is applied to node N2, which is the local back-gate of T_SM. The source terminal of T_SM is connected to the local back-gate of T_EM1 at node N3. Note that T_SM and T_EM1 form an RC circuit, with T_SM serving as the resistor and the local back-gate of T_EM1 serving as the capacitor. Prior to illumination, the time constant for charging node N3 remains very high (> 100 s), since T_SM is biased in the off-state. Fig. 3b shows the analog-valued, continuous-time input optical stimuli from the LED, and Fig. 3c shows the corresponding temporal evolution of V_N3 for different V_write with V_read = 0 V. Several key observations can be made from these results: 1) V_N3 increases monotonically for any given I_LED and V_write owing to the photogating effect, which gradually shifts the V_TH of T_SM negatively, switching it from the off-state through subthreshold to the on-state and thereby reducing the charging time constant of node N3; 2) for any given I_LED, V_N3 increases faster for more negative V_write, since more trap states are available at the MoS2/dielectric interface, resulting in a greater V_TH shift and hence higher photoconductance; 3) the time required by V_N3 to reach V_N1 = 5 V scales inversely with I_LED for any given V_write, i.e., a higher I_LED allows the graded potential to reach its maximum strength earlier and vice versa; and finally 4) a lower I_LED (e.g., 5 mA) can invoke a response in V_N3 similar to that of a higher I_LED (e.g., 20 mA) when the former is measured using a more negative V_write = -2 V and the latter using V_write = -1.5 V, allowing adaptation to different illumination levels. These observations are summarized in Fig. 3d, which shows the time for V_N3 to reach the same magnitude as a function of I_LED and V_write. Also see Supplementary Video 1 for the time evolution of the graded potential for various I_LED using different V_write. Fig. 3e shows the average energy consumption of the SM (E_SM) during each t_CLK for different I_LED and V_write. Even for the brightest LED illumination at the most negative V_write, E_SM ~ 50 fJ, which suggests energy-efficient phototransduction by our MoS2 FET-based SM. Finally, Fig. 3f shows the pre- and post-illumination transfer characteristics of 49 MoS2 FETs corresponding to the SMs of each of the 7 × 7 pixels of our BNN hardware after a t_write = 1 s exposure to I_LED = 20 mA at V_write = -2.5 V, and Fig. 3g shows the colormap of the ratio of post-illumination photoconductance to dark conductance (r_PH) measured at V_BG = 0 V (see Supplementary Information 11 for the pre- and post-illumination transfer characteristics of each of these 49 MoS2 FETs). The mean and standard deviation values were found to be 6.7×10³ and 3.8×10³, respectively. Note that this is the first report on device-to-device variation in the photoresponse of MOCVD-grown monolayer MoS2 FETs.
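The RC picture behind observations 1-4 can be illustrated with a toy model. This is a sketch under loudly stated assumptions, not the model behind Fig. 3: the node capacitance, dark and on-state conductances, and the rate at which photogating shifts the V_TH of T_SM are all hypothetical, and the conductance of T_SM is assumed to rise by one decade per 275 mV of threshold shift (the mean SS reported for these FETs) until it saturates. A larger `shift_rate` stands in for a brighter LED or a more negative V_write.

```python
import math


def simulate_graded_potential(shift_rate, v_n1=5.0, c_node=1e-13, g_dark=1e-16,
                              g_on=1e-9, ss=0.275, t_end=10.0, dt=1e-3):
    """Return (times, voltages) for node N3 charging through T_SM under illumination.

    Hypothetical photogating model: V_TH of T_SM shifts negatively at shift_rate (V/s),
    raising its conductance by one decade per ss volts until it saturates at g_on.
    """
    v = 0.0
    times, volts = [0.0], [v]
    for k in range(1, int(round(t_end / dt)) + 1):
        t = k * dt
        g = min(g_on, g_dark * 10 ** (shift_rate * t / ss))
        # Exact RC update over one step with g held constant: V relaxes toward V_N1.
        v = v_n1 - (v_n1 - v) * math.exp(-g * dt / c_node)
        times.append(t)
        volts.append(v)
    return times, volts


def time_to_reach(times, volts, target):
    """First time at which the node voltage reaches target (inf if never)."""
    for t, v in zip(times, volts):
        if v >= target:
            return t
    return math.inf


# Dark time constant C / G_dark = 1000 s, i.e. > 100 s as for the unilluminated node.
for rate in (0.25, 0.5, 1.0):  # hypothetical V/s; larger ~ brighter LED or more negative V_write
    ts, vs = simulate_graded_potential(rate)
    print(rate, time_to_reach(ts, vs, 0.95 * 5.0))
```

The exponential update is used instead of a forward-Euler step because the conductance spans about seven decades; it stays stable even when the RC time constant becomes much shorter than `dt`.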

MoS2 FET-based neuromorphic encoding module (EM):
The EM converts the graded potentials received from the SM into corresponding programming waveforms using spike-count and spike-duration based algorithms and transmits them to the LM, as summarized in Fig. 4a-j. Each EM comprises two MoS2 FETs, T_EM1 and T_EM2, connected in series as shown in Fig. 1e. Note that the local back-gate of T_EM1 is shorted to its source at node N5, which is also the drain of T_EM2 and the output node of the EM. The drain terminal of T_EM1, i.e., N4, is kept grounded, and the programming voltage, V_P, is applied to the source terminal of T_EM2, i.e., N6. Furthermore, both FETs are pre-programmed such that T_EM1 operates as a depletion-mode (normally on) n-channel FET, whereas T_EM2 operates as an enhancement-mode (normally off) n-channel FET, as shown in Fig. 4a. The EM, therefore, serves as an NMOS inverter with a depletion load. Fig. 4b shows the input (V_N3) versus output (V_N5) characteristics of the EM for different V_P values. Also note that the inverting threshold (V_IT), i.e., the magnitude of V_N3 at which V_N5 reaches V_P/2, can be adjusted by reconfiguring T_EM1 and T_EM2. Fig. 4c-d, respectively, show the various programming states of T_EM2 and the corresponding EM characteristics for V_P = -5 V.
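How the inverting threshold depends on the programmed thresholds can be sketched with an idealized long-channel square-law model of a depletion-load inverter. This is an illustration, not a model fitted to the MoS2 devices: the gain factors and threshold voltages are hypothetical, subthreshold conduction and channel-length modulation are ignored (so the transition is abrupt), and, as in a standard depletion-load inverter, the graded potential V_N3 is assumed to drive the gate of the enhancement-mode device T_EM2. Raising `vt_drv` mimics reprogramming T_EM2.

```python
def nfet_current(k, v_gs, v_t, v_ds):
    """Long-channel square-law n-FET drain current (A); subthreshold conduction ignored."""
    v_ov = v_gs - v_t
    if v_ov <= 0.0 or v_ds <= 0.0:
        return 0.0
    if v_ds < v_ov:  # triode region
        return k * (v_ov * v_ds - 0.5 * v_ds ** 2)
    return 0.5 * k * v_ov ** 2  # saturation region


def em_output(v_in, v_p, k_load=1e-6, vt_load=-2.0, k_drv=4e-6, vt_drv=7.0):
    """Output V_N5 of a depletion-load inverter between N4 = 0 V and N6 = v_p < 0.

    Load (T_EM1): drain at N4, gate tied to source at N5, so V_GS = 0.
    Driver (T_EM2): drain at N5, source at N6, gate driven by v_in (assumed V_N3).
    """
    lo, hi = v_p, 0.0
    for _ in range(100):  # bisection on KCL at N5: I_load = I_driver
        mid = 0.5 * (lo + hi)
        i_load = nfet_current(k_load, 0.0, vt_load, 0.0 - mid)
        i_drv = nfet_current(k_drv, v_in - v_p, vt_drv, mid - v_p)
        if i_load > i_drv:
            lo = mid  # load sources more than the driver sinks: equilibrium is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def inverting_threshold(v_p, v_lo=0.0, v_hi=5.0, **params):
    """Input voltage at which V_N5 reaches V_P / 2 (output falls monotonically with input)."""
    for _ in range(60):
        mid = 0.5 * (v_lo + v_hi)
        if em_output(mid, v_p, **params) > 0.5 * v_p:
            v_lo = mid
        else:
            v_hi = mid
    return 0.5 * (v_lo + v_hi)


print(f"{inverting_threshold(-6.0):.3f} V")              # 2.000 V
print(f"{inverting_threshold(-6.0, vt_drv=8.0):.3f} V")  # 3.000 V (reprogrammed T_EM2)
```

In this idealization, V_IT is where the saturation currents of load and driver match, so shifting the programmed threshold of T_EM2 by +1 V shifts V_IT by the same amount.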
Tunability of the EM characteristics is an additional benefit of our BNN platform, allowing adaptation to various learning conditions, as we will elucidate later. Finally, a constant V_P applied to node N6 transforms the graded potential into a spike-duration based programming voltage, as shown in Fig. 4e, whereas a clocking signal toggling between V_P and 0 V with t_CLK = 100 ms applied to node N6 transforms the graded potential into a spike-count based programming voltage, as shown in Fig. 4f (see Supplementary Information 12 for the biasing configuration of the EM for spike-duration and spike-count based encoding). Fig. 4g and Fig. 4i, respectively, show the total spiking time (t_spike) and the total number of spikes (N_spike) as a function of V_write and I_LED when the input stimulus is presented for a duration of 10 s and V_P = -6 V. Note that we start counting the spike time and spike number once V_N5 reaches 75% of V_P, i.e., -4.5 V in Fig. 4e-f. As expected, for any given V_write, graded potentials received from the SM that correspond to higher input stimuli (I_LED) invoke a longer t_spike and a larger N_spike at the output of the EM for spike-duration and spike-count based encoding, respectively. Similarly, for any given I_LED, a more negative V_write invokes a longer t_spike and a larger N_spike. Note that t_spike and N_spike can also be controlled by configuring V_IT (Fig. 4d). For example, scotopic conditions will benefit from a lower V_IT, since the spikes will reach the programming voltage, V_P, earlier for any given I_LED and V_write.
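The 75% counting rule described above can be expressed compactly. A minimal sketch, assuming V_N5 is sampled at a fixed period; the function name and the synthetic clocked trace (including its 50% duty cycle and amplitudes) are hypothetical:

```python
def encode_spikes(v_n5, dt, v_p, frac=0.75):
    """Return (t_spike, n_spike) from a sampled V_N5 trace (V) with sample period dt (s).

    A sample counts as spiking once V_N5 reaches frac * V_P (e.g. -4.5 V for V_P = -6 V);
    since V_P < 0, reaching it means V_N5 <= frac * V_P. n_spike counts separate excursions.
    """
    threshold = frac * v_p
    spiking_samples = 0
    n_spike = 0
    was_spiking = False
    for v in v_n5:
        is_spiking = v <= threshold
        if is_spiking:
            spiking_samples += 1
            if not was_spiking:
                n_spike += 1  # a new excursion past the threshold begins
        was_spiking = is_spiking
    return spiking_samples * dt, n_spike


# Hypothetical spike-count trace: 100 ms clock periods (assumed 50% duty), 10 ms samples,
# with the on-phase amplitude growing as the graded potential builds up.
dt = 0.01
trace = []
for amplitude in [-2.0, -4.0, -5.0, -5.5, -6.0]:
    trace += [amplitude] * 5 + [0.0] * 5
print(encode_spikes(trace, dt, v_p=-6.0))
```

For a spike-count waveform the second return value is the quantity of interest; for a spike-duration waveform (constant V_P at node N6) the first is. Only the last three on-phases in the example cross -4.5 V, so they are the only ones counted.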
Alternatively, a higher V_P value can be used to encode programming spikes of shorter duration or lower number. See Supplementary Information 13 for encoding of the same graded potential using different V_P for both spike-duration and spike-count based adaptive encoding. The reconfigurability of the EM can also be exploited to model learning disabilities. For example, if bright light is encoded into low-magnitude V_P spikes, potentiation of the synapses can be severely limited, invoking learning difficulty. Finally, the average encoding energy expenditure (E_EM) per clock cycle of the EM for spike-count and spike-duration based encoding for different I_LED and V_write is shown in Fig. 4h and Fig. 4j, respectively. The relatively higher energy expenditure of the EM is a direct consequence of using a depletion-mode NMOS inverter and can be reduced significantly by using a CMOS inverter. This will require the development of p-channel MoS2 FETs or the use of another 2D material such as WSe2.

MoS2 FET-based neuromorphic learning module (LM):
Optical information encoded in spikes is delivered from the EM to the LM for pattern learning using memory-augmented reinforcement. Each learning module comprises one MoS2 FET (T_LM), as shown in Fig. 1e, which serves as a non-volatile synapse with analog conductance states that are programmable by applying electrical voltage spikes to its local back-gate terminal, N5, which also serves as the output terminal of the EM. Fig. 5a-o As expected, a lower N_spike invokes lower depression and vice versa, which can be exploited for spike-count based forgetting. As we will elucidate later, forgetting capabilities enable relearning of new patterns using the same synapses that have learned a previous pattern. Also note that a smaller N_spike can achieve higher potentiation/depression if encoded using a higher V_P/D. As mentioned earlier, this aspect can be exploited to achieve learning plasticity. different V_D. Here, a shorter t_spike invokes lower potentiation/depression and vice versa, which can be used for spike-duration based learning/forgetting. Note that, similar to spike-count based learning/forgetting, higher potentiation/depression can be achieved for shorter spike durations when encoded using a higher magnitude of V_P/D, enabling spike-duration based learning plasticity under scotopic conditions.
The underlying mechanism behind the spike-count and spike-duration based potentiation and depression of the MoS2 synapses can be explained by the shift in V_TH observed in the transfer characteristics of the MoS2 FETs. The V_TH shift is attributed to charge trapping/detrapping at and near the MoS2/Al2O3 interface, which is also responsible for the photogating effect described earlier.
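To see how a V_TH shift maps onto a synaptic weight, one can use the subthreshold relation between drain current and gate overdrive. This is an idealization, not the authors' model: it assumes both programmed states remain in the subthreshold regime at the read bias and uses the mean SS of 275 mV/decade reported for these FETs. The function name is illustrative.

```python
def subthreshold_conductance_ratio(delta_vth, ss=0.275):
    """Fold change in conductance at fixed V_BG when V_TH shifts by delta_vth (V).

    Idealized: the device stays in subthreshold in both states, where the current
    changes by one decade per ss volts. A negative shift (potentiation) raises conductance.
    """
    return 10 ** (-delta_vth / ss)


print(f"{subthreshold_conductance_ratio(-1.0):.3g}")  # 4.33e+03 for a 1 V negative shift
```

Because the conductance depends exponentially on the shift, even modest amounts of trapped charge yield large, analog-valued weight changes, which is what spike-count and spike-duration programming exploit.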
A negative shift in the threshold voltage (V_TH) with increasing magnitude of V_P and a positive shift in V_TH with increasing magnitude of V_D, observed in the transfer characteristics of the MoS2 FET, are indicative of electron trapping and de-trapping in the local back-gate stack, respectively. Interestingly, the trapping and de-trapping processes were found to be non-volatile, as evident from the retention measurements displayed in Fig. 5i-j for 6 representative potentiated and depressed conductance states, respectively, for 100 s. We also examined the long-term retention characteristics of two representative post-programmed analog conductance states for ~10⁴ s, as shown in Supplementary Information 14. The memory ratio between these two states was found to decrease from ~1.1×10² to 0.6×10², following an exponential decay with a time constant of 1.6×10⁴ s. The projected time before the two states become indistinguishable, i.e., before the memory ratio reaches 1, is ~1 day. Note that, while conventional memory devices require non-volatile retention for years, many neuromorphic applications, including those in edge devices and sensors, relax the requirement for long-term retention and can be well served by short-term retention of several hours to days. The retention window demonstrated by the MoS2 FETs was adequate for the successful realization of our proof-of-concept "all-in-one" BNN. Improving the retention window is nonetheless desirable and should be possible by optimizing the design of the local back-gate stack, e.g., by mimicking the floating-gate architecture used in conventional flash memory devices.
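The ~1 day projection follows from the stated exponential decay: solving ratio(t) = ratio(0)·exp(-t/τ) = 1 gives t = τ·ln(ratio(0)). A quick check with the quoted numbers, assuming a pure single-exponential decay (constant and function names are illustrative):

```python
import math

RATIO_0 = 1.1e2  # initial memory ratio between the two programmed states
TAU = 1.6e4      # decay time constant (s)


def memory_ratio(t, ratio0=RATIO_0, tau=TAU):
    """Memory ratio after time t (s), assuming a single-exponential decay."""
    return ratio0 * math.exp(-t / tau)


def time_until_ratio(target=1.0, ratio0=RATIO_0, tau=TAU):
    """Time (s) at which the decaying memory ratio falls to target."""
    return tau * math.log(ratio0 / target)


print(f"ratio after 1e4 s: {memory_ratio(1e4):.0f}")                 # 59
print(f"indistinguishable after {time_until_ratio() / 3600:.1f} h")  # 20.9 h
```

The ratio after ~10⁴ s (about 59) is consistent with the measured 0.6×10², and the extrapolated ~21 h matches the "~1 day" estimate.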
The device-to-device variation in the pre- and post-programmed transfer characteristics and the corresponding colormap of the memory ratio measured at V_BG = 0 V for 49 monolayer MoS2 FETs from each LM of our 7 × 7 BNN platform, when programmed using N_spike = 10 with spike magnitude V_P = -8 V and spike width 100 ms, are shown in Fig. 5k-l, respectively (see Supplementary Information 15 for the pre- and post-programmed transfer characteristics of each of these 49 MoS2 FETs). The mean and standard deviation of the memory ratio were found to be 6×10⁵ and 0.5×10⁵, respectively. Similar to the photoresponse, this is the first report on device-to-device variation in the programmability of monolayer MoS2 FETs.
Finally, Fig. 6a-b, respectively, show the spike-duration and spike-count based conductance evolution in the MoS2 FET-based LM when input programming waveforms (V_N5) are received from the EM corresponding to different I_LED and V_write. As shown in Fig. 6c-d, for any given V_write, the spiking patterns received from the EM that correspond to higher input stimuli (I_LED) result in higher final conductance values, and vice versa, for both spike-duration and spike-count based learning. Similarly, for any given I_LED, a more negative V_write results in better learning, i.e., a higher final conductance. The average learning energy expenditure (E_LM) per clock cycle of the LM for spike-duration and spike-count based learning for different I_LED and V_write is shown in Fig. 6e-f, respectively. The energy expenditure of the LM is found to be minuscule, ~50 fJ per clock cycle, even for the brightest illumination and the most negative V_write.
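Per-cycle energies such as E_SM, E_EM, and E_LM are integrals of instantaneous power over one t_CLK. One straightforward way to compute such a figure from sampled traces is sketched below; it is not necessarily how the reported values were obtained, and the constant 100 fA current is hypothetical, chosen only to show that a 5 V node drawing this current for 100 ms dissipates on the order of the ~50 fJ quoted above.

```python
def energy_per_cycle(voltages, currents, dt):
    """Energy (J) over one clock cycle: sum of V(t) * I(t) * dt across the samples."""
    if len(voltages) != len(currents):
        raise ValueError("voltage and current traces must have the same length")
    return sum(v * i for v, i in zip(voltages, currents)) * dt


# Hypothetical trace: 5 V supply delivering a constant 100 fA over a 100 ms clock
# period, sampled every 1 ms.
n = 100
e = energy_per_cycle([5.0] * n, [100e-15] * n, 1e-3)
print(f"{e * 1e15:.1f} fJ")  # 50.0 fJ
```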

Multi-pixel demonstration of analog image sensing, encoding, and learning:
Fig. 7a-d and Supplementary Video 2 show a complete demonstration of our BNN hardware with multi-pixel, monolithically integrated SMs, EMs, and LMs. A 7 × 7 analog input pattern obtained by illuminating the LED (Fig. 7a) is transduced into corresponding graded potentials by the SMs (Fig. 7b). The EMs then encode these potentials into programming spikes following the spike-count based encoding algorithm (Fig. 7c). The LMs use these spikes to potentiate the MoS2 FET-based non-volatile synapses (Fig. 7d). The input LED pattern is clearly learned by our 7 × 7 BNN hardware. For this demonstration, all synapses were initially programmed to their LCS, and different LED illuminations were presented one by one to the corresponding pixels. Simultaneous illumination would require a lensing system to focus the image pixels onto the corresponding SMs of our BNN platform. In future work, we will attempt to integrate such a lensing system with the BNN hardware. Note that the MoS2 FETs in the SMs are biased deep in the off-state by applying a negative V_write. This harnesses the photogating effect, which transduces optical illumination into the corresponding graded potential.
In contrast, the MoS2 FETs in the EM and LM are biased either in the subthreshold regime or in the on-state. In these regimes the devices remain insensitive to illumination, so their operation is not affected by it. Also note that, while the input pattern is learned by our BNN architecture, device-to-device variations persist. Variations in the photogating effect, transfer characteristics, and programming of the MoS2 FETs translate into variations in the graded potential, spike count, and learned conductance for the same input LED signal, as seen in Fig. 7. Further reduction of this device-to-device variation is clearly desirable. MoS2 technology will mature through optimized growth conditions that reduce point defects, cleaner and damage-free large-area transfer techniques, and the elimination of polymer residues from device fabrication. As it does, it should be possible to mitigate device-to-device variation to a large extent and achieve near-ideal learning. Nevertheless, our proof-of-concept demonstration highlights the fully integrated nature of our MoS2 FET-based hardware BNN, which combines sensing, computing (encoding), and storage. This distinguishes it from other hardware BNN architectures based on CMOS or on emerging technologies such as RRAM, PCM, memristors, all-optical platforms, and hybrid approaches.

Importance of forgetting (synaptic depression) in learning and inference:
Forgetting has traditionally been considered a passive brain process that lets unused information fade over time, so that neural resources can be reallocated to store newer and more important information. When machines learn with unrestricted storage resources (e.g., cloud servers), forgetting is irrelevant. However, when storage capacity is limited or inaccessible, for example in internet of things (IoT) edge devices deployed in remote locations, forgetting can play an active role in smart learning. Here we demonstrate the role of forgetting in relearning without any external supervision, by directly interacting with a changing environment. The biasing configuration of the EM used to introduce the depression cycle is shown in Fig. S17. To introduce depression in the EM, V_D is applied to the drain terminal of T_EM1, i.e., node N4, instead of keeping it grounded. The corresponding clocking profiles are shown in Fig. S17a-b for spike-duration and spike-count based encoding, respectively. Fig. S17c-d show the EM output (V_N5) for a constant graded potential V_N3 = 5 V and various combinations of V_D and V_P, for spike-duration and spike-count based encoding, respectively. During the potentiation cycle, the image pattern to be learned is presented to the corresponding synapses of the 9×1 BNN, whereas during the depression cycle all 9 synapses are uniformly depressed. The first pattern (left diagonal) is presented for 20 epochs, followed by the second pattern (right diagonal) for another 20 epochs. This tests whether our BNN can forget a previously learned pattern and relearn a new one.
Fig. 8c-d show the spiking profiles used for spike-count and spike-duration based learning, respectively. For each type of learning, we consider three configurations of the BNN: 1) weak potentiation and strong depression, 2) strong potentiation and weak depression, and 3) strong potentiation and strong depression. For spike-count based learning, the strengths of potentiation (V_P) and depression (V_D) are adjusted using the spike magnitude. For example, V_P = -10 V gives strong and V_P = -8 V weak potentiation, while V_D = 12 V gives strong and V_D = 10 V weak depression. For the pattern to be learned, each pixel in the 3×3 image is encoded with N_spike = 10 if it is bright and N_spike = 0 if it is dark. For spike-duration based learning, the strengths of potentiation and depression are adjusted using the spike duration, i.e., t_spike = 800 ms for strong and t_spike = 100 ms for weak potentiation/depression. Each pixel in the 3×3 image is encoded with the respective t_spike (weak or strong) if it is bright and t_spike = 10 ms if it is dark. Fig. 8e-f show the time evolution of the colormap of synaptic weights, i.e., the conductance states of the 9 synaptic devices, during the spike-count and spike-duration based learning cycles, respectively. For each type of learning, all synapses are initialized either in a high conductance state (HCS) with G_HCS = 100 nS or in a low conductance state (LCS) with G_LCS = 100 pS (also see Supplementary Videos 3 and 4).
The key observations are as follows. When potentiation is weak but depression is strong, learning is difficult irrespective of the initial state of the synapses. When potentiation is strong but depression is weak, learning from the LCS is fast, but forgetting, and hence relearning from the HCS, is slow. This is expected, since potentiated synapses get stuck in their HCS owing to the weak depression, making it difficult for them to forget their states. Finally, if both potentiation and depression are strong, learning and forgetting become faster irrespective of the initial synaptic state. This is demonstrated in Fig. 8g-h, which show learning of the left diagonal followed by relearning of the right diagonal when potentiation and depression are both strong. Fig. 8g covers spike-count and Fig. 8h spike-duration based learning (also see Supplementary Videos 5 and 6). Our findings indicate that the relative strengths of potentiation and depression play a critical role in learning with our BNN. This parallels biological neural networks. There, autism or autism spectrum disorder (ASD), which includes a broad range of conditions such as challenges with learning social skills and repetitive behaviors, has been related to dysregulation of, or deficits in, long-term depression in several mouse models [44,45]. Therefore, our hardware BNN platform offers a unique opportunity to bridge the gap between the neuroscience of learning and machine learning. Supplementary Information 18 shows inference using our BNN architecture. We used a 9×2 fully connected neural network implemented using two sets of 9×1 synapses, as shown in Fig. S18a.
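A deliberately simple toy model can reproduce the qualitative trends above, under stated assumptions. Each synapse is tracked as log10(G), clipped between the LCS (100 pS) and HCS (100 nS) values quoted for Fig. 8. Every epoch adds a fixed potentiation step to the bright pixels of the presented pattern, then subtracts a fixed depression step from all 9 synapses. The step sizes, in decades, are hypothetical and are not fitted to the measured devices. Only the schedule follows the text: the left diagonal for 20 epochs, then the right diagonal for 20 epochs. The function names are illustrative.

```python
# Toy model; all step sizes are hypothetical. Weights are tracked as log10(G)
# and clipped between the LCS and HCS values quoted for Fig. 8.
LOG_G_LCS = -10.0  # log10(100 pS)
LOG_G_HCS = -7.0   # log10(100 nS)

# Flattened 3x3 pixel indices (row-major).
LEFT_DIAG = frozenset({0, 4, 8})
RIGHT_DIAG = frozenset({2, 4, 6})


def clip(x):
    """Keep a log-conductance inside the LCS..HCS window."""
    return max(LOG_G_LCS, min(LOG_G_HCS, x))


def run_epoch(log_g, pattern, pot, dep):
    """Potentiate bright pixels of `pattern` by `pot` decades, then depress all by `dep`."""
    potentiated = [clip(x + pot) if k in pattern else x for k, x in enumerate(log_g)]
    return [clip(x - dep) for x in potentiated]


def learn_then_relearn(init_log_g, pot, dep, epochs_per_pattern=20):
    """Left diagonal for 20 epochs, then right diagonal for 20 epochs (40 in total)."""
    log_g = [init_log_g] * 9
    for _ in range(epochs_per_pattern):
        log_g = run_epoch(log_g, LEFT_DIAG, pot, dep)
    after_left = list(log_g)
    for _ in range(epochs_per_pattern):
        log_g = run_epoch(log_g, RIGHT_DIAG, pot, dep)
    return after_left, log_g


CONFIGS = {  # (potentiation step, depression step) in decades, hypothetical
    "weak pot / strong dep": (0.5, 0.75),
    "strong pot / weak dep": (1.5, 0.125),
    "strong pot / strong dep": (1.5, 0.75),
}

for name, (pot, dep) in CONFIGS.items():
    for label, init in (("HCS", LOG_G_HCS), ("LCS", LOG_G_LCS)):
        left, right = learn_then_relearn(init, pot, dep)
        print(f"{name:24s} from {label}: after left {left}, after right {right}")
```

Take the case of strong potentiation and strong depression (1.5 and 0.75 decades). After the patterns switch, pixels 0 and 8 of the left diagonal fall back to the LCS within three epochs. With weak depression (0.125 decades), they are still at 10^-9.625 S after 20 epochs, mirroring the slow forgetting described above. With weak potentiation and strong depression, nothing is learned at all.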
The synapses between the 9 presynaptic neurons and the "Yes" postsynaptic neuron are trained with the actual pattern. The synapses between the 9 presynaptic neurons and the "No" postsynaptic neuron are trained with the inverse of the pattern. This yields the respective conductance maps (G_i-Yes/No, i = 1, 2, …, 9) shown in Fig. S18b. Any input pattern from the LED is converted into graded potentials by the SMs and transduced into spike trains by the EMs. For spike-count based inference, the encoding module outputs voltage spikes (V_ij, i = 1, 2, …, 9; j = 1, 2, …, N_spike) for each pixel of the 3×3 image. These spikes are applied to the drain terminals of the 9 presynaptic neurons. The output currents from the common source terminals, i.e., the postsynaptic "Yes" and "No" neurons, are integrated using capacitors (C_Yes/No) to obtain V_Yes and V_No, as shown in Fig. S18c, following Eq. 1.
For spike-duration based inference, a similar approach is adopted. The difference is that the encoding module outputs only one voltage spike (V_i, i = 1, 2, …, 9) for each pixel of the 3×3 image, with a pixel-dependent spiking duration. In this case, V_Yes and V_No are given by Eq. 2.
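Eq. 1 and Eq. 2 are not reproduced in this section, so the following sketch assumes a physically plausible form for spike-count inference. Each spike of amplitude V and width t_w on pixel i is assumed to deposit a charge G_i·V·t_w on the integrating capacitor. This gives V_Yes/No = (1/C)·Σ_i G_i-Yes/No·V·t_w·N_i. The sketch then applies the winner rule stated in the text: V_Yes > V_No and V_Yes ≥ V_Win. The spike amplitude, spike width, capacitance, and V_Win are hypothetical values chosen for illustration. G_HCS, G_LCS, and the 10-spikes-per-bright-pixel encoding are borrowed from the Fig. 8 discussion. For spike-duration inference, N_i·t_w would be replaced by the single spike duration of pixel i. All function names are illustrative.

```python
# Hypothetical read-out parameters; none of these values come from the paper
# except G_HCS/G_LCS and N_BRIGHT, which are borrowed from the Fig. 8 discussion.
G_HCS, G_LCS = 100e-9, 100e-12  # S
V_SPIKE = 0.1    # V, assumed amplitude of each inference spike at the drains
T_SPIKE = 0.1    # s, assumed width of each inference spike
C_INT = 1e-6     # F, assumed C_Yes = C_No
V_WIN = 0.025    # V, assumed winning threshold
N_BRIGHT = 10    # spikes per bright pixel (spike-count encoding); dark pixels get 0

LEARNED = (1, 0, 0, 0, 1, 0, 0, 0, 1)  # left diagonal, flattened row-major


def trained_conductances(pattern, scale=1.0):
    """Bright pixels end in HCS, dark pixels in LCS; `scale` mimics stronger potentiation."""
    return [scale * (G_HCS if p else G_LCS) for p in pattern]


def integrated_voltage(conductances, image):
    """Charge from all spikes on all 9 drains, accumulated on the output capacitor."""
    charge = 0.0
    for g, pixel in zip(conductances, image):
        n_spikes = N_BRIGHT if pixel else 0
        charge += g * V_SPIKE * T_SPIKE * n_spikes
    return charge / C_INT


def infer(image, no_scale=1.0):
    """Return "Yes" if V_Yes > V_No and V_Yes >= V_WIN, else "No"."""
    g_yes = trained_conductances(LEARNED)
    g_no = trained_conductances([1 - p for p in LEARNED], scale=no_scale)
    v_yes = integrated_voltage(g_yes, image)
    v_no = integrated_voltage(g_no, image)
    return "Yes" if (v_yes > v_no and v_yes >= V_WIN) else "No"


diagonal = (1, 0, 0, 0, 1, 0, 0, 0, 1)
diagonal_plus_one = (1, 1, 0, 0, 1, 0, 0, 0, 1)
print(infer(diagonal), infer(diagonal_plus_one), infer(diagonal_plus_one, no_scale=10))
```

With these illustrative values, the sketch prints `Yes Yes No`. The learned diagonal is accepted, and a diagonal with one extra off-diagonal pixel is also (wrongly) accepted. Scaling the "No" conductances up by an order of magnitude rejects that pattern. This is qualitatively the behavior the text describes for the ~96% and 100% accuracy cases.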
For the "Yes" neuron to be the winner, V_Yes > V_No and V_Yes ≥ V_Win, where V_Win is the winning threshold determined by the learned pattern. Ideally, the "Yes" neuron should win only when a pattern matching the learned one is inferred, whereas the "No" neuron should win for all other patterns. However, the experimental inference accuracy was found to be ~96%. This is because patterns that contain one or two off-diagonal pixels in addition to the diagonal pixels also make the "Yes" neuron the winner. There are 6C1 + 6C2 = 6 + 15 = 21 such wrongly inferred patterns, which account for ~4% of all 2^9 = 512 patterns. Note that if 3 or more off-diagonal pixels are bright in addition to the diagonal pixels, the "No" neuron wins. The inference accuracy was improved to 100% by ensuring V_No ≥ V_Win even when only one off-diagonal pixel is present in the input pattern. This was accomplished through stronger potentiation of the synaptic connections between the input neurons and the "No" neuron during training with the inverse pattern. The result was an order of magnitude higher learned conductance.
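The 21-pattern count can be checked by brute force. The sketch below takes the learned pattern to be the left diagonal of a row-major 3×3 image, which is an illustrative choice. It encodes the misclassification condition exactly as stated in the text: all three diagonal pixels bright plus one or two off-diagonal pixels. It then enumerates all 2^9 binary images. The helper name `wrongly_inferred_as_yes` is hypothetical.

```python
from itertools import product
from math import comb

DIAG = (0, 4, 8)                                    # learned left diagonal (row-major)
OFF_DIAG = tuple(k for k in range(9) if k not in DIAG)


def wrongly_inferred_as_yes(pattern):
    """All diagonal pixels bright plus one or two off-diagonal pixels bright."""
    n_off = sum(pattern[k] for k in OFF_DIAG)
    return all(pattern[k] for k in DIAG) and 1 <= n_off <= 2


patterns = list(product((0, 1), repeat=9))          # all 2**9 = 512 binary 3x3 images
n_wrong = sum(wrongly_inferred_as_yes(p) for p in patterns)
accuracy = 1 - n_wrong / len(patterns)

assert n_wrong == comb(6, 1) + comb(6, 2)           # 6 + 15 = 21
print(f"{n_wrong} of {len(patterns)} patterns misclassified -> accuracy {accuracy:.1%}")
```

The script reports 21 misclassified patterns and an accuracy of 95.9%, matching the ~96% quoted above.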

Conclusion
In conclusion, we have experimentally demonstrated a fully integrated, multi-pixel, and biomimetic BNN hardware platform based on monolayer MoS2 that combines sensing, encoding, learning, and inference. We employed both spike-count and spike-duration based encoding, learning, and inference, inspired by the energy efficiency of spike-based computing in the brain.
We also demonstrated adaptive learning under photopic and scotopic conditions, and the impact of the relative strengths of synaptic potentiation and depression on learning and forgetting.
… on the entire substrate including the island regions. To access the individual Pt back-gate electrodes, etch patterns were defined using the same bilayer photoresist consisting of LOR 5A and SPR 3012.
The bilayer photoresist was then exposed using the MLA 150 and developed using Microposit MF CD26.
The 50 nm Al2O3 layer was subsequently dry-etched using BCl3 chemistry at 5 °C for 20 seconds, a step repeated four times to minimize heating of the substrate. Next, the photoresist was removed to give access to the individual Pt electrodes.
Large area monolayer MoS2 film growth: Monolayer MoS2 was deposited on an epi-ready 2" c-sapphire substrate by metal-organic chemical vapor deposition (MOCVD). An inductively heated graphite susceptor equipped with wafer rotation in a cold-wall horizontal reactor was used to achieve uniform monolayer deposition, as previously described [46]. Molybdenum hexacarbonyl (Mo(CO)6) and hydrogen sulfide (H2S) were used as precursors. Mo(CO)6, maintained at 10 °C and 950 Torr in a stainless-steel bubbler, delivered 0.036 sccm of the metal precursor for the growth, while 400 sccm of H2S was used. MoS2 deposition was carried out at 1000 °C and 50 Torr in H2 ambient, where monolayer growth was achieved in 18 min. The substrate was first heated to 1000 °C in H2 and held for 10 min before growth was initiated. After growth, the substrate was cooled in H2S to 300 °C to inhibit decomposition of the MoS2 films. More details can be found in our earlier work [35,40,47].

Fabrication of monolayer MoS2 FET:
To define the channel regions for the MoS2 FETs, the substrate was spin-coated with PMMA and baked at 180 °C for 90 s. The resist was then exposed by electron-beam (e-beam) lithography and developed using a 1:1 mixture of 4-methyl-2-pentanone (MIBK) and 2-propanol (IPA). The monolayer MoS2 film was subsequently etched using sulfur hexafluoride (SF6) at 5 °C for 30 s. Next, the sample was rinsed in acetone and IPA to remove the e-beam resist.
To define the source and drain contacts, the sample was then spin-coated with methyl methacrylate (MMA) followed by A3 PMMA. The source and drain contacts were patterned by e-beam lithography and developed using a 1:1 mixture of MIBK and IPA for 60 s. Then, 40 nm of nickel (Ni) and 30 nm of gold (Au) were deposited by e-beam evaporation. Finally, lift-off was performed by immersing the sample in acetone for 30 min followed by IPA for another 30 min, removing the evaporated Ni/Au everywhere except the source/drain patterns. Each island contains one MoS2 FET to allow individual gate control.
Monolithic Integration: Each pixel of our multi-pixel (7 × 7) BNN hardware consists of 4 MoS2 FETs, as shown in the circuit schematic in Fig. 1e. Within each pixel, the SM consists of 1 MoS2 FET (T_SM), the EM consists of 2 MoS2 FETs (T_EM1 and T_EM2), and the LM consists of 1 MoS2 FET (T_LM). The connections between the respective nodes of T_SM, T_EM1, T_EM2, and T_LM were defined as follows. The substrate was spin-coated with MMA and PMMA, followed by e-beam lithography, development in a 1:1 mixture of MIBK and IPA, and e-beam evaporation of 60 nm Au. Finally, the e-beam resist was removed by lift-off in acetone and IPA.
Electrical Characterization: Electrical characterization of the fabricated devices was performed in a Lake Shore CRX-VF probe station under ambient conditions using a Keysight B1500A parameter analyzer. Extracted mean values for μ_FE, I_ON/I_OFF, SS, and V_TH were 21 cm^2 V^-1 s^-1, 2.6×10^7, 275 mV/decade, and 0.9 V, respectively. The corresponding standard deviations were 5.5 cm^2 V^-1 s^-1, 0.8×10^7, 59 mV/decade, and 0.2 V.
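The Fig. 2 caption names three extraction methods: V_TH at an iso-current of 100 nA/µm, SS over three decades of I_DS, and μ_FE from the peak transconductance. The sketch below shows one way to apply them to a measured sweep, using the standard expression μ_FE = L·g_m,max/(W·C_ox·V_DS) for mobility. Several details are assumptions: the log-linear interpolation, the choice of starting current for the SS window, the helper names, and the synthetic demo curves. The user must supply the device geometry (L, W) and C_ox.

```python
import math


def voltage_at_current(v_bg, i_ds, i_target):
    """First V_BG at which I_DS reaches i_target, interpolating linearly in log10(I_DS).

    Assumes a forward sweep (increasing V_BG) with positive currents."""
    for k in range(1, len(v_bg)):
        i0, i1 = i_ds[k - 1], i_ds[k]
        if i0 < i_target <= i1:
            frac = (math.log10(i_target) - math.log10(i0)) / (math.log10(i1) - math.log10(i0))
            return v_bg[k - 1] + frac * (v_bg[k] - v_bg[k - 1])
    raise ValueError("target current not reached in the sweep")


def threshold_voltage(v_bg, i_ds_per_um, i_ref=100e-9):
    """V_TH at an iso-current of i_ref in A/um (100 nA/um as in the Fig. 2 caption)."""
    return voltage_at_current(v_bg, i_ds_per_um, i_ref)


def subthreshold_slope(v_bg, i_ds, i_start, decades=3):
    """Average SS in mV/decade over `decades` decades of current starting at i_start."""
    v1 = voltage_at_current(v_bg, i_ds, i_start)
    v2 = voltage_at_current(v_bg, i_ds, i_start * 10 ** decades)
    return 1e3 * (v2 - v1) / decades


def field_effect_mobility(v_bg, i_ds, length, width, c_ox, v_ds):
    """mu_FE = L * g_m,max / (W * C_ox * V_DS); SI inputs, result in cm^2/(V s)."""
    g_m_max = max((i_ds[k] - i_ds[k - 1]) / (v_bg[k] - v_bg[k - 1])
                  for k in range(1, len(v_bg)))
    return 1e4 * length * g_m_max / (width * c_ox * v_ds)  # m^2/(V s) -> cm^2/(V s)


# Synthetic curves for illustration only (not measured data).
v_sweep = [0.01 * k for k in range(301)]                    # 0 to 3 V
i_sub = [1e-12 * 10 ** (vk / 0.275) for vk in v_sweep]       # 275 mV/dec, in A/um
i_lin = [max(1e-12, 2e-6 * (vk - 1.0)) for vk in v_sweep]    # linear above 1 V, in A

v_th = threshold_voltage(v_sweep, i_sub)                    # about 1.375 V
ss = subthreshold_slope(v_sweep, i_sub, i_start=1e-11)       # about 275 mV/dec
mu = field_effect_mobility(v_sweep, i_lin, length=1e-6, width=1e-6, c_ox=1e-3, v_ds=1.0)
```

The synthetic subthreshold curve has a known 275 mV/decade slope, and the linear curve has a known transconductance of 2 µA/V. This makes it easy to confirm that the helpers recover the expected values before applying them to real sweeps.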

Figure 2. Characterization and device-to-device variation of MoS2 FETs. a) Raman and b) photoluminescence (PL) spectra of a representative MoS2 channel region. The Raman peak separation of 17 cm^-1 between the characteristic A1g and E¹₂g modes and the PL peak location at 1.82 eV are consistent with monolayer MoS2. c) Raman and d) PL maps of 10 µm × 10 µm MoS2 regions. Colormaps of the distribution of e) Raman peak separation and f) PL peak position across 49 MoS2 channels corresponding to each of the 7 × 7 pixels of our BNN architecture. The extracted mean and standard deviation were 18 cm^-1 and 0.8 cm^-1, respectively, for the Raman peak separation, and 1.82 eV and 0.01 eV, respectively, for the PL peak location. g) Transfer characteristics, i.e., source-to-drain current (I_DS) as a function of the local back-gate voltage (V_BG) at different drain biases (V_DS), for a representative MoS2 FET with channel length L = 1 µm. h) Device-to-device variation in the transfer characteristics across 49 MoS2 FETs corresponding to each of the 7 × 7 pixels. Colormaps of the distribution of i) electron field-effect mobility (μ_FE) extracted from the peak transconductance, j) current on/off ratio (I_ON/I_OFF), k) subthreshold slope (SS) over 3 orders of magnitude change in I_DS, and l) threshold voltage (V_TH) extracted at an iso-current of 100 nA/µm for these 49 MoS2 FETs. Extracted mean values for μ_FE, I_ON/I_OFF, SS, and V_TH were 21 cm^2 V^-1 s^-1, 2.6×10^7, 275 mV/decade, and 0.9 V, respectively. The corresponding standard deviations were 5.5 cm^2 V^-1 s^-1, 0.8×10^7, 59 mV/decade, and 0.2 V.

Figure 3. MoS2 FET based neuromorphic sensing module (SM): a) Transfer characteristics of a monolayer MoS2 FET measured with V_DS = 1 V before and after illumination from a blue LED at different V_BG = V_write for t_write = 100 ms. LED input currents range from I_LED = 0.5 mA (low brightness) to I_LED = 20 mA (high brightness). b) Analog-valued, continuous-time input optical stimuli from the blue LED. c) Corresponding temporal evolution of the graded potential, V_N3, at node N3 for different V_write, obtained using the SM circuit layout shown in Fig. 1d. A constant voltage, V_N1 = 5 V, is applied to node N1, which is the drain terminal of T_SM. A clocking signal toggling between V_read and V_write, with t_CLK = 100 ms, is applied to node N2, which is the local back-gate of T_SM. The source terminal of T_SM is connected to the local back-gate of T_EM1 at node N3. d) Time for V_N3 to reach the same magnitude (t_SAT) as a function of I_LED and V_write. e) Average energy consumption of the SM (E_SM) during each clock cycle for different I_LED and V_write. f) Device-to-device variation in the photoresponse of 49 MoS2 FETs corresponding to the SMs of each of the 7 × 7 pixels of our BNN hardware, after a t_write = 1 s exposure to I_LED = 20 mA at V_write = -2.5 V. g) Colormap of the distribution of the ratio of post-illumination photoconductance to dark conductance measured at V_BG = 0 V. The mean and standard deviation were found to be 6.7×10^3 and 3.8×10^3, respectively.

Figure 4. MoS2 FET-based neuromorphic encoding module (EM). a) Transfer characteristics of the MoS2 FETs used as T_EM1 and T_EM2 in the EM. T_EM1 is programmed to operate as a depletion-mode (normally on) n-channel FET, whereas T_EM2 operates as an enhancement-mode (normally off) n-channel FET. Based on the circuit layout shown in Fig. 1d, the EM serves as an NMOS inverter with a depletion load. b) Input (V_N3) versus output (V_N5) characteristics of the EM for different V_P values applied to the source terminal of T_EM2, i.e., node N6. The drain terminal of T_EM1, i.e., node N4, is kept grounded. c) Various programming states of T_EM2 and d) the corresponding EM characteristics for V_P = -5 V. The inverting threshold (V_IT), i.e., the magnitude of V_N3 at which V_N5 reaches V_P/2, can be adjusted by reconfiguring T_EM1 and T_EM2. e) Spike-duration and f) spike-count based encoding of the graded potential (V_N3), received from the SM for different V_write and I_LED, into the programming voltage (V_N5) transmitted to the learning module (LM). g) Total spiking time (t_spike) and h) corresponding average encoding energy expenditure per clock cycle (E_EM) for spike-duration based encoding, as a function of V_write and I_LED. i) Total number of spikes (N_spike) and j) corresponding E_EM for spike-count based encoding, as a function of V_write and I_LED. The input stimulus is presented for a duration of 10 s and V_P = -6 V. Spike duration and spike number are counted once V_N5 reaches 75% of V_P, i.e., -4.5 V.
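The caption's counting rule states that spike duration and spike number are counted once V_N5 reaches 75% of V_P. The sketch below applies that rule to a sampled waveform. Two details are assumptions about how the counting is implemented. Each run of consecutive above-threshold samples is treated as one spike, and the spiking time is the number of such samples times the sampling interval. The function name `count_spikes` and the example waveform are hypothetical.

```python
def count_spikes(v_n5, dt, v_p, fraction=0.75):
    """Return (number of spikes, total spiking time) for a sampled V_N5 waveform.

    A sample counts as spiking once V_N5 has reached `fraction` of V_P in the
    same polarity (V_N5 / V_P >= fraction); each run of consecutive spiking
    samples is one spike. `dt` is the uniform sampling interval in seconds.
    """
    if v_p == 0:
        raise ValueError("V_P must be non-zero")
    spiking = [v / v_p >= fraction for v in v_n5]
    n_spikes = sum(1 for k, s in enumerate(spiking) if s and (k == 0 or not spiking[k - 1]))
    return n_spikes, dt * sum(spiking)


# Illustrative waveform sampled every 10 ms with V_P = -6 V (threshold -4.5 V).
waveform = [0.0, -5.0, -5.5, -1.0, 0.0, -6.0, -0.5, -4.0, -4.5]
n, t_total = count_spikes(waveform, dt=0.01, v_p=-6.0)
print(n, t_total)
```

For the example waveform, the function finds three spikes and 40 ms of spiking time. The final -4.5 V sample counts because it sits exactly at 75% of V_P. The function compares V_N5/V_P rather than V_N5 itself, so it also handles positive pulses such as V_D in the depression cycle.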
Fig. 5a shows the potentiation of a representative MoS2 electrical synapse from a low conductance state (LCS).

Fig. 5e-h show the spike-duration based potentiation and depression of MoS2 synapses.

Figure 5. Analog and non-volatile programming of MoS2 FET-based synapses. a) Potentiation of a MoS2 synapse from the low conductance state (LCS) after the application of a fixed number of programming spikes (N_spike = 10) of different negative amplitudes (V_P), with each spike applied for t_spike = 100 ms. b) Post-potentiated conductance states (G_P) measured at V_BG = 0 V for the programming conditions in (a). c) Depression of a MoS2 synapse from the high conductance state (HCS) after the application of a fixed number of programming spikes (N_spike = 10) of different positive amplitudes (V_D), with each spike applied for t_spike = 100 ms. d) Post-depressed conductance states (G_D) measured at V_BG = 0 V for the programming conditions in (c). e) Potentiation of a MoS2 synapse from the LCS after the application of a single spike of constant magnitude V_P = -6 V for different t_spike. f) Post-potentiated G_P measured at V_BG = 0 V for the programming conditions in (e). g) Depression of a MoS2 synapse from the HCS after the application of a single spike of constant magnitude V_D = 6 V for different t_spike. h) Post-depressed G_D measured at V_BG = 0 V for the programming conditions in (g). Retention characteristics for i) 6 potentiated and j) 6 depressed conductance states over 100 seconds. k) Device-to-device variation in the pre- and post-programmed transfer characteristics and l) corresponding colormap of the distribution of the memory ratio measured at V_BG = 0 V. The data cover 49 monolayer MoS2 FETs, one from each LM of our 7 × 7 BNN platform, programmed using N_spike = 10 with spike magnitude V_P = -8 V and spike width t_spike = 100 ms. The mean and standard deviation of the memory ratio were found to be 6×10^5 and 0.5×10^5, respectively.

Figure 6. MoS2 FET-based neuromorphic learning module (LM). a) Spike-duration and b) spike-count based conductance evolution in the MoS2 FET-based LM when input programming waveforms (V_N5) are received from the EM for different I_LED and V_write. Final conductance state achieved by the LM for c) spike-duration and d) spike-count based input spiking patterns received from the EM for different I_LED and V_write. The average learning energy expenditure per clock cycle of the LM (E_LM) for e) spike-duration and f) spike-count based learning for different I_LED and V_write.

Figure 7. Multi-pixel demonstration of analog image sensing, encoding, and learning. a) Analog 7×7 input pattern obtained by illuminating the blue LED. Temporal evolution of the corresponding b) graded potential (V_N3) at the output of the SMs, c) programming spike count (N_spike) at the output of the EMs, and d) programmed conductance values at the output of the LMs. The input LED pattern is clearly learned by our 7 × 7 BNN hardware. For this demonstration, all MoS2 FET-based synapses belonging to the LMs were initially programmed to their LCS (100 pS). Different LED illuminations were then presented one by one to the corresponding pixels of our 7 × 7 BNN hardware.

Fig. 8a shows the schematic and optical image of a fully connected 2-layer BNN with 9 presynaptic neurons and 1 postsynaptic neuron.

Figure 8. Importance of forgetting (synaptic depression) in learning and inference. a) Schematic and optical image of a 2-layer BNN with 9 presynaptic neurons and 1 postsynaptic neuron for learning and inferring patterns from 3×3 pixelated images. b) Training and retraining schedule with M = 40 epochs, each epoch having potentiation and depression cycles. During potentiation, the pattern to be learned is presented to the BNN, whereas during depression all synapses are uniformly depressed. Spiking profiles used for c) spike-count and d) spike-duration based learning. For each type of learning, three BNN configurations are used: 1) weak potentiation and strong depression, 2) strong potentiation and weak depression, and 3) strong potentiation and strong depression. The strengths of potentiation (V_P) and depression (V_D) are adjusted using the spike magnitude and spike duration for spike-count and spike-duration based learning, respectively. Time evolution of the colormap of synaptic weights, i.e., the conductance states of the 9 synapses, during e) spike-count and f) spike-duration based learning. For each type of learning, all synapses are initialized either in a high conductance state (HCS) with G_HCS = 100 nS or in a low conductance state (LCS) with G_LCS = 100 pS (also see Supplementary Videos 3 and 4). Learning of the left diagonal followed by relearning of the right diagonal when potentiation and depression are both strong for g) spike-count and h) spike-duration based learning (also see Supplementary Videos 5 and 6).

Our accomplishments can be attributed to three elements. The first is the unique photoresponse of monolayer MoS2-based phototransistors for sensing. The second is our uniquely designed MoS2-based neuromorphic circuit modules for encoding. The third is programmable, non-volatile MoS2 synapses, enabled by our local back-gate memory stack, for unsupervised and adaptive learning. Our findings highlight the potential of in-memory computing and sensing based on emerging 2D materials, devices, and circuits. Such approaches can not only overcome the bottleneck of von Neumann computing in conventional CMOS designs but also help eliminate the peripheral components necessary for competing technologies such as memristors, RRAM, and PCM. We believe that our MoS2-based, low-power, and fully integrated hardware BNN system more closely mimics biological neural networks in functionality, organization, and plasticity. It can therefore accelerate the development of hardware artificial intelligence (AI) and benefit edge computing and smart sensing for the Internet of Things (IoT). It can also offer a platform for adaptive learning and for modeling plasticity-related learning disorders of biological neural networks.
To define the back-gate island regions, the substrate (285 nm SiO2 on p++-Si) was spin-coated with a bilayer photoresist consisting of Lift-Off-Resist (LOR …
MoS2 film transfer to local back-gate islands: To fabricate the MoS2 FETs, the MOCVD-grown monolayer MoS2 film was transferred from the sapphire to the SiO2/p++-Si substrate with local back-gate islands using a PMMA (polymethyl methacrylate) assisted wet transfer process. First, the MoS2 on the sapphire substrate was spin-coated with PMMA and baked at 180 °C for 90 s. The corners of the spin-coated film were scratched using a razor blade, and the sample was immersed in 1 M NaOH solution kept at 90 °C. Capillary action draws the NaOH into the substrate/film interface, separating the PMMA/MoS2 film from the sapphire substrate. The separated film was rinsed multiple times in a water bath and transferred onto the SiO2/p++-Si substrate with local back-gate islands. It was then baked at 50 °C and 70 °C for 10 min each to remove moisture and residual PMMA, ensuring a pristine interface.