High-speed single perceptron for optical neural networks based on microcombs


 Optical artificial neural networks (ONNs) — analog computing hardware tailored for machine learning [1, 2] — have significant potential for ultra-high computing speed and energy efficiency [3]. We propose a new approach to architectures for ONNs based on integrated Kerr micro-comb sources [4] that is programmable, highly scalable and capable of reaching ultra-high speeds. We experimentally demonstrate the building block of the ONN — a single neuron perceptron — by mapping synapses onto 49 wavelengths of a micro-comb to achieve a high single-unit throughput of 11.9 Giga-FLOPS at 8 bits per FLOP, corresponding to 95.2 Gbps. We test the perceptron on simple standard benchmark datasets — handwritten-digit recognition and cancer-cell detection — achieving over 90% and 85% accuracy, respectively. This performance is a direct result of the record small wavelength spacing (49 GHz) for a coherent integrated microcomb source, which results in an unprecedented number of wavelengths for neuromorphic optics. Finally, we propose an approach to scaling the perceptron to a deep learning network using the same single micro-comb device and standard off-the-shelf telecommunications technology, for high-throughput operation involving full matrix multiplication for applications such as real-time massive data processing for unmanned vehicle and aircraft tracking.

Introduction

…as well as to predicting benign/malignant cancer classes using a feature set extracted from microscopy images of biopsied tissue, achieving > 85% accuracy.
Finally, we show how this approach can be readily scaled, using the same single micro-comb chip source, to form ultrahigh-speed deep neural networks using standard off-the-shelf telecommunications tools. Scaling to multiple levels offers the full potential of wavelength multiplexing for speed enhancement together with the deep learning network structure. Both the perceptron and the deep learning ONN are dynamically trainable and fully compatible with state-of-the-art electrical interfaces, making them highly promising for next-generation real-time massive data processing.
Photonic Single Perceptron

Figure 1 shows the mathematical model of the single-neuron perceptron [18], while Fig. 2 shows the detailed experimental configuration that we use, based on an integrated optical micro-comb source. The perceptron uses simultaneous time and wavelength multiplexing based on 49 wavelengths from the microcomb source, each wavelength forming a single synapse. Its core function is a matrix multiplication (for a single perceptron, reducing to a vector dot product) between the input electronic data of the image to be analysed and the synaptic weights, which are implemented in a multiple-step approach in the optical domain. The raw input data for classification is a 28×28 matrix of electronic digital grey-scale values with 8-bit intensity resolution. We first resample this digitally (effectively performing digital down-sampling) into a 7×7 matrix, which is then rearranged into a 1D vector: X = [x(1), x(2), …, x(49)]. This vector is then sequentially multiplexed in the time domain via a high-speed electrical digital-to-analog converter at a data rate of 11.9 Giga-baud (symbols per second), where each symbol corresponds to an 8-bit pixel of the input data and occupies one timeslot of length τ = 84 ps, so that the entire waveform duration is Nτ = 4.12 ns (N = 49). In traditional digital approaches, the input nodes to the neural network generally reside in electronic memories and are routed via their memory addresses. In contrast, for our ONN the input nodes are defined by temporally multiplexed symbols that can be routed according to their temporal location.
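The pre-processing step above can be sketched numerically. The paper does not specify the resampling filter, so a 4×4 block average is assumed here for the 28×28 → 7×7 down-sampling; the timeslot length and vector size are taken from the text.

```python
import numpy as np

TAU = 84e-12   # timeslot length (s), i.e. 11.9 Giga-baud
N = 49         # input nodes = comb lines

def preprocess(image_28x28):
    """Down-sample a 28x28 grey-scale image to 7x7 and flatten into the
    49-element input vector X. The resampling filter is not specified in
    the text; 4x4 block averaging is assumed here."""
    img = np.asarray(image_28x28, dtype=float).reshape(28, 28)
    small = img.reshape(7, 4, 7, 4).mean(axis=(1, 3))  # 4x4 block average
    return small.flatten()  # X = [x(1), x(2), ..., x(49)]

X = preprocess(np.arange(28 * 28) % 256)  # dummy 8-bit image
frame_duration = N * TAU                  # 49 x 84 ps = 4.12 ns, as in the text
```

Each such 49-symbol frame is what the digital-to-analog converter serializes into the input waveform.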
Next, the electronic time-division multiplexed input waveform is multicast onto all 49 wavelength channels (i.e., equal to the number of components of the X vector) from the micro-comb source via an electro-optic modulator, such that each wavelength carries an identical replica of the temporal data waveform X. The optical power of each comb line is then weighted with an optical spectral shaper (Waveshaper) according to the trained synaptic weight vector W = [w(1), w(2), …, w(49)], which therefore effectively multiplexes the synaptic weights in the wavelength domain. Assuming X and W are both 49×1 column vectors, the resulting weighted replicas of the input X then become

$$\mathbf{R} = \mathbf{W}\mathbf{X}^{T} = \begin{bmatrix} w(1)x(1) & \cdots & w(1)x(N) \\ \vdots & \ddots & \vdots \\ w(N)x(1) & \cdots & w(N)x(N) \end{bmatrix}, \quad (1)$$

where the n-th row (n ∈ [1, N]) corresponds to the weighted temporal waveform replica at the n-th wavelength channel. Hence, the diagonal elements denote the N weighted input nodes, i.e., the n-th weighted input node is represented by the 8-bit symbol w(n)·x(n) residing in the n-th timeslot of the n-th wavelength channel.
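The weighted-replica matrix described above is simply the outer product of the weight and input vectors; a minimal numerical sketch (with random illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 49
X = rng.random(N)   # input symbols, one per timeslot
W = rng.random(N)   # synaptic weights, one per wavelength

# Wavelength channel n carries the full waveform X scaled by w(n),
# so the weighted-replica matrix is R[n, m] = w(n) * x(m).
R = np.outer(W, X)

# Its diagonal holds the N weighted input nodes w(n) * x(n).
diag_nodes = np.diag(R)
```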
The replicas then pass through a dispersive element providing second-order dispersion that progressively delays the weighted replicas so as to line up all of the diagonal elements into the same timeslot, with the delay step satisfying τ = delay(λ_k) − delay(λ_{k+1}). Thus, the dispersive element serves as a time-of-flight addressable memory that aligns the sequentially weighted temporal symbols across the wavelength channels, so that a single timeslot contains

$$[\,w(1)\,x(1),\ w(2)\,x(2),\ \ldots,\ w(49)\,x(49)\,]^{T}. \quad (2)$$

While for the single perceptron demonstrated here (single layer, single neuron) this process does not increase the speed of the network, since only the diagonal elements are used, dramatic increases in speed can be realized by scaling to deep networks through simultaneous time, wavelength and spatial multiplexing (see Sect. 5).
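The alignment step can be modelled as a per-channel shift of the timeslot grid. A small sketch (an illustrative 5-channel case), assuming channel n is delayed by (N−1−n)·τ so every diagonal element lands in the central slot:

```python
import numpy as np

N = 5                                    # small illustrative channel count
W = np.arange(1, N + 1, dtype=float)
X = np.arange(10, 10 * (N + 1), 10, dtype=float)
R = np.outer(W, X)                       # weighted replicas, R[n, m] = w(n)*x(m)

T = 2 * N - 1                            # delayed frames span 2N-1 timeslots
aligned = np.zeros((N, T))
for n in range(N):
    d = N - 1 - n                        # dispersive delay of channel n, in slots
    aligned[n, d:d + N] = R[n]

central = aligned[:, N - 1]              # all diagonal elements share this slot
```

Summing `central` is exactly what the photodetector does optically in the next step.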
Finally, the optical intensity of the aligned timeslots is summed by photodetection (high-speed, with enough bandwidth to resolve the individual timeslots of width τ) and sampling, to finally yield the matrix multiplication result (in this case a vector dot product) of the neuron, given by

$$y = \sum_{n=1}^{N} w(n)\,x(n) = \mathbf{W}^{T}\mathbf{X}. \quad (3)$$

After this matrix multiplication, the weighted and summed output is then biased and mapped onto a desired range through a nonlinear sigmoid function (achieved offline with digital electronics in this initial demonstration), yielding the neuron (single-neuron perceptron) output. Finally, the prediction of the input data's category is generated by comparing the neuron output with the decision boundary: a hyper-plane in a 49-dimensional space, found during the learning process (in this case achieved offline digitally), that separates the two input categories.
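Putting the dot product, bias and sigmoid together, the neuron's end-to-end function reduces to the following sketch (the weight and bias values here are illustrative only, not trained values from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(X, W, b):
    """Single-neuron perceptron: the dot product W.X is performed
    optically (summed by the photodetector); bias and sigmoid were
    applied offline electronically in the demonstration."""
    return sigmoid(np.dot(W, X) + b)

X = np.full(49, 0.5)       # illustrative input vector
W = np.full(49, 0.02)      # illustrative weights
y = neuron(X, W, b=-0.4)
label = int(y > 0.5)       # compare with the decision boundary
```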

Soliton Crystal Microcomb
The key to our approach lies in the use of an integrated optical micro-comb source [19][20][21]. Micro-combs have enabled many fundamental breakthroughs through their ability to generate optical signals with the same precision as microwave and RF signals, yet at frequencies of hundreds of terahertz, for optical frequency synthesis [22], ultrahigh-capacity communications [23], complex quantum state generation [24], advanced microwave signal processing [25], and more. They offer the full power of optical frequency combs [26] but in an integrated platform with much smaller footprint and higher scalability, performance, and reliability [27][28][29][30][31][32][33][34][35].
The microcomb we employ here operates in a unique coherent state termed a "soliton crystal", which originates from optical parametric oscillation in an on-chip micro-ring resonator (MRR). Soliton crystals are a unique and powerful class of soliton microcomb featuring deterministic formation, originating from a mode-crossing-induced background wave driven by the Kerr nonlinearity together with the high intra-cavity power. Because the intracavity energy of the soliton crystal state is almost identical to that of the chaotic state from which it originates, there is no significant change in intracavity energy when the solitons are generated and, in turn, no resulting self-induced shift that requires complex tuning methods as, e.g., for DKS solitons [27]. This results in simple and reliable initiation via adiabatic pump-wavelength sweeping [36], as well as much higher energy efficiency (the ratio of optical power in the comb lines relative to the pump power) [37]. Soliton crystals are thus a very promising category of optical frequency comb for wavelength-multiplexing-based systems, including microwave and RF photonic processors [25,[38][39][40][41][42][43][44][45][46][47][48][49][50][51][52][53] as well as the ONN reported here.
The MRR used here was fabricated in a CMOS-compatible doped silica glass platform [20], with a Q factor of ~ 1.5 million and a radius of ~ 592 µm, corresponding to an FSR of ~ 0.4 nm, or 48.9 GHz. This is a record low FSR for any coherent integrated microcomb source and is a critical feature of this work, since it provides a large number of available wavelengths over the telecommunications C-band. The chip was coupled to a fibre array, featuring a fibre-chip coupling loss of only 0.5 dB per facet brought about by integrated mode converters. The cross-section of the waveguide was designed to be 3 µm × 2 µm, which yielded anomalous dispersion in the C band as well as a unique mode crossing observed at ~ 1552 nm.
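The quoted radius and FSR can be cross-checked from the cavity round-trip time. The group index of the doped-silica waveguide is not given in the text; a value of ~1.65 is assumed here for illustration.

```python
import math

c = 299_792_458.0      # speed of light (m/s)
radius = 592e-6        # MRR radius (m), from the text
n_g = 1.65             # group index of the doped-silica waveguide (assumed)

round_trip = 2 * math.pi * radius   # cavity length (m)
fsr = c / (n_g * round_trip)        # free spectral range (Hz), near 49 GHz
```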
To generate coherent micro-combs, a CW pump laser was employed, with the power amplified to 30 dBm by an optical amplifier. The pump wavelength was then manually swept from blue to red. When the detuning between the pump wavelength and the MRR's cold resonance became small enough that the intra-cavity field reached a threshold value, modulation-instability-driven oscillation was initiated. As the detuning was changed further, distinctive 'fingerprint' optical spectra (Fig. 3) were observed that are a signature of soliton crystals [36,37], arising from spectral interference between the tightly packed solitons circulating along the ring cavity.

Experimental Results
We experimentally demonstrated the building block of the network: a single-layer, single-neuron photonic perceptron (Fig. 4), which is suitable for binary classification problems. Problems with more classes can be addressed using more than one neuron, even with only a single-layer (non-deep) ONN system. This can easily be achieved by sub-dividing the comb into wavelength groups that each define a neuron (see Sect. 5). We first tested the perceptron on several pairs of handwritten digits (Figs. 5 and 6), using 500 figures for each digit; of these 1000 figures, 920 were randomly selected for offline pre-training, leaving the remaining 80 for experimental testing.
The 2D handwritten-digit figures were pre-processed electronically using a down-sampling method to reduce the image size from 28×28 to 7×7, followed by transformation into a one-dimensional array of 49 symbols. This was then time multiplexed with ~ 84 ps timeslots for each symbol (Fig. 5b), equating to a modulation speed of 11.9 Giga-baud.
As discussed above, the optical power of the 49 microcomb lines was shaped according to the pre-learned synaptic weights (Fig. 6a) to boost the parallelism and establish the synapses of the neuron. The input data stream was then multicast onto all 49 shaped comb lines, followed by a progressive (linear with wavelength) delay using ~ 13 km of standard single-mode fibre (SMF), which served as the time-of-flight optical buffer via its second-order dispersion (~ 17 ps/nm/km). Hence, the weighted symbols on the different wavelength channels were aligned temporally, allowing them to be summed together via photodetection and sampling of the central timeslot, generating the result of the multiply-and-accumulate (MAC) operation. The output was then compared with the decision boundary obtained from the learning process, which yielded the final ONN prediction (Fig. 6b).
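The fibre parameters quoted above can be checked against the required 84 ps delay step, and against the ~64 µs latency quoted later in the text. The fibre group index is not stated; the standard SMF value of ~1.468 is assumed here.

```python
D = 17.0            # SMF second-order dispersion (ps/nm/km), from the text
L_km = 13.0         # fibre length (km), approximate value from the text
d_lambda = 0.4      # channel spacing (nm), i.e. the 49 GHz FSR

# Inter-channel delay step from the fibre's dispersion. With these rounded
# values it comes out near the 84 ps timeslot; the actual length is chosen
# so that the step matches tau.
delay_step_ps = D * L_km * d_lambda

# Time-of-flight latency of the spool, assuming a group index of 1.468.
n_fibre = 1.468
latency_s = L_km * 1e3 * n_fibre / 2.998e8   # of order 64 microseconds
```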
We evaluated the performance of the optical perceptron on two standard benchmark classification cases (Figs. 6, 7): handwritten digits and cancer cells. In the first case, two categories of handwritten digits (0 and 6) were distinguished by the decision boundary. Our device achieved an accuracy (ACC) of 93.75%, compared to 98.75% for the results calculated on a digital computer (see Fig. 6d). Despite these being rudimentary benchmark tests, the perceptron nevertheless achieved a very high success rate and, most importantly, at unprecedented speeds (see below). This was a result of the large number of synapses (optical wavelengths over the C-band), in turn enabled by the record low FSR of the soliton crystal micro-comb.
We also determined the classification of cancer cells from tissue-biopsy data (Fig. 6e). Individual cell nuclei, from breast-mass tissue extracted by fine-needle aspirate and imaged under a microscope, have previously been characterized in terms of 30 features such as radius, texture, perimeter, etc. In our analysis, data for 521 cell nuclei were employed for pre-training, with another 75 used for experimental diagnosis, following a similar procedure to the handwritten-digit test above. We achieved an accuracy of 86.67%, compared to 98.67% for the results calculated on a digital computer.
There is currently no commonly accepted standard that establishes benchmarks for classifying and quantifying the computing speed and processing power of the widely varying types of ONNs that have been reported. Therefore, we explicitly outline the performance definitions that we use for throughput and latency in characterizing our ONN, following the approach Intel has used to evaluate digital micro-processors [54]. Considering that in our system the input data and weight vectors for the MAC calculation originate from different paths and are interleaved in different dimensions (time, wavelength), we use the temporal sequence at the electrical output port to define the throughput. According to the broadcast-and-delay protocol, each computing cycle of matrix multiplication between the 49-symbol data and weight vectors generates an output temporal sequence with a length of 48 + 1 + 48 symbols, and thus a total time duration of 84 ps × 97. Since the 49th symbol corresponds to the desired matrix multiplication output resulting from 49 multiply-and-accumulate operations, the throughput of our ONN is given as (49×2)/(84 ps × 97) = 11.9 Giga-FLOPS.
The input data stream consisted of symbols with 8-bit (256 discrete levels) values, determined by both the original grey-scale values of the image pixels and the intensity resolution of our electronic arbitrary waveform generator. The optical spectral shaper (Waveshaper) featured an attenuation control range of 35 dB, which could support up to 11-bit resolution (10·log10(2^11) ≈ 33 dB). As such, each computing cycle also corresponded to an equivalent throughput of (49×2)×8/(84 ps × 97) = 95.2 Gbps in terms of bit rate. For analog systems such as the one used here, the bit rate/intensity resolution is limited by the signal-to-noise ratio of the system. Hence, to achieve 8-bit resolution, the system would have to feature a signal-to-noise ratio of > 20·log10(2^8) ≈ 48 dB in electrical power, or 24 dB in optical power. This is well within the capability of analog microwave photonic links, including the ONN system reported here (with OSNR > 28 dB).
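The throughput, bit-rate and resolution figures above follow from short calculations, reproduced here with the rounded values quoted in the text (so the results land near, not exactly on, the quoted numbers):

```python
import math

tau = 84e-12        # timeslot length (s), rounded
N = 49              # vector length
seq = 2 * N - 1     # 48 + 1 + 48 = 97 output timeslots per computing cycle

# Multiply + accumulate = 2 FLOPs per symbol; ~12e9 FLOP/s with these
# rounded values (quoted as 11.9 Giga-FLOPS in the text).
flops = 2 * N / (tau * seq)
bit_rate = flops * 8                       # ~95 Gbps at 8 bits per FLOP

ws_bits = 35 / (10 * math.log10(2))        # 35 dB range -> ~11.6 bits
snr_8bit_db = 20 * math.log10(2 ** 8)      # ~48 dB electrical SNR for 8 bits
```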
Our results represent the fastest throughput (in bit rate) claimed so far for any ONN, although a direct comparison of the widely varying systems is difficult. For example, while systems that use CW sources to perform single-shot measurements [4,10,17] may have a low latency, they suffer from very low throughput since the input data cannot be updated rapidly. While the latency of our single perceptron is relatively high (~ 64 µs) due to the fibre spool, this does not affect the throughput of our system. In any event, the latency can be readily reduced to < 200 ps through the use of compact devices to implement the delay function: devices with high group-velocity dispersion and much lower overall time delay, such as photonic crystal waveguides or sampled Bragg gratings (in fibre or on-chip) [55], for example. Finally, although we implemented the nonlinear function digitally offline, which did not impact the predictions, this could also be done with electro-optical modulators or electrical amplifiers operating at the saturation point.

Scaling to Deep ONNs
The single-neuron perceptron can be readily scaled, using many different approaches, to multi-layer deep ONNs using only the same single micro-comb source together with standard off-the-shelf telecommunications technologies. Deep neural networks can achieve much more complex tasks than the single perceptron demonstrated here, and at much higher speeds. Here, we outline in detail one possible example of a scaled deep learning network (Fig. 7). It consists of an input layer (serving as an interface between the raw input data and the neural network), multiple hidden layers (each containing multiple neurons) and an output layer. The deep ONN also uses wavelength-division multiplexing to establish the synapses but, in contrast with the single perceptron, makes full use of time, wavelength and spatial multiplexing, with all layers' synapses being established from the same single soliton crystal source and using the same single WaveShaper device. The microcomb is replicated and spatially multiplexed into the multiple hidden layers, with each layer (and each synapse, or wavelength, within each layer) being uniquely weighted. The simultaneous power splitting and spectral shaping can be achieved with a single commercially available Waveshaper. At each layer, the comb is further divided spectrally into M(k) groups, where M(k) is the number of neurons and k is the layer number, with each group defining one neuron. Because the neurons are defined by their wavelength sub-comb, rather than physically, in effect they are "virtual", as are the synapses. Each layer has an electrical input port to receive the electrical output of the previous layer and an electrical output port to pass its result on; the temporal input waveform from the previous layer (with symbol duration τ) is multicast onto the layer's weighted comb lines. Following this, the WDM waveforms for each neuron are progressively delayed by a dispersive device.
In contrast with the single perceptron, where all timeslots across the full comb need to be aligned to a single slot (since there is only one neuron), here only the wavelengths within each individual neuron need to be aligned. One of the most elegant methods to achieve this would be through the use of chirped sampled Bragg gratings. Each segment of the grating individually serves as the buffer for the wavelengths associated with one neuron, with the delay between wavelengths matching the symbol duration of the input electrical waveform, both equal to τ. The sampled Bragg grating is not only capable of imposing segmented delays on many wavelengths simultaneously, but does so without any significant overall delay, or latency. The delayed replicas of each neuron are then demultiplexed in wavelength and summed separately via photo-detection. Since the network uses spatial multiplexing to address the different hidden layers, it requires multiple delay components (e.g., chirped sampled Bragg gratings), equal in number to the hidden layers. We note that since different layers can have different numbers of neurons, the grating structure would also be layer dependent: the number of segments must equal the number of neurons, while the bandwidth of each segment depends on the number of synapses.
The last stage consists of digital signal processing of the electrical waveforms generated by the previous level of neurons, sampling the central summing slot to obtain the matrix multiplication result of each neuron. This is followed by imposing a nonlinear function that rescales the weighted sum, and finally by retiming and digital-to-analog (D/A) conversion to generate the final output of the layer, with a time duration of T_out = M(k)·τ and a modulation rate equal to that of the electrical input waveform. Note that while the digital signal processing adds to the overall latency, it does not affect the net throughput. The resampling needed to preserve the input data rate of each layer can easily be achieved with high-speed electronic circuits (such as field-programmable gate arrays (FPGAs)) or potentially even using optical approaches [56].
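Functionally, each layer described above is a standard dense forward pass: the matrix multiplication W·x happens optically, while bias, nonlinearity and retiming are electronic. A minimal sketch, with illustrative layer sizes M(k) and a sigmoid nonlinearity assumed:

```python
import numpy as np

def layer(x, W, b):
    """One layer of the proposed deep ONN: W @ x is performed optically via
    wavelength/time multiplexing; bias, sigmoid and retiming are performed
    electronically (e.g. in an FPGA)."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

rng = np.random.default_rng(1)
x = rng.random(49)              # N-element input vector
sizes = [49, 10, 10, 2]         # M(k) neurons per layer (illustrative)
for m_in, m_out in zip(sizes[:-1], sizes[1:]):
    Wk = rng.random((m_out, m_in)) - 0.5   # per-layer comb weighting
    x = layer(x, Wk, np.zeros(m_out))
output = x                      # final-layer activations
```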
After the sequence processing by the different layers, the ONN then predicts the class of the raw input data as before by comparing with multi-dimensional hyperspace decision boundaries determined through prior training. This overall network structure results in a series of wavelength, time and spatially multiplexed signals that dramatically boosts the network scale to multiple hidden layers each having multiple neurons, operating at ultra-high speed, and yet within a compact footprint. The potential throughput of the deep ONN can easily reach the TeraFLOP/s regime, and be capable of solving much more complex tasks than the ones achieved by the single perceptron demonstrated here.
There is strong potential for substantially higher levels of integration, ultimately towards fully monolithic embodiments of our ONN [ -115], and the approach will likely also benefit greatly from the integration of novel 2D materials.

Conclusions
We propose a novel and powerful approach to optical neural networks based on integrated optical Kerr micro-comb sources. We demonstrate the key building block, a single-layer, single-neuron perceptron, operating at a record single-unit throughput of 11.9 Giga-FLOPS or 95.2 Gbps. We successfully perform standard benchmark real-life tasks, including the recognition of handwritten digits and the diagnosis of cancer cells. We also propose a specific architecture to realize a deep learning ONN with greatly enhanced throughput and processing power, enabled by the high degree of parallelism achieved through simultaneous wavelength, time and spatial multiplexing.

Declarations
Competing interests: The authors declare no competing interests.

Figure 1
Mathematical model of the perceptron. The perceptron featured 49 input nodes X = [x(1), x(2), …, x(49)], which were connected to the neuron with 49 reconfigurable weights W = [w(1), w(2), …, w(49)]. After the matrix multiplication, the input data X was weighted and summed, then added to a bias b and passed through a nonlinear sigmoid function to generate the output y. The output y was compared with the desired output d to generate an error signal err used to adjust the weights. The training was performed offline.
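The error-driven weight update in Figure 1 can be sketched as a small offline training loop. The paper does not specify the training algorithm, so a logistic-regression-style gradient update on the sigmoid output is assumed here, with toy separable data in place of the real datasets:

```python
import numpy as np

def train_perceptron(Xs, ds, lr=0.1, epochs=200):
    """Offline training matching Fig. 1: y = sigmoid(W.X + b), err = d - y,
    and the weights are nudged along err (a logistic-regression update is
    assumed; the text does not specify the algorithm)."""
    rng = np.random.default_rng(0)
    W = rng.normal(0.0, 0.1, Xs.shape[1])
    b = 0.0
    for _ in range(epochs):
        for X, d in zip(Xs, ds):
            y = 1.0 / (1.0 + np.exp(-(W @ X + b)))
            err = d - y
            W += lr * err * X
            b += lr * err
    return W, b

# Toy separable data: the class is decided by the mean pixel value.
rng = np.random.default_rng(1)
Xs = rng.random((200, 49))
ds = (Xs.mean(axis=1) > 0.5).astype(float)
W, b = train_perceptron(Xs, ds)
preds = (1.0 / (1.0 + np.exp(-(Xs @ W + b))) > 0.5).astype(float)
acc = (preds == ds).mean()
```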

Figure 2
Operation principle of the perceptron or single photonic neuron. PD: photodetector.

Figure 3
Schematic of soliton crystal microcombs and the generated optical spectrum. The soliton crystal is generated in a 4-port integrated micro-ring resonator (MRR) with an FSR of 49 GHz.

Figure 5
Time-domain multiplexed input layer of handwritten digits 0 and 6. a, Preprocessing flow of the handwritten-digit test. Each handwritten-digit figure was a 28×28 array of grey-scale pixels. To match the number of input nodes (49 in our case), the figures were resampled to 7×7 pixels. The grey-scale data was then rearranged into a one-dimensional array. Negative neuron connections were achieved by multiplying the data stream with the symbols of the pre-trained weights. b, Generated 11.9 Giga-baud data stream for the 80 encoded handwritten-digit figures, showing 49-symbol encoded data for each figure and 3 symbols padded for post-measurement, including a trigger symbol to trigger the oscilloscope, a reference symbol to calibrate the reference level, and a bias symbol encoded with the pre-trained bias to locate the decision boundary.

Figure 7
Designed deep optical neural network based on micro-combs. a, Schematic of the full multilayer ONN. The shaded region indicates the scaled part of the designed deep neural network. The full network is composed of the input layer, L hidden layers (L = 1, 2, 3, …) with the kth layer containing M(k) neurons (M(k) an integer), and an output layer constituted by M(L+1) neurons. The raw input data stream contains multiple equal-size 2D data samples, each first converted into a 1D vector of length N and then sequentially multiplexed into a temporal waveform via electrical digital-to-analog conversion. b, Detailed schematic of layer k, illustrating how multiple neurons within each layer are implemented.