On-chip Reconfigurable Optical Neural Networks

Basic optical neural networks have been successfully demonstrated, for example, optical spiking neurosynaptic networks, photonic convolution accelerators, and nanophotonic/electronic hybrid deep neural networks. In this work, we propose a layered coherent silicon-on-insulator diffractive optical neural network whose inter-layer phase delays can be actively tuned. By forming a closed loop with control electronics, we further demonstrate that our fabricated on-chip neural network can be trained in-situ and consequently reconfigured to perform various tasks, including full adder operation and vowel recognition, while achieving almost the same accuracy as networks trained on conventional computers. Our results show that the proposed optical neural network could potentially pave the way for future optical artificial intelligence hardware. O/E, photoelectric conversion; DAC, digital-to-analog conversion; ADC, analog-to-digital conversion.

Artificial neural networks (ANNs) have found wide application, for example in circuit reconstruction 4 , scientific research 5,6 , etc. One of the main bottlenecks faced by the extensive applications of ANNs, especially in complex applications, is the speed of data processing. Traditional central processing units (CPUs) are not the best choice for implementing ANNs because their computing architecture cannot run massively parallel calculation processes efficiently. Therefore, new electronic and optical architectures for high-speed data processing have received tremendous interest in recent years. Implementing ANNs on dedicated electronic platforms 7,8 has made significant progress. In contrast, research attention in the optical domain has only begun to build up, owing to its natural capability for parallel data processing. Recent research findings in this direction have been encouraging, with demonstrations of photonic convolution accelerators 9,10 , hybrid electronic/nanophotonic processors 11 , optical spiking neurosynaptic networks 12 , and diffractive deep neural networks 13 .
Training ANNs is a critical step towards their successful deployment. Traditional electronic hardware settings use the gradient descent algorithm with error back-propagation 14 to reduce the ANNs' training time and the required resources. In optics, several methods 15,16 have been proposed to train optical neural networks (ONNs) using an optical back-propagation algorithm. However, these techniques need a range of additional optical components on the networks and a set of optical measurements, making their implementation very complicated on integrated photonic platforms. Thus, the ONNs mentioned above are mostly pre-designed via training on a conventional computer using high-level simulators with the optical components' transfer-matrix or scattering-matrix models. The physical ONNs are then mapped out using process design kits (PDKs) and verified with system-level optical simulation tools.
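The transfer-matrix pre-design mentioned above can be illustrated with a minimal sketch: a layered coherent network's overall response is the ordered product of per-layer matrices, here a fixed (unitary) diffraction matrix followed by tunable phase delays. The matrix sizes, the random unitaries standing in for the diffraction response, and all function names are illustrative assumptions, not the paper's actual models.

```python
import numpy as np

def phase_layer(phases):
    # Tunable phase delays act as a diagonal matrix of unit-modulus factors.
    return np.diag(np.exp(1j * np.asarray(phases)))

def network_matrix(mixers, phase_sets):
    # Overall transfer matrix = ordered product of (phase layer @ mixing layer).
    T = np.eye(mixers[0].shape[0], dtype=complex)
    for U, phis in zip(mixers, phase_sets):
        T = phase_layer(phis) @ U @ T
    return T

rng = np.random.default_rng(0)

def rand_unitary(n):
    # Random unitary as a stand-in for a lossless diffraction/mixing section.
    q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q

mixers = [rand_unitary(5) for _ in range(3)]
phases = [rng.uniform(0, 2 * np.pi, 5) for _ in range(3)]
T = network_matrix(mixers, phases)
```

Because each factor is unitary, the composed network conserves optical power; a high-level simulator would optimize the `phases` entries against a task-specific cost before mapping the design to a PDK.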
Although this kind of practice seems very standard in the microelectronic industry, it faces significant challenges in optical networks because they operate as analog circuits, where errors propagate and accumulate. When the ONNs have very limited reconfigurability, even small PDK inaccuracies can ruin the performance of the as-fabricated ONNs because of very tight fabrication tolerances. For example, coherent ONNs' phase errors, arising from even nanometer-scale waveguide-size deviations, can quickly accumulate and become the most performance-degrading factor 17 (see Supplementary Information Section 1 for details). Thus, to achieve optimized ONNs, it is necessary to gain tuning capability and, most preferably, in-situ training capability after fabrication.
In this work, we propose an integrated, coherent diffractive optical neural network (DONN) based on a series of cascaded multi-mode interference (MMI) couplers 18 . Based on the architecture shown in Fig. 1b, an eight-layer (one input layer, six hidden layers, and one output layer) DONN is designed and fabricated on a silicon-on-insulator substrate using electron-beam lithography (see Supplementary Information Section 1 for details). A micrographic photo of our fabricated DONN chip is shown in Fig. 1c. Coupling into and out of this chip is realized via a single cleaved fiber end and a fiber array, respectively, in a vertical grating coupling scheme. The input light is split into eight channels by cascaded 1 × 2 MMIs.
Due to the limited in-house control electronics available, the currently proposed DONN is designed with a maximum of five input channels and five output ports (see Fig. 1c).
The actual number of input/output ports involved in computation can be less, depending on the task. As shown in the schematics of our testing setup in Fig. 1d, the state of each thermo-modulator (5 for encoding input data and 30 for introducing phase biasing) is controlled independently via its corresponding digital-to-analog conversion (DAC) electronics. The optical signal from the output layer is converted to electric signals and then digitized using analog-to-digital conversion (ADC) electronics. Using in-house developed control software running on a PXI electronic instrumentation platform, closed-loop control of these thermo-modulators using feedback from the digitized output optical signal is established.
Neural networks are usually trained by minimizing a cost function (CF) representing the discrepancy between target and actual outputs. Here an adaptive moment estimation 21 (Adam) gradient descent algorithm is used to minimize the CF. This method is known for its ability to overcome noisy gradients and damp oscillations across ravines. Successful implementation of this algorithm in our closed-loop control scheme, as shown above, requires calculating the gradient of the CF, which is a function of the phase biasing of the 30 phase modulators, as accurately as possible. Here, it is realized by using a high-order finite difference method based on Lagrange interpolation 22 . A complete set of gradient information can be obtained by applying this method iteratively to all 30 phase modulators. Besides, to suppress system noise in the experiment and further improve the gradient measurement's accuracy, the measurement bandwidth is reduced by repeating the same measurement and averaging the obtained gradient values (more details are provided in Supplementary Information Section 2).
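The gradient measurement described above can be sketched as follows. The five-point central-difference stencil is one standard high-order formula derived from Lagrange interpolation; the step size, repeat count, and the smooth stand-in cost function are illustrative assumptions replacing the on-chip CF measurement.

```python
import numpy as np

def five_point_gradient(cost, phases, h=0.01, repeats=4):
    """Estimate dC/dphi for every phase modulator with a five-point
    central-difference stencil (fourth-order accurate), averaging
    repeated evaluations to suppress measurement noise."""
    grad = np.zeros_like(phases)
    for i in range(len(phases)):
        def c(delta):
            p = phases.copy()
            p[i] += delta          # perturb one modulator at a time
            return cost(p)
        samples = []
        for _ in range(repeats):
            # f'(x) ~ [f(x-2h) - 8 f(x-h) + 8 f(x+h) - f(x+2h)] / (12 h)
            samples.append((c(-2*h) - 8*c(-h) + 8*c(h) - c(2*h)) / (12*h))
        grad[i] = np.mean(samples)
    return grad

# Stand-in for the measured CF (the real one comes from the ADC readout).
cost = lambda p: float(np.sum(np.sin(p) ** 2))
p0 = np.array([0.3, 1.1, 2.0])
g = five_point_gradient(cost, p0)   # analytic gradient is sin(2*p0)
```

In the experiment each `cost` call corresponds to one forward propagation through the chip, so one gradient estimate for all 30 modulators costs 4 × 30 measurements per repeat.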

Experiment
One-bit full adder. An arithmetic logic unit (ALU) is the basic building block of digital computers 23 . As a vital component of the ALU, a one-bit full adder, as shown in Fig. 2a, is built from the propagate and generate signals P and G, which equal the sum and carry outputs of a half adder; Cout is the output carry signal, and S is the output sum. For demonstration purposes and simplicity, the full adder is implemented in two consecutive steps as a proof of concept on the DONN. The proposed DONN is first trained to calculate P and G using inputs A and B, which is labeled as Step A (Fig. 2a). Then, the DONN is trained and reconfigured to calculate S and Cout using P, G, and the carry-in Cn−1 as the input, which is labeled as Step B (Fig. 2a). The cost function for both steps is defined as the mean-squared error between the normalized output and the target. The input bits are encoded in the optical phase domain, where 0 phase represents input digit '0' and π phase represents input digit '1'. The top two output ports are selected as the neural network's prediction.
A full adder is successfully implemented using the two-step approach outlined above on the proposed DONN architecture in a closed-loop configuration (Fig. 1d). The complete set of outputs for all possible input combinations is plotted in Fig. 2b. Here, the output digit '0' (output digit '1') corresponds to a normalized low (high) output power. The two output digits can be easily discriminated from each other because there is a significant margin between them (the gray region in Fig. 2b). Due to phase errors, the as-fabricated DONN fails to work correctly and gives false results (see Supplementary Information Section 3 for details). Therefore, closed-loop in-situ training is used to reconfigure the DONN to a correct state.
The corresponding experimental results after training (red and blue dots in Fig. 2b) agree precisely with the truth tables of a full adder (red and blue crosses in Fig. 2b) (see Supplementary Information Section 3 for details). Notice that Step B's input still uses 0 phase or π phase, instead of the direct output from Step A, to represent input digits '0' and '1'. Such substitution would be practically viable using an electronic buffer layer between the two steps if each of them were implemented on a separate on-chip DONN.
Vowel recognition. To demonstrate the reconfigurability and error tolerance, a more complex task is investigated: training the proposed DONN as a vowel classifier.
Given that at most five inputs can be driven simultaneously in our testing setup, five vowel features are used to classify four different kinds of vowels, randomly selected from the complete set of eleven vowels. Our experiment uses a frequently used data set 25 containing data from 15 different people; each person has six frames of speech for each of the eleven vowels. Five input features of each vowel, corresponding to the five available data input channels, are randomly selected.

Conclusion
In this work, a cascaded-MMI-based DONN, in which the data are manipulated in the phase domain, is proposed and demonstrated experimentally. This DONN is capable of in-situ training and can thus be reconfigured to perform different tasks. Our findings show that its performance is very promising in both full-adder operation and vowel recognition, with training results comparable to those of its numerical counterpart trained on a computer. Our preliminary demonstration strongly suggests that the DONN has great potential as a future computing architecture.
A rough estimate shows that, as the DONN scale increases, the computational advantage of in-situ DONN training starts to emerge compared with its electronic counterpart (see Supplementary Information Section 5 for details). With the rapid development of high-speed control electronics, especially application-specific integrated circuits (ASICs), high-speed and energy-efficient DONN systems will be within reach soon.

Methods
Training procedure. We apply the following steps to train the DONN on chip: 1. For digital data, use 0 phase to denote digit '0' and π phase to denote digit '1'; for analog data, normalize the input data and rescale them to the range [0, 2π]. 2. Encode the input data according to their phase representation by applying the corresponding voltages to the phase shifters on the input channels. 3. Initialize the trainable variables, i.e., the phases of all the phase modulators in the hidden layers. 4. In each iteration, implement five consecutive forward propagations with the parameters θ0, θ0 ± h, θ0 ± 2h, where θ0 is the current phase state and h is a small phase variation, and calculate the gradient of the variables using the Lagrange interpolation method. 5. Update the trainable variables with the Adam method. 6. Repeat steps 4 and 5 until the CF converges to a preset value.
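Steps 4 to 6 above amount to a measured-gradient Adam loop. The sketch below shows that loop in isolation; the learning rate, iteration budget, convergence threshold, and the quadratic stand-in cost (with an analytic gradient in place of the on-chip five-point measurement) are all illustrative assumptions.

```python
import numpy as np

def adam_train(cost, grad_fn, theta, lr=0.1, steps=500,
               b1=0.9, b2=0.999, eps=1e-8, tol=1e-4):
    """Repeat: measure gradient, apply one Adam update (steps 4-5),
    until the CF converges below a preset value (step 6)."""
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad_fn(theta)                     # step 4: gradient measurement
        m = b1 * m + (1 - b1) * g              # first-moment estimate
        v = b2 * v + (1 - b2) * g**2           # second-moment estimate
        m_hat = m / (1 - b1**t)                # bias corrections
        v_hat = v / (1 - b2**t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # step 5
        if cost(theta) < tol:                  # step 6: convergence check
            break
    return theta

# Quadratic stand-in CF around a hypothetical "correct" phase setting.
target = np.array([0.5, 1.2, 2.4])
cost = lambda th: float(np.sum((th - target) ** 2))
grad = lambda th: 2.0 * (th - target)
theta = adam_train(cost, grad, np.zeros(3))
```

In the actual experiment, `grad_fn` would be the averaged five-point finite-difference measurement applied to all 30 phase modulators, and `cost` a single closed-loop CF readout.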