Tuning weight values by resolving imbalances at nodes in an augmented artificial neural network

Tuning the weight values in an artificial neural network for a computational function is essential for artificial intelligence. This letter proposes tuning based on a mathematical model that conceptually extends the artificial neural network by allowing an imbalance between the sum of the incoming values to a node and the outgoing values from the node in the network. Unlike gradient-descent-based tuning, which uses repeated updates to minimise the difference between the required output values and the computed output values of an artificial neural network, the proposed tuning resolves the imbalance at each node in the network. The proposed tuning exhibits performance similar to that of existing stochastic-gradient-descent-based tuning, yet it does not need to explore tuning parameters to obtain the optimal weight values. These benefits of the proposed tuning method could accelerate the advancement of artificial intelligence.


Introduction
Artificial intelligence (AI) utilises a mathematical model called an artificial neural network (ANN) 1-3 based on studies of the behaviour of biological neural networks 4-6. Analogous to neuronal dynamics in a biological neural network, the weight values (WVs) of the artificial neurons (ANs) in an ANN determine the ANN's output values for given input values 4-6. To design appropriate computational functions 7-11 with an ANN, tuning the WVs is essential.
In previous works, tuning ANNs was based on repeated updates of the WVs to minimise the difference between the target output values and the ANN's computed output values 12-15; this difference is defined as the error in the tuning. Such tuning requires careful choices of the tuning rate of the WVs against the error 16 and of the momentum of the WVs' change between updates 13 to prevent the vanishing gradient problem 17,18 and local optima 3. This complexity stems from the scheme itself, in which the tuning depends on feedback from the error 3,17,18. Therefore, tuning that is free from feedback from the outputs is worth investigating.
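For concreteness, a minimal Python sketch of such a feedback-based update (gradient descent with momentum) follows; the tuning rate and momentum coefficient are illustrative values, not ones taken from the cited works.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    # One conventional update: the WV change follows the gradient of the
    # error (feedback from the network's output) plus a momentum term that
    # carries over the previous change between updates.
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

w, v = np.zeros(3), np.zeros(3)
w, v = sgd_momentum_step(w, grad=np.array([0.1, -0.2, 0.05]), velocity=v)
```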
This letter reports an investigation of tuning under an augmented artificial neural network (AANN), which conceptually extends the ANN by allowing an imbalance between the sum of the incoming values and the outgoing value at each node in the ANN. The mapping between ũ and ỹ can be implemented by using a bipolar junction transistor (BJT) 21 and an operational amplifier (opamp) 19, as shown in the inset of Fig. 1b. At a constant current, IB, to the base of the BJT (not shown), the mapping between them has an S-shaped curve, as shown in Fig. 1d. Here, the opamp has a so-called voltage-follower configuration, which results in y = y′. Therefore, ũ is divided into ỹ and ũ − ỹ if ũ ≫ IB, depending on the resistance of the resistor and that of the BJT. The nonlinearity in the I-V characteristics of the BJT (not shown) causes the S-shaped curve. Note that the S-shaped activation function is known to be related to the vanishing gradient problem 17,18 in gradient-descent-based tunings. However, AANN-based tuning gives the optimal WVs even if the S-shaped activation function is adopted, since the AANN does not depend on the gradient of the network's output with respect to the WVs.
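The imbalance that the AANN makes explicit can be sketched as follows, assuming a node that receives weighted incoming values and emits a single outgoing value; the numbers are illustrative only.

```python
import numpy as np

def node_imbalance(w, u, y):
    # In an ordinary ANN the weighted incoming sum and the outgoing value
    # are forced to be equal; the AANN allows them to differ, and this
    # difference is the imbalance to be resolved by the tuning.
    return np.dot(w, u) - y

w = np.array([0.5, -0.2, 0.8])   # weight values
u = np.array([1.0, 2.0, 0.5])    # incoming values
print(node_imbalance(w, u, 0.3)) # nonzero until the node is balanced
```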

Tuning by using an augmented artificial neural network
Second, the details of tuning the weight values in the AAN circuit are explained. Figure 2a shows the detailed circuit diagram of the AAN. The diagram has a configuration equivalent to that of Fig. 1b. Here, voltage-controlled current sources (VCCSs) are prepared for each incoming current ũi. A VCCS is controlled by an analogue signal processing unit (ASPU). The incoming currents and the node voltages drive the ASPU. In the ASPU, ũi is converted to a voltage; for simplicity, this voltage is also denoted as ũi. The conversion is performed by sensing the voltage drop across the resistance, r, with a differential amplifier 19. The amplifier also works as a buffer amplifier 19, in which the input (output) impedance is high (low). Because of the high input impedance, ũi can be sensed by the amplifier. Simultaneously, the time differential of Vn, dVn/dt, is sensed by the opamp with the capacitor and resistance. As shown in Fig. 1c, the time differential represents the imbalance of the current at the node. To eliminate dVn/dt during the rising and falling periods of ũi and/or ỹ, a field-effect transistor 21 that is switched on and off by a periodic square-wave signal is prepared. The second opamp with the transistor and resistance acts as a multiplier and yields −ũi dVn/dt. The third opamp with the resistance and capacitor gives the integral of this value, i.e., w ∝ −∫ ũi (dVn/dt) dt. The last opamp yields w ũi, which produces the current (wi − 1)ũi via the VCCS. Hence, the above w corresponds to wi − 1. Here, the coefficients for each analogue signal processing step are omitted for simplicity. It is not difficult to merge the VCCSs in the circuit, which would make the circuit simpler. The VCCS can also be realised by standard circuit technologies 21.
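A discrete-time sketch of this signal chain, with the analogue coefficients set to unity as in the text, is given below; aspu_step and vccs_current are hypothetical names for the integrator and VCCS stages.

```python
def aspu_step(w, u_i, dVn_dt, dt):
    # Multiplier and integrator stages: w accumulates -u_i * dVn/dt, so w
    # plays the role of w_i - 1, as noted in the text.
    return w - u_i * dVn_dt * dt

def vccs_current(w, u_i):
    # The VCCS injects (w_i - 1) * u_i back into the node.
    return w * u_i
```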
The opamps and the differential amplifier have a sufficiently high open-loop gain and gain-bandwidth product 19 compared with the time scale described later. ũi is copied and input to each AAN in the first layer. The copying can be done by current-mirror circuits 21. In each following layer, the current produced by an AAN in the previous layer is copied and supplied to each following AAN except for the AAN in the same line, represented by a magenta line. In each magenta line, the current from an AAN to the following AAN is driven by the voltage difference between their nodes. Consequently, a change in the voltage on the node of an AAN in the network affects not only the current from the AAN in the previous layer but also the current to the following AAN. In this manner, tuning the WVs of one AAN in the network affects the tuning of the other AANs. This mutual effect, caused by the voltage differences between the nodes, enables the AANN to obtain the optimal WVs by locally and independently performed tuning. For the tuning demonstration, AANNs with one to five layers were prepared.
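A toy relaxation of a single AAN node under this local rule, assuming a linear activation, unit capacitance and coefficients, and a constant current drawn by the following stage (a strong simplification of the SPICE model), is:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0.5, 1.5, size=3)  # incoming currents (held constant)
y_target = 3.0                      # current drawn by the following stage
w = np.zeros(3)                     # integrator outputs, i.e. w_i - 1
dt = 1e-3
for _ in range(20000):
    imbalance = u.sum() + np.dot(w, u) - y_target  # net current into node
    dVn_dt = imbalance                             # unit node capacitance
    w -= u * dVn_dt * dt                           # local ASPU rule
print(u @ (1 + w))  # approaches y_target as the imbalance is resolved
```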
Tuning the WVs in these AANNs is demonstrated with the circuit simulator SPICE 22, which is widely used in electronics. In the simulation, ideal opamps and VCCSs are used to reduce the simulation time. As a demonstration, five black-and-white images of 3×3 pixels, corresponding to "H", "I", "T", "A", and "C", were prepared (Fig. 2c). Image recognition is a popular application in AI 23,24. The target output values for image-j are yj = 3 and yk≠j = 0. The optimal WVs for the
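This setup can be sketched as follows; the 3×3 pixel patterns are hypothetical placeholders, since the actual patterns appear only in Fig. 2c.

```python
import numpy as np
from itertools import product

# Hypothetical 3x3 patterns (row-major); the true ones are in Fig. 2c.
patterns = {
    "H": [1, 0, 1, 1, 1, 1, 1, 0, 1],
    "I": [1, 1, 1, 0, 1, 0, 1, 1, 1],
    "T": [1, 1, 1, 0, 1, 0, 0, 1, 0],
    "A": [0, 1, 0, 1, 1, 1, 1, 0, 1],
    "C": [1, 1, 1, 1, 0, 0, 1, 1, 1],
}
# Target output values as in the text: y_j = 3 for image-j, 0 otherwise.
targets = {label: np.where(np.arange(5) == i, 3.0, 0.0)
           for i, label in enumerate(patterns)}
# All 2^9 = 512 black-and-white images used later for recognition.
all_images = [np.array(bits) for bits in product([0, 1], repeat=9)]
```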

Image recognition under the tuned WVs
Third, image recognition by the ANN in which the WVs are tuned by the AANN is described. Each Vw also converges before the 100th round. By sensing these Vw at the 100th round, the WVs for the corresponding two-layered ANN can be obtained. The output of the ANN with the WVs obtained by the AANN is calculated for the 2^9 = 512 black-and-white images, including the images in Fig. 2c. Figure 3e shows the black-and-white images recognised as "H", "I", "T", "A", and "C".
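Assuming that an image is recognised as the letter with the largest output value, which is consistent with the targets yj = 3 and yk≠j = 0 though the decision rule is not stated explicitly here, the recognition step can be sketched as:

```python
import numpy as np

def recognise(image, ann_forward, labels=("H", "I", "T", "A", "C")):
    # ann_forward stands in for the tuned ANN (architecture and activation
    # as in the letter, not reproduced here); it returns five output values.
    outputs = ann_forward(image)
    return labels[int(np.argmax(outputs))]
```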

Comparison with stochastic-gradient-descent-based tuning
Finally, AANN-based tuning is compared with stochastic-gradient-descent-based tuning. The comparison is made in terms of the root-mean-square error (RMSE) between the computed outputs and the target values, where yj,i is the i-th output for the j-th image and ŷj,i is the corresponding target value. Here, 50 images, consisting of the five true images and 45 images in which one pixel is flipped compared with the true images, are tested (see the Supplementary material).
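Assuming the standard definition of the RMSE, taken over all tested images and outputs, it can be computed as:

```python
import numpy as np

def rmse(outputs, targets):
    # outputs[j, i] is y_j,i, the i-th output for the j-th image, and
    # targets[j, i] is the corresponding target value; the mean runs over
    # all 50 images and all outputs.
    outputs, targets = np.asarray(outputs), np.asarray(targets)
    return float(np.sqrt(np.mean((outputs - targets) ** 2)))
```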
The dark blue, orange, olive, dark red, and purple lines correspond to the one-, two-, three-, four-, and five-layered ANNs, respectively. In Fig. 4a, these lines, except for that of the one-layered ANN, decrease monotonically and converge before the 100th round. On the other hand, Fig. 4b shows zig-zag lines, since the minimum RMSE depends on β1, β2, ε, and TR in each round. Figure 4c shows the minimum RMSE among the rounds in Figs. 4a and 4b against the number of ANN layers. The AANN-based tuning and the ADAM tuning show similar RMSEs. This means that AANN-based tuning can be competitive with ADAM tuning in terms of RMSE, while AANN-based tuning does not need to explore these parameters.
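The parameter exploration required by ADAM tuning, which AANN-based tuning avoids, amounts to a search such as the sketch below; the grids and the helper train_and_eval are hypothetical.

```python
import itertools

def best_adam_rmse(train_and_eval):
    # train_and_eval(beta1, beta2, eps, tr) is an assumed helper that runs
    # ADAM tuning with the given parameters and returns the resulting RMSE.
    grid = itertools.product([0.8, 0.9, 0.99],    # beta1
                             [0.99, 0.999],        # beta2
                             [1e-8, 1e-6],         # epsilon
                             [1e-3, 1e-2, 1e-1])   # tuning rate TR
    return min(train_and_eval(*params) for params in grid)
```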