A Parallel Turbo Decoder Based on Recurrent Neural Network

A neural network decoder based on a long short-term memory (LSTM) network is proposed to address the large decoding delay and the performance degradation under non-Gaussian noise caused by the poor parallelism of existing turbo decoding algorithms. The proposed decoder builds on the component-coding concept unique to turbo codes. First, each component decoder is designed based on an LSTM network. Next, each layer of the component decoder is trained, and the trained weights are loaded into the turbo decoding neural network as initialization parameters. Then, the turbo decoding network is trained end-to-end, yielding a complete turbo decoder. The structural advantage of turbo component coding is fully exploited in the design, and the decoding delay caused by the interleaver is neatly avoided. The introduction of deep learning provides a new way to approach traditional communication problems. Simulation results show that the proposed decoder improves performance by 0.5–1.5 dB over the traditional serial decoding algorithm under both Gaussian white noise and t-distribution noise. At comparable BER, the LSTM decoder requires half or less of the decoding time of the BCJR algorithm. Moreover, the results demonstrate that the proposed decoder is adaptive and can be applied to communication systems with various turbo codes. Under the same conditions, the LSTM decoder achieves a lower bit error rate, lower computational complexity, and higher decoding efficiency. It is therefore worthwhile to study deep-learning-based turbo decoding in combination with the actual channel environment.


Introduction
The most commonly used decoding algorithms for turbo codes include maximum likelihood sequence detection [1], the maximum a posteriori (MAP) decoding algorithm [2,3], and improved MAP algorithms. The MAP algorithm can be implemented using the Bahl, Cocke, Jelinek, and Raviv (BCJR) algorithm [4]. After a certain number of iterations, the bit error rate (BER) no longer decreases, and the redundant iterations greatly increase the decoding delay. At the same time, the implementation may suffer from problems such as large resource consumption.
The above decoding algorithms are derived from rigorous mathematical formulas [5,6]. In practice, ideal conditions are often assumed, and approximate decoding results are obtained from simplified models within an acceptable range of computational complexity. For example, the noise in a wireless communication system is commonly assumed to be Gaussian [6]. In fact, information transmitted over a radio channel is subject to various kinds of interference [7], and the statistical characteristics of the noise do not always obey a Gaussian distribution [8,9]. This paper mainly considers the t-distribution noise model [10].
Deep learning (DL), with its powerful feature extraction and model generalization abilities [14][15][16], provides new ideas for modulation recognition and demodulation in complex electromagnetic environments [11][12][13]. Nachmani et al. [17] applied a dense neural network (DNN) to the iterations of the belief propagation (BP) algorithm for low-density parity-check (LDPC) codes. Their method first assigned weights to the edges of the Tanner graph and then used deep learning to train these "edges", reducing the BER without increasing the computational complexity. However, the parity-check matrix in that work is a fixed matrix taken from reference [18] and has not been evaluated on others. Building on these studies, reference [19] replaced the DNN with a recurrent neural network (RNN), which reduced the training complexity to a certain extent. Reference [20] introduces a neural network structure for linear block codes that significantly improves performance over standard BP and the min-sum method, especially for LDPC codes and short Bose–Chaudhuri–Hocquenghem (BCH) codes; the structure ties the decoder's parameters together across iterations to realize parameter sharing. O'Shea et al. [21] used deep-learning-based methods for channel coding, designing the physical layer through deep learning. Gruber et al. [22] took the log-likelihood ratio (LLR) of the noisy codeword as input to realize neural network decoding of polar codes, but the method generalizes poorly when applied to structured codewords. Fei Liang et al. [23] designed an iterative belief propagation-convolutional neural network architecture using convolutional neural networks (CNNs) to decode LDPC codes under correlated noise. Reference [24] also trained a turbo code decoder with a neural network, but it still used a serial decoding method.
The trained network in that work is a fully connected multi-input, single-output neural network that outputs only one bit at a time; it requires a large amount of computation and cannot be parallelized, which does not meet our goal of improving the parallelism of turbo decoding. Xiang Zhang et al. [25] used a gated recurrent unit (GRU) as an RNN decoder for convolutional codes under correlated noise. The RNN decoder achieves remarkable BER performance under different noise parameters, but owing to the structure and complexity of the network, its decoding performance gradually degrades as the memory length of the convolutional code increases. To reduce the limitation of codeword length on performance, Cammerer et al. [26] replaced the sub-modules of a polar code decoder with neural networks, trained these sub-decoders separately, and then combined their results in sequence; the final performance is similar to that of the successive cancellation algorithm. Kim et al. [27] proposed decoding convolutional codes sequentially with an RNN. This method can only learn the Viterbi algorithm from the given sample data; since the Viterbi algorithm itself has a strongly "algebraic" nature, whether an RNN can completely replace any traditional decoding algorithm remains to be studied. In reference [28], the registers in the turbo coding structure correspond to different channels of a CNN, so fully parallel decoding can be realized for the different register states. However, the BER performance of the CNN decoder is still far inferior to that of the traditional serial decoding algorithm with even one iteration, although the training set contains all possible codewords.
In summary, DL technology has made initial achievements in channel decoding, but the above studies do not provide an effective solution to the performance degradation of turbo decoding under non-Gaussian channel noise. We propose a turbo decoder based on the LSTM network, which starts by imitating the BCJR algorithm with a neural network. From the perspective of component decoding, the decoder stacks multiple LSTM decoders of recursive systematic convolutional (RSC) codes to realize the "iterative" decoding of turbo codes. The parameters are then fine-tuned end-to-end under non-AWGN settings, making the decoder adaptive. Finally, we verify the superiority of the improved decoder through simulation experiments.
This paper is organized as follows: Sect. 2 introduces the structure of the turbo code encoder from the perspective of a mathematical model and discusses the essence of decoding. Section 3 introduces the internal structure of the proposed decoder and the whole decoding process, and gives the parameter selection and training method of the decoding network in combination with simulation experiments. Section 4 compares performance under different model parameters, different types of turbo codes, and different channel noise. Finally, Sect. 5 presents a summary.

Figure 1 shows the structure of the turbo code encoder in the 3rd Generation Partnership Project (3GPP) protocol. The two component encoders are identical recursive systematic convolutional (RSC) codes (2, 1, 3) with a code rate of 1/2.

Turbo Decoding Problem Description
According to the coding structure in Fig. 1, the recursion of the register states can be deduced (Eq. (1)), where d^1_{k+1} is the state of the first register at time k. On the basis of Eq. (1), further state-transition equations can be deduced in the same way, and combining these equations yields the relations used below.
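The register recursion can be made concrete with a short encoder sketch. The feedback and feedforward taps below correspond to the (13, 15)_8 polynomials of the 3GPP turbo code and are an assumption for illustration, not taken from the paper's (unrecovered) equations.

```python
def rsc_encode(bits, fb_taps=(1, 2), ff_taps=(0, 2)):
    """Rate-1/2 recursive systematic convolutional (RSC) encoder with three
    registers. fb_taps/ff_taps index the registers feeding the feedback and
    the parity output; the defaults approximate the 3GPP (13,15)_8
    polynomials and are assumptions for illustration."""
    regs = [0, 0, 0]
    out = []
    for b in bits:
        # feedback bit d^1_{k+1}: input XOR tapped register states
        fb = b
        for t in fb_taps:
            fb ^= regs[t]
        # parity bit: feedback bit XOR feedforward-tapped registers
        par = fb
        for t in ff_taps:
            par ^= regs[t]
        out.append((b, par))          # (systematic bit, parity bit)
        regs = [fb] + regs[:-1]       # shift the register chain
    return out
```

A systematic encoder by construction: the first element of each output pair is the input bit itself.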
When the register states are unknown and the encoder state transitions are not recorded, the more information that is known, the less is needed to predict the next information bit. The decoding steps of the traditional BCJR algorithm are serial: only the branch transition probabilities can be computed fully in parallel, since they do not depend on preceding data. The forward and backward recursion factors must wait for the computations of the preceding and succeeding bits to complete, which increases the decoding delay.
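The serial bottleneck can be sketched in a few lines: the branch metrics (gamma) for all trellis steps have no mutual dependence and could be computed in parallel, whereas each forward factor alpha_k needs alpha_{k-1}. The 2-state random transition matrices below are stand-ins, not actual turbo branch metrics.

```python
import numpy as np

rng = np.random.default_rng(0)
K, S = 8, 2                          # trellis length, number of states

# Branch metrics: one S x S matrix per step; no dependence between steps,
# so this part could run fully in parallel.
gamma = rng.random((K, S, S))

# Forward recursion: alpha_k depends on alpha_{k-1}, so it is inherently serial.
alpha = np.zeros((K + 1, S))
alpha[0] = np.array([1.0, 0.0])      # start in the all-zero state
for k in range(K):
    a = gamma[k].T @ alpha[k]        # sum over previous states
    alpha[k + 1] = a / a.sum()       # normalize to avoid underflow
```

The backward recursion has the mirrored dependence, which is exactly what the LSTM decoder avoids by learning the mapping directly.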
Theoretically, to simplify the model and reduce computational complexity, the noise in a wireless communication system is assumed to be Gaussian. However, because information transmitted via radio is subject to all types of interference, the statistical characteristics of the noise do not always follow a Gaussian distribution. If the Gaussian model is nevertheless used to describe the actual noise, the extracted information will be distorted to varying degrees, degrading or even destroying the performance of the corresponding information processing algorithm. It is therefore critical to investigate channel decoding in non-Gaussian noise.
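A minimal sketch of generating the heavy-tailed noise studied later: Student's t samples with v degrees of freedom, rescaled to a target standard deviation. The normalization by sqrt(v/(v-2)) (valid only for v > 2) is our convention and is not stated in the paper.

```python
import numpy as np

def t_noise(shape, v, sigma, rng):
    """Student's t noise with v degrees of freedom, scaled so its standard
    deviation is sigma (requires v > 2, where the variance v/(v-2) exists)."""
    raw = rng.standard_t(v, size=shape)
    return sigma * raw / np.sqrt(v / (v - 2.0))

rng = np.random.default_rng(42)
n = t_noise((100000,), v=5, sigma=1.0, rng=rng)
```

As v grows, the samples approach Gaussian noise, matching the degrees-of-freedom property discussed in Sect. 4.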
In this paper, a neural network is introduced into the receiver of a communication system. The LSTM network structure with a feedback loop is used to decode the turbo code, allowing the value of the research variable to be directly learned from the training data rather than deduced through serial recursion. Finally, the proposed method achieves a lower decoding delay and higher decoding efficiency than the traditional serial decoding algorithm.
Fig. 1 Encoder structure of standard turbo code

System Model
This section presents the design of a neural network decoder based on component decoding. It stacks the LSTM decoders of multiple recursive systematic convolutional codes and finally realizes the iterative decoding of turbo codes. When the codeword is input as a time sequence, an LSTM network can store the characteristics of the received signal at each time step together with the signal at the previous time step. Therefore, each component decoder uses an LSTM network as its basic network. While a single LSTM network memorizes only the characteristics of signals received in the forward direction, a bidirectional LSTM structure takes the signals received in both the forward and reverse directions as input and stores the characteristics of the received signal at each time step, the previous one, and the next one. Figure 2 is a block diagram of the system model of the LSTM decoder for an RSC code.
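The component decoder just described can be sketched with modern tf.keras (the paper used TensorFlow 1.5/Keras 2.2); the number of LSTM units here is an illustrative assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_component_decoder(block_len=100, n_features=5, units=64):
    """Two bidirectional LSTM layers (each followed by batch normalization)
    and a per-time-step sigmoid dense layer, as in the system model.
    The unit count is an assumption for illustration."""
    inp = layers.Input(shape=(block_len, n_features))
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(inp)
    x = layers.BatchNormalization()(x)
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(x)
    x = layers.BatchNormalization()(x)
    out = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"))(x)
    return models.Model(inp, out)

model = build_component_decoder()
```

One sigmoid output per time step gives a soft bit estimate for each of the 100 positions of a code block.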
The LSTM decoder connects a two-layer bidirectional LSTM network to a one-layer fully connected network and processes the received information directly to obtain the decoding results. The LSTM network effectively reduces the noise interference, and the fully connected network then computes the final decoding result from the information processed by the LSTM network. The decoder's performance mainly depends on the denoising effect of the LSTM network on the received information and on the feature extraction of the information bits. Figure 3 shows the internal structure of the turbo decoder based on the LSTM network.
In the first component decoding, for all k ∈ [K], the LSTM decoding network uses the uniform prior information on b_k to estimate the posterior probability Pr(b_k | y_1(1), y_2(1)). Then, in the second component decoding, the interleaved sequence π(y_1(1)) is used to estimate Pr(b_k | π(y_1(1)), y_2(2)), with the soft output of the first decoding serving as the a priori information. This process is repeated, steadily improving the predicted output for codeword bit b_k until convergence; each bit is then estimated to obtain the BER at a given SNR point. The BER at this point is taken as one of the performance criteria of the decoder.
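The alternation between the two component decoders can be sketched as follows. The decoder functions dec1/dec2 are placeholders standing in for the trained LSTM component networks, and the extrinsic-information subtraction of a full turbo decoder is omitted for brevity; both simplifications are ours.

```python
import numpy as np

def turbo_iterate(dec1, dec2, y_sys, y_par1, y_par2, perm, n_iter=6):
    """Alternate the two component decoders, passing soft outputs through
    the interleaver perm and back through its inverse. dec1/dec2 map
    (systematic, parity, prior) -> posterior Pr(b_k | ...)."""
    inv = np.argsort(perm)                 # de-interleaver
    prior = np.full(len(y_sys), 0.5)       # uniform prior on b_k
    for _ in range(n_iter):
        post1 = dec1(y_sys, y_par1, prior)
        post2 = dec2(y_sys[perm], y_par2, post1[perm])
        prior = post2[inv]                 # soft output fed back as next prior
    return prior

# toy stand-in decoder: a logistic of the channel values mixed with the prior
toy = lambda s, p, pr: 1.0 / (1.0 + np.exp(-(s + p + np.log(pr / (1 - pr)))))
```

With the real LSTM networks substituted for `toy`, the loop realizes the "iterative" decoding described above.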

Dataset Generation
In a DL-based communication system, the data available at the transmitter are limited. At the receiver, however, channel noise makes the amount of obtainable data effectively infinite, meeting the large-data demand of DL. The dataset in this study includes a training set and a test set, which are independent of each other. A group of codeword sequences Y_i is encoded by the RSC polynomial of each component, bipolar-mapped, and finally transmitted through the channel to produce a group of noisy codeword sequences X_i. (X_i, Y_i) is a group of labeled training data, where X_i is the input of the decoding network and Y_i is the label. The test set is obtained via conventional communication: binary data are generated randomly and passed through a channel with a fixed SNR, and multiple SNR groups correspond to multiple groups of test data. The LSTM network is trained on a large amount of labeled data so that the decoder can achieve the expected results.
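A minimal sketch of producing labeled (X_i, Y_i) pairs. To stay self-contained, the RSC encoding step is omitted and the bipolar-mapped bits stand in for the encoded stream; the SNR-to-noise-variance convention sigma^2 = 1/(2·10^(SNR/10)) is also an assumption.

```python
import numpy as np

def make_labeled_data(n_blocks, block_len, snr_db, rng):
    """Random label bits Y, bipolar mapping (0 -> +1, 1 -> -1), then
    additive Gaussian noise at the given SNR. Encoding is omitted here;
    the noise-variance convention is an assumption for illustration."""
    Y = rng.integers(0, 2, size=(n_blocks, block_len))
    s = 1.0 - 2.0 * Y
    sigma = np.sqrt(1.0 / (2.0 * 10.0 ** (snr_db / 10.0)))
    X = s + sigma * rng.normal(size=s.shape)
    return X, Y

X, Y = make_labeled_data(1000, 100, snr_db=-1.0, rng=np.random.default_rng(7))
```

For the t-distribution experiments, the Gaussian noise term would simply be replaced by t-distributed samples.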

Construction of Decoding Network
According to the design concept in Sect. 3.1, and considering the complexity of the model building process, this section uses high-level application programming interfaces to facilitate the rapid construction of the decoding network. Figure 4 shows the decoding network. The decoder structure of the two RSC codes is the same except for the last output layer. The red circle represents the LSTM decoding network of the first RSC code in Fig. 3, and the green circle represents that of the second RSC code. The output of the first RSC decoder is fed into the interleaver to randomly disperse burst errors while making the outputs of the two decoders independent of each other; the design of the last layer of the two decoders is therefore not identical. When training is completed, the model.get_layer command returns all weight parameters of the two decoding networks, and the set_weights command loads these weights into each layer of the turbo decoding network as initialization parameters. The blue circle represents the interleaver, and the purple circle represents the de-interleaver. The iterative process is implemented through a custom Lambda layer.

The decoding network in Fig. 4 mainly includes three types of network structure: the LSTM layer, the fully connected dense layer, and the user-defined Lambda layer. (1) Each LSTM layer adopts a bidirectional LSTM structure, enabling it to process the codeword sequence forward and backward at the same time; at the cost of doubling the computational complexity, it extracts the related features of different information bits more accurately. When the LSTM layer receives the codeword in time sequence, it processes a three-dimensional tensor. The first dimension is the number of samples to be processed; its size is unknown when the network is built, so it is None.
The second dimension is the number of time steps, i.e., the number of frames to be processed by the recurrent layer, and its size is determined by the time axis. The RSC component decoder decodes a fixed-length code block separately; in this section, the code block length is set to 100, so the second dimension is also set to 100. The third dimension is the number of features per frame. The main research object of this paper is the (7, 5, 7) turbo code, so the feature number of each frame is 5. In addition, a batch normalization layer is attached to each LSTM layer to accelerate convergence and achieve a better decoding effect. (2) The fully connected dense layer computes on the extracted sequence-related information; it uses the sigmoid activation function to normalize the network output to [0, 1], so the information bits can be estimated directly from the output value without computing posterior probabilities. (3) The user-defined Lambda layer has no trainable parameters; it only adds noise or performs calculations and passes the results to the corresponding nodes of the next layer, which helps simulate the iterative process.
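The (samples, time steps, features) layout can be illustrated directly; the flat stream of received values and the grouping of five channel values per trellis step are illustrative assumptions.

```python
import numpy as np

block_len, n_features = 100, 5           # code block length, features per frame
n_blocks = 32                            # a batch of received code blocks

# flat stream of received channel values, five per information bit
received = np.random.default_rng(0).normal(size=n_blocks * block_len * n_features)

# LSTM input tensor: (samples, time steps, features); the sample axis is
# left open (None) when the Keras network itself is defined
x = received.reshape(n_blocks, block_len, n_features)
```

Row-major reshaping keeps the five values of each time step contiguous, which is what the bidirectional LSTM consumes frame by frame.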

Training of Decoding Network
The training method and the number of training epochs affect the convergence speed and the optimal solution of the network parameters, and thereby the performance of the decoding networks; appropriate training methods should therefore be chosen. In this paper, stochastic gradient descent is used to train the networks. Each batch contains 200 input units, and the networks update their parameters after each batch. The parameters are adjusted with the Adam optimization method. The initial learning rate is 0.001 and is reduced stepwise to 1/10 of its value: from epochs 10 to 15, the learning rate is reduced to 0.0001, and when the epoch count exceeds 25, it decreases to 0.0000001. In addition, training is terminated early if the validation loss does not decrease within 10 epochs, to prevent overfitting. Binary cross-entropy is the loss function, and the training goal is to reduce the loss as much as possible. The activation function of the LSTM layers is tanh, which converges quickly; sigmoid is selected for the dense layer so that the network output lies in [0, 1].
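The stepwise schedule above can be written as a plain function, suitable for a Keras LearningRateScheduler callback; the exact epoch boundaries are our interpretation of the description in the text.

```python
def lr_schedule(epoch):
    """Stepwise learning-rate schedule: 1e-3 initially, 1e-4 in the middle
    phase (from epoch 10), 1e-7 after epoch 25. The boundary placement is
    an interpretation of the schedule described in the text."""
    if epoch < 10:
        return 1e-3
    if epoch <= 25:
        return 1e-4
    return 1e-7
```

Combined with early stopping on the validation loss (patience 10), this keeps late-stage updates small without hand-tuning each run.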
The RSC component decoding network decodes each code block separately. In this study, the code block length is set to 100 or 1000, and the BER performance analysis of the LSTM decoder is based on the corresponding code block length.

Performance Simulation and Result Analysis
Based on the structure and parameter settings of the decoding networks in the above section, the LSTM decoder is simulated for different turbo codes in Gaussian white noise and t-distribution noise environments, respectively. To compare BER performance, the decoding results of the traditional BCJR algorithm under the same conditions are provided. The LSTM decoder's performance is analyzed in terms of BER and computational complexity. The entire simulation process is divided into five parts: data preprocessing, network training, network testing, hard decision, and BER calculation. Figure 5 shows the turbo decoding process based on the LSTM.
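The last two stages of the pipeline, hard decision and BER calculation, reduce to a few lines:

```python
import numpy as np

def hard_decision(soft):
    """Map sigmoid outputs in [0, 1] to bit estimates with threshold 0.5."""
    return (np.asarray(soft) >= 0.5).astype(int)

def bit_error_rate(estimates, truth):
    """Fraction of bit positions where the estimate differs from the label."""
    estimates, truth = np.asarray(estimates), np.asarray(truth)
    return float(np.mean(estimates != truth))
```

Running these over the test blocks at each SNR point yields one point of the BER-versus-SNR curves shown in the figures.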
The hardware configuration of the simulation experiment is an i5-9400F processor, 8 GB of memory, and a GTX 1660 graphics card. The software platform is the Win10 operating system with Anaconda3 (Python 3.6). The network model is built on the open-source neural network frameworks TensorFlow 1.5 [29] and Keras 2.2 [30].

Performance of LSTM Decoder in Gaussian White Noise Channel
In the simulation experiment, taking the (7, 5, 7) turbo code of this section as an example, 10^4 samples are selected to generate training data that meet the requirements. The training SNR is set to −1 dB, and the code block length is set to 100. Once obtained, the best network parameters are loaded into the turbo decoding network as initial weights. Then, 100 turbo codes with a block length of 100 are used for end-to-end training over 30 epochs. The change of training accuracy and loss with the number of epochs can be observed in real time through the hist command. Other parameter settings are consistent with the above. The test set is then fed into the LSTM decoder, and the BER at each SNR point is calculated. The SNR ranges from −1.5 to 2 dB, and each SNR point corresponds to 10^5 data samples. In addition, 100 turbo codes with a block length of 1000 are selected for simulation to analyze the applicability of the LSTM decoder.
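The weight-loading step can be sketched with tf.keras, copying every trained layer of a component decoder by name into the matching layer of the full turbo network; the layer names and sizes below are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def component_net(units=16):
    """Minimal stand-in for one RSC component decoder; the real layer
    sizes would follow Sect. 3."""
    inp = layers.Input(shape=(100, 5))
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True),
                             name="rsc1_bilstm")(inp)
    out = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"),
                                 name="rsc1_out")(x)
    return models.Model(inp, out)

pretrained = component_net()   # would be trained separately first
turbo_part = component_net()   # same sub-structure inside the turbo network

# copy the pre-trained weights layer by layer as initialization parameters
for layer in pretrained.layers:
    if layer.get_weights():
        turbo_part.get_layer(layer.name).set_weights(layer.get_weights())
```

This mirrors the model.get_layer / set_weights procedure described in Sect. 3 before end-to-end fine-tuning begins.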
When the block length is 100, the BER curve of the (7, 5, 7) turbo code using the LSTM decoder is shown in Fig. 6. The neural network decoder proposed in reference [28] is called the CNN decoder. In addition, the decoding results of the traditional BCJR algorithm under the same conditions are given for comparison. Figure 7 shows the BER curves of the two decoding methods when the code block length is 1000. Figure 6 plots, for the Gaussian white noise channel and a code block length of 100, the BER versus SNR of the LSTM decoder, the BCJR algorithm with 1, 6, 10, and 15 iterations, and the CNN decoder; the horizontal axis is the SNR in dB and the vertical axis is the BER. As shown in Fig. 6, the BER of the LSTM decoder is lower than those of the other two decoding methods over the whole SNR range. At BER = 10^−4, the LSTM decoder has a 0.5 dB performance gain over the BCJR algorithm with 15 iterations and a 1 dB gain over the BCJR algorithm with 6 iterations. When the SNR is 0.5 dB, the BER of the LSTM decoder is 3 orders of magnitude lower than that of the CNN decoder. Figure 7 shows the BER versus SNR curves of the LSTM decoder and the BCJR algorithm with 1, 6, 10, and 15 iterations in the same channel environment with a block length of 1000. The LSTM decoder achieves better BER performance when the code block length is 1000: when the SNR is −0.5 dB, the BER is 2 orders of magnitude lower than with a code block length of 100 at the same SNR, which demonstrates the excellent performance of turbo codes in low-SNR environments. Furthermore, the LSTM decoder outperforms the BCJR algorithm with 15 iterations over the entire SNR range.
At BER = 10^−4, the SNR performance of the LSTM decoder is 0.4 dB better than that of the BCJR algorithm with 6 iterations. Decoding performance depends on the BER as well as on decoding costs such as time complexity and hardware costs. This section compares the decoding efficiency of the different methods in terms of computational complexity, measured as the graphics processing unit (GPU) time required by the two decoding methods in the Gaussian white noise channel. Table 1 lists the decoding time required by the LSTM decoder and by the BCJR algorithm with 1, 6, 10, 15, 20, and 30 iterations, for block lengths of 100 and 1000. The longer the code block, the longer the decoding time of both methods, because the amount of computation required for decoding increases with the code block length. Comparing decoding times for the same block length, the LSTM decoder requires more GPU time than the BCJR algorithm with fewer than 10 iterations, and less GPU time than the BCJR algorithm with 10 or more iterations. Combined with the above analysis, the BER performance of the LSTM decoder is always better than that of the BCJR algorithm under the same conditions; even where its decoding time is 10 times that of the low-iteration BCJR algorithm, its BER is significantly better. This shows that the neural network-based decoding method achieves lower computational complexity and higher decoding efficiency than the traditional decoding method.
In conclusion, the decoding performance of the decoder based on the LSTM network is better than that of the traditional decoding method in the Gaussian white noise channel.

Bit Error Rate Performance of LSTM Decoder Under t-distributed Noise Channel
This study mainly investigates a type of non-Gaussian noise that obeys the Student's t-distribution. The t-distribution is a family of curves whose shape depends on the degrees of freedom v: the smaller the degrees of freedom, the flatter the distribution curve; the larger the degrees of freedom, the closer the curve is to the standard normal curve. This section therefore uses this property to examine whether the LSTM decoder's BER performance is affected by the degrees of freedom, in order to verify its robustness.
In this section, the (7, 5, 7) turbo code is simulated for 3 and 5 degrees of freedom, respectively. The structure of the LSTM decoder is identical to that shown in Fig. 4. For v = 3, 10^4 groups of noisy codewords are collected as training data in the simulation environment at an SNR of 8 dB. The output data are generated as described in Sect. 3.1. Data transmitted through the t-distribution noise channel form the test set, with 10^5 groups of test data collected at each SNR point. As before, we analyze the BER performance of the LSTM decoder and the BCJR algorithm for block lengths of 100 and 1000; other parameters are set as in the previous section. When the block length is 100, the BER curves of the LSTM decoder and the BCJR algorithm are shown in Fig. 8; when the block length is 1000, the curves of the two decoding methods are shown in Fig. 9.

Figure 8 shows the BER versus SNR curves of the LSTM decoder and the BCJR algorithm with 1, 6, 10, and 15 iterations in the t-distribution noise channel with a code block length of 100. The analysis shows that the BER performance of the LSTM decoder is better than that of the BCJR algorithm with 15 iterations, and that the performance of the traditional decoding algorithm is degraded in this non-Gaussian noise environment. In the Gaussian case (Fig. 6), a BER of order 10^−4 is reached at an SNR of 1 dB, whereas in Fig. 8 the required SNR is 1.5 dB, indicating that the LSTM decoder's performance also degrades in the non-Gaussian noise environment; nevertheless, it remains better than the traditional decoding algorithm under the same conditions. Figure 9 compares the BER results of the two decoding methods when the code block length is 1000.
The analysis shows that the performance of the LSTM decoder at BER = 10^−4 is 1.5 dB better than that of the BCJR decoding algorithm with 15 iterations, and much better than that of the BCJR decoding algorithm with 1 iteration.
When v = 5, the training and test data are generated as above, with a training SNR of 8 dB. The BER comparison between the LSTM decoder and the BCJR algorithm is shown in Figs. 10 and 11, for code block lengths of 100 and 1000, respectively. Figure 10 shows that the BER performance of the LSTM decoder is better than that of the BCJR algorithm with 15 iterations. In addition, compared with the results in Fig. 8, the performance of the LSTM decoder is degraded, indicating that the change in degrees of freedom affects decoding performance: an increase in the degrees of freedom degrades the performance of both decoding methods. In Fig. 11, the LSTM decoder's performance at BER = 10^−4 is 1 dB better than that of the BCJR algorithm with 15 iterations, and much better than that of the BCJR algorithm with 1 iteration.
In conclusion, the BER performance of both the LSTM decoder and the traditional BCJR algorithm degrades to different degrees in the t-distribution noise channel, yet the LSTM decoder still outperforms the BCJR algorithm, and its gain over the traditional algorithm is larger than in the Gaussian white noise channel. In addition, the performance of the LSTM decoder is affected by the degrees of freedom: the smaller the degrees of freedom, the greater the BER improvement; the larger the degrees of freedom, the smaller the improvement. These results show that the LSTM decoder is robust to some extent.

Bit Error Rate Performance of LSTM Decoder Under Different Turbo Codes
This section studies the BER performance of turbo codes with different constraint lengths. The transmission of (7, 5, 7), (7, 5, 6), (7, 5, 5), (7, 5, 4), and (7, 5, 3) turbo codes in the Gaussian white noise channel is simulated to explore the applicability and robustness of the proposed decoder. These codes have the same code rate but different memory depths. The network model and parameter settings are consistent with those in Sect. 3.4. The BER curves of the above five turbo codes in the Gaussian white noise channel, obtained by simulation, are shown in Fig. 12. Figure 12 shows the BER curves of the LSTM decoder over varying SNRs when turbo codes with different constraint lengths are transmitted in the Gaussian white noise channel. The LSTM decoder achieves good decoding performance even for the turbo code with a constraint length of 3, and its BER shows no obvious degradation when the constraint length is reduced from 7 to 3. The traditional decoding algorithm, by contrast, cannot effectively decode the received signals at the various SNRs when the constraint length is 3: the shorter the constraint length, the worse its decoding performance. Therefore, the LSTM decoder is applicable to a variety of turbo codes.

Conclusion
In this study, a turbo decoder based on an LSTM network is proposed to solve the problems of high decoding delay and performance degradation under non-Gaussian noise caused by the poor parallelism of existing decoding algorithms. From the viewpoint of component decoding, the decoder stacks multiple LSTM decoders of recursive systematic convolutional codes and realizes the iterative decoding of turbo codes. The simulation results show that this neural network-based decoding method achieves a lower BER and lower computational complexity than existing decoding methods under the same conditions. Furthermore, the LSTM decoder decodes the received signal directly rather than processing the noise separately, which solves the problem of performance degradation of the traditional decoding algorithm in non-Gaussian noise environments. Once the network parameters have been trained offline, the online network only needs to update them to switch between signals in different noise environments, which greatly improves decoding efficiency. Owing to the structure and complexity of the RNN decoder, decoding performance degrades gradually as the memory depth of the convolutional codes increases. In this paper, a pre-trained network model directly replaces the BCJR algorithm to obtain similar performance, without fundamentally removing the dependence on the BCJR algorithm. In the simulation experiments, the noise-polluted signal is simply mapped to imitate the modulation and demodulation process; in future work, we will combine different modulation methods to explore neural network models with better anti-noise performance. Furthermore, future investigations will focus on the design of more widely applicable models for all kinds of codes.