High-Performance and Energy-Efficient Fault Tolerance FPGA-to-FPGA Communication

Driven by the increasing demand for high-speed parallel computation, many real-world applications and systems now include multiple FPGAs, which must often communicate with one another. Communication between the FPGAs is therefore one of the key factors that determine the accuracy, performance and correctness of an entire multi-FPGA system or application. This paper presents the design of an efficient multi-bit fault-tolerant communication system for FPGA-to-FPGA communication. The proposed design is synthesized and simulated with Vivado Design Suite 2018.3, and communication was carried out between two Kintex-7 FPGA boards. Compared with existing FPGA-to-FPGA and inter-FPGA communication designs, the proposed design has higher performance and better error detection and correction capability.


Introduction
The universal serial bus (USB) interface is used as a standard interface for data transmission in many different kinds of devices. Its data transfer rate, low power and ease of use are the key features that made USB an industry-standard data transmission interface. The USB interface supports hot plugging automatically: USB can connect a computer to electronic devices without shutting down the system. An RS232-dependent system can easily be ported to the USB interface by using an embedded RS232-to-USB converter. First-in first-out (FIFO) logic is used to bridge the difference in data rates between the two interfaces. The study of the USB protocol with an FPGA development board is therefore important. The USB standard interface broadly has two units, the parallel interface engine (PIE) and the USB transceiver macrocell interface (UTMI). The PIE is responsible for packet extraction and construction and for communication with the peripherals. The UTMI is connected to the USB cables and is used for transmission of serial data and for synchronization of time frames. The serial interface engine (SIE) has two sub-blocks, the endpoint logic sub-block and the SIE control sub-block. The endpoint logic block contains the FIFO, the FIFO logic and endpoint number recognition. The SIE control block contains the sequencing logic for managing USB packets and transactions, the address recognition logic and the USB packet identification (PID) logic. The SIE block lies between the processing unit of the computer and the UTMI. It receives data from the computer processor and transmits it to the UTMI. The UTMI unit performs clock synchronization, clock recovery, bit stuffing, and data serialization and deserialization. The UTMI uses differential signals to transmit data from one USB device to another USB-compatible device.
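The bit stuffing mentioned above can be illustrated with a short sketch. USB inserts a 0 after every run of six consecutive 1s so that the receiver sees enough signal transitions to stay synchronized. The function names `bit_stuff` and `bit_unstuff` are hypothetical and not part of the proposed design; this is a minimal software model, not the hardware implementation.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s (USB stuffing rule)."""
    out = []
    run = 0  # length of the current run of 1s
    for b in bits:
        out.append(b)
        if b == 1:
            run += 1
            if run == 6:
                out.append(0)  # stuffed bit
                run = 0
        else:
            run = 0
    return out


def bit_unstuff(bits):
    """Remove the 0 that follows every run of six consecutive 1s."""
    out = []
    run = 0
    expect_stuffed = False
    for b in bits:
        if expect_stuffed:
            # The bit after six 1s must be the stuffed 0; a 1 is a violation.
            if b != 0:
                raise ValueError("bit-stuffing violation")
            expect_stuffed = False
            run = 0
            continue
        out.append(b)
        if b == 1:
            run += 1
            if run == 6:
                expect_stuffed = True
        else:
            run = 0
    return out


if __name__ == "__main__":
    data = [1] * 8
    stuffed = bit_stuff(data)
    print(stuffed)
    print(bit_unstuff(stuffed) == data)
```

The receiver-side function raises an error when a seventh consecutive 1 arrives, which is how a stuffing violation would be flagged in this model.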
To meet the exponentially growing demand for high-speed computation, multiple FPGAs must be used together. Applications that employ multiple FPGAs are targeted by several fields that require high-speed parallel computation, because they deliver the desired performance with high-speed data processing.
Multiple FPGAs are widely used in areas such as multi-processor systems on chip (MPSoC), hardware emulation and hardware acceleration. In all of these applications, the FPGAs must communicate at high speed, with low power and without errors during data transmission in order to achieve the desired system functionality. Error-free communication plays a crucial role in determining the performance and correctness of the system. This paper therefore presents a fault-tolerant communication system between FPGAs. The proposed system uses Bose, Ray-Chaudhuri, Hocquenghem (BCH) codes, a Universal Serial Bus (USB) transceiver with a serial interface engine (SIE), and an asynchronous FIFO. BCH codes are a powerful class of cyclic codes capable of detecting and correcting multiple errors. The main aim of the study is to guarantee error-free, fault-tolerant communication between multiple FPGA development boards.

Basics
BCH codes are a powerful and effective class of cyclic codes with an innate capability for multiple error detection and correction, and they form a strong base for several advanced multiple-error detection and correction codes. BCH codes are constructed using polynomials over a finite field called a Galois field. They offer flexible control over the number of correctable symbols. Because their implementation is not complex, they are widely used in information coding, and they are the foundation of several present-day error detection and correction algorithms. BCH codes provide a wide range of control over the number of errors to be corrected and the length of the code word transmitted by the encoder.
In BCH codes,

Length of block: $c = 2^m - 1$
Number of message bits: $k \geq c - mp$
Minimum distance: $d_{min} \geq 2p + 1$

We obtain the length of the block, the number of message bits and the minimum distance by using the above equations. With these parameters, $p$ or fewer errors can be corrected. The generator polynomial of a BCH code is defined from the roots in the Galois field $GF(2^m)$. A field that contains a finite number of elements is called a Galois field. Arithmetic operations such as addition, subtraction, multiplication and division can be performed in a Galois field, subject to some basic rules. A Galois field is written $GF(x^y)$, where $x$ is a prime number and $y$ is a positive integer.

Table I. Conjugate roots and minimal polynomials

Conjugate roots                                   Minimal polynomial
$0$                                               $y$
$1$                                               $y + 1$
$\beta, \beta^2, \beta^4, \beta^8$                $y^4 + y + 1$
$\beta^3, \beta^6, \beta^9, \beta^{12}$           $y^4 + y^3 + y^2 + y + 1$
$\beta^5, \beta^{10}$                             $y^2 + y + 1$
$\beta^7, \beta^{11}, \beta^{13}, \beta^{14}$     $y^4 + y^3 + 1$
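The minimal polynomials listed above can be reproduced numerically. The following is a minimal sketch, assuming that $GF(2^4)$ is built from the primitive polynomial $y^4 + y + 1$ (the minimal polynomial of $\beta$ in the table). The helper names `bch_bounds`, `coset`, `minimal_poly` and `poly_str` are hypothetical. Polynomials over GF(2) are stored as Python integers, with bit $i$ holding the coefficient of $y^i$.

```python
PRIM = 0b10011          # y^4 + y + 1, assumed primitive polynomial
M_DEG = 4               # m
N = (1 << M_DEG) - 1    # block length c = 2^m - 1 = 15

# EXP[i] holds beta^i as a 4-bit integer (polynomial representation).
EXP = [1] * N
for i in range(1, N):
    v = EXP[i - 1] << 1
    if v & (1 << M_DEG):
        v ^= PRIM
    EXP[i] = v
LOG = {v: i for i, v in enumerate(EXP)}


def gf_mul(a, b):
    """Multiply two elements of GF(2^4) using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % N]


def bch_bounds(m, p):
    """Return (c, lower bound on k, lower bound on d_min) for given m and p."""
    c = (1 << m) - 1
    return c, c - m * p, 2 * p + 1


def coset(i):
    """Exponents of the conjugates of beta^i: i, 2i, 4i, ... modulo N."""
    out, j = [], i % N
    while j not in out:
        out.append(j)
        j = (2 * j) % N
    return sorted(out)


def minimal_poly(i):
    """Minimal polynomial of beta^i as a bit mask over GF(2)."""
    poly = [1]  # coefficients in GF(2^4), lowest degree first
    for j in coset(i):
        root = EXP[j]
        nxt = [0] * (len(poly) + 1)
        for d, coeff in enumerate(poly):
            nxt[d + 1] ^= coeff               # coeff * y
            nxt[d] ^= gf_mul(coeff, root)     # coeff * root (+ equals - here)
        poly = nxt
    # The product over a full conjugate class has coefficients in {0, 1}.
    assert all(coeff in (0, 1) for coeff in poly)
    return sum(coeff << d for d, coeff in enumerate(poly))


def poly_str(mask):
    """Format a GF(2) polynomial bit mask, highest degree first."""
    terms = []
    for d in range(mask.bit_length() - 1, -1, -1):
        if (mask >> d) & 1:
            terms.append("1" if d == 0 else "y" if d == 1 else f"y^{d}")
    return " + ".join(terms) if terms else "0"


if __name__ == "__main__":
    print(bch_bounds(4, 3))
    for i in (0, 1, 3, 5, 7):
        print(coset(i), poly_str(minimal_poly(i)))
```

Note that `bch_bounds` returns only the lower bounds given by the equations; the actual number of message bits follows from the degree of the generator polynomial.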

[Table II. Power representation and polynomial representation of the Galois field elements]
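The correspondence between the power representation and the polynomial representation of the field elements can be generated mechanically. The sketch below assumes $GF(2^4)$ with the primitive polynomial $y^4 + y + 1$, which is an assumption taken from the minimal polynomial of $\beta$. The function names `power_table` and `as_poly` are hypothetical.

```python
PRIM = 0b10011  # y^4 + y + 1, assumed primitive polynomial of GF(2^4)


def power_table(prim=PRIM, m=4):
    """Return a list of (i, value) where value is beta^i as an m-bit integer."""
    rows = []
    v = 1
    for i in range((1 << m) - 1):
        rows.append((i, v))
        v <<= 1                 # multiply by beta
        if v & (1 << m):        # reduce modulo the primitive polynomial
            v ^= prim
    return rows


def as_poly(value, var="beta"):
    """Format an element in polynomial representation, highest power first."""
    terms = []
    for d in range(value.bit_length() - 1, -1, -1):
        if (value >> d) & 1:
            terms.append("1" if d == 0 else var if d == 1 else f"{var}^{d}")
    return " + ".join(terms) if terms else "0"


if __name__ == "__main__":
    for i, value in power_table():
        print(f"beta^{i:<2} = {as_poly(value):<28} ({value:04b})")
```

Each row is obtained from the previous one by a left shift and, when the degree reaches 4, an XOR with the primitive polynomial, which is the same shift-and-reduce step a hardware linear feedback shift register would perform.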
The USB interface, also known as a data interface, enables communication between computers and peripherals. It can also supply power to some peripherals such as flash memory sticks and disk drives. It is easy to use, low in cost, available in different sizes and provides a powerful connection system. The USB standard has several versions, and the data transfer speed varies from version to version, ranging from megabits per second (Mbps) to gigabits per second (Gbps). Fig. 3 presents the block diagram of the SIE block. The SIE block handles all receiving and sending of data in the form of transactions. It typically detects all incoming packets; sends data, handshake and token packets; detects and generates reset, start-of-packet, end-of-packet and resume signaling information; decodes and encodes the data on the bus in the required form; generates the packet identifiers (PIDs); and converts the serial data of USB to parallel data in registers or memory and vice versa. The UTMI block takes care of low-level USB signaling and protocol. The main aim of the UTMI block is to make the data compatible with the USB protocol and to shift the clock domain of the input or data signal from the USB rate so that it is compatible with the central processing unit (CPU).
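As an illustration of the PID handling performed by the SIE, the sketch below builds and checks a PID byte. In USB, the PID is a 4-bit code transmitted together with its bitwise complement, so a receiver can reject a corrupted identifier. The 4-bit codes are those defined by the USB 2.0 specification; the function names `make_pid_byte` and `check_pid_byte` are hypothetical and do not come from the proposed design.

```python
# 4-bit PID codes from the USB 2.0 specification (subset).
PID_CODES = {
    "OUT": 0b0001, "IN": 0b1001, "SOF": 0b0101, "SETUP": 0b1101,
    "DATA0": 0b0011, "DATA1": 0b1011,
    "ACK": 0b0010, "NAK": 0b1010, "STALL": 0b1110,
}


def make_pid_byte(code):
    """Pack a 4-bit PID in the low nibble and its complement in the high nibble."""
    code &= 0xF
    return ((~code & 0xF) << 4) | code


def check_pid_byte(byte):
    """Return the 4-bit PID if the check nibble matches, otherwise None."""
    low = byte & 0xF
    high = (byte >> 4) & 0xF
    return low if high == (~low & 0xF) else None


if __name__ == "__main__":
    for name, code in PID_CODES.items():
        print(f"{name:<6} -> 0x{make_pid_byte(code):02X}")
```

A single flipped bit in either nibble makes the two halves disagree, so `check_pid_byte` returns `None` and the packet would be ignored in this model.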

Proposed Design
Fig. 4 shows the architecture for efficient communication between multiple FPGAs. The architecture uses a FIFO queue and an SIE block with a USB transceiver. The FPGAs communicate with each other over the USB physical layer through the general-purpose input/output (GPIO) ports present on the FPGAs. In the proposed design, we introduce BCH error detection and correction blocks before the SIE blocks, which ensures fault-tolerant communication among multiple FPGAs. Fig. 5 represents the system-level architecture of the transmitting unit of the proposed high-performance fault-tolerant FPGA-to-FPGA communication system. On the transmitter (first FPGA) board, the data to be transmitted is stored in flash memory and then sent to the asynchronous FIFO queue. Before reaching the SIE block, the data passes through the BCH encoder block, which encodes the data that is to be transmitted to the other FPGA through the USB interface. The BCH encoder block takes the binary data as input and encodes it over the Galois field. The encoded signal is sent to the SIE block, which transmits it over the USB interface.

Here $p = 3$ and

$M(y) = \mathrm{LCM}[f_1(y), f_2(y), f_3(y), f_4(y), f_5(y)]$

From Table I,

$M(y) = \mathrm{LCM}[(y^4+y+1), (y^4+y+1), (y^4+y^3+y^2+y+1), (y^4+y+1), (y^2+y+1)]$
$M(y) = (y^4+y+1)(y^4+y^3+y^2+y+1)(y^2+y+1)$
$M(y) = y^{10}+y^8+y^5+y^4+y^2+y+1$

Using the encoder algorithm from Fig. 1:

Step 1: $F(y) = 1$
Step 2: $B(y) = F(y) \cdot M(y) = y^{10}+y^8+y^5+y^4+y^2+y+1$

The digital signal B = [000010100110111] is the output of the BCH encoder block. This output is sent to the SIE block, which converts the signal B into the USB-supported format and sends it to the other (receiver) FPGA over the physical USB cable. Suppose that three errors are introduced into the transmitted signal B during transmission, so that B changes to [010010100010011].
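The polynomial arithmetic of this example can be checked with a few lines of code. The sketch below multiplies the three distinct minimal polynomials over GF(2) (because they are irreducible and distinct, their LCM equals their product) and compares the result with the bit vectors quoted above. It assumes the message polynomial $F(y) = 1$, which is the choice that reproduces the quoted signal B, and the non-systematic encoding $B(y) = F(y) \cdot M(y)$. The helper names are hypothetical.

```python
def gf2_mul(a, b):
    """Carry-less product of two GF(2) polynomials stored as integers."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out


F1 = 0b10011  # y^4 + y + 1           (roots beta, beta^2, beta^4, beta^8)
F3 = 0b11111  # y^4 + y^3 + y^2 + y + 1
F5 = 0b111    # y^2 + y + 1

# Generator polynomial M(y): product of the distinct minimal polynomials.
M = gf2_mul(gf2_mul(F1, F3), F5)


def encode(message):
    """Non-systematic encoding B(y) = F(y) * M(y)."""
    return gf2_mul(message, M)


def to_bits(word, n=15):
    """Bit string of length n, coefficient of y^(n-1) first."""
    return format(word, f"0{n}b")


def flip(word, positions):
    """Flip the coefficients of y^pos for every pos in positions."""
    for pos in positions:
        word ^= 1 << pos
    return word


if __name__ == "__main__":
    B = encode(0b1)                 # assumed message F(y) = 1
    received = flip(B, [13, 5, 2])  # three transmission errors
    print(to_bits(M, 11))
    print(to_bits(B))
    print(to_bits(received))
```

The product evaluates to $y^{10}+y^8+y^5+y^4+y^2+y+1$, whose coefficients are exactly the quoted bit vector B, and flipping the coefficients of $y^{13}$, $y^5$ and $y^2$ yields the quoted corrupted word.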
We need to detect the erroneous bits in the received signal and correct the bits affected by transmission errors. Otherwise, erroneous data will be received by the other FPGA, which leads to incorrect data processing. To overcome this problem, we introduce a BCH decoder block in the receiver FPGA after the SIE block, as represented in Fig. 6. As the transmitted data arrives over the USB cable, the UTMI interface collects the data and sends it to the SIE block.

Step 1: The received signal is $J(y) = y^{13}+y^{10}+y^8+y^4+y+1$
Step 2: Here $p = 3$. Total number of syndromes $(C) = 2 \times 3 = 6$, $R = 2$
Step 3: We get the syndrome values $N_1, N_2, N_3, N_4, N_5$ and $N_6$ by using Table II.
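The syndrome step can be sketched in software. Each syndrome is the received polynomial evaluated at a power of $\beta$, $N_i = J(\beta^i)$ for $i = 1, \dots, 6$, and all six are zero exactly when the word is a valid code word. The sketch assumes $GF(2^4)$ built from $y^4 + y + 1$. For error correction it uses an exhaustive search over at most $p = 3$ bit flips, which is a simple stand-in chosen for clarity and not the decoder algorithm of the proposed design. All names are hypothetical.

```python
from itertools import combinations

PRIM = 0b10011        # y^4 + y + 1, assumed primitive polynomial of GF(2^4)
M_DEG = 4
N = (1 << M_DEG) - 1  # code length 15
P = 3                 # number of correctable errors

# EXP[i] holds beta^i as a 4-bit integer.
EXP = [1] * N
for i in range(1, N):
    v = EXP[i - 1] << 1
    if v & (1 << M_DEG):
        v ^= PRIM
    EXP[i] = v


def syndromes(word, p=P):
    """Return [N_1, ..., N_2p] with N_i = J(beta^i) for the 15-bit word J."""
    out = []
    for i in range(1, 2 * p + 1):
        s = 0
        for pos in range(N):
            if (word >> pos) & 1:
                s ^= EXP[(i * pos) % N]  # add (beta^i)^pos
        out.append(s)
    return out


def correct(word, p=P):
    """Exhaustive-search decoder: try every pattern of at most p bit flips.

    Returns (corrected_word, error_positions), or (None, None) if no
    pattern of weight <= p leads to an all-zero syndrome vector.
    """
    if not any(syndromes(word, p)):
        return word, ()
    for weight in range(1, p + 1):
        for positions in combinations(range(N), weight):
            candidate = word
            for pos in positions:
                candidate ^= 1 << pos
            if not any(syndromes(candidate, p)):
                return candidate, positions
    return None, None


if __name__ == "__main__":
    J = 0b010010100010011  # received word from the example
    print(syndromes(J))
    fixed, errors = correct(J)
    print(format(fixed, "015b"), errors)
```

Because the minimum distance of the code is at least 7, at most one code word lies within three bit flips of any received word, so the search result is unambiguous. A hardware decoder would normally solve for the error locations algebraically instead of searching.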

Simulation Results and Analysis
The proposed Fault Tolerance FPGA-to-FPGA Communication architecture is synthesized and simulated with Vivado Design Suite 2018.3, and communication was carried out between two Kintex-7 FPGA boards.
This paper mainly concentrates on performance and energy. Performance is calculated in terms of throughput, frequency, bit length and area. Throughput, one of the most significant design metrics in an FPGA, is the amount of information transferred per unit of time. The throughput of the processor is influenced by the operating frequency and the bandwidth of the communication channels. Speed, on the other hand, can be described in terms of latency, which can be expressed as the sum of sender overhead, transport latency and receiver overhead. Area is an important criterion for an architecture. It can be defined in terms of the number of transistors, CLBs, memory and wire length. For the FPGA platform in particular, the number of CLBs is generally used as the area estimate.
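In symbols, the two definitions above can be written as follows. The symbol names are assumptions introduced only for this illustration: $N_{bits}$ is the number of bits transferred in a time interval $T_{transfer}$, and the three terms on the right of the second equation are the sender overhead, the transport latency and the receiver overhead.

```latex
\[
\text{Throughput} = \frac{N_{bits}}{T_{transfer}},
\qquad
T_{latency} = T_{sender} + T_{transport} + T_{receiver}
\]
```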
Interconnect utilization is the amount or percentage of time that the wire is carrying data. Power is the measure of energy consumption in the interconnect wires.
Based on the published results, we have summarized the performance measurements for various FPGA-to-FPGA communication methods in Table III. The proposed method offers a wide range of throughput, and its maximum throughput significantly outperforms the other FPGA-to-FPGA communication methods, as shown in Fig. 6. To compare our proposed architecture with the state-of-the-art methods, we have collected energy results from four different FPGA-to-FPGA implementations, tabulated in Table IV and shown in Fig. 7. The results show that the energy efficiency of our proposed FPGA-to-FPGA communication method is on average 1.12 times, 1.3 times and 1.5 times higher than that of the state-of-the-art FPGA-to-FPGA communication methods, respectively.

Conclusion
FPGA-to-FPGA communication architectures play a crucial role in determining the performance and energy consumption of platform FPGAs containing embedded coarse-grain modules. In this paper, an efficient Fault Tolerance FPGA-to-FPGA Communication architecture is studied. The design of an efficient fault-tolerant communication architecture is a challenging multi-objective optimization problem. We have efficiently implemented FPGA-to-FPGA data transfer. The results show that the energy efficiency of our proposed method is on average 1.12 times, 1.3 times and 1.5 times higher than that of the state-of-the-art FPGA-to-FPGA communication methods, respectively. In future work, we will look into ways to further improve the search process and explore wireless data transfer among FPGA boards.

Declaration:
Conflict of interest: The authors declare that they have no conflict of interest.