An Efficient FIR Filter Based on Hardware Sharing Architecture Using CSD Coefficient Grouping for Wireless Application

The FIR filter is an essential part of digital signal processing and is used extensively in areas such as wireless applications and digital processing systems. The FIR filter is inherently stable and has a linear phase characteristic under symmetric conditions, but its implementation often involves high complexity and a large filter length to meet specific design requirements. In this paper, the complexity of the FIR filter is reduced by eliminating repeated subexpressions in filter operations based on the canonic signed digit (CSD) number system. A new grouping method is proposed for CSD-based filter coefficients to minimize the number of unpaired nonzero bits in the filter coefficient. A statistical analysis of the proposed grouping method is performed and compared with other existing schemes. The number of unpaired nonzero bits in the proposed grouping scheme is reduced by an average of 24.11% compared to other existing schemes. Further, an efficient FIR filter with hardware sharing architecture is designed and implemented; it achieves a 14.65% reduction in average power consumption and a 10.1% increase in average operation speed compared to other existing filter structures.


Introduction
The digital filter is one of the most important processing elements in any digital signal processing (DSP) system. FIR and IIR filters are the two basic structures used in DSP. Compared with the IIR filter, the FIR filter is the more widely applicable processing element in modern digital applications such as multimedia, mobile communication, and wireless communication, because of its linear phase characteristic under symmetric conditions, precision, adaptability using a programmable processor, and exact reproducibility. A digital system can be realized using a DSP processor or a customized hardware circuit fabricated using very large scale integrated (VLSI) circuits, and it can process both real-time and offline data. The FIR filter can be designed using various structures, each representing a different implementation with the same functionality. For a limited number of taps, the data broadcast structure shown in Fig. 1 is preferred over other existing filter structures because of its constant data arrival time and pipelined data flow. Advanced wireless communication devices, such as modern RF receivers and battery-powered portable transceivers, require multi-standard communication facilities with the least amount of overlap between multiple input radio channels. A sharp digital filter with a very narrow transition width enables alias-free switching between the desired frequency bands. Thus, the low pass filter employed in wireless communication must operate at high speed and consume low power, which can be achieved by using an efficient, low-power FIR filter architecture. This kind of filter implementation has gained increasing attention in the last decade. A number of algorithms have been presented in the literature for low-power coefficient multiplication and for fast FIR filters with an optimized number of add and shift operations [1,2].
A low-complexity FIR filter based on a binary subexpression elimination method has been presented to meet adjacent-channel attenuation specifications and the related narrow transition bands, particularly for wireless applications. The computation in an FIR filter depends mainly on coefficient multiplication and on the partial products used in the multiplier, which govern the silicon area and power requirement of the filter operation [3].
T. Sinha and Bhaumik [4] presented a modified multistage frequency response masking based FIR filter structure to reduce arithmetic operations. Vinod et al. [5,6] introduced a standard subexpression elimination method that reduces hardware requirements by eliminating the redundant set of computations in the fixed coefficient FIR filter. A comparative analysis was made of the hardware reduction achieved using the horizontal common subexpression (HCS) and the vertical common subexpression (VCS) in digital filter realization. It was found that HCS provides a better reduction in hardware requirements than VCS based filter operation. Chang et al. [7] proposed an efficient hardware architecture for the FIR filter using a CSD multiplier with a runtime recovery strategy. A novel structure of the product accumulation section based FIR filter was presented by relocating the existing delays into the structural adders [8]. Jiang et al. [9] proposed a method of FIR filter design for multicarrier modulation systems. A new programmable CSD encoding structure was outlined to enable high-speed, low-power filter operations. An algorithm was described to reduce the number of CS for coefficient multiplication in filter operation [10][11][12]. Mahesh et al. [13] presented a low-power, high-order FIR filter based on a new CSE method. Signed digit representations of the filter coefficient, such as CSD, were analyzed, and it was found that HCS and VCS cannot be fully exploited because of the opposite signs of the CSD coefficient digits. Another issue with the signed digit number system compared to the binary number system is the subtraction operation in the coefficient multiplication process. The approach in [13] therefore uses a binary representation of the coefficient, which reduces the hardware resources in filter operation.

Fig. 1 Block diagram of L-tap data broadcast FIR filter
An optimization-based FIR filter approach was presented that achieves low power consumption and reduced ripples in the passband and stopband [14,15]. Yao et al. [16] introduced a novel CSE scheme for a fixed-point FIR filter. In this method, the coefficient multiplier is realized using add and shift operations, and the number of adders required is directly related to the number of nonzero digits present in the filter coefficient. Hameed et al. [17] demonstrated a sharp programmable analog FIR filtering response with low power consumption. Seshadri et al. [18] suggested a fast moving average FIR filter using look-ahead arithmetic with reduced pipeline delay and lower hardware complexity. A new design algorithm using an extended double-base number system has been discussed for low-complexity FIR filters [19,20]. Touli et al. [21] presented a combination of multibit flip-flops and a data-driven clock gating approach for a power-efficient filter structure. Patali et al. [22] developed a high-throughput filter using a pipelining and retiming structure. This design was used for denoising ECG signals. Another way of implementing an efficient FIR filter was proposed by Jia et al. [23], in which a novel CSD coefficient grouping scheme was used to remove the redundant set of computations in coefficient multiplication. An efficient filter architecture was proposed based on this new grouping scheme, which is advantageous in terms of silicon area, power consumption, and speed for FIR filters of large length. Vishal et al. [24] presented a framework for an FIR decimator using hybrid signed digit arithmetic. A novel FIR filter using a retimed MAC unit with delay optimization was described by Subathradevi et al. [25].
In this paper, we propose a new grouping method for the nonzero digits present in the CSD-encoded filter coefficient. The number of redundant computations in coefficient multiplication is further reduced, and a statistical analysis of the method is presented. Furthermore, a reconfigurable filter architecture based on the hardware sharing structure is illustrated, which minimizes the logic resources.
The remaining part of the paper is organized as follows. A new grouping method for CSD number system based filter coefficients is shown in Sect. 2. Section 3 presents the statistical analysis of all possible common subexpression patterns based on the proposed grouping method in a 16-bit quantized filter coefficient. A reconfigurable FIR filter based on hardware sharing architecture is described in Sect. 4, where the layout of the novel preprocessing unit is also illustrated. The performance evaluation of the proposed efficient FIR filter with hardware sharing is described in Sect. 5. The conclusion of the paper is provided in Sect. 6.

FIR Filter Using CSD
A linear, time-invariant, and nonrecursive finite impulse response (FIR) filter is described by the difference equation

$$y[n] = \sum_{i=0}^{L-1} b_i\, x[n-i] \qquad (1)$$

where $L$ is the filter length (so that the filter order is $L-1$), $y[n]$ is the output signal, $b_i$ is the $i$th filter coefficient, and $x[n-i]$ is the input signal sampled $i$ instants earlier. Each coefficient $b_i$ can further be encoded in the CSD number system as

$$b_i = \sum_{j=1}^{M_i} C_{ij}\, 2^{-S_{ij}} \qquad (2)$$

where $M_i$ represents the total number of nonzero bits in the coefficient $b_i$, $C_{ij} \in \{-1, 1\}$, and $S_{ij} \in \{0, \ldots, W-1\}$, where $W$ is the word length of the quantized coefficient. Equation (1) may therefore be rearranged as

$$y[n] = \sum_{i=0}^{L-1} \sum_{j=1}^{M_i} C_{ij}\, 2^{-S_{ij}}\, x[n-i] \qquad (3)$$
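The equivalence between the difference equation (1) and its CSD form can be illustrated with a minimal Python sketch. The function names, the encoding of a coefficient as a list of (C, S) pairs, and the 3-tap example values are all illustrative assumptions and are not taken from the paper; exact rational arithmetic is used so that the two forms can be compared without rounding.

```python
from fractions import Fraction

def fir_direct(b, x):
    """y[n] = sum_i b[i] * x[n - i], taking x[n] = 0 for n < 0."""
    return [sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
            for n in range(len(x))]

def csd_value(terms):
    """Value of one coefficient given as (C, S) pairs: sum of C * 2**-S."""
    return sum(Fraction(c, 2 ** s) for c, s in terms)

def fir_csd(b_terms, x):
    """Same filter, with each product built from signed, shifted inputs."""
    y = []
    for n in range(len(x)):
        acc = Fraction(0)
        for i, terms in enumerate(b_terms):
            if n - i < 0:
                continue
            for c, s in terms:  # one add or subtract per nonzero digit
                acc += c * Fraction(x[n - i], 2 ** s)
        y.append(acc)
    return y

# Hypothetical symmetric 3-tap filter; each coefficient is a list of (C, S).
b_terms = [[(1, 2), (-1, 4)], [(1, 1), (1, 3)], [(1, 2), (-1, 4)]]
b = [csd_value(t) for t in b_terms]
x = [8, -3, 5, 0, 2]
assert fir_direct(b, x) == fir_csd(b_terms, x)
```

In this form the cost of a tap is visible directly: a coefficient with M nonzero digits needs M shifted terms, hence M − 1 additions or subtractions to combine them.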

CSD Arithmetic
It is well known that the number of additions performed in a coefficient multiplication is one less than the number of nonzero bits in the filter coefficient $b_i$. The constant coefficient can be represented in a number system with a minimum number of nonzero bits to reduce the power demand and silicon area. Since the CSD representation contains a minimum number of nonzero bits, it is preferred for representing the filter coefficient. The CSD number system is a special case of the radix-2 signed digit representation, in which each digit is taken from the set {1, 0, −1}. The salient properties of the CSD number representation are as follows:
• The CSD representation of a number is unique.
• The product of any two adjacent digits is zero.
• The CSD representation of a number has the least possible number of nonzero bits.
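The properties listed above can be checked with a short Python sketch of a CSD encoder. This is a generic non-adjacent-form conversion for non-negative integers, not the encoder used in the paper; the function names are illustrative, and a fractional coefficient is assumed to have been scaled to an integer beforehand.

```python
def to_csd(value, width):
    """Return the CSD digits of a non-negative integer, most significant first.

    Each digit is -1, 0 or 1. An N-bit unsigned value may need N + 1 digits,
    so `width` must be chosen accordingly.
    """
    digits = []
    n = value
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    if len(digits) > width:
        raise ValueError("width too small for this value")
    digits += [0] * (width - len(digits))
    return digits[::-1]

def from_csd(digits):
    """Evaluate most-significant-first signed digits back to an integer."""
    v = 0
    for d in digits:
        v = 2 * v + d
    return v

# 23 = 32 - 8 - 1, so three nonzero digits instead of four in binary (10111).
assert to_csd(23, 6) == [1, 0, -1, 0, 0, -1]
for v in range(256):
    d = to_csd(v, 9)
    assert from_csd(d) == v                          # round trip
    assert all(a * b == 0 for a, b in zip(d, d[1:]))  # no adjacent nonzeros
```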
For an $N$-bit CSD number, the probability that a given digit is nonzero tends to one-third as $N$ becomes larger, whereas for an $N$-bit binary number the probability that a particular bit is nonzero is 1/2.
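The one-third versus one-half comparison can be checked numerically. The sketch below is an illustration only: it averages over all unsigned N-bit integers (an assumption about the population, which the text does not specify) and uses a known bit trick in which the nonzero CSD digit positions of n are the set bits of (n >> 1) XOR (n + (n >> 1)). The function names are hypothetical.

```python
def csd_nonzero_mask(n):
    """Bit mask of the nonzero CSD digit positions of a non-negative integer."""
    half = n >> 1
    return half ^ (n + half)

def nonzero_density(num_bits):
    """Average fraction of nonzero digits over all num_bits-bit unsigned values.

    Returns a (csd_density, binary_density) pair, both normalized by num_bits.
    """
    total = 1 << num_bits
    csd = sum(bin(csd_nonzero_mask(n)).count("1") for n in range(total))
    binary = sum(bin(n).count("1") for n in range(total))
    return csd / (total * num_bits), binary / (total * num_bits)

csd_density, binary_density = nonzero_density(16)
print(f"CSD: {csd_density:.3f}, binary: {binary_density:.3f}")
```

For finite N the CSD density sits somewhat above one-third and moves toward it as N grows, while the binary density is exactly one-half for this population.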
The main objective of this work is to demonstrate a unique method that considers a group of two nonzero bits separated by two zeros in coefficient multiplication. In previous work [23], two prime patterns [...]. The flowchart of the proposed grouping method for the CSD-based 16-bit quantized filter coefficient $b_i$ = [1 0 −1 0 0 1 0 0 −1 0 0 0 1 0] is shown in Fig. 3. The common subexpressions are extracted as follows:
(1) The nonzero bits in the CSD filter coefficient are noted.
(2) Two nonzero bits separated by one zero bit are considered as the I CS pattern.
(3) Two nonzero bits separated by two zero bits are selected as the II CS pattern.
(4) If there are more than two zero bits between two nonzero bits, each nonzero bit is considered an independent group, the III CS pattern.
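The four steps above can be sketched in Python as follows. The flowchart of Fig. 3 is not reproduced here, so the greedy left-to-right scan (pairing each nonzero digit with its next neighbour when possible) is an assumption, as are the function name and the tuple format of the result.

```python
def group_csd(digits):
    """Group the nonzero digits of a CSD coefficient into patterns I, II, III.

    `digits` is a most-significant-first list over {-1, 0, 1}. Each result is
    ("I", i, j) for one zero between digits i and j, ("II", i, j) for two
    zeros, or ("III", i) for an unpaired nonzero digit.
    """
    nonzero = [i for i, d in enumerate(digits) if d != 0]   # step (1)
    groups = []
    k = 0
    while k < len(nonzero):
        if k + 1 < len(nonzero):
            zeros_between = nonzero[k + 1] - nonzero[k] - 1
            if zeros_between == 1:                          # step (2)
                groups.append(("I", nonzero[k], nonzero[k + 1]))
                k += 2
                continue
            if zeros_between == 2:                          # step (3)
                groups.append(("II", nonzero[k], nonzero[k + 1]))
                k += 2
                continue
        groups.append(("III", nonzero[k]))                  # step (4)
        k += 1
    return groups

# Example coefficient from the text.
b_i = [1, 0, -1, 0, 0, 1, 0, 0, -1, 0, 0, 0, 1, 0]
print(group_csd(b_i))
# [('I', 0, 2), ('II', 5, 8), ('III', 12)]
```

Under this scan the example coefficient yields one pattern of each kind, leaving a single unpaired nonzero bit.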
The grouping patterns should be computed first and then applied to each tap to complete the local computation. As a result, the coefficient $b_i$ may be expressed as

Statistical Analysis of Proposed Grouping Method for CSD Based Filter Coefficient
A new common subexpression elimination method for CSD number system based filter coefficients is proposed. The main focus of this method is to find and remove the redundant computations in filter operation. In previous works, most common subexpression elimination methods have been applied to filter coefficients in either the CSD or the binary number system. The CSD coefficient is the more popular choice because it has about 30% fewer nonzero bits than the binary coefficient. Thus, fewer hardware resources are required to realize the coefficient multiplier than with the binary representation [14]. The grouping method in the filter coefficient further reduces the hardware requirement in the realization of the coefficient multiplier [25]. This grouping method offers enhanced computational efficiency owing to the hardware sharing structure. The hardware sharing method is influenced by three main factors, viz.
• The total number of nonzero bits present in filter coefficient.
• The number of common subexpression (CS) generated from the nonzero bits.
• The number of unpaired nonzero bits that do not contribute to the CSE.
As reported in [12], the number of unpaired nonzero bits of the filter coefficient mainly affects the hardware requirement. Next, the number of CS of the filter coefficient influences the hardware sharing method. Both of these factors depend on the number of nonzero bits present in the filter coefficient. In this work, sample FIR filters were designed with different passband frequencies ($\omega_p$) and stopband frequencies ($\omega_s$) in normalized mode, i.e., distributed over the range (0, 1). For the designed low pass FIR filters, a narrow transition band with $|\omega_s - \omega_p|$ varying from 0.01 to 0.05 and a relatively large transition band with $|\omega_s - \omega_p|$ varying from 0.15 to 0.20 were considered. Narrowband filters are frequently used in wireless applications to meet stringent adjacent-channel attenuation requirements.
The statistical analysis of the proposed grouping method was performed on the coefficients of sample low pass FIR filters of lengths 20, 40, 80, 120, 200, 300, 400, 500, and 600 taps with a 16-bit wordlength. The number of adders/subtractors depends mainly on the unpaired nonzero bits in the filter coefficient. The number of unpaired nonzero bits in the CSD number system based filter coefficient is analyzed, for which Table 1 is used. A binary coefficient shows the minimum number of unpaired nonzero bits, but it contains a large number of nonzero bits. Thus, a binary coefficient involves a large number of arithmetic computations. In the CSD representation, by contrast, the proposed grouping method yields 24.11% fewer unpaired nonzero bits than the other existing grouping pattern, resulting in a reduction in hardware requirement, as shown in Table 2. The number of adders/subtractors in filter operation is further reduced by the hardware sharing structure.
For the proposed grouping method, the occurrence of common subexpressions of the new grouping pattern is also analyzed. The sample filters are designed with filter lengths varying from 20 to 600 taps. For each filter order, six sample filters are designed with various passband ($\omega_p$) and stopband ($\omega_s$) frequencies in normalized mode, as defined earlier. The number of CS of each sample filter is calculated, and the results are shown in Fig. 5. It is found that the number of CS of the proposed grouping pattern increases with increasing filter order for a narrow transition band.
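The kind of counting described in this section can be sketched in Python as below. Several choices here are assumptions and do not come from the paper: the Hamming-windowed sinc design with the cutoff midway between the passband and stopband edges, the scaling of coefficients by 2^(W−1) before rounding, the greedy pairing of neighbouring nonzero digits, and all function names. The sketch therefore illustrates the procedure and will not reproduce the reported figures.

```python
import math

def lowpass_taps(num_taps, wp, ws):
    """Hamming-windowed sinc low-pass filter (assumed design method).

    wp and ws are band edges normalized to (0, 1), with 1 at Nyquist.
    """
    fc = (wp + ws) / 2.0
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = fc if t == 0 else math.sin(math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def nonzero_positions(magnitude):
    """Bit positions (high to low) of the nonzero CSD digits of an integer >= 0."""
    half = magnitude >> 1
    mask = half ^ (magnitude + half)
    return [p for p in range(mask.bit_length() - 1, -1, -1) if (mask >> p) & 1]

def count_groups(positions):
    """Return (pairs, unpaired); a pair has one or two zeros between its digits."""
    pairs = unpaired = 0
    k = 0
    while k < len(positions):
        if k + 1 < len(positions) and positions[k] - positions[k + 1] in (2, 3):
            pairs += 1
            k += 2
        else:
            unpaired += 1
            k += 1
    return pairs, unpaired

def analyze(num_taps, wp, ws, wordlength=16):
    """Total (pairs, unpaired, nonzero digits) over all quantized coefficients."""
    scale = 1 << (wordlength - 1)
    pairs = unpaired = nonzeros = 0
    for h in lowpass_taps(num_taps, wp, ws):
        q = abs(round(h * scale))  # the sign only flips the digit signs
        pos = nonzero_positions(q)
        p, u = count_groups(pos)
        pairs, unpaired, nonzeros = pairs + p, unpaired + u, nonzeros + len(pos)
    return pairs, unpaired, nonzeros

for taps in (20, 40, 80):
    print(taps, analyze(taps, 0.20, 0.25))
```

Every nonzero digit ends up either in a pair or unpaired, so the three totals always satisfy 2 × pairs + unpaired = nonzero digits, which is a convenient sanity check when comparing grouping rules.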

Proposed Reconfigurable FIR Filter with Hardware Sharing Architecture
A reconfigurable FIR filter with hardware sharing architecture, based on the analysis performed in Sect. 3, is presented in Fig. 6. In a digital filter, coefficient multiplication can be performed efficiently using shift and add operations. In order to reduce the computational load due to redundant operations in constant multiplication, [...] [1]. The filter section is used to further process the partial products generated by the preprocessing unit. All the filter sections in the filter structure are identical. In Fig. 6, x[n] is used as the input to the preprocessing unit to produce the partial products $Y_1$ to $Y_5$. These partial products are shared with all filter sections. Each filter section computes the coefficient multiplication by using shifts and add/subs

Layout of Preprocessing Unit
The preprocessing unit is an integral part of the filter architecture. This circuit generates the partial products that are required in coefficient multiplication. In the proposed new extended grouping scheme, there are five common subexpressions, which are expressed in terms of the intermediate signals $x_1$ and $x_2$.
A separate layout of the preprocessing unit is shown in Fig. 7. Here, ≫ 2 denotes an arithmetic right shift by 2 bits, which is equivalent to scaling by $2^{-2}$, and similarly for ≫ 3. In the given layout, x denotes the input signal. This input signal is fed to the preprocessing unit, which computes the five output signals corresponding to the five common subexpressions. In this case, two adders, two subtractors, and 2-bit/3-bit shifters are required to design the preprocessing unit. The precomputed terms $Y_1$, $Y_2$, $Y_3$, $Y_4$, and $Y_5$ are fed to the corresponding five buses of the filter section. Thus, the filter output is computed by using the preprocessing unit and the filter section.
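A minimal Python sketch of such a preprocessing unit is given below. The component count stated above (two adders, two subtractors, and 2-bit/3-bit shifters) suggests that the five shared terms are x, x ± (x ≫ 2), and x ± (x ≫ 3), but the assignment of these terms to the names Y1 to Y5 is an assumption, as are the function names, the triple format used to describe a tap, and the convention that the leading digit of the example coefficient has weight 2^0.

```python
def preprocess(x):
    """Shared partial products computed once per input sample (integer x).

    Assumed mapping: Y1 = x, Y2 = x + (x >> 2), Y3 = x - (x >> 2),
    Y4 = x + (x >> 3), Y5 = x - (x >> 3).
    """
    x1 = x >> 2  # arithmetic right shift by 2, i.e. scaling by 2**-2
    x2 = x >> 3  # arithmetic right shift by 3, i.e. scaling by 2**-3
    return {"Y1": x, "Y2": x + x1, "Y3": x - x1, "Y4": x + x2, "Y5": x - x2}

def tap_product(groups, shared):
    """Combine shared terms for one tap.

    `groups` is a list of (sign, name, shift) triples; each contributes
    sign * (shared[name] >> shift).
    """
    return sum(sign * (shared[name] >> shift) for sign, name, shift in groups)

# Coefficient [1 0 -1 0 0 1 0 0 -1 0 0 0 1 0] read as
# (1 - 2**-2) + 2**-5 * (1 - 2**-3) + 2**-12, i.e. Y3 + (Y5 >> 5) + (Y1 >> 12).
x = 1 << 15
shared = preprocess(x)
y = tap_product([(1, "Y3", 0), (1, "Y5", 5), (1, "Y1", 12)], shared)
assert y == x - (x >> 2) + (x >> 5) - (x >> 8) + (x >> 12)
```

With this arrangement the tap needs two additions to combine three shared terms, instead of four for its five nonzero digits, and the shared terms are computed once for all taps.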

Performance Evaluation
The proposed reconfigurable FIR filter with hardware sharing architecture is designed and synthesized for the FPGA family: Virtex 6, Device XC6vLx-75tl, Package-1Lff484, which is a widely used logic family with large packing capacity and low power requirement. Sample FIR filters of order varying from 50 to 600 are designed with narrow transition bands for different normalized passband and stopband frequencies, $\omega_p$ and $\omega_s$ respectively. Next, the proposed FIR filter design with the new grouping scheme is compared to other existing FIR filters based on the common grouping method.

Comparison of Unpaired Nonzero Bits and Number of Common Subexpression in Filter Coefficient
As mentioned earlier, this paper proposes a new grouping scheme based on the CSD number system of the filter coefficient, shown in Fig. 2. The average number of unpaired nonzero bits in the filter coefficient is compared for three cases: (i) the proposed grouping scheme, (ii) an existing grouping method [23], and (iii) the binary number system based filter coefficient, as shown in Fig. 4. The average number of unpaired nonzero bits that do not participate in a common subexpression of the CSD based filter coefficient is analyzed statistically. The result shows that the average number of unpaired nonzero bits that do not contribute to a common subexpression is reduced by 24.11% in the CSD based filter coefficient compared to other existing grouping schemes. This average number of unpaired nonzero bits in the filter coefficient directly determines the number of computations in filter operation. The occurrence of the proposed grouping pattern in the CSD based filter coefficient also increases with higher order of the low pass FIR filter, as shown in Fig. 8. The number of all possible grouping patterns is also increased by 31.4% for a higher-order filter compared to other existing grouping patterns in the filter coefficient.

Fig. 9 Average power consumption of binary and CSD coefficient sample FIR filter

Comparison of Average Power Consumption
The average power consumption of filter coefficient multiplication using the proposed grouping scheme is compared with that of an existing CSD grouping method [23] and of a binary representation of the filter coefficient, as shown in Fig. 9. The results given in Table 3 reveal that the average power consumption of the sample FIR filters is approximately equal for orders below 200 with a narrow transition band, $|\omega_s - \omega_p|$ varying from 0.01 to 0.05. For higher orders above 300 taps, however, it is reduced by 3.42% compared to the other existing grouping pattern for CSD based filter coefficients. Table 4 compares this work with other previously published papers. Further, the average power consumption of the proposed filter design for higher order is reduced by 14.65% compared to the filter design based on the binary representation of the filter coefficient.

Comparison of Clock Frequency
The maximum clock frequency of the proposed FIR filter is shown in Fig. 10. The results show that the maximum clock frequency is comparatively low for the binary coefficient filter design, which can be attributed to the large number of computations in filter operation. As given in Table 5, the maximum working frequency of the proposed layout is increased by 10.1%, because it uses considerably fewer partial products in filter operation.

Conclusion
A new grouping method that considers a group formed by a pair of nonzero bits separated by two zeros was implemented to design an FIR filter architecture with hardware sharing. Compared with an alternative scheme in a design example, the proposed method shows a 24.11% reduction in the number of unpaired nonzero bits, based on a statistical analysis of simulation results. The average power consumption of the proposed filter design for higher order is reduced by 14.65% compared to the filter design based on the binary coefficient and by 3.42% compared to the other existing grouping pattern for CSD based filter coefficients. The operation speed is increased by 10.1% in comparison to other existing grouping methods. Overall, it can be concluded that the proposed scheme offers significant savings in hardware resources and power consumption along with a moderate increase in computation speed. Further studies can consider a narrower transition band and determine the hardware area requirements.