High-speed binary coded decimal digit multipliers with multiple error detection

Decimal arithmetic in the form of binary coded decimal (BCD) numbers is preferred in many financial and commercial applications. BCD multipliers are a key hardware unit for supporting both integer and floating-point decimal arithmetic operations. However, due to the increasing sensitivity of VLSI-based digital designs to environmental effects, BCD multipliers, like other arithmetic circuits, are prone to faults and errors. In addition, multiple errors can occur in current digital systems, which motivates multiple error detection/correction in addition to single error handling. In this paper, digit-by-digit BCD multipliers capable of multiple error detection with low delay overheads are introduced. To show the effectiveness of the proposed combined method, a 4-digit BCD multiplier is presented. Experimental results based on analysis and error injection-based simulations show that, in addition to 100% single error detection, multiple errors can be detected with at least 99.6% probability in the implemented 4-digit BCD multiplier.


Introduction
Nowadays, the binary number system is widely used in computer-based systems. However, there exist applications such as financial, commercial and some scientific computations in which decimal representation is preferred [1,2]. In these applications, binary arithmetic is not suitable because the approximation or truncation in decimal-to-binary conversion may lead to errors in decimal numbers [3]. To perform decimal arithmetic in processors, decimal numbers must be encoded using binary numbers. The common encoding for decimal representation is binary coded decimal (BCD) and its variants. In addition to integers, current applications require decimal floating-point (DFP) numbers. Thus, DFP formats and operations are specified in the revised IEEE standard for floating-point arithmetic (IEEE 754-2008) and, more recently, in IEEE 754-2019 [4]. BCD numbers and arithmetic are required in some parts of DFP operations to produce DFP numbers as well as integer numbers. To speed up decimal operations, some hardware products include decimal floating-point units to perform DFP operations. For example, IBM processors such as z9, z10 and POWER6 support DFP arithmetic [5]. Therefore, enhancing BCD arithmetic is an important issue. Among arithmetic operations, multiplication is very important because it is heavily used in many applications. As a result, decimal multipliers are regarded as a key hardware unit for supporting DFP operations.
In addition to general design parameters, fault-tolerance and error detection/correction are very important because digital circuits in current technologies are vulnerable to different environmental effects. This weakness may lead to multiple faults/errors as well as single faults/errors [6]. Arithmetic operators, including multipliers, are also susceptible to multiple errors in the form of both soft (or transient) and hard (or permanent) errors. However, the occurrence probability of soft errors is much higher than that of hard errors. Therefore, incorporating a cost-efficient method for handling multiple errors in arithmetic operators, especially in multipliers, is of great importance.
So far, many designs and methods have been proposed to enhance BCD or decimal multipliers with respect to area, power or delay. Many of the previous designs and methods are based on different BCD-based coding techniques [7-9] or on optimal adder structures for partial product generation and reduction [10-15] to reduce area, delay or power consumption. Some designs improve the implementation of BCD multipliers on FPGAs [16-18]. In addition, there exist a variety of methods for obtaining efficient binary-to-decimal converters useful for BCD multiplication, such as [11,19-21]. However, with respect to fault-tolerance and error detection/correction, although some error detecting BCD adders exist in reversible and ordinary circuits, such as [22-24], no BCD multiplier has been designed even for single error detection/correction. Consequently, there is no BCD multiplier with multiple error detection capability either.
Due to the lack of BCD multiplier designs with error detection, in this paper, digit-by-digit BCD multipliers are proposed that are capable of multiple error detection with high probabilities in addition to perfect (100%) single error detection, which also makes them self-checking designs. The proposed structures are based on the digit-by-digit decimal multiplier introduced in [11], which is one of the best designs with respect to area and delay and which utilizes efficient binary-to-decimal converters. As a BCD multiplier includes different parts, the proposed BCD multipliers utilize a different self-checking method suited to each part. Each self-checking method on its own is only useful for single error detection. However, the proposed combination of single error detecting methods results in a design with multiple error detection.
The rest of the paper is organized as follows. In Section 2, the basic concepts required for the next sections are described. In Section 3, the structures of the internal components utilized in the proposed BCD multipliers to achieve multiple error detection are explained. Then, in Section 4, the proposed designs are evaluated with respect to area, delay, power and multiple error detection capability. Finally, some conclusions are drawn in Section 5.

Background
In this section, some concepts required for the BCD multiplication and error detection are described.

BCD digit multiplication
There are different BCD coding methods to represent decimal numbers in binary format. However, the basic method is the BCD-8421 coding scheme, in which 8, 4, 2 and 1 are the weights of the bits in a 4-bit BCD digit from left to right, respectively. As the range of BCD digits is [0:9], the product of the one-digit BCD multiplier will be in the range [0:81], which requires two BCD digits for representation. The one-digit BCD multiplier is used as a basic cell in BCD multiplier architectures [3]. This multiplier produces either a direct 2-digit BCD product or a binary result that yields the BCD product after binary-to-BCD conversion.
BCD multiplication is performed in three main stages: partial product generation, partial product reduction and a final summation to obtain the result. In general, multi-digit BCD multipliers can be categorized in different ways. In terms of partial product generation and reduction, these multipliers are divided into sequential, semi-parallel and parallel BCD multipliers [25]. In terms of final result generation, they are divided into two categories, direct BCD multipliers and indirect BCD multipliers [3]. In the former, the result is produced directly in BCD format, whereas in the latter, the result is obtained after a conversion. In this paper, to obtain more speed, parallel and indirect BCD multipliers are used that utilize fast multi-operand BCD adders and converters in different stages.

Error detecting methods
Error detection is one of the main concepts utilized in reliability, which itself contributes to a system's dependability. Most of the basic error detection methods based on hardware redundancy are designed for single error detection. Complete single error detection (detection with 100% probability) helps to obtain a self-checking design in which an output alarm is definitely activated if an erroneous output is produced. Here, only the methods utilized in the proposed designs are introduced.
The simplest method is the duplication with comparison or double modular redundancy (DMR) which requires more than 100% area and power overheads with a few percent delay overhead.
Another error detecting method is the arithmetic residue code. If X and Y are the operands of a binary multiplier and $|i|_m$ means i mod m, in which m is the check modulus, Eq. (1) can be used to check the residues in a multiplier to detect single errors:

$$|X \times Y|_m = \big|\,|X|_m \times |Y|_m\,\big|_m \qquad (1)$$

In the equation above, the left side, as the actual residue, is the residue of the multiplication result, and the right side, as the predicted residue, is computed using the residues of the input operands X and Y. The best choice of residue code is when m is odd and positive. Eq. (1) is stated for binary multiplication. However, it also holds for both one-digit and multi-digit BCD multipliers. In addition, the overheads of residue codes can be much lower than 100% in many cases, depending on the design size.
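The residue check described above can be illustrated with a minimal behavioural sketch in Python. It assumes the check compares the actual residue of the product with the residue predicted from the operands; the function names `residue`, `residue_check` and `flip_bit` are illustrative and do not come from the paper. A single inverted bit changes the product by a power of two, which is never a multiple of 3, so a modulo-3 check flags every single-bit error on the product.

```python
def residue(value: int, m: int = 3) -> int:
    """Return |value|_m, the residue of value modulo the check modulus m."""
    return value % m


def residue_check(x: int, y: int, product: int, m: int = 3) -> bool:
    """Return True when the actual and predicted residues agree (no alarm)."""
    actual = residue(product, m)                           # left side of the check
    predicted = residue(residue(x, m) * residue(y, m), m)  # right side of the check
    return actual == predicted


def flip_bit(value: int, bit: int) -> int:
    """Model a single-bit error by inverting one bit of value."""
    return value ^ (1 << bit)


if __name__ == "__main__":
    x, y = 7, 9
    good = x * y
    print(residue_check(x, y, good))               # fault-free product
    print(residue_check(x, y, flip_bit(good, 2)))  # product with one inverted bit
```

The sketch models errors only on the product; errors inside the residue generators or the comparator are outside its scope.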
Another method for error detection is based on the self-checking full adder (FA) proposed in [26]. The self-checking FA can be advantageous for building error detecting BCD multipliers because these designs contain many FA cells. The self-checking FA is based on the fact that when all three inputs of a FA (input operands A and B, and input carry Cin) are equal (not equal), the output sum (Sum) and output carry (Cout) will be equal (not equal) as well. The equations of [26], starting with the sum output in Eq. (2), can be used to implement a self-checking FA with single error detection capability. In these equations, ⨁ stands for XOR, and Eq is an active-low signal showing the equality of all FA inputs.

$$Sum = A \oplus B \oplus C_{in} \qquad (2)$$

A self-checking FA requires more than twice the area of the basic FA and also incurs high delay and power overheads.
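The property behind the self-checking FA (Sum and Cout are equal exactly when all three inputs are equal) can be checked exhaustively. The following Python sketch is a behavioural illustration only: `error_flag` is a hypothetical checker built from that property and an active-low `Eq` signal, and it is not claimed to match the gate-level equations of [26].

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Plain full adder: returns (Sum, Cout)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def eq_active_low(a: int, b: int, cin: int) -> int:
    """Return 0 when all three inputs are equal and 1 otherwise."""
    return 0 if a == b == cin else 1


def error_flag(a: int, b: int, cin: int, s: int, cout: int) -> int:
    """Hypothetical checker: 1 when (Sum != Cout) disagrees with Eq."""
    return eq_active_low(a, b, cin) ^ s ^ cout


if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        assert error_flag(a, b, cin, s, cout) == 0      # fault-free: no alarm
        assert error_flag(a, b, cin, s ^ 1, cout) == 1  # Sum inverted: alarm
        assert error_flag(a, b, cin, s, cout ^ 1) == 1  # Cout inverted: alarm
```

In this sketch, inverting both outputs of the same cell at once leaves the flag low, which is consistent with the statement that the basic method targets single errors.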
With respect to multiple error detection in arithmetic units, there exist a few designs, including [27-31], but all of them are adder structures. Specifically, the designs presented in [27,28,31] are for the carry lookahead adder, and the designs proposed in [29] and [30] are for the signed-digit adder and the carry select adder, respectively. In fact, there are no multiple error detecting BCD adder or BCD multiplier designs.

Low-cost BCD multiplier
As stated in Section 1, the digit-by-digit decimal multiplier proposed in [11] is the base of this paper since it utilizes efficient binary-to-decimal converters in all three stages of the multiplication to reduce both area and delay. One of the basic cells used in the designs of [11] is the one-digit BCD multiplier. For this cell, the 4 × 4-bit binary multiplier from [25], optimized for BCD multiplication, has been used. However, the 8-bit output result must be converted to a 2-digit decimal number using a partial product binary-to-decimal (PPBD) converter. Therefore, two new PPBD converters have been proposed in [11], one as the high-performance PPBD converter and the other as the low-area PPBD converter. In these converters, three types of cells are used: two new cells called the fast binary-to-decimal (FBD) and low-area binary-to-decimal (LABD) converters, in addition to the Nicoud (ND) cell introduced in [32].
Fig. 1(a) shows the block diagram of the FBD cell. This cell operates according to {H0, L0} = 4 × bj + bi, in which H0 and L0, as the higher and lower digits, have maximum values of 3 and 9, respectively. The older ND cell, in comparison, operates according to {H0, L0} = 2 × bj + bi, in which bi is a 1-bit input, unlike that of the FBD cell. To obtain the 2-digit BCD result of a 4 × 4-bit binary multiplier, two FBD cells can be cascaded according to Fig. 1(b) as a PPBD converter. As an example, this figure depicts the conversion of the input binary number 01010001₂ to its equivalent 2-digit BCD number, which is 81. The operation of the LABD cell is similar to that of the FBD cell, with the only difference that the input bj is a 3-bit binary number rather than a 4-bit number, which leads to a simpler structure in the LABD cell and also in the high-performance and low-area PPBD converters. The logic equations of the FBD and LABD cells have been presented in [11]. However, these equations have recently been corrected in [33].
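The arithmetic of the cells can be sketched behaviourally in Python. The sketch assumes that bj is a BCD digit and that bi is 2 bits wide in the FBD cell and 1 bit wide in the ND cell. How the two higher-digit outputs are merged in Fig. 1(b) is not described in the text, so the recombination `4 * h1 + h2` in `ppbd` is an assumption that follows from the arithmetic, and all function names are illustrative.

```python
def fbd_cell(bj: int, bi: int) -> tuple[int, int]:
    """Behavioural FBD cell: {H0, L0} = 4*bj + bi, with bj a BCD digit and bi 2 bits."""
    assert 0 <= bj <= 9 and 0 <= bi <= 3
    return divmod(4 * bj + bi, 10)  # H0 is at most 3, L0 is at most 9


def nd_cell(bj: int, bi: int) -> tuple[int, int]:
    """Behavioural Nicoud cell: {H0, L0} = 2*bj + bi, with a 1-bit bi."""
    assert 0 <= bj <= 9 and 0 <= bi <= 1
    return divmod(2 * bj + bi, 10)


def ppbd(p: int) -> tuple[int, int]:
    """Convert a binary product p (0..81) to (tens, units) with two cascaded FBD cells."""
    assert 0 <= p <= 81
    h1, l1 = fbd_cell(p >> 4, (p >> 2) & 0b11)  # consumes bits 7..2
    h2, l2 = fbd_cell(l1, p & 0b11)             # consumes bits 1..0
    # Assumed recombination: p = 40*h1 + 10*h2 + l2, so the tens digit is 4*h1 + h2.
    return 4 * h1 + h2, l2


if __name__ == "__main__":
    print(ppbd(0b01010001))  # the example of Fig. 1(b): binary 01010001 is decimal 81
```

For every product of two BCD digits the result agrees with an ordinary division by ten, which is what the cascaded converter is meant to deliver.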
Moreover, a hybrid MBD (multi-operand binary-to-decimal) converter has been proposed in [11] in order to obtain the BCD results of multi-operand additions performed by the carry save adders (CSA) in the partial product reduction stage. This converter uses a number of FBD and ND cells that depends on the size of the multi-operand addition's result. For example, if a 10-bit binary number is produced in the partial product reduction stage, three FBD and two ND cells are required according to Fig. 2 to obtain the equivalent 3-digit BCD number. A 4-digit BCD multiplication is depicted in Fig. 3. Based on [11], it includes one-digit multipliers, CSAs with different numbers of operands (different-size columns) and a BCD adder. The proposed FBD and LABD cells are utilized in all three parts, especially in the PPBD converters of the one-digit multipliers. The outputs of the one-digit multipliers (both higher and lower digits) are summed using multi-level CSAs (not shown in Fig. 3) and converted to BCD format using the proposed hybrid MBD converter (the Ri and Qi digits are the sum and output carry of the CSAs). At the end, the final product is produced using a BCD adder of the proper size.

Proposed multiple error detecting BCD multipliers
In this section, the implementation of BCD multipliers with multiple error detection is discussed. In addition, the reason for selecting a basic self-checking method for each module is described. For simplicity, the 4-digit BCD multiplier is augmented with multiple error detection methods as the basic architecture. The proposed design can be extended to larger BCD multipliers.

Selection of the basic self-checking methods for different parts
As stated in Section 2.2, the self-checking methods utilized in this paper are DMR, the self-checking FA and arithmetic residue codes. Both the self-checking FA and DMR methods require more than 100% area overhead. However, based on our synthesis results, the self-checking FA leads to more delay overhead while DMR needs more power. Thus, the effect of these methods will differ depending on the structure of the internal modules to which they are applied. For the residue codes, a check modulus equal to three (modulo-3) is selected since it incurs lower cost compared to other residue codes [34].
Based on Fig. 3, a 4-digit BCD multiplier can be split into three main parts, as shown in Fig. 4a. The first part includes 16 one-digit BCD multipliers, each of which includes a 4 × 4-bit binary multiplier optimized for BCD multiplication and a PPBD converter. The second part includes six CSAs (with 3, 5 or 7 input BCD operands, called CSA3, CSA5 and CSA7) and seven hybrid MBD converters. The third part is a 6-digit BCD adder.
The residue codes are not suitable for the PPBD converters and the hybrid MBD converter proposed in [11]. This is due to the fact that some of the inputs of the internal cells used in these converters can change at least two outputs, based on the logic equations of the FBD and LABD cells. Thus, some single errors can lead to multiple internal changes that can be undetectable by the residue codes. On the other hand, the self-checking FA cannot be used because there is no FA in these converters. Thus, we use DMR (duplication with comparison) for the converters to simply detect all single errors.
As there exist FAs in the binary multipliers of part 1 and in the CSAs of part 2, the self-checking FAs can be used in these parts. Moreover, the residue codes can also be utilized since these modules perform arithmetic operations. Thus, for the remaining modules inside parts 1 and 2, the residue code and self-checking FA-based methods can be utilized, as shown in Fig. 4b, to obtain high multiple error detection probabilities. According to Fig. 4b, two options exist for each of parts 1 and 2, incorporating hybrid methods, which lead to four configurations in the proposed architectures.

Structure of the self-checking modules
In this section, the implementation details of the main modules in the proposed multiple error detecting architectures are described.

One-digit BCD multiplier
As stated in Section 3.1, part 1 includes 16 one-digit BCD multipliers, each of which includes a 4 × 4-bit binary multiplier optimized for BCD multiplication and a PPBD converter. If self-checking FAs are used to obtain a self-checking 4 × 4-bit binary multiplier with BCD inputs, DMR should also be used for the internal gates. Single usage of each internal wire is required to obtain 100% single error detection. In a 4 × 4-bit binary multiplier, each gate or set of gates that produces an output or a specific input for another internal module (FA or HA) is considered a separate module to be augmented by the DMR method. This helps to duplicate the largest possible sets of gates to decrease the overheads. Fig. 5 shows the area-optimized binary multiplier in which the internal gates are separated into 11 internal modules. In this figure, HA stands for half adder. The two BCD input operands for multiplication are x3x2x1x0 and y3y2y1y0, while p6p5p4p3p2p1p0 is the output product with a maximum value of 81₁₀ (9 × 9).
After replacing FAs, HAs and other internal modules with self-checking ones, each small module can detect a single error. Therefore, as many internal modules exist, multiple error detection capability is obtained. It should be noted that the internal modules are selected in such a way that an error on their output will affect the output of the larger module (here, the 4 × 4-bit binary multiplier) and finally the output of the total design. Based on this fact, masked internal faults/errors which do not affect any output are not considered, and all real single errors will be detectable. Fig. 6 depicts the self-checking and multiple error detecting 4 × 4-bit binary multiplier for BCD inputs utilizing the mentioned methods. According to [35], similar to a FA, each HA can be replaced by a self-checking HA based on the equations given there. However, to obtain lower delay while accepting slightly higher area and power overheads, HAs can simply be duplicated, similar to the internal gates. In Fig. 6, the module numbers and duplicated modules are based on the modules shown in Fig. 5. In Fig. 6, there are 20 error indicating signals (e1 to e20) altogether that can be asserted independently. To indicate an erroneous situation (where at least one error exists), these error signals should be combined by an OR gate. To obtain the self-checking one-digit BCD multiplier, the output of the 4 × 4-bit binary multiplier should be sent to a duplicated PPBD converter, as shown in Fig. 4b (Part 1).

Fig. 6. Proposed multiple error detecting 4 × 4-bit binary multiplier for BCD inputs
As shown in Fig. 4b, the residue codes can be used instead of the self-checking FAs. In this case, to attain the self-checking one-digit BCD multiplier, it is not required to duplicate the internal gates. However, the PPBD converter should be duplicated. To apply the modulo-3 residue code to this multiplier, a BCD residue generator must be designed. Fig. 7 depicts the proposed modulo-3 residue generator for a BCD digit. This component adds the two 2-bit numbers a1a0 and a3a2 of the input BCD digit a3a2a1a0 with an end-around carry. However, as a1 and a3 are never both equal to one at the same time, this component is simplified compared to an ordinary binary residue generator. In fact, an OR gate and a HA are used instead of a FA, which leads to lower overheads. In Fig. 7, the binary residue r1r0 will be 00, 01, 10 or 11, in which 00 and 11 are treated as the same value in the modulo-3 comparator.
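The end-around-carry addition works because 4 ≡ 1 (mod 3), so a3a2a1a0 has the same residue as a3a2 + a1a0. A minimal arithmetic sketch in Python follows; it models the behaviour only, not the OR-gate and HA structure of Fig. 7, and the function names are illustrative.

```python
def bcd_residue_mod3(digit: int) -> int:
    """Return the 2-bit residue r1r0 of a BCD digit; 0b00 and 0b11 both stand for 0."""
    assert 0 <= digit <= 9
    hi, lo = digit >> 2, digit & 0b11       # a3a2 and a1a0
    total = hi + lo                          # 2-bit addition that may carry out
    carry, low2 = total >> 2, total & 0b11
    return low2 + carry                      # end-around carry


def same_residue(r: int, s: int) -> bool:
    """Modulo-3 comparator that treats 0b00 and 0b11 as the same residue."""
    return r % 3 == s % 3


if __name__ == "__main__":
    for d in range(10):
        print(d, format(bcd_residue_mod3(d), "02b"))
```

For digit 9 the generator yields 11, which the comparator treats like 00, matching the convention stated for Fig. 7.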
Fig. 8 depicts the self-checking one-digit BCD multiplier based on the modulo-3 residue code. In this figure, the BCD residue generators have the structure shown in Fig. 7. The binary residue generators act similarly to the BCD residue generators but on inputs from 0000₂ to 1111₂. The PPBD converter augmented with DMR produces the final result as a 2-digit BCD number (H and L according to Fig. 3). Moreover, each comparator asserts its output as an error if its inputs are not equal. In this way, two error signals can be asserted independently.

Carry save adders
The CSAs used in part 2 of the 4-digit BCD multiplier only include FAs and HAs. Thus, they can be augmented by both self-checking FAs and residue codes. In this section, for simplicity, only the proposed structures for the multiple error detecting CSA3 (the CSA with three one-digit BCD inputs) are introduced. The proposed structures for CSA5 and CSA7 can be obtained similarly. Fig. 9 shows the self-checking and multiple error detecting CSA3 utilizing self-checking FAs and HAs. Because the 4-bit input operands have a maximum value of 9 instead of 15, the output sum requires five bits instead of six, which is achieved by keeping only the sum bit (XOR gate) of the last HA. Therefore, instead of the self-checking HA, DMR is used for this XOR. In addition, to obtain a complete self-checking CSA3 with BCD inputs and output, the binary sum is sent to a duplicated hybrid MBD converter, as depicted in Fig. 4b (Part 2), to attain the equivalent 2-digit BCD output (Qi and Ri according to Fig. 3).
Utilizing the modulo-3 residue code, the proposed structure for the self-checking CSA3 is depicted in Fig. 10. In this figure, the BCD residue generators have the structure shown in Fig. 7, and the binary residue generators are the same as those of Fig. 8. The 2-bit CSA is a simple parallel adder including two FAs and two HAs to add three 2-bit numbers. The hybrid MBD converter augmented with DMR produces the final result as a 2-digit BCD number. Moreover, as two comparators exist, two error signals can be asserted independently.
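For addition, the predicted residue is the modulo-3 sum of the operand residues. The following Python sketch is a behavioural illustration of that check for CSA3; it does not model the gate-level 2-bit CSA or the comparators of Fig. 10, and the names `MAX_CSA3_SUM` and `csa3_residue_check` are illustrative.

```python
MAX_CSA3_SUM = 9 + 9 + 9  # three BCD digits sum to at most 27, which fits in five bits


def csa3_residue_check(d0: int, d1: int, d2: int, binary_sum: int) -> bool:
    """Return True when the sum's residue matches the residues of the three inputs."""
    assert all(0 <= d <= 9 for d in (d0, d1, d2))
    predicted = (d0 % 3 + d1 % 3 + d2 % 3) % 3  # role played by the 2-bit CSA
    actual = binary_sum % 3                     # role of the binary residue generator
    return predicted == actual


if __name__ == "__main__":
    print(MAX_CSA3_SUM.bit_length())               # width needed for the binary sum
    print(csa3_residue_check(9, 8, 7, 24))         # fault-free sum
    print(csa3_residue_check(9, 8, 7, 24 ^ 0b100)) # sum with one inverted bit
```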

Multi-digit BCD adder
Part 3 of the 4-digit BCD multiplier is a 6-digit BCD adder. According to Fig. 3, to produce the higher BCD digits ([P7:P2]) of the 8-digit BCD result of the 4-digit BCD multiplier, a multi-digit BCD adder of size six is required. We use DMR to obtain a self-checking multi-digit BCD adder because this method is much faster, at the cost of a few percent more power, than the other methods. Based on our synthesis results, this adder augmented by DMR has 34% lower delay than the self-checking FA-based design while consuming a few percent more power, which leads to around 30% lower energy. Another reason to use DMR for part 3 is that this part is very small in area compared to the other parts (7% of the 4-digit BCD multiplier), which means that augmenting it with more complex self-checking methods would bring little benefit.

Proposed architectures
As stated in Section 2.3, the PPBD converters proposed in [11] are of two types, the high-performance PPBD and the low-area PPBD. Both converter types incorporate the new cells, i.e., the FBD and LABD converters. This resulted in two low-cost BCD multipliers in [11]. Thus, in this paper, both structures are augmented with the proposed self-checking modules and evaluated. According to Fig. 4b, incorporating different methods for parts 1 and 2 leads to four configurations in the proposed architectures. In this way, one-digit BCD multipliers and carry save adders were designed using two main methods, the self-checking FAs and the residue code, to produce four architectures for the proposed BCD multipliers as follows. In all architectures, duplicated PPBD and duplicated MBD converters are used in parts 1 and 2. Moreover, part 3 is augmented by DMR:
(1) Arch1: The self-checking FAs with duplicated internal gates have been used in both part 1 and part 2.
(2) Arch2: The modulo-3 residue code has been used in part 1 but the self-checking FAs have been used in part 2.
(3) Arch3: The self-checking FAs with duplicated internal gates have been used in part 1 but the modulo-3 residue code has been used in part 2.
(4) Arch4: The modulo-3 residue code has been used in both part 1 and part 2.

Evaluation and discussion
In this section, the BCD multipliers configured in four different architectures are evaluated based on the synthesis results. In addition, some analytical and simulation-based evaluations are performed to assess multiple error detection capability.

Implementation results
The proposed architectures along with the basic designs are developed in Verilog HDL and synthesized by the Synopsys Design Compiler using a 65 nm STMicroelectronics standard cell library (with a 1.2 V power supply at 25 °C). Because the area and its related overheads (power and energy) can be high in error detecting designs, area minimization is set in all syntheses. Table 1 presents the synthesis results, including area, delay, power consumption and energy (Power-Delay-Product), for the basic non-self-checking design, the residue-based and DMR-based self-checking designs with single error detection (SED) and the proposed BCD multipliers in four configurations with multiple error detection (MED) capability. In addition, Table 2 shows the overheads for the parameters of Table 1 compared to the basic non-self-checking design. In both tables, the results are shown separately for all architectures based on the two different PPBD converters. Based on Tables 1 and 2, utilizing low-area or high-performance PPBD converters does not affect the overheads much. All four proposed multiple error detecting architectures are faster than the residue-based single error detecting BCD multiplier, which demonstrates the high-speed characteristic of the proposed BCD multipliers. The DMR-based BCD multiplier has high overheads (except for delay) even though it is designed only for single error detection. Arch1 is clearly the best design among all proposed multipliers with respect to power consumption and energy. Arch2 is the best design with respect to area and Area-Delay-Product (not shown in these tables), by small margins. In addition, Arch4 is the fastest design among the four proposed architectures. Altogether, Arch1 and Arch2 are the best multiple error detecting BCD multipliers if cost is more important than speed.

Evaluation of single and multiple error detection
In this section, the single and multiple error detection capabilities of the best proposed architectures (Arch1 and Arch2) are evaluated through simulations and analysis.

Error injection method
The Verilog code used to design the BCD multipliers and to obtain the synthesis results is modified in the high-level description to enable single and multiple error injection in the simulations. In each single/multiple error scenario, a multiplier is simulated for thousands of iterations, each including both random erroneous points and random operands, to cover all potential error points and to count detected and undetected erroneous situations. In the utilized error injection method, we do not account for faults/errors that may be internally masked by the circuit. Thus, to make an internal or external signal erroneous, we change its value so that it definitely causes an error, which prevents fault/error masking.
All of the outputs of internal modules and components (such as FAs and HAs) are treated as internal signals and have the potential of becoming erroneous. As there are many internal components in each BCD multiplier, the total number of potential error points can be high. Both of the main architectures, Arch1 and Arch2, have around 2900 potential error points, which are modeled as long error vectors in which one or more randomly selected points are marked as erroneous. These error vectors are passed hierarchically to the internal components of a design. The vector sizes vary from fewer than 100 to a few hundred, depending on the number of internal signals in a component. Based on the error vector values in each iteration, the selected internal signals are changed using a simple if-else structure.
It should be noted that the erroneous probability in these simulations is based on the silicon area used to produce an internal signal (obtained from the synthesis results). In this way, a component with around twice the area of another component will become erroneous with twice the probability. In addition, the outputs of an internal module have an erroneous probability proportional to their logic area. Table 3 lists the outputs of some internal modules and their logic area obtained by synthesis. As the area can be a non-integer number, the nearest integer is used for each area in the computation of the erroneous probability for each output. For example, based on Table 3, the erroneous probability of Cout of a HA is ¾ of the erroneous probability of Sum of that HA. To implement this, 7 consecutive numbers are dedicated to each HA to be randomly selected as the erroneous points, of which 4 numbers are dedicated to Sum and 3 numbers are dedicated to Cout. Similar assignments are performed for the other internal signals. It is worth mentioning that since a HA is a small component with only two gates, this separate probability assignment may not be precise. For outputs with larger logic (such as the outputs of the FBD and LABD cells), however, this probability assignment is more precise and leads to more realistic results.
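The area-weighted selection can be sketched as follows in Python. The sketch assumes each output receives as many selectable slots as its rounded logic area, as in the HA example (4 slots for Sum and 3 for Cout); the data layout, the function names and the rejection of repeated picks are illustrative assumptions, not a description of the Verilog test bench used in the paper.

```python
import random


def build_error_points(modules: dict[str, dict[str, int]]) -> list[tuple[str, str]]:
    """Expand each output into as many slots as its rounded logic area."""
    points = []
    for module, outputs in modules.items():
        for output, rounded_area in outputs.items():
            points.extend([(module, output)] * rounded_area)
    return points


def pick_errors(points: list[tuple[str, str]], k: int, rng: random.Random) -> list[tuple[str, str]]:
    """Draw k distinct erroneous signals; each draw is weighted by slot count."""
    assert k <= len(set(points)), "not enough distinct signals for k errors"
    chosen: set[tuple[str, str]] = set()
    while len(chosen) < k:
        chosen.add(rng.choice(points))  # repeated picks are simply redrawn
    return sorted(chosen)


if __name__ == "__main__":
    # Slot counts for the HA follow the text; the module name HA0 is made up.
    points = build_error_points({"HA0": {"Sum": 4, "Cout": 3}})
    print(len(points))
    print(pick_errors(points, 1, random.Random(0)))
```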

Table 3. Outputs of some internal modules with the logic area required to produce each output (µm²) and the corresponding rounded area (µm²).

Multiple error detection analysis
Because the number of potential error points in the proposed architectures is high and each internal signal has its own erroneous probability, simulating the whole design at once under multiple random errors, with a sufficient number of iterations for each error scenario, would be time-consuming. Therefore, we perform error injection-based simulations for the main internal modules (based on Fig. 4a) to extract multiple error detection probabilities for these modules, and then obtain the overall error detection capabilities of the architectures through the proposed analysis of the total design as a combination of the main modules.
It is shown in [30] that the detection probability of all k concurrent errors in the N-bit self-checking carry select adder proposed in [30] can be estimated by Eq. (9):

$$P_{k} = \prod_{i=1}^{k-1} \frac{N-i}{N} \qquad (9)$$

Eq. (9) is based on the fact that all of the k errors can be detected if they occur in different parts. In this paper, we extend the usage of Eq. (9) to some self-checking modules inside the proposed BCD multipliers to estimate multiple error detection probabilities. If N is the number of identical self-checking modules and k is the number of concurrent errors, Eq. (9) gives the detection probability of all k simultaneous errors in these modules. Based on Fig. 4a, 16 identical one-digit BCD multipliers (each including a 4 × 4-bit binary multiplier and a PPBD converter) exist in the 4-digit BCD multiplier. In addition, there exist six CSAs (three groups of identical modules) and several identical hybrid MBD converters. Therefore, Eq. (9) can be used for all of these modules after making them self-checking, so that single error detection is guaranteed in each module. According to Eq. (9), multiple error detection is possible even if none of the modules can detect more than one error. For example, for the identical self-checking one-digit BCD multipliers viewed as a larger module (including 16 sub-modules), the detection probability of both of two simultaneous errors is simply 15/16 = 0.94. This is the minimum value because it is assumed that all sub-modules only detect a single error. Similarly, the detection probability of all three concurrent errors is 15/16 × 14/16 = 0.82 without any multiple error detection capability in the sub-modules. However, if each sub-module can detect multiple errors, even partially, a higher multiple error detection probability will be achieved. In this case, a different analysis, given below, is required to estimate multiple error detection probabilities in larger modules or in the whole multiplier.
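The two worked figures can be reproduced with a short Python sketch. It assumes Eq. (9) has the product form implied by the examples 15/16 and 15/16 × 14/16, i.e. each additional error must land in one of the still error-free modules among N equally likely ones; the function name is illustrative.

```python
from fractions import Fraction


def detect_all_probability(n_modules: int, k_errors: int) -> Fraction:
    """Probability that k errors fall into k different modules out of N equal ones."""
    prob = Fraction(1)
    for i in range(1, k_errors):
        prob *= Fraction(n_modules - i, n_modules)  # i modules are already hit
    return prob


if __name__ == "__main__":
    for k in (2, 3):
        p = detect_all_probability(16, k)
        print(k, p, round(float(p), 2))
```

Exact fractions are used so that the rounding to two decimals happens only when printing.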
There are good reasons to evaluate multiple error detection capability with a more comprehensive analysis than Eq. (9):
(1) The modules may have multiple error detection capability, even partially, which leads to a higher multiple error detection probability in the whole design.
(2) The erroneous probabilities of different modules differ, mainly because they have different silicon areas.
(3) Detecting the erroneous situation irrespective of the number of errors that occurred is more important and effective than detecting all errors.
In agreement with reason (1) above, Table 4 presents the results obtained from error injection-based simulations of the main modules of the best proposed architectures (Arch1 and Arch2) for different numbers of errors. It should be noted that for multiple error situations in these simulations, detecting the existence of at least one error is enough to report the erroneous situation as detected.
With respect to reason (2), Table 5 gives the breakdown of the 4-digit BCD multiplier by main modules, number of modules, silicon area, and area ratio when it uses the proposed Arch1 (including low-area PPBD converters). Similar results can be obtained for the proposed Arch2 as well (including high-performance PPBD converters). According to Table 5, the 23 modules of the BCD multiplier can be grouped into five larger modules (including a single 6-digit BCD adder), whose area ratios are shown in the last column of Table 5.
With respect to reason (3), Table 6 shows the detection probabilities of all concurrent errors, from two to four, using Eq. (9). These probabilities are very low in many cases because the number of identical modules is small and the multiple error detection capability of the internal modules is not considered. Thus, a new analysis is required that considers different types and different numbers of modules to evaluate the whole design. The proposed analysis uses the multiple error detection capabilities of the main internal modules from Table 4, together with the number of modules and the area ratios such as those stated in Table 5. In some cases, computing the unreliability is simpler than computing the reliability, and this is the case in this paper. In fact, computing the probability of not detecting an erroneous situation including k concurrent errors (called Qk) is simpler than computing the probability of detecting that erroneous situation (called Rk = 1 - Qk). It should be noted that not detecting an erroneous situation including k concurrent errors means not detecting any one of the k concurrent errors.
In this analysis, the probability of not detecting the erroneous situations including 2 and 3 concurrent errors can be obtained by Eqs. (10) and (11), respectively:

$$Q_2 = \sum_{i=1}^{M} a_i^2 \cdot q_{i2} \qquad (10)$$

$$Q_3 = \sum_{i=1}^{M} a_i^3 \cdot q_{i3} \qquad (11)$$

In the equations above, $M \geq 2$ is the number of modules, $a_i$ is the area ratio of the ith module to the total area ($\sum_{i=1}^{M} a_i = 1$) and $q_{ik}$ is the probability of not detecting the erroneous situation including k concurrent errors in the ith module. Eqs. (10) and (11) are based on the fact that these erroneous situations can be undetectable only if all errors occur in the same module. For example, for two concurrent errors (k = 2) the erroneous situation will be detected if the errors occur in different modules, since all modules are self-checking. However, if the two errors occur in the same module (the ith module), error detection is not guaranteed because, with probability $q_{i2}$, the ith module cannot detect any of the errors. Finally, the total undetection probability is obtained by a summation over all modules. Eq. (11) is obtained similarly by noting that if two errors occur in the same module and the third error occurs in another module, the erroneous situation is detectable irrespective of whether the first two errors are detected, since the third error is certainly detected by a self-checking module.
For four concurrent errors, Eq. (12) can be used:

$$Q_4 = \sum_{i=1}^{M} a_i^4 \cdot q_{i4} + \ldots \qquad (12)$$

Besides the first term, which has the same form as in the two previous equations, this equation includes a new term in a summation form. This new term has been added to cover all the situations in which there exist two different modules, each one including two errors. These situations can be undetectable as well, depending on the values of $q_{i2}$ and $q_{j2}$, if the two modules are the ith and jth modules among all modules. Finally, in a compact form, Eq. (13) can be used to estimate the undetection probability of erroneous situations including k ($2 \leq k \leq 5$) simultaneous errors.

To obtain the detection probabilities of multiple error situations based on both simulation and analysis, the equations above should first be applied to the main smaller self-checking modules shown in Tables 4 and 5 (except the 6-digit BCD adder) to obtain the results for the larger self-checking modules listed in Table 6. The smaller modules include CSA3, CSA5 and CSA7, all with a hybrid MBD converter, and also the one-digit BCD multiplier. The number of modules M to be set in the equations is 2, 2, 2 and 16, respectively, according to Tables 5 and 6. After computing the probabilities for the four new larger modules, the equations should be used again, this time for the five larger self-checking modules (M = 5 because of the four new larger modules and the single 6-digit BCD adder in both Arch1 and Arch2), based on the new $q_{ik}$ parameters.
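The reasoning behind Eqs. (10) to (12) can be checked numerically by brute-force enumeration. The sketch below is not taken from the paper. It assumes that each error lands in module i independently with probability a_i, that a module holding exactly one error always flags it, and that modules holding two or more errors miss them independently of each other with probability q_ic. The function name `undetected_probability` and the list-of-dictionaries layout of `q` are illustrative choices.

```python
from itertools import product
from math import prod


def undetected_probability(area_ratios, q, k):
    """Probability that a situation with k concurrent errors goes unnoticed.

    area_ratios[i] : a_i, chance that one error lands in module i
    q[i][c]        : assumed chance that module i misses c >= 2 errors
                     located inside it (a single error is always caught)
    """
    n_modules = len(area_ratios)
    total = 0.0
    # Enumerate every placement of k distinguishable errors over the modules.
    for placement in product(range(n_modules), repeat=k):
        p_placement = prod(area_ratios[i] for i in placement)
        p_all_miss = 1.0
        for i in set(placement):
            count = placement.count(i)
            # One error in a self-checking module is detected for certain.
            p_all_miss *= 0.0 if count == 1 else q[i][count]
        total += p_placement * p_all_miss
    return total


# Hypothetical two-module example (values chosen only for illustration)
areas = [0.5, 0.5]
q = [{2: 0.1, 3: 0.2, 4: 0.3}, {2: 0.1, 3: 0.2, 4: 0.3}]
Q2 = undetected_probability(areas, q, 2)
Q3 = undetected_probability(areas, q, 3)
Q4 = undetected_probability(areas, q, 4)
```

For k = 2 and k = 3 the enumeration reduces to the single-module sums described for Eqs. (10) and (11). For k = 4 it adds the contribution of every two-plus-two split across a pair of modules, which is the kind of situation the extra term of Eq. (12) is described as covering. The weight that the enumeration gives to such a split follows from the independence assumption and may differ from the weighting used in the paper.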
As an example, for the 16 one-digit BCD multipliers used in Arch1 as a larger module, M equals 16 and $a_i$ equals 1/16 for all smaller modules, since these modules are identical. In addition, $q_2$, $q_3$ and $q_4$ equal 0.086, 0.118 and 0.174, respectively, because based on Table 4 the corresponding detection probabilities are 91.4%, 88.2% and 82.6%, respectively. Therefore, to compute the detection probability of an erroneous situation including, for example, two concurrent errors in all 16 one-digit BCD multipliers of Arch1, Eq. (10) is evaluated with these values: $Q_2 = 16 \cdot (1/16)^2 \cdot 0.086 = 0.086/16 \approx 0.0054$, so that $R_2 = 1 - Q_2 \approx 0.9946$. The results obtained this way are larger than the corresponding probabilities stated in Tables 4 and 6. Table 7 shows the results for multiple error situation detection in the larger modules of the two architectures, separately. Based on this table, in addition to being higher than the probabilities shown in Tables 4 and 6, some detection probabilities are very close to one, the perfect capability. In fact, the proposed analysis carries the multiple error detection capabilities of all smaller modules over to the larger modules. Moreover, as a beneficial characteristic, the detection probability of each larger module increases when the number of errors increases.

After obtaining the detection probabilities of multiple error situations in the larger modules, the total design should be analyzed quantitatively. Here, unlike the larger modules whose smaller modules are identical, the constituent modules have different areas and, as a result, different $a_i$ (for example, based on Table 5 for Arch1). To obtain the intended probabilities based on Eqs. (10) to (12), the number of modules is five, including the four larger self-checking modules and the single self-checking 6-digit BCD adder in both Arch1 and Arch2. The $a_i$ parameters for the proposed Arch1 are stated in Table 5 (similar $a_i$ can be computed for Arch2). The $q_{ik}$ parameters for the larger modules can simply be extracted from Table 7, while these parameters for the self-checking 6-digit BCD adder can be obtained from Table 4.
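The first level of this procedure, for the 16 identical one-digit BCD multipliers of Arch1, can be written out as a short calculation. This is a sketch under the assumption that the undetection probability for k = 2 and k = 3 is the sum over modules of a_i^k · q_ik, as described for Eqs. (10) and (11). The helper name `undetected_same_module` is hypothetical, and only the q values quoted in the text (0.086 and 0.118) are used.

```python
def undetected_same_module(area_ratios, q_k, k):
    """Sum over modules of a_i**k * q_ik.

    Valid for k = 2 and k = 3, where an erroneous situation can only be
    missed if all k errors fall into the same self-checking module.
    """
    return sum(a ** k * q for a, q in zip(area_ratios, q_k))


M = 16                   # one-digit BCD multipliers in the 4-digit design
areas = [1 / M] * M      # identical modules, so a_i = 1/16
q2, q3 = 0.086, 0.118    # from the 91.4% and 88.2% detection rates quoted above

Q2 = undetected_same_module(areas, [q2] * M, 2)   # 0.086 / 16
Q3 = undetected_same_module(areas, [q3] * M, 3)   # 0.118 / 256
R2 = 1 - Q2   # about 0.9946
R3 = 1 - Q3   # about 0.9995
print(f"R2 = {R2:.4f}, R3 = {R3:.4f}")
```

Both values exceed the per-module detection rates of 91.4% and 88.2%, and R3 is larger than R2, which agrees with the observation that the detection probability of a larger module grows with the number of errors. The second level (M = 5 with unequal area ratios) would follow the same pattern with the a_i of Table 5 and the q_ik of Tables 4 and 7.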
Table 8 gives the detection probabilities of multiple error situations in the two proposed self-checking 4-digit BCD multipliers. As this table shows, the detection probabilities against multiple errors are very close to one in both proposed architectures. Moreover, the detection probabilities increase when the number of errors in multiple error situations (k ≥ 2) increases. This characteristic leads us to expect very high detection probabilities for more concurrent errors (k ≥ 5) as well.

Comparison with other designs
As there is no single/multiple error detecting BCD multiplier in the literature, we compare the proposed architectures with some state-of-the-art single/multiple error detecting non-BCD multipliers. The multipliers proposed in [36] are based on a general design for single error detection (SED) with a cost lower than that of DMR, whereas the multipliers proposed in [35] have either single error detection or multiple error detection (MED) capability. To perform a fair comparison with the designs in [35], the input size or the number of internal modules is kept almost the same in similar designs. In fact, the 16×16 multipliers of [35] have 16-bit input operands, similar to the proposed 4-digit BCD multipliers in this paper. In addition, the number of internal modules in the 32×32 multipliers of [35] is almost the same as that of the proposed 4-digit BCD multipliers.
Since achieving both a low delay and a high multiple error detection capability is the main goal of this paper, these two parameters are compared with those of [35,36]. Fig. 11 compares the proposed self-checking 4-digit BCD multipliers with different types of single/multiple error detecting multipliers of [35,36] in terms of delay overhead. According to Fig. 11, the proposed architectures lead to relatively low delay overheads compared to previous designs.
With respect to multiple error detection capability, the proposed self-checking BCD multipliers are compared to the multipliers proposed in [35] because those can also detect erroneous situations including multiple errors. Fig. 12 shows this comparison based on the probability of erroneous situation detection. Based on this figure, both investigated BCD multipliers (based on Arch1 and Arch2) detect erroneous situations with a higher probability than the different-size multipliers of [35].

Conclusion
In this paper, self-checking BCD multiplier architectures capable of multiple error situation detection were proposed, based on an investigation of digit-by-digit BCD multipliers and different single error detection methods. To obtain architectures with a lower cost and a higher speed, different error detection methods were used in different parts of the BCD multipliers, individually or in combination. To analyze the proposed architectures, the 4-digit BCD multiplier was used as the main structure. However, the proposed architectures can be applied to larger BCD multipliers as well. Based on the synthesis results, the proposed Arch1 and Arch2 are the best among the four proposed architectures. To evaluate the multiple error detection capabilities of the best architectures, analyses and error injection-based simulations were used together. The results show that the 4-digit BCD multipliers based on the proposed architectures detect multiple error situations with very high probabilities (for example, at least 99.99% for four concurrent errors), as well as detecting all single errors.

Fig. 4. (a) Three main parts of non-self-checking 4-digit BCD multiplier based on [11], and (b) Utilized methods in the main parts of the proposed design to achieve multiple error detection

Fig. 11. Delay overheads of the proposed self-checking BCD multipliers in comparison with the other multipliers