A Fresh Design of Power Effective Adapted Vedic Multiplier for Modern Digital Signal Processors

Digital signal processors play an essential role in modern communication. The multiply-accumulate (MAC) unit is a crucial component of modern signal processors, and its performance depends on speed, power, and area. In this paper, a Vedic multiplier based on a modified sum-product method is proposed as a promising option. The MAC unit is built using an ancient mathematical technique, and the work examines the effectiveness of the vertical and crosswise Vedic multiplication strategy in an actual multiplier cycle. The architecture is designed for the proposed algorithm and coded in Verilog HDL. The design is synthesized in the Xilinx ISE environment to analyze area, power, and delay. The proposed multiplier achieves a 49.12% reduction in delay and a 42.51% reduction in power. Compared with the standard multiplier, it offers a faster multiplication procedure and lower computational complexity.


Techniques and many more techniques [5][6][7][8]. An existing method presents a novel methodology in which the multiplier segment is implemented using ancient Vedic mathematics [9].
The Vedic multiplier is a traditional, ancient technique that aims to be fast and low power. Vedic calculation depends on 16 Sutras; among them, Urdhva Triyakbhyam (UT) is distinctive in that it performs vertical and crosswise operations [10]. In the Vedic method, adders are used to reduce the combinational delay. Multiplication-based operations, together with MAC and inner product, are among the most frequently used Computation-Intensive Arithmetic Functions (CIAF). They are used in various DSP circuits such as the Fast Fourier Transform (FFT), filtering, and convolution, and in the ALU module of microprocessors [11]. Therefore, a high-performance multiplier is required. At present, the instruction cycle time of a DSP chip is determined by the multiplication time, which remains the dominant factor. With the expanding use of computers and signal processing, the demand for high-speed processing has grown enormously. The goal is a regular and simple multiplier structure that increases speed while reducing area and power [12].
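To make the role of the multiplier in a MAC operation concrete, the following is a minimal behavioural sketch in Python, not the authors' hardware design. The function names `mac` and `inner_product` are illustrative assumptions; the point is only that every step of an inner product invokes one multiplication, so the multiplier sits on the critical path.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: return acc + a * b."""
    return acc + a * b


def inner_product(xs, ys):
    """Inner product computed as a chain of MAC steps (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    acc = 0
    for x, y in zip(xs, ys):
        # Each iteration performs exactly one multiplication and one addition.
        acc = mac(acc, x, y)
    return acc
```

A filter tap or an FFT butterfly reuses the same accumulate pattern, which is why a faster multiplier shortens the whole loop.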
This work introduces different multiplier architectures. Vedic multiplication is derived from the 16 Vedic sutras, which describe natural ways of solving mathematical problems. Among these 16 sutras, the Urdhva-Triyakbhyam sutra is the most efficient and applies to all cases [13]. Its name means "vertically and crosswise". The partial products are generated concurrently, which lowers the delay and makes the method fast. Figure 1 shows the multiplication method for the Vedic multiplier. Consider the numbers A and B, where A = a2a1a0 and B = b2b1b0. The result of the multiplication is c4s4s3s2s1s0. Here, Vedic multiplication is carried out by a modified sum-product algorithm, which is described with the proposed design. Section 2 discusses the existing system. The proposed approach is presented in Sect. 3, the results and discussion in Sect. 4, and the conclusion in Sect. 5.
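The vertical and crosswise scheme for A = a2a1a0 and B = b2b1b0 can be sketched as follows. This is a behavioural Python model under the assumption that each column sum is formed from the bit products of equal weight plus the carry from the previous column; the function name `urdhva_3bit` is hypothetical, and the sketch does not reproduce the paper's modified sum-product algorithm.

```python
def urdhva_3bit(a, b):
    """Multiply two 3-bit unsigned integers column by column."""
    if not (0 <= a < 8 and 0 <= b < 8):
        raise ValueError("operands must fit in 3 bits")
    a0, a1, a2 = a & 1, (a >> 1) & 1, (a >> 2) & 1
    b0, b1, b2 = b & 1, (b >> 1) & 1, (b >> 2) & 1

    # Bit products grouped by weight: one vertical or crosswise step per column.
    columns = [
        a0 & b0,                            # vertical, weight 1
        (a1 & b0) + (a0 & b1),              # crosswise, weight 2
        (a2 & b0) + (a1 & b1) + (a0 & b2),  # crosswise, weight 4
        (a2 & b1) + (a1 & b2),              # crosswise, weight 8
        a2 & b2,                            # vertical, weight 16
    ]

    carry = 0
    bits = []  # s0..s4, least significant first
    for column_sum in columns:
        total = column_sum + carry
        bits.append(total & 1)
        carry = total >> 1
    bits.append(carry)  # c4, the final carry out

    return sum(bit << position for position, bit in enumerate(bits))
```

All five column sums depend only on the input bits, so in hardware they can be formed at the same time; only the carries ripple between columns.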

Existing Design Structures
An existing system compared various types of multipliers, namely the Baugh Wooley, Wallace, and array multipliers. To reduce power, it used the power gating technique and gate-level optimization, with the designs coded in VHDL. The area, power, and speed of these three multipliers were simulated using ModelSim 6.5e and Xilinx ISE, and the resulting values were analyzed.

Array Multiplier
The array multiplier is suitable for VLSI applications because it has a highly regular structure. It multiplies two binary numbers using an arrangement of half adders and full adders. Each partial product is generated by ANDing a multiplier bit with a multiplicand bit. The parallel array multiplier is widely used to achieve high speed, but it consumes high power.
The multiplication involves an input port, a central port, and an output port. The RTL block view of the array multiplier is given in Fig. 2, and its schematic diagram is shown in Fig. 3. Two binary numbers, X and Y, each b bits wide, are taken as operands. Figure 4 shows the structure of the array multiplier for two four-bit numbers.
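The array structure described above can be illustrated with a small bit-level Python model. It is a sketch under simplifying assumptions: every cell is modelled as a full adder (a half adder is the special case with a zero carry-in), and the names `full_adder` and `array_multiply` are hypothetical rather than taken from the cited designs.

```python
def full_adder(x, y, carry_in):
    """Return (sum_bit, carry_out) for three input bits."""
    sum_bit = x ^ y ^ carry_in
    carry_out = (x & y) | (carry_in & (x ^ y))
    return sum_bit, carry_out


def array_multiply(x, y, width=4):
    """Multiply two unsigned width-bit numbers row by row."""
    if not (0 <= x < (1 << width) and 0 <= y < (1 << width)):
        raise ValueError("operands must fit in the given width")
    acc = [0] * (2 * width)  # running product bits, least significant first
    for j in range(width):
        y_bit = (y >> j) & 1
        # Partial-product row j: AND of multiplier bit j with each multiplicand bit.
        row = [((x >> i) & 1) & y_bit for i in range(width)]
        carry = 0
        for i in range(width):
            acc[i + j], carry = full_adder(acc[i + j], row[i], carry)
        acc[j + width] = carry  # carry out of this row
    return sum(bit << k for k, bit in enumerate(acc))
```

Each row waits for the carries of the row before it, which illustrates why the adder count and the delay grow with the operand width.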

Wallace Tree Multiplier
The Wallace tree multiplier uses a variant of the long multiplication method and multiplies two numbers in a three-stage procedure. The bit products are developed in 2X2 bit matrices, and each bit product carries the weight of its position. The reduced partial-product rows are then summed by a final adder, such as a ripple carry adder or a faster adder, to produce the output. Figure 5 shows the RTL block view of the Wallace tree multiplier. Figure 6 shows a diagram of this variant of the long multiplication multiplier. The three stages of operation are:
1) The multiplicand and the multiplier are multiplied bit by bit to develop the partial products.
2) The partial products are grouped.
3) The final output is generated using adders [14].
The grouped products are summed to give the final product; this takes less time, so the multiplier is faster. Figure 7 shows the Wallace tree multiplier structure: the first stage develops the partial products, the next stage groups them, and the last stage adds the resulting products together.
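The three stages above can be sketched at word level in Python. This is an illustrative model, assuming the grouping stage uses 3:2 compressors (carry-save adders) applied to whole rows; a gate-level Wallace tree groups bits column by column, and the name `wallace_multiply` is hypothetical.

```python
def wallace_multiply(x, y, width=4):
    """Multiply two unsigned width-bit numbers with a carry-save reduction."""
    if not (0 <= x < (1 << width) and 0 <= y < (1 << width)):
        raise ValueError("operands must fit in the given width")

    # Stage 1: one shifted partial-product row per multiplier bit.
    rows = [(x << j) if (y >> j) & 1 else 0 for j in range(width)]

    # Stage 2: group rows in threes; each group becomes a sum word and a
    # carry word without propagating carries along the word.
    while len(rows) > 2:
        reduced = []
        full_groups = len(rows) // 3
        for g in range(full_groups):
            p, q, r = rows[3 * g: 3 * g + 3]
            sum_word = p ^ q ^ r
            carry_word = ((p & q) | (q & r) | (p & r)) << 1
            reduced += [sum_word, carry_word]
        reduced += rows[3 * full_groups:]  # leftover rows pass through
        rows = reduced

    # Stage 3: a single carry-propagate addition of the remaining rows.
    return sum(rows)
```

Only the last addition propagates carries across the full word, which is the source of the speed advantage over the row-by-row array structure.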

Baugh Wooley Multiplier
In the Baugh Wooley multiplier, a high-performance two's complement multiplier, the negatively weighted sign terms are moved to the final stage [15]. Figure 8 shows the RTL block view of this multiplier.
The Baugh Wooley schematic diagram is shown in Fig. 9, and the multiplier structure is given in Fig. 10. In the first stage, the partial-product bits that involve the most significant bit of exactly one operand are inverted. In the second stage, a one is added to the Mth column. In the third stage, the most significant bit of the result is inverted.
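The three stages can be checked with a short Python model of the commonly used modified Baugh Wooley scheme. This is a sketch under that assumption and may differ in detail from the design in [15]; the function name `baugh_wooley` and the parameter `n` (the operand width, written M in the text) are illustrative.

```python
def baugh_wooley(a, b, n=4):
    """Multiply two signed n-bit two's complement integers."""
    low, high = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not (low <= a <= high and low <= b <= high):
        raise ValueError("operands must fit in n-bit two's complement")
    mask = (1 << n) - 1
    a_bits, b_bits = a & mask, b & mask  # two's complement bit patterns

    total = 0
    for i in range(n):
        for j in range(n):
            bit = ((a_bits >> i) & 1) & ((b_bits >> j) & 1)
            # Stage 1: invert products that involve exactly one sign bit.
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            total += bit << (i + j)

    # Stage 2: add a one in column n.
    total = (total + (1 << n)) & ((1 << (2 * n)) - 1)
    # Stage 3: invert the most significant bit of the 2n-bit result.
    total ^= 1 << (2 * n - 1)

    # Reinterpret the 2n-bit pattern as a signed integer.
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total
```

With this arrangement every partial-product bit is added with positive weight, so the array needs no subtractors.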

Design Techniques
Power gating is applied in ICs to reduce power by cutting off the current to blocks that are not in use. Gate-level optimization comprises three techniques: logic resizing, pin swapping, and transition rate buffering. Logic resizing is used to optimize the slew rate, leakage, and dynamic current [16]. Accurate switching information leads to correct sizing. Transition rate buffering reduces dynamic power by shifting the time delay. Finally, pin swapping is a kind of shifting technique that is applied at lower load values. The output parameters of the multipliers mentioned above were verified by comparison, with particular attention to the number of gates used and the total power. As a result, power consumption and delay are decreased.

Limitation
Among the multipliers compared, the array multiplier has the highest power consumption. Its gate count and adder count are also higher than those of the other multipliers, so it occupies more area. The Baugh Wooley multiplier operates much faster than the array multiplier, and its delay and power consumption are much lower [17][18][19][20]. The Wallace multiplier is fast, but its power consumption is higher than that of the Baugh Wooley multiplier, and its delay is slightly higher.

Proposed Design Structure
The proposed idea is to reduce the power and delay of the Vedic multiplier by using the modified sum-product algorithm, coded in Verilog and implemented in Xilinx. Finally, the delay and power results are compared with those of the existing designs.

Vedic Multiplier
In the proposed work, the MAC unit is one of the key operations widely used in DSP application modules. The multiplier is the vital module of the modified sum-product algorithm [21]. It dispenses with unnecessary multiplication steps, follows a quick multiplication procedure, and achieves considerably lower computational complexity than its conventional equivalent [22]. The Vedic multiplier uses a top-down approach in which smaller blocks are used to build a larger one. The ancient Indian method of mathematical computation is the core of Vedic mathematics and Vedic multipliers [23]. The proposed work is given in Fig. 11. This work was simulated in Xilinx and the parameters were compared.
Figure 12 shows the pictorial representation of the multiplier. In the first step, A0 and B0 are multiplied and the result is stored in S0. In the second step, the cross products A0.B1 and A1.B0 are added and the result is stored in C0S1. In the third step, A1.B1 is added to C0 and the result is stored in COUT S2.
Figure 13 shows an example calculation for the proposed work, which corresponds to 12 × 15. In step one, 2 and 5 are multiplied vertically; the result is 10 and the incoming carry is zero, so the digit 0 is kept and a carry of 1 is passed on. In the second step, 1 and 5 and then 1 and 2 are cross multiplied and the products are added to give 7; adding the carry of 1 from the first step gives 8. In the final step, 1 and 1 are multiplied vertically to give 1; the carry from the previous step is 0, so the result is 1. The last digit of each stage is then taken and written together, giving 180.
Figure 20 represents the RTL schematic for the proposed multiplier (Fig. 14). The proposed 4X4 Vedic multiplier is shown in Fig. 16. It uses four 2X2 Vedic multipliers (Fig. 15), and in the last stage their results are added using an adder. The input data (a, b) are sent to the 4X4 multiplier and the output is obtained in p. The result is verified against the simulation output.
The parameters are compared with those of an existing system (Fig. 16).
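The construction of the 4X4 multiplier from four 2X2 blocks can be sketched in Python as follows. This is a behavioural model of the conventional decomposition, assuming the operands are split into 2-bit halves and the four block products are combined by a plain addition; it does not reproduce the internals of the paper's modified sum-product algorithm or its Verilog code, and the names `vedic_2x2` and `vedic_4x4` are hypothetical.

```python
def vedic_2x2(a, b):
    """2X2 Vedic block: returns the 4-bit product COUT S2 S1 S0."""
    if not (0 <= a < 4 and 0 <= b < 4):
        raise ValueError("operands must fit in 2 bits")
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1

    s0 = a0 & b0                   # step 1: vertical product
    cross = (a1 & b0) + (a0 & b1)  # step 2: crosswise products added
    s1, c0 = cross & 1, cross >> 1
    top = (a1 & b1) + c0           # step 3: vertical product plus carry
    s2, cout = top & 1, top >> 1
    return (cout << 3) | (s2 << 2) | (s1 << 1) | s0


def vedic_4x4(a, b):
    """4X4 multiplier assembled from four 2X2 blocks and a final addition."""
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("operands must fit in 4 bits")
    a_low, a_high = a & 0b11, (a >> 2) & 0b11
    b_low, b_high = b & 0b11, (b >> 2) & 0b11

    low_low = vedic_2x2(a_low, b_low)
    low_high = vedic_2x2(a_low, b_high)
    high_low = vedic_2x2(a_high, b_low)
    high_high = vedic_2x2(a_high, b_high)

    # Final stage: weight each block product and add them.
    return low_low + ((low_high + high_low) << 2) + (high_high << 4)
```

The same pattern extends to wider operands, with four N/2-bit blocks feeding each N-bit multiplier.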

Results and Discussion
Xilinx ISE (Integrated Software Environment) is a design suite that takes a design from entry through to programming Xilinx devices. The design flow consists of design entry, synthesis, implementation, and verification [24]. In design entry, source files are generated based on the design objectives, and top-level module files are created using an HDL such as Verilog or VHDL, or a schematic. After design entry, synthesis is run on the Verilog design input to prepare it for implementation [25]. Next, implementation is run, which converts the Verilog code into the report format. During simulation, inputs are given to the design. The simulation results show the outputs of the array multiplier, the Wallace multiplier, the Baugh Wooley multiplier, and the proposed 4X4 Vedic multiplier, all taken from Xilinx. Different inputs are applied, and the correct outputs are obtained.

Existing Array Multiplier
The simulation output of the conventional array multiplier is given in Fig. 17. Here, for example, the inputs a and b are 0000 and 0101, and the output p is 00000000. Figure 18 shows the simulation output of the existing Wallace tree multiplier. Here, for example, the inputs a and b are 0101 and 0100, and the output p is 00010100. Figure 19 shows the simulation output of the existing Baugh Wooley multiplier. Here, for example, the inputs a and b are 0111 and 0011, and the output p is 00010101. Figure 20 shows the simulation output of the existing Vedic multiplier. Here, for example, the inputs a and b are 1000 and 1000, and the output p is 01000000. Figure 21 shows the simulation output of the proposed Vedic multiplier. Here, for example, the inputs a and b are 1000 and 0010, and the output p is 00001000.
Table 1 shows the delay of the existing multipliers and the proposed Vedic multiplier. Table 2 shows the power of the existing multipliers and the proposed Vedic multiplier. From the analysis, the execution time is lower than that of the existing system, and the delay and power are reduced by 49.12% and 42.51%, respectively.
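The text does not state how the percentage figures were derived. Assuming the usual relative-reduction definition, with X standing for either the delay of Table 1 or the power of Table 2 (the symbols are illustrative, not the authors' notation), the calculation would be:

```latex
\text{Reduction}\,(\%) = \frac{X_{\text{existing}} - X_{\text{proposed}}}{X_{\text{existing}}} \times 100
```

Under this definition, a 49.12% reduction in delay means the proposed multiplier's delay is a little over half that of the existing design it is compared against.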

Conclusion
A 4-bit MAC unit that makes use of a Vedic multiplier with a modified sum-product algorithm was constructed. It is based on the Urdhva-Triyakbhyam technique, coded in HDL, and synthesized with Xilinx ISE. The design was observed to reduce power while also optimizing circuit delay and area. The MAC module built with the proposed multiplier can be used in DSP applications for better performance. Thus, the delay of the proposed Vedic multiplier using the modified sum-product algorithm is lower than that of the existing Vedic multiplier. Future systems will depend on better-performing multipliers that optimize power, delay, and area. Such a multiplier can also be designed using reversible logic.
Author contributions All authors contributed equally.

Funding Not Applicable.
Data availability Data sharing is not applicable to this article as no datasets were generated or analyzed during the current work.