Energy Efficient Full Swing GDI based Adder Architecture for Arithmetic Applications

Adders are among the most important digital components in arithmetic applications, and many improvements to their architecture have been made in the past. In this paper, we present two new symmetric designs for energy-efficient full adder cells featuring Gate-Diffusion Input (GDI) logic. The main design objectives for these adder modules are low-power operation and reduced area while still providing full voltage swing. In the first design (AEG-FA), a new approach using inverted and non-inverted carry-ins is taken to give complementary carry-out and sum with the desired performance. These cells are then applied in different combinations to form higher bit-width adder architectures. This provides a higher degree of design freedom to target a wide range of applications, hence reducing design effort. The second design (PEG-FA) follows the conventional approach and aims to reduce the critical path delay and lower the switching activity in the GDI circuit, providing a low-power, high-speed digital component with full voltage swing. Many of the adders previously reported in the literature suffer from low swing and high noise when operated at low supply voltages. The two new designs operate successfully at low voltage with high signal integrity and driving capability. To evaluate the performance of the proposed full adders, we incorporated them into 8-bit ripple carry adders. The studied circuits are optimized for energy efficiency using a 45 nm CMOS process technology. A comparison of these novel circuits with standard full adder cells shows improvement in terms of area, delay, power, Power-Delay Product (PDP), Area-Delay Product (ADP) and Area-Power Product (APP). At the architecture level, the proposed adder shows power savings of 12.8% over CMOS, 14.8% over hybrid and 11.4% over other GDI logic, with almost 55% reduction in area.


I. INTRODUCTION
The adder is the basic element used in the Arithmetic Logic Unit (ALU) of any computing system. Other operations such as subtraction, multiplication, division and address calculation are based on addition. Employed widely in Digital Signal Processing (DSP), image processing systems and microprocessors, adders are among the most useful elements in Very Large Scale Integration (VLSI) implementations of these applications. Since the adder is the building block, enhancing the overall performance of the 1-bit full adder is a significant goal and has attracted much attention [1].
The rapid growth of battery-operated portable systems further demands power-saving and smaller devices, and has widened the scope of low-power, energy-efficient microelectronics. Since battery technology has not advanced as fast as silicon scaling, different design solutions are needed for better reliability. A variety of full adders using different logic styles and technologies have been reported in the literature [1], [2] with the common aim of reducing power consumption and increasing speed.
The design criteria are generally manifold: transistor count, power consumption, energy requirements, delay metrics, etc. The overall performance improvement is pursued either from a "system level" or from a "circuit design" viewpoint [1]. From the system-level viewpoint, shortening the longest critical path in ripple adders is considered in order to reduce the total critical path delay. In most situations, the longest signal path is the propagation of carry signals through the ripple adder to generate the carry-out of the most significant bit. From the circuit design viewpoint, the goal is to implement a high-performance full adder core at the transistor level. An optimized design is required to prevent output signal loss, consume less power, have less delay in the critical path and remain reliable even at low supply voltage as technology scales down. Good driving capability under different load conditions and balanced outputs to avoid glitches are also important.
Logic styles based on CMOS technology were used widely in circuit design until the Gate Diffusion Input (GDI) design methodology was introduced as a promising alternative to conventional static CMOS logic [3]. Originally proposed for fabrication in Silicon on Insulator (SOI) and twin-well CMOS processes, the GDI methodology allows implementation of a wide range of complex logic functions using only two transistors. The area and dynamic power of GDI combinational and sequential logic were observed to be significantly lower than those of standard CMOS implementations. However, like existing alternatives to CMOS design such as Pass Transistor Logic (PTL) and Transmission Gate Logic (TGL), GDI gates present reduced voltage swing at their outputs due to threshold drops. These drops usually degrade performance and increase short-circuit power [4]. Since GDI circuits are implemented with far fewer transistors, a significant overall power reduction was still observed, with minimal performance penalty. More recently, it was shown that any GDI circuit can be implemented in a standard CMOS process [4].
Researchers have used the GDI methodology effectively to implement full adder logic blocks that perform better than those built with conventional design methodologies. A high-performance hybrid FA cell based on GDI and CCMOS logic [5] showed power savings for a 32-bit adder architecture. Another hybrid adder based on GDI and transmission gates, used to design a MAC unit [6], showed a 49.64% improvement in power-delay product for the proposed Braun multiplier based MAC unit and a 30.84% improvement for the proposed Baugh-Wooley multiplier based MAC unit. A reconfigurable approximate ripple carry adder suggested for high-speed applications, modified by adding GDI cells [7], showed up to 23%, 34% and 95% reduction in area, power consumption and delay, respectively, in comparison with the conventional design. A similar approach is taken in this work to bring out the efficiency of the GDI methodology and measure the improvements over state-of-the-art designs.
In this paper, two different styles of energy-efficient full adder design are implemented using full-swing GDI based digital circuits such as AND, OR and XOR gates. The adders are simulated in a standard 45 nm technology, and the performance of the proposed full adder designs is compared with state-of-the-art adders based on CMOS, hybrid and GDI logic.
The rest of the paper is organised as follows: Section II provides an overview of recent adder architecture designs using GDI technology. Section III presents the two proposed design methodologies for the full adder core and the formation of higher bit-width multi-bit adder architectures, with a high-level complexity analysis. Section IV shows the simulation results and a comparison of design metrics with work reported in the literature. Conclusions are drawn in Section V.
II. RELATED WORK
This section presents a literature survey of conventional adders and a comprehensive review of the Gate Diffusion Input (GDI) cell. The shortcomings and advantages of these technologies are described, and full-swing restoration logic is also discussed briefly.

A. Conventional Full Adder
The performance of digital circuits can be optimized by proper selection of logic styles. Different logic styles tend to favour one performance aspect at the expense of others. Logic styles differ in how they compute intermediate nodes and in transistor count, even though they implement the same function [8].
The well-known static CMOS adder with complementary pull-up PMOS and pull-down NMOS networks requires 28 transistors to generate the sum and carry outputs [9]. One of the main merits of this circuit is its robustness against voltage scaling and transistor sizing, along with full-swing outputs. The layout is simple, symmetric and efficient due to the complementary transistor pairs. However, because its structure employs a number of large, sized-up PMOS transistors, the input capacitance is large, which also has a direct impact on area.
PTL is an alternative to CMOS and implements most functions with fewer transistors. This reduces the overall capacitance, which in turn increases speed and decreases power dissipation. However, in a PTL based design the output voltage is degraded by the threshold voltage drop between input and output. This problem can be resolved by the But this logic produces larger short-circuit current, a higher transistor count and increased wiring complexity due to the demand for complementary input signals. Building logic with Transmission Gates (TG) is another way to minimize complexity, but it lacks driving capability in cascaded structures [10].
Another form of logic, known as hybrid CMOS, uses more than one standard logic style in its structure, such as a combination of CPL, PTL and TG. The performance of such adders lies between that of the standard logic styles and provides more design flexibility. However, the different logic modules suffer from interconnect capacitance and require a larger number of transistors.

B. Basic GDI Cell
A. Morgenshtein, Fish and Wagner [11] presented a low-power, reduced-transistor-count circuit design technique known as GDI as an alternative to CMOS technology. The basic GDI logic cell is a structure of two complementary transistors and resembles the CMOS inverter, as shown in Fig. 1. In the CMOS inverter, the source/drain diffusion terminals of the NMOS and PMOS transistors are tied to Gnd and Vdd, respectively, while in the GDI cell they are independent inputs. Thus, it is a 3-input logic cell that can be configured to perform the Boolean functions described in Table I. The key gates of an adder, such as the XOR, XNOR, AND and OR functions, can be implemented at high speed with just two transistors in GDI. The main drawback of the GDI logic cell is the threshold voltage drop, which results in a lower voltage swing at the output. This causes large leakage power in the circuit and slow switching at the output. The reduced current driving capability affects gate performance. Operation at lower Vdd may even lead to incorrect logic-level determination in the subsequent cascaded circuit.
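The mapping from terminal connections to logic functions can be illustrated with an idealized switch-level model. The sketch below assumes ideal switches with no threshold drop, so the output follows P when G is low and N when G is high. The names `gdi_cell`, `GDI_FUNCTIONS`, `check_gdi_functions` and `gdi_mux` are illustrative, and the listed configurations follow the commonly cited GDI function set; they are not a verbatim copy of Table I.

```python
from itertools import product


def gdi_cell(g: int, p: int, n: int) -> int:
    """Idealized GDI cell: the PMOS passes P when G = 0, the NMOS passes N when G = 1."""
    return n if g else p


# name -> (cell wiring as a function of A and B, reference Boolean function)
GDI_FUNCTIONS = {
    "F1": (lambda a, b: gdi_cell(a, b, 0), lambda a, b: (1 - a) & b),
    "F2": (lambda a, b: gdi_cell(a, 1, b), lambda a, b: (1 - a) | b),
    "OR": (lambda a, b: gdi_cell(a, b, 1), lambda a, b: a | b),
    "AND": (lambda a, b: gdi_cell(a, 0, b), lambda a, b: a & b),
    "NOT": (lambda a, b: gdi_cell(a, 1, 0), lambda a, b: 1 - a),
}


def check_gdi_functions() -> dict:
    """Return, per function name, whether the wiring matches its reference on all inputs."""
    return {
        name: all(cell(a, b) == ref(a, b) for a, b in product((0, 1), repeat=2))
        for name, (cell, ref) in GDI_FUNCTIONS.items()
    }


def gdi_mux(a: int, b: int, c: int) -> int:
    """MUX configuration: G = A, P = B, N = C gives (~A & B) | (A & C)."""
    return gdi_cell(a, b, c)


if __name__ == "__main__":
    print(check_gdi_functions())
```

The model deliberately ignores the threshold drop, so it captures only the logic behaviour of each configuration and says nothing about output voltage levels.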

C. Full Swing Logic
GDI gates provide reduced voltage swing at their outputs, i.e. the output high (or low) voltage deviates from Vdd (or ground) by the threshold voltage (Vt). The reduction in voltage swing is beneficial to power consumption. On the other hand, it may lead to slow switching in cascaded operation. At low Vdd, the degraded output may even cause circuit malfunction. Therefore, special attention is needed to achieve full-swing operation.
To overcome the problem of low output voltage swing, several methods have been proposed in the past. The compensation logic can be applied either at the circuit level or at the gate level. At the circuit level, the output voltage reduction can be compensated by swing restoration buffers at the output [12]. However, the inverters in the buffers increase the transistor count and also increase static power consumption when they are connected in cascade. A multiple-Vt technique is presented in [3], which uses low-threshold transistors where a voltage drop would occur and high-threshold transistors for the inverters. Though this hybrid threshold voltage method minimizes power consumption, it becomes a bottleneck in the transistor fabrication process. Another method restores the swing of a GDI based full adder output using an Ultra Low Power Diode (ULPD) technique [13]. This technique configures a MOS transistor to work as a diode and uses 8 additional transistors to provide full swing. It mitigates the static power dissipation of a conventional swing restoration buffer, but the complexity of fabricating the ULPD still has to be taken into account.
At the gate level, full-swing GDI based Boolean functions are realized using additional transistors. The output level of the F1 and F2 logic cells is restored by an inverted swing restoration transistor at the output [3]. This forms the basis of the AND/OR gates in the adder circuit. Three GDI based full adders (GFA) are presented in [14] using new 4T XNOR/XOR gates. Depending on the circuit, one can use full-swing XOR logic, full-swing AND/OR gates and MUX based swing restoration logic to overcome the issue, at the cost of additional hardware overhead but with lower power requirements.
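To put the swing degradation discussed in this subsection into numbers, the following sketch computes the worst-case output levels of an unrestored GDI output. The threshold value of 0.3 V is a hypothetical figure chosen only for illustration, body effect is ignored, and the function names are assumptions of this sketch.

```python
from typing import Tuple


def degraded_levels(vdd: float, vtn: float, vtp: float) -> Tuple[float, float]:
    """Return (V_OH, V_OL) of an unrestored output, first-order model.

    An NMOS passing a logic high cuts off near Vdd - Vtn, and a PMOS
    passing a logic low cuts off near |Vtp| (body effect ignored).
    """
    return vdd - vtn, abs(vtp)


def swing_fraction(vdd: float, vtn: float, vtp: float) -> float:
    """Fraction of the rail-to-rail range left when both levels are degraded."""
    v_oh, v_ol = degraded_levels(vdd, vtn, vtp)
    return max(0.0, v_oh - v_ol) / vdd


if __name__ == "__main__":
    VT = 0.3  # hypothetical threshold magnitude in volts, not taken from this work
    for vdd in (1.8, 1.2, 1.0):
        print(f"Vdd = {vdd:.1f} V -> usable swing = {swing_fraction(vdd, VT, -VT):.0%}")
```

Under these assumed values the usable fraction of the supply range shrinks as Vdd is lowered, which is consistent with the concern that degraded outputs become critical at low supply voltages.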

III. PROPOSED GDI-ARCHITECTURE
The key elements of a multi-bit adder are the FA cells, which are connected in cascade so that the carry propagates from the least significant FA (LS-FA) to the most significant FA (MS-FA). If the carry logic is optimized to reduce the path delay from LS-FA to MS-FA, the resulting adder design will provide higher performance than existing designs. This section applies the circuit-level design approach to the design of a high-performance full adder core. First, two different FA designs, namely 1) the Area Efficient GDI based FA (AEG-FA) and 2) the Power Efficient GDI based FA (PEG-FA), are presented, followed by an energy-efficient n-bit adder architecture that uses the proposed AEG-FAs and PEG-FAs. Finally, a high-level comparative complexity analysis is presented.

A. Proposed AEG-FA Designs
The proposed area-efficient GDI based FA designs are based on a unique method of using inverted and non-inverted carry inputs in alternate FAs. The proposed AEG-FA-I, whose block diagram is shown in Fig. 2(a), generates an inverted carry-out, instead of the non-inverted carry-out of the conventional FA design, while generating a non-inverted sum. This AEG-FA-I requires the fewest transistors when implemented in GDI logic. A complementary design, AEG-FA-II, is also proposed, in which the carry input is inverted rather than non-inverted as in the conventional FA design, to produce the sum and carry (as shown in Fig. 2(b)).
Further, the Boolean expressions for AEG-FA-I Design 2 are given by Eq. 3 and Eq. 4.
Similarly, the Boolean expressions for the proposed AEG-FA-II Design 1 are given by Eq. 5 and Eq. 6.
Further, the Boolean expressions for AEG-FA-II Design 2 are given by Eq. 7 and Eq. 8.
The proposed FA designs are implemented in GDI logic with the fewest transistors to reduce implementation complexity. Further, the FA designs are implemented so that they overcome the threshold voltage loss and provide full-swing operation with GDI logic. The resulting FA designs exhibit minimal circuit complexity and high speed.

B. Proposed PEG-FA Design
The proposed power-efficient GDI based full adder is similar to the conventional full adder design. The PEG-FA generates a 2-bit output, a non-inverted carry and sum, from its 3 inputs (A, B, Cin). In this section we propose two different PEG-FA designs using basic GDI cells from Table I that have the minimum number of transistors for power saving. The Boolean expressions for the proposed PEG-FA Design 1 are given by Eq. 9 and Eq. 10, and those for PEG-FA Design 2 by Eq. 11 and Eq. 12.
The proposed PEG-FAs have the fewest transistors among GDI based designs, as shown in Fig. 5. These FAs exhibit minimal complexity at the circuit design level, with full-swing output voltage and faster switching. PEG-FA Design 1 has an independent full-swing parallel XOR-XNOR combination for the selection of the carry. The design avoids using an inverted carry input for the carry-out computation and thus removes an inverter from the critical path, reducing the carry chain propagation delay. Similarly, PEG-FA Design 2, with 18 transistors, reduces the gate capacitance at the inputs of the sum and carry-out nodes, resulting in fast switching and reduced delay. The overall configuration lowers the power of the circuit.
The proposed AEG-FA-I, AEG-FA-II and PEG-FAs use basic adder building blocks such as the full-swing XOR [14], AND and OR gates [15] and full-swing MUX logic [3]. These basic circuit elements use the minimum number of transistors for full-voltage operation, thus reducing the power requirements of the circuit.
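The idea of computing sum and carry-out from a parallel XOR-XNOR pair without an inverted carry input can be illustrated with a generic multiplexer-based full adder. The sketch below is a textbook formulation chosen for illustration and is not claimed to match Eq. 9 to Eq. 12; the names `mux2`, `mux_full_adder` and `matches_arithmetic` are hypothetical.

```python
from itertools import product


def mux2(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: returns d0 when sel = 0 and d1 when sel = 1."""
    return d1 if sel else d0


def mux_full_adder(a: int, b: int, cin: int):
    """Full adder built from an XOR/XNOR pair and two multiplexers.

    The carry-in is used only in true form: it selects between XOR and XNOR
    for the sum, and it is one of the data inputs of the carry multiplexer.
    """
    x = a ^ b          # XOR of the operands
    xn = x ^ 1         # XNOR, available in parallel with XOR
    s = mux2(cin, x, xn)      # sum = A xor B xor Cin
    cout = mux2(x, a, cin)    # if A != B the carry propagates, otherwise Cout = A
    return s, cout


def matches_arithmetic() -> bool:
    """Check the cell against integer addition for all 8 input combinations."""
    return all(
        2 * cout + s == a + b + cin
        for a, b, cin in product((0, 1), repeat=3)
        for s, cout in [mux_full_adder(a, b, cin)]
    )


if __name__ == "__main__":
    print(matches_arithmetic())
```

Because neither multiplexer needs the complement of the carry-in, no inverter sits between the incoming carry and the carry-out in this formulation.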

C. Proposed Adder Architecture
This subsection presents the proposed energy-efficient multi-bit adder architectures built from the proposed AEG-FA and PEG-FA cells.
1) Area efficient GDI based Adder (AEGA): The architecture of the proposed AEGA is shown in Fig. 6. The AEGA is implemented by connecting the proposed AEG-FA-I and AEG-FA-II cells in alternate positions. The least significant FA in the AEGA must be an AEG-FA-I, which accepts a non-inverted carry-in; its output carry is connected to the carry-in of an AEG-FA-II to match the complemented logic. An AEG-FA-I followed by an AEG-FA-II forms a pair of the proposed addition logic, and this pair is used to build higher bit-width adders.
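The alternating connection can be checked with a behavioural model. The sketch below assumes that AEG-FA-I takes a true carry-in and returns an inverted carry-out, and that AEG-FA-II takes an inverted carry-in and returns a true carry-out; the second assumption is inferred from the pairing described above. The cells are modelled with the standard sum and majority functions, not with Eq. 1 to Eq. 8, and all function names are illustrative.

```python
def aeg_fa_i(a: int, b: int, cin: int):
    """Behavioural AEG-FA-I: true carry-in, returns (sum, inverted carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout ^ 1


def aeg_fa_ii(a: int, b: int, cin_bar: int):
    """Behavioural AEG-FA-II: inverted carry-in, returns (sum, true carry-out)."""
    cin = cin_bar ^ 1
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def aega_add(x: int, y: int, cin: int = 0, width: int = 8):
    """Ripple-carry addition with alternating AEG-FA-I / AEG-FA-II cells."""
    if width % 2:
        raise ValueError("width must be even: cells are used in I/II pairs")
    total, carry = 0, cin
    for i in range(0, width, 2):
        # Even bit position: AEG-FA-I consumes the true carry.
        s0, carry_bar = aeg_fa_i((x >> i) & 1, (y >> i) & 1, carry)
        # Odd bit position: AEG-FA-II consumes the inverted carry directly.
        s1, carry = aeg_fa_ii((x >> (i + 1)) & 1, (y >> (i + 1)) & 1, carry_bar)
        total |= (s0 << i) | (s1 << (i + 1))
    return total, carry


if __name__ == "__main__":
    print(aega_add(200, 100))  # (44, 1), since 300 = 256 + 44
```

In this model no explicit inverter appears between the two cells of a pair: the inverted carry-out of the first cell is consumed as is by the second.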
Considering different combinations of the proposed AEG-FA-I and AEG-FA-II designs yields 4 different adder designs, shown in Table II, whose behaviour is observed in a multi-bit RCA. The various combinations provide trade-offs between different design metrics, which can be exploited in different applications. The AEGA-I design has 144 transistors with a slight increase in delay and can be used in area-efficient designs, while AEGA-IV has the highest number of transistors (184) with the minimum carry chain delay. The inverting and non-inverting carry-out from AEG-FA-I and AEG-FA-II avoids the
2) Power efficient GDI based Adder (PEGA): The multi-bit adder architecture with the proposed PEG-FA is shown in Fig. 7. As in conventional multi-bit adders, PEG-FA cells are cascaded to form the higher bit-width adder architecture.
Cascading the PEG-FAs does not deteriorate the full-swing output of the multi-bit adder. The longest carry chain delay of the proposed PEGA depends on the worst-case switching delay of the individual PEG-FA cell. The overall power of the PEGA is reduced by reducing the effective size of the PEG-FA.
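The dependence of the carry chain delay on the single-cell delay can be written as a first-order estimate. The sketch below assumes identical cells and a worst case in which a carry generated in the least significant cell ripples through every stage; the function name and the delay values in the usage lines are hypothetical placeholders, not simulation results.

```python
def ripple_carry_delay(n_bits: int, t_carry: float, t_sum: float) -> float:
    """First-order worst-case delay of an n-bit ripple carry adder.

    The carry passes through the first n_bits - 1 cells, and the last cell
    then produces its slower output (sum or carry-out). Both delays are
    measured from the carry input of a single cell, in the same time unit.
    """
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    return (n_bits - 1) * t_carry + max(t_carry, t_sum)


if __name__ == "__main__":
    # Hypothetical cell delays in picoseconds, chosen only for illustration.
    print(ripple_carry_delay(8, t_carry=50.0, t_sum=60.0))  # 410.0
```

The estimate grows linearly with the bit width, which is why reducing the carry delay of the single cell has a direct effect on the multi-bit adder.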

IV. EXPERIMENTAL RESULTS AND ANALYSIS
The performance of the proposed AEG-FA and PEG-FA adders is evaluated against state-of-the-art full adder designs by implementing them in Synopsys Custom Designer with a 45 nm technology file. The maximum input frequency is 100 MHz. The power supply (Vdd) is 1 V and the transistor sizing is set to Wp/Lp = 90/45 and Wn/Ln = 60/45. The designs are first simulated for functional checks, and the average power and delay metrics are then evaluated. The simulation results for the proposed and existing full adders are summarized in Table III. The adders are then cascaded to form 8-bit wide RCAs, and a similar analysis is carried out in the simulation environment; the resulting area-delay and power-delay metrics are summarized in Table IV.

A. Performance of FA as single cell
Table III evaluates the different design parameters considered in the literature survey. Transistor count specifies the total number of transistors used in each 1-bit adder design. The measured power dissipation depends on the input test pattern. For a fair estimate of power dissipation, all inputs should receive an equal number of high-to-low and low-to-high transitions. The propagation delay of an FA, on the other hand, depends on both the previous and the current input combination (A, B, Cin). Hence, the power dissipation and critical path delay are measured for all 48 possible transitions presented in [17]. The Power-Delay Product (PDP, measured in fJ), Area-Power Product (APP) and Area-Delay Product (ADP) are calculated by treating each transistor as a unit of area. The APP and ADP are therefore computed by multiplying the power and the delay, respectively, by the number of transistors. The proposed PEG-FA designs, implemented using 18 transistors, outperform all other adders from the literature: they have the fewest transistors and consume the least power. PEG-FA has 36% less area than CMOS.
Individually, PEG-FA-D1 achieves 15% more power savings than the existing GFA-1/2/3 designs [14], while PEG-FA-D2 and AEG-FA-II-D2 save 5% and 4%, respectively. The delay of all proposed FA cells is symmetric for sum and carry-out due to parallel computation, resulting in a high-speed circuit design. As can be seen from Table III, PEG-FA-D2 has the lowest 1-bit sum delay, giving the fastest operation when a carry-over unit is not needed. The AEG-FA designs, implemented as a combination of inverting and non-inverting carry-in, are energy efficient and collectively have lower S-PDP than the CMOS, GFA-1, GFA-2 and GFA-3 designs. AEG-FA-I-D1 and AEG-FA-II-D1 use 18T, with 4% and 9.2% power savings, respectively, compared with the existing GFA-2 design.
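The figures of merit defined above reduce to simple products, which the following sketch makes explicit. It assumes power in microwatts and delay in picoseconds (a unit choice made here so that the PDP comes out in fJ); the class and function names are illustrative, and the power and delay values in the usage lines are hypothetical. Only the transistor counts 28 and 18 are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdderMetrics:
    transistors: int   # area proxy: each transistor counts as one unit of area
    power_uw: float    # average power in microwatts (assumed unit)
    delay_ps: float    # critical path delay in picoseconds (assumed unit)

    @property
    def pdp_fj(self) -> float:
        """Power-Delay Product in fJ (1 uW x 1 ps = 1e-18 J = 1e-3 fJ)."""
        return self.power_uw * self.delay_ps * 1e-3

    @property
    def app(self) -> float:
        """Area-Power Product: transistor count times power."""
        return self.transistors * self.power_uw

    @property
    def adp(self) -> float:
        """Area-Delay Product: transistor count times delay."""
        return self.transistors * self.delay_ps


def area_reduction_percent(ref_transistors: int, new_transistors: int) -> float:
    """Relative area saving of a design against a reference, in percent."""
    return 100.0 * (ref_transistors - new_transistors) / ref_transistors


if __name__ == "__main__":
    cell = AdderMetrics(transistors=18, power_uw=2.0, delay_ps=100.0)  # hypothetical values
    print(cell.pdp_fj, cell.app, cell.adp)
    print(round(area_reduction_percent(28, 18)))  # 36
```

With 28 transistors for the static CMOS adder and 18 for PEG-FA, the area saving evaluates to about 35.7%, which rounds to the 36% quoted above.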

B. Performance of FA as cascaded operation
To evaluate the performance metrics in practical circuits, the FAs are extended to 8 bits; the results are given in Table IV. In terms of ADP, the PEG-FA designs show the best results, with 58%, 32%, 20% and 46% improvements over the CMOS, GFA-1 [14], GFA-2 [14] and GFA-3 [14] designs, respectively. Similarly, the C-PDP is 51%, 42%, 18% and 62% better than that of the CMOS, GFA-1 [14], GFA-2 [14] and GFA-3 [14] designs, respectively. The propagation delays of the AEGAs are comparable to those of conventional adders from the literature, with almost 30% improvement in C-ADP. AEGA-I/II/III are the low-power, area-efficient design combinations, while AEGA-IV operates faster, with 29% and 49.6% improvement in C-PDP compared with the CMOS and hybrid adders, respectively. The APP results are the best among all designs, indicating suitability for low-power applications. The graphical analysis in Fig. 8 shows the significant power savings of the proposed 1-bit adders. All proposed PEGA and AEGA designs clearly perform better than the conventional CMOS, hybrid and [14] designs, as illustrated in Fig. 11. The lowest C-PDP and S-PDP of PEG-FA, shown in Fig. 9, fulfil the goal of an energy-efficient design approach. The various AEGA combinations shown in Fig. 10 provide reliable solutions for different arithmetic unit applications.

V. CONCLUSION
In this paper, we have presented two GDI logic based, low-area, power-saving and energy-efficient full adder designs: AEG-FA, based on a combination of inverting and non-inverting carry logic, and PEG-FA, based on traditional 1-bit computation. The proposed full adders are cascaded to form the multi-bit adders PEGA and AEGA. Simulations carried out in 45 nm technology with different input combinations show that the proposed PEG-FA, with a transistor count of 18T, saves area and consumes the least power among all designs. The parallel output computation results in smaller delay, and the proposed AEG-FA offers various combinations of area, power and delay metrics that can be used for different applications. At the architecture level, PEGA shows power savings of 12.8% over CMOS, 14.8% over hybrid and 11.4% over other GDI logic, with almost 55% reduction in area. The alternating inverting and non-inverting carry in the AEGA (I-IV) combinations helps obtain the output bits at 43% lower ADP and 50% lower PDP than CMOS and hybrid logic, respectively.