Utilization of Power Gating Technique for the Analysis of MPSoC Based On Dual Feedback Edge Triggered Flip Flop


With the continued scaling of semiconductor technology, the lifetime reliability of embedded multi-processor platforms has become a primary concern for the industry. Advances in technology permit the integration of several microprocessors, dedicated digital hardware, and at times mixed-signal circuits on a single silicon die, known as a multi-processor system-on-a-chip (MPSoC). In this paper, a CMOS-based MPSoC is designed and analyzed in 45 nm technology. The design uses an ADC, to which 10-bit data is given as input and converted into digital data. A dual feedback edge-triggered flip-flop is designed; because it relies on both feedback and edge triggering, it is more effective at eliminating errors. A Power Gating (PG) technique is proposed that exploits the stacking effect to achieve high energy efficiency. A binary controlled stacked SRAM cell, based on a parallel cross-coupling feedback controller, is implemented to reduce leakage loss and ground bounce noise. An inverter, based on NMOS and CMOS, performs the inversion. An input voltage of 5 V is applied and varied. A 2-bit counter is then employed, which counts either up or down: it counts down if the control signal is high and counts up if the signal is low. The design supports compact processing systems and can be used in real-time applications that require a compact device. Integrating multiple processors speeds up processing while reducing leakage power loss, power consumption, and delay. The performance analysis is carried out using Tanner EDA, and the results are presented.


INTRODUCTION
In recent years, multiprocessor systems-on-chip (MPSoCs) have emerged as a significant class of Very Large Scale Integration (VLSI) systems. An MPSoC is a system-on-chip that integrates most or all of the components needed for its function and employs multiple programmable processors as system components (Mossaad Ben Ayed et al., 2017). MPSoCs are widely used in signal processing, communications, networking, and multimedia, among other applications. They form a distinct branch in the evolution of computer architecture, particularly of multiprocessors, shaped by the requirements placed on such systems: low power, multitasking, and real-time operation (Surantha et al., 2017; King et al., 2013). As the number of transistors on a single chip grows to billions or more, the MPSoC becomes an attractive choice for low-power, high-performance applications (Marsono et al., 2017). Conventional on-chip communication architectures for MPSoCs face issues such as limited bandwidth, high power consumption, and poor scalability (Wang and Chattopadhyay, 2018; Breaban et al., 2017). As new MPSoC applications continually push the limits, conventional electronic routers and metallic interconnects are gradually becoming the performance bottleneck of the network-on-chip (NoC) because of their large area, long delay, limited bandwidth, crosstalk noise, and high power consumption. System-on-chip (SoC) design, which includes hardware/software co-design, has been employed extensively in embedded systems (Tang et al., 2018). Furthermore, contemporary MPSoCs permit software parallelism, which opens up more possibilities for HW/SW co-design (Hoppner et al., 2013; Huang et al., 2011; Cui et al., 2019).
In traditional Successive Approximation Register (SAR) ADCs (Tohidi et al., 2019; George et al., 2016), the significant sources of power dissipation are the comparator, the DAC capacitor array, and the digital control circuit.
The digital power decreases as technology advances. The power of the comparator and the capacitor network, on the other hand, is limited by noise and mismatch. Double edge-triggered flip-flops have become a popular technique for low-power designs because they allow the clock frequency to be halved for the same throughput. Power gating is used to limit leakage power in high-performance integrated circuits (ICs) when the device is in standby mode. It employs sleep transistors as switches that block the current flowing through a block when it is not in use. Sleep transistors thus allow efficient management of thermal and power effects. The dynamic behavior of the device affects the power dissipation, which in turn raises the temperature at the circuit nodes. Temperature also affects the leakage current in the static mode of the device. Because of this interdependence between power dissipation and temperature, an effective power gating method plays a significant part in ICs. Different types of power gating techniques are available to reduce the leakage power during standby mode. Leakage power is divided into active leakage and standby leakage, and it varies with process variations, transistor count, and transistor sizing.

Contribution of the Research
The main contribution of the paper is a power gating (PG) technique that uses the stacking effect to achieve high energy efficiency and that improves on existing power gating techniques in reducing ground bounce noise. A CMOS-based MPSoC is designed and analyzed in 45 nm technology. A dual feedback edge-triggered flip-flop is designed; because it relies on both feedback and edge triggering, it is more effective at eliminating errors. The design supports compact processing systems and can be used in real-time applications that require a compact device. Integrating multiple processors speeds up processing while reducing leakage power loss, power consumption, and delay. The performance analysis is carried out using Tanner EDA, and the results are presented.
The rest of the paper is organized as follows. Section 2 reviews related work on reducing power consumption and ground bounce noise. Section 3 presents the proposed control strategy for the SRAM, which reduces the noise and power consumption of the circuit in active mode. Section 4 describes the simulation results of the proposed SRAM control design, and Section 5 concludes the paper.
Andreev et al. (2018) explored the impact of the FCA system design on several 3D architectures. They also presented a method for optimizing a 3D MPSoC with integrated FCA so that it runs a known workload in the most energy-efficient manner. The results reveal that an optimized design can save up to 50% energy compared with sub-optimal 3D MPSoC configurations. Jerraya and Bacivarov (2017) discussed the key factors necessary for the successful development and reuse of such platforms. The work emphasizes the importance of modeling standards in system-level design and identifies missing pieces that would further increase model interoperability and ease platform integration. It summarizes the most pressing requirements identified from demanding uses of virtual platforms in product groups: early power estimates with software in the loop, capture of system aspects such as power supplies, reset, and synchronization, and the representation of clock trees. Liu et al. (2018) suggested a novel heuristic algorithm for task mapping in many-core dark silicon systems, termed Topo Map, built on top of the SMART architecture, which can resolve the communication contention problem effectively in polynomial time. Through fine-grained consideration of chip thermal reliability and inter-processor communication, the proposed approach exploits the configurability of the NoC communication topology in task scheduling and mapping.
A thermally safe system is ensured by physically decentralizing the active cores, and communication overhead is minimized by maximizing bypass routing and reducing communication contention. Halim et al. (2018) considered a flexible low-power Turbo decoder implementation on a low-power programmable MPSoC, at the expense of data throughput, intended for low-data-rate IoT applications. A comprehensive analysis of the software-based implementation and the decoder optimizations targeted at the specific MPSoC are provided. Maillard et al. (2018) described an evaluation methodology that uses the Xilinx System Validation Tool (SVT) design suite to characterize the single-event response of the quad- and dual-core ARM processors in Xilinx's 16 nm Zynq UltraScale+ MPSoC. An accelerated-beam SEU test of the XCZU9EG device was carried out. Breaban et al. (2017) proposed two techniques, based on barriers and FIFO channels, for implementing time and data synchronization on an MPSoC. Barriers synchronize the execution flows of tasks at points predefined in the execution, whereas a FIFO is an asynchronous data communication technique between two tasks. They are first employed in a LET implementation.

RELATED WORKS
Yun et al. (2019) presented AGATS, a genetic algorithm (GA) based, energy-efficient design-time task scheduling algorithm for asymmetric multi-processor SoCs. Unlike existing GA-based task scheduling algorithms, AGATS adaptively applies a different generation strategy to each solution candidate depending on its completion time and energy consumption. Chava and Saravanan (2016) presented the double edge-triggered (DET) technique, which is a preferred choice among researchers in digital VLSI design because of its low power consumption and high performance. DET techniques offer performance similar to Single Edge-Triggered (SET) techniques at half the clock frequency. A DET technique may reduce power consumption by 50%, increasing the power savings of the total system. Ro et al. (2020) investigated ADC structures in radiation environments and presented a successive-approximation-register (SAR) ADC that uses delay-based double feedback flip-flops to improve the tolerance of the system. Bonetti et al. (2015) stated that synchronous DET operation is a very attractive option for high-performance, low-power designs. Compared with conventional single-edge synchronous systems, DET operation provides the same throughput at half the clock frequency. This can yield significant power savings in the clock network, which is one of the major and consistent contributors to total system power. Singar et al. (2018) proposed the Advanced Glitch Free Dual Edge Triggered Flip Flop (AGF-DET-FF), which decreases power and area consumption and increases system speed. For better performance, a 2P-1N structure is combined with a C-element structure. To control the input loading, the two structures are integrated through shared transistors connected to the data input.

PROPOSED WORK
This section describes the proposed mechanism in detail. An input voltage of 5 V is applied and varied.

Design of Dual controlled Stacked SRAM cell
Small area, high speed, and low power consumption are the main factors in improving circuit performance. Power dissipation and delay are the main design considerations, and both depend on the number of transistors used in the circuit. In an SRAM cell, ground bounce noise disturbs the behaviour of the circuit and must be reduced. Ground bounce reduction has therefore become extremely significant in circuit design, together with the reduction of static power dissipation and noise during periods of inactivity. High power consumption also aggravates ground bounce noise. Power and noise reduction should be attained without degrading performance, which makes it harder to minimize leakage during normal operation. Many methods, on the other hand, are available to reduce ground bounce noise and power consumption in standby or sleep mode. One promising technique is the stack approach, in which leakage energy is saved by cutting the supply voltage to idle circuits.
The general stack approach fails to control the bounce noise under high supply voltage and high power consumption.
Hence, in the proposed methodology, a dual stacked technique is developed for the SRAM cell design, which reduces ground bounce noise, power consumption, and leakage current. Ground bounce noise is a condition in which the switching of device outputs from high to low produces a voltage change at the pins of the circuit. It mostly occurs in high-density VLSI, where precautions must be taken in supplying the logic gates. To reduce ground bounce noise, a low resistance must be connected to ground. The proposed dual stack approach automatically connects a low resistance to ground, which reduces the ground bounce noise. Ground bounce noise is a voltage oscillation between the ground pin on a component package and the ground reference level on the component die. It is caused by a current surge passing through the lead inductance of the package: the high current flowing through the ground pin creates a voltage drop across the lead inductance. These voltage oscillations on the ground line cause two main issues. First, they raise the on-chip ground potential, which in turn raises the input threshold level of the device. Second, they raise the voltage level on an output pin that is not switching. The dual stack approach in the SRAM cell is used to reduce leakage current, ground bounce noise, and energy consumption. The hybrid form of the proposed approach attains this objective by reducing the data retention voltage (DRV). Based on the architecture, the modes of operation of the dual stack approach are presented in the following sections.

Mode of operation
The basic idea behind this approach to mitigating ground bounce noise and reducing leakage power is the stacking effect of transistors between the supply rail and the ground rail. It rests on the fact that a supply-to-ground path with only one transistor turned off leaks more than a path in which several series transistors are off. In this paper, a dual control stacked inverter is presented that operates in three modes: active mode, standby mode, and cutoff mode. In sleepy power gating, a PMOS transistor driven by a sleep signal is used in the pull-up network, and an NMOS sleep transistor driven by the inverted sleep signal is used on the pull-down side. The sleepy stack operates similarly to the sleepy approach: the sleep transistors reduce the resistance and the stacking effect reduces the leakage power, but the propagation delay increases. To overcome this limitation and further reduce leakage power, the dual controlled stacking technique is implemented for an inverter and its CMOS structure. The designed model, which reduces ground bounce noise in the SRAM cell structure, operates in three modes (active, cutoff, and standby), which are explained in the following subsections.
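The stacking effect can be illustrated numerically. The following Python sketch is illustrative only: it assumes a simplified subthreshold current model without DIBL or body effect, and every parameter value (VT, N, VTH, I0, VDD) is an assumption rather than a value taken from this design. It finds the intermediate node voltage of two series off transistors by bisection and compares the resulting leakage with that of a single off transistor.

```python
import math

# All values below are illustrative assumptions, not design data.
VT = 0.02585   # thermal voltage near 300 K (V)
N = 1.5        # subthreshold coefficient
VTH = 0.3      # threshold voltage (V)
I0 = 1e-6      # lumped prefactor of the subthreshold model (A)
VDD = 1.0      # supply voltage (V)


def subthreshold_current(vgs, vds):
    """Simplified subthreshold drain current (no DIBL, no body effect)."""
    return I0 * math.exp((vgs - VTH) / (N * VT)) * (1.0 - math.exp(-vds / VT))


def single_off_leakage(vdd=VDD):
    """Leakage through one off NMOS (gate at 0 V) between vdd and ground."""
    return subthreshold_current(0.0, vdd)


def stacked_off_leakage(vdd=VDD, iterations=60):
    """Leakage through two series off NMOS devices.

    The intermediate node voltage vx settles where the current of the
    top device equals the current of the bottom device; it is found
    here by bisection.
    """
    lo, hi = 0.0, vdd
    for _ in range(iterations):
        vx = 0.5 * (lo + hi)
        top = subthreshold_current(-vx, vdd - vx)   # source sits at vx
        bottom = subthreshold_current(0.0, vx)      # drain sits at vx
        if top > bottom:
            lo = vx   # node would charge up further
        else:
            hi = vx
    vx = 0.5 * (lo + hi)
    return subthreshold_current(0.0, vx), vx


if __name__ == "__main__":
    single = single_off_leakage()
    stacked, vx = stacked_off_leakage()
    print(f"single: {single:.3e} A, stacked: {stacked:.3e} A, vx: {vx:.4f} V")
```

Under these assumed parameters the stacked path leaks less than the single device, because the small positive voltage on the intermediate node gives the upper transistor a negative gate-source voltage.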

Active mode of operation
In active mode, both the sleep and hold signals are set to '1'. When the sleep signal S is high, P1 is turned off and N3 is turned on, passing ground to N2. When the hold signal H is high, N1 is turned on, passing Vdd to P2. The NMOS transistor in the pull-down network shorts the actual ground of the circuit to the virtual ground (VGND), which applies Vdd to the logic circuit for high-speed operation.
With S and H set in this way, the CMOS inverter inverts the input X: if X=0, the output Y=1; otherwise Y=0.

Cut off mode operation
In cutoff mode, the dual control stacked inverter does not maintain its state. If the signal S is kept low while the hold signal is kept high, the current path to ground is cut, which suppresses the subthreshold leakage current, that is, the drain-to-source current in the subthreshold region. The leakage current is computed from the dependence of the drain current on the drain and source voltages. The expression involves the threshold voltage Vth, the thermal voltage, the subthreshold coefficient n, the carrier mobility µ0, the gate oxide capacitance Cox, and the effective width Weff and length Leff of the transistor.
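For reference, a widely used textbook form of the subthreshold leakage current that is consistent with the quantities named above is sketched below. It is an assumption about the intended model, not a reproduction of the expression used in this work; the gate-source voltage $V_{gs}$, drain-source voltage $V_{ds}$, Boltzmann constant $k$, temperature $T$, and electron charge $q$ are introduced here only for illustration.

```latex
I_{sub} = \mu_0 \, C_{ox} \, \frac{W_{eff}}{L_{eff}} \, V_T^{2} \, e^{1.8}
          \exp\!\left(\frac{V_{gs} - V_{th}}{n V_T}\right)
          \left(1 - \exp\!\left(-\frac{V_{ds}}{V_T}\right)\right),
\qquad V_T = \frac{kT}{q}
```

In this form, lowering $V_{gs}$ below zero or raising $V_{th}$ reduces the leakage exponentially, which is why cutting the path to ground is effective.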

Standby mode of operation
In this mode, the virtual ground (VGND) is kept at the threshold voltage of the PMOS transistors, |VTP|. The logic circuit therefore sees Vdd-|VTP|, which reduces the gate leakage and subthreshold leakage. The state is preserved in this mode, and the ground bounce noise due to mode transitions is lower than in cutoff mode because the virtual ground is limited to |VTP|.
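The three modes can be summarized with a small behavioral model. The Python sketch below is a logic-level illustration only, not a circuit simulation. The names `Mode`, `decode_mode`, and `inverter_output` are hypothetical. The mapping from (S, H) to a mode covers only the two combinations stated in the text (S=1, H=1 for active; S=0, H=1 for cutoff); the control values for standby mode are not given, so standby is selected explicitly.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    ACTIVE = "active"
    CUTOFF = "cutoff"
    STANDBY = "standby"


def decode_mode(sleep: int, hold: int) -> Optional[Mode]:
    """Map (S, H) to a mode for the combinations the text states."""
    if sleep == 1 and hold == 1:
        return Mode.ACTIVE
    if sleep == 0 and hold == 1:
        return Mode.CUTOFF
    return None  # standby control values are not specified in the text


def inverter_output(x: int, mode: Mode, held: Optional[int] = None) -> Optional[int]:
    """Logic-level output Y of the dual control stacked inverter.

    ACTIVE:  Y is the inverse of X.
    STANDBY: the previously held state is preserved.
    CUTOFF:  the state is not maintained, modelled here as None.
    """
    if mode is Mode.ACTIVE:
        return 1 - x
    if mode is Mode.STANDBY:
        return held
    return None


if __name__ == "__main__":
    for x in (0, 1):
        print(x, inverter_output(x, decode_mode(1, 1)))
```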

Ground bounce noise reduction
The memory is designed using the dual control stacked inverter, which relies on the stacking effect to reduce leakage power by turning off a number of transistors in series. Two dual control stacked inverters are cross-coupled to form a memory that acts as a storage cell, with 0 and 1 as its two stable states. Two access transistors, controlled by the word line, perform the read and write operations. Two bit lines, BL and BLB, transfer the data for read and write operations. The word line is driven high for both read and write operations.
BL is set to 1 to write 1 and to 0 to write 0. During a read operation, BL is 0 for a read-0 operation and 1 for a read-1 operation.
In the proposed SRAM design, the threshold voltage of the sleep transistors is reduced by reducing the length of the transistor to further improve leakage power reduction. This follows from the relation between the threshold voltage (Vth) and the length (L) of the transistor.
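The read and write protocol described above can be sketched at the logic level. The Python model below is a minimal illustration under stated assumptions: the class name `SramCell` and its methods are hypothetical, the cross-coupled inverters are reduced to a single stored bit, and precharge, timing, and leakage behavior are not modelled.

```python
from typing import Optional, Tuple


class SramCell:
    """Logic-level model of a cross-coupled storage cell with BL/BLB access."""

    def __init__(self, initial: int = 0) -> None:
        self.q = initial  # stored bit; the complementary node holds 1 - q

    def write(self, word_line: int, bl: int) -> None:
        """Write the value on BL (BLB carries its complement) if the word line is high."""
        if word_line == 1:
            self.q = bl

    def read(self, word_line: int) -> Optional[Tuple[int, int]]:
        """Return (BL, BLB) if the word line is high, otherwise None."""
        if word_line != 1:
            return None
        return self.q, 1 - self.q


if __name__ == "__main__":
    cell = SramCell()
    cell.write(word_line=1, bl=1)
    print(cell.read(word_line=1))
```

With the word line low, the access transistors are off, so a write has no effect and a read returns nothing in this model.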

A. Analog to digital converter:
A 10-bit input is given to the SAR ADC. The ADC consists of a dynamic comparator, a differential capacitor network, and the Successive Approximation (SA) control logic. Figure 3 and
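The successive approximation principle used by a SAR ADC can be sketched as a binary search over the output code. The Python function below is an idealized illustration: the name `sar_adc`, the 5 V reference, and the single-ended ideal DAC are assumptions, and comparator noise and capacitor mismatch are ignored.

```python
def sar_adc(vin: float, vref: float = 5.0, bits: int = 10) -> int:
    """Ideal successive approximation conversion of vin to a `bits`-wide code.

    Starting from the most significant bit, each bit is tentatively set,
    the corresponding DAC voltage is compared with vin, and the bit is
    kept only if the DAC voltage does not exceed vin.
    """
    code = 0
    for bit in range(bits - 1, -1, -1):
        trial = code | (1 << bit)                 # tentatively set this bit
        dac_voltage = trial * vref / (1 << bits)  # ideal DAC output
        if vin >= dac_voltage:                    # comparator decision
            code = trial
    return code


if __name__ == "__main__":
    for v in (0.0, 1.25, 2.5, 5.0):
        print(v, sar_adc(v))
```

The conversion takes one comparison per bit, so a 10-bit result needs ten comparator decisions.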

B. Dual Feedback Edge triggered flip flop (DFET-FF):
The circuit diagram of the Low Swing clock Double edge Flip-flop (LSDFF) is presented. The flip-flop input is transferred to the output on both the falling and rising clock edges. To reduce the power consumption of the clock tree, a low-swing clock is used in this logic. To obtain proper performance, some high-Vth transistors are replaced by low-Vth transistors, whose subthreshold currents are limited by high-Vth transistors. For the same throughput, the clock frequency of the LSDFF can be half that of the SDFF or HLFF. The power consumption of the clock tree is proportional to the clock load, the clock swing, and the frequency. Because both the clock frequency and the swing are lower than in earlier flip-flops, the clock tree power of the LSDFF can be lower than that of the others. On the other hand, the uncontrolled subthreshold current of low-Vth transistors in the clock tree causes additional power consumption. Also, because the internal node X2 (X1) is charged (discharged) through three transistors, the speed of the circuit is reduced.
To avoid the redundant transitions of the preceding flip-flops, a DFET-FF is proposed. In this flip-flop, node transitions occur only when the inputs differ in two consecutive clock cycles. To ease these limitations, a high-sensitivity, ultra-low-power dual feedback edge-triggered flip-flop based on joint clock gating is designed. This, in turn, reduces errors in the final output signal.
A feedback signal is applied, which improves the output and corrects errors at the output side.
Because the flip-flop relies on both feedback and edge triggering, it is more effective at eliminating errors. The data is read under the control of the controller.
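The double edge-triggered behavior that underlies this flip-flop can be shown with a short behavioral model. The Python sketch below is an illustration under stated assumptions: it models only the sampling on both clock edges, not the dual feedback path, clock gating, or low-swing clocking, and the function names are hypothetical.

```python
from typing import List, Sequence


def dual_edge_ff(clk: Sequence[int], d: Sequence[int], q0: int = 0) -> List[int]:
    """Capture D on every clock transition (rising or falling)."""
    q, prev, out = q0, clk[0], []
    for c, data in zip(clk, d):
        if c != prev:          # any edge triggers a capture
            q = data
        prev = c
        out.append(q)
    return out


def single_edge_ff(clk: Sequence[int], d: Sequence[int], q0: int = 0) -> List[int]:
    """Capture D on rising clock edges only, for comparison."""
    q, prev, out = q0, clk[0], []
    for c, data in zip(clk, d):
        if prev == 0 and c == 1:
            q = data
        prev = c
        out.append(q)
    return out


if __name__ == "__main__":
    clk = [0, 1, 1, 0, 0, 1]
    d = [1, 1, 1, 0, 0, 1]
    print(dual_edge_ff(clk, d))
    print(single_edge_ff(clk, d))
```

For the same clock waveform, the dual-edge model captures the input on the falling edge as well, which is why the same throughput can be reached at half the clock frequency.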

C. Binary controlled stacked Static Random Access Memory (SRAM):
An effective method is needed to overcome the leakage issues in SRAM technology. A RAM based on parallel cross-coupling logic is therefore proposed, which reduces cost, time, power, and memory complexity. 6T SRAM cells have been employed to implement machine learning classifiers and analog-domain dot products for pattern recognition. The fundamental idea is to enable multiple rows of memory bit cells and develop a voltage directly on the pre-charged bit lines according to the desired operation. At smaller process nodes, the memory cell capacity is also reduced, and the leakage of the memory cell thereby increases.
Junction leakage, gate leakage, and multi-threshold leakage current are among the components that contribute to leakage in the 6T SRAM cell. An SRAM based on a parallel cross-coupling feedback controller is implemented to reduce leakage loss. When the input signal current is low, use of the clock signal becomes unnecessary and results in unwarranted, excessive power consumption.
Power gating is a low-power design technique. The power gating strategy is established in three modes: active mode, low power mode, and wake-up mode. Although previous ground bounce noise reduction techniques reduce power dissipation and delay, they still exhibit high noise fluctuations. To mitigate this problem, the binary control stacking technique is proposed. Although various methodologies have already been proposed to reduce power, the proposed technique has several advantages over them. Binary controlled stacking is an efficient noise mitigation technique that relies mainly on the stacking effect of transistors.
The memory is designed using the binary control stacked inverter, which relies on the stacking effect to reduce leakage power by turning off a number of transistors in series. Two binary control stacked inverters are cross-coupled to form a memory that acts as a storage cell, with 0 and 1 as its two stable states.
Two access transistors, controlled by the word line, perform the read and write operations. Two bit lines, BL and BLB, transfer the data for read and write operations. The word line is driven high for both read and write operations. BL is set to 1 to write 1 and to 0 to write 0. During a read operation, BL is 0 for a read-0 operation and 1 for a read-1 operation. In the proposed SRAM design, the threshold voltage of the sleep transistors is decreased by reducing the length of the transistor to further reduce leakage power and ground bounce noise.
The memory blocks of the MUX are shown below:

D. Inverter and 2-Bit counter:
The inverter is designed using NMOS and CMOS technology and inverts the given data. A counter is a device that stores (and sometimes displays) the number of times a particular event or process has occurred, often in relation to a clock signal. An input of about 5 V is applied (vin = 5 V).

Figure 7. Memory with counter block
A 2-bit value is used to load the counter, and an up/down signal indicates whether the counter counts up or down. The counter must count down if this signal is high; if it is low, the counter counts up. A counter is a digital device whose output moves through a predefined sequence of states as clock pulses are applied. The counter output is used to count the number of pulses.
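The up/down behavior described above can be sketched as follows. The Python code is a minimal behavioral illustration: the function names are hypothetical, and wrap-around at the ends of the 2-bit range is an assumption, since the text does not state what happens on overflow or underflow.

```python
from typing import Iterable, List


def counter_step(count: int, up_down: int) -> int:
    """Advance a 2-bit counter by one clock pulse.

    up_down = 1 (high) counts down; up_down = 0 (low) counts up.
    The value wraps within 0..3 (assumed behavior).
    """
    if up_down == 1:
        return (count - 1) % 4
    return (count + 1) % 4


def run_counter(load_value: int, up_down_per_pulse: Iterable[int]) -> List[int]:
    """Load a 2-bit value, then apply one up/down decision per clock pulse."""
    count = load_value & 0b11
    history = [count]
    for up_down in up_down_per_pulse:
        count = counter_step(count, up_down)
        history.append(count)
    return history


if __name__ == "__main__":
    print(run_counter(0, [0, 0, 0, 0, 1, 1]))
```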
Over the decades, there has been a continuing demand for higher performance. Technology scaling alone is not very effective in meeting this demand, so MPSoCs are now employed for tasks that require intensive computation and communication. The proposed design therefore integrates processors, an ADC, and counters so that they can be used together in a compact device. This integrated multi-processor design can be employed in many real-time applications.

SIMULATION AND PERFORMANCE ANALYSIS
The simulation and performance analysis of the proposed mechanism are presented below. Tanner EDA is used to analyze the proposed binary control stacked inverter and SRAM memory design, which are simulated in 45 nm technology. Tanner EDA is a cost-efficient software platform for turning ideas into designs, it is powerful enough to handle complex designs, and its tools scale with changing performance needs. It comprises several tools, namely S-Edit, T-Spice, W-Edit, L-Edit, and LVS.

A. I/O of ADC waveform:
The output waveform of the ADC is shown below. The analog signal is given as input, the ADC converts it to digital bits, and the digital output voltage is shown.

B. Output waveforms of SRAM:
The voltages at input pin 1, input pin 2, and output 1 were simulated, and the results are shown in the analysis below.
Figure 9. SRAM output waveform

C. Inverter and Counter analysis
The inverter analysis is shown below: the given input bit is inverted, and the resulting waveforms are presented. The counter results are simulated and shown in the diagram below. The voltages at output pins 1, 2, and 3 were simulated, and the results are analyzed over time (in seconds). The output waveforms for the entire block are shown in Figure 11. The performance analysis of the inverter, the counter block, and the transient characteristics of the DFET-FF, obtained with Tanner EDA, are depicted in Figures 10, 11, and 12. The comparison is tabulated in Table 1. Figures 13, 14, and 15 compare the proposed and existing mechanisms in terms of power consumption, leakage power loss, and delay. The analysis shows that the proposed system achieves lower leakage loss, delay, and power consumption than the existing mechanisms.

CONCLUSION
A multi-processor system-on-chip is designed in which an ADC converts analog data to digital bits. A dual feedback edge-triggered flip-flop is used instead of a conventional one to reduce errors in the final output signals. A feedback signal is applied, which improves the output and corrects errors at the output side. Because the flip-flop relies on both feedback and edge triggering, it is more effective at eliminating errors. A power gating technique that exploits the stacking effect was implemented to reduce leakage loss and ground bounce noise. A binary controlled stacked SRAM cell, based on a parallel cross-coupling feedback controller, is implemented to reduce leakage loss. The inverter is designed using NMOS and CMOS technology and inverts the given data. The 2-bit counter provides up/down operation, with a control signal indicating whether it counts up or down, and its output can be used to count the number of pulses. Hence, the CMOS-based MPSoC is designed in 45 nm technology using the power gating technique. The multiple processors were integrated and analyzed with a view to large real-time applications that require a compact device. The multi-processor design reduces leakage loss, power consumption, delay, and ground bounce noise. A delay of about 0.5902 ps was attained with a power consumption of about 0.049208 mW and a leakage loss of about 0.012 W, which is better than the existing techniques. The performance analysis was carried out using Tanner EDA, and the results demonstrate the effectiveness of the proposed scheme. Hence, with the stacking-effect-based gating technique in 45 nm technology, the analyzed performance was found to be superior.
In future work, various gating techniques will be used, and the power consumption, leakage loss, and ground bounce noise will be analyzed. For the control process, an Artificial Intelligence (AI) technique will be chosen to reduce the error rate.