Pareto technique optimization for 3D NOC architecture

doi:10.21203/rs.3.rs-1977082/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Network-on-Chip (NoC), the network-based communication between operational cores and intellectual property (IP) cores on a single chip, has attracted much interest in recent years. The major barriers to effective NoC design have been high-speed data transfer and on-demand connections in massively parallel, multi-core, low-power applications. To address these issues, a new technique called Pareto African Buffalo Optimized Mapping Weighted Directive Graph Theory (PABOMWDGT) is proposed in this study. The method aims to locate efficient operational cores integrated on the device in the shortest time and to demonstrate the effectiveness of 3D NoC. In this approach, a selection of IP cores from the benchmark dataset is first listed along with their connections. The mapping onto the 3D NoC topology is optimized using African buffalo optimization. The IP cores (treated as buffalos) are randomly initialized in the search space of the optimization technique. The objective functions are estimated for every IP core in the population. The Pareto function is then examined using Deming regression within the African buffalo optimization technique. A fitness metric is employed to determine the best fit. When one buffalo's fitness is better than another's, the positions of the buffalos are updated and the best option is identified. The process is repeated until the maximum number of iterations is reached. Mapping is then performed based on the mapping probability. The approach takes less time to develop an effective mapping of cores in the 3D NoC architecture. Experimental results show that the proposed PABOMWDGT technique outperforms state-of-the-art techniques, with a throughput of 0.74 packets/cycle/IP block, a delay of 140 clock cycles, and a computation time of 11 ms.

Networks-on-Chip (NoC)

Deming regression

Pareto african buffalo optimization

The on-chip communication subsystem between intellectual property (IP) blocks within a System-on-Chip (SoC) is called a Network-on-Chip (NoC). An NoC integrates heterogeneous cores into a single chip to provide a scalable, energy-efficient, and reliable architecture. A 3D NoC combines the NoC paradigm with 3D integration; its benefits include a reduced form factor, faster interconnects, and lower latency. In a 2D NoC, the critical path cannot be shortened by placing key components in close proximity, which limits how far transmission delays can be reduced. 3D NoCs overcome this problem. To better understand the workings of mesh-based 2D and 3D NoCs, we have studied the loss of energy between the wiring area and the core.

In this work, we have introduced a 3D NoC to reduce the connection length between cores and to improve the connections between planes. Many cores are used in a single integrated circuit to build a digital system. NoCs have become the communication backbone of manycore processors, enabling a high level of integration [2]. Combining multiple multi-core processors on a single chip yields significant performance gains, but it also raises many problems. The main challenges are providing effective, reliable communication between the cores, saving power, and ensuring scalability. Effective 3D NoC designs that meet the QoS, power-saving, and communication requirements of large multi-core applications are therefore required.

Motivation

Due to continual advancements in VLSI technology, new Integrated Circuits (ICs) combine several processing elements in a single chip. This raises communication difficulties and increases system-on-chip design complexity [1]. The on-demand on-chip communication architecture known as Network-on-Chip (NoC) [2] evolved to overcome these problems. NoC enables the incorporation of multiple components in a single chip.

The performance of NoC systems decreases as more Intellectual Property (IP) cores are considered during design. The current mapping technique was created for 3D NoC design. Area and delay performance are two further essential factors that must be considered in the 3D NoC architecture. Numerous research papers have reported work on the performance of 3D NoC topology designs. It has remained challenging to construct such networks for large-scale multi-core applications with low power consumption. Architectural design optimization is essential for on-chip networks in order to improve network performance while consuming fewer resources.

To overcome the above drawbacks, we have developed a new method called Pareto African Buffalo Optimized Mapping Weighted Directive Graph Theory (PABOMWDGT) that combines the Pareto Deming Regressive African Buffalo Optimized Mapping method with Weighted Directive Graph Theory. Some of the salient features of PABOMWDGT are listed below.

Weighted directive graph theory was employed in NoC mapping to increase network throughput and decrease latency. The Deming Regressive African Buffalo Optimization approach was applied during the mapping procedure to determine the population's most power- and energy-efficient IP core. The smaller number of cores in the graph resulted in shorter run times. Delays were avoided and packet transport to the destination was improved.
The novelty of the Deming regressive African buffalo optimization method is its use of regression. The fitness was calculated based on a number of factors, including area, power, energy consumption, and delay. The most effective IP cores for 3D NoC designs were rapidly determined by regression analysis, an innovation in optimization.
The performance of the PABOM approach and other optimization techniques was experimentally validated using the MCNC benchmark netlists dataset [21] based on several performance criteria. The results showed that the PABOM strategy was significantly more effective than alternative methods.

There are six sections in this article. Prior studies in this area of work are reviewed in Section 2. Section 3 describes the PABOM method for resolving the design issues associated with 3D NoC. Section 4 presents the details of the experimental evaluation of the proposed and current algorithms with benchmark datasets. Section 5 describes the performance analyses. Finally, Section 6 presents the results and the pointers to future work in this area.

A stochastic multi-objective Pareto optimization framework (SMPOF) was presented in [1] to reduce network latency and power consumption. However, the framework was ineffective in enhancing the capabilities of automated NoC designs for multi-core systems. The use of Self-adaptive Chicken Swarm Optimization (SCSO) was described in [2] as an effective mapping technique to reduce NoC power consumption. However, the SCSO algorithm was not designed for 3D topologies and therefore did not consider mapping with other performance measures such as area and latency.

The Artificial Gorilla Army Optimizer (OPGTO) algorithm was reported in [3]. It was aimed at determining the best approach for reducing power consumption, and it combines opposition-based learning with parallel strategies. However, the procedure was time-consuming. In contrast, the African buffalo optimization method used in this work rapidly identifies the optimum IP core for 3D NoC, and Pareto optimization accounts for area and delay performance. The 3D wireless NoC design was introduced in [4] to carry out data communication with minimum power consumption, and a wireless network-on-chip architectural design was assessed. The problem with this approach was the physical instability of the device when liquids were pumped through microchannels, owing to high-pressure intrusions.

A Gauss-based optical NoC design was implemented in [5]. Although this approach minimized latency and increased output, it did not speed up computation. A hardware-efficient WiNoC with a honeycomb architecture was presented in [6], aimed at conserving resources such as latency, network cost, and energy use. However, latency and high power consumption remained outside the scope of the honeycomb-based, hardware-efficient WiNoC.

Adaptive Thermal Recognition Routing Technology (ATAR) was created in [7] to lower the chip's peak temperature. This model could not address the issue of heat-aware solutions for 3D manycore systems. A methodology for exploring the multi-objective design space of network-on-chip power grid architectures was presented in [8]. However, the framework did not effectively adopt evolutionary strategies to reduce the convergence time. Machine Learning (ML) techniques were first introduced in [9] to build NoC architectural components. These techniques reduced latency but took a long time to reach an efficient solution. An application based on ML algorithms was reported in [10] to support and maintain accuracy in the Internet of Things (IoT) and web search engines.

African Buffalo Optimization was an optimization method proposed in [11]. Monte Carlo simulation techniques and genetic algorithms were proposed in [12] to increase reliability and reduce power consumption of embedded systems in NoCs. The developed approach was found to reduce computational time, but latency and throughput performance were not investigated. An approach for creating NoC architectures and scheduling methods specifically designed for various DNNs was presented in [13]. Crossbar-based in-memory computing may greatly increase the amount of on-chip communication because the weights and activations used in the scheduling technique are performed on-chip.

The Butterfly Fat Tree (BFT) topology was created in [14] to enhance the performance and power analyses of 3D network-on-chip designs. Although this method reduced network latency, the throughput was not high. Another BFT-based design with a zone-based routing policy was presented in [15] to reduce latency and power consumption. However, that BFT technique did not include the mapping of IP cores across layers. For on-chip communication, a Scalable-Minimized-Butterfly-Fat-Tree (H-SMBFT) topology was introduced in [16]. Although no throughput analysis was performed, the designed architecture was reported to minimize delay and energy usage. In the present work, the Pareto Deming Regressive African Buffalo Optimized Mapping technique is used to choose IP cores for the throughput study to address this gap.

A review of the research conducted in the area of application mapping in the past decade was presented in [17]. Various mapping strategies were created in [18] for the execution of the NoC design. NoC designs aimed at reducing the execution time of the mapping process did not take into account effective heuristic algorithms with various mapping methodologies. The Bat mapping technique was first presented in [19], but its algorithm was unable to handle multi-objective issues during the mapping phase. The proposed method resolves this problem by treating the mapping process as a multi-objective problem.

A Bat MAPping algorithm (BMAP) was created in [20] for the IP cores. The chosen algorithm was reported to reduce latency and energy usage. However, it did not take into account other metrics, such as time and area during the mapping process.

A heuristic application mapping technique was developed in [21] for mesh-based NoC architecture to cut run-time and total energy consumption. The 3-D mesh NoC design could not use the heuristic application mapping algorithm. A Knowledge-Based Memetic Algorithm (KBMA) for 3D NoC mapping using conventional network topologies was reported in [22]. However, by utilizing other metaheuristics methods, the KBMA approach was ineffective for application mapping. Simulated Annealing with Tabu search (SAT), an improvised cluster-based mapping with a meta-heuristic search algorithm, was presented in [23] for the analysis and optimization of power consumption in NoC-based systems.

The Liquid State Machine (LSM) is an efficient spiking neural network that was introduced in [24] for NoC-based neuromorphic devices because of its biological properties and hardware effectiveness. This platform used SNN to handle variable data flow and communication congestion caused by the randomly connected topology of fluids in LSM. A low-complexity heuristic algorithm called CastNet was developed in [25] to reduce the energy consumption of application mapping and bandwidth-limited routing methods in mesh-based NoC architectures. However, the mapping of a CastNet application that is represented by a weighted task graph to a mesh structure is an NP-hard task.

A deterministic and scalable arbitration system was developed in [26] to reduce the average latency. This method did not result in high network throughput. A Simulated Allocation (SAL) technique was created in [27] to reduce communication power and latency during the mapping process. Network throughput was, however, not considered during the design. The EsyTest technique was introduced in [28] to shorten the execution time and reduce the impact of test processes on NoCs. However, due to data dependencies, BIST was shown to significantly reduce the performance of this technique. A delay model was proposed for routers in [29]; this model described network contentions resulting from sharing of network resources across various traffic flows.

Design and run-time optimizations were used in [32] to analyze the feasibility of energy-efficient NoC architectures. The method lowered energy use, cost, and power dissipation. A Structured Hypercube Network Chip Topology Model was introduced in [33] to enhance the dependability of data transfer. To improve energy efficiency, performance and thermal efficiency were examined in [34]. TAMA (Tune-Aware Mapping & Architecture) was created in [35] and was shown to have better performance, but it had a longer computation time. The mapping approach presented in [36] aimed to cut down on communication costs. Throughput, however, was not taken into account. A density direction transform algorithm was presented in [37], but it did not consider alternative heuristic algorithms.

A review of the literature shows that high power consumption, failure in mapping, lack of focus on area and delay, high latency, low throughput, and long computation times have been the problems associated with current methods. These problems were addressed in our work through the use of a unique method called PABOMWDG. The details of the method are provided in the following section.

Weighted directive graph theory and optimized mapping weighted directive graph theory were applied to the design of the PABOMWDG.

The PABOM approach for designing 3D NoC architecture is shown in Fig. 1. The IP cores ${C}_{1},{C}_{2},{C}_{3},\dots ,{C}_{n}$ are first taken from the benchmark dataset. An IP core, also referred to as an IP block, is a reusable element of a cell or integrated circuit (IC) design. The optimized cores are used in the integrated circuit design to boost system performance. Deming regression is used to analyze the metrics and select the best cores using the multicriteria optimization method. To choose the best IP core, mapping is carried out based on several objective criteria, including area, energy consumption, power, and latency. This makes it easier to create an architecture that takes power and energy into account. The following sections outline the steps involved in the proposed PABOM approach.

3.1 Network model

The suggested PABOM approach maps the NoC using the weighted directive graph. The relationship between two nodes is analyzed using the mathematical model of weighted directed graph-based mapping.

Figure 2: weighted undirected graph

In Fig. 2, $G=\left(V,E\right)$ denotes a weighted undirected graph, where $V$ is the set of cores (vertices) and $E$ is the set of connections (edges) between them. The graph in Fig. 2 has five vertices, ${V}_{1},{V}_{2},{V}_{3},{V}_{4},{V}_{5}$, and eight edges, ${E}_{1},{E}_{2},{E}_{3},{E}_{4},{E}_{5},{E}_{6},{E}_{7},{E}_{8}$. For each node ${V}_{i}$ in the network, the multi-objective Pareto functions (energy consumption, power, area, and latency) must be identified for mapping. The node with the best fitness is assigned a greater weight than the others. The mapping is carried out by applying the injective map function from graph theory:

$$F : {V}_{i} \to {V}_{j}$$

Here, ${V}_{i}$ and ${V}_{j}$ represent network nodes, and $F$ is the mapping function. Optimum resources are chosen for each node through multi-objective optimization. The resources considered are energy consumption (${\phi }_{1}$), area (${\phi }_{2}$), power (${\phi }_{3}$), and node latency (${\phi }_{4}$). The node that satisfies the condition in Eq. (2) is selected as the best node. Binary mapping is used to locate neighboring nodes (i.e., those at minimum distance) and links.

$$P\left({V}_{i},{V}_{j}\right)=\begin{cases}1, & \arg\min \left\{{\phi }_{1},{\phi }_{2},{\phi }_{3},{\phi }_{4}\right\}\\ 0, & \text{otherwise}\end{cases}$$

$P\left({V}_{i},{V}_{j}\right)$ is the mapping probability function between the nodes. It returns 1 if the node is ideal and 0 otherwise. The Pareto Deming regressive African Buffalo Optimization mathematical model is covered in greater detail in the following section.
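As a rough illustration of Eq. (2), the binary mapping probability for the neighbours of one node can be sketched as follows. The source does not state how the four objectives are combined inside the arg min, so the unweighted sum used here is an assumption, as are the function name `mapping_probability` and the toy objective values.

```python
from typing import Dict, Tuple

# Objective vector per candidate node: (energy, area, power, latency).
Objectives = Tuple[float, float, float, float]


def mapping_probability(candidates: Dict[str, Objectives]) -> Dict[str, int]:
    """Return P(V_i, V_j) for every candidate neighbour V_j of a fixed V_i.

    Assumption: the arg min in Eq. (2) is taken over the unweighted sum of
    the four objectives; the source does not specify the aggregation.
    """
    best = min(candidates, key=lambda name: sum(candidates[name]))
    # 1 for the node that attains the minimum, 0 for every other node.
    return {name: int(name == best) for name in candidates}


# Hypothetical objective values for three neighbours of V1 (not from the paper).
neighbours = {
    "V2": (0.8, 1.2, 0.5, 3.0),
    "V3": (0.6, 1.0, 0.4, 2.0),
    "V4": (0.9, 1.1, 0.7, 2.5),
}
probabilities = mapping_probability(neighbours)
```

With these toy values only `V3` receives a probability of 1, so it would be the node chosen for direct communication with `V1`.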

3.2 Pareto Deming regressive African Buffalo Optimization model

The Pareto Deming regressive African Buffalo Optimization model was used to locate the optimal node. The African Buffalo Optimization meta-heuristic method was used to select the best buffalo from the population. The suggested optimization was inspired by African buffalo optimization, which is more reliable and efficient than other optimization techniques because it requires fewer learning parameters and has a high convergence rate.

The parametric functions of the Pareto optimization are energy consumption (${\phi }_{1}$), area (${\phi }_{2}$), power (${\phi }_{3}$), and delay (${\phi }_{4}$). During the initialization phase of the algorithm, the buffalo population is randomly initialized in the search space. In this context, each buffalo corresponds to an IP core. The initialization is expressed as:

$${C}_{i}\in \left\{{C}_{1},{C}_{2},{C}_{3},\dots ,{C}_{n}\right\}$$

where ${C}_{i}$ denotes an IP core. The fitness is then computed based on the multiple objective functions of energy consumption (${\phi }_{1}$), area (${\phi }_{2}$), power (${\phi }_{3}$), and delay (${\phi }_{4}$).

The energy consumption is calculated using the following equation,

$${\phi }_{1}=\frac{\Delta {t}_{i}}{{r}_{j}}$$

In (4), ${\phi }_{1}$ is the energy consumed by module $i$, $\Delta {t}_{i}$ is the temperature rise at module $i$, and ${r}_{j}$ is the transfer resistance at module $j$.

Area (${\phi }_{2}$) is defined by a total area model: the area of a 3D NoC is the sum of the router/switch area (${a}_{r}$), the area of the intellectual property (IP) cores (${a}_{c}$), and the area of the on-chip global interconnects (${a}_{g}$). The area is formulated as follows:

$${\phi }_{2}={a}_{r}+{a}_{c}+{a}_{g}$$

$${a}_{r}=n\cdot \sum _{i=1}^{{n}_{s}}{a}_{r,i}$$

Here, $n$ is the number of planes in the 3D NoC, ${n}_{s}$ is the number of switches in the 2D or 3D network, and ${a}_{r,i}$ is the area of switch $i$. The area of the on-chip global interconnects, ${a}_{g}$, in (5) is given by

$${a}_{g}={n}_{L}\left[f\left({r}_{w}+{q}_{w}\right)+{q}_{w}\right]{w}_{L}$$

Here, ${n}_{L}$ denotes the number of links presented in the 3D networks, $f$ represents flit size in bits, ${r}_{w}$ denotes a wire width, ${q}_{w}$ indicates the spacing between wires, and ${w}_{L}$ is the wire length of the global interconnects in the on-chip network.
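The area model of Eqs. (5) to (7) is plain arithmetic and can be sketched as below. The function names and the numeric inputs are illustrative assumptions (the source gives no concrete values or units), so the result only shows how the terms combine.

```python
from typing import Sequence


def router_area(n_planes: int, switch_areas: Sequence[float]) -> float:
    """Eq. (6): a_r = n * sum of the per-switch areas a_{r,i}."""
    return n_planes * sum(switch_areas)


def interconnect_area(n_links: int, flit_bits: int, wire_width: float,
                      wire_spacing: float, wire_length: float) -> float:
    """Eq. (7): a_g = n_L * [f * (r_w + q_w) + q_w] * w_L."""
    return n_links * (flit_bits * (wire_width + wire_spacing) + wire_spacing) * wire_length


def total_area(a_r: float, a_c: float, a_g: float) -> float:
    """Eq. (5): phi_2 = a_r + a_c + a_g."""
    return a_r + a_c + a_g


# Hypothetical inputs in arbitrary, consistent units (not from the paper).
a_r = router_area(n_planes=2, switch_areas=[0.5, 0.5, 0.5, 0.5])
a_g = interconnect_area(n_links=10, flit_bits=32, wire_width=0.25,
                        wire_spacing=0.25, wire_length=2.0)
phi_2 = total_area(a_r, a_c=100.0, a_g=a_g)
```

For these inputs the router term is 4.0, the interconnect term is 325.0, and the total is 429.0, which makes it visible that the interconnect term scales linearly with the flit size and the link count.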

Power (${\phi }_{3}$) is the global link power, which is the sum of three power components of the 3D NoC. It is given as

$${\phi }_{3}={p}_{g}={p}_{s}+{p}_{t}+{p}_{c}$$

Here, ${p}_{g}$ is the global link power, ${p}_{s}$ is the power due to circuit switching, ${p}_{t}$ is the short-circuit power, and ${p}_{c}$ is the static power.

Delay (${\phi }_{4}$) is measured using three factors: router delay, propagation delay due to the link or channel, and serialization of packets. The overall delay is measured as

$${\phi }_{4}=D={A}_{avg}R+{d}_{p}+{s}_{d}$$

Here, ${A}_{avg}$ is the average hop count, $R$ is the router delay, ${d}_{p}$ is the propagation delay caused by the link or channel, and ${s}_{d}$ is the packet serialization delay. The Deming regression function is used for optimization, to examine the estimated values of each IP core's energy consumption (${\phi }_{1}$), area (${\phi }_{2}$), power (${\phi }_{3}$), and delay (${\phi }_{4}$). Deming regression is applied as an ML technique to examine the input variables and determine the member of the population that best fits the input data. The regression analysis is shown below.

$${Y}_{i}={\beta }_{0}+{\beta }_{1}\left[{MC}_{k}\left({C}_{i}\right)\right],\quad \text{where } {MC}_{k}\left({C}_{i}\right)\in \left\{{\phi }_{1},{\phi }_{2},{\phi }_{3},{\phi }_{4}\right\} \tag{10}$$

Here, ${Y}_{i}$ is the output of the multiobjective estimation of the core ${C}_{i}$, ${\beta }_{0}$ and ${\beta }_{1}$ are regression coefficients, and ${MC}_{k}\left({C}_{i}\right)$ is the multiobjective estimation of the core. This includes the core's energy consumption (${\phi }_{1}$), area (${\phi }_{2}$), power (${\phi }_{3}$), and delay (${\phi }_{4}$). The node with the lowest energy consumption, area utilization, power consumption, and delay is selected as the ideal one from the regression analysis. Fitness is then evaluated to determine the best-fitting IP core.
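For readers unfamiliar with Deming regression, the sketch below fits the coefficients of a line of the form used in Eq. (10) with the standard closed-form Deming estimator. How the source pairs the regressor and the response is not specified, so treating `x` as an objective value and `y` as the observed response is an assumption, as are the error-variance ratio `delta = 1` and the toy data.

```python
import math
from typing import Sequence, Tuple


def deming_fit(x: Sequence[float], y: Sequence[float],
               delta: float = 1.0) -> Tuple[float, float]:
    """Closed-form Deming regression; returns (beta0, beta1).

    delta is the ratio of the y-error variance to the x-error variance;
    delta = 1 corresponds to orthogonal regression.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample variances and covariance.
    s_xx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("x and y are uncorrelated; the slope is undefined")
    diff = s_yy - delta * s_xx
    beta1 = (diff + math.sqrt(diff ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


# Toy data lying exactly on y = 1 + 2x (hypothetical, not from the paper).
beta0, beta1 = deming_fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

Unlike ordinary least squares, this estimator allows for measurement error in both variables, which is the usual reason for choosing Deming regression.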

$${Q}_{F}=\arg\min \left\{{MC}_{k}\left({C}_{i}\right)\right\}$$

Here, ${Q}_{F}$ is the fitness function and $\arg\min$ denotes the argument that minimizes the function. The processes of exploration and exploitation are carried out based on the fitness value, as in the following equation.

$${x}_{k}\left(t+1\right)={x}_{k}+{a}_{1}\left[{Q}_{F,b}-{E}_{k}\right]+{a}_{2}\left[{x}_{bp.k}-{E}_{k}\right]$$

Here, ${x}_{k}\left(t+1\right)$ is the updated exploitation of the $k$th buffalo, ${x}_{k}$ is the $k$th buffalo's current position, ${E}_{k}$ is the exploration of the $k$th buffalo, ${a}_{1}$ and ${a}_{2}$ are learning parameters with values between 0.1 and 0.6, ${Q}_{F,b}$ is the best fitness among the buffalos, and ${x}_{bp.k}$ is the best position of the $k$th buffalo. The location of the buffalos is then updated as follows.

$${E}_{k}\left(t+1\right)=\frac{\left[{E}_{k}+{x}_{k}\right]}{R }$$

In (13), $R$ is a parameter set to $\pm 0.5$, and ${E}_{k}\left(t+1\right)$ is the updated location of the buffalos. If convergence is not achieved, the buffalos are updated again; otherwise, the procedure halts.
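The two update rules in Eqs. (12) and (13) can be sketched for a single buffalo as follows. This is a minimal illustration under stated assumptions: positions are scalars, Eq. (13) is read literally as using the pre-update value of $x_k$, the sign of $R$ is left to the caller, and the function name `abo_update` and all numeric inputs are hypothetical.

```python
from typing import Tuple


def abo_update(x_k: float, e_k: float, best_fitness: float, best_pos_k: float,
               a1: float = 0.5, a2: float = 0.5, r: float = 0.5) -> Tuple[float, float]:
    """One African Buffalo Optimization step for buffalo k.

    Eq. (12): x_k(t+1) = x_k + a1 * (Q_Fb - E_k) + a2 * (x_bp.k - E_k)
    Eq. (13): E_k(t+1) = (E_k + x_k) / R, with R = +0.5 or -0.5
    Assumption: Eq. (13) uses the value of x_k from before the update.
    """
    if r == 0:
        raise ValueError("R must be non-zero")
    x_next = x_k + a1 * (best_fitness - e_k) + a2 * (best_pos_k - e_k)
    e_next = (e_k + x_k) / r
    return x_next, e_next


# Hypothetical state for one buffalo (not from the paper); a1 and a2 lie
# inside the 0.1 to 0.6 range quoted for the learning parameters.
x_next, e_next = abo_update(x_k=1.0, e_k=0.5, best_fitness=0.25, best_pos_k=1.5)
```

Passing `r=-0.5` gives the other branch of the $\pm 0.5$ setting; the source does not say when each sign is used.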

Figure 3 shows the Pareto Deming regressive African Buffalo Optimization flow process for determining the ideal IP core. Eq. (2) and the injective mapping function are used to calculate the mapping probability after locating the IP core. The ideal node is found when the mapping function returns a value of "1". The mapping function returns "0" in all other cases. The procedure is iterated until the maximum number of iterations. As a result, the 3D NoC chooses the optimum nearby core for direct communication.

The algorithmic process of the proposed PDRABOM method is described as follows.

// Algorithm 1 Pareto Deming Regressive African Buffalo Optimized mapping

Input: Benchmark dataset, number of cores ${C}_{i}\in \left\{{C}_{1},{C}_{2},{C}_{3},\dots ,{C}_{n}\right\}$

Output: Optimized IP core for NoC design

Begin

Step 1. Initialize the population of cores ${C}_{i}\in \left\{{C}_{1},{C}_{2},{C}_{3},\dots ,{C}_{n}\right\}$

Step 2. For each core ${C}_{i}$

Step 3. Compute the multi-criteria function ${MC}_{k}\left({C}_{i}\right)\in \left\{{\phi }_{1},{\phi }_{2},{\phi }_{3},{\phi }_{4}\right\}$

Step 4. Measure the fitness ${Q}_{F}$

Step 5. While (t < Max_iter)

Step 6. If $\left({Q}_{F}\left({C}_{i}\right)<{Q}_{F}\left({C}_{j}\right)\right)$ then

Step 7. Update the buffalos' exploitation ${x}_{k}\left(t+1\right)$

Step 8. Update the location of the buffalos ${E}_{k}\left(t+1\right)$

Step 9. End if

Step 10. t = t + 1

Step 11. End while

Step 12. Obtain the best solution

Step 13. End for

Step 14. Perform mapping $F:{V}_{i}\to {V}_{j}$ based on the probability $P\left({V}_{i},{V}_{j}\right)$

End

Algorithm 1 describes in detail the Pareto Deming Regressive African Buffalo Optimized mapping for a more effective 3D NoC design. The IP core population in the search space is first initialized. The multicriteria function is then measured for each IP core in the population and analyzed using the Deming regression function. The best core is chosen following the analysis of the fitness measure.

The position of the $i$th buffalo is updated if the fitness of the current core, ${Q}_{F}\left({C}_{i}\right)$, is better than ${Q}_{F}\left({C}_{j}\right)$. The graphical model is then used to locate and map the current best core. The process is repeated until the maximum number of iterations is reached. The mapping is then performed based on the mapping probability. This method requires the least time to perform an effective mapping of cores in the 3D NoC architecture.
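The source does not spell out how Pareto optimality among cores is checked before the fitness comparison. One standard reading, assumed here purely for illustration, is a non-dominated filter over the four minimised objectives; the core names and objective values below are hypothetical.

```python
from typing import Dict, List, Tuple

# Objective vector per core: (energy, area, power, delay), all minimised.
Objectives = Tuple[float, float, float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and better in one."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better


def pareto_front(cores: Dict[str, Objectives]) -> List[str]:
    """Return the names of the cores that no other core dominates."""
    front = []
    for name, objectives in cores.items():
        dominated = any(
            dominates(other, objectives)
            for other_name, other in cores.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical objective values for three cores (not from the paper).
cores = {
    "C1": (1.0, 2.0, 3.0, 4.0),
    "C2": (2.0, 3.0, 4.0, 5.0),  # worse than C1 in every objective
    "C3": (0.5, 3.0, 3.0, 4.0),  # trades lower energy for larger area
}
front = pareto_front(cores)
```

Here `C2` is discarded because `C1` dominates it, while `C1` and `C3` both survive; a fitness rule such as Eq. (11) would then pick one core from this reduced set.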

Python was used to implement the experimental evaluation of the proposed PDRABOM methodology and three other methods, SMPOF [1], SCSO [2], and OPGTO algorithms [3]. The MCNC Benchmark Netlists dataset was utilized for the experiments [30]. The floorplanning and placement problems were addressed using the MCNC benchmark netlists. The 3D Network-on-Chip architecture was designed using the MCNC benchmark netlists [34]. Testing floorplanning techniques frequently uses the MCNC benchmark circuits. The IP cores were taken from the benchmark dataset for experimental examination. The benchmark circuits were in YAL format, or Yet Another Language.

Table 1

Standard MCNC Benchmark Circuits
Circuit	Number of IP cores	Nets	I/O pad
apte	9	97	73
ami33	33	123	42
ami49	49	408	22
xerox	10	203	2
hp	11	83	45

Table 1 describes five standard circuits from the MCNC Benchmark Netlists: apte, ami33, ami49, xerox, and hp. The apte circuit includes 9 IP cores, 97 nets, and 73 I/O pads. The ami33 circuit comprises 33 IP cores, 123 nets, and 42 I/O pads. The ami49 circuit consists of 49 IP cores, 408 nets, and 22 I/O pads. The xerox circuit contains 10 IP cores, 203 nets, and 2 I/O pads. The hp circuit comprises 11 IP cores, 83 nets, and 45 I/O pads.

The performance evaluations of the proposed PDRABOM method and the three existing methods, namely SMPOF [1], SCSO [2], and OPGTO algorithm [3], used various parameters such as throughput, delay, and computational time. The performance results for the various parameters were described using tables and graphs.

5.1 Impact of throughput

The actual pace at which data (or packets) are transmitted between source-destination pairs in an NoC is known as throughput. Throughput facilitates communication between IP cores or blocks. The throughput calculation formula is,

$$T=\frac{\text{packets transferred}}{\text{time}\times \text{number of IP blocks}}$$

Here, $T$ is the throughput and the time is measured in cycles. The overall throughput is therefore expressed in packets/cycle/IP block.
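A small sketch of the throughput calculation follows. It assumes that throughput is normalised by both the elapsed cycles and the number of IP blocks, as the stated unit of packets per cycle per IP block suggests; the function name and the packet and cycle counts are hypothetical and are not measurements from the experiments.

```python
def throughput(packets_transferred: int, cycles: int, ip_blocks: int) -> float:
    """Throughput in packets per cycle per IP block.

    Assumption: the packet count is divided by cycles and by the number of
    IP blocks, matching the unit given in the text.
    """
    if cycles <= 0 or ip_blocks <= 0:
        raise ValueError("cycles and ip_blocks must be positive")
    return packets_transferred / (cycles * ip_blocks)


# Hypothetical run: 1800 packets delivered in 250 cycles on a 9-core circuit.
t = throughput(packets_transferred=1800, cycles=250, ip_blocks=9)
```

Normalising by the number of IP blocks is what allows circuits of different sizes, such as apte and ami49, to be compared on one scale.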

Table 2

Comparison of throughput (packets/cycle/IP block)
Circuit	Number of IP cores	Proposed PDRABOMWDG	SMPOF [1]	SCSO [2]	OPGTO algorithm [3]
Apte	9	0.74	0.72	0.7	0.73
ami33	33	0.82	0.78	0.73	0.80
ami49	49	0.85	0.8	0.76	0.82
Xerox	10	0.76	0.73	0.7	0.74
Hp	11	0.79	0.74	0.72	0.76

Table 2 compares the throughput of five distinct circuits: Apte, ami33, ami49, xerox, and hp from the MCNC Benchmark Netlists. Table 2 shows the number of IP cores, including 9, 33, 49, 10, and 11. The performances of throughput for the four different techniques—PDRABOMWDG, SMPOF [1], SCSO [2], and OPGTO algorithm [3]—are provided in the above table. The results show that PDRABOMWDG delivers superior performance over other current approaches.
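To put the differences in Table 2 into perspective, the relative throughput gain of the proposed method over each baseline can be computed directly from the tabulated values. The sketch below copies the numbers from Table 2; the percentage-gain formula and the helper name `percent_gain` are additions for illustration.

```python
# Throughput values copied from Table 2 (packets/cycle/IP block).
table2 = {
    "apte":  {"proposed": 0.74, "SMPOF": 0.72, "SCSO": 0.70, "OPGTO": 0.73},
    "ami33": {"proposed": 0.82, "SMPOF": 0.78, "SCSO": 0.73, "OPGTO": 0.80},
    "ami49": {"proposed": 0.85, "SMPOF": 0.80, "SCSO": 0.76, "OPGTO": 0.82},
    "xerox": {"proposed": 0.76, "SMPOF": 0.73, "SCSO": 0.70, "OPGTO": 0.74},
    "hp":    {"proposed": 0.79, "SMPOF": 0.74, "SCSO": 0.72, "OPGTO": 0.76},
}


def percent_gain(proposed: float, baseline: float) -> float:
    """Relative improvement of the proposed value over a baseline, in %."""
    return 100.0 * (proposed - baseline) / baseline


# Gain of the proposed method over each baseline, per circuit.
gains = {
    circuit: {
        method: round(percent_gain(row["proposed"], value), 2)
        for method, value in row.items()
        if method != "proposed"
    }
    for circuit, row in table2.items()
}

for circuit, row in gains.items():
    print(circuit, row)
```

Every entry is positive, which is consistent with the claim that the proposed method has the highest throughput on all five circuits, although the margins over OPGTO are only a few percent.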

Consider the apte circuit with 9 IP cores. PDRABOMWDG had a throughput of 0.74 packets/cycle/IP block, whereas SMPOF [1], SCSO [2], and the OPGTO algorithm [3] had throughputs of 0.72, 0.70, and 0.73 packets/cycle/IP block, respectively. The throughput was measured according to the packet injection rate, which is also expressed in packets/cycle/IP block. Different performance outcomes were then measured for the proposed and existing methods with different numbers of input IP cores. Finally, the five runs were evaluated and examined.

The throughput performance analyses for the four approaches, viz., PDRABOMWDG, SMPOF [1], SCSO [2], and the OPGTO algorithm [3], are shown in Fig. 4, which plots the throughput obtained for each circuit. The plot shows that the PDRABOMWDG methodology had a higher throughput than the other methods currently in use. This improvement results from using the Pareto Deming Regressive African Buffalo Optimized Mapping technique to locate resource-optimized IP cores. The Deming regression function was used to assess the multicriteria function (i.e., delay, energy consumption, power, and area). The fitness was then evaluated to select IP cores that were resource-efficient. The chosen IP cores increased packet transmission. In addition, the average hop count between source-destination pairs was lower. The transmission was further enhanced by the weighted directive graph theory-based mapping.

5.2 Impact of latency

The average amount of time required to transfer packets between source-destination pairs in a NoC is referred to as latency. The Latency is given as,

$$L={t}_{avg}\left[\text{packet transmission}\right]$$

Here, $L$ is latency, ${t}_{avg}$ is average time. The latency is measured in terms of clock cycles.

Table 3

Comparison of latency (clock cycles)
Circuit	Number of IP cores	Proposed PDRABOMWDG	SMPOF [1]	SCSO [2]	OPGTO algorithm [3]
apte	9	140	150	160	156
ami33	33	200	215	230	220
ami49	49	220	240	250	235
xerox	10	160	170	180	174
Hp	11	175	180	200	188

Table 3 presents the latency values for the five circuits apte, ami33, ami49, xerox, and hp, which contain 9, 33, 49, 10, and 11 IP cores, respectively. The table lists the latency for four approaches: PDRABOMWDG, the Stochastic Multi-Objective Pareto-Optimization Framework (SMPOF) [1], SCSO [2], and OPGTO [3]. Consider the apte circuit, which has 9 IP cores. PDRABOMWDG had a latency of 140 clock cycles, whereas SMPOF [1], SCSO [2], and the OPGTO algorithm [3] had latencies of 150, 160, and 156 clock cycles, respectively. The latency of data transfer using the proposed PDRABOM technique was therefore lower than that of the existing techniques.

The PDRABOMWDG methodology had the lowest latency of all methods, as seen in Fig. 5. The vertical axis shows the latency, and the horizontal axis shows the circuits. The plot shows that the latency of the PDRABOMWDG technique was lower than that of the other three methods. This is because the PDRABOMWDG approach uses the weighted directive graph for NoC mapping. An injective map function built on the Deming Regressive African Buffalo Optimization approach was used to perform the mapping. In the NoC design, the ideal IP core was chosen for direct connection. The neighboring nodes and links were found via binary mapping. The mapping probability function was measured between nodes. This directed the transmission of each packet towards its destination with minimum latency.

5.3 Impact of computation time

The computation time is the time the algorithm requires to identify the best core, based on multiple objective functions, for an effective 3D NoC architectural design. The total computation time is calculated as follows:

$CT=\text{end time}-\text{start time}$ (16)

where $CT$ stands for computation time. The computation time is measured in milliseconds (ms).
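Equation (16) amounts to a wall-clock measurement around the mapping routine. A minimal Python sketch follows, assuming the routine can be called as an ordinary function; `measure_ct_ms` and the stand-in workload are hypothetical and are not the authors' implementation.

```python
import time


def measure_ct_ms(routine, *args, **kwargs):
    """Return (result, CT), where CT = end time - start time in milliseconds."""
    start = time.perf_counter()          # start time, in seconds
    result = routine(*args, **kwargs)
    end = time.perf_counter()            # end time, in seconds
    return result, (end - start) * 1000.0


# Stand-in workload (assumption); a real run would pass the mapping routine.
result, ct_ms = measure_ct_ms(sorted, [5, 3, 9, 1])
print(f"CT = {ct_ms:.3f} ms")
```

`time.perf_counter` is chosen here because it is a monotonic, high-resolution clock intended for timing short intervals.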

Table 4

Comparison of computation time (ms)
Circuit	Number of IP cores	Proposed PDRABOMWDG	SMPOF [1]	SCSO [2]	OPGTO algorithm [3]
apte	9	11	14	16	13
ami33	33	22	25	28	23
ami49	49	28	31	33	30
xerox	10	12	15	17	13
hp	11	13	16	18	15

Table 4 shows the computation time required to find the resource-optimized IP cores. The proposed PDRABOMWDG strategy had the lowest computation time of the four approaches on every circuit.

For the apte circuit with 9 IP cores, the PDRABOMWDG approach took 11 ms to determine the best IP core for a 3D NoC design, while SMPOF [1], SCSO [2], and the OPGTO algorithm [3] took 14 ms, 16 ms, and 13 ms, respectively. The experiments considered several circuits with varying numbers of IP cores. A total of five runs were carried out for each approach, each with a different number of input IP cores.
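The Table 4 values can also be summarized per method. The sketch below averages the computation time over the five circuits; the variable names are illustrative.

```python
from statistics import mean

# Computation time (ms) copied from Table 4, in circuit order:
# apte, ami33, ami49, xerox, hp.
COMPUTATION_TIME_MS = {
    "PDRABOMWDG": [11, 22, 28, 12, 13],
    "SMPOF [1]":  [14, 25, 31, 15, 16],
    "SCSO [2]":   [16, 28, 33, 17, 18],
    "OPGTO [3]":  [13, 23, 30, 13, 15],
}

# Mean computation time per method across the five circuits.
mean_ct = {method: mean(values) for method, values in COMPUTATION_TIME_MS.items()}

# List the methods from fastest to slowest.
for method, value in sorted(mean_ct.items(), key=lambda item: item[1]):
    print(f"{method}: {value:.1f} ms on average")
```

The averages come to 17.2 ms for PDRABOMWDG, against 18.8 ms for OPGTO, 20.2 ms for SMPOF, and 22.4 ms for SCSO.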

The PDRABOMWDG approach lowered the computation time because the Pareto African Buffalo Optimization approach used Deming regression to assess the various objective functions of each IP core. The regression-based fitness measure identified the best-fit IP core for the 3D NoC design in the shortest amount of time.
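The text here does not detail how the regression enters the fitness measure, so the following is only a sketch of the standard closed-form Deming fit itself, which allows for measurement error in both variables. The function name `deming_fit`, the error-variance ratio `delta`, and the sample vectors are assumptions made for illustration, not values or code from the study.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Standard Deming regression of y on x.

    delta is the assumed ratio of the y-error variance to the x-error
    variance; delta = 1 gives orthogonal regression.
    Returns (intercept, slope).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample variances and covariance.
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("slope is undefined when the sample covariance is zero")
    diff = s_yy - delta * s_xx
    slope = (diff + math.sqrt(diff ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    return y_bar - slope * x_bar, slope


# Hypothetical objective values for five candidate cores (illustrative only).
objective_a = [1.0, 2.0, 3.0, 4.0, 5.0]
objective_b = [3.0, 5.0, 7.0, 9.0, 11.0]
intercept, slope = deming_fit(objective_a, objective_b)
```

Unlike ordinary least squares, this fit treats both objectives as noisy, which is the usual reason for preferring Deming regression when neither variable is an exact reference.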

6. Conclusion And Future Work

In this research, a new mapping strategy based on the PDRABOMWDG technique was developed for 3D NoC design. The efficiency of the mapping approach was enhanced by using multiple objective functions. Pareto African Buffalo Optimization, which is based on graph theory, was integrated into the PDRABOMWDG approach. A multicriteria optimization problem was solved, and an effective on-chip IP core mapping was carried out, increasing throughput and decreasing communication latency. Deming regression was used in the optimization technique to examine the various metrics and to choose the best cores from the population. Using this technique reduced the core mapping computation time.

An experimental study was carried out to compare the PDRABOMWDG methodology with three other optimization techniques. The results showed that the PDRABOMWDG strategy outperformed the existing methods in terms of lower latency, higher throughput, and lower computation time. By utilizing multi-objective optimization approaches, the suggested method did not neglect to take both regular and irregular topologies into account during mapping.

In future studies, we intend to calculate energy parameters using Booksim 2.0 and to map both (regular and irregular) topologies using evolutionary multi-objective optimization approaches.

Ethical Approval

Not Applicable

Conflict of Interest

There is no conflict of interest to report

Data availability statement

Data sharing does not apply to this article because the current study did not generate a dataset.

Authors' contributions

Not Applicable

Funding

No funding

References

Tzyy-Juin Kao and Wolfgang Fink, “Stochastic multi-objective Pareto-optimization framework for fully automated ab initio network-on-chip design”, Journal of Systems Architecture, Elsevier, Volume 103, 2020, Pages 1–32.
Aravindhan Alagarsamy, Lakshminarayanan Gopalakrishnan, Sundarakannan Mahilmaran, Seok-Bum Ko, “A Self-Adaptive Mapping Approach for Network on Chip With Low Power Consumption”, IEEE Access, Volume 7, 2019, Pages 84066–84081.
Qingwei Liang, Shu-Chuan Chu, Qingyong Yang, Anhui Liang and Jeng-Shyang Pan, “Multi-Group Gorilla Troops Optimizer with Multi-Strategies for 3D Node Localization of Wireless Sensor Networks”, Sensors, MDPI, 2022, Pages 1–22.
Md Shahriar Shamim, Rounak Singh Narde, Jose-Luis Gonzalez-Hernandez, Amlan Ganguly, Jayanti Venkatarman, Satish G. Kandlikar, “Evaluation of wireless network-on-chip architectures with microchannel-based cooling in 3D multicore chips”, Sustainable Computing: Informatics and Systems, Elsevier, Volume 21, March 2019, Pages 165–178.
Tingting Song, Yiyuan Xie, Yichen Ye, Yingxue Du, Bocheng Liu, Yong Liu, “Gaussian-based optical networks-on-chip: Performance analysis and optimization”, Nano Communication Networks, Elsevier, Volume 24, 2020, Pages 1–13.
Mohammad Alaei and Fahimeh Yazdanpanah, “H²WNoC: A honeycomb hardware-efficient wireless network-on-chip architecture”, Nano Communication Networks, Elsevier, Volume 19, March 2019, Pages 119–133.
Ranjita Dash, Amartya Majumdar, Vinod Pangracious, Ashok Kumar Turuk, José L. Risco-Martín, “ATAR: An Adaptive Thermal-Aware Routing Algorithm for 3-D Network-on-Chip Systems”, IEEE Transactions on Components, Packaging, and Manufacturing Technology, Volume 8, Issue 12, 2018, Pages 2122–2129.
Sukanta Dey, Sukumar Nandi, Gaurav Trivedi, “PGOpt: Multi-objective Design Space Exploration Framework for Large-Scale On-Chip Power Grid Design in VLSI SoC using Evolutionary Computing Technique”, Microprocessors and Microsystems, Elsevier, 2020, Pages.
Jefferson Silva, Márcio Kreutz, Monica Pereira & Marjory Da Costa-Abreu, “An investigation of latency prediction for NoC-based communication architectures using machine learning techniques”, The Journal of Supercomputing, Springer, Volume 75, 2019, Pages 7573–7591.
Seyedeh Yasaman Hosseini Mirmahaleh, Amir Masoud Rahmani, “DNN pruning and mapping on NoC-Based communication infrastructure”, Microelectronics Journal, Volume 94, 2019.
Julius Beneoluchi Odili, Mohd Nizam Mohmad Kahar, Shahid Anwar, “African Buffalo Optimization: A Swarm-Intelligence Technique”, Procedia Computer Science, Volume 76, 2015, Pages 443–448.
Wenkai Guan, Milad Ghorbani Moghaddam, Cristinel Ababei, “Quantifying the Impact of Uncertainty in Embedded Systems Mapping for NoC Based Architectures”, Microprocessors and Microsystems, Elsevier, Volume 80, 2021, Pages 1–16.
S. K. Mandal, G. Krishnan, C. Chakrabarti, J.-S. Seo, Y. Cao and U. Y. Ogras, “A Latency-Optimized Reconfigurable NoC for In-Memory Acceleration of DNNs,” in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 10, no. 3, pp. 362–375, Sept. 2020, doi: 10.1109/JETCAS.2020.3015509.
Bheemappa Halavar, Basavaraj Talawar, “Power and performance analysis of 3D network-on-chip architectures”, Computers & Electrical Engineering, Elsevier, Volume 83, May 2020, Pages 1–17.
Avik Bose, Prasun Ghosal, “A low latency energy-efficient BFT based 3D NoC design with zone-based routing strategy”, Journal of Systems Architecture, Elsevier, Volume 108, 2020, 101738.
Usman Ali Gulzari, Sarzamin Khan, Muhammad Sajid, Sheraz Anjum, Frank Sill Torres, Hessam Sarjoughian, Abdullah Gani, “A low latency and low power indirect topology for on-chip communication”, PLoS ONE, Volume 14, Issue 10, 2019, Pages 1–18.
P. K. Sahu and S. Chattopadhyay, “A survey on application mapping strategies for network-on-chip design,” J. Syst. Archit., vol. 59, no. 1, pp. 60–76, 2013.
Waqar Amin, Fawad Hussain, Sheraz Anjum, Sarzamin Khan, Naveed Khan Baloch, Zulqar Nain, and Sung Won Kim, “Performance Evaluation of Application Mapping Approaches for Network-on-Chip Designs”, IEEE Access, Volume 8, 2020, Pages 63607–63631.
B. Naresh Kumar Reddy, Dharavath Kishan & B. Veena Vani, “Performance constrained multi-application network on-chip core mapping”, International Journal of Speech Technology, Springer, Volume 22, 2019, Pages 927–936.
Aruru Sai Kumar, T.V.K. Hanumantha Rao, “Scalable benchmark synthesis for performance evaluation of NoC core mapping”, Microprocessors and Microsystems, Elsevier, Volume 79, 2020, Pages 1–23.
Pradeep Kumar Sharma, Santosh Biswas, Pinaki Mitra, “Energy-efficient heuristic application mapping for 2-D mesh-based network-on-chip”, Microprocessors and Microsystems, Elsevier, Volume 64, 2019, Pages 88–100.
Aravindhan Alagarsamy, Lakshminarayanan Gopalakrishnan, Seok-Bum Ko, “KBMA: A knowledge-based multi-objective application mapping approach for 3D NoC”, IET Computers & Digital Techniques, Volume 13, Issue 4, 2019, Pages 324–334.
A. Alagarsamy and L. Gopalakrishnan, “SAT: A new application mapping method for power optimization in 2D-NoC,” in Proc. 20th IEEE Int. Symp. VLSI Design Test (VDAT), May 2016, pp. 270–275.
Shiming Li, Shuo Tian, Ziyang Kang, Lianhua Qu, Shiying Wang, Lei Wang, Weixia Xu, “A multi-objective LSM/NoC architecture co-design framework”, Journal of Systems Architecture, Volume 116, 2021.
Suleyman Tosun, “New heuristic algorithms for energy aware application mapping and routing on mesh-based NoCs”, Journal of Systems Architecture, Volume 57, Issue 1, 2011, Pages 69–78.
Mohammad Baharloo, Ahmad Khonsari, Mahdi Dolati, Pouya Shiri, Masoumeh Ebrahimi, Dara Rahmati, “Traffic-aware performance optimization in Real-time wireless network on chip”, Nano Communication Networks, Elsevier, Volume 26, 2020, Pages 1–16.
Wei Gao, Zhiliang Qian, Pingqiang Zhou, “Reliability- and performance-driven mapping for regular 3D NoCs using a novel latency model and Simulated Allocation”, Integration, Elsevier, Volume 65, 2019, Pages 351–361.
Junshi Wang, Masoumeh Ebrahimi, Letian Huang, Xuan Xie, Qiang Li, Guangjun Li, Axel Jantsch, “Efficient Design-for-Test Approach for Networks-on-Chip”, IEEE Transactions on Computers, Volume 68, Issue 2, 2019, Pages 198–213.
Wei Gao, Zhiliang Qian, Pingqiang Zhou, “Reliability- and performance-driven mapping for regular 3D NoCs using a novel latency model and Simulated Allocation”, Integration, Volume 65, 2019.
https://s2.smu.edu/~manikas/Benchmarks/MCNC_Benchmark_Netlists.html
S. K. Mandal, A. Krishnakumar, and U. Y. Ogras, “Energy-efficient networks-on-chip architectures: Design and run-time optimization,” in Network-on-Chip Security and Privacy, Cham: Springer International Publishing, 2021, pp. 55–75.
N. Gupta, K. S. Vaisla, and R. Kumar, “Design of a structured hypercube network chip topology model for energy efficiency in wireless sensor network using machine learning,” SN Computer Science, vol. 2, no. 5, 2021.
D. Lee, S. Das, J. R. Doppa, P. P. Pande, and K. Chakrabarty, “Performance and thermal tradeoffs for energy-efficient monolithic 3D network-on-chip,” ACM Trans. Des. Automat. Electron. Syst., vol. 23, no. 5, pp. 1–25, 2018.
R. Aligholipour, M. Baharloo, B. Farzaneh, M. Abdollahi, and A. Khonsari, “TAMA: Turn-aware Mapping and Architecture - A power-efficient network-on-chip approach,” ACM Trans. Embed. Comput. Syst., vol. 20, no. 5, pp. 1–24, 2021.
Katarzyna Grzesiak-Kopeć and Maciej Ogorzałek, “3D IC optimal layout design A parallel and distributed topological approach”, Computer science, 2019, pp. 1–26.
Bahador Boroumand, Elham Yaghoubi, Behrang Barekatain, “An enhanced cost-aware mapping algorithm based on improved shuffled frog leaping in network on chips”, The Journal of Supercomputing, vol. 77, pp. 498–522, 2021.
Weng Xiaodong, Liu Yi, Yang Yintang, “Network-on-chip heuristic mapping algorithm based on isomorphism elimination for NoC optimization”, IET Computers and Digital Techniques, vol. 14, no. 6, pp. 272–280, 2020.

No competing interests reported.
