Quality Based Clustering of Node using Fuzzy-Fruit Fly Optimization for Cluster Head and Gateway Selection in Healthcare Application

Big data has recently gained tremendous importance in the way information is disseminated. Transaction-based data, unstructured data streaming to and from social media, and growing volumes of sensor and machine-to-machine data all rely on big data in conjunction with cloud computing. It is desirable to create wireless networks on the fly, on demand or as a given situation requires. In such scenarios, reliable transmission of big data over mobile ad hoc networks plays a key role in military healthcare applications. Constraints on congestion, delay, energy consumption and packet loss rate pose a challenge for such systems. The most essential problem in Hybrid Mobile Ad-hoc Networks (H-MANET) is to select a suitable and secure path that balances the load across the Internet gateways. Poor gateway selection and network overload may cause packet losses and Delay (DL). Therefore, load balancing between different gateways is required to achieve better performance. To this end, a steady load-balancing technique was previously employed that selects gateways using a Fuzzy Logic (FL) system and enhances network efficiency. However, its Energy Consumption (EC) was high because gateways were selected directly from the full set of nodes. Hence, in this article, a novel Node Quality-based Clustering Algorithm (NQCA) using Fuzzy-Genetic for Cluster Head and Gateway Selection (FGCHGS) is proposed. In this algorithm, NQCA is performed based on the Improved Weighted Clustering Algorithm (IWCA). The NQCA algorithm divides the network into a number of clusters, and the Cluster Head (CH) of each cluster is elected on the basis of node priority, transmission range and node neighborhood fidelity. Moreover, the clustering quality is estimated from parameters such as node degree, EC and DL, which are also used to estimate the combined weight value with the FL system. 
Then, the combined weight values are optimized using a Genetic Algorithm (GA) to pick the optimal weight value, which determines both the optimal CH and gateway. However, the convergence time of GA and the error due to parameter tuning during optimization are high. Hence, an NQCA using Fuzzy-Fruit Fly optimization for Cluster Head and Gateway Selection (FFFCHGS) is proposed. In this algorithm, an improved Fruit Fly (FF) algorithm replaces GA to select the optimal CH and gateway. Finally, the effectiveness of the FFFCHGS algorithm is evaluated through simulation in terms of EC, Packet Loss Rate (PLR), etc.


In MANET, routing is the major challenge because of node mobility. Exploiting wireless sensor network (WSN) applications in healthcare also requires overcoming a series of technical challenges. Like all WSNs, these applications face many complications arising from restricted network resources. For that reason, a well-known technique called clustering was developed, which minimizes the amount of data sent from source to destination and reduces the transmission bandwidth and EC [3]. Most clustering algorithms select a subset of nodes and construct a network backbone to support the control functions. Each node in the network may associate with other nodes, and each node in the selected set is called a CH. The CH of one cluster is connected with the CHs of the other clusters either directly or via gateways. Combining gateways and CHs forms a connected backbone that simplifies processes such as channel access, bandwidth allocation, reduction of the energy required for routing, and virtual-circuit support. In addition to routing, clustering the nodes is one of the greatest challenges in MANET [4].
Although various algorithms have been proposed to obtain an optimum number of clusters, none considers all the network factors required to improve clustering performance [5]. Obtaining an optimum number of clusters is therefore an essential and active research area in MANET. Clustering algorithms have to be distributed because each node holds only local information. They must also be robust and able to adapt to topology changes as the network size increases or decreases. Each cluster should be well organized in practice, i.e., a given CH must be able to maintain a large number of nodes [6].
In each cluster, nodes are categorized as one of the following types: CH, cluster member and gateway.
For each cluster, the CH acts as a local controller and is responsible for routing, channel assignment, scheduling and forwarding inter-cluster traffic from the cluster members. Nodes other than the CH are called cluster members; they act as ordinary nodes and are not involved in routing or inter-cluster communication. A cluster gateway is a boundary node that has at least one neighbor belonging to a different cluster and is used to transmit routing data from one cluster to the others [7]. In recent transmission systems, the primary concern of MANET users is finding reliable access to the web via gateways [8]. A MANET with an active connection to a public network is known as a Hybrid MANET (H-MANET). In the steady load balancing gateway election algorithm [9], the routing efficiency of the H-MANET was improved using the FL system. In this algorithm, FL was applied with a novel routing metric called cost, which comprises network attributes such as MRA packet arrival variation, control packet ratio and gateway load, to select the best gateway. The fuzzy sets were defined and optimized using GA, whose fitness function employed FL and was designed around four network parameters: DL, PLR, Normalized Routing Overhead (NRO) and Balanced Load Index (BLI). However, the EC was high because the gateway node was selected directly from the entire network, and network maintenance was complex owing to the dynamic changes of the nodes.
Hence, in this article, a novel NQCA with a gateway selection algorithm based on the IWCA is introduced. In this algorithm, the entire MANET is first split into clusters. For each cluster, a CH is selected based on node priority, transmission range and node neighborhood fidelity. In addition, the quality of clustering is estimated using further parameters, namely node degree, environmental distance (Dist), clustering Stability Factor (SF), EC, Residual Battery Energy (RBE) and node weight, together with DL, PLR, NRO and BLI. These parameter values are converted into fuzzy values by the FL system. After the fuzzy sets are obtained, the combined weight value is calculated. Then, GA is applied to optimize the weight value and select both the optimal CH and gateway. Nonetheless, the convergence time of GA is high, and so is the error caused by parameter tuning, such as changes in the population and fitness function. As a result, the FGCHGS algorithm is further enhanced by replacing GA with a more effective optimization algorithm. To improve the selection of both the optimal CH and gateway, an improved FF algorithm is proposed that reduces the computation and convergence time. Thus, the proposed algorithms can minimize the nodes' EC and simplify network maintenance.

II. LITERATURE SURVEY
A Clustering-Based Gateway Placement Algorithm (CBGPA) [10] was introduced with multi-objective optimization in wireless mesh networks to ensure network scalability. In this algorithm, the exploitation rate and mean congestion of gateways were optimized concurrently via a nature-inspired meta-heuristic algorithm combined with CBGPA. However, the congestion of gateways was not reduced efficiently. Iqbal et al. [11] proposed a new mechanism for discovering gateways in MANET. In this strategy, the source node does not need to retransmit a gateway choice request message if the reply message is lost or does not arrive in time. However, an optimal path to a gateway was not selected and the routing load was high.
Papadaki & Friderikos [12] proposed a compact reformulation of the Uncapacitated and Capacitated joint Gateway Selection and Routing (U/C-GSR) integer linear program. A reformulation using the Shortest Path cost Matrix (SPM) under both uncapacitated and capacitated scenarios was introduced to reduce the computational complexity and offer an optimum solution. However, the computational time complexity was high. A gateway selection scheme for multihop wireless networks was proposed [13] based on Ideally Scheduled Route Optimization (ISRO), which uses route and link optimization to increase network efficiency. This approach addresses three optimization issues: optimum gateway routing under the best criteria, determination of link capacity by interference-free scheduling, and path adjustment for a new link. However, the mobility of the gateway was not considered.
A proactive load-aware gateway decision [14] was proposed by considering the interface queue size along with the conventional minimum-hop metric. In this approach, an efficient handoff from one gateway to another was allowed and seamless connectivity to the pre-determined host was maintained. However, the throughput and DL still required improvement.
Sahana et al. [15] proposed a weight-based hierarchical clustering scheme based on a combined weight that comprises the node's degree, communication area and mobility. Though the highest-weighted node was selected as CH, the network performance was not analyzed thoroughly. A congestion-controlled adaptive multipath routing protocol was proposed [16] for load balancing and congestion avoidance in MANET. This protocol determines fail-safe multiple links that include nodes with a low amount of traffic and high RBE. If the mean load on a link is higher than the threshold, the node transmits the traffic over the disjoint multipath to avoid loading the congested path. However, load balancing between gateways is not discussed, which may reduce network performance.
The performance of gateway choice protocols [17] was analysed in MANET. A modified Ad hoc On-Demand Distance Vector (AODV) routing protocol was suggested for integrating MANET with the web via an immobile gateway. However, the DL of this protocol was high. An improved gateway choice approach [18] was proposed on the basis of the mean number of hops and the link stability. However, the effectiveness of the algorithm was not evaluated.

A multi-criteria gateway choice and multipath routing protocol [19] was proposed for H-MANET by considering mobility. In this protocol, a combined weight value was calculated from mobility, inter- and intra-network traffic load and RBE using the Simple Additive Weighting (SAW) mechanism. The node with the maximum weight was then selected as the gateway for the path chosen from the multiple paths. If the selected gateway was not located on that link, an alternate path was selected for routing. However, the packet delivery ratio was low and the DL was high. An autonomous clustering-based dynamic network gateway choice [20] was proposed in which the network was split into clusters. For each cluster, the gateway was chosen autonomously and dynamically. However, QoS parameters such as DL and throughput were not analysed.
A gateway discovery algorithm was proposed [21] on the basis of several QoS link metrics between the node and the gateway. In this algorithm, route accessibility was improved by introducing a feedback system that reports updated route dynamics to the traffic source. In addition, an efficient scheme for propagating QoS parameters was proposed. However, the average control overhead was high. An Enhanced Distributed Group Mobility Adaptive (EDGMA) [22] clustering approach was proposed for MANET. In this approach, CH and gateway selection algorithms were combined to support cluster formation. Initially, the CH was selected among a group of mobile nodes, each travelling in a different direction at a different speed. Then, the gateway was selected to route information effectively from source to destination. However, the mean lifetime of the CH was short.
Hussain et al. [23] suggested an Efficient CH Selection Algorithm (ECHSA) for MANET. This algorithm used a novel artificial intelligence technique for selecting the CH by populating a Black and White (B&W) list with the routing table. However, the complexity of this algorithm was high. Joshi et al. [24] investigated an optimized gateway selection scheme for MANET clusters with the aim of decreasing the control overhead required during network construction and management while maintaining robust connectivity. The highest degree algorithm was used to select the CH based on metrics such as transmission range, mobility and residual energy. Further, excessive flooding caused by unwanted transmission of packets via several gateways was minimized during inter-cluster packet transmission. However, the DL was high. A CH decision approach using FL was proposed [25] based on the node's degree, fitness and integrity in MANET. In this approach, the MANET was divided into clusters, each with its own independent CH connected through the other CHs. However, it requires additional parameters for selecting the CH, and gateway selection was not considered. Divya & Ganesh [26] proposed a Gateway Migration Algorithm (GMA) between node and gateway in MANET. In this algorithm, the gateway was selected based on multiple QoS link metrics, namely link availability period, accessible load capacity and DL. If the traffic source node migrated into the transmission range of another gateway, the traffic on that route was transferred through that gateway. However, the average DL was high.
Gateway selection optimization was proposed [27] in an H-MANET-satellite network to solve the problem of gateway positioning. In this approach, GA was used to solve the multi-criteria optimization problem by considering the topology dynamics. Different metrics such as gateway and link load, cumulative path Dist and convergence time were optimized. However, more parameters need to be considered for effective optimization. Mahiddin & Sarkar [28] proposed an improved MANET gateway selection scheme. The major objective of this scheme was to remove congestion at each MANET gateway by considering node mobility, thereby improving network performance. However, the packet delivery DL was high.
Zaman et al. [29] proposed a novel method, Adaptive Steady Load Balancing Gateway Selection, in which a path load balancing technique was used for gateway selection. In addition, GA was used to optimize the solution, with the fitness function computed using FL. To balance the load on each route, three load-balancing parameters were used: Number of received Gateway solicitation messages (NMRG), Time-To-Live Changes (TTLC) and Link Changes (LC). Moreover, the occupancy level of each node was computed, updated at short intervals and transmitted to each neighbor in the communication region. However, the PLR was high.
Kumar & Ramamoorthy [30] proposed a novel gateway selection method for improving the throughput of MANET. In this method, an enhanced gateway selection scheme avoids congestion and balances the load on MANET gateways, which enhances network performance. However, the packet delivery DL was high. Rajkumar et al. [31] proposed an enhanced CH and gateway selection scheme for cluster-based MANET to minimize the number of re-clustering events during cluster formation. In this technique, the CH was elected based on the direction and mobility of the mobile nodes. Similarly, a gateway algorithm was used to select a gateway node for both intra- and inter-cluster communication. Initially, the similarity between two mobile nodes within the transmission range was computed from their spatial dependency to find the mobility feature that identifies nodes belonging to the same cluster and completes their routing. However, the DL was not analyzed.

III. PROPOSED METHODOLOGY
In this section, the proposed FGCHGS and FFFCHGS algorithms are explained in detail. Initially, the NQCA algorithm performs network clustering based on the IWCA algorithm. The clustering relies on two models: the node priority and range region aggregation models. Then, the cluster characteristics are measured using the FL system by considering various QoS parameters. After that, both the CH and the gateway are selected using GA according to the fitness value of each node, which is estimated as a combined weight value from the fuzzy sets of the network metrics. Further, an improved FF algorithm is proposed in place of GA to select both the CH and the gateway efficiently.

Network Model
The network is built from nodes and connections and is characterized by an undirected graph $G = (V, E)$, where $V = \{v_i\}$ is the set of nodes and $E = \{e_i\}$ is the set of connections.
Clustering is treated as a graph partitioning problem with some constraints [32]. The neighborhood $N(v_i)$ of a CH $v_i$ is the set of nodes that communicate with it directly and lie within its communication range $R$. It is defined as follows:
$$N(v_i) = \{\, v_j \in V : dist(v_i, v_j) < R \,\} \tag{1}$$
In Eq. (1), $dist(v_i, v_j)$ refers to the mean Dist between $v_i$ and $v_j$. When the system is initiated, each node broadcasts its ID to every other node located in its communication range. Based on this, the mutual Dist between any two nodes is estimated from the ratio of received to transmitted power. The node degree of $v_i$ is then the cardinality of the set $N(v_i)$:
$$deg(v_i) = |N(v_i)| \tag{2}$$
In general, clustering must partition the nodes so that the following criteria are satisfied:
• Each ordinary node has at least one CH as a neighbor, and no two CHs are neighbors.
• Each ordinary node affiliates with the neighboring CH that has the minimum weight.
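The neighbor set and node degree described above can be illustrated with a minimal Python sketch. It assumes that node positions are known 2-D coordinates, that the Euclidean distance stands in for the power-based Dist estimate, and that a node is not counted as its own neighbor; the names `neighbors`, `degree` and `positions` are hypothetical and do not come from the original algorithm.

```python
import math


def neighbors(positions, node, tx_range):
    """Return the IDs of all nodes strictly within tx_range of `node`.

    Assumption: Euclidean distance replaces the power-ratio Dist estimate,
    and the node itself is excluded from its own neighborhood.
    """
    x0, y0 = positions[node]
    return {
        other
        for other, (x, y) in positions.items()
        if other != node and math.hypot(x - x0, y - y0) < tx_range
    }


def degree(positions, node, tx_range):
    """Node degree: the cardinality of the neighbor set."""
    return len(neighbors(positions, node, tx_range))


# Hypothetical four-node topology with a 100 m transmission range.
positions = {"a": (0.0, 0.0), "b": (30.0, 40.0), "c": (200.0, 0.0), "d": (0.0, 90.0)}
print(sorted(neighbors(positions, "a", 100.0)))
print(degree(positions, "a", 100.0))
```

In a real deployment the distance would come from the received-to-transmitted power ratio rather than from coordinates, so the distance function is the part most likely to change.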

NQCA Models
The proposed NQCA consists of the following two models:
• Node Priority Aggregation Model
In NQCA, the CH is selected by assigning priorities to the nodes according to their degree. This model is built in the following manner:

$$\mathrm{SN} > \mathrm{WN} > \mathrm{BN}$$
Here, the node types Strong Node (SN), Weak Node (WN) and Border Node (BN) are identified by computing the node type indicator as:

• Region Aggregation Model
Node neighborhood fidelity measures the ability of neighbors to remain in the region of a given parent node. A parent node is any CH candidate, and its neighbors can be located at various Dist from it. As the Dist increases, the parent's neighborhood fidelity decreases, and distant nodes are expected to depart from the parent at any time. This weakens the parent, so its chance of becoming a CH is reduced. Hence, the communication range of a parent is divided into three virtual regions, namely the excellent, intermediary and endangered regions, all located within the circle whose radius is the transmission range.
Both the excellent and intermediary regions contain neighbors that are trusted for a definite time. In the endangered region, neighbors are treated as topologically untrusted nodes, on the assumption that they can leave the partition before the trusted nodes do. The range indicator is measured so as to give the maximum priority to trusted nodes and the minimum priority to untrusted nodes when electing the CH. In Eq. (4), the three user coefficients sum to 1 and can be adjusted by selecting appropriate values according to the node's mobility. Moreover, a new combined node indicator is then determined.
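As a rough illustration of the region weighting described above, the following Python sketch counts a parent's neighbors in three concentric regions and combines the counts with three coefficients that sum to 1. It is a sketch under stated assumptions, not the indicator of Eq. (4): the region boundaries at one third and two thirds of the transmission range, the default coefficients `(0.5, 0.3, 0.2)`, the normalization by the neighbor count and the name `range_indicator` are all hypothetical.

```python
import math


def range_indicator(parent_pos, neighbor_positions, tx_range,
                    coeffs=(0.5, 0.3, 0.2), bounds=(1 / 3, 2 / 3)):
    """Weighted share of neighbors per region (excellent, intermediary, endangered).

    Assumptions: regions are concentric rings split at bounds[0] * tx_range and
    bounds[1] * tx_range, and the coefficients (which must sum to 1) favor the
    inner, trusted regions.
    """
    if not math.isclose(sum(coeffs), 1.0):
        raise ValueError("coefficients must sum to 1")
    px, py = parent_pos
    counts = [0, 0, 0]
    for x, y in neighbor_positions:
        d = math.hypot(x - px, y - py)
        if d > tx_range:
            continue  # outside the communication range: not a neighbor
        if d <= bounds[0] * tx_range:
            counts[0] += 1  # excellent region
        elif d <= bounds[1] * tx_range:
            counts[1] += 1  # intermediary region
        else:
            counts[2] += 1  # endangered region
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(c * n for c, n in zip(coeffs, counts)) / total


# One neighbor in each region of a 100 m range.
print(range_indicator((0.0, 0.0), [(10.0, 0.0), (50.0, 0.0), (90.0, 0.0)], 100.0))
```

With this weighting, a parent whose neighbors sit close to it scores higher than one whose neighbors sit near the edge of its range, which matches the intent of preferring trusted neighbors.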

Quality of Clustering (QoC)
The QoC is measured by several parameters that help to improve the cluster characteristics and the selection of both the CH and the gateway. The main aim of QoC is to express the capacity of a cluster to deliver the expected outcomes. The QoC is measured based on the IWCA. The parameters are given below:

a) Node Quality:
The node quality is calculated as the product of node degree and node combined indicator.

b) Environmental Distance
The environmental Dist is used instead of the exact Dist between a parent node and its neighbor. It is computed from the region in which the neighbor is located. The total environmental Dist from a parent to all of the neighbors connected to it is then calculated.

c) Clustering Stability Enhancement
For a given time, the cluster formation remains unchanged owing to the stability of the nodes. Accordingly, an SF is defined for each node. In NQCA, the neighboring nodes with the maximum SF are elected as the CH.

d) Load Balancing Clustering Scheme
Assume that all nodes are identical and generate data at a similar rate. For load balancing, the number of nodes in a cluster and the transmit power needed per CH should be balanced. Therefore, the relative deviation of the neighbors in the current configuration is measured by the relative dissemination degree. In Eq. (10), an upper bound that depends on the total number of nodes in the network limits the number of nodes that a CH can manage well.

e) Energy Consumption & Remaining Battery Energy (RBE)
A large amount of energy is required for long-Dist transmission. Hence, for each node, the EC is determined from the average Dist to its neighbors. If this average Dist is high, the required EC is also high; otherwise, the EC is low. Each node can also easily estimate its RBE after a transmission. Therefore, a node with higher RBE and lower EC can be selected as a CH.
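The energy criterion above can be sketched in a few lines of Python. The sketch assumes 2-D coordinates with Euclidean distance as the Dist measure, uses the mean neighbor distance directly as an EC proxy, and ranks CH candidates by lower mean distance first and higher RBE second; that lexicographic ordering and the names `mean_neighbor_distance` and `rank_ch_candidates` are illustrative assumptions, not the weighting used by the original algorithm.

```python
import math


def mean_neighbor_distance(positions, node, neighbor_ids):
    """Average distance from `node` to its neighbors (used here as an EC proxy)."""
    if not neighbor_ids:
        return 0.0
    x0, y0 = positions[node]
    total = sum(
        math.hypot(positions[n][0] - x0, positions[n][1] - y0) for n in neighbor_ids
    )
    return total / len(neighbor_ids)


def rank_ch_candidates(stats):
    """Order candidates by lower EC proxy, then by higher RBE.

    `stats` maps node ID -> (mean_distance, rbe). The lexicographic ordering
    is an assumption made for illustration only.
    """
    return sorted(stats, key=lambda n: (stats[n][0], -stats[n][1]))


positions = {"a": (0.0, 0.0), "b": (30.0, 40.0), "d": (0.0, 90.0)}
print(mean_neighbor_distance(positions, "a", ["b", "d"]))
print(rank_ch_candidates({"a": (70.0, 0.8), "b": (70.0, 0.9), "c": (40.0, 0.5)}))
```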

f) Delay
DL is the data transfer time between the source and the destination.

g) Packet Loss Rate (PLR)
The PLR is the ratio of lost packets to the total number of transmitted packets. A PLR of 10% of the transmitted data is taken as the highest value.

h) Normalized Routing Overhead (NRO)
NRO is defined as the ratio between the correctly received packets and the total control packets in the network. An NRO of 5 is taken as the highest value.
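The two caps mentioned above (10% for PLR and 5 for NRO) suggest a simple scaling of each metric before it is passed to the fuzzy system. The Python sketch below is one hedged way to do this: it assumes NRO is computed as control packets per correctly received packet, and it clips each metric at its cap and rescales it to the range [0, 1]. The function names and the clipping step are assumptions for illustration and are not taken from the original method.

```python
def packet_loss_rate(sent, received):
    """Fraction of transmitted packets that were lost."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return (sent - received) / sent


def normalized_routing_overhead(control_packets, received_packets):
    """Control packets per correctly received packet (assumed direction of the ratio)."""
    if received_packets <= 0:
        raise ValueError("received_packets must be positive")
    return control_packets / received_packets


def scale_to_unit(value, cap):
    """Clip `value` to [0, cap] and rescale to [0, 1]; `cap` is the highest value."""
    return min(max(value, 0.0), cap) / cap


plr = packet_loss_rate(sent=200, received=190)
nro = normalized_routing_overhead(control_packets=300, received_packets=100)
print(scale_to_unit(plr, 0.10), scale_to_unit(nro, 5.0))
```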

i) Balanced Load Index (BLI)
The BLI represents the degree of overload endured by each node. In Eq. (12), it is computed from the load carried by each node and the total number of nodes in the network. If the BLI is zero, every node carries an equal load and the network is balanced.
After that, the computed parameters are converted into fuzzy sets by a Fuzzy Inference System. Based on the resulting weight values for each node, the optimal CH and gateway are selected. The lowest-weight node in a particular cluster is chosen as the CH, and the lowest-weight node that acts as a border node in that cluster is elected as the gateway. The fuzzy system generates a number of rules from these parameters, a few of which are shown in Table 1.
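The selection rule stated above (lowest-weight node as CH, lowest-weight border node as gateway) can be sketched as follows. The sketch assumes the combined weight of every node has already been produced by the fuzzy system and that border status is known; the data layout, the name `select_ch_and_gateway` and the choice to exclude the CH from the gateway candidates are assumptions made for illustration.

```python
def select_ch_and_gateway(cluster):
    """Pick the CH and gateway of one cluster.

    `cluster` maps node ID -> (combined_weight, is_border).
    Assumption: the CH itself is not eligible as gateway; if no other
    border node exists, the gateway is None.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one node")
    ch = min(cluster, key=lambda n: cluster[n][0])
    border = [n for n, (_, is_border) in cluster.items() if is_border and n != ch]
    gateway = min(border, key=lambda n: cluster[n][0]) if border else None
    return ch, gateway


# Hypothetical combined weights for a four-node cluster.
cluster = {
    "n1": (0.42, False),
    "n2": (0.31, True),
    "n3": (0.55, True),
    "n4": (0.27, False),
}
print(select_ch_and_gateway(cluster))
```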

Fuzzy-Genetic Algorithm
In the GA, the obtained weight values are optimized for selecting the CH and gateway. The objective function of the GA is to obtain the optimum weight function for a MANET system. Based on the selected optimal values, the network performance is improved by selecting a better CH and gateway. The final steps of the algorithm are:
16. Go to step 11;
18. Choose the nodes with the best fitness value as CH and the border nodes with the best fitness value as gateway;

Fuzzy-Fruit Fly Optimization Algorithm
Instead of GA, the FF optimization algorithm is proposed to optimize the obtained weight values for CH and gateway selection. The main aim of FF is to obtain the optimal value of the weight function for a MANET system and to enhance network efficiency by choosing the optimum CH and gateway.
Normally, FF is used for global optimization on the basis of the food-hunting behavior of a fruit fly swarm. Initially, the food source is identified by smelling all kinds of scents floating in the air and flying toward the corresponding positions. Once the fly is close to the food source, the food can be found by sight. The algorithm has two main steps, namely the osphresis foraging step and the visual step. In the osphresis phase, a swarm of fruit flies searches for food randomly around the swarm position. In the visual phase, sharp vision is used to fly to the swarm's best position.
However, the basic FF may suffer from premature convergence and reduced efficiency owing to its fixed iteration step, i.e., the basic FF depends on the iteration step. The FF also lacks exploration when generating new solutions from the random details of previous solutions. To avoid these limitations, an improved FF algorithm with a linearly diminishing step is introduced. Specifically, when the iteration step is constant, it is difficult to locate the food position at the end of the iterations, and some individual fruit flies may remain far from the food. Therefore, the fixed step is replaced with a linearly diminishing step to avoid being trapped in local optima and to improve precision. The steps of the proposed improved FF algorithm are:
Step 1 (Initialization phase): Randomly assign the FF swarm location $(X_{axis}, Y_{axis})$ and set the parameters of the first iteration, including the maximum population count, the population size and the iteration step value.
Step 2: Each individual searches for food, i.e., the minimum weight function, in a random direction and Dist:
$$X_i = X_{axis} + \mathrm{rand}(\cdot), \qquad Y_i = Y_{axis} + \mathrm{rand}(\cdot) \tag{14}$$
Here, $(X_i, Y_i)$ is the location of the $i$-th FF individual among the fruit flies and $\mathrm{rand}(\cdot)$ is a random value sampled from a uniform distribution.
Step 3 (Path construction phase): As the food position cannot be identified, the Dist between the coordinate origin and each individual is estimated as:
$$Dist_i = \sqrt{X_i^2 + Y_i^2} \tag{15}$$
The nearer a position is to the origin, the smaller the density of the food. Then, the reciprocal of the Dist is defined as the smell concentration judgement value:
$$S_i = \frac{1}{Dist_i} \tag{16}$$
Step 4 (Fitness function computation phase): The smell concentration of an individual FF position is computed by substituting $S_i$ into the fitness function $f$:
$$Smell_i = f(S_i) \tag{17}$$
Step 5: The FF with the highest smell concentration among the FF swarm is identified as:
$$[bestSmell,\ bestIndex] = \max(Smell) \tag{18}$$
In Eq. (18), $bestSmell$ and $bestIndex$ are the largest element and its index along the dimensions of the smell vector.
Step 6 (Movement phase): After that, the current highest smell concentration and the corresponding coordinates are retained, and the swarm flies to that position:
$$Smell_{best} = bestSmell, \qquad X_{axis} = X_{bestIndex}, \qquad Y_{axis} = Y_{bestIndex} \tag{19}$$
Step 7: The iteration step is then updated according to the linear diminishing rule of Eq. (20), which involves the initial iteration step, the maximum iteration number and a coefficient equal to 0.8.
Step 8: $X_{axis}$ and $Y_{axis}$ from Step 6 are substituted into the value of $S_i$, and $S_i$ is then checked against the constraints. After that, the value of $S_i$ that meets the constraints is substituted into the fitness function and the best optimization solution is obtained.
Step 9: If the maximum number of iterations has been reached, the search terminates and the best solution is returned; otherwise, go to Step 5.
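To make the steps above concrete, the following Python sketch implements a fruit fly search with a shrinking step. It is a minimal sketch under stated assumptions rather than the authors' implementation: the step rule `step0 * (1 - decay * t / max_iter)` with `decay = 0.8` is only one possible linear diminishing form, the constraint check of Step 8 is omitted, and the names `linear_step`, `improved_foa`, `smell` and `seed` are hypothetical. Because the search maximizes smell concentration, a weight function to be minimized is passed in with its sign flipped.

```python
import math
import random


def linear_step(step0, t, max_iter, decay=0.8):
    """Assumed linear diminishing rule: shrinks from step0 toward (1 - decay) * step0."""
    return step0 * (1.0 - decay * t / max_iter)


def improved_foa(smell, pop_size=20, max_iter=100, step0=1.0, decay=0.8, seed=0):
    """Maximize `smell(s)` where s = 1 / Dist is the smell concentration judgement value."""
    rng = random.Random(seed)
    # Step 1: random initial swarm location.
    x_axis, y_axis = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    best_x, best_y = x_axis, y_axis
    best_s, best_smell = None, -math.inf
    history = []
    for t in range(max_iter):
        step = linear_step(step0, t, max_iter, decay)  # Step 7
        for _ in range(pop_size):
            # Step 2: random search around the swarm location.
            x = x_axis + rng.uniform(-step, step)
            y = y_axis + rng.uniform(-step, step)
            # Step 3: distance to the origin and its reciprocal.
            dist = max(math.hypot(x, y), 1e-12)
            s = 1.0 / dist
            # Step 4: smell concentration from the fitness function.
            value = smell(s)
            # Step 5: keep the best individual seen so far.
            if value > best_smell:
                best_s, best_smell = s, value
                best_x, best_y = x, y
        # Step 6: the swarm flies to the best position found.
        x_axis, y_axis = best_x, best_y
        history.append(best_smell)
    return best_s, best_smell, history


def weight(s):
    """Hypothetical weight function to minimize; its minimum lies at s = 2."""
    return (s - 2.0) ** 2


best_s, best_smell, history = improved_foa(lambda s: -weight(s))
print(best_s, -best_smell)
```

Keeping the best position across iterations means the recorded best smell never decreases, and the shrinking step narrows the search around that position in later iterations.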

IV. SIMULATION RESULTS
The effectiveness of the FFFCHGS and FGCHGS algorithms is evaluated in Network Simulator-2 (NS2.34) and compared with the NQCA and FGCH algorithms in terms of DL, PLR, NRO, BLI and EC. Table 2 gives the simulation parameters. When the transmission range is 100 m, NQCA has an EC of 563 W/$10^6$, FGCH has 537 W/$10^6$, FGCHGS has 511 W/$10^6$ and FFFCHGS has 485 W/$10^6$. The FFFCHGS algorithm has a better EC per packet owing to the low utilization of nodes around the CH and gateway. This clearly shows that the FFFCHGS algorithm achieves a lower EC than the NQCA, FGCH and FGCHGS algorithms.

V. CONCLUSION
In this article, the FFFCHGS algorithm is proposed to find a route that satisfies congestion, delay, energy consumption and packet loss rate constraints for big data packet transmission in MANET-based military healthcare applications. In this algorithm, the network is first split into clusters. In every cluster, the CH is selected based on the IWCA, which uses metrics such as node fidelity and transmission range. Then, the quality of each cluster is measured using the FL system, which considers parameters such as DL, load balancing, EC and RBE. The gateway is also selected based on these parameters, which are converted into fuzzy sets from which the combined weight value is computed. The computed combined weight values are then optimized using GA. Further, an improved FF algorithm is used instead of GA for CH and gateway selection with increased accuracy and convergence speed. Thus, both CH and gateway selection are optimized.
Figure 1. End-to-end Delay vs. Node Speed
Figure: Energy per Packet vs. Maximum Transmission Range