A Novel Approach for Protecting RPL Routing Protocol against Blackhole Attacks in IoT Networks

Nowadays, we are witnessing an increasing trend towards interconnected devices. This process of connecting devices instead of people is called the Internet of Things (IoT). The main concept of IoT is to connect heterogeneous objects separately and centrally in different places using standard protocols. The general idea is to create an independent world using intelligent objects that have the ability to exchange information and make decisions. Connected objects allow users to monitor and track remotely and in real time. IoT relies on the development of a low-power, high-throughput network to support communication between objects and their connection to the Internet. These networks are characterized by limited resources in terms of energy, memory, and processing. To realize the Internet of Things, networks called 6LoWPAN were created, and a new routing protocol compatible with these networks, called RPL, was introduced. Due to the limited nature of RPL-based networks, they may be exposed to a variety of internal attacks. Neighbor attacks and DIS attacks are specific to this protocol. This study proposes a trust-based RPL routing protocol that deals with blackhole threats. It is also shown that, while the recommended system is secure against blackhole attacks, it does not incur any unwanted cost in terms of network traffic.


Introduction
The Internet of Things (IoT) is one of the most significant topics in Information Technology (IT) and computing. As the name suggests, anything within the surrounding environment can become a node in this network, and all nodes within such a network are interconnected. This communication takes place over wireless networks and provides a wireless infrastructure within any IoT network, which can also serve as a sensor network. Moreover, the nodes (i.e. things) are not only varied but also heterogeneous. Secure routing and the provision of trustworthy communication mechanisms are considered among the main challenges in the heterogeneous world of IoT [1]. Since IoT includes both mobile and stationary components, different problems arise in the development of the routing protocols through which such devices are connected. A smart routing protocol can unlock the inherent power of a heterogeneous, dynamic, and complex network characterized by multiple dynamic factors, including changes in topology and traffic flow; thus, to make the full range of IoT functions possible, smart protocols for device-to-device connection are required. Efficient and scalable routing protocols adapt to different scenarios in terms of size and type, and they are capable of finding the required optimized routes [2] [3]. The Routing Protocol for Low-Power and Lossy Networks (RPL) is designed to support cost-efficient routing on low-power and lossy networks. The current version of the RPL protocol applies a square-shaped arrangement. Such arrangements are used for non-smart and non-user attacks. The problem with such arrangements is their high error rate, which makes them prone to damage in large-scale attacks. This protocol is highly suited for blackhole attacks and is not suited for other kinds of attacks, especially repeated and large-scale attacks.
To secure this protocol against ordinary attacks, which mainly come from unauthorized individuals, a change in the arrangement as well as the application of applied logic is required [4]. Therefore, forwarding in the RPL protocol is optimized in this study through intrusion detection and energy monitoring. The wide application of IoT devices in daily activities has many advantages and, at the same time, increases the challenges related to security [5]. The availability of real-time data is necessary for sensitive IoT-based applications. This is possible only when authorized external users are allowed direct access to such data, so that they can directly access the data of sensor nodes or other IoT devices [6]. For more than two decades, authentication and authorization protocols, encryption mechanisms, and Intrusion Detection Systems (IDS) have been considered significant tools for protecting information networks and systems. However, using traditional intrusion detection techniques in an IoT environment is difficult given the peculiarities of this environment, such as resource-limited devices, a specific protocol stack, and the different standards involved. The most important issue here is the processing capacity and storage of the network nodes.
In traditional networks, the system administrator establishes authentication and intrusion detection mechanisms on nodes with higher computational capacity. IoT networks mainly consist of nodes with limited resources. Therefore, finding a node able to support security protocols such as authentication, authorization, and intrusion detection in an IoT network is a difficult task [7]. The increasing attention of research and industrial groups to the RPL protocol is evident in the recent literature, and the performance of RPL on different platforms has been studied accordingly. The authors in [4] [5] illustrated the necessity of RPL considering its low delay, quick configuration, and self-healing. Because the security of communication links and node mobility are clearly open issues, we look for ways to secure such protocols against attacks and to optimize the RPL protocol. This will consequently lead to lower energy consumption and higher network reliability. The authors in [4] suggested that the RPL protocol consists of networks with initial parents that attempt reproduction in later stages such that they can learn IoT networks. Blackhole attacks result in lags between message receiving and sending times, which are detected by passing the message from the protocol to fuzzy logic. Accordingly, appropriate decisions are made and the blackhole is identified when lags exist between messages. The RPL protocol has been applied extensively. The main application of this protocol is fighting against attacks such as blackhole attacks. The RPL protocol acts as a protective wall and enhances efficiency when everything is arranged properly. In this study, the RPL routing protocol, which has been designed for low-power, lossy networks, is applied in order to increase security against blackhole attacks so that network efficiency is improved.

A description of the plan
IoT is based on the interconnection of smart, routable devices connected through an Internet platform. The distribution of data in an IoT network generally depends on the desired applications, and it requires smart routing protocols and automatic configuration. The RPL protocol has been recommended to supervise and manage the efficiency of the network layer when connecting wireless sensor networks to the Internet. Due to their limited nature, RPL-based networks are prone to a wide range of security attacks. One of the main attacks on RPL is the blackhole attack, in which a malicious node drops all received packets instead of forwarding them to the next-hop node as expected. Ahmad et al. [8] presented an approach for reducing the impact of blackhole attacks with low packet loss and high confidence. Their approach includes a local decision and a global verification process. First, each node observes the communication behavior of its neighboring nodes by overhearing the packets they transfer and attempts to detect suspicious nodes from their behavior. Then, if a node identifies another node as suspicious, it verifies whether the suspicious node is a blackhole and thereby identifies the blackhole attack efficiently. The simulation results illustrate that this approach increases packet delivery significantly and detects blackhole attacks effectively. The purpose of this study is to optimize RPL. RPL is a network-layer protocol designed to be used with 6LoWPAN technology. It has been implemented on top of different lower-layer mechanisms, including the IEEE 802.15.4 MAC and PHY.
An IoT scenario can be modeled under RPL as a graph G = (N, L), where N is the set of network nodes and L is the set of links connecting these nodes. The topology graph G includes the routes being created and the messages being sent. For each node n_i, there is a set A containing the nodes that act as parent nodes for other adjacent nodes. This process runs based on the ranking of the nodes. The rank is a number that determines the position of a node in the topology relative to other nodes and the graph root. The use of ranks makes it possible for nodes to distinguish their own position, that of their parent nodes, and that of invading nodes. In this mechanism, a node chooses its parent node based on the lowest rank among the candidate nodes. The rank is a natural number and describes the position of the node in the graph G. The lower the position of a node in the topology (that is, the farther it is from the root), the higher this parameter. The RPL protocol is implemented in four stages: setup, route development, data communication, and route revision. In the setup phase, the objective function is used to enable nodes to select their parent node. This is done based on the rank information obtained.

Suggested approach
The suggested approach is based on the RPL protocol. RPL is a distance-vector and source routing protocol designed to support cost-efficient routing on low-power and lossy networks. It supports three security modes: unsecured, preinstalled, and authenticated. RPL is built on the DODAG topological concept. DODAG stands for Destination Oriented Directed Acyclic Graph; it has a tree-like structure that determines the network's default routes. In a DODAG, a node can have more than one parent, whereas in an ordinary tree each node has only one parent. To construct the DODAG, the root broadcasts a DIO message carrying the graph's ID and rank, which enables the other nodes to identify their position within the network. Nodes that receive the DIO message and would like to join the network then do the following:
1. They add the sender of the DIO message to their parent list.
2. They compute their rank using the objective function (the rank of each node must be higher than those of its parents).
3. They update the DIO packet with their own rank and broadcast it again.
This process continues until all network nodes have been assigned a rank. Each node must select a preferred parent from its parent set for forwarding data packets toward the sink. Once a node has joined the DODAG, it can process a DIO message in one of three ways:
1. Discard the DIO packet according to certain criteria.
2. Process the message to maintain its position within the network.
3. Improve its position by obtaining a lower rank inside the DODAG.
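The DODAG construction steps above can be sketched in a few lines of Python. This is only an illustration, not the protocol implementation: it assumes a hop-count objective function (a node's rank is its parent's rank plus `rank_step`) and an undirected link map, and the names `build_dodag`, `links`, and `rank_step` are hypothetical.

```python
from collections import deque


def build_dodag(links, root, rank_step=1):
    """Flood DIO messages from the root; return (rank, parents, preferred).

    links: dict mapping each node to an iterable of its neighbors.
    Assumption: hop-count objective function (rank = parent rank + rank_step).
    """
    rank = {root: 0}
    parents = {root: set()}
    queue = deque([root])
    while queue:
        sender = queue.popleft()  # node that (re)broadcasts a DIO
        for node in links[sender]:
            if node not in rank:
                # First DIO heard: join the DODAG, record parent, compute rank.
                rank[node] = rank[sender] + rank_step
                parents[node] = {sender}
                queue.append(node)
            elif rank[sender] < rank[node]:
                # Another candidate parent with a strictly lower rank.
                parents[node].add(sender)
            # Otherwise the DIO comes from an equal or higher rank: discard it.
    # Preferred parent: lowest rank, ties broken by node name.
    preferred = {
        node: min(candidates, key=lambda p: (rank[p], p))
        for node, candidates in parents.items()
        if candidates
    }
    return rank, parents, preferred
```

Because parents always have a strictly lower rank than their children, the resulting parent relation cannot contain a loop.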
Whenever a node decreases its rank, it must remove from its parent set any parent whose rank is no longer lower than its own, which prevents loops from forming in the network. After this stage, the node has a default route toward the root and can send its data packets toward the root. If the mode of operation field in the DIO message is non-zero, downward routes from the root to the nodes must be supported and maintained. In this mode, each node must send a DAO message to its parent so that the information of the reverse (i.e. downward) route can be determined. As DAO packets move from the nodes toward the root, they record the addresses of the nodes visited on the way up, and as soon as these packets arrive at the root, the complete routes between the root and the nodes are established. A DAO message can be acknowledged by its recipient through a DAO-ACK message [9][10] [11] [12]. In the suggested approach, the average throughput of each node over previous runs is computed in the first phase. To this end, each node periodically assesses the number of packets sent by its neighboring nodes. Here, the throughput rate is defined as the ratio of the number of packets sent to the number of packets received. Accordingly, if a node attempts a blackhole attack, the number of packets it sends gradually falls below the number it receives, and it is flagged as a likely attacker. Since this value lies in the [0, 1] range, the throughput value is used as the probability of a blackhole attack in the traffic pattern of a node.

$$BH_{prob}(i) = \frac{rec(i)}{send(i)}$$

Figure 1 illustrates that the input data are in the form of a time series. A time series is a set of statistical data collected at equal, regular time intervals. Statistical approaches that use such data are called time-series analysis. The throughput rate of each node is determined in the decision block.
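As a rough illustration of the first phase, the per-neighbor ratio can be computed from counters collected in one observation window and then averaged over windows. The sketch below follows the prose definition (packets forwarded by a neighbor divided by packets it received); how these counters map onto rec(i) and send(i) in the formula is not spelled out in the text, so the counter semantics and the names `throughput_ratio` and `mean_over_windows` are assumptions.

```python
def throughput_ratio(received, forwarded):
    """Per-neighbor ratio of packets forwarded to packets received, in [0, 1].

    received:  dict node -> packets handed to that neighbor in the window.
    forwarded: dict node -> packets overheard being forwarded by it.
    """
    ratios = {}
    for node, n_received in received.items():
        if n_received == 0:
            continue  # no traffic observed for this neighbor in this window
        ratios[node] = min(1.0, forwarded.get(node, 0) / n_received)
    return ratios


def mean_over_windows(window_ratios, node):
    """Average a node's ratio over a time series of observation windows."""
    values = [window[node] for window in window_ratios if node in window]
    return sum(values) / len(values) if values else None
```

A neighbor that silently drops traffic shows a ratio that stays close to 0 across windows, while a well-behaved forwarder stays close to 1.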
In the second phase of the suggested approach, the Ant Lion Optimizer (ALO) algorithm is used to build the RPL graph in order to select the best rank and develop the optimized route between the source nodes and the sink (i.e. the gateway connected to a large-scale network). The ALO algorithm imitates the interaction between antlions and the ants caught in their traps. To model this interaction, the ants move over the search space, and the antlions are allowed to hunt them and become fitter using their traps. Since ants move randomly when searching for food in nature, a random walk is used to model the ants' motion in the following manner [13]:

$$X(t) = [0, \mathrm{cumsum}(2r(t_1) - 1), \mathrm{cumsum}(2r(t_2) - 1), \ldots, \mathrm{cumsum}(2r(t_n) - 1)]$$

where cumsum computes the cumulative sum, n is the maximum number of iterations, t is the step of the random walk, and r(t) is a stochastic function defined as:

$$r(t) = \begin{cases} 1 & \text{if } rand > 0.5 \\ 0 & \text{if } rand \le 0.5 \end{cases}$$

where t is the step of the random walk and rand is a random number generated with uniform distribution in [0, 1].

This figure illustrates that the size of any solution affects the maximum number of possible steps generated through RPL in a graph. The first row of the matrix shows the number of the network node chosen as the next-hop node, and the second row shows the presence or absence of that node on the data transfer path. Finally, the vector of the final route is defined for each packet. Since the nodes are widely scattered in the environment and the number of possible solutions grows exponentially, not all possible solutions can be generated or assessed. Consequently, the metaheuristic antlion optimizer is used. The locations of the ants are stored in the following matrix and used during optimization:

$$M_{Ant} = \begin{bmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,d} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n,1} & A_{n,2} & \cdots & A_{n,d} \end{bmatrix}$$

where M_Ant holds the location of each ant, A_{i,j} is the value of the jth variable of the ith ant, n is the number of ants, and d is the number of variables.
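The random walk used for the ants earlier in this passage can be sketched as follows. The sketch assumes the usual ALO convention from [13], in which each step is 2r(t) - 1, that is +1 or -1 with equal probability, and the walk is the cumulative sum of the steps starting from 0. The function name `random_walk` and the `rng` parameter are illustrative.

```python
import random
from itertools import accumulate


def random_walk(n_steps, rng=random):
    """Return [0, cumsum(2r(t_1) - 1), ..., cumsum(2r(t_n) - 1)].

    r(t) is 1 when a uniform random number exceeds 0.5 and 0 otherwise,
    so every step is either +1 or -1.
    """
    steps = [1 if rng.random() > 0.5 else -1 for _ in range(n_steps)]
    return [0] + list(accumulate(steps))
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which helps when comparing simulation results.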
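To assess each ant, a fitness function is applied during optimization. The resulting values are stored in the following manner:

$$M_{OA} = \begin{bmatrix} f([A_{1,1}, A_{1,2}, \ldots, A_{1,d}]) \\ f([A_{2,1}, A_{2,2}, \ldots, A_{2,d}]) \\ \vdots \\ f([A_{n,1}, A_{n,2}, \ldots, A_{n,d}]) \end{bmatrix}$$

M_OA stores the value of the fitness function for each ant. Suppose that the antlions are hidden in the search space. To store their locations as well as their objective values, the following matrices are used:

$$M_{Antlion} = \begin{bmatrix} AL_{1,1} & AL_{1,2} & \cdots & AL_{1,d} \\ \vdots & \vdots & \ddots & \vdots \\ AL_{n,1} & AL_{n,2} & \cdots & AL_{n,d} \end{bmatrix}, \qquad M_{OAL} = \begin{bmatrix} f([AL_{1,1}, AL_{1,2}, \ldots, AL_{1,d}]) \\ \vdots \\ f([AL_{n,1}, AL_{n,2}, \ldots, AL_{n,d}]) \end{bmatrix}$$

During optimization, the following conditions apply:
- The range of the random walk is reduced adaptively to simulate the ants sliding toward the antlions.
- If an ant becomes fitter than an antlion, it is considered caught by the antlion and pulled under the soil.
- The antlion moves to the position of the last prey caught and rebuilds its pit to improve its chance of catching another prey after each hunt.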
To examine each solution, a fitness function (i.e. objective function) is used throughout the optimization. This study attempts to select the next-hop nodes by minimizing the probability of a blackhole attack (i.e. maximizing the throughput of the nodes located on the optimized route) in an environment with uncertainty about the behavior of neighboring nodes (normal or invading). The fitness function is therefore defined in terms of the throughput of the ith node in solution sol_j and of |sol_j|, the size of this solution (the number of hops between the sink and the source node), which is computed from the number of nodes located on the route. The ants update their positions with random walks in each optimization step. Since every search space has a boundary (variable range), the random walks are normalized using the following equation in order to keep them inside the search space:

$$X_i^t = \frac{(X_i^t - a_i)\,(d_i^t - c_i^t)}{b_i - a_i} + c_i^t$$

where a_i is the minimum of the random walk of the ith variable, b_i is the maximum of the random walk of the ith variable, c_i^t is the minimum value of the ith variable at the tth iteration, and d_i^t is the maximum value of the ith variable at the tth iteration. This equation must be applied in every iteration to guarantee that the random walks stay inside the search space. The random walks of the ants are affected by the antlions' traps. To model this mathematically, the following equations are used:

$$c_i^t = Antlion_j^t + c^t, \qquad d_i^t = Antlion_j^t + d^t$$

where c^t is the vector of the minimum of all variables at the tth iteration, d^t is the vector of the maximum of all variables at the tth iteration, c_i^t is the minimum of all variables for the ith ant, d_i^t is the maximum of all variables for the ith ant, and Antlion_j^t is the position of the selected jth antlion at the tth iteration. These equations show that the ants move within a hypersphere, defined by the vectors c and d, around a selected antlion.
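The normalization and trap-bound steps described above amount to a min-max rescaling of the random walk into an interval centered on the selected antlion. A minimal sketch for a single variable follows; the function names are hypothetical, and the handling of a flat walk (all values equal) is an assumption added to avoid division by zero.

```python
def trap_bounds(antlion_position, c, d):
    """Shift the current bounds [c, d] so that they surround the antlion."""
    return antlion_position + c, antlion_position + d


def normalize_walk(walk, lower, upper):
    """Min-max rescale a random walk into [lower, upper].

    a and b are the minimum and maximum of the walk; lower and upper play
    the role of c_i^t and d_i^t for this variable.
    """
    a, b = min(walk), max(walk)
    if a == b:
        return [lower for _ in walk]  # assumption: a flat walk maps to lower
    scale = (upper - lower) / (b - a)
    return [(x - a) * scale + lower for x in walk]
```

For example, a walk that ranges over [-2, 2] and is rescaled into [-1, 1] keeps its shape but is compressed by a factor of two.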
Through these mechanisms, the antlions are able to build traps proportional to their fitness, and the ants are required to move randomly. However, as soon as an antlion realizes that an ant is in the trap, it throws sand outward from the center of the pit. This behavior makes the trapped ant slide downward. To model the hunting capability of the antlions, a roulette wheel is used.
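Fitness-proportionate (roulette-wheel) selection, which is what the antlion selection step above appears to describe, can be sketched as follows. The sketch assumes non-negative fitness values where larger means fitter, and falls back to a uniform choice when all values are zero; `roulette_wheel_select` is an illustrative name.

```python
import random


def roulette_wheel_select(fitness, rng=random):
    """Return an index chosen with probability proportional to its fitness."""
    total = sum(fitness)
    if total <= 0:
        return rng.randrange(len(fitness))  # assumption: uniform fallback
    pick = rng.random() * total
    running = 0.0
    for index, value in enumerate(fitness):
        running += value
        if pick < running:
            return index
    return len(fitness) - 1  # guard against floating-point round-off
```

Fitter antlions are therefore chosen more often, but weaker ones still have a chance, which preserves some exploration.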
Whenever an ant is trapped, the antlion throws sand toward the edge of the pit. To model this behavior mathematically, the radius of the hypersphere of the ants' random walks is reduced adaptively. The following equations are used:

$$c^t = \frac{c^t}{I}, \qquad d^t = \frac{d^t}{I}$$

where I is a ratio, c^t is the vector of the minimum of all variables at the tth iteration, and d^t is the vector of the maximum of all variables at the tth iteration. In these equations, $I = 10^{W} \frac{t}{T}$, where t is the current iteration, T is the maximum number of iterations, and W is a constant whose value depends on the current iteration. The constant W adjusts the accuracy level of exploitation. These equations shrink the radius within which the ants' positions are updated and simulate the ants sliding toward the pits. The final stage of hunting occurs when the ant reaches the bottom of the pit and is caught by the antlion. After this phase, the antlion pulls the ant under the soil and consumes its body. To imitate this process, it is assumed that catching prey occurs when an ant becomes fitter than its corresponding antlion (i.e. it goes inside the soil). The antlion must then update its position to that of the last hunted ant to enhance its chance of catching new prey. The following equation is used accordingly:

$$Antlion_j^t = Ant_i^t \quad \text{if } f(Ant_i^t) > f(Antlion_j^t)$$

where t is the current iteration, Antlion_j^t is the position of the selected jth antlion at the tth iteration, and Ant_i^t is the position of the ith ant at the tth iteration. Elitism is a significant characteristic of evolutionary algorithms that allows them to maintain the best solutions obtained at each stage of the optimization process. In this study, the best antlion found in each iteration is stored and treated as the elite. Since the elite is the fittest antlion, it should be able to affect the motion of all ants in all iterations.
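To make the shrinking of the bounds concrete, the sketch below computes the ratio I and applies it to a pair of bounds. The thresholds for the exponent (2 after 10% of the iterations, 3 after 50%, 4 after 75%, 5 after 90%, 6 after 95%) are taken from the original ALO description in [13] as an assumption, since the schedule is not given in this text; using I = 1 before the first threshold is likewise an assumption, and the function names are hypothetical.

```python
def shrink_ratio(t, T):
    """Ratio I = 10**W * t / T, with W stepping up as iterations progress.

    Assumed schedule (from the original ALO description, not this text):
    W = 2, 3, 4, 5, 6 once t exceeds 10%, 50%, 75%, 90%, 95% of T.
    """
    schedule = [(0.95, 6), (0.9, 5), (0.75, 4), (0.5, 3), (0.1, 2)]
    for fraction, w in schedule:
        if t > fraction * T:
            return (10 ** w) * t / T
    return 1.0  # assumption: no shrinking during the earliest iterations


def shrink_bounds(c, d, t, T):
    """Divide the lower and upper bounds by the current ratio I."""
    ratio = shrink_ratio(t, T)
    return c / ratio, d / ratio
```

The ratio grows quickly, so the ants are confined to an ever smaller neighborhood of the antlion, which shifts the search from exploration to exploitation.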
Therefore, it is assumed that each ant walks randomly around an antlion selected by the roulette wheel and around the elite simultaneously, in the following manner:

$$Ant_i^t = \frac{R_A^t + R_E^t}{2}$$

where R_A^t is the random walk around the antlion selected by the roulette wheel at the tth iteration, R_E^t is the random walk around the elite at the tth iteration, and Ant_i^t is the position of the ith ant at the tth iteration.

In reinforcement learning, an objective is defined for the learning agent to achieve. The agent then learns how to reach the target through trial and error with its environment. One method of reinforcement learning is the stochastic learning automaton. A stochastic learning automaton attempts to find the solution to a problem without any information about the optimal action (i.e. by assuming a uniform probability for its actions at the beginning of a task). An action is selected stochastically and applied to the environment. Then the environment's response is received, the probabilities of the actions are updated based on the learning algorithm, and the procedure is repeated. In the third phase, using stochastic learning automata, the probability of a blackhole attack in the nodes' behavior pattern is updated based on reward and punishment mechanisms. A stochastic automaton is defined as the quintuple SA = {α, β, F, G, φ}, where r is the number of automaton actions, α = {α_1, α_2, ..., α_r} is the set of all automaton actions, β = {β_1, β_2, ..., β_m} is the set of automaton inputs, F: φ × β → φ is the function that generates the new state, G: φ → α is the output function that maps the current state to the next output, and φ = {φ_1, φ_2, ..., φ_k} is the set of the automaton's internal states at moment n. At the beginning of the automaton's activity, the probability of each of its actions is equal to 1/r (where r is the total number of automaton actions).
The probability vector is initialized with the values computed in the first phase for the throughput of each node; over time, it is assigned new values based on the feedback received from the environment. In this phase, whenever node i successfully sends a packet through an intermediate node j toward the sink, the attack probability of all nodes located on the DODAG graph for the RPL protocol is reduced. The reasoning is that a node that has successfully forwarded a packet toward the sink over several hops cannot be the invader in a blackhole attack. This serves as the reward mechanism of the update rule. However, if a blackhole attack by an intermediate node results in packet loss, the automaton is punished and the probability BH_prob(i) is increased by the corresponding penalty update. Accordingly, the updated attack probabilities within the graph matrix change dynamically during the learning process of the automata model, so that the resulting graph becomes more resistant to attacks.
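The exact update equations are not reproduced in this text, so the following is only a sketch of how such a reward and punishment step could look. It assumes a simple linear scheme with hypothetical learning rates `a` and `b`: a reward scales the attack probability down, and a penalty moves it toward 1. Both updates keep the value inside [0, 1].

```python
def reward(bh_prob, route, a=0.1):
    """Lower the attack probability of every node on a successful route.

    Assumed linear rule (not taken from the paper): p <- (1 - a) * p.
    """
    for node in route:
        bh_prob[node] = (1.0 - a) * bh_prob[node]


def penalize(bh_prob, node, b=0.1):
    """Raise the attack probability of a node suspected of dropping a packet.

    Assumed linear rule (not taken from the paper): p <- p + b * (1 - p).
    """
    bh_prob[node] = bh_prob[node] + b * (1.0 - bh_prob[node])
```

With these rules a node that keeps forwarding packets drifts toward probability 0, while repeated losses push a node toward 1, after which the route optimizer can avoid it.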
Implementing the suggested approach

Simulation environment and evaluation parameters
The evaluation parameters of the suggested approach include the number of detected attacks, the end-to-end delay, and the packet delivery rate (PDR), i.e. the ratio of the number of successfully delivered packets to the total number of packets sent; these are among the quality-of-service parameters for IoT-based networks. The packet delivery rate is computed through the following equation:

$$PDR = \frac{\text{number of packets successfully delivered}}{\text{total number of packets sent}}$$

The packet loss rate, expressed as the proportion of dropped packets to the total number of packets sent, is computed through the following equation:

$$\text{Packet loss rate} = \frac{\text{number of dropped packets}}{\text{total number of packets sent}}$$

The average end-to-end delay in packet delivery is defined as the time required to deliver a packet after the DODAG has been formed, the optimized parent of each node has been determined, and the packet has been directed through the intermediate nodes toward the destination. The characteristics of the system used, the parameters used in the implementation phase, and the parameter values of the antlion approach are displayed in the following table.
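The evaluation metrics defined above are simple ratios and can be computed from per-run counters. The sketch below assumes that every packet that was sent but not delivered counts as dropped, and that delays are recorded only for delivered packets; the function name and argument names are illustrative.

```python
def delivery_metrics(sent, delivered, delays):
    """Return (PDR, packet loss rate, average end-to-end delay).

    sent:      total number of packets sent by the source.
    delivered: number of packets that reached the sink.
    delays:    end-to-end delays of the delivered packets, in seconds.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    pdr = delivered / sent
    loss_rate = (sent - delivered) / sent  # assumption: undelivered == dropped
    average_delay = sum(delays) / len(delays) if delays else 0.0
    return pdr, loss_rate, average_delay
```

For a run in which 450 of 500 packets arrive, this gives a PDR of 0.9 and a loss rate of 0.1.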

Simulation results
This study examines the efficiency of the suggested approach in terms of attack detection and quality-of-service indices such as end-to-end delay and packet delivery rate (PDR) by sending 500 packets from the source to the specified destination (i.e. the sink located at the origin of the coordinate system). Attacks occur randomly in some simulation runs, in which a node attempts to modify the DODAG by advertising a false rank (a higher rank compared with other nodes) and to attract user traffic toward this invading node.
To study this situation, the number of detected attacks has been examined in two separate modes (the number of attacks detected per network node and the total number of detected attacks). In the second mode, the network consists of 10, 15, 20, 25, and 30 nodes, 10% of which are invading nodes that send rank messages to their neighbors at different times to change the packet delivery route toward the sink. As is evident from Figure 4, the suggested approach is highly precise in detecting the attacks made by invading nodes: the number of attacks detected by the suggested approach is higher for most of the network nodes. To clarify the superiority of one approach over the other, the total number of detected attacks is illustrated as well. As is evident from Figure 5, the number of attacks detected by the suggested approach, which uses the ALO meta-heuristic with RPL, is higher than that of the basic scheme in all five cases (with different numbers of nodes). Thus, the suggested approach can successfully detect and isolate rank attacks during routing operations. Before network convergence, RPL, as a proactive routing protocol, is full of control packets and routing information. This results in more exchanges of routes and control packets, making it easier for malicious nodes to exploit them. In RPL routing, a node selects its preferred parent after examining its candidate parents with lower rank values. This ordering of ranks is maintained among the nodes to avoid routing loops. Therefore, a change of node rank occurs when a child node attaches itself to another parent node with a lower rank. A rank attacker advertises a better rank than its neighbors and misuses this RPL feature to attract and deceive neighboring nodes.
In this mode, the neighboring nodes detach from their previous parents and select the malicious node as their new parent instead. Next, the packet delivery rate (PDR) to the sink in the suggested approach is compared with that of the basic one. In the suggested approach, a DODAG is created using the antlion meta-heuristic to determine each node's suitable parent, and, thanks to the learnability and high adaptation rate of the learning automata approach, routes for sending packets are selected by distinguishing invading nodes from normal ones. It is noteworthy that the invading nodes are not used as intermediate nodes. Figure 6 illustrates that the PDR of the suggested approach is always higher than that of any other approach. Therefore, the number of lost packets is reduced by selecting a suitable route for sending data through the intermediate nodes. One of the main objectives pursued in routing approaches is guaranteeing a favorable end-to-end delay to satisfy service level agreements (SLA). Even if the suggested approach detects and isolates the malicious nodes (rank attacks), it must not impose additional load on the network. In the following, the same scenario is studied in terms of delay in packet delivery. The simulation results in Figure 7 illustrate the performance of the suggested approach.

Fig. 6 Efficiency of the suggested approach in terms of PDR

Conclusion
In the suggested approach, the RPL routing protocol has been optimized by using ALO and Stochastic Learning Automata to detect the unauthorized access of invading nodes in receiving the data packages and defining the shortest paths to the sink (blackhole attack). The simulation results showed that