This SDN-based approach, which aims to optimize the resource consumption of network assets by limiting user speeds in complex networks according to their average usage, consists of four stages. In the first stage, a virtual network is created in the Mininet environment and connected to the ONOS software-defined network controller for attack detection. In the second stage, data are collected with the sFlow protocol from both the created network and a real network device, possible DDoS attacks are detected, and the data are examined. In the third stage, the collected traffic data are analyzed with the K-means algorithm and divided into groups. In the fourth and final stage, the IP addresses in the resulting groups are assigned to the predefined QoS queues on the network device, and the bandwidth is selected accordingly.
3.1. Creating the Dataset for Clustering
The data used by the K-means clustering for speed limitation were collected with sFlow from the Isparta University of Applied Sciences network. Fig. 3 shows the screen output of the interface used during data collection with sFlow. After data cleaning, more than 7,500,000 records in total were stored in PostgreSQL.
3.2. Designing the Virtual Network
In our study, a network consisting of 2 switches and 4 hosts was designed in the Mininet virtual network emulator. The view of the designed network on the ONOS software-defined network controller is given in Fig. 4. In the figure, H1, H2, H3, and H4 represent the hosts used to perform a DDoS attack, while S1 and S2 are the switching devices from which the data are collected.
The Open vSwitch “ovs-vsctl” command is used to configure the sFlow protocol on the switching devices in the created network environment. This command ensures that samples of the traffic passing through the switching devices are sent to the sFlow-RT application.
Several parameters are used when configuring sFlow on the switching devices. The "target" parameter sets the IP address of the sFlow collector server. The "header" parameter sets how many bytes of each sampled packet's header are included in the sFlow packet. The "sampling" parameter is the ratio of the number of packets arriving at a port to the number of samples taken from these packets. Port speed should be taken into account when selecting the sFlow sampling rate. Sampling ranges according to port speed are given in Fig. 5 [17]. The "polling" parameter determines the interval, in seconds, at which sFlow counter data are sent to the sFlow collector.
The parameters and values “target = onos-ip-address, header = 128, sampling = 400, polling = 30” were entered on the switching devices used in the study.
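The configuration above can be sketched as a small Python helper that assembles the corresponding ovs-vsctl invocation. This is only an illustrative sketch: the bridge name "s1", the agent interface "eth0", and the collector address "127.0.0.1:6343" are assumptions made here, since the study only gives "onos-ip-address" as the target.

```python
import shlex


def build_sflow_command(bridge, agent_if, collector,
                        header=128, sampling=400, polling=30):
    """Return the ovs-vsctl argument list that enables sFlow on one bridge."""
    return [
        "ovs-vsctl",
        "--", "--id=@sflow", "create", "sflow",
        f"agent={agent_if}",          # interface whose IP identifies the agent
        f'target="{collector}"',      # sFlow collector as "ip:port"
        f"header={header}",           # bytes of each sampled packet to export
        f"sampling={sampling}",       # one sample per this many packets
        f"polling={polling}",         # counter export interval in seconds
        "--", "set", "bridge", bridge, "sflow=@sflow",
    ]


# Assumed example values; replace them with the real bridge and collector.
cmd = build_sflow_command("s1", "eth0", "127.0.0.1:6343")
print(shlex.join(cmd))
```

The argument list can be passed to `subprocess.run(cmd, check=True)` on a host where Open vSwitch is installed; building it as a list keeps the quoting of the target value explicit.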
Data from the sFlow-configured switching devices were transferred to the database with a custom Python script. Meanwhile, DDoS attacks were detected by the DDoS prevention application of the sFlow-RT software, and the information needed to stop the attack traffic was sent to the ONOS SDN controller. The script records the source IP, destination IP, and packet size from the received data in the database together with a timestamp. The header information that can be obtained with the sFlow protocol is given in Fig. 6.
The collected data were processed to count how many times each IP address appeared as the destination IP. During this process, the packet sizes for each destination IP address were summed and recorded. An example of the recorded data is given in Fig. 7.
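The aggregation described above can be sketched as follows. This is a minimal illustration, assuming each record is a (source IP, destination IP, packet size) tuple; the function name and the sample records are hypothetical and are not taken from the study's script or data.

```python
from collections import defaultdict


def aggregate_by_destination(records):
    """Return {dst_ip: (hit_count, total_bytes)} from (src, dst, size) tuples."""
    hits = defaultdict(int)
    total_bytes = defaultdict(int)
    for _src_ip, dst_ip, packet_size in records:
        hits[dst_ip] += 1                    # how often this IP was a destination
        total_bytes[dst_ip] += packet_size   # summed packet sizes for this IP
    return {ip: (hits[ip], total_bytes[ip]) for ip in hits}


# Hypothetical sample records, for illustration only.
sample_records = [
    ("10.0.0.1", "10.0.0.3", 1500),
    ("10.0.0.2", "10.0.0.3", 500),
    ("10.0.0.1", "10.0.0.4", 64),
]
features = aggregate_by_destination(sample_records)
print(features)
```

Each destination IP thus yields a two-dimensional feature (hit count, total packet size), which is the kind of point the clustering step operates on.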
While data are being collected with the sFlow protocol, DDoS attacks that occur on the network are also detected, and the attacker IP addresses are automatically blocked for a specified period of time. IP addresses whose block has expired are allowed to generate traffic again.
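The time-limited blocking behavior can be illustrated with a small bookkeeping sketch. It only models the expiry logic; in the study the blocking itself is carried out through sFlow-RT and the ONOS controller. The class name, the 60-second duration, and the injectable clock are assumptions introduced here.

```python
import time


class TimedBlockList:
    """Track blocked IP addresses and release them after a fixed duration."""

    def __init__(self, block_seconds, clock=time.monotonic):
        self.block_seconds = block_seconds
        self.clock = clock            # injectable so the logic can be tested
        self._expires_at = {}

    def block(self, ip):
        """Block an IP address from now until the block duration elapses."""
        self._expires_at[ip] = self.clock() + self.block_seconds

    def is_blocked(self, ip):
        """Return True while the block is active; drop expired entries."""
        expires_at = self._expires_at.get(ip)
        if expires_at is None:
            return False
        if self.clock() >= expires_at:
            del self._expires_at[ip]  # block expired, traffic allowed again
            return False
        return True


# Usage with a fake clock (assumed 60-second block duration).
now = [0.0]
block_list = TimedBlockList(60, clock=lambda: now[0])
block_list.block("10.0.0.1")
blocked_at_start = block_list.is_blocked("10.0.0.1")
now[0] = 61.0
blocked_after_expiry = block_list.is_blocked("10.0.0.1")
```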
3.3. Clustering Data with K-means Algorithm
The K-means algorithm is one of the best-known and most widely used clustering methods [19]. Clustering algorithms are useful tools for the clustering and analysis of network traffic usage, data mining, compression, and probability density estimation [20].
K-means iteratively assigns each data point to one of the K clusters according to how close the point is to the cluster center. The aim of the algorithm is to determine the K cluster centroids and the cluster to which each data point belongs.
Given the data points \(x_1, x_2, x_3, \ldots, x_n\) and the required number of clusters K, the basic procedure is as follows.
- Choose the K initial centers, either at random from the dataset or as its first K points.
- Compute the Euclidean distance between each point in the dataset and each of the K cluster centers.
- Assign each data point to its nearest center using the distances found in the previous step.
- Compute the new center of each cluster by averaging the points assigned to it.
- Repeat the distance calculation and reassignment until the centers no longer change or a fixed number of iterations is reached.
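The steps above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the code used in the study: it takes the first K points as the initial centers, keeps the old center when a cluster becomes empty, and uses made-up sample points.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Cluster points with K-means; return (centers, labels)."""
    centers = [tuple(p) for p in points[:k]]  # first K points as initial centers
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        # Update step: each center becomes the mean of its assigned points.
        new_centers = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[i])  # empty cluster keeps its center
        if new_centers == centers:  # converged: centers no longer change
            break
        centers = new_centers
    return centers, labels


# Made-up sample points forming two obvious groups.
points = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
centers, labels = kmeans(points, k=2)
print(labels)
```

With real traffic features, whose two dimensions differ by orders of magnitude, scaling the features before clustering may be worth considering so that neither one dominates the distance.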
The relationship between two values in the dataset is measured by the Euclidean distance. If \(p=(p_1,p_2)\) and \(q=(q_1,q_2)\), the distance between the two points is calculated as shown in Eq. 1.
\(d(p,q)=\sqrt{(q_1-p_1)^2+(q_2-p_2)^2}\) (Eq. 1)
In Python, this distance can be computed as follows:
[Python Code]
import math

def euclidean_distance(point1, point2):
    # Euclidean distance between two 2-D points, as in Eq. 1
    return math.sqrt((point1[0] - point2[0]) ** 2 + (point1[1] - point2[1]) ** 2)
The number of clusters is given to the K-means algorithm as a parameter. The number of clusters into which the data should be divided was determined using the elbow method. Fig. 8 shows the elbow method graphs created from the daily, weekly, and monthly data.
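The elbow method plots the within-cluster sum of squares (WCSS) against K and looks for the point where the curve bends. The sketch below shows how WCSS can be computed and how the bend can be located numerically. Choosing the K with the largest second difference is a heuristic assumed here for illustration; the study reads the elbow from the graphs. The WCSS values in the example are made up and are not the study's data.

```python
import math


def wcss(points, centers, labels):
    """Within-cluster sum of squared Euclidean distances."""
    return sum(
        math.dist(p, centers[label]) ** 2 for p, label in zip(points, labels)
    )


def elbow_k(wcss_by_k):
    """Pick K at the sharpest bend of the WCSS curve (largest second difference)."""
    ks = sorted(wcss_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        drop_before = wcss_by_k[prev_k] - wcss_by_k[k]
        drop_after = wcss_by_k[k] - wcss_by_k[next_k]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Made-up WCSS values for K = 1..5, for illustration only.
example_curve = {1: 100.0, 2: 60.0, 3: 15.0, 4: 12.0, 5: 10.0}
chosen_k = elbow_k(example_curve)
print(chosen_k)
```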
Based on the daily, weekly, and monthly graphs created by the elbow method, the most suitable number of clusters for the K-means algorithm was determined to be 3. After the number of clusters is determined, each data point is assigned to its nearest cluster. If each cluster center is denoted by \(c_i\), each data point x is assigned to a cluster based on Eq. 2, where dist() is the Euclidean distance.
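The nearest-center assignment rule described above is commonly written in the following standard form. Here C denotes the set of the K cluster centers, a symbol assumed here because it is not defined in the text.

```latex
\underset{c_i \in C}{\arg\min}\; \operatorname{dist}(c_i, x)^2
```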
Eq. 3 is applied to find the new center of each cluster from the points assigned to it, where \(S_i\) is the set of all points assigned to the i-th cluster.
\({c}_{i}=\frac{1}{\left|{S}_{i}\right|}\sum _{{x}_{i}\in {S}_{i}}{x}_{i}\) (Eq. 3)
The collected data were divided into 3 different clusters with the K-means algorithm. The resulting clusters are given in Fig. 9.
3.4. Processing of Clustered Data in ONOS
The IP addresses in the clusters produced by K-means are assigned to pre-created QoS queues on the network switching devices, taking the end link speed into account. The user port speed of commonly used switching devices is taken as 1 Gbps. The first queue bandwidth is 500 Mbps, the second queue bandwidth is 300 Mbps, and the third queue bandwidth is 150 Mbps. The bandwidth controller required for the queues to work is limited as in Eq. 4. Here, qdata is the data controller used for limiting, and q150, q300, and q500 are the queues created.
For the IP address cluster with the highest total packet size and hit count among the data to which the K-means algorithm is applied, a flow record is entered in the ONOS controller to assign this traffic to the first queue with 500 Mbps bandwidth. IP addresses in the cluster with the lowest total packet size and hit count are assigned to the queue with 150 Mbps bandwidth. The remaining cluster is assigned to the queue with 300 Mbps bandwidth.
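The cluster-to-queue assignment can be sketched as a ranking of the cluster centers. This is a minimal illustration: the queue identifiers 0, 1, and 2, the (hit count, total packet size) layout of each center, and the sample values are assumptions introduced here.

```python
# Queue id -> bandwidth in Mbps; the ids themselves are hypothetical.
QUEUE_BANDWIDTH_MBPS = {0: 500, 1: 300, 2: 150}


def assign_queues(cluster_centers):
    """Map cluster index -> queue id, ranking by total packet size, then hits.

    Each center is a (hit_count, total_packet_size) pair. The heaviest
    cluster gets queue 0 and the lightest gets the last queue.
    """
    ranked = sorted(
        range(len(cluster_centers)),
        key=lambda i: (cluster_centers[i][1], cluster_centers[i][0]),
        reverse=True,
    )
    return {cluster: queue_id for queue_id, cluster in enumerate(ranked)}


# Made-up cluster centers, for illustration only.
example_centers = [(120, 9.0e5), (4000, 7.5e7), (900, 8.0e6)]
queue_of_cluster = assign_queues(example_centers)
for cluster, queue_id in sorted(queue_of_cluster.items()):
    print(cluster, queue_id, QUEUE_BANDWIDTH_MBPS[queue_id])
```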
The default priority value of the 'forward' module of the ONOS SDN controller is "10". The priority value of the entered flow records is therefore set higher than “10” so that they take precedence over the default forwarding rules. Fig. 10 shows examples of the logged flow records for each queue.
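A flow record of this kind can be sketched as a JSON document built in Python. The field names below follow the flow-rule JSON format of the ONOS REST API as understood here and should be checked against the ONOS version in use. The device ID, destination IP, output port, and the priority value 20 are hypothetical examples, not values from the study.

```python
import json


def build_queue_flow(device_id, dst_ip, queue_id, out_port, priority=20):
    """Build a flow rule that sends traffic for dst_ip through a QoS queue."""
    if priority <= 10:
        raise ValueError("priority must exceed the forwarding default of 10")
    return {
        "priority": priority,
        "timeout": 0,
        "isPermanent": True,
        "deviceId": device_id,
        "selector": {
            "criteria": [
                {"type": "ETH_TYPE", "ethType": "0x0800"},   # match IPv4
                {"type": "IPV4_DST", "ip": f"{dst_ip}/32"},  # match one host
            ]
        },
        "treatment": {
            "instructions": [
                {"type": "QUEUE", "queueId": queue_id},      # select the queue
                {"type": "OUTPUT", "port": str(out_port)},   # then forward
            ]
        },
    }


# Hypothetical example values.
flow = build_queue_flow("of:0000000000000001", "10.0.0.3", queue_id=0, out_port=2)
payload = json.dumps(flow, indent=2)
print(payload)
```

The resulting payload would typically be sent in a POST request to the controller's flow endpoint for the given device; the sketch stops at building the document so that it stays independent of any particular controller address or credentials.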